Wrangling Computing Environments: Using Docker for Research from Coursera

Goal of this course:

Equip learners with the basic skills and confidence to use containers in scientific software analyses.

Expectations:

This course is not meant to teach learners how to create complex containers, but instead to introduce them to the fundamentals of continuous integration and continuous deployment (CI/CD). It focuses on containers (Docker or Podman) and will not cover other (perfectly fine) tools for CI/CD.

Equipping researchers with the skills to create reproducible data analyses increases the efficiency of everyone involved. By recognizing that biological data analysis code is a form of software development, we can try to adapt good development practices in scientific analyses and software contexts.

Scientific software projects may include (but aren’t limited to):

- Software built as tools to be utilized by others to analyze biologically derived data

- Code that is built primarily for analyzing one project’s data

- Code that is built as a workflow for a series of steps and analyses that might be reused among collaborators or within a lab

- Any scripts and code that are built to handle data in a research setting

- Any scripts and code a researcher might interact with

Containers are one tool among many for creating reproducible analyses. A container is a lightweight, portable, and isolated environment that encapsulates an application and its dependencies, enabling it to run consistently across different computing environments. Many individuals performing analyses on cancer data may not have formal training in software development and may be unfamiliar with the idea of containers.
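To make "an application and its dependencies" more concrete, here is a minimal Python sketch that records a few details of the current computing environment, the kind of details that differ between machines and that a container is meant to hold fixed. It is an illustration only, not course material: the function name `environment_snapshot` and the default package names are assumptions chosen for the example.

```python
import json
import platform
from importlib import metadata


def environment_snapshot(package_names=("numpy", "pandas")):
    """Record details of the current environment that often differ between machines.

    The default package names are hypothetical examples; pass your own.
    """
    packages = {}
    for name in package_names:
        try:
            packages[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            packages[name] = None
    return {
        "python_version": platform.python_version(),
        "operating_system": platform.system(),
        "machine": platform.machine(),
        "packages": packages,
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be saved next to an analysis.
    print(json.dumps(environment_snapshot(), indent=2))
```

Running this on two different machines will usually produce two different snapshots. Running it inside the same container image on both machines should, in principle, report the same Python and package versions.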

Unique Features of This Course

- Hands-on exercises exploring real uses of containers for scientific research and software

- Activities that demonstrate common pitfalls of using containers

- Information about how to use two of the most common tools for containers: Docker and Podman

Key Words

Reproducibility, Containers, Podman, Docker, Scientific Software Development, Biomedical Research

Intended Audience/Required Knowledge

- The course is intended for researchers and research staff who might be interested in learning about using containers to make their research or scientific software more reproducible.

- Some familiarity with biomedical or health-related research, as well as some familiarity with programming (including Bash and the command line), is required.

Learning Objectives

- Understand that computing environments are moving targets

- Use containers to share a controlled computing environment

- Pull and use a Docker image hosted online

- Modify a Docker image

- Build a Docker image from scratch

- Troubleshoot the most common Docker-related errors
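The pull, modify, and build objectives above map onto a small set of container commands. The sketch below is an illustration under stated assumptions, not the course's own material. It assembles the command lines in Python and prints them instead of running them, so it works without Docker or Podman installed. The helper function names, the image `python:3.11-slim`, the tag `my-analysis:0.1`, and the pinned `pandas==2.2.2` are all hypothetical choices for the example. The `pull`, `run`, and `build` subcommands are shared by the Docker and Podman command line tools.

```python
import shlex

# A hypothetical Dockerfile that modifies an existing image by adding one
# pinned package on top of it. Image and package version are example choices.
DOCKERFILE_TEXT = (
    "FROM python:3.11-slim\n"
    "RUN pip install --no-cache-dir pandas==2.2.2\n"
)


def pull_command(image, engine="docker"):
    """Command that downloads an image from an online registry."""
    return [engine, "pull", image]


def run_command(image, command, engine="docker"):
    """Command that runs `command` in a new container made from `image`."""
    # --rm removes the container once the command exits.
    return [engine, "run", "--rm", image, *command]


def build_command(tag, context_dir=".", engine="docker"):
    """Command that builds an image from the Dockerfile in `context_dir`."""
    # -t names (tags) the resulting image.
    return [engine, "build", "-t", tag, context_dir]


if __name__ == "__main__":
    commands = [
        pull_command("python:3.11-slim"),
        run_command("python:3.11-slim", ["python", "--version"]),
        build_command("my-analysis:0.1"),
        # The same helpers work for Podman by switching the engine name.
        pull_command("python:3.11-slim", engine="podman"),
    ]
    for cmd in commands:
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

To try the build step for real, one option is to save `DOCKERFILE_TEXT` to a file named `Dockerfile` in an empty directory and run the printed build command from that directory.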

Accessibility

We are committed to making our content accessible and available to all. We welcome any feedback you might have at https://forms.gle/SzuZjct4ZQyt3Cos7. Questions related to accessibility accommodations should be directed to https://studentserviceportal.force.com/s/.

What's inside

Syllabus

Introduction

In this first module, we cover how the course works and the motivation for using containers in reproducible research.

Getting started with Containers

Career center

Learners who complete Wrangling Computing Environments: Using Docker for Research will develop knowledge and skills that may be useful to these careers:

Bioinformatics Scientist

A Bioinformatics Scientist applies computational methods to understand biological data, often developing and utilizing complex analysis pipelines. This course helps build a foundation for a Bioinformatics Scientist by directly addressing the critical need for reproducible research. With hands-on exercises using Docker and Podman, it equips learners with essential skills to manage computing environments, ensuring analyses are consistent and shareable. Understanding how to pull, modify, build, and troubleshoot containers will be particularly helpful in deploying scientific software and collaborating effectively within research teams. The course is especially relevant given its focus on adapting good development practices to scientific analyses, a key aspect of modern bioinformatics. This role typically requires an advanced degree.


Research Software Engineer

As a Research Software Engineer, you bridge the gap between scientific research and robust software development, creating tools and platforms that enable cutting-edge discoveries. This course is highly relevant for a Research Software Engineer because it focuses on adapting good development practices to scientific software contexts, emphasizing reproducibility. The practical experience gained in understanding and using containers like Docker and Podman provides a strong foundation for managing complex software dependencies and ensuring consistent execution environments. Learning to troubleshoot container-related errors, modify images, and implement CI/CD fundamentals directly supports the development of reliable and shareable scientific software. This role often requires an advanced degree or extensive experience.


Computational Biologist

A Computational Biologist develops and applies computational models and algorithms to solve biological problems, often requiring advanced programming and data analysis skills. This course helps a Computational Biologist by equipping you with the fundamental skills to utilize containers for creating reproducible data analyses, which is crucial for validating and sharing your models. The hands-on work with Docker and Podman, including learning to use, modify, and build images, provides a portable solution for encapsulating your computational environments. Understanding container best practices and troubleshooting common issues will enhance your ability to maintain consistency across different computing setups, a key aspect of scientific software development in this field. This role typically requires a Master's or PhD degree.


Bioinformatics Software Developer

A Bioinformatics Software Developer designs, builds, and maintains specialized software tools and pipelines for analyzing biological and genomic data, contributing significantly to scientific research. This course is exceptionally relevant for a Bioinformatics Software Developer because it emphasizes adapting good development practices to scientific software contexts, with a strong focus on containerization. The hands-on experience with Docker and Podman, including learning to build and modify images, use them as a development space, and troubleshoot issues, directly translates to creating robust, reproducible, and shareable bioinformatics tools. These skills are crucial for ensuring the reliability and consistency of complex analytical workflows. This role may require an advanced degree or strong development experience.


Genomics Data Scientist

A Genomics Data Scientist specializes in analyzing large-scale genomic data, applying computational and statistical methods to uncover biological insights and contribute to precision medicine. For a Genomics Data Scientist, mastering reproducible data analyses is paramount due to the complexity and volume of genomic information. This course directly addresses this by providing hands-on experience with Docker and Podman, allowing you to create stable and shareable computing environments for your intricate pipelines. The ability to pull, modify, and build Docker images, alongside troubleshooting skills, will be particularly helpful in managing diverse bioinformatics tools and ensuring consistent results across research projects. This role usually requires a Master's or PhD.


Biomedical Data Analyst

A Biomedical Data Analyst extracts insights from complex health-related datasets, such as cancer data, playing a crucial role in scientific discovery and clinical translation. This course is particularly valuable for a Biomedical Data Analyst as it directly addresses the challenge of ensuring reproducible data analyses within research settings. By gaining hands-on experience with Docker and Podman, you will develop the capacity to create and share controlled computing environments, ensuring that your analytical pipelines yield consistent results regardless of the platform. The focus on troubleshooting containers and understanding common pitfalls will also be very helpful in maintaining robust and reliable data analysis workflows. This role often requires a Master's degree.


Scientific Programmer

A Scientific Programmer develops specialized software and scripts to support scientific research, translating complex algorithms into functional and efficient code. For a Scientific Programmer, this course helps build a foundation in adapting good development practices to scientific analyses and software contexts, particularly through the use of containers. The hands-on exercises with Docker and Podman, focusing on how to use, modify, and troubleshoot containers, directly empower you to create isolated and reproducible environments for your code. This ensures that your software runs consistently and is easily shareable among collaborators, increasing efficiency and reliability in research settings where scripts and code are built to handle data. This role may require an advanced degree depending on the specialization.


Data Scientist

A Data Scientist extracts insights from vast datasets, building models and performing analyses to inform decisions across various industries and research fields. This course helps a Data Scientist by addressing the fundamental challenge of ensuring reproducible data analyses, which is vital for maintaining the integrity and trustworthiness of your work. The practical skills gained in utilizing Docker and Podman to manage computing environments mean you can confidently share your analytical setups and models, knowing they will perform consistently elsewhere. Learning about CI/CD fundamentals and troubleshooting container errors will be particularly helpful in streamlining your workflow and deploying robust analytical solutions. This role often requires a Master's degree, and sometimes a PhD.


DevOps Engineer

A DevOps Engineer focuses on optimizing the software development lifecycle, emphasizing continuous integration, continuous delivery, and infrastructure as code. This course may be useful for a DevOps Engineer as it introduces learners to basic fundamentals of continuous integration and continuous deployment, with a strong focus on containers. The hands-on experience with Docker and Podman, including learning to use, modify, and troubleshoot these environments, provides a foundational understanding of key containerization technologies. While not a comprehensive DevOps curriculum, the course offers valuable skills in managing reproducible computing environments, which are essential for creating efficient and reliable deployment pipelines. This role typically does not require an advanced degree.


Systems Administrator Research Computing

A Systems Administrator Research Computing manages and maintains the computing infrastructure specifically tailored for scientific research, ensuring optimal performance and support for complex computational workflows. This course is highly relevant for a Systems Administrator Research Computing, as it provides crucial skills in utilizing containers to manage diverse scientific software. The hands-on experience with Docker and Podman, from pulling and modifying images to troubleshooting common errors, directly equips you to provide stable, isolated, and reproducible computing environments for researchers. This capability significantly streamlines software deployment and enhances collaborative efforts within a research setting. This role typically does not require an advanced degree, but deep technical knowledge is essential.


Cloud Engineer

As a Cloud Engineer, you design, implement, and manage cloud infrastructure and services, often leveraging virtualization and containerization for scalable applications. This course may be useful for a Cloud Engineer because it provides hands-on experience with Docker and Podman, which are foundational technologies in modern cloud environments. Learning to pull, modify, and build container images, as well as troubleshoot common errors, directly supports deploying and managing containerized applications on cloud platforms. While not covering the full scope of cloud infrastructure, the skills in creating reproducible and isolated computing environments are directly transferable to building robust and efficient cloud solutions. This role typically does not require an advanced degree.


Data Engineer

A Data Engineer designs, builds, and maintains robust data pipelines and infrastructure, ensuring data is accessible, reliable, and optimized for analysis and machine learning. This course may be useful for a Data Engineer seeking to enhance reproducibility within data processing workflows. The practical skills gained in creating and managing containerized environments using Docker and Podman are directly applicable to isolating dependencies and ensuring consistent execution of data transformations and analytical jobs. Learning about basic CI/CD fundamentals and troubleshooting containers will be helpful in building more reliable and portable data solutions, especially when dealing with complex scientific data. This role typically does not require an advanced degree.


Technical Support Specialist Scientific Software

A Technical Support Specialist Scientific Software provides crucial assistance to researchers, helping them resolve issues with specialized scientific applications and computing environments. This course may be useful for a Technical Support Specialist Scientific Software by equipping you with a foundational understanding of containers, which are often at the core of complex scientific software deployments. The hands-on training in troubleshooting the most common Docker-related errors, recognizing pitfalls, and understanding how to share and modify containerized environments directly enhances your ability to diagnose and resolve user issues related to software reproducibility and environment configuration. This role typically does not require an advanced degree, but domain-specific knowledge is important.


Machine Learning Engineer

A Machine Learning Engineer builds and deploys machine learning models, ensuring they are robust, scalable, and operate effectively in production environments. This course may be useful for a Machine Learning Engineer, particularly in research settings, by helping to address the critical need for reproducible experimental environments. The practical skills learned with Docker and Podman, including creating, modifying, and sharing controlled computing environments, are directly applicable to packaging and deploying machine learning models and their complex dependencies. Understanding troubleshooting for container issues will be helpful in maintaining consistent model performance and facilitating collaboration. This role often requires a Master's degree.


Quantitative Analyst

A Quantitative Analyst uses mathematical, statistical, and computational methods to develop models and analyze data, often in complex domains like finance or scientific research. This course may be useful for a Quantitative Analyst, particularly one involved in research, by providing essential skills for ensuring the reproducibility of computational models and analyses. The hands-on experience with Docker and Podman enables you to create isolated and consistent environments for your simulations and data processing, which is crucial for validation and collaboration. Understanding how to manage and troubleshoot containerized setups will be helpful in maintaining the integrity and reliability of your quantitative workflows. This role typically requires an advanced degree, such as a Master's or PhD.
