Introduction to Reproducibility in Cancer Informatics from Coursera

This course is intended for students in the biomedical sciences and for researchers who use informatics tools in their research but have not had training in reproducibility tools and methods.

This course is written for individuals who:

- Have some familiarity with R or Python (for example, have written some scripts).

- Have not had formal training in computational methods.

- Have limited or no familiarity with GitHub, Docker, or package management tools.

Motivation

Data analyses are generally not reproducible without direct contact with the original researchers and a substantial amount of time and effort (Beaulieu-Jones et al., 2017). Reproducibility in cancer informatics (as in other fields) is still neither monitored nor incentivized, even though it is fundamental to the scientific method. Despite this lack of incentive, many researchers strive for reproducibility in their own work but often lack the skills or training to achieve it effectively.

Equipping researchers with the skills to create reproducible data analyses increases the efficiency of everyone involved. Reproducible analyses are more likely to be understood, applied, and replicated by others. This expedites the scientific process by helping researchers avoid false-positive dead ends. The clarity of open-source, reproducible methods also saves researchers time, so they don't have to reinvent the proverbial wheel for methods that everyone in the field is already performing.

Curriculum

This course introduces the concepts of reproducibility and replicability in the context of cancer informatics. It uses hands-on exercises to demonstrate in practical terms how to increase the reproducibility of data analyses. The course also introduces tools relevant to reproducibility, including analysis notebooks, package managers, Git, and GitHub.

The course includes hands-on exercises that show learners how to apply reproducible code concepts to their own code. Learners are encouraged to complete these activities as they follow along with the course material, to help increase the reproducibility of their analyses.

**Goal of this course:**

Equip learners with reproducibility skills they can apply to their existing analysis scripts and projects. This course opts for an "ease into it" approach: we attempt to give learners doable, incremental steps to increase the reproducibility of their analyses.

**What is not the goal:**

This course is meant to introduce learners to the reproducibility tools, but _it does not necessarily represent the absolute end-all, be-all best practices for the use of these tools_. In other words, this course gives a starting point with these tools, but not an ending point. The advanced version of this course is the next step toward incrementally "better practices".

How to use the course

This course is designed for busy professional learners who may need to pick up and put down the course as their schedules allow.

For each exercise, you can either continue with the example files you have been editing in previous chapters or download fresh chapter files that are already edited to match the relevant part of the course. This way, if you decide to skip a chapter or find that the files you've been working on no longer make sense, you have a fresh starting point at each exercise.

What's inside

Syllabus

Introduction to this Course

In this first section, we will discuss the goals of this course and define what we mean by reproducibility.

Organizing your project

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers.

Teaches methods and skills for enhancing the reproducibility of data analyses in biomedical sciences and research, making findings more robust and reliable

Suitable for individuals with basic familiarity with R or Python but limited experience with computational methods and reproducibility tools, making it an accessible starting point

Guided by Candace Savonen, MS, an expert in biomedical sciences, ensuring the course content is up-to-date and aligns with industry best practices

Provides practical, hands-on exercises to reinforce concepts and empower learners to apply reproducibility principles to their own research

Covers essential reproducibility tools and techniques, including analysis notebooks, package managers, Git, and GitHub, providing a solid foundation for enhancing reproducibility

Focuses on an "ease into it" approach, breaking down concepts into manageable steps to facilitate gradual adoption of reproducibility practices

Reviews summary

Practical reproducibility for cancer informatics

According to students, this course offers a highly practical and foundational introduction to reproducibility in cancer informatics. Learners commend its clear lectures and hands-on exercises, particularly for Git and GitHub, analysis notebooks, and package management. Many found the “ease into it” modular design ideal for busy professionals, allowing seamless integration of concepts into existing R/Python scripts and research workflows. Although the course provides a solid starting point, some learners noted that it is strictly introductory: those with prior experience may find some content basic, and some wished for more advanced troubleshooting and deeper tool mastery, especially with Docker.

Provides a solid introduction to reproducibility concepts.

"This course was an excellent introduction to reproducibility in cancer informatics."

"I gained a solid foundational understanding. The coverage of Git/GitHub was particularly useful, and the project organization tips were great."

"I found it very helpful for understanding the importance of reproducibility. The topics like durable code and code review are crucial."

Designed with a flexible, modular structure for busy learners.

"The modular design made it easy to pick up and put down. A must-take for anyone in the field!"

"As a busy researcher, the 'ease into it' approach was perfect. I could seamlessly integrate the practices into my work."

"This course is designed with busy professional learners in mind and its structure is excellent for self-paced learning."

Focuses on hands-on application of reproducibility tools.

"The lectures were clear and the hands-on exercises with GitHub and Docker were incredibly helpful. I especially appreciated the focus on practical application."

"The hands-on coding and projects are the strongest part of the course for me, helping me apply these concepts to my existing R scripts."

"I particularly liked the discussion on structuring projects and using notebooks for transparency. It's a pragmatic guide."

Some tools, like Docker, could use more in-depth coverage.

"I found some of the explanations for specific tools, like Docker, a bit rushed... I often had to look up external resources."

"I felt that while it introduced Docker, it didn't really show 'best practices' for its full implementation in a large project."

"The course could benefit from more detailed troubleshooting for environmental setup issues, which is often the biggest hurdle for new users with tools like Docker."

Best for beginners, may feel basic for advanced users.

"While it says 'introduction,' some parts, especially with Docker, might be a bit challenging for absolute beginners without any prior command-line exposure."

"It's an 'introduction,' and it certainly feels like one. If I already have some experience with Git or package managers, much of it might feel repetitive."

"This course gives a starting point with these tools, but not an ending point. I was hoping for more advanced strategies or case studies."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Introduction to Reproducibility in Cancer Informatics with these activities:

Review R in preparation for course

Review and practice basic R concepts to refresh your memory and enhance your understanding.

- Go over R syntax and data structures.
- Practice writing simple R code to manipulate and analyze data.
- Review R packages commonly used in data analysis.

Review R Programming Basics

Provides stronger foundational skills in R, making concepts introduced in the early portions of this course easier to grasp.

- Practice writing and executing simple R scripts.
- Review basic R data types and structures.

Join a peer study group on reproducibility

Engage with peers in a study group dedicated to reproducibility, fostering collaboration and enhancing your understanding through shared experiences.

- Identify and join a peer study group focused on reproducibility.
- Actively participate in discussions and share insights with group members.
- Collaborate on projects or exercises related to reproducibility.

Create a reproducible data analysis project

Immediately puts into practice concepts introduced by the course, such as project organization and version control.

- Define a research question and gather relevant data.
- Create a project directory and set up version control.
- Write reproducible data analysis scripts.
- Document your analysis and share your project.
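The project-setup steps above can be sketched in Python. This is a minimal sketch, not the course's prescribed layout: the folder names (`data/raw`, `data/processed`, `code`, `results`, `docs`), the README headings, and the helper `create_project` are hypothetical choices for illustration.

```python
from pathlib import Path
import tempfile

# Hypothetical layout for illustration; the course does not prescribe these names.
SUBDIRS = ["data/raw", "data/processed", "code", "results", "docs"]


def create_project(root: Path) -> list[Path]:
    """Create a minimal project skeleton under `root`.

    Safe to rerun: existing folders and an existing README are left untouched.
    Returns the list of subdirectory paths.
    """
    created = []
    for sub in SUBDIRS:
        path = root / sub
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)

    readme = root / "README.md"
    if not readme.exists():
        # The README records the research question, data sources, and how to rerun the analysis.
        readme.write_text(
            "# Project\n\n"
            "Research question:\n\n"
            "Data sources:\n\n"
            "How to rerun the analysis:\n"
        )
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        for d in create_project(Path(tmp) / "my_analysis"):
            print(d.relative_to(tmp))
```

Once the skeleton exists, running `git init` in the project root puts it under version control.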

Follow tutorials on GitHub and Docker

Enhance your understanding of GitHub and Docker through guided tutorials, enabling you to manage code and create reproducible environments.

- Complete tutorials on GitHub basics.
- Practice using GitHub for version control and collaboration.
- Explore Docker tutorials to learn about containerization.

Explore tutorials on code review and documentation

Expand your knowledge of code review and documentation through guided tutorials, enhancing your ability to produce high-quality, well-documented code.

- Follow tutorials on code review techniques.
- Practice reviewing code for clarity, efficiency, and reproducibility.
- Review tutorials on effective documentation practices.

Practice writing reproducible code

Strengthen your skills in writing reproducible code through repeated practice, improving the reliability and transparency of your analyses.

- Review principles of reproducible code.
- Complete coding exercises that focus on writing reproducible code.
- Practice using tools and techniques for debugging and testing reproducible code.
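One common technique behind exercises on writing reproducible code is fixing the random seed, so that stochastic steps give the same result on every run. The sketch below uses Python's standard `random` and `statistics` modules; the function `bootstrap_mean` and the example values are hypothetical, chosen only for illustration, and are not taken from the course materials.

```python
import random
import statistics


def bootstrap_mean(values, n_resamples=1000, seed=None):
    """Return bootstrap estimates of the mean of `values`.

    Passing an explicit `seed` makes the resampling, and therefore the output,
    identical across runs. With seed=None the results will generally differ
    from call to call.
    """
    rng = random.Random(seed)  # private generator, so other code cannot disturb its state
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(values, k=len(values))  # resample with replacement
        means.append(statistics.fmean(sample))
    return means


if __name__ == "__main__":
    data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8]  # made-up numbers for illustration
    run1 = bootstrap_mean(data, seed=42)
    run2 = bootstrap_mean(data, seed=42)
    print("identical runs:", run1 == run2)
    print("bootstrap SE:", round(statistics.stdev(run1), 3))
```

Using a dedicated `random.Random` instance, rather than seeding the global generator, keeps the result reproducible even if other code in the script also draws random numbers.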

Write a blog post on best practices for reproducible analyses

Solidify your understanding and share your knowledge by creating a blog post that summarizes best practices for conducting reproducible analyses, benefiting both yourself and others.

- Research and gather information on best practices for reproducible analyses.
- Organize your thoughts and outline the content of your blog post.
- Write the blog post clearly and concisely.

Contribute to an open-source project related to reproducibility

Immerse yourself in the principles of reproducibility by contributing to an open-source project dedicated to developing tools or resources for reproducible analyses.

- Identify an open-source project focused on reproducibility.
- Review the project's documentation and contribution guidelines.
- Make a contribution to the project, such as reporting a bug, suggesting a feature, or writing code.

Career center

Learners who complete Introduction to Reproducibility in Cancer Informatics will develop knowledge and skills that may be useful to these careers:

Bioinformatician

As a Bioinformatician, you will use your skills in computer programming and biology to analyze biological data. This course can help you develop the skills needed to ensure the reproducibility of your analyses, which is critical for ensuring the accuracy and reliability of biological research.

Biostatistician

As a Biostatistician, you will use your training in computer programming and statistics to manage and analyze medical data. This course can help you develop the skills needed to ensure the reproducibility of your analyses, which is critical for ensuring the accuracy and reliability of medical research.

Data Engineer

As a Data Engineer, you will use your skills in data management and engineering to build and maintain data pipelines. This course can help you develop the skills needed to create reproducible data pipelines, which are essential for ensuring the accuracy and reliability of your data.

Data Scientist

As a Data Scientist, you will use your programming skills and knowledge of data analysis to solve complex problems. This course can help you build a foundation in reproducible data analysis techniques, which are essential for working with large datasets and ensuring the accuracy and reliability of your results.

Research Analyst

As a Research Analyst, you will use your skills in data analysis and research to provide insights to businesses and organizations. This course can help you develop the skills needed to create reproducible analyses, which are essential for ensuring the accuracy and reliability of your findings.

Data Analyst

As a Data Analyst, you will use your skills in data analysis to extract insights from data. This course can help you build a foundation in reproducible data analysis techniques, which are essential for working with large datasets and ensuring the accuracy and reliability of your results.

Statistician

As a Statistician, you will use your training in statistics to design, analyze, and interpret data. This course can help you develop the skills needed to create reproducible analyses, which are essential for ensuring the accuracy and reliability of your findings.

Financial Analyst

As a Financial Analyst, you will use your skills in mathematics and finance to analyze financial data. This course can help you build a foundation in reproducible data analysis techniques, which are essential for working with large datasets and ensuring the accuracy and reliability of your results.

Quantitative Analyst

As a Quantitative Analyst, you will use your skills in mathematics and statistics to develop and implement quantitative models. This course can help you develop the skills needed to build reproducible models, which is essential for ensuring their accuracy and reliability.

Computer Scientist

As a Computer Scientist, you will use your skills in computer programming and algorithms to solve complex problems. This course can help you build a foundation in reproducible programming techniques, which are essential for ensuring the accuracy and reliability of your software.

Operations Research Analyst

As an Operations Research Analyst, you will use your skills in mathematics and optimization to solve complex problems. This course can help you develop the skills needed to create reproducible models, which are essential for ensuring the accuracy and reliability of your solutions.

Actuary

As an Actuary, you will use your skills in mathematics and statistics to assess risk and uncertainty. This course can help you develop the skills needed to create reproducible models, which are essential for ensuring the accuracy and reliability of your assessments.

Machine Learning Engineer

As a Machine Learning Engineer, you will use your skills in machine learning to build and deploy machine learning models. This course can help you develop the skills needed to build reproducible machine learning models, which is essential for ensuring their accuracy and reliability.

Risk Analyst

As a Risk Analyst, you will use your skills in mathematics and statistics to assess risk and uncertainty. This course can help you develop the skills needed to create reproducible models, which are essential for ensuring the accuracy and reliability of your assessments.

Software Engineer

As a Software Engineer, you will use your programming skills to design, develop, and test software applications. This course can help you develop the skills needed to create reproducible code, which is essential for ensuring the accuracy and reliability of your software.