Introduction to Reproducibility in Cancer Informatics

Candace Savonen, MS

The course is intended for students in the biomedical sciences and researchers who use informatics tools in their research and have not had training in reproducibility tools and methods.

This course is written for individuals who:

- Have some familiarity with R or Python and have written some scripts.

- Have not had formal training in computational methods.

- Have limited or no familiarity with GitHub, Docker, or package management tools.

Motivation

Data analyses are generally not reproducible without direct contact with the original researchers and a substantial amount of time and effort (Beaulieu-Jones et al., 2017). Reproducibility in cancer informatics (as in other fields) is still not monitored or incentivized, even though it is fundamental to the scientific method. Despite this lack of incentive, many researchers strive for reproducibility in their own work but often lack the skills or training to achieve it effectively.

Equipping researchers with the skills to create reproducible data analyses increases the efficiency of everyone involved. Reproducible analyses are more likely to be understood, applied, and replicated by others. This helps expedite the scientific process by helping researchers avoid false positive dead ends. Open source clarity in reproducible methods also saves researchers' time so they don't have to reinvent the proverbial wheel for methods that everyone in the field is already performing.

Curriculum

This course introduces the concepts of reproducibility and replicability in the context of cancer informatics. It uses hands-on exercises to demonstrate in practical terms how to increase the reproducibility of data analyses. The course also introduces tools relevant to reproducibility including analysis notebooks, package managers, git and GitHub.

The course includes hands-on exercises on applying reproducible code concepts to your own code. Individuals who take this course are encouraged to complete these activities as they follow along with the course material to help increase the reproducibility of their analyses.

**Goal of this course:**

Equip learners with reproducibility skills they can apply to their existing analysis scripts and projects. This course opts for an "ease into it" approach. We attempt to give learners doable, incremental steps to increase the reproducibility of their analyses.

**What is not the goal**

This course is meant to introduce learners to the reproducibility tools, but _it does not necessarily represent the absolute end-all, be-all best practices for the use of these tools_. In other words, this course gives a starting point with these tools, but not an ending point. The advanced version of this course is the next step toward incrementally "better practices".

How to use the course

This course is designed with busy professional learners in mind, who may need to pick the course up and put it down as their schedule allows.

Each exercise gives you the option to continue with the example files as you've been editing them in each chapter, OR to download fresh chapter files that have been edited in accordance with the relevant part of the course. This way, if you decide to skip a chapter or find that the files you've been working on no longer make sense, you have a fresh starting point at each exercise.

What's inside

Syllabus

Introduction to this Course
In this first section, we will discuss the goals of this course and define what we mean by reproducibility.
Organizing your project
In this section we discuss motivation and strategies for project organization.
Using notebooks
In this section we discuss the motivation for using notebooks and integrated development environments to enhance the reproducibility of your project.
Making your project open source with GitHub
In this section we will describe how GitHub can make a project open source and encourage reproducibility.
Managing package versions
In this section we discuss two strategies for managing package versions in a project.
Writing durable code
In this section we discuss aspects of code that can make it more durable to enhance the reproducibility of a project.
Code review
This section discusses the importance of code review for creating reproducible analyses.
Documenting analysis
This section discusses how to document analyses to enhance their reproducibility.
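
As an illustration of the "Managing package versions" topic above, one simple way to make an analysis easier to rerun is to record the versions of the packages it depends on alongside its results. The sketch below is not taken from the course and is not necessarily either of the two strategies it teaches; the function names `snapshot_versions` and `build_environment_record` and the example package list are hypothetical. It relies only on Python's standard library.

```python
import json
import platform
from importlib import metadata


def snapshot_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_environment_record(packages):
    """Combine the Python version and package versions into one record."""
    return {
        "python": platform.python_version(),
        "packages": snapshot_versions(packages),
    }


if __name__ == "__main__":
    # Hypothetical package list; replace with the packages your analysis imports.
    record = build_environment_record(["numpy", "pandas"])
    print(json.dumps(record, indent=2, sort_keys=True))
```

Saving this record next to an analysis's output gives a later reader a starting point for rebuilding a comparable environment; dedicated package managers go further by also restoring those versions.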

Good to know

Know what's good, what to watch for, and possible dealbreakers
Teaches methods and skills for enhancing the reproducibility of data analyses in biomedical sciences and research, making findings more robust and reliable
Suitable for individuals with basic familiarity with R or Python but limited experience with computational methods and reproducibility tools, making it an accessible starting point
Guided by Candace Savonen, MS, an expert in biomedical sciences, ensuring the course content is up-to-date and aligns with industry best practices
Provides practical, hands-on exercises to reinforce concepts and empower learners to apply reproducibility principles to their own research
Covers essential reproducibility tools and techniques, including analysis notebooks, package managers, git, and GitHub, providing a solid foundation for enhancing reproducibility
Focuses on an "ease into it" approach, breaking down concepts into manageable steps to facilitate gradual adoption of reproducibility practices

Career center

Learners who complete Introduction to Reproducibility in Cancer Informatics will develop knowledge and skills that may be useful in these careers:
Biostatistician
As a Biostatistician, you will use your training in computer programming and statistics to manage and analyze medical data. This course can help you develop the skills needed to ensure the reproducibility of your analyses, which is critical for ensuring the accuracy and reliability of medical research.
Bioinformatician
As a Bioinformatician, you will use your skills in computer programming and biology to analyze biological data. This course can help you develop the skills needed to ensure the reproducibility of your analyses, which is critical for ensuring the accuracy and reliability of biological research.
Data Scientist
As a Data Scientist, you will use your programming skills and knowledge of data analysis to solve complex problems. This course can help you build a foundation in reproducible data analysis techniques, which are essential for working with large datasets and ensuring the accuracy and reliability of your results.
Statistician
As a Statistician, you will use your training in statistics to design, analyze, and interpret data. This course can help you develop the skills needed to create reproducible analyses, which are essential for ensuring the accuracy and reliability of your findings.
Data Analyst
As a Data Analyst, you will use your skills in data analysis to extract insights from data. This course can help you build a foundation in reproducible data analysis techniques, which are essential for working with large datasets and ensuring the accuracy and reliability of your results.
Financial Analyst
As a Financial Analyst, you will use your skills in mathematics and finance to analyze financial data. This course can help you build a foundation in reproducible data analysis techniques, which are essential for working with large datasets and ensuring the accuracy and reliability of your results.
Data Engineer
As a Data Engineer, you will use your skills in data management and engineering to build and maintain data pipelines. This course can help you develop the skills needed to create reproducible data pipelines, which are essential for ensuring the accuracy and reliability of your data.
Research Analyst
As a Research Analyst, you will use your skills in data analysis and research to provide insights to businesses and organizations. This course can help you develop the skills needed to create reproducible analyses, which are essential for ensuring the accuracy and reliability of your findings.
Machine Learning Engineer
As a Machine Learning Engineer, you will use your skills in machine learning to build and deploy machine learning models. This course can help you develop the skills needed to create reproducible machine learning models, which are essential for ensuring the accuracy and reliability of your models.
Quantitative Analyst
As a Quantitative Analyst, you will use your skills in mathematics and statistics to develop and implement quantitative models. This course can help you develop the skills needed to create reproducible models, which are essential for ensuring the accuracy and reliability of your models.
Actuary
As an Actuary, you will use your skills in mathematics and statistics to assess risk and uncertainty. This course can help you develop the skills needed to create reproducible models, which are essential for ensuring the accuracy and reliability of your assessments.
Software Engineer
As a Software Engineer, you will use your programming skills to design, develop, and test software applications. This course can help you develop the skills needed to create reproducible code, which is essential for ensuring the accuracy and reliability of your software.
Computer Scientist
As a Computer Scientist, you will use your skills in computer programming and algorithms to solve complex problems. This course can help you build a foundation in reproducible programming techniques, which are essential for ensuring the accuracy and reliability of your software.
Operations Research Analyst
As an Operations Research Analyst, you will use your skills in mathematics and optimization to solve complex problems. This course can help you develop the skills needed to create reproducible models, which are essential for ensuring the accuracy and reliability of your solutions.
Risk Analyst
As a Risk Analyst, you will use your skills in mathematics and statistics to assess risk and uncertainty. This course can help you develop the skills needed to create reproducible models, which are essential for ensuring the accuracy and reliability of your assessments.

Reading list

We've selected nine books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Introduction to Reproducibility in Cancer Informatics.
Provides a practical guide to software carpentry, a set of skills that is essential for reproducible research. It covers topics such as version control, data management, and scripting.
Comprehensive introduction to R, a statistical programming language that is widely used in data science. It covers topics such as data manipulation, visualization, and modeling.
Comprehensive introduction to Python, a programming language that is widely used in data science. It covers topics such as data manipulation, visualization, and modeling.
Provides a comprehensive introduction to Git, a version control system that is widely used in software development. It covers topics such as versioning, branching, and merging.
Provides a comprehensive introduction to Bayesian statistics, a statistical approach that is becoming increasingly popular in scientific research. It covers topics such as probability, modeling, and inference.
Provides a comprehensive introduction to deep learning with R. It covers topics such as neural networks, convolutional neural networks, and recurrent neural networks.
Provides a practical introduction to machine learning for non-programmers. It covers topics such as supervised learning, unsupervised learning, and model evaluation.
Provides a comprehensive introduction to deep learning. It covers topics such as neural networks, convolutional neural networks, and recurrent neural networks.

© 2016 - 2024 OpenCourser