Coursera

Genomic Data Science and Clustering (Bioinformatics V)

Pavel Pevzner and Phillip Compeau

How do we infer which genes orchestrate various processes in the cell? How did humans migrate out of Africa and spread around the world? In this class, we will see that these two seemingly different questions can be addressed using similar algorithmic and machine learning techniques arising from the general problem of dividing data points into distinct clusters.

In the first half of the course, we will introduce algorithms for clustering a group of objects into a collection of clusters based on their similarity, a classic problem in data science, and see how these algorithms can be applied to gene expression data.
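As a rough illustration of clustering objects by similarity, here is a minimal Python sketch of agglomerative (hierarchical) clustering. It assumes that pairwise similarities have already been converted into a distance matrix (smaller means more similar), and it assumes average linkage as the rule for the distance between two clusters. The function name `hierarchical_clustering` and the stopping parameter `k` are hypothetical choices for this example; the course may use a different linkage or data representation.

```python
from itertools import combinations


def hierarchical_clustering(dist, k):
    """Agglomerative clustering with average linkage (illustrative sketch).

    dist: symmetric n x n list of lists of pairwise distances.
    k: number of clusters at which to stop merging (k=1 merges everything).
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    # Start with every item in its own cluster.
    clusters = [[i] for i in range(len(dist))]

    def avg_dist(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters under average linkage.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]))
        # Replace the two clusters with their union.
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)
```

For example, with four genes whose (made-up) expression distances are dist = [[0, 1, 8, 9], [1, 0, 7, 8], [8, 7, 0, 2], [9, 8, 2, 0]], calling hierarchical_clustering(dist, 2) returns [[0, 1], [2, 3]]. Recording the order of merges instead of stopping at k would give the full tree.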

In the second half of the course, we will introduce another classic tool in data science called principal components analysis, which can be used to preprocess multidimensional data before clustering in an effort to greatly reduce the number of dimensions without losing much of the "signal" in the data.
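To make the idea of reducing dimensions concrete, the sketch below finds the first principal component of a small dataset by power iteration on the sample covariance matrix, using only the Python standard library, and then projects each point onto that component, reducing the data to one dimension. Power iteration is one standard way to compute the top component (numerical libraries typically use a singular value decomposition instead). The function name `pca_first_component` and its parameters are hypothetical, and the sketch assumes the data are not constant and that the largest covariance eigenvalue is strictly larger than the second, so that the iteration converges.

```python
import random


def pca_first_component(points, iters=200, seed=0):
    """Project points onto their first principal component (sketch).

    points: list of equal-length numeric sequences (rows are samples).
    Returns (component, scores): the unit-length direction of greatest
    variance and each point's one-dimensional coordinate along it.
    """
    n, d = len(points), len(points[0])
    # Center each dimension so the component passes through the mean.
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiplying a random start vector by cov
    # converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            raise ValueError("data has no variance along the current direction")
        v = [x / norm for x in w]
    # Scores: coordinates of the centered points along the component.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores
```

For points lying exactly on the line y = 2x, such as [(0, 0), (1, 2), (2, 4), (3, 6)], the returned component is (1, 2)/√5 up to sign, so each two-dimensional point collapses to a single score with no loss of variance. Keeping the top few components (for example, by deflating cov and repeating) is the usual way to retain most of the signal in higher-dimensional data.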

Finally, you will learn how to apply popular bioinformatics software tools to solve a real problem in clustering.

What's inside

Syllabus

Week 1: Introduction to Clustering Algorithms

Welcome to class!

At the beginning of the class, we will see how algorithms for clustering a set of data points will help us determine how yeast became such good wine-makers. The Bioinformatics Cartoon for this chapter, courtesy of Randall Christopher, serves as a chapter header in the Specialization's bestselling print companion. How did the monkey lose a wine-drinking contest to a tiny mammal? Why have Pavel and Phillip become cavemen? And will flipping a coin help them escape their eternal boredom until they can return to the present? Start learning to find out!
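The course's Week 2 description mentions the Lloyd algorithm encountered in the first week. As a hedged sketch of how such an algorithm can cluster data points, here is the standard Lloyd iteration for k-means in plain Python: assign each point to its nearest center, move each center to the mean of its assigned points, and repeat until the assignment stops changing. The function names, the choice of initial centers, and the use of squared Euclidean distance are assumptions for this example, not necessarily the course's exact formulation.

```python
def squared_dist(p, q):
    # Squared Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centers, max_iter=100):
    """Lloyd algorithm for k-means clustering (illustrative sketch).

    points: list of equal-length numeric tuples.
    centers: initial k centers (for example, the first k points).
    Returns (centers, labels) once the assignment stops changing.
    """
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Centers to clusters: assign each point to its nearest center.
        new_labels = [min(range(len(centers)),
                          key=lambda i: squared_dist(p, centers[i]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Clusters to centers: move each center to the mean of its points.
        for i in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # leave a center in place if its cluster is empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

With points [(0, 0), (0, 1), (10, 10), (10, 11)] and the first two points as initial centers, the first pass lumps three points into one cluster; the second pass corrects this, and lloyd returns centers [[0.0, 0.5], [10.0, 10.5]] with labels [0, 0, 1, 1]. Because the result depends on the initial centers, it is common to run the algorithm from several starting points.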

Week 2: Advanced Clustering Techniques

Welcome to week 2 of class!

This week, we will see how to move from a "hard" assignment of points to clusters toward a "soft" assignment that allows the boundaries of the clusters to blend. We will adapt the Lloyd algorithm that we encountered in the first week to produce an algorithm for soft clustering. We will also introduce another clustering algorithm, called "hierarchical clustering", that groups objects into larger and larger clusters.
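One common way to adapt the Lloyd algorithm for soft clustering, assumed here, is soft k-means. Instead of assigning each point to a single center, every point receives a responsibility for every cluster, proportional to exp(-beta * distance to that cluster's center), where the stiffness parameter `beta` controls how hard the assignments are; each center is then recomputed as the responsibility-weighted mean of all points. The sketch uses Euclidean distance via `math.dist` (Python 3.8+); the function names and the fixed iteration count are hypothetical choices for illustration.

```python
import math


def responsibilities(points, centers, beta):
    """resp[i][j] = weight of point j in cluster i; each column sums to 1."""
    dists = [[math.dist(p, c) for p in points] for c in centers]
    n = len(points)
    # Subtract each point's smallest distance before exponentiating, so the
    # largest weight for every point is exp(0) = 1 and nothing underflows.
    nearest = [min(row[j] for row in dists) for j in range(n)]
    weights = [[math.exp(-beta * (row[j] - nearest[j])) for j in range(n)]
               for row in dists]
    totals = [sum(w[j] for w in weights) for j in range(n)]
    return [[w[j] / totals[j] for j in range(n)] for w in weights]


def soft_kmeans(points, centers, beta=1.0, iters=100):
    """Soft k-means (sketch): alternate soft assignment and weighted means."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        # Centers to soft clusters.
        resp = responsibilities(points, centers, beta)
        # Soft clusters to centers: responsibility-weighted mean of all points.
        for i, r in enumerate(resp):
            mass = sum(r)
            centers[i] = [sum(w * p[dim] for w, p in zip(r, points)) / mass
                          for dim in range(len(points[0]))]
    return centers, responsibilities(points, centers, beta)
```

Setting beta = 0 makes every responsibility equal (0.5 each with two clusters), while a large beta approaches the hard assignments of the Lloyd algorithm.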

Week 3: Introductory Algorithms in Population Genetics

Good to know

Know what's good, what to watch for, and possible dealbreakers
Explores various clustering and principal components analysis techniques, which are well established in data science
Demonstrates applications of clustering techniques in both gene expression analysis and human migration studies
Covers fundamental algorithms in population genetics
Provides foundational knowledge for further exploration in these scientific domains

Reviews summary

Genomic data science and bioinformatics

Learners say this course covers genomic data science and bioinformatics. Students largely agree that the lectures are engaging, find the assignments helpful, and appreciate the practical exercises. Many who mentioned difficulty with the course also noted the instructor's clear explanations and stated that they learned challenging material well. Students who had a positive experience recommend this course.
Concepts linked to real-world applications
"Good course to exercise you data science skills"
"clustering chapters and the practical exercises with the yeast dataset"
"Highly recommend the course and the specializations to all learners who are serious about learning algorithms"
Clear explanations of challenging concepts
"explanation to the content was very well given"
"They show an algorithm but don't show an example problem using that algorithm very often."
"I felt like there was a textbook I was missing, but I read every slide and watched every video"
Course can be challenging
"Too hard, need more instructor guidance please."
"Very tough course."
"I also found difficulties in solving quiz 3."

Career center

Learners who complete Genomic Data Science and Clustering (Bioinformatics V) will develop knowledge and skills that may be useful to these careers:
Biostatistician
A Biostatistician applies statistical principles to the design of studies collecting biological data, as well as to the analysis and interpretation of such data. The Genomic Data Science and Clustering course provides foundational knowledge in algorithms for clustering data points into distinct clusters. This knowledge directly supports the analysis and interpretation of biological data, making this course highly relevant to the role of a Biostatistician. By taking this course, you will gain a strong foundation in the methodologies used by Biostatisticians to extract meaningful insights from complex biological datasets.
Population Geneticist
A Population Geneticist studies the genetic variation within and between populations. The Genomic Data Science and Clustering course provides foundational knowledge in algorithms for clustering data points into distinct clusters. This knowledge is directly applicable to the analysis of genetic variation, making this course highly relevant to the role of a Population Geneticist. By taking this course, you will gain a strong foundation in the methodologies used by Population Geneticists to study genetic diversity and evolution.
Computational Biologist
A Computational Biologist uses computational tools and techniques to analyze and interpret biological data. The Genomic Data Science and Clustering course provides an introduction to algorithms for clustering data points into distinct clusters and principal components analysis, a technique for preprocessing multidimensional data before clustering. These techniques are widely used in computational biology, making this course a useful starting point for anyone seeking to enter the field. By taking this course, you will gain foundational skills in data analysis and interpretation, which are critical for success in computational biology.
Geneticist
A Geneticist studies genes, their inheritance, and their role in health and disease. The Genomic Data Science and Clustering course provides an introduction to algorithms for clustering data points into distinct clusters and principal components analysis, a technique for preprocessing multidimensional data before clustering. These techniques are widely used in genetics to analyze and interpret genetic data. By taking this course, you will gain a strong foundation in data analysis and interpretation, which are essential for success in genetics.
Data Scientist
A Data Scientist uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms, both structured and unstructured. The Genomic Data Science and Clustering course provides an introduction to algorithms for clustering data points into distinct clusters and principal components analysis, a technique for preprocessing multidimensional data before clustering. These techniques are essential for data scientists working with large and complex datasets. By taking this course, you will gain a solid foundation in data analysis and interpretation, which are vital for success in data science.
Quantitative Analyst
A Quantitative Analyst uses mathematical and statistical models to analyze financial data and make investment decisions. The Genomic Data Science and Clustering course provides an introduction to algorithms for clustering data points into distinct clusters and principal components analysis, a technique for preprocessing multidimensional data before clustering. These techniques are also used by quantitative analysts working with large and complex datasets. By taking this course, you will gain a solid foundation in data analysis and interpretation, which can support your work as a quantitative analyst.
Machine Learning Engineer
A Machine Learning Engineer designs, develops, and deploys machine learning models to solve real-world problems. The Genomic Data Science and Clustering course provides an introduction to algorithms for clustering data points into distinct clusters and principal components analysis, a technique for preprocessing multidimensional data before clustering. These techniques are essential for machine learning engineers working with large and complex datasets. By taking this course, you will gain a solid foundation in data analysis and interpretation, which are critical for success as a machine learning engineer.
Data Architect
A Data Architect designs and manages the architecture of data systems. The Genomic Data Science and Clustering course provides an introduction to algorithms for clustering data points into distinct clusters and principal components analysis, a technique for preprocessing multidimensional data before clustering. These techniques may be useful for data architects designing systems that support the analysis of large and complex datasets. By taking this course, you will gain a solid foundation in data analysis and interpretation, which can help you understand how the systems you design will be used.
Statistician
A Statistician collects, analyzes, interprets, and presents data. The Genomic Data Science and Clustering course provides an introduction to algorithms for clustering data points into distinct clusters and principal components analysis, a technique for preprocessing multidimensional data before clustering. These techniques are essential for statisticians working with large and complex datasets. By taking this course, you will gain a solid foundation in data analysis and interpretation, which are critical for success as a statistician.
Operations Research Analyst
An Operations Research Analyst uses advanced analytical techniques to solve complex problems in various industries, including healthcare, transportation, and finance. The Genomic Data Science and Clustering course provides an introduction to algorithms for clustering data points into distinct clusters and principal components analysis, a technique for preprocessing multidimensional data before clustering. These techniques are commonly used by operations research analysts to analyze and interpret data. By taking this course, you will gain a strong foundation in data analysis and interpretation, which are essential for success as an operations research analyst.
Software Engineer
A Software Engineer designs, develops, and maintains software systems. The Genomic Data Science and Clustering course provides an introduction to algorithms for clustering data points into distinct clusters and principal components analysis, a technique for preprocessing multidimensional data before clustering. These techniques can be useful for software engineers working on data-intensive applications. By taking this course, you will gain a solid foundation in data analysis and interpretation, which can be valuable for a software engineer.
Systems Analyst
A Systems Analyst analyzes and designs systems to meet the needs of an organization. The Genomic Data Science and Clustering course provides an introduction to algorithms for clustering data points into distinct clusters and principal components analysis, a technique for preprocessing multidimensional data before clustering. These techniques can be useful for systems analysts working on data-intensive systems. By taking this course, you will gain a solid foundation in data analysis and interpretation, which can be valuable for a systems analyst.
Web Developer
A Web Developer designs and develops websites and web applications. The Genomic Data Science and Clustering course provides an introduction to algorithms for clustering data points into distinct clusters and principal components analysis, a technique for preprocessing multidimensional data before clustering. These techniques can be useful for web developers working on data-driven websites and applications. By taking this course, you will gain a solid foundation in data analysis and interpretation, which can be valuable for a web developer.
Research Scientist
A Research Scientist conducts scientific research in a specific field. The Genomic Data Science and Clustering course provides foundational knowledge in algorithms for clustering data points into distinct clusters. This knowledge is applicable to various scientific disciplines. By taking this course, you will gain a strong foundation in data analysis and interpretation, which are essential for success as a Research Scientist.
Teacher
A Teacher educates students in a specific subject area. The Genomic Data Science and Clustering course provides foundational knowledge in algorithms for clustering data points into distinct clusters. This knowledge can be applied to teaching students about data analysis and interpretation. By taking this course, you will gain a strong foundation in data analysis and interpretation, which can support your teaching in this area.

Reading list

We've selected 20 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Genomic Data Science and Clustering (Bioinformatics V).
This is the companion textbook to the course. It offers somewhat more depth than the course itself.
This comprehensive textbook covers a wide range of bioinformatics topics, including sequence analysis, protein structure prediction, and gene expression profiling. It provides a thorough reference for both students and professionals.
This textbook delves into the algorithmic foundations of bioinformatics, covering advanced topics such as sequence alignment, phylogenetic tree construction, and gene finding. It provides a deeper understanding of the algorithms discussed in the course.
Provides a comprehensive overview of Bayesian data analysis, covering topics such as Bayesian inference, model selection, and Bayesian computation. It would be a valuable resource for students and researchers in the field.
Provides a comprehensive overview of causal inference in statistics, covering topics such as causal models, causal effects, and causal inference methods. It would be a valuable resource for students and researchers in the field.
Provides a comprehensive overview of molecular biology, covering topics such as DNA structure and function, gene expression, and cell signaling. It would be a valuable resource for students and researchers in the field.
Provides a comprehensive overview of genetic analysis, covering topics such as Mendelian genetics, molecular genetics, and population genetics. It would be a valuable resource for students and researchers in the field.
Provides a practical guide to data analysis using regression and multilevel/hierarchical models, covering topics such as model building, model selection, and model checking. It would be a valuable resource for students and researchers in the field.
This textbook provides a comprehensive overview of machine learning algorithms and techniques, including supervised and unsupervised learning, dimensionality reduction, and model evaluation. It provides a broader context for understanding the machine learning methods used in bioinformatics.
This textbook provides a rigorous mathematical treatment of probabilistic models used in sequence analysis. It covers topics such as hidden Markov models, multiple sequence alignment, and phylogenetic tree reconstruction.
Provides a practical guide to reproducible research in bioinformatics, covering topics such as data management, statistical analysis, and visualization. It would be a valuable resource for students and researchers in the field.
Provides a comprehensive overview of population genetics, the field that uses mathematical and statistical tools to study the genetic variation within populations.
Introduces machine learning concepts and techniques used in bioinformatics, such as supervised and unsupervised learning, feature selection, and model evaluation. It provides a broader perspective on data analysis methods beyond clustering.
Covers statistical methods commonly used in bioinformatics, such as hypothesis testing, regression analysis, and Bayesian inference. It provides a solid foundation for understanding the statistical principles underlying bioinformatics research.
This textbook provides a comprehensive introduction to bioinformatics, covering core concepts and techniques used in the field. It provides a strong foundation for understanding the algorithms and techniques covered in the course.
Provides a non-technical overview of evolution, covering topics such as natural selection, genetic drift, and the evolution of humans. It would be a valuable resource for students and researchers in the field.
This textbook provides a concise overview of essential bioinformatics concepts, including sequence analysis, gene expression analysis, and genome annotation. It provides a good starting point for those new to the field.

Our mission

OpenCourser helps millions of learners each year. People visit us to learn workplace skills, ace their exams, and nurture their curiosity.

Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.


Affiliate disclosure

We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.

Your purchases help us maintain our catalog and keep our servers humming without ads.

Thank you for supporting OpenCourser.

© 2016 - 2024 OpenCourser