Introduction to Unix/Linux & Command Line Data Analysis

Matthew Peterson

Introduction to Unix/Linux

This part introduces the natural environment of bioinformatics: the Linux command line. Material will cover logging into remote machines, filesystem organization and file manipulation, and installing and using software (including examples such as HMMER, BLAST, and MUSCLE). Finally, we introduce the CGRB research infrastructure (including submitting batch jobs) and concepts for data analysis on the command line with tools such as grep and wc.
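As a small taste of the `grep` and `wc` usage mentioned above, here is a minimal sketch. The file name `example.fasta` and its sequences are made up for illustration and are not course material.

```shell
# Create a tiny FASTA file for illustration (made-up sequences).
printf '>seq1\nMKTAYIAK\n>seq2\nMKLV\nGGA\n>seq3\nMST\n' > example.fasta

# Count sequences: each record's header line starts with ">".
grep -c '^>' example.fasta

# Report line, word, and byte counts for the file.
wc example.fasta

# The same sequence count as grep -c, written as a two-step pipeline.
grep '^>' example.fasta | wc -l
```

The pattern is quoted so that the shell does not treat `>` as output redirection.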

Command-Line Data Analysis

The Linux command-line environment has long been used for analyzing text-based and scientific data, and many data analysis tools come pre-installed. These can be chained together to form powerful pipelines. Material in this part will cover these and related tools (including grep, sort, awk, and sed), driven by examples of biological data in a problem-solving context that introduces programmatic thinking. This part also covers regular expressions, a useful syntax for matching and substituting string and sequence data.

Individuals who complete both parts will receive a Certificate of Completion and a Digital Badge detailing the course information.
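The idea of chaining tools into a pipeline can be sketched as follows. The table `hits.tsv` and its gene names are hypothetical, chosen only to show how `cut`, `sort`, and `uniq` combine.

```shell
# Hypothetical tab-separated table: query ID, then matched subject ID.
printf 'q1\tgeneA\nq2\tgeneB\nq3\tgeneA\nq4\tgeneC\nq5\tgeneA\nq6\tgeneB\n' > hits.tsv

# Count hits per subject, most frequent first:
#   cut -f 2   keeps only the second column
#   sort       groups identical IDs (uniq only merges adjacent lines)
#   uniq -c    prefixes each distinct ID with its count
#   sort -rn   orders by count, descending
cut -f 2 hits.tsv | sort | uniq -c | sort -rn
```

Each stage reads the previous stage's output, so the pipeline can be built and checked one command at a time.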

Leave with the ability to navigate and operate a Linux computational infrastructure via the command line.

Understand the installation, functioning, and use of common bioinformatics analysis software packages on a Linux infrastructure.

Navigate and use the Unix/Linux file system, including understanding directory structure/permissions, and creating/editing/removing files and directories.

Locate and download bioinformatics data sets, and install and use bioinformatics utilities such as HMMER, BLAST, and MUSCLE.

Use `sort` and `uniq` to build filtering pipelines for bioinformatics data.

Use the utilities `sed` and `awk` along with POSIX-compliant regular expressions (regex) to perform complex pattern matching and extraction on bioinformatics data.

Submit batch jobs to a computational infrastructure to run (non-interactively) on cluster nodes.
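To illustrate the `sed` and `awk` objective listed above, here is a minimal sketch of regex-based extraction. The file `headers.txt` and its `>ID|description` layout are assumptions made for the example, not a format prescribed by the course.

```shell
# Made-up FASTA headers of the form ">ID|organism description".
printf '>sp001|Homo sapiens kinase\n>sp002|Mus musculus kinase\n>sp003|Homo sapiens protease\n' > headers.txt

# sed: a POSIX basic regular expression with a capture group \( \)
# keeps only the ID found between ">" and "|".
sed 's/^>\([^|]*\)|.*$/\1/' headers.txt

# awk: split each line on "|" and print the ID of every record whose
# second field matches the regex ^Homo (followed by a space).
awk -F '|' '$2 ~ /^Homo / { sub(/^>/, "", $1); print $1 }' headers.txt
```

The `sed` command prints all three IDs, while the `awk` command prints only those of the records whose description starts with `Homo`.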
