Introduction to workflow management systems – building a bioinformatics pipeline with Snakemake
Bioinformatics analyses often require a variety of heterogeneous software tools at various stages of maturity to process high-dimensional, complex omics data. Given the large volume of data produced, high-performance computing resources are often necessary to conduct such analyses. A complete analysis workflow requires several bioinformatics tools that differ in execution environment as well as in CPU and memory requirements. Using separate scripts for each step in a pipeline and managing data by hand is laborious and error-prone.
Workflow management systems such as Snakemake have become invaluable for resource-efficient computational analyses. Snakemake, Nextflow and BigDataScript provide easy-to-develop and easy-to-use frameworks, while other pipeline frameworks have an advantage in performance.
In this workshop, participants will get an overview of workflow management systems and their application in bioinformatics. After a short introduction to the basic syntax of Snakemake, participants will write a short Snakemake pipeline using downsampled data to gain firsthand experience with a workflow management system.