Parallelization, Scalability, and Reproducibility in Next-Generation Sequencing Analysis - Johannes Köster

Book | Softcover
132 pages
2015
epubli (publisher)
978-3-7375-3777-3 (ISBN)
10,00 incl. VAT
  • Title unfortunately no longer available
This PhD thesis provides novel solutions to major topics within the analysis of next-generation sequencing data, focusing on parallelization, scalability and reproducibility.
The analysis of next-generation sequencing (NGS) data is a major topic in
bioinformatics: short reads obtained from DNA, the molecule encoding the genome of living
organisms, are processed to provide insight into biological or medical questions. This
thesis provides novel solutions to major topics within the analysis of NGS data, focusing
on parallelization, scalability and reproducibility.
The read mapping problem is to find the origin of the short reads within a given reference
genome. We contribute the q-group index, a novel data structure for read mapping with
a particularly small memory footprint. The q-group index comes with massively parallel
build and query algorithms targeted at modern graphics processing units (GPUs).
Building on this index, the read mapping software PEANUT is presented, which outperforms
state-of-the-art read mappers in speed while maintaining their accuracy.
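
To make the read mapping problem concrete, here is a minimal seed-and-verify sketch in Python. It uses a plain q-gram hash index, not the q-group index described in the thesis, and it runs sequentially rather than on a GPU. The function names, the toy reference, and the mismatch threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_qgram_index(reference, q):
    """Map every q-gram of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def map_read(read, reference, index, q, max_mismatches=1):
    """Return reference start positions where the read fits with few mismatches."""
    # Seed: a q-gram hit at read offset `off` suggests one candidate start position.
    candidates = set()
    for off in range(len(read) - q + 1):
        for pos in index.get(read[off:off + q], ()):
            start = pos - off
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Verify: count mismatches (Hamming distance) at each candidate position.
    hits = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Hypothetical toy data, for illustration only.
reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_qgram_index(reference, q=4)
hits = map_read("TAGCCGAT", reference, index, q=4)
```

A production read mapper would also handle insertions and deletions, reverse complements, and base qualities, all of which this sketch leaves out.
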
The variant calling problem is to infer (i.e., call) genetic variants of individuals compared
to a reference genome using mapped reads. It is usually solved with Bayesian methods.
In this work, we show how to integrate filtering of variants into the calling with an
algebraic approach and provide an intuitive solution for controlling the false discovery
rate, along with solving other challenges of variant calling such as scaling with a growing
set of biological samples.
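
One common way to control the false discovery rate in a Bayesian setting can be sketched as follows. This is a generic textbook-style procedure and not necessarily the method developed in the thesis. It assumes that each candidate variant comes with a posterior probability of being a false call. The function name and the example values are hypothetical.

```python
def select_calls_at_fdr(posterior_error_probs, alpha):
    """Return indices of the calls to keep so that the expected FDR is <= alpha.

    The expected FDR of a set of calls is the mean of their posterior error
    probabilities, so we add calls from most to least confident while that
    mean stays within the threshold.
    """
    order = sorted(range(len(posterior_error_probs)),
                   key=lambda i: posterior_error_probs[i])
    selected = []
    error_sum = 0.0
    for rank, i in enumerate(order, start=1):
        error_sum += posterior_error_probs[i]
        # The running mean never decreases on sorted input, so we can stop here.
        if error_sum / rank > alpha:
            break
        selected.append(i)
    return selected


# Hypothetical posterior error probabilities for five candidate variants.
peps = [0.01, 0.30, 0.02, 0.60, 0.05]
kept = select_calls_at_fdr(peps, alpha=0.05)
```

With these example values, the three most confident candidates are kept, because adding the fourth would raise the expected FDR above the threshold.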

Depending on the research question, the analysis of NGS data entails many other steps,
typically involving diverse tools, data transformations and aggregation of results. These
steps can be orchestrated by workflow management. We present the general-purpose
workflow system Snakemake, which provides an easy-to-read domain-specific language
for defining and documenting workflows. Snakemake provides an execution environment
that allows scaling a workflow to the available resources, including parallelization across
CPU cores or cluster nodes and restricting memory usage or the number of available
coprocessors such as GPUs.
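
The idea of scaling a workflow to the available resources can be illustrated with a small Python sketch. It is a simplified greedy selection and not Snakemake's actual scheduler or API. The job names, the resource names, and the function `pick_runnable_jobs` are assumptions made for this example.

```python
def pick_runnable_jobs(ready_jobs, available):
    """Greedily pick ready jobs whose combined needs fit the resource budget."""
    remaining = dict(available)
    picked = []
    for name, needs in ready_jobs:
        # A job is runnable only if every resource it needs is still available.
        if all(remaining.get(res, 0) >= amount for res, amount in needs.items()):
            for res, amount in needs.items():
                remaining[res] = remaining.get(res, 0) - amount
            picked.append(name)
    return picked


# Hypothetical jobs with their resource requirements.
ready_jobs = [
    ("map_sample_a", {"cores": 4, "gpu": 1}),
    ("map_sample_b", {"cores": 4, "gpu": 1}),
    ("call_variants", {"cores": 2, "mem_mb": 8000}),
    ("plot_report", {"cores": 1}),
]
available = {"cores": 8, "gpu": 1, "mem_mb": 16000}
picked = pick_runnable_jobs(ready_jobs, available)
```

In this example, the second mapping job has to wait because only one GPU is available, while the remaining jobs run in parallel on the free CPU cores.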

Johannes Köster is a computer scientist with a focus on algorithm engineering and data analysis in bioinformatics. Currently, he works as a Postdoctoral Research Fellow in the groups of Shirley Liu, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health and Myles Brown, Division of Molecular and Cellular Oncology, Department of Medical Oncology, Dana-Farber Cancer Institute.

Publication date (per publisher) 12 April 2015
Language English
Dimensions 148 x 210 mm
Weight 213 g
Subject area Natural sciences
Law / Taxes
Keywords algorithms • Bioinformatics • data structures
ISBN-10 3-7375-3777-1 / 3737537771
ISBN-13 978-3-7375-3777-3 / 9783737537773
Condition New