Comparative Gene Finding (eBook)

Models, Algorithms and Implementation
eBook Download: PDF
2010
XV, 304 pages
Springer London (publisher)
978-1-84996-104-2 (ISBN)

Comparative Gene Finding - Marina Axelson-Fisk
128.39 incl. VAT
  • Download available immediately
Comparative genomics is a new and emerging field, and with the explosion of available biological sequences the demand for faster, more efficient and more robust algorithms to analyze all this data is immense. This book is meant to serve as a self-contained introduction to the state of the art of computational gene finding in general and of comparative approaches in particular. It is meant as an overview of the various methods that have been applied in the field, and a quick introduction to how computational gene finders are built in general. A beginner to the field could use this book as a guide to the main points to think about when constructing a gene finder, and to the main algorithms that are in use. On the other hand, the more experienced gene finder should be able to use this book as a reference to different methods and to the main components incorporated in these methods. I have focused on the main uses of the covered methods and avoided many of the technical details and general extensions of the models. In exchange I have tried to supply references to more detailed accounts of the different research areas touched upon. The book, however, makes no claim to being comprehensive.

Preface
Acknowledgments
Contents
Acronyms
Introduction
Some Basic Genetics
The Central Dogma
The Structure of a Gene
How Many Genes Do We Have?
Problems of Gene Definitions
The Gene Finding Problem
Comparative Gene Finding
History of Algorithm Development
To Build a Gene Finder
References
Single Species Gene Finding
Hidden Markov Models (HMMs)
Markov Chains
Discrete-Time Markov Chains
Stationarity and Reversibility
Continuous-Time Markov Chains
Hidden Markov Models
Dynamic Programming
Silent Begin and End States
The Forward Algorithm
The Backward Algorithm
The Viterbi Algorithm
EasyGene: A Prokaryotic Gene Finder
Posterior Decoding
Statistical Significance of Predictions
Generalized Hidden Markov Models (GHMMs)
Preliminaries
The Forward and Backward Algorithms
The Forward Variables
The Backward Variables
The Viterbi Algorithm
Genscan: A GHMM-Based Gene Finder
Sequence Generation Algorithm
Reducing Computational Complexity
Exon Probabilities
Interpolated Markov Models (IMMs)
Preliminaries
Linear and Rational Interpolation
GLIMMER: A Microbial Gene Finder
Gene Prediction
Training the IMM
GlimmerM
Neural Networks
Biological Neurons
Artificial Neurons and the Perceptron
Multi-Layer Neural Networks
GRAIL: A Neural Network-Based Gene Finder
Decision Trees
Classification
Decision Tree Learning
MORGAN: A Decision Tree-Based Gene Finder
References
Sequence Alignment
Pairwise Sequence Alignment
Dot Plot Matrix
Nucleotide Substitution Models
The Jukes-Cantor Model
The Kimura Model
The Felsenstein Model
The Tamura and Nei Model
General Time-Reversible (GTR) Model
Amino Acid Substitution Models
The PAM Matrix
The BLOSUM Matrix
The GONNET Matrix
Gap Models
The Needleman-Wunsch Algorithm
Needleman-Wunsch Using Affine Gaps
The Smith-Waterman Algorithm
Pair Hidden Markov Models (PHMMs)
Preliminaries
The Forward, Backward, and Viterbi Algorithms
Database Similarity Searches
FASTA
BLAST
Gapped BLAST
PSI-BLAST
The Significance of Alignment Scores
Multiple Sequence Alignment
Scoring Schemes
Sum-of-Pairs (SP)
Weighted Sum-of-Pairs (WSP)
Minimum Entropy
Gap Costs
Phylogenetic Trees
The Neighbor-Joining Method
Fitch-Margoliash
Dynamic Programming
The MSA Package
Progressive Alignments
Iterative Methods
Hidden Markov Models
SAM: Sequence Alignment and Modeling
Genetic Algorithms
Simulated Annealing
Alignment Profiles
Standard Profiles
Profile HMMs
Scoring a New Sequence
References
Comparative Gene Finding
Similarity-Based Gene Finding
GenomeScan: GHMM-Based Gene Finding Using Homology
Twinscan: GHMM-Based Gene Finding Using Informant Sequences
Heuristic Cross-Species Gene Finding
ROSETTA
Pair Hidden Markov Models (PHMMs)
DoubleScan: A PHMM-Based Comparative Gene Finder
The State Space
The Stepping Stone Algorithm
Generalized Pair Hidden Markov Models (GPHMMs)
Preliminaries
The Forward, Backward and Viterbi Algorithms
SLAM: A GPHMM-Based Comparative Gene Finder
The State Space
Reducing Computational Complexity
Gene Mapping
Projector: A Gene Mapping Tool
GeneMapper: Reference-Based Annotation
Multiple Sequence Gene Finding
N-SCAN: A Multiple Informant-Based Gene Finder
References
Gene Structure Submodels
The State Space
The Exon States
Splice Sites
Introns and Intergenic Regions
Untranslated Regions (UTRs)
Promoters and PolyA-Signals
State Length Distributions
Geometric and Negative Binomial Lengths
Empirical Length Distributions
Acyclic Discrete Phase Type Distributions
Sequence Content Sensors
GC-Content Binning
Start Codon Recognition
Codon and Amino Acid Usage
K-Tuple Frequency Analysis
Markov Chain Content Sensors
Interpolated Markov Models
Splice Site Detection
Weight Matrices and Weight Array Models
Variable-Length Markov Models (VLMMs)
Maximal Dependence Decomposition (MDD)
The Position with the Strongest Influence
Score a New Sequence
Neural Networks
Linear Discriminant Analysis
Quadratic Discriminant Analysis (QDA)
Linear Discriminant Analysis (LDA)
Maximum Entropy
The Maximum Entropy Method
Application to Splice Site Detection
Iterative Scaling
Bayesian Networks
Preliminaries
Some Bayesian Theory
Training a Bayesian Network
Application to Splice Site Detection
Support Vector Machines
Linearly Separable Classes
Nearly Linear SVMs
Nonlinear SVMs
SVMs in Splice Site Detection
References
Parameter Training
Introduction
Pseudocounts
The SAM Regularizer
Maximum Likelihood Estimation
HMM Training on Labeled Sequences
The Expectation-Maximization (EM) Algorithm
The Baum-Welch Algorithm
The Forward-Backward Algorithm
The Baum-Welch Algorithm
Gradient Ascent/Descent
The Backpropagation Algorithm
The Feed-Forward Step
The Backpropagation Step
The Gradient Descent Step
Several Training Patterns
Discriminative Training
Conditional Maximum Likelihood
Maximum Mutual Information
Minimum Classification Error
Gibbs Sampling
Gibbs Sampling for HMM Training
Simulated Annealing
Simulated Annealing for Training of HMMs
References
Implementation of a Comparative Gene Finder
Program Structure
Command Line Arguments
Parameter Files
Candidate Exon Boundaries
Output Files
The GPHMM Model
Modeling Intron and Intergenic Pairs
Modeling Exon Pairs
Approximate Alignment
Accuracy Assessment
Possible Model Extensions
References
Index

Publication date (per publisher) 30.1.2010
Series Computational Biology
Additional info XV, 304 p.
Place of publication London
Language English
Subject areas Mathematics / Computer Science > Computer Science > Theory / Studies
Studies > 1st study phase (preclinical) > Biochemistry / Molecular Biology
Natural Sciences > Biology > Genetics / Molecular Biology
Technology
Keywords algorithms • Bioinformatics • Biological sequence analysis • comparative genomics • Computational Biology • Computational Gene Finding • genes • Genetics • Information Theory • sequence alignment
ISBN-10 1-84996-104-2 / 1849961042
ISBN-13 978-1-84996-104-2 / 9781849961042
PDF (watermark)
Size: 4.1 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook with a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read with (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax law reasons we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
