Computational and Statistical Approaches to Genomics (eBook)

Wei Zhang, Ilya Shmulevich (Editors)

eBook Download: PDF
2007 | 2nd ed. 2006
X, 416 pages
Springer US (publisher)
978-0-387-26288-8 (ISBN)

149.79 incl. VAT
  • Download available immediately

The second edition of this book adds eight new contributors to reflect a modern, cutting-edge approach to genomics. It contains the newest research results on genomic analysis and modeling using state-of-the-art methods from engineering, statistics, and genomics. These tools and models are then applied to real biological and clinical problems. The book's original seventeen chapters have also been updated to cover new initiatives and directions.


The expanded scope covers statistical issues in single nucleotide polymorphism analysis, array CGH analysis, SAGE analysis, gene shaving and related methods for microarray data analysis, and cross-hybridization issues on oligo arrays. The authors of the 17 original chapters have updated their contents, including references, on topics such as the development of novel engineering, statistical, and computational principles, as well as methods, models, and tools from these disciplines applied to genomics.

Contents
Preface
MICROARRAY IMAGE ANALYSIS AND GENE EXPRESSION RATIO STATISTICS
1. Introduction
2. Microarray Image Analysis
2.1 Target Segmentation and Clone Information Assignment
2.2 Background Detection
2.3 Target Detection
2.4 Intensity Measurement and Ratio Calculation
3. Ratio Statistics
3.1 Constant Coefficient of Variation
3.2 Ratio Confidence Interval
3.3 Ratio Normalization
3.4 Ratio Statistics for Low Signal-to-Noise Ratio
3.5 Measurement Quality Assessment
4. Conclusions
References
STATISTICAL CONSIDERATIONS IN THE ASSESSMENT OF CDNA MICROARRAY DATA OBTAINED USING AMPLIFICATION
1. Introduction
2. Amplification Methods
2.1 RNA Amplification
2.2 Fluorescent Signal Amplification
3. Data Analysis Strategy
4. An Example
4.1 Data Preprocessing
4.2 Data Analysis
5. Discussion
Acknowledgments
References
SOURCES OF VARIATION IN MICROARRAY EXPERIMENTS
Introduction
1. The Experiment
2. Experimental Design
3. Data Analysis
4. Discussion
Acknowledgments
References
STUDENTIZING MICROARRAY DATA
1. Introduction
2. Fold Differences and Error Models
3. A Case Study
3.1 Array Layout and Preprocessing
3.2 Single Channel Images
3.3 Replicate Ratios
3.4 Variance Fitting and Studentization
4. Discussion
Acknowledgments
References
EXPLORATORY CLUSTERING OF GENE EXPRESSION PROFILES OF MUTATED YEAST STRAINS
1. Introduction
2. The Data
3. Choosing the Metric
3.1 Methods
3.2 Results
4. Self-Organizing Map-Based Exploratory Clustering
4.1 Self-Organizing Maps
4.2 Overview of the Cluster Structure of the Data
4.3 Interpretation of the Clusters
5. Conclusions
Acknowledgments
Notes
References
SELECTING INFORMATIVE GENES FOR CANCER CLASSIFICATION USING GENE EXPRESSION DATA
1. Introduction
2. Selection of Informative Genes
3. Algorithms for the Selection Problem
3.1 The WINNOW Algorithm
3.2 A Simple Greedy Algorithm
3.3 Prediction by the Majority Voting
4. Computational Results
4.1 Comparison of Prediction Methods
4.2 Comparison of Selection Methods
5. Discussions
Acknowledgments
References
FINDING FUNCTIONAL STRUCTURES IN GLIOMA GENE-EXPRESSIONS USING GENE SHAVING CLUSTERING AND MDL PRINCIPLE
1. Introduction
2. Description of Processing Glioma Data Set
3. A Brief Review of “Gene Shaving” (GS)
4. The GS-MDL Clustering Algorithm
4.1 Background on Mixture Models for Gene Expression Data and Traditional Estimation Methods
4.2 MDL Estimation of the Number of Clusters
5. Functional Insights in Clustering Glioma Gene-Expression
Note
References
DESIGN ISSUES AND COMPARISON OF METHODS FOR MICROARRAY-BASED CLASSIFICATION
1. Introduction
2. Classification Rules
3. Some Specific Classification Rules
4. Constrained Classifiers
5. Perceptrons and Neural Networks
6. Error Estimation
7. Feature Selection
8. Illustration of Classification Techniques on Microarray Data
9. Conclusion
References
ANALYZING PROTEIN SEQUENCES USING SIGNAL ANALYSIS TECHNIQUES
1. Introduction
2. Frequency Analysis of Proteins
3. Time-Frequency Analysis
3.1 Non-Stationary Signals
3.2 Wavelet Transform
3.3 Wigner-Ville Distribution
3.4 Interference Terms
4. Application of Time-Frequency Analysis to Protein Families
4.1 Fibroblast Growth Factors
4.2 Homeodomain Proteins
5. Selection of Amino Acid Mappings
5.1 Amino Acid Indices
5.2 Information Theory
5.3 Analysis
6. Conclusions
Notes
References
SCALE-DEPENDENT STATISTICS OF THE NUMBERS OF TRANSCRIPTS AND PROTEIN SEQUENCES ENCODED IN THE GENOME
1. Introduction
2. Distributions of the Gene Expression Levels
2.1 Empirical Distributions
2.2 Effect of Sample Size on the Distribution Shape
3. Probability Distribution and an Estimator of the Total Number of Expressed Genes
4. Determination of the Number of Expressed Genes and GELPF in a Single Cell
4.1 The Number of Expressed Genes and GELPF in a Single Yeast Cell
4.2 Estimate of the Number of Expressed Genes and the GELPF in a Human Cell
5. Global Transcription Response to Damaging Factors
6. Stochastic and Descriptive Models of Gene Expression Process
7. Probability Distributions of the Numbers of Putative Proteins by Their Cluster Sizes and DNA-binding Proteins by the Regulated Promoters
8. Protein Domain Statistics in Proteomes
8.1 Statistical Analysis of Proteome Complexity
8.2 Prediction of the Numbers of Protein-Coding Genes in the Genome and of Protein Domains in the Entire Proteome
9. Conclusion
Acknowledgments
Appendix A: Infinity Limit for Population Growth Associated with the Generalized Pareto Probability Distribution
Appendix B: Population Growth Curve for the Number of Human Expressed Genes
Notes
References
STATISTICAL METHODS IN SERIAL ANALYSIS OF GENE EXPRESSION (SAGE)
1. Introduction
2. Biology and Bioinformatics Background
3. Estimation
3.1 Point Estimation (Counts, Errors, Size)
3.2 Estimation by Interval (Error-Bars)
4. Differential Expression Detection
4.1 Single-Library or “Pseudo-library”
4.2 Replicated Libraries in One Class
4.3 Multiple Libraries Outlier Finding
5. Illustration of Methods Application
6. Conclusions
Acknowledgments
References
NORMALIZED MAXIMUM LIKELIHOOD MODELS FOR BOOLEAN REGRESSION WITH APPLICATION TO PREDICTION AND CLASSIFICATION IN GENOMICS
1. Introduction
2. The NML Model for Bernoulli Strings
3. The NML Model for Boolean Regression
3.1 The NML Model for the Boolean Class M(, k, f )
3.2 The NML Model for the Boolean Class M(., k, f )
3.3 A Two Part Code for the Boolean Class M(, k, f )
3.4 A Two Part Code for the Boolean Class M(., k, f )
4. Experimental Results
4.1 The NML Model for the Boolean Regression Models with k = 1
4.2 The NML Model for the Boolean Regression Models with k = 2
4.3 The NML Model for the Boolean Regression Models with k = 3
4.4 Extension of the Classification for Unseen Cases of the Boolean Regressors
4.5 Estimation of Classification Errors Achieved with Boolean Regression Models with k = 3
5. Conclusions
References
INFERENCE OF GENETIC REGULATORY NETWORKS VIA BEST-FIT EXTENSIONS
1. Introduction
2. Boolean Networks
3. The Best-Fit Extension Problem
4. Simulation Analysis
5. Conclusions
References
REGULARIZATION AND NOISE INJECTION FOR IMPROVING GENETIC NETWORK MODELS
1. Introduction
2. Current Approaches to Tackling the Dimensionality Problem
3. Learning Genetic Network Models
4. Robust Methods
5. Noise Injection is Equivalent to Regularization
6. Comparison with Other Models
7. Discussion
Acknowledgments
References
PARALLEL COMPUTATION AND VISUALIZATION TOOLS FOR CODETERMINATION ANALYSIS OF MULTIVARIATE GENE EXPRESSION RELATIONS
1. Introduction
2. Codetermination Algorithm
3. Prediction System Design
4. Parallel Analysis of Gene Expression (PAGE)
4.1 The Three Sequential Algorithms and Motivation for Parallel Implementation
4.2 Parallel Implementation
4.3 Parallelization Methods
4.4 Parallel Versions of Algorithms
5. Visualization of Gene Expression (VOGE)
6. Summary and Conclusions
Acknowledgments
References
SINGLE NUCLEOTIDE POLYMORPHISMS AND THEIR APPLICATIONS
1. Introduction
2. SNPs and Genotype-Phenotype Association
3. SNPs, Haplotypes and Genetic Association
3.1 Haplotype Methods for Genetic Association
3.2 Estimating Haplotypes with SNPs
4. SNPs and Haplotype Blocks
4.1 Linkage Disequilibrium
4.2 Haplotype Blocks
4.3 Simulations
4.4 Applications
4.5 Tagging SNPs
5. Conclusions
6. Resources
6.1 Selected Haplotype Reconstruction Software
6.2 Tagging SNP Software
Acknowledgments
References
THE CONTRIBUTION OF ALTERNATIVE TRANSCRIPTION AND ALTERNATIVE SPLICING TO THE COMPLEXITY OF MAMMALIAN TRANSCRIPTOMES
1. Introduction
2. Alternative Splicing in Mouse
3. Impact of Alternative Splicing on the Coding Potential
4. Alternative Splicing and Alternative Transcription
5. Regulation of Splicing
5.1 Length Distribution is Different Between Constitutive and Cryptic Exons
5.2 Constitutive and Cryptic Exons Differ in their Flanking Splice Signals
5.3 Constitutive Exons are Enriched in Known Splice Enhancer Motifs
5.4 Recruitment of Repeat Sequences in Alternative Splicing
6. Alternative Splicing of Regulatory Factors
6.1 Alternative Splicing of Zinc Finger-Containing Proteins
7. Conclusions
Notes
References
COMPUTATIONAL IMAGING, AND STATISTICAL ANALYSIS OF TISSUE MICROARRAYS: QUANTITATIVE AUTOMATED ANALYSIS OF TISSUE MICROARRAYS
1. Introduction
2. Oxidation and Storage
3. Fixation and Antigen Retrieval
4. Standardization of Immunohistochemistry
5. Quantitative Immunohistochemistry
6. Fluorescence-Based Platforms for Quantitative Analysis
References
Index

Publication date (per publisher): 26.12.2007
Additional info: X, 416 p. 5 illus. in color.
Place of publication: New York
Language: English
Subject areas: Computer Science > Graphics / Design > Digital Image Processing
Computer Science > Theory / Studies > Artificial Intelligence / Robotics
Medicine / Pharmacy > Medical Specialties > Oncology
Studies > 1st Study Phase (Preclinical) > Biochemistry / Molecular Biology
Natural Sciences > Biology > Biochemistry
Natural Sciences > Biology > Genetics / Molecular Biology
Technology > Environmental Engineering / Biotechnology
Keywords: classification • Complexity • Data Analysis • gene expression • genes • Hybridization • Image Analysis • microarray • Modeling • proving • single nucleotide polymorphism • Statistics • termination • Visualization
ISBN-10: 0-387-26288-1 / 0387262881
ISBN-13: 978-0-387-26288-8 / 9780387262888
PDF (watermark)
Size: 7.7 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for specialist books with columns, tables, and figures. A PDF can be displayed on almost all devices but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
