High-Dimensional Data Analysis in Cancer Research (eBook)
VIII, 392 pages
Springer New York (publisher)
978-0-387-69765-9 (ISBN)
Multivariate analysis is a mainstay of the statistical tools used to analyze biomedical data. It is concerned with relating data matrices of n rows by p columns, in which rows represent samples (or patients) and columns represent attributes of the samples, to response variables such as patient outcome. Classically, the sample size n is much larger than the number of variables p, and the properties of statistical models have mostly been studied under the assumption of fixed p and n tending to infinity. Advances in the biological sciences and in technology have revolutionized cancer research. Biomedical data collection has become more automated and more extensive. We are now in an era in which p is a large fraction of n, or even much larger than n. Take proteomics as an example. Although proteomic techniques have been researched and developed for many decades to identify proteins or peptides uniquely associated with a given disease state, until recently this was mostly a laborious process, carried out one protein at a time. The advent of high-throughput proteome-wide technologies such as liquid chromatography-tandem mass spectrometry makes it possible to generate proteomic signatures that facilitate the rapid development of new strategies for proteomics-based detection of disease. This poses new challenges and calls for scalable solutions for the analysis of such high-dimensional data. This volume presents systematic and analytical approaches and strategies from both biostatistics and bioinformatics for the analysis of correlated and high-dimensional data.
Preface
Contents
Contributors
On the Role and Potential of High-Dimensional Biologic Data in Cancer Research
1.1 Introduction
1.2 Potential of High-Dimensional Data in Biomedical Research
1.3 Statistical Challenges and Opportunities with High-Dimensional Data
1.4 Needed Future Research
References
Variable Selection in Regression – Estimation, Prediction, Sparsity, Inference
2.1 Overview of Model Selection Methods
2.2 Multivariable Modeling: Penalties/Shrinkage
2.3 Least Angle Regression
2.4 Dantzig Selector
2.5 Prediction and Persistence
2.6 Difficulties with Post-Model Selection Inference
2.7 Penalized Likelihood for Generalized Linear Models
2.8 Simulation Study
2.9 Application of the Methods to the Prostate Cancer Data Set
2.10 Conclusion
References
Multivariate Nonparametric Regression
3.1 An Example
3.2 Linear and Additive Models
3.3 Interactions
3.4 Basis Function Expansions
3.5 Regression Tree Models
3.6 Spline Models
3.7 Logic Regression
3.8 High-Dimensional Data
3.9 Survival Data
3.10 Discussion
References
Risk Estimation
4.1 Risk
4.2 Covariance Penalty
4.3 Resampling Methods
4.4 Applications of Risk Estimation
References
Tree-Based Methods
5.1 Chapter Outline
5.2 Background
5.3 Classification and Regression Trees
5.4 Tree-Based Ensembles
5.5 Example: Prostate Cancer Microarrays
5.6 Software
5.7 Recent Research and Oncology Applications
References
Support Vector Machine Classification for High-Dimensional Microarray Data Analysis, With Applications in Cancer Research
6.1 Classification Problems: A Statistical Point of View
6.2 Support Vector Machine for Two-Class Classification
6.3 Support Vector Machines for Multiclass Problems
6.4 Parameter Tuning and Solution Path for SVM
6.5 Sparse Learning with Support Vector Machines
6.6 Cancer Data Analysis Using SVM
References
Bayesian Approaches: Nonparametric Bayesian Analysis of Gene Expression Data
7.1 Introduction
7.2 Bayesian Analysis of Microarray Data
7.3 Nonparametric Bayesian Mixture Model
7.4 Posterior Inference of the Bayesian Model
7.5 Leukemia Gene Expression Example
7.6 Discussion
References
Index
Publication date (per publisher) | 19 December 2008 |
---|---|
Series | Applied Bioinformatics and Biostatistics in Cancer Research |
Additional info | VIII, 392 p. 23 illus., 6 illus. in color. |
Place of publication | New York |
Language | English |
Subject area | Medicine / Pharmacy ► Medical Specialties ► Oncology |
 | Studies ► 1st Stage of Studies (Preclinical) ► Biochemistry / Molecular Biology |
 | Studies ► 2nd Stage of Studies (Clinical) ► Human Genetics |
 | Natural Sciences ► Biology |
 | Technology |
Keywords | Bayesian Approaches • Bioinformatics • Cancer Research • classification • gene expression • genes • High-Dimensional Biologic data • Laboratory • microarray • Multivariate Nonparametric Regression • Oncology • Proteomics • Risk Estimation • Tree-based Methods • Vector • Vector Machine Classification |
ISBN-10 | 0-387-69765-9 / 0387697659 |
ISBN-13 | 978-0-387-69765-9 / 9780387697659 |
Size: 2.8 MB
DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to specialist books with columns, tables, and figures. A PDF can be displayed on almost any device, but is of limited use on small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You will need a PDF viewer, e.g. the free Adobe Digital Editions app.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.