High-Dimensional Data Analysis in Cancer Research (eBook)

Xiaochun Li, Ronghui Xu (Editors)

eBook Download: PDF
2008 | 2009
VIII, 392 pages
Springer New York (Publisher)
978-0-387-69765-9 (ISBN)

Price: 96.29 (incl. VAT)

Multivariate analysis is a mainstay of statistical tools for the analysis of biomedical data. It is concerned with relating data matrices of n rows by p columns to one or more response variables, e.g., patient outcome. The rows represent samples (or patients) and the columns represent attributes of the samples. Classically, the sample size n is much larger than p, the number of variables. The properties of statistical models have therefore mostly been studied under the assumption that p is fixed while n tends to infinity. Advances in the biological sciences and technologies have revolutionized the way cancer is investigated. Biomedical data collection has become increasingly automated and extensive. We are now in an era in which p is a large fraction of n, or even much larger than n. Take proteomics as an example. Proteomic techniques for identifying proteins or peptides uniquely associated with a given disease state have been researched and developed for many decades. Until recently, however, this was mostly a laborious process, carried out one protein at a time. High-throughput, proteome-wide technologies such as liquid chromatography-tandem mass spectrometry now make it possible to generate proteomic signatures. These signatures facilitate the rapid development of new strategies for proteomics-based detection of disease. This poses new challenges and calls for scalable solutions to the analysis of such high-dimensional data. In this volume, we present systematic and analytical approaches and strategies from both biostatistics and bioinformatics for the analysis of correlated and high-dimensional data.
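
The following short worked argument makes concrete why p > n "poses new challenges". It is our own illustration, not material taken from the book, and it uses ridge regression only as one simple example of the penalty/shrinkage idea. The notation is assumed here: X is the n × p data matrix, y the length-n response vector, β the p regression coefficients, and λ > 0 a tuning parameter.

```latex
% Assumed notation: X is n x p, y has length n, beta has length p, lambda > 0.
% Ordinary least squares solves the normal equations
X^\top X \, \hat\beta = X^\top y .
% The p x p matrix X^T X cannot have full rank when p > n:
\operatorname{rank}(X^\top X) = \operatorname{rank}(X) \le \min(n, p) = n < p ,
% so X^T X is singular and the normal equations have infinitely many solutions.
% Adding a ridge penalty lambda * ||beta||^2 restores uniqueness, because for every v != 0
v^\top (X^\top X + \lambda I_p) \, v = \lVert X v \rVert^2 + \lambda \lVert v \rVert^2 > 0 ,
% hence X^T X + lambda I_p is positive definite, invertible, and
\hat\beta_\lambda = (X^\top X + \lambda I_p)^{-1} X^\top y .
```

For instance, take n = 2 samples, p = 3 variables, and X = [[1, 2, 3], [4, 5, 6]]. This gives X^T X = [[17, 22, 27], [22, 29, 36], [27, 36, 45]], whose determinant is 0. With λ = 1, the determinant of X^T X + I_3 is 146, so the penalized system has a unique solution.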

Preface
Contents
Contributors
On the Role and Potential of High-Dimensional Biologic Data in Cancer Research
1.1 Introduction
1.2 Potential of High-Dimensional Data in Biomedical Research
1.3 Statistical Challenges and Opportunities with High-Dimensional Data
1.4 Needed Future Research
References
Variable Selection in Regression – Estimation, Prediction, Sparsity, Inference
2.1 Overview of Model Selection Methods
2.2 Multivariable Modeling: Penalties/Shrinkage
2.3 Least Angle Regression
2.4 Dantzig Selector
2.5 Prediction and Persistence
2.6 Difficulties with Post-Model Selection Inference
2.7 Penalized Likelihood for Generalized Linear Models
2.8 Simulation Study
2.9 Application of the Methods to the Prostate Cancer Data Set
2.10 Conclusion
References
Multivariate Nonparametric Regression
3.1 An Example
3.2 Linear and Additive Models
3.3 Interactions
3.4 Basis Function Expansions
3.5 Regression Tree Models
3.6 Spline Models
3.7 Logic Regression
3.8 High-Dimensional Data
3.9 Survival Data
3.10 Discussion
References
Risk Estimation
4.1 Risk
4.2 Covariance Penalty
4.3 Resampling Methods
4.4 Applications of Risk Estimation
References
Tree-Based Methods
5.1 Chapter Outline
5.2 Background
5.3 Classification and Regression Trees
5.4 Tree-Based Ensembles
5.5 Example: Prostate Cancer Microarrays
5.6 Software
5.7 Recent Research and Oncology Applications
References
Support Vector Machine Classification for High-Dimensional Microarray Data Analysis, With Applications in Cancer Research
6.1 Classification Problems: A Statistical Point of View
6.2 Support Vector Machine for Two-Class Classification
6.3 Support Vector Machines for Multiclass Problems
6.4 Parameter Tuning and Solution Path for SVM
6.5 Sparse Learning with Support Vector Machines
6.6 Cancer Data Analysis Using SVM
References
Bayesian Approaches: Nonparametric Bayesian Analysis of Gene Expression Data
7.1 Introduction
7.2 Bayesian Analysis of Microarray Data
7.3 Nonparametric Bayesian Mixture Model
7.4 Posterior Inference of the Bayesian Model
7.5 Leukemia Gene Expression Example
7.6 Discussion
References
Index

Publication date (per publisher): 19 December 2008
Series: Applied Bioinformatics and Biostatistics in Cancer Research
Additional information: VIII, 392 p., 23 illus., 6 illus. in color.
Place of publication: New York
Language: English
Subject areas: Medicine / Pharmacy › Medical specialties › Oncology
Medical studies › Preclinical stage › Biochemistry / Molecular biology
Medical studies › Clinical stage › Human genetics
Natural sciences › Biology
Technology
Keywords: Bayesian Approaches • Bioinformatics • Cancer Research • classification • gene expression • genes • High-Dimensional Biologic data • Laboratory • microarray • Multivariate Nonparametric Regression • Oncology • Proteomics • Risk Estimation • Tree-based Methods • Vector • Vector Machine Classification
ISBN-10 0-387-69765-9 / 0387697659
ISBN-13 978-0-387-69765-9 / 9780387697659
Format: PDF (watermarked)
Size: 2.8 MB

DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but it is only of limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g., Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: You can read this eBook on both Apple and Android devices. You need a PDF viewer, e.g., the free Adobe Digital Editions app.

Buying eBooks from abroad
For tax-law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.