Data Mining for Genomics and Proteomics - Darius M. Dziuda


Data Mining for Genomics and Proteomics (eBook)

Analysis of Gene and Protein Expression Data

Darius M. Dziuda (Author)

eBook Download: PDF

2010 | 1st edition
328 pages
John Wiley & Sons (Publisher)
978-0-470-59340-0 (ISBN)


Data Mining for Genomics and Proteomics uses pragmatic examples and a complete case study to demonstrate, step by step, how biomedical studies can be used to maximize the chance of extracting new and useful biomedical knowledge from data. It is an excellent resource for students and professionals involved with gene or protein expression data in a variety of settings.

Darius M. Dziuda, PhD, is Associate Professor of Data Mining and Statistics in the Department of Mathematical Sciences at Central Connecticut State University (CCSU). His research and professional activities have been focused on efficient data mining of biomedical data and on methods for identification of parsimonious multivariate biomarkers for medical diagnosis, prognosis, personalized medicine, and drug discovery. For CCSU's data mining program, Dr. Dziuda developed and teaches graduate-level courses on Data Mining for Genomics and Proteomics and on Biomarker Discovery.

1. Introduction.

1.1 Basic terminology.

1.2 Overlapping areas of research.

1.2.1 Genomics.

1.2.2 Proteomics.

1.2.3 Bioinformatics.

1.2.4 Transcriptomics and other -omics ....

1.2.5 Data mining.

2. Basic analysis of gene expression microarray data.

2.1 Introduction.

2.2 Microarray technology.

2.3 Low-level preprocessing of Affymetrix microarrays.

2.4 Public repositories of microarray data.

2.5 Gene expression matrix.

2.6 Additional preprocessing, quality assessment and filtering.

2.7 Basic exploratory data analysis.

2.8 Unsupervised learning (taxonomy-related analysis).

2.8.1 Cluster analysis.

2.8.2 Principal component analysis.

2.8.3 Self-organizing maps.

2.9 Exercises.

3. Biomarker Discovery and Classification.

3.1 Overview.

3.2 Feature Selection.

3.2.1 Introduction.

3.2.2 Univariate versus multivariate approaches.

3.2.3 Supervised versus unsupervised methods.

3.2.4 Taxonomy of feature selection methods.

3.2.5 Feature selection for multiclass discrimination.

3.2.6 Regularization and feature selection.

3.2.7 Stability of biomarkers.

3.3 Discriminant Analysis.

3.3.1 Introduction.

3.3.2 Learning Algorithm.

3.3.3 A stepwise hybrid feature selection with T2.

3.4 Support Vector Machines.

3.4.1 Hard-Margin Support Vector Machines.

3.4.2 Soft-Margin Support Vector Machines.

3.4.3 Kernels.

3.4.4 SVMs and multiclass discrimination.

3.4.5 SVMs and Feature Selection: Recursive Feature Elimination.

3.4.6 Summary.

3.5 Random Forests.

3.5.1 Introduction.

3.5.2 Random Forests Learning Algorithm.

3.5.3 Random Forests and Feature Selection.

3.5.5 Summary.

3.6 Ensemble classifiers, bootstrap methods, and the modified bagging schema.

3.6.1 Ensemble classifiers.

3.6.2 Bootstrap methods.

3.6.3 Bootstrap and linear discriminant analysis.

3.6.4 The modified bagging schema.

3.7 Other learning algorithms.

3.7.1 k-Nearest Neighbor classifiers.

3.7.2 Artificial Neural Networks.

3.8 Eight commandments of gene expression analysis (for biomarker discovery).

3.9 Exercises.

4. The Informative Set of Genes.

4.1 Introduction.

4.2 Definitions.

4.3 The method.

4.3.1 Identification of the Informative Set of Genes.

4.3.2 Primary expression patterns of the Informative Set of Genes.

4.3.3 The most frequently used genes of the primary expression patterns.

4.4 Using the Informative Set of Genes to identify robust multivariate biomarkers.

4.5 Summary.

4.6 Exercises.

5. Analysis of protein expression data.

5.1 Introduction.

5.2 Protein chip technology.

5.2.1 Antibody microarrays.

5.2.2 Peptide microarrays.

5.2.3 Protein microarrays.

5.2.4 Reverse phase microarrays.

5.3 Two-dimensional gel electrophoresis.

5.4 MALDI-TOF and SELDI-TOF mass spectrometry.

5.5 Preprocessing of mass spectrometry data.

5.6 Analysis of protein expression data.

5.6.1 Additional preprocessing.

5.6.2 Basic exploratory data analysis.

5.6.3 Unsupervised learning.

5.6.4 Supervised learning - feature selection and biomarker discovery.

5.6.5 Supervised learning - classification systems.

5.7 Associating biomarker peaks with proteins.

5.7.1 Introduction.

5.7.2 The Universal Protein Resource (UniProt).

5.7.3 Search programs.

5.7.4 Tandem mass spectrometry.

5.8 Summary.

6. Sketches for selected exercises.

6.1 Introduction.

6.2 Multiclass discrimination (Exercise 3.2).

6.3 Identifying the Informative Set of Genes (Exercises 4.2 to 4.6).

6.4 Using the Informative Set of Genes to identify robust multivariate markers (Exercise 4.8).

6.5 Validating biomarkers on an independent test data set (Exercise 4.8).

6.6 Using a training set that combines more than one data set (Exercises 3.5 and 4.1 to 4.8).

Publication date (per publisher)	3 August 2010
Series	Wiley Series on Methods and Applications
Language	English
Subject areas	Computer Science ► Databases ► Data Warehouse / Data Mining
	Mathematics / Computer Science ► Computer Science ► Networks
	Mathematics / Computer Science ► Mathematics
	Natural Sciences ► Biology ► Genetics / Molecular Biology
	Engineering ► Environmental Engineering / Biotechnology
Keywords	Computer Science • Database & Data Warehousing Technologies • Data Mining • Data Mining Statistics • Genomics • Genomics & Proteomics • Life Sciences • Proteomics • Statistics
ISBN-10	0-470-59340-7 / 0470593407
ISBN-13	978-0-470-59340-0 / 9780470593400


PDF (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read it only on devices that are also registered to your Adobe ID.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices but is only of limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console; in our experience, it frequently causes problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.

Print edition

Book | Hardcover

€104.54