Unsupervised Feature Extraction Applied to Bioinformatics (eBook)
XVIII, 321 pages
Springer International Publishing (publisher)
978-3-030-22456-1 (ISBN)
This book proposes applications of tensor decomposition to unsupervised feature extraction and feature selection. The author posits that although supervised methods, including deep learning, have become popular, unsupervised methods have their own advantages. He argues that one such advantage is that unsupervised methods are easy to learn, because tensor decomposition is a conventional linear methodology. The book starts from very basic linear algebra and reaches the cutting-edge methodologies applied to difficult situations in which there are many features (variables) but only a small number of samples. The author includes advanced descriptions of tensor decomposition, including Tucker decomposition using higher-order singular value decomposition as well as higher-order orthogonal iteration, and tensor train decomposition. The author concludes by showing unsupervised methods and their application to a wide range of topics.
- Allows readers to analyze data sets with small samples and many features;
- Provides a fast algorithm, based upon linear algebra, to analyze big data;
- Includes several applications to multi-view data analyses, with a focus on bioinformatics.
Prof. Taguchi is currently a Professor at the Department of Physics, Chuo University. He received a master's degree in Statistical Physics from Tokyo Institute of Technology, Japan, in 1986, and a PhD in Non-linear Physics from Tokyo Institute of Technology, Tokyo, Japan, in 1988. He has worked at Tokyo Institute of Technology and Chuo University, and has been with Chuo University (Tokyo, Japan) since 1997. His main research interests are in bioinformatics, especially multi-omics data analysis using linear algebra. Dr. Taguchi has published a book on bioinformatics and more than 100 journal papers, book chapters, and papers in conference proceedings.
Foreword 7
Preface 9
Acknowledgments 11
Contents 12
Acronyms 16
Part I Mathematical Preparations 18
1 Introduction to Linear Algebra 19
1.1 Introduction 19
1.2 Scalars 19
1.2.1 Scalars 19
1.2.2 Dummy Scalars 20
1.2.3 Generating New Features by Arithmetic 21
1.3 Vectors 21
1.3.1 Vectors 21
1.3.2 Geometrical Interpretation of Vectors: One Dimension 22
1.3.3 Geometrical Interpretation of Vectors: Two Dimensions 23
1.3.4 Geometrical Interpretation of Vectors: Features 25
1.3.5 Generating New Features by Arithmetic 26
1.3.6 Dummy Vectors 26
1.4 Matrices 27
1.4.1 Equivalences to Geometrical Representation 28
1.4.2 Matrix Manipulation and Feature Generation 29
1.5 Tensors 32
1.5.1 Introduction of Tensors 32
1.5.2 Geometrical Representation of Tensors 33
1.5.3 Generating New Features 35
1.5.4 Tensor Algebra 35
Appendix 38
Rank 38
2 Matrix Factorization 39
2.1 Introduction 39
2.2 Matrix Factorization 39
2.2.1 Rank Factorization 40
2.2.2 Singular Value Decomposition 41
2.2.2.1 How to Compute SVD 42
2.2.2.2 Applying SVD to Shop Data 44
2.3 Principal Component Analysis 46
2.4 Equivalence Between PCA and SVD 47
2.5 Geometrical Representation of PCA 49
2.5.1 PCA Selects the Axis with the Maximal Variance 49
2.5.2 PCA Selects the Axis with Minimum Residuals 52
2.5.3 Non-equivalence Between Two PCAs 53
2.6 PCA as a Clustering Method 54
Appendix 59
Proof of Theorem 2.1 59
References 61
3 Tensor Decomposition 62
3.1 Three Principal Realizations of TD 62
3.2 Performance of TDs as Tools Reducing the Degrees of Freedoms 66
3.2.1 Tucker Decomposition 66
3.2.2 CP Decomposition 68
3.2.3 Tensor Train Decomposition 70
3.2.4 TDs Are Not Always Interpretable 71
3.3 Various Algorithms to Compute TDs 72
3.3.1 CP Decomposition 73
3.3.2 Tucker Decomposition 77
3.3.3 Tensor Train Decomposition 80
3.4 Interpretation Using TD 82
3.5 Summary 86
3.5.1 CP Decomposition 87
3.5.1.1 Advantages 87
3.5.1.2 Disadvantages 87
3.5.2 Tucker Decomposition 87
3.5.2.1 Advantages 87
3.5.2.2 Disadvantages 88
3.5.3 Tensor Train Decomposition 88
3.5.3.1 Advantages 88
3.5.3.2 Disadvantages 88
3.5.4 Superiority of Tucker Decomposition 88
Appendix 89
Moore-Penrose Pseudoinverse 89
References 93
Part II Feature Extractions 94
4 PCA Based Unsupervised FE 95
4.1 Introduction: Feature Extraction vs Feature Selection 95
4.2 Various Feature Selection Procedures 96
4.3 PCA Applied to More Complicated Patterns 99
4.4 Identification of Non-sinusoidal Periodicity by PCA Based Unsupervised FE 106
4.5 Null Hypothesis 111
4.6 Feature Selection with Considering P-Values 112
4.7 Stability 115
4.8 Summary 116
Reference 116
5 TD Based Unsupervised FE 117
5.1 TD as a Feature Selection Tool 117
5.2 Comparisons with Other TDs 121
5.3 Generation of a Tensor From Matrices 123
5.4 Reduction of Number of Dimensions of Tensors 124
5.5 Identification of Correlated Features Using Type I Tensor 125
5.6 Identification of Correlated Features Using Type II Tensor 128
5.7 Summary 129
Reference 130
Part III Applications to Bioinformatics 131
6 Applications of PCA Based Unsupervised FE to Bioinformatics 132
6.1 Introduction 132
6.2 Some Introduction to Genomic Science 132
6.2.1 Central Dogma 133
6.2.2 Regulation of Transcription 133
6.2.3 The Technologies to Measure the Amount of Transcript 134
6.2.4 Various Factors that Regulate the Amount of Transcript 134
6.2.5 Other Factors to Be Considered 135
6.3 Biomarker Identification 136
6.3.1 Biomarker Identification Using Circulating miRNA 136
6.3.1.1 Biomarker Identification Using Serum miRNA 136
6.3.2 Circulating miRNAs as Universal Disease Biomarker 148
6.3.3 Biomarker Identification Using Exosomal miRNAs 150
6.4 Integrated Analysis of mRNA and miRNA Expression 158
6.4.1 Understanding Soldier's Heart From the mRNA and miRNA 158
6.4.2 Identifications of Interactions Between miRNAs and mRNAs in Multiple Cancers 171
6.5 Integrated Analysis of Methylation and Gene Expression 175
6.5.1 Aberrant Promoter Methylation and Expression Associated with Metastasis 176
6.5.2 Epigenetic Therapy Target Identification Based upon Gene Expression and Methylation Profile 180
6.5.3 Identification of Genes Mediating Transgenerational Epigenetics Based upon Integrated Analysis of mRNA Expression and Promoter Methylation 191
6.6 Time Development Analysis 195
6.6.1 Identification of Cell Division Cycle Genes 198
6.6.2 Identification of Disease Driving Genes 207
6.7 Gene Selection for Single Cell RNA-seq 215
6.8 Summary 219
References 220
7 Application of TD Based Unsupervised FE to Bioinformatics 225
7.1 Introduction 225
7.2 PTSD Mediated Heart Diseases 225
7.3 Drug Discovery From Gene Expression 231
7.4 Universarity of miRNA Transfection 239
7.5 One-Class Differential Expression Analysis for Multiomics Data Set 243
7.6 General Examples of Case I and II Tensors 251
7.6.1 Integrated Analysis of mRNA and miRNA 251
7.6.2 Temporally Differentially Expressed Genes 259
7.7 Gene Expression and Methylation in Social Insects 264
7.8 Drug Discovery From Gene Expression: II 269
7.9 Integrated Analysis of miRNA Expression and Methylation 273
7.10 Summary 279
Appendix 280
Universarity of miRNA Transfection 280
Study 1 280
Study 2 282
Study 3 283
Study 4 284
Study 5 284
Study 6 284
Study 7 287
Study 8 288
Study 9 289
Study 10 290
Study 11 291
Drug Discovery From Gene Expression: II 292
Heart Failure 292
PTSD 293
ALL 296
Diabetes 299
Renal Carcinoma 300
Cirrhosis 304
References 306
A Various Implementations of TD 309
A.1 Introduction 309
A.2 R 309
A.2.1 rTensor 309
A.2.2 ttTensor 310
A.3 Python 310
A.3.1 HOTTBOX 310
A.4 MATLAB 310
A.4.1 Tensor Toolbox 310
A.5 Julia 311
A.5.1 TensorDecompositions.jl 311
A.6 TensorFlow 311
A.6.1 t3f 311
B List of Published Papers Related to the Methods 312
References 312
Glossary 315
Solutions 316
Problems of Chap.1 316
Problems of Chap.2 320
Problems of Chap.3 322
Index 327
Publication date (per publisher) | 23 August 2019 |
---|---|
Series | Unsupervised and Semi-Supervised Learning |
Additional info | XVIII, 321 p. 111 illus., 94 illus. in color. |
Language | English |
Subject areas | Mathematics / Computer Science ► Computer Science ► Databases |
 | Natural Sciences ► Biology |
 | Engineering ► Electrical Engineering / Power Engineering |
Keywords | Bioinformatics problems • matrix factorization • PCA based unsupervised FE • PCA/TD based unsupervised FE • TD based unsupervised FE • Tensor decompositions |
ISBN-10 | 3-030-22456-2 / 3030224562 |
ISBN-13 | 978-3-030-22456-1 / 9783030224561 |
Size: 8.5 MB
DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for specialist books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read with (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.