Weighted Network Analysis (eBook)
XXIII, 421 pages
Springer New York (publisher)
978-1-4419-8819-5 (ISBN)
High-throughput measurements of gene expression and genetic marker data facilitate systems biologic and systems genetic data analysis strategies. Gene co-expression networks have been used to study a variety of biological systems, bridging the gap from individual genes to biologically or clinically important emergent phenotypes.
Preface 8
Acknowledgements 12
Contents 16
Acronyms 24
Chapter 1: Networks and Fundamental Concepts 26
1.1 Network Adjacency Matrix 26
1.1.1 Connectivity and Related Concepts 27
1.1.2 Social Network Analogy: Affection Network 27
1.2 Analysis Tasks Amenable to Network Methods 28
1.3 Fundamental Network Concepts 29
1.3.1 Matrix and Vector Notation 30
1.3.2 Scaled Connectivity 30
1.3.3 Scale-Free Topology Fitting Index 31
1.3.4 Network Heterogeneity 33
1.3.5 Maximum Adjacency Ratio 33
1.3.6 Network Density 34
1.3.7 Quantiles of the Adjacency Matrix 35
1.3.8 Network Centralization 35
1.3.9 Clustering Coefficient 36
1.3.10 Hub Node Significance 36
1.3.11 Network Significance Measure 37
1.3.12 Centroid Significance and Centroid Conformity 37
1.3.13 Topological Overlap Measure 38
1.3.14 Generalized Topological Overlap for Unweighted Networks 39
1.3.15 Multinode Topological Overlap Measure 41
1.4 Neighborhood Analysis in PPI Networks 43
1.4.1 GTOM Analysis of Fly Protein–Protein Interaction Data 43
1.4.2 MTOM Analysis of Yeast Protein–Protein Interaction Data 45
1.5 Adjacency Function Based on Topological Overlap 46
1.6 R Functions for the Topological Overlap Matrix 46
1.7 Network Modules 47
1.8 Intramodular Network Concepts 49
1.9 Networks Whose Nodes Are Modules 50
1.10 Intermodular Network Concepts 51
1.11 Network Concepts for Comparing Two Networks 52
1.12 R Code for Computing Network Concepts 54
1.13 Exercises 55
References 57
Chapter 2: Approximately Factorizable Networks 60
2.1 Exactly Factorizable Networks 60
2.2 Conformity for a Non-Factorizable Network 61
2.2.1 Algorithm for Computing the Node Conformity 62
2.3 Module-Based and Conformity-Based Approximation of a Network 64
2.4 Exercises 67
References 68
Chapter 3: Different Types of Network Concepts 69
3.1 Network Concept Functions 70
3.2 CF-Based Network Concepts 72
3.3 Approximate CF-Based Network Concepts 73
3.4 Fundamental Network Concepts Versus CF-Based Analogs 74
3.5 CF-Based Concepts Versus Approximate CF-Based Analog 75
3.6 Higher Order Approximations of Fundamental Concepts 76
3.7 Fundamental Concepts Versus Approx. CF-Based Analogs 77
3.8 Relationships Among Fundamental Network Concepts 78
3.8.1 Relationships for the Topological Overlap Matrix 79
3.9 Alternative Expression of the Factorizability F(A) 80
3.10 Approximately Factorizable PPI Modules 80
3.11 Studying Block Diagonal Adjacency Matrices 85
3.12 Approximate CF-Based Intermodular Network Concepts 87
3.13 CF-Based Network Concepts for Comparing Two Networks 88
3.14 Discussion 89
3.15 R Code 91
3.16 Exercises 93
References 98
Chapter 4: Adjacency Functions and Their Topological Effects 100
4.1 Definition of Important Adjacency Functions 100
4.2 Topological Effects of the Power Transformation AFpower 102
4.2.1 Studying the Power AF Using Approx. CF-Based Concepts 103
4.2.2 MAR Is a Nonincreasing Function of β 103
4.3 Topological Criteria for Choosing AF Parameters 105
4.4 Differential Network Concepts for Choosing AF Parameters 106
4.5 Power AF for Calibrating Weighted Networks 107
4.6 Definition of Threshold-Preserving Adjacency Functions 107
4.7 Equivalence of Network Construction Methods 109
4.8 Exercises 110
References 112
Chapter 5: Correlation and Gene Co-Expression Networks 113
5.1 Relating Two Numeric Vectors 113
5.1.1 Pearson Correlation 115
5.1.2 Robust Alternatives to the Pearson Correlation 116
5.1.3 Biweight Midcorrelation 117
5.1.4 C-Index 118
5.2 Weighted and Unweighted Correlation Networks 119
5.2.1 Social Network Analogy: Affection Network 120
5.3 General Correlation Networks 121
5.4 Gene Co-Expression Networks 123
5.5 Mouse Tissue Gene Expression Data from an F2 Intercross 125
5.6 Overview of Weighted Gene Co-Expression Network Analysis 130
5.7 Brain Cancer Network Application 132
5.8 R Code for Studying the Effect of Thresholding 134
5.9 Gene Network (Re-)Construction Methods 136
5.10 R Code 137
5.11 Exercises 139
References 140
Chapter 6: Geometric Interpretation of Correlation Networks Using the Singular Value Decomposition 144
6.1 Singular Value Decomposition of a Matrix datX 144
6.1.1 Signal Balancing Based on Right Singular Vectors 145
6.1.2 Eigenvectors, Eigengenes, and Left Singular Vectors 146
6.2 Characterizing Approx. Factorizable Correlation Networks 147
6.3 Eigenvector-Based Network Concepts 150
6.3.1 Relationships Among Density Concepts in Correlation Networks 152
6.4 Eigenvector-Based Approximations of Intermodular Concepts 153
6.5 Networks Whose Nodes are Correlation Modules 155
6.6 Dictionary for Fundamental-Based and Eigenvector-Based Concepts 156
6.7 Geometric Interpretation 157
6.7.1 Interpretation of Eigenvector-Based Concepts 157
6.7.2 Interpretation of a Correlation Network 158
6.7.3 Interpretation of the Factorizability 159
6.8 Network Implications of the Geometric Interpretation 160
6.8.1 Statistical Significance of Network Concepts 161
6.8.2 Intramodular Hubs Cannot be Intermediate Nodes 161
6.8.3 Characterizing Networks Where Hub Nodes Are Significant 161
6.9 Data Analysis Implications of the Geometric Interpretation 162
6.10 Brain Cancer Network Application 164
6.11 Module and Hub Significance in Men, Mice, and Yeast 168
6.12 Summary 171
6.13 R Code for Simulating Gene Expression Data 174
6.14 Exercises 178
References 180
Chapter 7: Constructing Networks from Matrices 182
7.1 Turning a Similarity Matrix into a Network 182
7.2 Turning a Symmetric Matrix into a Network 183
7.3 Turning a General Square Matrix into a Network 184
7.4 Turning a Dissimilarity or Distance into a Network 185
7.5 Networks Based on Distances Between Vectors 186
7.6 Correlation Networks as Distance-Based Networks 187
7.7 Sample Networks for Outlier Detection 188
7.8 KL Dissimilarity Between Positive Definite Matrices 190
7.9 KL Pre-Dissimilarity for Parameter Estimation 191
7.10 Adjacency Function Based on Distance Properties 192
7.11 Constructing Networks from Multiple Similarity Matrices 193
7.11.1 Consensus and Preservation Networks 194
7.12 Exercises 196
References 199
Chapter 8: Clustering Procedures and Module Detection 200
8.1 Cluster Object Scatters Versus Network Densities 200
8.2 Partitioning-Around-Medoids Clustering 202
8.3 k-Means Clustering 203
8.4 Hierarchical Clustering 205
8.5 Cophenetic Distance Based on a Hierarchical Cluster Tree 207
8.6 Defining Clusters from a Hierarchical Cluster Tree: The dynamicTreeCut Library for R 209
8.7 Cluster Quality Statistics Based on Network Concepts 213
8.8 Cross-Tabulation-Based Cluster (Module) Preservation Statistics 214
8.9 Rand Index and Similarity Measures Between Two Clusterings 216
8.9.1 Co-Clustering Formulation of the Rand Index 217
8.9.2 R Code for Cross-Tabulation and Co-Clustering 218
8.10 Discussion of Clustering Methods 219
8.11 Exercises 221
References 226
Chapter 9: Evaluating Whether a Module is Preserved in Another Network 228
9.1 Introduction 228
9.2 Module Preservation Statistics 230
9.2.1 Summarizing Preservation Statistics and Threshold Values 233
9.2.2 Module Preservation Statistics for General Networks 234
9.2.3 Module Preservation Statistics for Correlation Networks 235
9.2.3.1 Eigennode-Based Density Preservation Statistics 236
9.2.3.2 Eigennode-Based Connectivity Preservation Statistics 237
9.2.3.3 Module Separability Statistics 238
9.2.4 Assessing Significance of Observed Module Preservation Statistics by Permutation Tests 239
9.2.5 Composite Preservation Statistic Zsummary 239
9.2.5.1 Thresholds for Module Preservation Statistics 240
9.2.6 Composite Preservation Statistic medianRank 241
9.2.6.1 Composite Preservation Statistic ZsummaryADJ for General Networks 241
9.3 Cholesterol Biosynthesis Module Between Mouse Tissues 242
9.4 Human Brain Module Preservation in Chimpanzees 245
9.5 KEGG Pathways Between Human and Chimpanzee Brains 252
9.6 Simulation Studies of Module Preservation 254
9.7 Relationships Among Module Preservation Statistics 260
9.8 Discussion of Module Preservation Statistics 263
9.9 R Code for Studying the Preservation of Modules 265
9.10 Exercises 266
References 266
Chapter 10: Association Measures and Statistical Significance Measures 269
10.1 Different Types of Random Variables 269
10.2 Permutation Tests for Calculating p Values 270
10.3 Computing p Values for Correlations 272
10.4 R Code for Calculating Correlation Test p Values 274
10.5 Multiple Comparison Correction Procedures for p Values 275
10.6 False Discovery Rates and q-values 278
10.7 R Code for Calculating q-values 280
10.8 Multiple Comparison Correction as p Value Transformation 282
10.9 Alternative Approaches for Dealing with Many p Values 285
10.10 R Code for Standard Screening 286
10.11 When Are Two Variable Screening Methods Equivalent? 287
10.12 Threshold-Equivalence of Linear Significance Measures 289
10.13 Network Screening 291
10.14 General Definition of an Association Network 292
10.15 Rank-Equivalence and Threshold-Equivalence 292
10.16 Threshold-Equivalence of Linear Association Networks 293
10.17 Statistical Criteria for Choosing the Threshold 294
10.18 Exercises 294
References 297
Chapter 11: Structural Equation Models and Directed Networks 298
11.1 Testing Causal Models Using Likelihood Ratio Tests 298
11.1.1 Depicting Causal Relationships in a Path Diagram 299
11.1.2 Path Diagram as Set of Structural Equations 301
11.1.3 Deriving Model-Based Predictions of Covariances 302
11.1.4 Maximum Likelihood Estimates of Model Parameters 304
11.1.5 Model Fitting p Value and Likelihood Ratio Tests 306
11.1.6 Model Fitting Chi-Square Statistics and LRT 306
11.2 R Code for Evaluating an SEM Model 308
11.3 Using Causal Anchors for Edge Orienting 313
11.3.1 Single Anchor Local Edge Orienting Score 314
11.3.2 Multi-Anchor LEO Score 316
11.3.3 Thresholds for Local Edge Orienting Scores 318
11.4 Weighted Directed Networks Based on LEO Scores 318
11.5 Systems Genetic Applications 319
11.6 The Network Edge Orienting Method 320
11.6.1 Step 1: Combine Quantitative Traits and SNPs 320
11.6.2 Step 2: Genetic Marker Selection and Assignment to Traits 322
11.6.2.1 Manual SNP Selection 322
11.6.2.2 Automatic SNP Selection 323
11.6.3 Step 3: Compute Local Edge Orienting Scores for Aggregating the Genetic Evidence in Favor of a Causal Orientation 324
11.6.4 Step 4: For Each Edge, Evaluate the Fit of the Underlying Local SEM Models 324
11.6.5 Step 5: Robustness Analysis with Respect to SNP Selection Parameters 324
11.6.6 Step 6: Repeat the Analysis for the Next A–B Trait–Trait Edge and Apply Edge Score Thresholds to Orient the Network 326
11.6.7 NEO Software and Output 326
11.6.8 Screening for Genes that Are Reactive to Insig1 327
11.6.9 Discussion of NEO 327
11.7 Correlation Tests of Causal Models 329
11.8 R Code for LEO Scores 330
11.8.1 R Code for the LEO.SingleAnchor Score 330
11.8.2 R Code for the LEO.CPA 332
11.8.3 R Code for the LEO.OCA Score 334
11.9 Exercises 336
References 337
Chapter 12: Integrated Weighted Correlation Network Analysis of Mouse Liver Gene Expression Data 340
12.1 Constructing a Sample Network for Outlier Detection 340
12.2 Co-Expression Modules in Female Mouse Livers 343
12.2.1 Choosing the Soft Threshold Via Scale-Free Topology 343
12.2.2 Automatic Module Detection Via Dynamic Tree Cutting 345
12.2.3 Blockwise Module Detection for Large Networks 346
12.2.4 Manual, Stepwise Module Detection 347
12.2.5 Relating Modules to Physiological Traits 349
12.2.6 Output File for Gene Ontology Analysis 352
12.3 Systems Genetic Analysis with NEO 353
12.4 Visualizing the Network 356
12.4.1 Connectivity, TOM, and MDS Plots 356
12.4.2 VisANT Plot and Software 358
12.4.3 Cytoscape and Pajek Software 358
12.5 Module Preservation Between Female and Male Mice 359
12.6 Consensus Modules Between Female and Male Liver Tissues 363
12.6.1 Relating Consensus Modules to the Traits 364
12.6.2 Manual Consensus Module Analysis 367
12.7 Exercises 369
References 370
Chapter 13: Networks Based on Regression Models and Prediction Methods 371
13.1 Least Squares Regression and MLE 371
13.2 R Commands for Simple Linear Regression 373
13.3 Likelihood Ratio Test for Linear Model Fit 374
13.4 Polynomial and Spline Regression Models 376
13.5 R Commands for Polynomial Regression and Spline Regression 378
13.6 Conditioning on Additional Covariates 381
13.7 Generalized Linear Models 382
13.8 Model Fitting Indices and Accuracy Measures 383
13.9 Networks Based on Predictors and Linear Models 383
13.10 Partial Correlations and Related Networks 384
13.11 R Code for Partial Correlations 386
13.12 Exercises 386
References 390
Chapter 14: Networks Between Categorical or Discretized Numeric Variables 391
14.1 Categorical Variables and Statistical Independence 391
14.2 Entropy 393
14.2.1 Estimating the Density of a Random Variable 394
14.2.2 Entropy of a Discretized Continuous Variable 396
14.3 Association Measures Between Categorical Vectors 397
14.3.1 Association Measures Expressed in Terms of Counts 399
14.3.2 R Code for Relating Categorical Variables 399
14.3.3 Chi-Square Statistic Versus Cor in Case of Binary Variables 400
14.3.4 Conditional Mutual Information 401
14.4 Relationships Between Networks of Categorical Vectors 402
14.5 Networks Based on Mutual Information 403
14.6 Relationship Between Mutual Information and Correlation 405
14.6.1 Applications for Relating MI with Cor 408
14.7 ARACNE Algorithm 409
14.7.1 Generalizing the ARACNE Algorithm 411
14.7.2 Discussion of Mutual Information Networks 412
14.7.3 R Packages for Computing Mutual Information 413
14.8 Exercises 414
References 417
Chapter 15: Network Based on the Joint Probability Distribution of Random Variables 419
15.1 Association Measures Based on Probability Densities 419
15.1.1 Entropy(X) Versus Entropy(Discretize(X)) 421
15.1.2 Kullback–Leibler Divergence for Assessing Model Fit 423
15.1.3 KL Divergence of Multivariate Normal Distributions 424
15.1.4 KL Divergence for Estimating Network Parameters 425
15.2 Partitioning Function for the Joint Probability 426
15.3 Discussion 427
References 428
Index 430
Publication date (per publisher) | 30 April 2011 |
---|---|
Additional info | XXIII, 421 p. |
Place of publication | New York |
Language | English |
Subject area | Studies ► 2nd study phase (clinical) ► Human genetics |
 | Natural sciences ► Biology ► Genetics / Molecular biology |
 | Technology |
Keywords | newjc • systems biology |
ISBN-10 | 1-4419-8819-X / 144198819X |
ISBN-13 | 978-1-4419-8819-5 / 9781441988195 |
Size: 9.0 MB
DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.