Data Analysis and Classification (eBook)
XXII, 482 pages
Springer Berlin (publisher)
978-3-642-03739-9 (ISBN)
Preface 5
List of Referees 8
Contents 9
Contributors 15
Part I Key-note 21
Clustering of High-Dimensional and Correlated Data 22
1 Introduction 22
2 Definition of Mixture Models 23
3 Maximum Likelihood Estimation 24
4 Choice of Starting Values for the EM Algorithm 24
5 Clustering via Normal Mixtures 25
6 Factor Analysis Model for Dimension Reduction 26
7 Some Recent Extensions for High-Dimensional Data 27
8 Mixtures of Normal Components with Random Effects 28
References 30
Statistical Methods for Cryptography 31
1 Introduction 31
1.1 Different disciplines in cryptography 32
2 Prime Numbers 33
2.1 Tests of primality 33
2.2 Deterministic tests 34
2.3 Some deterministic tests 35
3 The Sum Modulo m of Statistical Variables 36
References 39
Part II Cluster Analysis 40
An Algorithm for Earthquakes Clustering Based on Maximum Likelihood 41
1 Introduction 41
2 Conditional Intensity Function of the Clustering Procedure 42
2.1 The ETAS Model 43
2.2 Intensity Function for a Particular Clustered Inhomogeneous Poisson Process 44
3 The Proposed Clustering Method 45
3.1 Finding a Candidate Cluster and Likelihood Changes 45
3.2 The Algorithm of Clustering 46
4 Application to a Real Catalog and Final Remarks 47
References 48
A Two-Step Iterative Procedure for Clustering of Binary Sequences 49
1 Introduction 49
2 Clustering and Dimensionality Reduction 50
3 Example 53
References 56
Clustering Linear Models Using Wasserstein Distance 57
1 Introduction 57
2 Input Data and the Clustering Algorithm 58
3 Dynamic Clustering of Linear Models 59
3.1 Wasserstein Distance for Distributions 60
3.2 Representation and Allocation Functions 61
4 An Application Using Bank of Italy Household Survey Data 62
5 Conclusions and Perspectives 64
References 64
Comparing Approaches for Clustering Mixed Mode Data: An Application in Marketing Research 65
1 Introduction 65
2 Obtaining Partitions with Mixed Mode Data 66
3 Some Illustrative Applications 67
4 Discussion 72
References 72
The Progressive Single Linkage Algorithm Based on Minkowski Ultrametrics 74
1 Introduction 74
2 Ultrametric Approximations 75
3 The Progressive Single Linkage Algorithm 76
4 Some Applications to Real Data 77
5 Conclusions 79
Appendix 80
References 81
Visualization of Model-Based Clustering Structures 82
1 Introduction 82
2 Dimension Reduction for Model-Based Clustering 83
3 Visualization of Clustering Structures 84
4 Examples 86
4.1 Overlapping Clusters with Unconstrained Covariances 86
4.2 High Dimensional Mixture of Two Normal Distributions 87
4.3 Wisconsin Diagnostic Breast Cancer Data 87
5 Comments and Extensions 89
References 89
Part III Multidimensional Scaling 91
Models for Asymmetry in Proximity Data 92
1 Introduction 92
2 A Class of Scalar Product Models 92
3 The Analysis of Skew-Symmetry with External Information 94
4 Conclusions 96
References 97
Intimate Femicide in Italy: A Model to Classify How Killings Happened 98
1 Introduction 98
2 National and International Scenario 99
3 Data and Method 101
4 The Main Results 102
References 104
Two-Dimensional Centrality of Asymmetric Social Network 105
1 Introduction 105
2 The Procedure 106
3 The Data 107
4 The Analysis 107
5 Result 108
6 Discussion 109
References 111
The Forward Search for Classical Multidimensional Scaling When the Starting Data Matrix Is Known 113
1 Introduction 113
2 Classical Multidimensional Scaling and the Forward Search 114
3 A Case Study: Linosa Dataset 116
4 Conclusions 120
References 121
Part IV Multivariate Analysis and Application 122
Discriminant Analysis on Mixed Predictors 123
1 Introduction 123
2 Mixed Discriminant Predictors 124
3 Application Example 126
3.1 Predictor Analysis 126
3.2 Discriminant Analysis 128
3.3 Comparison 128
4 Conclusion 130
References 130
A Statistical Calibration Model for Affymetrix Probe Level Data 131
1 Introduction 131
2 A Brief Review on Preprocessing Methods 132
3 The Proposed Calibration Method 133
4 A Comparison with the Most Popular Methods 135
5 Conclusions 137
References 138
A Proposal to Fuzzify Categorical Variables in Operational Risk Management 139
1 Fuzzy Approach 139
2 The Problem 140
3 The Proposal 141
4 Results 143
5 Conclusions 144
References 145
Common Optimal Scaling for Customer Satisfaction Models: A Point to Cobb–Douglas' Form 146
1 Features of a Customer Satisfaction Model 146
2 Categorical Regression with Common Optimal Scaling 147
2.1 The Pattern of the Model 148
2.1.1 The Algorithm of the Parameters Estimation 149
3 Multiplicative Models for CS 149
3.1 Some Observations 150
4 A Theory About Overall CS 151
5 Conclusions 152
References 152
Structural Neural Networks for Modeling Customer Satisfaction 154
1 Introduction 154
2 Customer Satisfaction and PLS Path Modeling 155
3 Customer Satisfaction and Neural Networks 156
4 A Structural Neural Network for Modeling CS 157
5 Concluding Remarks 161
References 161
Dimensionality of Scores Obtained with a Paired-Comparison Tournament System of Questionnaire Items 163
1 Preference Data Collection 163
2 Preference Data 164
3 Scoring Algorithms 165
4 Dimensions in Preference Data 166
5 Conclusions 170
References 170
Using Rasch Measurement to Assess the Role of the Traditional Family in Italy 171
1 Introduction 171
2 Data and Descriptive Analysis 172
3 Method 173
4 Discussion 175
References 177
Preserving the Clustering Structure by a Projection Pursuit Approach 178
1 Introduction 178
2 Projection Pursuit for Preserving the Clustering Structure 180
2.1 The Critical Bandwidth to Test Multimodality 180
2.2 Projection Pursuit Using the Adjusted Critical Bandwidth 182
3 Numerical Results 182
3.1 A Simulation Study 182
3.2 Real Data Applications 183
4 Concluding Remarks 185
References 185
Association Rule Mining of Multimedia Content 186
1 Introduction 186
2 Syntactic Analysis of Video Data 187
3 Semantic Analysis 189
4 Association Rules 190
5 Synergy Effects 191
References 192
Part V Classification and Classification Tree 194
Automatic Dictionary- and Rule-Based Systems for Extracting Information from Text 195
1 Introduction 195
2 A Model for Creating a Meta-dictionary by Means of a Hybrid System 197
3 Application to the Istat TUS Survey 199
References 204
Several Computational Studies About Variable Selection for Probabilistic Bayesian Classifiers 205
1 Introduction 205
2 Bayesian Networks and Classification 206
2.1 Learning Bayesian Networks 207
2.2 Classifiers Based on Bayesian Networks 207
3 Feature Subset Selection for Classification 207
4 Experimental Results and Conclusions 208
References 212
Semantic Classification and Co-occurrences: A Method for the Rules Production for the Information Extraction from Textual Data 214
1 Introduction 214
2 The Analyzed Corpus and Semantic Classification 215
3 Rules Production Using Co-occurrences and Collocations 215
4 Future Developments and Improvements 220
References 221
The Effectiveness of University Education: A Structural Equation Model 222
1 Introduction 222
2 The Model 223
3 Main Results 226
4 Conclusions 228
References 229
Simultaneous Threshold Interaction Detection in Binary Classification 230
1 Introduction 230
2 Modeling Interaction Effects in Regression Analysis 231
3 The Trunk Model 233
4 Empirical Evidence 234
5 Concluding Remarks 236
References 237
Detecting Subset of Classifiers for Multi-attribute Response Prediction 238
1 Introduction 238
2 The SASSC Algorithm 239
3 Analyzing the Letter Recognition Dataset 242
4 Concluding Remarks 244
References 245
Clustering Textual Data by Latent Dirichlet Allocation: Applications and Extensions to Hierarchical Data 246
1 Introduction 246
2 The LDA Model 247
3 Considering Documents Structure: An Extension of the LDA Model 249
4 Generalizing the Prior Enrichment: Collapsed Variational Bayes 251
5 Discussion 253
References 253
Multilevel Latent Class Models for Evaluation of Long-term Care Facilities 254
1 Introduction and Data 254
2 The Model 255
3 The Results 257
4 Conclusions 261
References 261
Author–Coauthor Social Networks and Emerging Scientific Subfields 262
1 Introduction 262
2 Distribution of Tie Strength 264
3 Distribution of Clique Size 266
4 Random Graph Model for Preferential Attachment 267
5 The Emergence of Scientific Subfields 268
6 The Network of Well-Established Scholars 270
7 Conclusions 272
References 273
Part VI Statistical Models 274
A Hierarchical Model for Time Dependent Multivariate Longitudinal Data 275
1 Introduction 275
2 Model-Based Approach to Three-Way Data Clustering 276
3 Multivariate Hidden Markov Model for Three-Way Data Clustering 277
4 Computational Details 279
5 Simulation Results 280
6 Conclusion 282
References 283
Covariate Error Bias Effects in Dynamic Regression Model Estimation and Improvement in the Prediction by Covariate Local Clusters 284
1 Introduction 284
2 Bias Effects in Dynamic Regression Models with Errors in the Covariate 285
3 A Local Cluster Kalman Filter 286
4 Simulation Experiments 287
5 An Application 289
6 Conclusions 291
References 291
Local Multilevel Modeling for Comparisons of Institutional Performance 292
1 Introduction 292
2 Capturing Local Behaviour 293
2.1 Mixture Modeling 293
2.2 Cluster-Weighted Modeling 294
3 The Proposal: Local Multilevel Modeling 295
4 An Example 296
5 Conclusions and Further Research 298
References 299
Modelling Network Data: An Introduction to Exponential Random Graph Models 300
1 Introduction 300
2 Exponential Random Graph Models for Social Networks 301
3 The Collaboration Network of Italian Scholars on Population Studies 303
4 Model Estimation Results 305
5 Concluding Remarks 307
References 307
Part VII Latent Variables 309
An Analysis of Earthquakes Clustering Based on a Second-Order Diagnostic Approach 310
1 Introduction 310
2 Second-Order Residual Analysis 311
2.1 The Weighted Process and Its Second-Order Properties 311
2.1.1 The Weighted Spectrum 312
2.1.2 The Weighted Correlation Integral 312
3 Space–Time ETAS Model 313
4 Nonparametric Estimation and Diagnostics 314
5 Conclusion 317
References 317
Latent Regression in Rasch Framework 319
1 Introduction 319
2 Longitudinal Latent Regression Model 320
3 Latent Regression Rasch Model with Missing Data 322
4 Conclusion 325
References 326
A Multilevel Latent Variable Model for Multidimensional Longitudinal Data 328
1 Introduction 328
2 Model for Continuous Responses 329
2.1 Model Specification 329
2.2 Estimation 331
2.3 Application to a Real Data Set 333
3 Conclusion 335
References 335
Turning Point Detection Using Markov Switching Models with Latent Information 336
1 Introduction 336
2 The Generalized Hamilton Model with Latent Information 337
3 Identifying and Forecasting USA and Japanese Turning Points 339
References 343
Part VIII Knowledge Extraction from Temporal Data 344
Statistical and Numerical Algorithms for Time Series Classification 345
1 Introduction 345
2 A Simulation Experiment 348
3 Application to Real Data 350
4 Concluding Remarks 351
References 351
Mining Time Series Data: A Selective Survey 353
1 Introduction 353
2 Comparing Time Series Shape 354
3 Criteria Based on Fourier and Wavelet Analysis 355
4 Structural Dissimilarity 357
5 Final Remarks 358
References 359
Predictive Dynamic Models for SMEs 361
1 Introduction 361
2 Default Estimation: A Methodological Proposal 362
3 Data Sources 364
4 Application 364
5 Conclusion 365
References 366
Clustering Algorithms for Large Temporal Data Sets 367
1 The Framework: Temporal Data Mining 367
2 Temporal Cluster Analysis 368
3 Clustering Algorithms: Applicability to Large Temporal Datasets 370
4 An Example of Application on a Radar Satellite Data Base 372
5 Concluding Remarks 373
References 374
Part IX Outlier Detection and Robust Methods 376
Robust Clustering for Performance Evaluation 377
1 Introduction 377
2 Mahalanobis Distances and the Forward Search 378
3 Example 379
References 385
Outliers Detection Strategy for a Curve Clustering Algorithm 387
1 Introduction 387
2 Dynamical Curves Clustering with Free Knots Spline Estimation 388
3 An Improvement of DCC & FSE Algorithm
4 The Outliers Selection Process 390
5 Main Results 391
6 Conclusion and Future Work 394
References 394
Robust Fuzzy Classification 395
1 Introduction 395
2 Robust Fuzzy Cluster Analysis 396
3 Confirmatory Analysis 398
References 402
Weighted Likelihood Inference for a Mixed Regressive Spatial Autoregressive Model 403
1 Introduction 403
2 A Weighted Likelihood Approach 404
3 A Small Simulation Study 406
4 A Real Example 408
5 Final Remarks 409
References 410
Detecting Price Outliers in European Trade Data with the Forward Search 411
1 Introduction 411
2 Application Context, Data and Statistical Patterns 412
3 Application of the Forward Search 413
4 Heuristic Comparison with the "Backward" Outliers 416
5 Towards an Automatic Procedure 416
6 Discussion and Main Conclusions 418
References 418
Part X Statistical Methods for Financial and Economics Data 420
Comparing Continuous Treatment Matching Methods in Policy Evaluation 421
1 Introduction 421
2 Simulating a Subsidies Allocation Mechanism: The Case of L.488 422
3 The Matching Methods in the Continuous Framework 423
4 The Monte Carlo Experiment 424
5 Results and Conclusions 426
References 428
Temporal Aggregation and Closure of VARMA Models: Some New Results 429
1 Introduction 429
2 Temporal Aggregation 430
3 Temporal Aggregation and VARMA Models 431
4 "Markovian" Representation 433
References 436
An Index for Ranking Financial Portfolios According to Internal Turnover 438
1 Introduction 438
2 Style Analysis Models 439
3 Ranking Portfolios According to Internal Turnover 441
4 Concluding Remarks 444
References 445
Bayesian Hidden Markov Models for Financial Data 446
1 Introduction 446
2 The Model 447
3 Computational Implementation 448
4 Bayesian Inference and Forecasting 450
5 An Application to Financial Data 451
6 Conclusions 453
References 453
Part XI Missing Values 455
Regression Imputation for Space-Time Datasets with Missing Values 456
1 Introduction 456
2 A New Regression Single Imputation Method 458
3 Missing Data Simulation 459
4 Results 461
5 Toward a Multiple Imputation Method 461
References 463
A Multiple Imputation Approach in a Survey on University Teaching Evaluation 464
1 Introduction 464
2 An Imputation Procedure to Recover for Missingness 465
3 An Application to the Data on the Evaluation of University Teaching 466
3.1 AD 468
3.2 AE 468
4 Some Final Remarks 472
References 472
Publication date (per publisher) | 14 March 2010 |
---|---|
Series | Studies in Classification, Data Analysis, and Knowledge Organization |
Additional info | XXII, 482 p. 109 illus. |
Place of publication | Berlin |
Language | English |
Subject areas | Mathematics / Computer Science ► Mathematics ► Statistics; Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics; Engineering; Economics ► General / Reference works |
Keywords | classification • cluster analysis • Clustering • Complex Data Structures • Data Analysis • ITEM • Latent variable model • measure • Multidimensional Scaling • Multidimensional Data Analysis • Projection Pursuit • Statistical Models • Time Series |
ISBN-10 | 3-642-03739-9 / 3642037399 |
ISBN-13 | 978-3-642-03739-9 / 9783642037399 |
Size: 11.3 MB
DRM: Digital watermark
File format: PDF (Portable Document Format)