Classification as a Tool for Research (eBook)
XXXVI, 823 pages
Springer Berlin (publisher)
978-3-642-10745-0 (ISBN)
Clustering and Classification, Data Analysis, Data Handling and Business Intelligence are research areas at the intersection of statistics, mathematics, computer science and artificial intelligence. They cover general methods and techniques that can be applied to a vast set of applications such as in business and economics, marketing and finance, engineering, linguistics, archaeology, musicology, biology and medical science. This volume contains the revised versions of selected papers presented during the 11th Biennial IFCS Conference and 33rd Annual Conference of the German Classification Society (Gesellschaft für Klassifikation - GfKl). The conference was organized in cooperation with the International Federation of Classification Societies (IFCS), and was hosted by Dresden University of Technology, Germany, in March 2009.
Preface 6
Contents 14
Contributors 24
Part I (Semi-) Plenary Presentations 38
Hierarchical Clustering with Performance Guarantees 39
1 Introduction 39
2 A Replacement for k-d Trees 40
2.1 The Curse of Dimension for Spatial Data Structures 40
2.2 Low Dimensional Manifolds and Intrinsic Dimension 42
2.3 Random Projection Trees 43
3 A Replacement for Complete Linkage 44
3.1 An Existence Problem for Hierarchical Clustering 44
3.2 Approximation Algorithms for Clustering 45
3.3 Farthest-First Traversal 46
3.4 A Hierarchical Clustering Algorithm 47
References 50
Alignment Free String Distances for Phylogeny 51
1 Introduction 51
2 Four Alignment Free Distances 52
2.1 The MSM Distance 52
2.2 The k-word distance 53
2.3 The ACS distance 54
2.4 A Compression Distance 55
3 Simulations 56
3.1 A Simple Evolutionary Model 56
3.2 The Simulation Process 57
3.3 Simulation Results 57
4 Conclusion 59
References 59
Data Quality Dependent Decision Making in Pattern Classification 61
1 Introduction 61
2 Theoretical Framework 63
2.1 Problem Formulation 63
2.2 Quality-Based Fusion 66
2.3 Data Quality Assessment 68
3 An Illustration of the Benefits of the Quality Based Fusion 69
4 Conclusions 71
References 71
Clustering Proteins and Reconstructing Evolutionary Events 73
1 Introduction: Clustering and Knowledge Feedback 73
2 Clustering Using the Data Recovery Approach 75
2.1 Additive Clustering and One-by-One Iterative Extraction 75
2.2 One Cluster Clustering 76
2.2.1 Pre-specified Intensity 76
2.2.2 Optimal Intensity 77
3 Proteome Knowledge in Determining Similarity Shift 77
3.1 Protein Families and Evolutionary Tree 77
3.2 Utilizing Knowledge of Proteome 79
4 Advancing Genome Knowledge 82
4.1 Reconstructed Histories of HPFs 82
4.2 Derived Ancestors of Herpes Proteins 82
5 Conclusion 83
References 84
Microarray Dimension Reduction Based on Maximizing Mantel Correlation Coefficients Using a Genetic Algorithm Search Strategy 85
1 Introduction 85
2 Methods 87
3 Results 90
4 Discussion 95
References 95
Part II Classification and Data Analysis 97
Multiparameter Hierarchical Clustering Methods 98
1 Introduction 98
2 Notation and Terminology 101
3 Two Parameter Hierarchical Clustering: A Characterization Theorem 101
4 Metric Stability of C 104
References 105
Unsupervised Sparsification of Similarity Graphs 106
1 Introduction and Related Work 106
2 Sparsification 108
2.1 Existing Approaches 109
2.2 An Object-specific, Unsupervised Approach to Sparsification 110
3 Evaluation 111
4 Conclusion 113
References 113
Simultaneous Clustering and Dimensionality Reduction Using Variational Bayesian Mixture Model 115
1 Introduction 115
2 Exponential Family and e-PCA 116
3 Constrained Mixture Model 117
4 Variational Bayes Method 118
4.1 Optimal q2() for Fixed q1(Zn) 119
4.2 Optimal q1(Zn) for Fixed q2() 119
4.3 Laplace Approximation 120
5 Dimensionality Reduction 120
6 Experiments 121
7 Discussion and Conclusion 122
References 122
A Partitioning Method for the Clustering of Categorical Variables 124
1 Introduction 124
2 A Center-Based Partitioning Method for the Clustering of Categorical Variables 125
2.1 Definition of the Latent Variable 126
2.2 The Center-Based Clustering Algorithm 127
3 Applications 128
3.1 Simulation Study 128
3.2 Real Data Application 129
4 Concluding Remarks 131
References 132
Treed Gaussian Process Models for Classification 133
1 Introduction and Background 133
1.1 Gaussian Processes for Regression and Classification 134
2 Treed Gaussian Processes 135
2.1 TGP for Regression 135
2.2 TGP for Classification 136
3 Illustrations and Empirical Results 138
3.1 2d Exponential Data 138
3.2 Classification TGP on Real Data 139
4 Conclusion 140
References 140
Ridgeline Plot and Clusterwise Stability as Tools for Merging Gaussian Mixture Components 141
1 Introduction 141
2 The Ridgeline Method 143
3 A Method Based on Misclassification Probabilities 144
4 Bootstrap Stability Assessment 145
5 Real Data Example: Clustering Melody Contours 146
6 Conclusion 148
References 148
Clustering with Confidence: A Low-Dimensional Binning Approach 149
1 Introduction 149
2 Cluster Trees: Piecewise Constant Density Estimates 150
3 Clustering with Confidence 152
3.1 Bootstrap Confidence Sets for Level Sets 153
3.2 Constructing the Cluster Tree 153
4 Example: "Automatic Gating" in Flow Cytometry 155
5 Summary and Future Work 156
References 156
Local Classification of Discrete Variables by Latent Class Models 158
1 Introduction 158
2 Mixtures Versus Common Components 159
3 Latent Class Analysis 159
3.1 Estimation 160
3.2 Model Selection 161
4 Local Classification of Discrete Variables 161
4.1 Class Conditional Mixtures 161
4.2 Common Components 162
4.2.1 Classification Capability 163
5 Application 163
5.1 Simulation Study 163
5.2 SNP Data 165
6 Conclusion 165
References 165
A Comparative Study on Discrete Discriminant Analysis through a Hierarchical Coupling Approach 167
1 Introduction 167
2 Combining Models in Biclass Problems 168
3 The Hierarchical Coupling Model (HIERM) 169
4 Comparison of the HIERM Model with Other Models, Using Similarity Coefficients for Binary Data 170
4.1 Similarity Coefficients for Binary Data 170
5 Numerical Experiments 170
6 Conclusions 174
References 175
A Comparative Study of Several Parametric and Semiparametric Approaches for Time Series Classification 176
1 Introduction 176
2 Some Dissimilarity Measures Between Time Series 177
3 Simulation Study 179
3.1 Classification of Time Series as Stationary or Non-Stationary 179
3.2 Clustering of ARMA Time Series 180
3.3 Clustering of Non-Linear Time Series 182
4 Concluding Remarks 183
References 183
Finite Dimensional Representation of Functional Data with Applications 185
1 Introduction 185
2 Representing Functional Data in a Reproducing Kernel Hilbert Space 186
2.1 Functional Data Projections onto the Eigenfunctions Space 187
3 Experiments 189
3.1 RKHS Projections Versus PCA Projections 189
3.2 Classification Example 191
4 Conclusions 192
References 192
Clustering Spatio-Functional Data: A Model Based Approach 194
1 Introduction and Problematic 194
2 The Spatio-Functional Data 195
3 Dynamic Clustering Algorithm 196
4 Dynamic Clustering for Spatio-Functional Data 196
5 Analysis of a Real Dataset: Sea Temperature of the Italian Coast 198
6 Conclusion and Future Research 200
References 202
Use of Mixture Models in Multiple Hypothesis Testing with Applications in Bioinformatics 203
1 Introduction 203
2 Modelling of Z-Scores 205
3 Example: Breast Cancer Data 207
4 Empirical Null 207
5 Simulation Study 208
References 209
Finding Groups in Ordinal Data: An Examination of Some Clustering Procedures 211
1 Introduction 211
2 Clustering Procedures for Ordinal Data 212
3 Simulation Experiment Characteristics 213
4 Discussion on Simulation Results 215
5 Limitations 217
References 218
An Application of One-mode Three-way Overlapping Cluster Analysis 219
1 Introduction 219
2 Overlapping Cluster Analysis Models 220
3 Improvement of the Algorithm 221
4 An Application 222
5 Discussion and Conclusion 225
References 226
Evaluation of Clustering Results: The Trade-off Bias-Variability 227
1 Desirable Properties of a Clustering Solution 227
2 Evaluating Stability 228
2.1 Indices of Agreement Between Partitions 228
2.2 Cross-validation 229
3 The Proposed Approach 229
4 Data Analysis 230
5 Discussion and Perspectives 232
References 233
Cluster Structured Multivariate Probability Distribution with Uniform Marginals 235
1 Introduction 235
2 The 3-Dimensional Case 235
2.1 Description of Proposed Probability Distribution 236
2.1.1 Marginal Distributions 236
2.1.2 Raw and Central Moments 237
2.1.3 The Characteristic Function 239
3 Cluster Structured 3-Dimensional Distribution with Uniform Marginals 239
4 The n-Dimensional Case 241
Reference 242
Analysis of Diversity-Accuracy Relations in Cluster Ensemble 243
1 Introduction 243
2 Diversity Measures 244
3 Numerical Experiments and Results 245
4 Summary 248
References 250
Linear Discriminant Analysis with more Variables than Observations: A not so Naive Approach 252
1 Introduction 252
2 A Not so Naive Linear Discriminant Rule 253
3 Asymptotic Properties 255
4 Simulation Study 257
5 Conclusions and Perspectives 259
References 259
Fast Hierarchical Clustering from the Baire Distance 260
1 Introduction 260
2 Longest Common Prefix or Baire Distance 261
2.1 Ultrametric Baire Space and Distance 261
3 Application to Chemoinformatics 261
3.1 Dimensionality Reduction by Random Projection 262
3.2 Chemoinformatics Data Clustering 262
4 Application to Astronomy 263
4.1 Clustering SDSS Data Based on a Baire Distance 263
4.2 Baire and K-means Cluster Comparison 265
5 Conclusions 266
References 268
The Trend Vector Model: Identification and Estimation in SAS 269
1 Introduction 269
2 Example 271
3 Identification Problems 272
3.1 De Rooij's Solution 273
3.2 Simpler Solution 273
4 Estimation with SAS proc nlmixed 274
5 Conclusion 275
References 276
Discrete Beta-Type Models 277
1 Introduction 277
2 A Re-parameterized Discrete Beta Distribution 278
3 Smoothing by Discrete Beta Kernels 280
3.1 Choosing the Smoothing Parameter h 281
4 Application to a Real Data Set 282
5 Concluding Remarks 284
References 284
The R Package DAKS: Basic Functions and Complex Algorithms in Knowledge Space Theory 286
1 Introduction 286
2 Basics of KST and IITA 287
3 The R Package DAKS 289
4 Conclusion 293
References 293
Methods for the Analysis of Skew-Symmetry in Asymmetric Multidimensional Scaling 294
1 Introduction 294
2 Scalar Product-like Models (Two-Way Case) 295
3 Scalar Product-like Models (Three-Way Case) 298
4 Distance-like Models 299
5 Conclusions 300
References 300
Canonical Correspondence Analysis in Social Science Research 302
1 Introduction 302
2 Canonical Correspondence Analysis 303
3 Constraining by a Single Categorical Variable 304
4 Constraints for Dealing with Missing Responses 307
5 Discussion 309
References 309
Exploring Data Through Archetypes 310
1 Introduction 310
2 Elements of Archetypal Analysis 311
3 Elements of Spreadplot Design 314
4 The Proposed Exploratory Data Analysis Strategy 314
4.1 Deriving and Analyzing Archetypes by Varying m 315
4.2 Representing Data in the Spaces Spanned by the Archetypes 317
4.3 Exploring the Peripheries of the Data Scatter 318
5 Concluding Remarks 319
References 320
Exploring Sensitive Topics: Sensitivity, Jeopardy, and Cheating 322
1 Introduction 322
2 Randomized Response 323
3 Exploring Sensitivity 324
4 The Sensitivity Level 326
5 Conclusions 327
References 328
Sampling the Join of Streams 329
1 Introduction 329
2 General Framework 330
3 Four Algorithms for Sampling the Join of Streams 331
3.1 Reservoir Sampling 331
3.2 Weighted Reservoir Sampling 331
3.3 Deterministic Reservoir Sampling 332
3.4 Active Reservoir Sampling 334
4 Experimental Results 334
5 Conclusion and Future Works 335
References 336
The R Package fechner for Fechnerian Scaling 337
1 Introduction 337
2 Theory of FS 338
3 The R Package fechner 340
4 Example 341
5 Conclusion 344
References 344
Asymptotic Behaviour in Symbolic Markov Chains 345
1 Introduction 345
2 Symbolic Variables 346
3 Markov Chains 346
3.1 The Markov Property 346
4 The CK Property for Symbolic Stochastic Processes 347
4.1 Multivalued Categorical Variable 347
4.2 Interval Valued Variable 347
5 Stationary Distribution in Discrete Time 348
5.1 Single Categorical Variable 348
5.2 Multivalued Categorical Variable 349
5.3 Single Valued Quantitative Variable (Continuous Variable) 349
5.4 Particular Case: The Random Walk 349
5.5 Interval Valued Variable 350
5.6 Particular Case 350
5.7 Random Walk in Discrete Time 351
5.8 Non Independent Random Walk 351
6 Conclusion 352
6.1 Future Work 352
References 352
An Interactive Graphical System for Visualizing Data Quality–Tableplot Graphics 353
1 Introduction 353
2 Visualization Design 355
3 Interactivity in Tableplot 356
4 Tableplot for Visualizing Data Quality 356
5 Comparison with Other Plots 359
6 Software 360
7 Conclusion 361
References 361
Symbolic Multidimensional Scaling Versus Noisy Variables and Outliers 362
1 Introduction 362
2 Symbolic Data 363
3 Symbolic Multidimensional Scaling Methods 364
4 The Models 366
5 Results of Simulations 367
6 Final Remarks 368
References 369
Principal Components Analysis for Trapezoidal Fuzzy Numbers 371
1 Introduction 372
2 The Method: PCA-TF 372
3 Tests of the Performance of the PCA-TF Method 377
4 Conclusions 380
References 380
Factor Selection in Observational Studies – An Application of Nonlinear Factor Selection to Propensity Scores 381
1 Introduction 381
2 Theoretical Framework 383
3 Factor Selection for Propensity Score Modelling 384
3.1 Non-linear Factor Selection 384
3.2 Factor Selection for the Propensity Score Model 385
4 Example 386
References 388
Nonlinear Mapping Using a Hybrid of PARAMAP and Isomap Approaches 390
1 Introduction 390
2 PARAMAP and Isomap Algorithms 391
2.1 The PARAMAP Algorithm 392
2.2 The Isomap Algorithm 392
2.3 Evaluation of the Mapping Results 393
3 PARAMAP-Isomap Hybrid Approach 394
3.1 Isomap Preprocessing Step 394
3.2 Mapping the Holdout Points 395
4 Results on the Experimental Configurations 395
4.1 Sphere with 62 Regularly Spaced Points 395
4.2 Sphere with 1,000 Points 397
4.3 Swiss Roll with 1,000 Points 398
5 Conclusion 398
References 399
Dimensionality Reduction Techniques for Streaming Time Series: A New Symbolic Approach 400
1 Introduction 400
2 Related Works 401
3 A New Symbolic Strategy for Streaming Time Series Dimensionality Reduction 402
3.1 Training Step 403
3.2 Online Representation 404
3.3 A Feasible Representation for Bivariate Streaming Time Series 405
3.4 Time Series Approximation 405
4 Experimental Evaluation 406
5 Conclusions and Perspectives 406
References 407
A Bayesian Semiparametric Generalized Linear Model with Random Effects Using Dirichlet Process Priors 409
1 Introduction 409
2 Finite Mixture GLM with Random Effects 410
3 Representation of the Dirichlet Process Mixture Model 410
4 Algorithm: Blocked Gibbs Sampler 412
5 Simulation Studies 413
5.1 Simulation Study 2 415
6 Conclusion 415
References 416
Exact Confidence Intervals for Odds Ratios with Algebraic Statistics 417
1 Introduction 417
2 Traditional Confidence Intervals for the Odds Ratio 418
3 Algebraic Confidence Interval 419
4 Simulation Study 422
5 Example 423
6 Discussion 424
References 424
The CHIC Analysis Software v1.0 426
1 Introduction 426
2 Data Entry and Data Management 427
3 Simple Correspondence Analysis 428
4 Multiple Correspondence Analysis 429
5 Visualization Options 430
6 Ward Clustering as a Complementary Method 432
7 Summary 432
References 432
Part III Applications 434
Clustering the Roman Heaven: Uncovering the Religious Structures in the Roman Province Germania Superior 435
1 Introduction 435
2 Data and Methods 436
3 Results and Interpretations 438
4 Validation of the Results 440
5 Conclusion 441
References 442
Geochemical and Statistical Investigation of Roman Stamped Tiles of the Legio XXI Rapax 443
1 Introduction 443
2 The Roman Stamped Tiles Investigated by Giacomini 444
3 Preparation of Data Coming from Different Laboratories 444
4 Hierarchical Cluster Analysis by Ward's Method 446
5 Validation of Cluster Analysis Results 446
6 Interpretation of the Geochemical Clusters 449
References 450
Land Cover Classification by Multisource Remote Sensing: Comparing Classifiers for Spatial Data 451
1 Introduction 451
2 Benchmarking Classifiers for Multisource Rock Glacier Detection 453
2.1 Materials and Methods 453
2.2 Results 454
3 Discussion 455
3.1 State-of-the-Art Classifiers for Land Cover Mapping 455
3.2 High-Dimensional Problems in Remote Sensing 456
3.3 Spatial Error Estimation 456
3.4 Indirect Classification in Remote Sensing 457
4 Conclusions 457
References 458
Are there Cluster of Communities with the Same Dynamic Behaviour? 460
1 Introduction 460
2 Data for German Community Dynamics 461
3 Visualization and Clustering of Similar Dynamics 463
4 Explaining Patterns of Multidimensional Dynamics 464
5 Transition to Knowledge and Spatial Abstraction 464
6 Discussion 465
7 Conclusion 466
References 467
Land Cover Detection with Unsupervised Clustering and Hierarchical Partitioning 469
1 Introduction 469
2 Processing Flow 470
3 Hierarchical Segmentation 472
3.1 Transition Regions and Image Masking 472
4 Unsupervised Clustering 473
5 Classification and Cluster Separability 474
6 Concluding Remarks and Prospectives 476
References 476
Using Advanced Regression Models for Determining Optimal Soil Heterogeneity Indicators 477
1 Introduction 477
1.1 Research Target 478
1.2 Article Structure 478
2 Data Description 479
3 Advanced Regression Techniques 479
3.1 Introduction to Regression Techniques 480
3.2 Neural Networks 480
3.3 Regression Tree 481
3.4 Support Vector Regression 481
3.5 Linear Regression and Naive Estimator 482
3.6 Model Parameter Estimation 482
4 Regression Results 482
5 Conclusion 483
5.1 Future Work 484
References 484
Local Analysis of SNP Data 486
1 Introduction 486
2 Methods 487
2.1 Associative Classification 487
2.2 Localised Logistic Regression 489
3 Data 490
4 Results 491
5 Summary and Discussion 492
References 493
Airborne Particulate Matter and Adverse Health Events: Robust Estimation of Timescale Effects 494
1 Introduction 494
2 Materials and Methods 495
2.1 Data and Statistical Approach to Estimation of Associations at Different Timescales 495
2.2 Fourier Decomposition 497
2.3 Singular Spectrum Analysis 497
3 Results and Discussion 499
References 501
Identification of Specific Genomic Regions Responsible for the Invasivity of Neisseria Meningitidis 503
1 Introduction 503
2 Neisseria Meningitidis and the FrpB Proteins 504
3 Algorithm for Detection of Genomic Regions Responsible for Disease 505
4 Results and Discussion 507
5 Conclusion 510
References 510
Classification of ABC Transporters Using Community Detection 512
1 Introduction 512
2 Materials and Methods 513
2.1 Data Sources 513
2.2 Methods 514
2.2.1 Identification and Filtering of Isorthologous Links 514
2.2.2 Identification of Isortholog Groups by Community Detection 514
2.2.3 Validation 515
3 Results 517
3.1 General Results on ABC System Classification 517
3.2 Results on Pentose-Related Importer Subfamily 517
4 Conclusion 518
References 519
Estimation of the Number of Sustained Viral Responders by Interferon Therapy Using Random Numbers with a Logistic Model 520
1 Introduction 520
2 Subjects and Model Assumptions 521
3 Methods 521
4 Results 522
5 Discussion 524
References 526
Virtual High Throughput Screening Using Machine Learning Methods 528
1 Introduction 528
2 Data Description 529
3 Prediction of Experimental HTS Results Using Machine Learning Methods 530
3.1 Sampling Strategy 530
3.2 Machine Learning Methods 530
3.3 Comparison of Molecular and Atomic Descriptors 531
3.4 Results and Discussion 532
4 Conclusion and Future Developments 534
References 535
Network Analysis of Works on Clustering and Classification from Web of Science 536
1 Introduction 536
2 Networks from WoS 537
3 Analyses of Records from JoC 538
3.1 Collaboration Network 540
3.2 Citation Network Analysis 541
3.3 Citations Between Authors 544
3.3.1 Line Islands [10,400] – Authors Citations 544
4 Conclusion 546
References 547
Recommending in Social Tagging Systems Based on Kernelized Multiway Analysis 548
1 Introduction 548
2 Related Work 549
3 Tensors and Tucker Decomposition 550
4 Recommendation Based on Tucker Decomposition 551
5 Smoothing with Kernel Functions 552
6 Experimental Results 553
7 Conclusions 555
References 555
Dynamic Population Segmentation in Online Market Monitoring 556
1 Introduction 556
2 Related Work 557
3 Sensor Binning Based on Price Dynamics 558
3.1 Harvest Adaptation 558
3.2 Dynamic Population Segmentation 560
4 Harvest Balancing 561
4.1 Feasible Harvest Schedules 561
4.2 Harvest Schedule Tuning 562
5 Summary 562
References 563
Gaining 'Consumer Insights' from Influential Actors in Weblog Networks 564
1 Introduction 564
2 Methods for Analyzing the Blogosphere 565
2.1 SNA Measures and Ego Networks 565
2.2 Netnography 566
3 Empirical Study: Mobile Communication 567
3.1 Data Description 567
3.2 SNA Analysis 568
3.3 Netnographical Analysis 569
4 Conclusions and Future Work 570
References 571
Visualising a Text with a Tree Cloud 572
1 Introduction 572
2 Constructing a Tree Cloud 573
2.1 Building the List of Frequent Terms 573
2.2 Building the Distance Matrix 574
2.3 Building the Tree 574
2.4 Building the Tree Cloud 575
3 Evaluating the Quality of a Tree Cloud 575
3.1 Stability and Robustness 577
3.2 Arboricity 577
3.3 Distance Comparison on the Obama Corpus 577
3.4 Robustness to Parameter Variations 578
4 Conclusion 578
References 579
A Tree Kernel Based on Classification and Citation Data to Analyse Patent Documents 581
1 Introduction 581
2 European Classification System ECLA 582
3 Patent Citations 583
4 Tree Kernel 583
5 Experiment 586
6 Conclusions 587
References 588
A New SNA Centrality Measure Quantifying the Distance to the Nearest Center 589
1 Introduction 589
2 Methodology 590
3 Data and Data Preparation 591
4 Results 593
4.1 R-Devel Network 593
4.2 R-Help Network 593
4.3 Empirical Evidence of the Usefulness of the WDNC 593
4.4 WDNC Compared to Formal R Organization 595
5 Conclusion and Discussion 596
References 596
Mining Innovative Ideas to Support New Product Research and Development 597
1 Introduction 597
2 Rationale Behind Mining Innovative Ideas 598
3 Process of Mining Innovative Ideas 599
4 Acquisition of Ideas 600
5 Acquisition of Technological Context Information 600
6 Relationship among Scientific Categories 600
7 Classification of Ideas 601
8 Results and Evaluation 602
9 Outlook 603
References 604
The Basis of Credit Scoring: On the Definition of Credit Default Events 605
1 Introduction 605
2 Data Set of Individual Payment Histories 606
3 A Payment-Pattern Approach to the Identification of Credit Default Events 606
3.1 The Patterns of Payment 607
3.2 Measurement of Profitability 608
3.3 Application to the Empirical Data Set 609
4 Indicators of Individual Payment Performance 610
4.1 Description of Indicators 610
4.2 Application to the Empirical Data Set 611
5 Discussion 612
References 612
Forecasting Candlesticks Time Series with Locally Weighted Learning Methods 613
1 Introduction 613
2 Locally Weighted Learning Methods for Candlestick Time Series 615
2.1 k-NN for Candlestick Time Series 615
2.1.1 Determination of the k Nearest Neighbors 615
2.1.2 Generation of the Forecast 616
3 Forecasting the S&P500 Candlestick Time Series
3.1 Removing the Trend from Candlestick Time Series 617
3.1.1 Differencing the Intervals 617
3.1.2 Differencing the Candlestick Using the Previous Close Value 618
3.2 The One-Step Ahead Forecasting Experiment 618
4 Future Work 620
References 620
An Analysis of Alternative Methods for Measuring Long-Run Performance: An Application to Share Repurchase Announcements 622
1 Introduction 622
2 Methodologies for Measuring Long-Run Performance 623
2.1 BHAR, Fama-French Alphas and Cross-Sectional Regressions 623
2.2 Calendar Time Portfolio Approach 624
2.3 Generalized Calendar Time Approach 625
3 Data and Empirical Results 625
3.1 BHAR, Fama-French Alphas and Cross-Sectional Regressions 626
3.2 Calendar Time Portfolio Approach 627
3.3 Generalized Calendar Time Approach 628
4 Conclusion 629
References 629
Knowledge Discovery in Stock Market Data 630
1 Introduction 630
2 Daily Returns on Stocks 631
3 Knowledge Discovery in Market Activities 633
4 Types of Marked States 634
5 Discussion 635
6 Conclusion 636
References 636
The Asia Financial Crises and Exchange Rates: Had there been Volatility Shifts for Asian Currencies? 638
1 Introduction 638
2 Model and Bayesian Inference 639
2.1 The Volatility Model 639
2.2 The Prior and Posterior Distribution 641
2.3 Gibbs Sampling 642
3 Empirical Analysis 642
3.1 Model Choice 642
3.2 Thailand 643
3.3 The Philippines 645
3.4 Indonesia 645
3.5 South Korea 645
4 Conclusions 646
References 646
The Pricing of Risky Securities in a Fuzzy Least Square Regression Model 647
1 The Capital Asset Pricing Model 647
2 A Fuzzy Least Squares Regression Approach 649
3 Case Study 651
4 Final Remarks 653
References 654
Classification of the Indo-European Languages Using a Phylogenetic Network Approach 655
1 Introduction 655
2 Description of the Dyen Database 656
3 Materials and Methods 657
4 Results and Discussion 659
5 Conclusion 661
References 662
Parsing as Classification 664
1 Introduction 664
2 WCDG 666
3 MSTParser 667
4 MaltParser 668
5 Parser Combination 669
6 Conclusion 670
References 670
Comparing the Stability of Clustering Results of Dialect Data Based on Several Distance Matrices 672
1 Introduction 672
2 The Compound Matrix 673
3 Why Use these Statistical Measures for Linguistic Data? 676
4 Comparing Hierarchical Cluster Results 676
4.1 Cluster Stability Results 676
4.2 Interpretation of the Dialect Clusters 677
References 679
Marketing and Regional Sales: Evaluation of Expenditure Strategies by Spatial Sales Response Functions 680
1 Introduction 680
2 Cross Sectional Sales Response Models 681
2.1 The Basic CSSR Model 681
2.2 Bayesian Inference by MCMC for CSSR Models 682
3 A Spatial Auto-Regressive Extension to CSSR Models 684
3.1 Spatial Lags 684
3.2 The CSSR-SAR Model 685
4 Empirical Test 687
5 Conclusions and Outlook 687
References 688
A Demand Learning Data Based Approach to Optimize Revenues of a Retail Chain 689
1 Introduction 689
2 Model Description 690
3 Numerical Study 693
4 Summary and Future Directions 695
References 696
Missing Values and the Consistency Problem Concerning AHP Data 698
1 Introduction 698
2 Consistency Adjustment Approaches 700
2.1 Manual Consistency Adjustment Approaches 700
2.2 Automated Consistency Adjustment Approaches 701
2.2.1 Automated Expert-Choice Method (AEM) 701
2.2.2 Iterative Eigenvalue Improvement Method (IEM) 701
2.2.3 Genetic Adjustment Method (GAM) 701
3 Comparison of Automated Consistency Adjustment Approaches 702
3.1 Common Performance Measures 702
3.2 Results 703
4 Outlook 705
References 705
Monte Carlo Methods in the Assessment of New Products: A Comparison of Different Approaches 706
1 Introduction 706
2 Assessment Methods of NPD 707
3 Real Options in the Assessment of NPD 709
4 Monte Carlo Simulation in Assessment of NPD 710
5 Conclusions and Outlook 712
References 713
Preference Analysis and Product Design in Markets for Elderly People: A Comparison of Methods and Approaches 714
1 Introduction 714
2 New Approach for Elderly People 716
2.1 Application of the New Approach for Elderly People 717
2.2 Comparison of Results 719
2.3 Field Predictability Test 719
3 Discussion and Outlook 720
References 720
Usefulness of A Priori Information about Customers for Market Research: An Analysis for Personalisation Aspects in Retailing 722
1 Introduction 722
2 Personalisation Aspects in Retailing 723
3 Preference Estimation in Market Research 723
4 Empirical Investigation 724
4.1 Research Object and Design 724
4.2 Results 725
5 Conclusion and Outlook 728
References 728
Importance of Consumer Preferences on the Diffusion of Complex Products and Systems 730
1 Motivation 730
2 Specifics of Complex Products and Systems 731
3 The Diffusion of Complex Products and Systems 732
4 Consumer Preferences in the Diffusion of CoPS 734
5 Model Behaviour 735
6 Summary 737
References 737
Household Possession of Consumer Durables on Background of some Poverty Lines 739
1 Introduction 739
2 The Method 740
3 Conclusions 745
References 746
Effect of Consumer Perceptions of Web Site Brand Personality and Web Site Brand Association on Web Site Brand Image 747
1 Introduction 747
2 Theoretical Background and Hypotheses 748
3 Methods 748
4 Results 750
5 Discussion 753
References 753
Perceptually Based Phoneme Recognition in Popular Music 755
1 Introduction 755
2 Description of the Task 756
3 Auditory Modelling 756
4 Feature Extraction 758
5 Classifier Tuning 759
6 Results and Discussion 760
7 Summary 762
References 762
SVM Based Instrument and Timbre Classification 763
1 Introduction 763
2 Feature Extraction 764
2.1 Preprocessing 765
2.2 Perceptive Linear Prediction 766
2.3 Mel Frequency Cepstral Coefficients 766
3 Clustering 766
4 Classification 767
5 Software 768
6 Results 768
7 Conclusion 770
References 770
Three-way Scaling and Clustering Approach to Musical Structural Analysis 771
1 Introduction 771
2 The Three-Way Structure Model 773
2.1 Results by INDSCAL 773
2.2 Results Using INDCLUS Presented as a Hanabi Chart 774
3 Conclusion and Discussion 776
References 778
Improving GMM Classifiers by Preliminary One-class SVM Outlier Detection: Application to Automatic Music Mood Estimation 779
1 Introduction 779
1.1 Mood Models 780
1.2 Mood Audio Features 780
1.3 Mood Classification 781
2 Proposed System 781
3 Outlier Detection with One-Class SVM 782
3.1 One-Class SVM 782
3.2 Estimation of Kernel Parameters 783
4 Evaluation 783
4.1 Dataset and Parameter Settings 783
4.2 Results 785
5 Conclusions 785
References 786
Multiobjective Optimization for Decision Support in Automated 2.5D System-in-Package Electronics Design 787
1 Introduction 787
2 Multiobjective Decision Support 789
3 Optimization Problems and Algorithms 790
4 Constructive Placement Heuristic 790
5 Group Constraint Concept 792
6 Computational Results 794
7 Conclusion 795
References 795
Multi-Objective Quality Assessment for EA Parameter Tuning 796
1 Introduction 796
2 Definition of Test Functions 797
3 Experiments and Results 798
3.1 Experiments 798
3.2 Results 799
3.2.1 Results on F10–F12 799
4 Conclusions and Outlook 802
References 803
A Novel Multi-Objective Target Value Optimization Approach 804
1 Introduction 804
2 Efficient Global Optimization (EGO) 805
3 The New Approach 807
3.1 Exemplary Progress 808
3.2 Stopping Criterion 808
4 Handling of Missing Values 809
5 Case Study 810
6 Summary 811
References 811
Desirability-Based Multi-Criteria Optimisation of HVOF Spray Experiments 813
1 The Process of High Velocity Oxy-Fuel Spraying 813
2 Experimental Designs 815
2.1 Plackett-Burman Design 815
2.2 Fractional-Factorial 2^(5-1) Design 816
2.3 Central Composite Design 817
3 Multi-criteria Optimisation 818
3.1 Overlayed Contours 818
3.2 Desirabilities 818
4 Conclusion 820
References 820
Index 821
Publication date (per publisher) | 3 August 2010 |
---|---|
Series | Studies in Classification, Data Analysis, and Knowledge Organization |
Additional info | XXXVI, 823 p. 236 illus. |
Place of publication | Berlin |
Language | English |
Subject areas | Mathematics / Computer Science ► Computer Science ► Databases |
 | Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics |
 | Mathematics / Computer Science ► Mathematics ► Statistics |
 | Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics |
 | Engineering |
Keywords | Artificial Intelligence • Biology • Business Intelligence • Classification • Clustering • Data Analysis • Data Handling • Knowledge Discovery • Linguistics |
ISBN-10 | 3-642-10745-1 / 3642107451 |
ISBN-13 | 978-3-642-10745-0 / 9783642107450 |
Size: 23.6 MB
DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but is of limited use on small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.