Advances in Data Analysis, Data Handling and Business Intelligence (eBook)

Proceedings of the 32nd Annual Conference of the Gesellschaft für Klassifikation e.V., Joint Conference with the British Classification Society (BCS) and the Dutch/Flemish Classification Society (VOC), Helmut-Schmidt-University, Hamburg, July 16-18, 2008
eBook Download: PDF
2009 | 2010
XXVI, 695 pages
Springer Berlin (publisher)
978-3-642-01044-6 (ISBN)

149.79 incl. VAT

Data Analysis, Data Handling and Business Intelligence are research areas at the intersection of computer science, artificial intelligence, mathematics, and statistics. They cover general methods and techniques that can be applied in a wide range of fields, such as marketing, finance, economics, engineering, linguistics, archaeology, musicology, medical science, and biology. This volume contains revised versions of selected papers presented at the 32nd Annual Conference of the German Classification Society (Gesellschaft für Klassifikation, GfKl). The conference, organized in cooperation with the British Classification Society (BCS) and the Dutch/Flemish Classification Society (VOC), was hosted by Helmut-Schmidt-University, Hamburg, Germany, in July 2008.

Preface 5
Contents 8
Contributors 15
Part I Invited 25
Semi-supervised Probabilistic Distance Clustering and the Uncertainty of Classification 26
1 Introduction 26
1.1 Clustering 26
1.2 Classification 27
1.3 Learning 27
1.4 Semi-supervised Clustering 28
1.5 Matching Labels 28
1.6 Plan of This Paper 28
2 Probabilistic Distance Clustering 29
2.1 Notation 29
2.2 Probabilistic Clustering 29
2.3 Probabilistic Distance Clustering 29
2.4 Probabilities and the Joint Distance Function 30
2.5 The Classification Uncertainty Function 31
2.6 An Extremum Problem for the Cluster Probabilities at a Point 32
2.7 An Extremum Problem for Clustering the Data Set 32
2.8 An Outline of the Probabilistic Distance Clustering Algorithm of Ben-Israel and Iyigun (2008) 33
3 Prior Information and Classification 34
3.1 Probabilistic Labels 34
3.2 An Extremum Problem for Classification 34
4 Semi-supervised Distance Clustering 34
4.1 An Extremum Problem for Semi-supervised Clustering 34
4.2 Probabilities 35
4.3 Cluster Centers 35
4.4 Algorithm 36
5 Examples 37
Appendix 1: The Membership Probabilities 39
Appendix 2: The Classification Uncertainty Function 41
References 42
Strategies of Model Construction for the Analysis of Judgment Data 44
1 Introduction 44
2 Strategies of Model Construction 45
3 Theories of Judgment and Empirical Findings 46
3.1 The Problem of Information Weighting 47
3.2 Illusory Correlations in Judgments 49
4 Three-Way Two-Mode Models 50
4.1 Application 52
4.1.1 Results 52
5 Conclusions 54
References 54
Clustering of High-Dimensional Data via Finite Mixture Models 56
1 Introduction 56
2 Definition of Mixture Models 57
3 Choice of Starting Values for the EM Algorithm 58
4 Clustering via Normal Mixtures 59
5 Some Recent Extensions for High-Dimensional Data 60
6 Factor Analysis Model for Dimension Reduction 61
7 Mixtures of Common Factor Analyzers 63
8 Fitting of Factor-Analytic Models 65
References 66
Clustering and Dimensionality Reduction to Discover Interesting Patterns in Binary Data 68
1 Introduction 68
2 Basic Notation and Definitions 69
3 Cluster Analysis of Binary Data 70
3.1 Dissimilarity Measures for Binary Data 70
4 Strategies for Patterns Identification 71
4.1 Column-Wise Quantification of Binary Attributes 72
4.2 Row-Wise Quantification of Binary Attributes 73
5 Empirical Evidence from a Real Data-Set 75
6 Conclusion 77
References 78
Kernel Methods for Detecting the Direction of Time Series 79
1 Introduction 79
2 Statistical Methods 81
2.1 A Hilbert Space Embedding for Distributions 81
2.2 Hilbert Schmidt Independence Criterion 82
2.3 Autoregressive Moving Average Models 82
3 Learning the True Time Direction 83
3.1 The Classification Method 83
3.2 The ARMA Method 84
4 Experiments 85
5 Conclusion and Discussion 86
References 88
Statistical Processes Under Change: Enhancing Data Quality with Pretests 89
1 Introduction 89
2 The Model of Multiple Sources Mixed-Mode Design 90
3 Qualitative and Quantitative Test Methods for Questionnaires 91
3.1 Qualitative Test Methods 92
3.1.1 Cognitive Interviews 92
3.1.2 Expert Discussion Groups 95
3.1.3 Observation 96
3.2 Quantitative Test Methods 96
3.2.1 Behaviour Coding 96
3.2.2 Interviewer and Interviewee Debriefing 97
3.2.3 Follow-up Interviews 97
3.2.4 Experiments 97
3.2.5 Post-evaluation Methods 98
4 The Pretest Concerning the Change-over to the 2008 Economic Sector Classification 99
References 101
Part II Clustering and Classification 102
Evaluation Strategies for Learning Algorithms of Hierarchies 103
1 Introduction 103
2 Evaluation Strategies in the Literature 105
3 Interdisciplinary Comparison of Evaluation Measures 107
4 Experiments and Conclusion 110
References 112
Fuzzy Subspace Clustering 113
1 Introduction 113
2 Preliminaries and Notation 114
3 Attribute Weighting 115
3.1 Axes-Parallel Gustafson–Kessel Fuzzy Clustering 115
3.2 Attribute Weighting Fuzzy Clustering 116
4 Attribute Selection 116
5 Principal Axes Weighting 118
5.1 Gustafson–Kessel Fuzzy Clustering 118
5.2 Reformulation of Gustafson–Kessel Fuzzy Clustering 119
6 Principal Axes Selection 120
7 Experiments 120
8 Summary 122
References 122
Motif-Based Classification of Time Series with Bayesian Networks and SVMs 124
1 Introduction 124
2 Related Work 125
3 Discovery of Generalized Semi-Continuous Motifs 127
4 Experimental Evaluation 131
5 Conclusion 132
References 132
A Novel Approach to Construct Discrete Support Vector Machine Classifiers 134
1 Introduction 134
2 Discrete Support Vector Machines 136
2.1 Motivation and Mathematical Formulation 136
2.2 Constructing DSVM Classifiers by Integer Programming 138
3 Empirical Evaluation 140
4 Conclusions 143
References 143
Predictive Classification Trees 145
1 Statement of the Problem 145
2 Factor Selection 146
2.1 Factor Reduction 147
2.2 Example 147
3 Predictive Measures of Association 148
4 Tree Induction 150
References 152
Isolated Vertices in Random Intersection Graphs 153
1 Introduction 153
2 Definitions and Main Results 154
3 Preliminaries 156
3.1 Edge Probability 156
3.2 Isolated Vertices in Gs(n,m,d) 157
4 Proof of Theorem 1 161
5 Remarks on Other Distributions 163
References 163
Strengths and Weaknesses of Ant Colony Clustering 164
1 Introduction 164
2 Ant Colony Clustering 165
3 Analysis of Ant Colony Clustering by Means of Self-Organizing Batch Maps 166
4 Improvement of Ant Colony Clustering 168
5 Data Analysis with Emergent Ant Colony Clustering 170
6 Experimental Settings and Results 170
7 Discussion 172
8 Summary 172
References 173
Variable Selection for Kernel Classifiers: A Feature-to-Input Space Approach 174
1 Introduction 174
2 Feature-to-Input Space Variable Selection 175
2.1 FI-Selection Based on the Group Means 175
2.2 FI-Selection Based on the Kernel Weight Vector 176
3 Simulation Study 177
4 Simulation Results and Conclusions 179
5 Application to Data Sets 180
6 Summary 182
References 182
Finite Mixture and Genetic Algorithm Segmentation in Partial Least Squares Path Modeling: Identification of Multiple Segments in Complex Path Models 184
1 Introduction 184
2 Computational Experiment 186
3 Results 188
4 Summary and Conclusion 190
References 192
Cluster Ensemble Based on Co-occurrence Data 194
1 Introduction 194
2 The Algorithm 195
3 Benchmark Experiments 196
4 Results 199
5 Summary 200
References 201
Localized Logistic Regression for Categorical Influential Factors 202
1 Introduction 202
2 Analysis of SNP Data 203
3 Logistic Regression 204
4 Localized Logistic Regression 205
5 Calculation of Weights for Categorical Predictors 207
6 Application to SNP Data 209
7 Summary 211
References 211
Clustering Association Rules with Fuzzy Concepts 213
1 Introduction 213
2 Nomenclature 215
3 Linguistic Clustering 215
3.1 Rule Trajectory Visualization 215
3.2 Linguistic Concepts 216
4 Experiments 218
4.1 Artificial Data Set 218
4.2 Real-world Data Set 219
5 Conclusion and Future Work 220
References 221
Clustering with Repulsive Prototypes 222
1 Introduction 222
2 Fuzzy c-Means and Noise Clustering 223
3 Repulsive Prototypes 224
4 Experimental Results 227
5 Conclusions and Future Work 229
References 229
Part III Mixture Analysis 231
Weakly Homoscedastic Constraints for Mixtures of t-Distributions 232
1 Introduction 232
2 Preliminaries and Notation 233
2.1 The Crab Data Set 235
3 Weakly Homoscedastic Covariance Matrices 236
4 Numerical Studies 238
5 Conclusions 240
References 241
Bayesian Methods for Graph Clustering 242
1 Introduction 242
2 A Mixture Model for Networks 244
3 Bayesian View of MixNet 244
3.1 Bayesian Probabilistic Model 244
3.2 Variational Inference 245
3.2.1 Variational Bayes E-Step 246
3.2.2 Variational Bayes M-Step: Optimization of q() 247
3.2.3 Variational Bayes M-Step: Optimization of q() 247
3.2.4 Lower Bound 247
3.3 Model Selection 248
4 Experiments 248
4.1 Comparison of the Criteria 249
5 Conclusion 251
References 251
Determining the Number of Components in Mixture Models for Hierarchical Data 253
1 Introduction 254
2 Multilevel Latent Class Model 255
3 Design of the Simulation Study 257
4 Results of the Simulation Study 258
5 Conclusions 259
References 260
Testing Mixed Distributions when the Mixing Distribution Is Known 262
1 Introduction 262
2 Construction of the Test Statistics 264
2.1 The Score Statistics T(k) 264
2.2 Schwarz Criteria Statistics 265
3 Simulation Study 267
3.1 Empirical Levels 268
3.2 Empirical Powers when the Mixed Density Is Known 269
References 270
Classification with a Mixture Model Having an Increasing Number of Components 271
1 Introduction 271
2 Main Notations and Assumptions 273
3 Convergence 276
4 Random Classification of the Observations 278
References 279
Nonparametric Fine Tuning of Mixtures: Application to Non-Life Insurance Claims Distribution Estimation 280
1 Introduction 280
2 Mixture-Based Data Transformations 283
3 Beta Kernel Density Estimation 284
4 Application to Non-Life Insurance Data 287
References 289
Part IV Linguistics and Text Analysis 291
Classification of Text Processing Components: The Tesla Role System 292
1 Introduction 292
2 Related Work 293
3 A Role System for Text Processing Components 294
3.1 The Tesla Framework 295
3.2 The Relation Between Components and Roles 295
3.3 An Example: Alignment 298
4 Discussion 299
References 300
Nonparametric Distribution Analysis for Text Mining 302
1 Introduction 302
2 Maximum Mean Discrepancy 303
3 String Kernels 304
4 R Infrastructure 305
4.1 tm 306
4.2 kernlab 306
4.3 Framework for Kernel Methods on Text in R 306
4.4 Kernel MMD 306
5 Experiments 307
5.1 Data 307
5.2 Results 307
6 Conclusion 311
References 311
Linear Coding of Non-linear Hierarchies: Revitalization of an Ancient Classification Method 313
1 Introduction 313
1.1 Why Are Linear Codings Desirable? 313
1.2 Panini's Sivasutra-Technique 314
2 Linear Coding of Non-linear Hierarchies: Generalizing Panini's Sivasutra-Technique 316
2.1 S-Orders and S-Sortability: Formal Foundations 316
2.2 Constructing S-Orders 317
2.3 The Problem of Identifying Elements for Duplication 320
References 322
Automatic Dictionary Expansion Using Non-parallel Corpora 323
1 Introduction 323
2 Approach 325
3 Language Resources 326
4 Results 327
5 Discussion and Future Work 330
References 331
Multilingual Knowledge-Based Concept Recognition in Textual Data 332
1 Introduction 332
2 Application in the Automotive Domain 333
3 Related Work 334
4 Requirements 334
5 Data Structure 335
6 Concept Recognition 337
6.1 Taxonomy Expansion 338
6.2 Matching Process 338
7 Evaluation 339
8 Conclusion and Future Work 340
References 340
Part V Pattern Recognition and Machine Learning 342
A Diversified Investment Strategy Using Autonomous Agents 343
1 Introduction 343
2 Agents' Implementation 344
2.1 Prediction Mechanism 345
2.2 Money Management Using Empirical Knowledge 348
2.3 Risk Management Using Domain Knowledge 350
2.4 Agents' Results 352
3 Final Remarks 353
References 353
Classification with Kernel Mahalanobis Distance Classifiers 354
1 Introduction 354
2 Kernels and Feature-Space Embedding 355
3 Kernel Mahalanobis Distance Classifiers 356
3.1 Kernel Mahalanobis Distances for Invertible Covariance 356
3.2 Kernel Mahalanobis Distance for Regularized Covariance 357
3.3 Classifiers Based on Kernel Mahalanobis Distances 358
4 Experiments 359
4.1 Experiments on 2D Toy Data 359
4.2 Real-World-Experiments 361
5 Discussion and Theoretical Considerations 361
6 Conclusion 363
References 364
Identifying Influential Cases in Kernel Fisher Discriminant Analysis by Using the Smallest Enclosing Hypersphere 365
1 Introduction 366
2 Kernel Fisher Discriminant Analysis 367
3 Criteria for Identifying Influential Cases in KFDA 368
4 The Smallest Enclosing Hypersphere 369
5 Monte Carlo Simulation Study 370
6 Application to a Data Set 372
7 Conclusions and Open Problems 372
References 373
Self-Organising Maps for Image Segmentation 374
1 Introduction 374
2 Materials and Methods 375
2.1 Theory 375
2.2 Data 376
2.3 Software 377
3 SOMs for Image Segmentation 377
4 Supervised SOMs 380
5 Discussion 382
References 383
Image Based Mail Piece Identification Using Unsupervised Learning 385
1 Introduction 385
2 Motivation 386
3 Approach 387
4 Feature Extraction and Comparison 389
5 Search Area Consolidation 390
5.1 Mail Stream Analysis 390
5.2 Rejection Criteria Estimation 391
6 Mail Piece Identification 392
7 Experiments 393
8 Conclusion and Outlook 394
References 394
Part VI Statistical Musicology 396
Statistical Analysis of Human Body Movement and Group Interactions in Response to Music 397
1 Introduction 397
2 Experimental Design and Data Considerations 398
3 Analysis 399
4 Discussion 403
5 Conclusion 404
References 405
Applying Statistical Models and Parametric Distance Measures for Music Similarity Search 407
1 Introduction 407
2 Feature Extraction 408
3 Statistical Models 409
4 Parametric Distance Measures 410
5 Evaluation 412
5.1 Test Data and Evaluation Metric 412
5.2 Aggregation Process 413
6 Results 413
7 Conclusions 415
References 415
Finding Music Fads by Clustering Online Radio Data with Emergent Self Organizing Maps 417
1 Introduction 417
2 Related Works 418
3 Data 419
4 Frequential Genre Integration 419
5 Visualisation of Music Fads 420
6 Identification of Fads 421
7 Fads Characterisation 422
8 Results 423
9 Discussion 423
10 Summary 423
References 424
Analysis of Polyphonic Musical Time Series 426
1 Introduction 426
2 Model for Polyphonic Sound 427
3 Preprocessing 428
3.1 Alphabet 428
3.2 Distortion Measures 429
4 Results 429
4.1 Data 429
4.2 Construction of the Alphabet 430
4.3 First Results 431
4.4 Comparison of Distortion Measures 431
4.5 Halleluja 432
4.6 Instrument Tracking 433
5 Conclusion 434
References 434
Part VII Banking and Finance 435
Hedge Funds and Asset Allocation: Investor Confidence, Diversification Benefits, and a Change in Investment Style Composition 436
1 Introduction 436
2 Literature Review 437
3 Data and Descriptive Statistics 438
4 Portfolio Benefits and Capital Flows 440
4.1 Expected Alpha and Allocation into Hedge Funds 440
4.2 Reduction of Diversification Benefits Over Time 442
4.3 Structural Breaks 443
5 Conclusion 444
References 445
Mixture Hidden Markov Models in Finance Research 446
1 Introduction 446
2 The Mixture Hidden Markov Model 447
3 Data Set 449
4 Results 451
5 Conclusions 454
References 454
Multivariate Comparative Analysis of Stock Exchanges: The European Perspective 455
1 Introduction 455
2 Data Description 456
3 Cluster Analysis 457
4 K-means Grouping 459
5 Factor Analysis 461
6 Summary and Conclusions 462
References 462
Empirical Examination of Fundamental Indexation in the German Market 464
1 Introduction 464
2 Data and Index Methodology 465
3 Results 466
4 Analysis 468
4.1 Efficient Market 468
4.2 Inefficient Market 470
5 Conclusion 470
References 472
The Analysis of Power For Some Chosen VaR Backtesting Procedures 473
1 Introduction 473
2 Tests Based on the Frequency of Failures 475
3 Tests Based on Multiple VaR Levels 476
Backtesting Errors 477
4 Empirical Research: Simulation Approach 478
5 Some Final Conclusions 482
References 482
Extreme Unconditional Dependence Vs. Multivariate GARCH Effect in the Analysis of Dependence Between High Losses on Polish and German Stock Indexes 483
1 Introduction 483
2 Models to Be Compared 484
2.1 Extreme Dependence 484
2.1.1 Testing For 484
2.1.2 Models Used for Simulations 486
2.2 Varying Conditional Covariance 486
3 The Research 487
4 Summary 493
References 495
Is Log Ratio a Good Value for Measuring Return in Stock Investments? 496
1 Introduction 496
2 The Data 497
3 Measuring Daily Return 497
4 The Distribution of Daily Returns 499
5 Modeling the Distribution of Returns 499
6 Discussion 501
7 Summary 502
References 502
Part VIII Marketing, Management Science and Economics 503
Designing Products Using QFD and CA: A Comparison 504
1 Introduction 504
2 Product Design in a Climbing Harness Market 505
2.1 Application of Conjoint Analysis 506
2.2 Application of Quality Function Deployment 507
3 Product Design in a Mobile Phone Market 507
3.1 Application of Conjoint Analysis 507
3.2 Application of Quality Function Deployment 509
3.3 Comparing the CA and QFD Results 513
3.4 Comparing the Results with Pullman et al.'s Experiment 513
4 Conclusions and Outlook 514
References 514
Analyzing the Stability of Price Response Functions: Measuring the Influence of Different Parameters in a Monte Carlo Comparison 516
1 Introduction 516
2 Price Response Functions in Marketing 517
2.1 Alternatives of Price Response Functions 517
2.2 Price Response Functions and Connected Values 518
2.3 Instruments for Measuring Price Sensitivity 518
3 A Monte Carlo Comparison 520
3.1 Research Design 520
3.2 Results 521
4 Conclusion and Outlook 523
References 523
Real Options in the Assessment of New Products 525
1 Introduction 525
2 Uncertainties in Product Development 526
3 Real Options in NPD and R&D Projects
4 Real Options Assessment Using Excel Based Tools 528
5 Conclusions and Outlook 531
References 532
Exploring the Interaction Structure of Weblogs 533
1 Introduction 533
2 Identifying Blogs on the WWW 534
2.1 Social Networks of Blogs 534
2.2 Assessment of Egos and Ego Networks 535
3 Empirical Application 537
4 Conclusions and Future Work 539
References 540
Analyzing Preference Rankings when There Are Too Many Alternatives 541
1 Introduction and Motivation 541
2 Preliminaries 542
3 Methodology 543
3.1 Test Statistic 545
3.2 Multiple Comparisons 545
3.3 Rank Plots 546
3.4 Homogeneous Subsets 547
4 Illustration 547
4.1 Data 547
4.2 Results 548
5 Conclusion 550
References 550
Considerations on the Impact of Ill-Conditioned Configurations in the CML Approach 551
1 Introduction 551
2 The Partial Credit Model 553
3 CML Approach to Estimate Item Parameters 554
4 State of the Art Regarding Existence of ML Estimates 555
5 Analysis of Fixed Small-Dimensional Datasets 556
6 Concluding Remarks 559
References 559
Dyadic Interactions in Service Encounter: Bayesian SEM Approach 561
1 Introduction 561
1.1 Service Encounter in Relationship Marketing 561
1.2 Research Design 562
2 APIM Model: Bayesian SEM Approach 563
2.1 Assumptions of Bayesian SEM 563
2.2 APIM Structural Model 564
3 Final Remarks 569
References 569
Part IX Archaeology and Spatial Planning 571
Estimating the Number of Buildings in Germany 572
1 Introduction 572
2 Inspection and Transformation of Data 573
3 Estimation 575
4 Information Optimisation 578
5 Conclusion 579
References 580
Mapping Findspots of Roman Military Brickstamps in Mogontiacum (Mainz) and Archaeometrical Analysis 581
1 Introduction 581
2 Mapping of the Locations of Findspots 583
3 Smooth Mapping by Nonparametric Density Estimation 584
4 Comparison of Different Periods 585
5 Conclusions 589
References 589
Analysis of Guarantor and Warrantee Relationships Among Government Officials in the Eighth Century in the Old Capital of Japan by Using Asymmetric Multidimensional Scaling 590
1 Introduction 590
2 Data 591
3 The Method 592
4 The Analysis and the Result 593
5 Discussion 595
References 599
Analysis of Massive Emigration from Poland: The Model-Based Clustering Approach 600
1 Introduction 600
2 Model-Based Clustering 601
2.1 Mixture Models 601
2.2 Parameter Estimation and Model Selection 602
2.3 Model-Based Strategy for Clustering 603
3 Example 604
4 Conclusions 606
5 Discussion 608
References 608
Part X Bio- and Health Sciences 610
Systematics of Short-Range Correlations in Eukaryotic Genomes 611
1 Introduction 611
2 Systematics of Correlation Signatures 613
3 Algorithmic Challenges 617
3.1 Systematic Comparison of Many Trees: The Tree-Color Coding Method 617
3.2 Memory and Run Time Management for Large Genomes 618
4 Conclusion 620
References 621
On Classification of Molecules and Species of Representation Rings 622
1 Introduction 622
2 Classification of Molecules by Symmetry Groups 623
3 Ordinary Representations of Finite Groups 624
4 Modular Representations of Finite Groups 626
5 Species of Representation Rings 627
6 Conclusions 631
References 631
The Precise and Efficient Identification of Medical Order Forms Using Shape Trees 633
1 Introduction 633
2 Geometrical Shapes for Determining Similarity 634
2.1 Object Recognition 634
2.2 Shapes as Models for Regions 634
2.3 Modeling Regions as a Shape Tree 635
2.4 Shape Tree Structure 635
2.5 Searching in a Shape Tree 637
3 Document Identification of Specialized Order Forms 638
4 Experiments 640
5 Discussion 642
6 Summary 642
References 643
On the Prognostic Value of Gene Expression Signatures for Censored Data 644
1 Introduction 644
2 Prediction Accuracy of Survival Models 645
3 Measuring the Prognostic Value of Survival Models 647
4 Low-Dimensional Data: Simulation Example 649
5 High-Dimensional Data: Lymphoma Application 651
6 Conclusions 652
References 653
Quality-Based Clustering of Functional Data: Applications to Time Course Microarray Data 655
1 Introduction 655
2 Methods 657
2.1 K-Means Clustering of Functional Data 657
2.2 Quality-Based Clustering of Functional Data 657
3 Simulation Design 657
3.1 Integrated AR Processes for Simulated Data 658
4 Simulation Results 659
5 Summary 661
References 663
A Comparison of Algorithms to Find Differentially Expressed Genes in Microarray Data 665
1 Introduction 665
2 Benchmark Data Set 666
3 Popular Algorithms to Identify Differentially Expressed Genes 667
4 The PUL Method to Identify DE Genes 667
4.1 Unit Transformation 668
4.2 Modeling Expressed Genes as Log Normals 669
4.3 Bayes Posterior Probabilities 671
4.4 Gene Scoring in PUL 671
5 IR Methods for the Evaluation of DE Algorithms 671
6 Results 672
7 Applications of PUL 674
8 Discussion 674
9 Summary 676
References 676
Part XI Exploratory Data Analysis, Modeling and Applications 678
Data Compression and Regression Based on Local Principal Curves 679
1 Introduction 679
2 Data Compression with Local Principal Curves 680
2.1 Local Principal Curves 680
2.2 Simple Example: Speed-Flow Data 681
2.3 Parametrizations and Projections 681
3 Regression with Principal Curves 683
3.1 GAIA Data 683
3.2 Principal Component Regression 685
3.3 Dimension Reduction with Local Principal Curves 686
3.4 Direct Local Principal Curve Regression 687
3.5 Prediction and Comparison 687
4 Outlook 689
References 689
Optimization of Centrifugal Impeller Using Evolutionary Strategies and Artificial Neural Networks 691
1 Introduction 691
2 Optimization of a Centrifugal Impeller Geometry 692
3 Optimization Using Evolutionary Strategies 692
4 Performance Predictions 693
4.1 Method 694
4.2 Experimental Settings 695
4.3 Experimental Results 695
5 Conclusion 698
References 699
Efficient Media Exploitation Towards Collective Intelligence 700
1 Introduction 700
2 Progress Over Related Scientific Work 701
3 Intelligent Media Analysis 703
3.1 Text Analysis 704
3.2 Visual Information Analysis 705
3.3 Speech Analysis 706
4 Contextual Media Analysis and Fusion 706
5 Social Media Intelligence 707
6 Conclusions 708
References 708
Multi-class Extension of Verifiable Ensemble Models for Safety-Related Applications 710
1 Introduction 710
2 The Verifiable Ensemble 712
3 Common Multi-class Extensions 714
4 The Multi-class Ensemble 716
5 Conclusions 720
References 720
Dynamic Disturbances in BTA Deep-Hole Drilling: Modelling Chatter and Spiralling as Regenerative Effects 722
1 Introduction 722
2 Chatter and Spiralling as Regenerative Effects 723
3 Modelling Chatter 725
3.1 Torsional Vibration Model 725
3.2 Chatter Simulation 726
4 Modelling Spiralling 727
4.1 Bending Vibration Model 727
4.2 Clustering of Increasing Eigenfrequency Courses 727
5 Outlook 729
References 731
Nonnegative Matrix Factorization for Binary Data to Extract Elementary Failure Maps from Wafer Test Images 732
1 Introduction 732
1.1 Notation 733
2 Nonnegative Matrix Factorization 733
2.1 Alternating Least Squares Algorithm for NMF 733
3 NMF for Binary Datasets 734
3.1 Generative Model 734
3.2 Bernoulli Likelihood 735
3.3 Optimizing the Log-Likelihood 736
3.3.1 Alternating Gradient Ascent Algorithm 736
3.3.2 Alternating Least Squares on a Simplified Problem 737
3.3.3 Determining the Parameter 737
3.3.4 Semi-supervised Mode 738
3.4 Other Cost Functions 738
4 Results 739
4.1 Toydata Example 739
4.2 Real World Example 740
5 Conclusion 741
References 741
Collective Intelligence Generation from User Contributed Content 742
1 Introduction 742
2 Collective Intelligence 744
2.1 Personal Intelligence 744
2.2 Media Intelligence 745
2.3 Mass Intelligence 747
2.4 Social Intelligence 747
2.5 Organizational Intelligence 748
3 Use Cases 749
3.1 Emergency Response Case Study 749
3.2 Consumers Social Group Case Study 750
4 Conclusions 751
References 751
Computation of the Molenaar Sijtsma Statistic 752
1 Introduction 752
2 Case I: The Computation of MS When No Provisional Measures Are Needed 754
3 Case II: The Computation of MS When Provisional Measures Are Needed 757
4 Estimation of the Unobservable Joint Cumulative Probabilities in MSP5.0 759
5 Discussion 760
References 761
Keyword Index 762
Author Index 765

Publication date (per publisher): 14 October 2009
Series: Studies in Classification, Data Analysis, and Knowledge Organization
Additional information: XXVI, 695 p., 172 illus., 68 illus. in color.
Place of publication: Berlin
Language: English
Subject areas: Mathematics / Computer Science > Computer Science > Databases
Mathematics / Computer Science > Mathematics > Statistics
Engineering
Business and Economics > Business Administration / Management > Business Informatics
Keywords: Artificial Intelligence • Business Intelligence • classification • Clustering • Cognition • Computer Science • Data Analysis • Databionics • Intelligence • learning • Linguistics • machine learning • Modeling • pattern recognition • service-oriented computing
ISBN-10 3-642-01044-X / 364201044X
ISBN-13 978-3-642-01044-6 / 9783642010446
PDF (watermarked)
Size: 14.1 MB

DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to its source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but it is only of limited suitability for small displays (smartphones, eReaders).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in a web browser.
