Advances in Data Analysis (eBook)
XXIV, 364 pages
Birkhäuser Boston (publisher)
978-0-8176-4799-5 (ISBN)
An outgrowth of the 12th International Conference on Applied Stochastic Models and Data Analysis, this book is a collection of invited chapters presenting recent developments in the field of data analysis, with applications to reliability and inference, data mining, bioinformatics, lifetime data, and neural networks. Emphasized throughout the volume are new methods with the potential for solving real-world problems in various areas, including: data mining and text mining; information theory and statistical applications; asymptotic behaviour of stochastic processes and random fields; bioinformatics and Markov chains; life table data, survival analysis, and risk in household insurance; neural networks and self-organizing maps; parametric and nonparametric statistics; and statistical theory and methods. Advances in Data Analysis is a useful reference for graduate students, researchers, and practitioners in statistics, mathematics, engineering, economics, social science, bioengineering, and bioscience.
Contents 6
Preface 14
List of Contributors 15
List of Tables 15
List of Figures 18
Part I Data Mining and Text Mining 23
1 Assessing the Stability of Supplementary Elements on Principal Axes Maps Through Bootstrap Resampling. Contribution to Interpretation in Textual Analysis 24
Ramón Alvarez-Esteban, Olga Valencia, and Mónica Bécue-Bertaut 24
1.1 Introduction 24
1.2 Data 25
1.3 Methodology 25
1.4 Results 26
1.4.1 CA results 26
1.4.2 Stability 27
1.5 Conclusion 30
References 31
2 A Doubly Projected Analysis for Lexical Tables 33
Simona Balbi and Michelangelo Misuraca 33
2.1 Introduction 33
2.2 Some methodological recall 34
2.2.1 Constrained principal component analysis 34
2.2.2 Principal component analysis onto a reference subspace 35
2.3 Basic concepts and data structure 35
2.4 A doubly projected analysis 36
2.5 The Italian academic programs: A study on skills and competences supply 36
References 38
3 Analysis of a Mixture of Closed and Open-Ended Questions in the Case of a Multilingual Survey 40
Mónica Bécue-Bertaut, Karmele Fernández-Aguirre, and Juan I. Modroño-Herrán 40
3.1 Introduction 40
3.2 Data and objectives 40
3.3 Notation 42
3.4 Methodology 43
3.4.1 Principle of multiple factor analysis 43
3.4.2 Integrating categorical sets in MFA 44
3.4.3 Integrating frequency tables in MFA 44
3.4.4 Extended MFA performed as a weighted PCA 44
3.5 Results 45
3.5.1 Clustering from closed questions only 45
3.5.2 Clustering from closed and open-ended questions 46
3.6 Conclusion 49
References 50
4 Number of Frequent Patterns in Random Databases 51
Loïck Lhote 51
4.1 Introduction 51
4.2 Model of databases 52
4.2.1 Frequent pattern mining 52
4.2.2 Model of random databases 53
4.3 Main results 54
4.3.1 Linear frequency threshold 54
4.3.2 Constant frequency threshold 54
4.3.3 Sketch of proofs 55
4.4 Dynamical databases 56
4.4.1 Dynamical sources 56
4.4.2 Main tools 57
4.4.3 Proof of Theorem 3 59
4.5 Improved memoryless model of databases 60
4.6 Experiments 60
4.7 Conclusion 61
References 62
Part II Information Theory and Statistical Applications 64
5 Introduction 65
Konstantinos Zografos 65
5.1 Introduction 65
References 66
6 Measures of Divergence in Model Selection 67
Alex Karagrigoriou and Kyriacos Mattheou 67
6.1 Introduction 67
6.2 Measures of divergence 68
6.3 Model selection criteria 69
6.4 The divergence information criterion 71
6.5 Lower bound of the MSE of prediction of DIC 74
6.6 Simulations 77
References 80
7 High Leverage Points and Outliers in Generalized Linear Models for Ordinal Data 82
M.C. Pardo 82
7.1 Introduction 82
7.2 Background and notation for GLM 83
7.3 The hat matrix: Properties 85
7.4 Outliers 88
7.5 Numerical example 91
7.6 Conclusion 94
References 94
8 On a Minimization Problem Involving Divergences and Its Applications 96
Athanasios P. Sachlas and Takis Papaioannou 96
8.1 Introduction 96
8.2 Minimization of divergences 97
8.3 Properties of divergences without probability vectors 98
8.4 Graduating mortality rates via divergences 102
8.4.1 Divergence-theoretic actuarial graduation 102
8.4.2 Lagrangian duality results for the power divergence 104
8.5 Numerical investigation 105
8.6 Conclusions and comments 106
References 108
Part III Asymptotic Behaviour of Stochastic Processes and Random Fields 110
9 Remarks on Stochastic Models Under Consideration 111
Ekaterina V. Bulinskaya 111
9.1 Introduction 111
9.2 Results and methods 112
9.3 Applications 114
References 117
10 New Invariance Principles for Critical Branching Process in Random Environment 119
Valeriy I. Afanasyev 119
10.1 Introduction 119
10.2 Main results 121
10.3 Proof of Theorem 1 123
10.4 Finite-dimensional distributions 126
10.5 Conclusion 128
References 129
11 Gaussian Approximation for Multichannel Queueing Systems 130
Larisa G. Afanas'eva 130
11.1 Introduction 130
11.2 Model description 131
11.3 The basic theorem 131
11.4 A limit theorem for a regenerative arrival process 135
11.5 Doubly stochastic Poisson process (DSPP) 136
11.6 Conclusion 140
References 141
12 Stochastic Insurance Models, Their Optimality and Stability 142
Ekaterina V. Bulinskaya 142
12.1 Introduction 142
12.2 Model description 143
12.3 Optimal control 143
12.4 Sensitivity analysis 147
12.5 Conclusion 153
References 153
13 Central Limit Theorem for Random Fields and Applications 154
Alexander Bulinski 154
13.1 Introduction 154
13.2 Main results 155
13.3 Applications 161
References 163
14 A Berry–Esseen Type Estimate for Dependent Systems on Transitive Graphs 164
Alexey Shashkin 164
14.1 Introduction 164
14.2 Main result 165
14.3 Proof 166
14.4 Conclusion 169
References 169
15 Critical and Subcritical Branching Symmetric Random Walks on d-Dimensional Lattices 170
Elena Yarovaya 170
15.1 Introduction 170
15.2 Description of a branching random walk 171
15.3 Definition of criticality for branching random walks 173
15.4 Main equations 174
15.5 Asymptotic behavior of survival probabilities 175
15.6 Limit theorems 176
15.7 Proof of theorems for dimensions d=1,2 in critical and subcritical cases 177
15.8 Conclusions 180
References 181
Part IV Bioinformatics and Markov Chains 182
16 Finite Markov Chain Embedding for the Exact Distribution of Patterns in a Set of Random Sequences 183
Juliette Martin, Leslie Regad, Anne-Claude Camproux, and Grégory Nuel 183
16.1 Introduction 183
16.2 Methods 184
16.2.1 Notations 184
16.2.2 Pattern Markov chains 185
16.2.3 Exact computations 185
16.3 Data 187
16.3.1 Simulated data 187
16.3.2 Real data 187
16.4 Results and discussion 188
16.4.1 Simulation study 188
16.4.2 Illustrations on biological sequences 189
16.5 Conclusion 191
References 191
17 On the Convergence of the Discrete-Time Homogeneous Markov Chain 193
I. Kipouridis and G. Tsaklidis 193
17.1 Introduction 193
17.2 The homogeneous Markov chain in discrete time 194
17.3 The equation of the image of a hypersphere under the transformation (2.1) 194
17.4 Representation of equation (3.6) in matrix form 197
17.5 Conditions for a hypersphere of $\mathbb{R}^{n-1}$ to be the image of a hypersphere under the stochastic transformation $p^T(t) = p^T(t-1)P$ 202
References 212
Part V Life Table Data, Survival Analysis, and Risk in Household Insurance 213
18 Comparing the Gompertz-Type Models with a First Passage Time Density Model 214
Christos H. Skiadas and Charilaos Skiadas 214
18.1 Introduction 214
18.2 The Gompertz-type models 215
18.3 Application to life table and the Carey medfly data 217
18.4 Remarks 218
18.5 Conclusion 219
References 219
19 A Comparison of Recent Procedures in Weibull Mixture Testing 221
Karl Mosler and Lars Haferkamp 221
19.1 Introduction 221
19.2 Three approaches for testing homogeneity 222
19.3 Implementing MLRT and D-tests with Weibull alternatives 223
19.4 Comparison of power 225
19.5 Conclusion 227
References 227
20 Hierarchical Bayesian Modelling of Geographic Dependence of Risk in Household Insurance 229
László Márkus, N. Miklós Arató, and Vilmos Prokaj 229
20.1 Introduction 229
20.2 Data description, model building, and a tool for fit diagnosis 230
20.3 Model estimation, implementation of the MCMC algorithm 233
20.4 Conclusion 236
References 237
Part VI Neural Networks and Self-Organizing Maps 238
21 The FCN Framework: Development and Applications 239
Yiannis S. Boutalis, Theodoros L. Kottas, and Manolis A. Christodoulou 239
21.1 Introduction 239
21.2 Fuzzy cognitive maps 242
21.2.1 Fuzzy cognitive map representation 242
21.3 Existence and uniqueness of solutions in fuzzy cognitive maps 244
21.3.1 The contraction mapping principle 244
21.3.2 Exploring the results 247
21.3.3 FCM with input nodes 250
21.4 The fuzzy cognitive network approach 252
21.4.1 Close interaction with the real system 252
21.4.2 Weight updating procedure 252
21.4.3 Storing knowledge from previous operating conditions 253
21.5 Controlling a wastewater anaerobic digestion unit (Kottas et al., 2006) 256
21.5.1 Control of the process using the FCN 258
21.5.2 Results 260
21.5.3 Discussion 263
21.6 The FCN approach in tracking the maximum power point in PV arrays (Kottas et al., 2007b) 263
21.6.1 Simulation of the PV system 266
21.6.2 Control of the PV system using FCN 267
21.6.3 Discussion 269
21.7 Conclusions 270
References 270
22 On the Use of Self-Organising Maps to Analyse Spectral Data 274
Véronique Cariou and Dominique Bertrand 274
22.1 Introduction 274
22.2 Self-organising map clustering and visualisation tools 275
22.3 Illustrative examples 276
22.4 Conclusion 280
References 281
23 Neuro-Fuzzy Versus Traditional Models for Forecasting Wind Energy Production 282
George Atsalakis, Dimitris Nezis, and Constantinos Zopounidis 282
23.1 Introduction 282
23.2 Related research 283
23.3 Methodology 287
23.4 Model presentation 288
23.5 Results 290
23.6 Conclusion 291
References 292
Part VII Parametric and Nonparametric Statistics 295
24 Nonparametric Comparison of Several Sequential k-out-of-n Systems 296
Eric Beutner 296
24.1 Introduction 296
24.2 Preliminaries and derivation of the test statistics 297
24.2.1 Sequential order statistics: Introduction and motivation 297
24.2.2 Sequential order statistics and associated counting processes 299
24.3 K-sample tests for known α's 302
24.4 K-sample tests for unknown α's 304
References 308
25 Adjusting p-Values when n Is Large in the Presence of Nuisance Parameters 310
Sonia Migliorati and Andrea Ongaro 310
25.1 Introduction 310
25.2 Normal model with known variance 311
25.3 Normal model with unknown variance 314
25.4 Conclusion 319
25.5 Appendix 320
References 323
Part VIII Statistical Theory and Methods 324
26 Fitting Pareto II Distributions on Firm Size: Statistical Methodology and Economic Puzzles 325
Aldo Corbellini, Lisa Crosato, Piero Ganugi, and Marco Mazzoli 325
26.1 Introduction 325
26.2 Data description 326
26.3 Fitting the Pareto II distribution by means of the forward search 327
26.4 Empirical results 328
26.5 Economic implications 329
26.6 Concluding remarks 331
References 332
27 Application of Extreme Value Theory to Economic Capital Estimation 333
Samit Paul and Andrew Barnes 333
27.1 Introduction 333
27.2 Background mathematics 334
27.2.1 Risk measure 334
27.2.2 Extreme value theory 334
27.2.3 Estimating VaR using EVT 335
27.3 Threshold uncertainty 336
27.3.1 Tail-data versus accuracy tradeoff 336
27.3.2 Mean residual life plot 336
27.3.3 Fit threshold ranges 337
27.4 Experimental framework and results 337
27.4.1 Data 337
27.4.2 Simulation engine 337
27.4.3 Threshold selection 337
27.4.4 Bootstrap results on VaR stability 338
27.5 Conclusion 338
References 339
28 Multiresponse Robust Engineering: Industrial Experiment Parameter Estimation 341
Elena G. Koleva and Ivan N. Vuchkov 341
28.1 Introduction 341
28.2 Combined method for regression parameter estimation 343
28.3 Experimental designs 345
28.4 Experimental application 345
28.5 Conclusion 347
References 348
29 Inference for Binomial Change Point Data 349
James M. Freeman 349
29.1 Introduction 349
29.2 Analysis 350
29.3 Applications 352
29.3.1 Page's data 352
29.3.2 Lindisfarne Scribes' data 353
29.3.3 Club foot data 354
29.3.4 Simulated data 354
29.4 Conclusion 355
References 356
Index 357
Publication date (per publisher) | 25 November 2009 |
---|---|
Series | Statistics for Industry and Technology |
Additional info | XXIV, 364 p. 68 illus. |
Place of publication | Boston |
Language | English |
Subject areas | Mathematics / Computer Science ► Mathematics ► Applied Mathematics; Mathematics / Computer Science ► Mathematics ► Financial / Business Mathematics; Mathematics / Computer Science ► Mathematics ► Statistics; Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics; Medicine / Pharmacy ► General / Reference Works; Engineering |
Keywords | best fit • Bioinformatics • credit risk assessment • Data Analysis • Databases • Data Mining • Fitting • forecasting life expectancy • Generalized Linear Model • goodness-of-fit tests • Information Theory • lifetime data analysis • Markov Chain • measure • mortality rates • multi-way data • Neural networks • Non-Parametric Statistics • parametric statistics • random fields • reliability and inference • resampling • ri • Sage • Statistica • Survival Analysis |
ISBN-10 | 0-8176-4799-6 / 0817647996 |
ISBN-13 | 978-0-8176-4799-5 / 9780817647995 |
Size: 6.1 MB
DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized to you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but is only of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in a web browser.