Computational Network Theory - Matthias Dehmer, Frank Emmert-Streib, Stefan Pickl

Computational Network Theory (eBook)

Theoretical Foundations and Applications
eBook Download: EPUB
2015 | 1st edition
280 pages
Wiley-Blackwell (publisher)
978-3-527-69154-8 (ISBN)
This comprehensive introduction to computational network theory as a branch of network theory builds on the understanding that such networks are a tool for deriving or verifying hypotheses by applying computational techniques to large-scale network data.
The highly experienced team of editors and high-profile authors from around the world present and explain a number of methods that are representative of computational network theory, derived from graph theory as well as computational and statistical techniques.
With its coherent structure and homogeneous style, this reference is equally suitable for courses on computational networks.

Matthias Dehmer studied mathematics at the University of Siegen (Germany) and received his Ph.D. in computer science from the Technical University of Darmstadt (Germany). Afterwards, he was a research fellow at Vienna Bio Center (Austria), Vienna University of Technology, and University of Coimbra (Portugal). He obtained his habilitation in applied discrete mathematics from the Vienna University of Technology. Currently, he is Professor at UMIT - The Health and Life Sciences University (Austria) and also holds a position at the Universität der Bundeswehr München. His research interests are in applied mathematics, bioinformatics, systems biology, graph theory, complexity and information theory. He has written over 180 publications in his research areas.
Frank Emmert-Streib studied physics at the University of Siegen (Germany) and gained his PhD in theoretical physics from the University of Bremen (Germany). He received postdoctoral training at the Stowers Institute for Medical Research (Kansas City, USA) and the University of Washington (Seattle, USA). Currently, he is an associate professor at Queen's University Belfast (UK) at the Center for Cancer Research and Cell Biology, heading the Computational Biology and Machine Learning Laboratory. His main research interests are in the fields of computational medicine, network biology, and statistical genomics.

1
Model Selection for Neural Network Models: A Statistical Perspective


Michele La Rocca and Cira Perna

1.1 Introduction


It is generally accepted that linear analysis often performs poorly when approximating real data. Therefore, although it is easy to handle, fast to compute, and supported by many statistical results, it cannot be used extensively, especially when complex relationships are recognized in the data. In these contexts, nonlinear analysis is commonly used and can successfully be employed to reveal such patterns.

However, parametric analysis, both linear and nonlinear, requires an “a priori” specification of the links among the variables of interest, which is not always possible. Therefore, even if the results have the advantage of interpretability (in the sense that the model parameters are often associated with quantities having a “physical” meaning), misspecification problems can arise and seriously affect the results of the analysis. In this respect, nonparametric analysis seems to be a more effective statistical tool due to its ability to model nonlinear phenomena with few (if any) “a priori” assumptions about the nature of the data-generating process. Well-studied and frequently used tools in nonparametric analysis include nearest neighbour regression, kernel smoothers, projection pursuit, alternating conditional expectations, average derivative estimation, and classification and regression trees; one of these tools is sketched below.
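To make one of these tools concrete, the following sketch (illustrative code, not taken from the chapter; the function name, bandwidth, and data are all hypothetical) implements a Nadaraya-Watson kernel smoother with a Gaussian kernel, estimating the regression function as a locally weighted average of the observed targets.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, bandwidth=0.3):
    """Nadaraya-Watson kernel regression with a Gaussian kernel."""
    # Kernel weight of every training point for every evaluation point
    diffs = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    # Locally weighted average of the training targets
    return (weights @ y_train) / weights.sum(axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 200)
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(200)
y_hat = nadaraya_watson(x, y, np.linspace(-2.0, 2.0, 50))
```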

In this context, computational network analysis forms a field of research which has enjoyed rapid expansion and increasing popularity in both the academic and the research communities, providing an approach that can potentially lead to better nonparametric estimators and an interesting framework for unifying different nonparametric paradigms, such as nearest neighbours, kernel smoothers, and projection pursuit.

Computational network tools have the advantage, with respect to other nonparametric techniques, of being very flexible tools able to provide, under very general conditions, an arbitrarily accurate approximation to an unknown target function of interest. Moreover, they are expected to perform better than other nonparametric methods, since the approximation form is not so sensitive to the increasing dimension of the data space (absence of the “curse of dimensionality”), at least within particular classes of functions.

However, a major weakness of neural modeling is the lack of established procedures for performing tests for misspecified models and tests of statistical significance for the various parameters that have been estimated. This is a serious disadvantage in applications where there is a strong interest in testing not only the predictive power of a model or the sensitivity of the dependent variable to changes in the inputs, but also the statistical significance of the result at a specified level of confidence. Proper correction for multiple hypothesis testing has been a central concern in many fields of research that deal with large sets of variables and small samples, where, as a consequence, the control of false positives becomes an important problem.

In such a context, data snooping, which occurs when a given set of data is used more than once for inference or model selection, can be a serious problem. When such data reuse occurs, there is always the possibility that any satisfactory results obtained may simply be due to chance rather than to any merit inherent in the model yielding the result. In other words, looking long enough and hard enough at a given data set will often reveal one or more forecasting models that look good but are in fact useless (see, inter alia, White, 2000; Romano and Wolf, 2005).
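As a toy illustration of this effect (hypothetical code, not from the chapter), the following simulation evaluates many pure-noise "forecasting models" against a pure-noise target: every model is useless by construction, yet roughly 5% pass a naive per-model significance test and the best one looks convincing.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_models = 100, 500

# Target with no predictable structure at all
y = rng.standard_normal(n)

# 500 candidate "models": each one an unrelated noise predictor
preds = rng.standard_normal((n_models, n))
cors = np.array([np.corrcoef(p, y)[0, 1] for p in preds])

# Naive per-model 5% threshold for |correlation| under the null
threshold = 1.96 / np.sqrt(n)
print(f"best |correlation|: {np.abs(cors).max():.3f}")
print(f"useless models passing a naive 5% test: {(np.abs(cors) > threshold).sum()}")
# About 5% of the useless models "pass", and the best looks impressive;
# ignoring the search over 500 models is exactly the data snooping problem.
```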

Unfortunately, as far as we know, there are no results addressing the problem just described in a neural network framework. Data snooping can be particularly serious when there is no theory supporting the modeling strategy, as is usual when using computational network analysis, which is basically atheoretical.

The aim of this chapter is to develop model selection strategies useful for computational network analysis based on statistical inference tools. In particular, we propose hypothesis testing procedures both for variable selection and for model adequacy. The approach takes into account the problem of data snooping and uses resampling techniques to overcome the analytical and probabilistic difficulties related to the estimation of the sampling distribution of the test statistics involved. The chapter is organized as follows. Section 1.2 describes the structure of the data-generating process and the neural network model considered. In Section 1.3, we address the problem of input selection and in Section 1.4 the selection of the hidden layer size. In both cases, applications to simulated and real data are considered. Some remarks conclude the chapter.

1.2 Feedforward Neural Network Models


Let the observed data be the realization of a sequence $\{Z_i = (Y_i, \mathbf{X}_i^T)^T\}$ of random vectors of order $(d+1)$, with $i \in \mathbb{N}$ and joint distribution $\pi$. Moreover, let $\mu$ be the marginal distribution of $\mathbf{X}_i$. The random variables $Y_i$ represent targets (in the neural network jargon), and the probabilistic relationship with the variables $\mathbf{X}_i$, described by the conditional distribution of the random variable $Y_i \mid \mathbf{X}_i$, is usually of interest. Certain aspects of this probability law play an important role in interpreting what is learned by artificial neural network models. If $\mathbb{E}(Y_i) < \infty$, then $\mathbb{E}(Y_i \mid \mathbf{X}_i) = g(\mathbf{X}_i)$ and we can write

$$Y_i = g(\mathbf{X}_i) + \varepsilon_i \qquad (1.1)$$

where $\varepsilon_i \equiv Y_i - g(\mathbf{X}_i)$ and $g : \mathbb{R}^d \to \mathbb{R}$ is a measurable function.

The function $g$ embodies the systematic part of the stochastic relation between $Y_i$ and $\mathbf{X}_i$. On the data-generating process, we assume also that:

  1. $\{Z_i\}$ are independent and identically distributed (i.i.d.) random vectors; the $\varepsilon_i$ are independent of $\mathbf{X}_i$, with $\mathbb{E}(\varepsilon_i) = 0$ and $\mathbb{E}(\varepsilon_i^2) = \sigma_\varepsilon^2 < \infty$.
  2. The random vectors $\mathbf{X}_i$ have a compact support, say $\mathcal{X} \subset \mathbb{R}^d$.

These conditions guarantee that $Y_i$ has finite variance.
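Assuming a hypothetical regression function g and the conditions above (compact support for the X_i, errors independent of the X_i with zero mean and finite variance), a minimal simulation of the data-generating process in Eq. (1.1) could look as follows; all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 500, 3

def g(X):
    # Hypothetical regression function standing in for the unknown g
    return np.sin(np.pi * X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]

# X_i drawn with compact support [-1, 1]^d (assumption 2); errors
# independent of X_i with zero mean and finite variance (assumption 1)
X = rng.uniform(-1.0, 1.0, size=(n, d))
eps = 0.1 * rng.standard_normal(n)
Y = g(X) + eps  # Eq. (1.1)
```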

The function $g$ can be approximated by a single hidden layer feedforward neural network defined as

$$f(\mathbf{x}, \mathbf{w}) = w_{00} + \sum_{j=1}^{r} w_{0j}\, \psi(\tilde{\mathbf{x}}^T \mathbf{w}_{1j}) \qquad (1.2)$$

where $\mathbf{w} \equiv (w_{00}, w_{01}, \ldots, w_{0r}, \mathbf{w}_{11}^T, \ldots, \mathbf{w}_{1r}^T)^T$ is an $r(d+2)+1$ vector of network weights, $\mathbf{w} \in \mathbf{W}$ with $\mathbf{W}$ a compact subset of $\mathbb{R}^{r(d+2)+1}$, and $\tilde{\mathbf{x}} \equiv (1, \mathbf{x}^T)^T$ is the input vector augmented by a bias component 1. The network (Eq. (1.2)) has $d$ input neurons, $r$ neurons in the hidden layer and identity function for the output layer. The (fixed) hidden unit activation function $\psi$ is chosen in such a way that $f(\mathbf{x}, \cdot) : \mathbf{W} \to \mathbb{R}$ is continuous for each $\mathbf{x}$ in the support of $\mu$ and $f(\cdot, \mathbf{w}) : \mathbb{R}^d \to \mathbb{R}$ is measurable for each $\mathbf{w}$ in $\mathbf{W}$.
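A direct transcription of Eq. (1.2) might look as follows. This is a sketch assuming a logistic sigmoid for $\psi$, with a weight layout (output weights w0, hidden weights W1) chosen for convenience rather than taken from the chapter.

```python
import numpy as np

def psi(a):
    # Logistic sigmoid assumed as the hidden unit activation function
    return 1.0 / (1.0 + np.exp(-a))

def network_output(X, w0, W1):
    """Single hidden layer feedforward network f(x, w) of Eq. (1.2).

    X  : (n, d) matrix of input vectors
    w0 : (r + 1,) output weights; w0[0] is the output bias w_00
    W1 : (r, d + 1) hidden weights; column 0 multiplies the bias 1
    """
    X_tilde = np.column_stack([np.ones(len(X)), X])  # augmented input
    H = psi(X_tilde @ W1.T)                          # (n, r) hidden layer
    return w0[0] + H @ w0[1:]                        # identity output layer
```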

On the neural network model, we assume that

  1. The activation function, $\psi$, is sigmoidal.
  2. The function $\psi$ has derivatives of all orders.

This latter assumption guarantees (Hornik, Stinchcombe, and Auer, 1994, inter alia) that feedforward neural networks with sufficiently many hidden units and properly adjusted parameters can approximate any function arbitrarily well. Moreover, Barron (1993) gives convergence rates for single hidden layer feedforward networks with sigmoidal activation functions, approximating a class of functions that satisfy certain smoothness conditions.

Given a training set of $n$ observations, the estimation of the network weights (learning) is obtained by solving the optimization problem

$$\min_{\mathbf{w} \in \mathbf{W}} \frac{1}{n} \sum_{i=1}^{n} q\left(Y_i, f(\mathbf{X}_i, \mathbf{w})\right) \qquad (1.3)$$

where $q(\cdot, \cdot)$ is a properly chosen loss function. Under general regularity conditions (White, 1989), a weight vector $\hat{\mathbf{w}}_n$ solving Eq. (1.3) exists and converges almost surely to $\mathbf{w}_0$, which solves

$$\min_{\mathbf{w} \in \mathbf{W}} \int q\left(y, f(\mathbf{x}, \mathbf{w})\right) d\pi(\mathbf{z}) \qquad (1.4)$$

provided that the integral exists and the optimization problem has a unique solution vector interior to $\mathbf{W}$. Observe that this is not necessarily true for neural network models in the absence of appropriate restrictions, since the parametrization of the network function is not unique and certain simple symmetry operations applied to the weight vector do not change the value of the output. For a sigmoid activation function $\psi$, centered around 0, these symmetry operations correspond to an exchange of hidden units and to multiplying all weights of connections going into and out of a particular hidden unit by $-1$. The permutability of hidden units generally results in a non-unique $\mathbf{w}_0$, as there are numerous distinct weight vectors yielding identical network outputs. In any case, this may not be a main concern, for different reasons. Firstly, several authors provide sufficient conditions to ensure uniqueness of $\mathbf{w}_0$ in a suitable parameter space for specific network configurations. In particular, for the case of sigmoidal activation functions with $\psi(-a) = -\psi(a)$, it is possible to restrict attention only to weight vectors with $w_{01} \geq w_{02} \geq \cdots \geq w_{0r} \geq 0$ (see Ossen and Rügen, 1996). Secondly, the possible presence of multiple minima has no essential effect, at least asymptotically, on solutions to Eq. (1.4) (see White, 1989). Thirdly, several global optimization strategies (simulated annealing, genetic algorithms, etc.) are available to avoid being trapped in local minima, and they have been successfully employed in neural network modeling. Finally, when the focus is on prediction, it can be shown that the unidentifiability can be overcome and the problem disappears (Hwang and Ding, 1997).
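A minimal sketch of the learning step in Eq. (1.3), assuming quadratic loss and reusing network_output and the simulated data (X, Y) from the sketches above; the random multistart is a simple guard against the local minima discussed here, not the authors' procedure.

```python
import numpy as np
from scipy.optimize import minimize

def unpack(w, r, d):
    # Split the flat weight vector into output weights and hidden weights
    return w[:r + 1], w[r + 1:].reshape(r, d + 1)

def fit_network(X, Y, r, n_starts=5, seed=0):
    """Nonlinear least squares estimate of the network weights
    (Eq. (1.3) with quadratic loss q)."""
    n, d = X.shape
    n_par = (r + 1) + r * (d + 1)

    def loss(w):
        w0, W1 = unpack(w, r, d)
        return np.mean((Y - network_output(X, w0, W1)) ** 2)

    rng = np.random.default_rng(seed)
    # A few random restarts as a cheap guard against local minima
    fits = [minimize(loss, 0.5 * rng.standard_normal(n_par), method="BFGS")
            for _ in range(n_starts)]
    return min(fits, key=lambda res: res.fun).x

w_hat = fit_network(X, Y, r=4)
```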

Asymptotic normality of the weight vector estimator can also be established. In particular, let $l(\mathbf{z}, \mathbf{w}) \equiv q\left(y, f(\mathbf{x}, \mathbf{w})\right)$ and denote by $\nabla$ and $\nabla^2$ the gradient and the Hessian operators with respect to $\mathbf{w}$, respectively. Assume that $\mathbf{A}^* \equiv \mathbb{E}\left(\nabla^2 l(\mathbf{z}, \mathbf{w}_0)\right)$ and $\mathbf{B}^* \equiv \mathbb{E}\left(\nabla l(\mathbf{z}, \mathbf{w}_0)\, \nabla l(\mathbf{z}, \mathbf{w}_0)^T\right)$ are nonsingular matrices. If general regularity conditions hold, then

$$\sqrt{n}\left(\hat{\mathbf{w}}_n - \mathbf{w}_0\right) \xrightarrow{d} \mathcal{N}\left(\mathbf{0}, \mathbf{C}^*\right)$$

where $\mathbf{C}^* = \mathbf{A}^{*-1} \mathbf{B}^* \mathbf{A}^{*-1}$ (White, 1989, theorem 2, p. 457).
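The sandwich covariance $\mathbf{C}^* = \mathbf{A}^{*-1} \mathbf{B}^* \mathbf{A}^{*-1}$ can be estimated by plugging in sample averages at $\hat{\mathbf{w}}_n$. The sketch below uses central finite differences for the per-observation scores and the Hessian; this is a rough illustrative choice, not the chapter's method.

```python
import numpy as np

def sandwich_covariance(per_obs_loss, w_hat, h=1e-4):
    """Numerical sandwich estimator C = A^{-1} B A^{-1}.

    per_obs_loss(w) must return the n individual losses q(Y_i, f(X_i, w)).
    """
    p = len(w_hat)
    n = len(per_obs_loss(w_hat))
    E = h * np.eye(p)

    # Per-observation scores (gradients of l) by central differences
    scores = np.column_stack([
        (per_obs_loss(w_hat + E[j]) - per_obs_loss(w_hat - E[j])) / (2 * h)
        for j in range(p)])
    B = scores.T @ scores / n  # estimate of B* = E[grad l grad l']

    def avg_score(w):
        return np.array([
            (per_obs_loss(w + E[j]).mean() - per_obs_loss(w - E[j]).mean()) / (2 * h)
            for j in range(p)])

    # Hessian A* of the expected loss: differentiate the average score
    A = np.column_stack([(avg_score(w_hat + E[j]) - avg_score(w_hat - E[j])) / (2 * h)
                         for j in range(p)])
    A = (A + A.T) / 2  # symmetrize the numerical Hessian
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv

# Usage with the quadratic loss and the fit from the previous sketch (r=4, d=3):
# per_obs_loss = lambda w: (Y - network_output(X, *unpack(w, 4, 3))) ** 2
# C_star = sandwich_covariance(per_obs_loss, w_hat)
```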

These results make it possible to test hypotheses about the connection strengths, which can be of great help in defining pruning...
