Data Analysis and Chemometrics for Metabolomics (eBook)
432 pages
Wiley (publisher)
978-1-119-63940-4 (ISBN)
Understand new modes of analysing metabolomic data
Metabolomics is the study of metabolites, small molecules and chemical substrates within cells or larger structures which collectively make up the metabolome. The field of metabolomics stands to benefit enormously from chemometrics, an approach which brings advanced statistical techniques to bear on data of this kind.
Data Analysis and Chemometrics for Metabolomics constitutes an accessible introduction to chemometric techniques and their applications in the field of metabolomics. Thoroughly and accessibly written by a leading expert in chemometrics, and printed in full-colour, it brings robust data analysis into conversation with the metabolomic field to the immense benefit of practitioners.
Data Analysis and Chemometrics for Metabolomics readers will also find:
- Statistical insights into the nature of metabolomic hypothesis testing, validation, and more
- All metabolomics data sets from the book on a companion website
- Case studies from human, animal, plant and bacterial biology
Data Analysis and Chemometrics for Metabolomics is ideal for practitioners in the life sciences, clinical sciences and chemistry, as well as metabolomics researchers or developers of research instruments looking to apply cutting-edge analytical techniques, and statisticians developing methods to design experiments and analyse large datasets of clinical and biological origin.
Richard G. Brereton, PhD, is Professor Emeritus at the University of Bristol, UK, and Director of Brereton Consultancy. He has published widely on chemometrics and related subjects, and serves as Editor-in-Chief for Heritage Science.
CHAPTER 1
Introduction
The subject matter of this book is a synthesis of chemometrics and metabolomics, both relatively recent scientific disciplines. This chapter describes the background to these disciplines, introduces the case studies used to illustrate the chemometric methods, and describes some software packages that can be used to obtain the results presented in this text.
1.1 CHEMOMETRICS
The name chemometrics was first proposed by Svante Wold in 1972 in the context of spline fitting [1]. Wold and Bruce Kowalski went on to found the International Chemometrics Society, and the term slowly gained currency in the 1970s. The pioneers did not use the term widely for some years, however; a major event that catalysed its adoption was a workshop in Cosenza, Italy, in 1983 [2], where many of the early pioneers met. After this, several initiatives took off, including the main niche journals, Journal of Chemometrics (Wiley) [3] and Chemometrics and Intelligent Laboratory Systems (Elsevier) [4], together with courses and the first textbooks [5, 6]; regular reviews and ACS (American Chemical Society) symposia had started a few years earlier [7].
However, these events primarily concern name recognition and organisation, and the main seeds for the subject were sown many years earlier.
Applied statistics was one of the main influences on chemometrics, although the two approaches have diverged in recent years. The modern framework for applied statistics was developed in the early 20th century, and we still use terminology first defined during these decades. Before that, early academic statistics was mainly mathematical and theoretical, often linked to probability theory, game theory, statistical mechanics, distributions and so on, and was viewed as a subdiscipline of mathematics. Although many early pioneers had used approaches that we would now regard as the forerunners of modern applied statistics, their ideas were not well incorporated into mainstream thinking until the early 20th century.
Part of the problem in the 19th century was the division of academic disciplines. Would a mathematician talk to a biologist? They worked in separate institutes and had separate libraries and training. For applied statistics to develop, less insular thinking was required. There also needed to be some level of non-academic contribution, as many of the catalysts at the time were linked to industrial, agricultural and medical problems. Within the core academic disciplines, the application of statistical methods in physics and chemistry, which would eventually progress to quantum mechanics and statistical mechanics, fell outside mainstream applied statistics and led to specialist statistically based methods that are largely unrelated to chemometrics.
However, in the first three decades of the 20th century, there was a revolution in thinking. These changes primarily involved formalising ideas that had been less well established over the previous decades and even centuries. Karl Pearson [8] and William Gosset, publishing under the pseudonym ‘Student’ [9], are recognised as two of the early pioneers. Pearson set up the first statistics department in the world, based in London, and his 1900 paper first introduced the idea of a p value, although historic predecessors can be traced back several centuries [10–12].
It was not until after the First World War that applied statistical methods were properly formalised in their modern incarnation. Ronald Fisher was possibly the most important figure in developing a modern framework for statistical methodology that many people still use today. In 1925 he published Statistical Methods for Research Workers [13], which established the concepts of p values, significance tests and ANOVA (analysis of variance). Ten years later he wrote a book that described the basis of almost all statistical experimental designs [14] used even now, and his paper on the classification of irises (the plants) [15] remains an essential introduction to multivariate classification techniques, with this dataset still used for demonstrating and comparing new approaches. Other important workers of that period included Harold Hotelling, who among others is credited with advancing the widespread use and recognition of PCA (principal components analysis) [16, 17], and Jerzy Neyman and Egon Pearson, who developed alternative approaches to hypothesis testing to those proposed by Fisher [18].
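Fisher's iris data remains freely available, so readers can reproduce this kind of multivariate analysis in a few lines. The following is a minimal sketch, assuming Python with scikit-learn installed (which distributes the iris dataset); the variable names are purely illustrative and are not taken from the book.

```python
# Minimal sketch: Fisher's iris data (150 samples, 4 variables, 3 species)
# projected onto its first two principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)   # centre and scale each variable

pca = PCA(n_components=2)
scores = pca.fit_transform(X)                   # scores on PC1 and PC2

print("Explained variance ratio:", pca.explained_variance_ratio_)
for label, name in enumerate(iris.target_names):
    mask = iris.target == label
    print(f"{name}: {mask.sum()} samples, mean PC1 score = {scores[mask, 0].mean():.2f}")
```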
During the interwar period, many of the cornerstones of modern applied statistics were developed, and we continue to use methods first introduced during this era; many approaches used in chemometrics have a hundred-year vintage. However, there were some significant differences from modern practice. There was no capacity to perform intensive computations or generate large quantities of analytical data, so applications were more limited. Agriculture was at the forefront. During this era, the old land-owning classes had to modernise to survive: many farm labourers left for the cities and agriculture became more automated. The relationship between landowners and tenants weakened, and larger farms were viewed more as an industry than as the birthright of aristocratic classes. This required a significant change in production, and agricultural statistics was very important, especially for improving the economies of Western nations. Other important driving forces came from the use of psychology to interpret test scores, and from economics. Common to all these types of data is that experiments involved considerable investment in time, so it was reasonable to spend substantial effort analysing the results; some required weeks of manual calculations, as data was expensive and precious. Nowadays, in contrast, spectra can be obtained relatively rapidly, so spending days or weeks performing statistical calculations would be an unbalanced use of resources.
Furthermore, without the aid of computers, many of the multivariate methods we now take for granted would involve a large amount of time. Salsburg [19] describes the effort as follows: ‘To get some idea of the physical effort involved, consider Table VII that appears on page 123 of Studies in Crop Variation. I. [20]. If it took about one minute to complete a single large-digit multiplication, I estimate that Fisher needed about 185 hours of work to generate that table. There are fifteen tables of similar complexity and four large complicated graphs in the article. In terms of physical labor alone, it must have taken at least eight months of 12-hour days to prepare the tables for this article.’ Of course, Fisher would have had many assistants to perform calculations, and he would have been very well resourced compared to most workers of the time. Hence, only quite limited statistical studies could be performed routinely. Some algorithms and designs, such as Yates' algorithm [21], were developed with simplicity of calculation in mind, exploiting special mathematical properties of the data; although still reported in some textbooks, they are not so crucial to know about with the advent of modern computing power. Computers can invert large matrices very quickly, whereas a similar calculation might take days or longer using manual methods. In areas such as quantum chemistry, a calculation that might once have taken up an entire PhD when performed manually can now be done in seconds or less using modern computing.
The statistician of the first half of the 20th century would be armed with logarithm tables, calculators, slide rules and special types of graph paper, and in many cases would tackle less data-rich problems than nowadays. However, there was a gap between the mathematical literature, where quite sophisticated methods could be described, often in intensely theoretical language, and the practical applications of much more limited and in most cases simpler approaches. Many of the more elaborate methods of those early days would not have had much widespread practical use, but modern-day multivariate statistics can now take advantage of them. The chemometrician can routinely apply methods to very large spectroscopic or chromatographic datasets that were inconceivable prior to the widespread availability of modern computers.
In the post-war years, chemical manufacturing was of increased importance, and multivariate methods were applied by industrial chemical engineers [22]. G.E.P. Box worked with a group in the chemical company ICI in the UK for some years before moving to the US. His text [23], written together with two co-authors, is considered a classic in modern statistical thinking for applied scientists, emphasising experimental design and regression modelling, and brings the work of the early 20th century into the modern era.
In the 1970s, mainstream applied statistics started to diverge from chemometrics. In chemometrics, we often come across short fat datasets, where the number of variables may far exceed the number of samples. For example, we may record thousands of mass spectral, NMR or chromatographic data points for each of perhaps 20–100 samples. These sorts of problems were not conceivable to the original statistical pioneers: measurements were expensive, so variables were scarce. Fisher's classic iris data [15] consisted of 150 samples but only four variables. Once sample sizes are less than...
Publication date (per publisher) | 8.7.2024 |
---|---|
Language | English |
Subject area | Natural Sciences ► Chemistry |
ISBN-10 | 1-119-63940-9 / 1119639409 |
ISBN-13 | 978-1-119-63940-4 / 9781119639404 |
Size: 27.4 MB
Copy protection: Adobe DRM
Adobe DRM is a copy protection scheme intended to protect the eBook against misuse. The eBook is authorised to your personal Adobe ID at the time of download and can then only be read on devices that are also registered to your Adobe ID.
Details on Adobe DRM
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The text reflows dynamically to fit the display and font size, which also makes EPUB well suited to mobile reading devices.
System requirements:
PC/Mac: You can read this eBook on a PC or Mac.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/tablet: You can read this eBook on both Apple and Android devices.
Device list and additional notes
Buying eBooks from abroad
For tax law reasons, we can only sell eBooks within Germany and Switzerland. Regrettably, we cannot fulfil eBook orders from other countries.