Database Annotation in Molecular Biology (eBook)

Principles and Practice

Arthur M. Lesk (Editor)

eBook Download: PDF
2005 | 1st edition
288 pages
Wiley (publisher)
978-0-470-85685-7 (ISBN)

108.99 incl. VAT
  • Download available immediately
Two factors dominate current molecular biology: the amount of raw data is increasing very rapidly and successful applications in biomedical research require carefully curated and annotated databases. The quality of the experimental data -- especially nucleic acid sequences -- is satisfactory; however, annotations depend on features inferred from the data rather than measured directly, for instance the identification of genes in genome sequences. It is essential that these inferences are as accurate as possible and this requires human intervention.

With the recognition of the importance of accurate database annotation and the requirement for individuals with particular constellations of skills to carry it out, annotators are emerging as specialists within the profession of bioinformatics. This book compiles information about annotation -- its current status, what is required to improve it, what skills must be brought to bear on database curation and hence what is the proper training for annotators.

The book should be essential reading for all people working on biological databases, both biologists and computer scientists. It will also be of interest to all users of such databases, including molecular biologists, geneticists, protein chemists, clinicians and drug developers.

Arthur M. Lesk, The Pennsylvania State University, U.S.A.



Database Annotation in Molecular Biology
Contents
Preface
List of Contributors
1 Annotation and Databases: Status and Prospects
1.1 Introduction
1.2 Annotation of Genomic Data
1.3 Databases: Concepts and Definitions
1.4 Access to Annotation Databases
Glossary
References
I THE DATABANKS
2 Survey of Sequence Databases: Archival Projects
2.1 Introduction
2.2 Nucleotide Sequence Databases
2.3 Swiss-Prot
2.4 TrEMBL
2.5 PIR
2.6 UniProt
References
3 Survey of Sequence Databases: Derived Databases
3.1 Introduction
3.2 Protein and Gene Family Databases
3.3 Discussion
References
4 Databanks of Macromolecular Structure
4.1 Introduction
4.2 Background
4.3 Archival Structural Databases Now
4.4 Contextual Databases
4.5 Derived Structural Data Databases
4.6 Summary and View of the Future
References
5 Gene Expression Databases
5.1 Introduction
5.2 What Do We Mean by Microarray Gene Expression Data?
5.3 Data Complexity
5.4 Minimum Information About a Microarray Experiment (MIAME)
5.5 Journals and MIAME
5.6 Storage and Exchange Formats: MAGE-OM and MAGE-ML
5.7 ArrayExpress
5.8 Annotation Tools
5.9 Curation
5.10 Standardization and Semantics
5.11 Public Microarray Databases
5.12 ArrayExpress, an Example of a Public Repository
5.13 Submissions to ArrayExpress
5.14 MIAMExpress and Other MIAME Compliant Annotation Systems
5.15 Databases of Protein Expression Patterns
5.16 The Gene Expression Database (GXD)
5.17 Conclusion
References
II THE BASIS OF ANNOTATION
6 Taxonomy: a Moving Target for Sequence Data
6.1 Introduction
6.2 Nomenclature
6.3 Operational Definitions
6.4 Searching for the Taxonomic Gold Standard
6.5 Conclusions
References
7 Genomics and Proteomics: Design and Sources of Annotation
7.1 Beyond the Sequence: the Challenge of Complete Genome Analysis
7.2 Extracting the Genes
7.3 Organism Specific Peculiarities
7.4 Topology of Genomes
7.5 Gene Extraction Pipelines
7.6 Added Value and Knowledge
7.7 Beyond the Parts List
References
8 Annotation of Protein Sequences
8.1 Introduction
8.2 What is Annotation?
8.3 UniProt: Universal Protein Resource
8.4 Protein Family Classification
8.5 InterPro: Integrated Resource of Protein Families, Domains and Sites
8.6 PIR Protein Families and Superfamilies
8.7 Ontologies
8.8 Protein Names, Source Information and Unique Identifiers
8.9 Common Identification Errors
8.10 Evidence Attribution
8.11 Position Specific Annotations
8.12 Rule-based Annotation
8.13 Conclusions
Acknowledgements
References
9 Issues in the Annotation of Protein Structures
9.1 Data Harvesting
9.2 Identification of the Biologically Relevant Assembly
9.3 Taxonomy
9.4 Sequence Recognition and Cross-reference
9.5 Recognition of Secondary Structure Elements
9.6 Validation of Structures
9.7 Residue Identification
9.8 Hetgroup Identification
9.9 Solvent Handling
9.10 Miscellaneous Annotation Issues
9.11 Conclusions
References
10 Classification of Protein Function
10.1 Introduction
10.2 Mechanisms of Divergence of Protein Function
10.3 Classification of Protein Functions
10.4 Methods for Assigning Protein Function
10.5 Applications of Full-organism Information: Inferences from Genomic Context and Protein Interaction Patterns
10.6 Conclusions
References
III DATABASE DESIGN AND INTEGRATION
11 Information Flow and Data Integration of Databanks
11.1 Introduction
11.2 Information Flow Among Databanks
11.3 Database Distribution Format
11.4 Genome Annotation Errors and Error Propagation
11.5 Data Integration and Knowledge Discovery: iProClass Case Study
11.6 Conclusions
Acknowledgements
References
12 Models of Database Interconnectivity
12.1 Introduction
12.2 Heterogeneity in Bioinformatics Data Management
12.3 Data Models
12.4 Architectures for Data Integration
12.5 Implementing a Database Federation
12.6 Conclusions
References
13 The European Bioinformatics Institute Macromolecular Structure Relational Database Technology
13.1 Database Design Process
13.2 Loading and Exporting Data in mmCIF
13.3 Exporting mmCIFs or XML Files from the Deposition Database
13.4 Subtypes and ‘Leaf Views’
13.5 Maintenance Aspects
13.6 Data Clean-up
13.7 The Search Database
13.8 Transformation
13.9 Incremental Transformation
13.10 Replication
13.11 Oracle Cartridge Applications
13.12 Related Data Warehouse
Acknowledgements
References
IV CONCLUSIONS AND PROSPECTS
14 Looking Around, Looking Ahead
Index

Publication date (per publisher) 1 September 2005
Language English
Subject area Studies Cross-sectional subjects Infectiology / Immunology
Natural sciences Biology Genetics / Molecular biology
Keywords Bioinformatics & Computational Biology • Bioinformatics • Bioinformatics and computer simulations in the life sciences • Life Sciences • Molecular biology
ISBN-10 0-470-85685-8 / 0470856858
ISBN-13 978-0-470-85685-7 / 9780470856857
PDF (Adobe DRM)
Size: 10.5 MB

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook from misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read the eBook only on devices that are also registered to your Adobe ID.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables and figures. A PDF can be displayed on almost any device, but is only of limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the Adobe Digital Editions software (free). We advise against using the OverDrive Media Console; in our experience, problems with Adobe DRM occur frequently with it.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/tablet: Whether Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Additional feature: online reading
In addition to downloading it, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.
