Preserving Digital Information - Henry Gladney

Preserving Digital Information (eBook)

Henry Gladney (Author)

eBook Download: PDF

2007
XXIII, 319 pages
Springer Berlin (Publisher)
978-3-540-37887-7 (ISBN)

Cultural history enthusiasts have asserted the urgent need to protect digital information from imminent loss. This book describes a methodology for the long-term preservation of all kinds of digital documents. It justifies this methodology using 20th-century theory of knowledge communication and outlines the requirements and architecture for the software needed. The author emphasizes attention to the perspectives and needs of end users.

Henry M. Gladney is an industry consultant for digital preservation and document management. In 2001, he founded his own company, HMG Consulting, based in Saratoga, CA, after having worked for IBM Research for decades, designing, among other systems, a digital library service that is the core of today's IBM Content Manager®. He is a regular author in the top ACM periodicals, holds eleven patents, and produces the 'Digital Document Quarterly', an online newsletter that has discussed preservation extensively.

Preface 8
Summary Table of Contents 16
Detailed Table of Contents 18
Figures 23
Tables 24
Part I: Why We Need Long-term Digital Preservation 25
1 State of the Art 31
1.1 What is Digital Information Preservation? 32
1.2 What Would a Preservation Solution Provide? 35
1.3 Why Do Digital Data Seem to Present Difficulties? 36
1.4 Characteristics of Preservation Solutions 38
1.5 Technical Objectives and Scope Limitations 43
1.6 Summary 45
2 Economic Trends and Social Issues 47
2.1 The Information Revolution 47
2.2 Economic and Technical Trends 49
2.3 Democratization of Information 54
2.4 Social Issues 55
2.5 Documents as Social Instruments 57
2.6 Why So Slow Toward Practical Preservation? 67
2.7 Selection Criteria: What is Worth Saving? 69
2.8 Summary 74
Part II: Information Object Structure 77
3 Introduction to Knowledge Theory 81
3.1 Conceptual Objects: Values and Patterns 82
3.2 Ostensive Definition and Names 84
3.3 Objective and Subjective: Not a Technological Issue 87
3.4 Facts and Values: How Can We Distinguish? 89
3.5 Representation Theory: Signs and Sentence Meanings 92
3.6 Documents and Libraries: Collections, Sets, and Classes 94
3.7 Syntax, Semantics, and Rules 96
3.8 Summary 98
4 Lessons from Scientific Philosophy 101
4.1 Intentional and Accidental Information 101
4.2 Distinctions Sought and Avoided 103
4.3 Information and Knowledge: Tacit and Human Aspects 106
4.4 Trusted and Trustworthy 109
4.5 Relationships and Ontologies 110
4.6 What Copyright Protection Teaches 112
4.7 Summary 114
5 Trust and Authenticity 117
5.1 What Can We Trust? 118
5.2 What Do We Mean by ‘Authentic’? 119
5.3 Authenticity for Different Information Genres 122
5.4 How Can We Preserve Dynamic Resources? 127
5.5 Summary 129
6 Describing Information Structure 133
6.1 Testable Archived Information 134
6.2 Syntax Specification with Formal Languages 135
6.3 Monographs and Collections 139
6.4 Digital Object Schema 141
6.5 From Ontology to Architecture and Design 148
6.6 Metadata 153
6.7 Summary 157
Part III: Distributed Content Management 159
7 Digital Object Formats 163
7.1 Character Sets and Fonts 163
7.2 File Formats 166
7.3 Perpetually Unique Resource Identifiers 176
7.4 Summary 184
8 Archiving Practices 187
8.1 Security 187
8.2 Recordkeeping Standards 197
8.3 Archival Best Practices 199
8.4 Repository Audit and Certification 200
8.5 Summary 202
9 Everyday Digital Content Management 205
9.1 Software Layering 207
9.2 A Model of Storage Stack Development 209
9.3 Repository Architecture 210
9.4 Archival Collection Types 220
9.5 Summary 226
Part IV: Digital Object Architecture for the Long Term 229
10 Durable Bit-Strings and Catalogs 233
10.1 Media Longevity 234
10.2 Replication to Protect Bit-Strings 237
10.3 Repository Catalog ↔ Collection Consistency 238
10.4 Collection Ingestion and Sharing 239
10.5 Summary 241
11 Durable Evidence 243
11.1 Structure of Each Trustworthy Digital Object 244
11.2 Infrastructure for Trustworthy Digital Objects 251
11.3 Other Ways to Make Documents Trustworthy 256
11.4 Summary 257
12 Durable Representation 259
12.1 Representation Alternatives 260
12.2 Design of a Durable Encoding Environment 266
12.3 Summary 272
Part V: Peroration 275
13 Assessment and the Future 275
13.1 Preservation Based on Trustworthy Digital Objects 276
13.2 Open Challenges of Metadata Creation 280
13.3 Applied Knowledge Theory 283
13.4 Assessment of the TDO Methodology 285
13.5 Summary and Conclusion 287
Appendices 289
Appendix A: Acronyms and Glossary 289
Appendix B: Uniform Resource Identifier Syntax 304
Appendix C: Repository Requirements 306
Appendix D: Assessment with Independent Criteria 308
Appendix E: Universal Virtual Computer Specification 313
E.1 Memory Model 313
E.2 Machine Status Registers 314
E.3 Machine Instruction Codes 315
E.4 Organization of an Archived Module 320
E.5 Application Example 321
Appendix F: Software Modules Wanted 324
Bibliography 327

12 Durable Representation (p. 235-236)

We want unambiguous communication with future generations with whom dialog is impossible, without restricting what today’s authors can communicate. For this, we need language that we can confidently expect our descendants to understand easily. This challenge is the kind of language problem that has been central to computer science since it emerged as a discipline in the 1960s. Its core can be restated as, "ensure that an arbitrary computer program will execute correctly on a machine whose architecture is unknown when the program is saved."

The English logician A. M. Turing showed in 1937 (and various computing machine experts have put this into practice since then in various particular ways) that it is possible to develop code instruction systems for a computing machine which cause it to behave as if it were another, specified, computing machine. …

A code, which according to Turing's schema is supposed to make one machine behave as if it were another specific machine … must do the following things. It must contain, in terms that the machine will understand (and purposively obey), instructions … that will cause the machine to examine every order it gets and determine whether this order has the structure appropriate to an order of the second machine. It must then contain, in terms of the order system of the first machine, sufficient orders to make the machine cause the actions to be taken that the second machine would have taken under the influence of the order in question.

The important result of Turing's is that in this way the first machine can be caused to imitate the behavior of any other machine.

von Neumann 1956, The Computer and the Brain, pp. 70–71

Durable encoding, described in this chapter, represents difficult content types with the aid of programs written in virtual machine code, that is, the code of a machine we call a UVC (Universal Virtual Computer). This Turing-Machine-equivalent virtual machine is simple compared to the designs of practical hardware. Its design can be specified completely, concisely, and unambiguously for future interpretation.
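
The imitation idea in the von Neumann passage, on which the UVC approach rests, can be illustrated with a very small interpreter. The sketch below is not the UVC: the instruction names (`LOADC`, `ADD`, `SUB`, `JNZ`, `HALT`), the register layout, and the `run` function are all hypothetical and chosen only for illustration. It shows a host program examining each order of an imitated machine and carrying out the action that machine would have taken.

```python
from typing import List, Tuple

# Hypothetical toy machine with numbered registers. This is NOT the UVC
# instruction set; every opcode name here is an assumption for illustration.
Instruction = Tuple[str, int, int]


def run(program: List[Instruction], registers: List[int]) -> List[int]:
    """Imitate the toy machine on the host and return its final registers."""
    regs = list(registers)  # work on a copy of the caller's registers
    pc = 0                  # program counter of the imitated machine
    while pc < len(program):
        op, a, b = program[pc]
        if op == "LOADC":        # regs[a] = constant b
            regs[a] = b
        elif op == "ADD":        # regs[a] = regs[a] + regs[b]
            regs[a] += regs[b]
        elif op == "SUB":        # regs[a] = regs[a] - regs[b]
            regs[a] -= regs[b]
        elif op == "JNZ":        # jump to instruction b if regs[a] != 0
            if regs[a] != 0:
                pc = b
                continue
        elif op == "HALT":
            break
        else:
            # The order does not have the structure of the imitated machine.
            raise ValueError(f"not an order of the imitated machine: {op!r}")
        pc += 1
    return regs


# Multiply r1 by r2 through repeated addition (assumes r2 >= 1).
MULTIPLY: List[Instruction] = [
    ("LOADC", 0, 0),  # 0: r0 = 0, the accumulator
    ("LOADC", 3, 1),  # 1: r3 = 1, the decrement step
    ("ADD", 0, 1),    # 2: r0 += r1
    ("SUB", 2, 3),    # 3: r2 -= 1
    ("JNZ", 2, 2),    # 4: repeat from instruction 2 while r2 != 0
    ("HALT", 0, 0),   # 5: stop
]

result = run(MULTIPLY, [0, 6, 7, 0])
print(result[0])  # 6 * 7 computed by the imitated machine
```

The point of the sketch is that the saved program (`MULTIPLY`) depends only on the written definition of the toy machine, not on the host that happens to interpret it. Any future host for which someone writes an equivalent `run` could execute the same program.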

Objects to be preserved might consist of several source files, each represented as a bit-stream in a digital object collection of the kind shown in Fig. 32, with labeled links between parts of the complete package. Much of each TDO will be encoded using XML, relations, encryption algorithms, and identifiers. These are governed by relatively simple standards that are widely used, standards that we can be reasonably confident will be completely and correctly understood many years into the future. As described in §11.1, metadata can, and should, record the representation of each TDO component. The means for making each content blob in Fig. 32 interpretable forever remains to be provided. What follows describes how this can be accomplished for a single content blob.
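
As a rough illustration of such a package, the following sketch models content blobs, labeled links between them, and an XML manifest that records each blob's representation. It is a minimal sketch under stated assumptions, not the TDO schema from the book: the class names, the XML element and attribute names, and the `urn:example:` identifiers are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentBlob:
    identifier: str      # hypothetical identifier, not a real naming scheme
    representation: str  # how the bit-stream is encoded, e.g. a media type
    data: bytes          # the bit-stream itself


@dataclass
class Link:
    label: str   # the meaning of the relationship
    source: str  # identifier of the blob the link starts from
    target: str  # identifier of the blob the link points to


@dataclass
class Package:
    blobs: List[ContentBlob] = field(default_factory=list)
    links: List[Link] = field(default_factory=list)

    def manifest(self) -> str:
        """Return an XML manifest describing blobs and labeled links."""
        root = ET.Element("package")  # element names are assumptions
        for blob in self.blobs:
            ET.SubElement(
                root,
                "blob",
                id=blob.identifier,
                representation=blob.representation,
                length=str(len(blob.data)),
            )
        for link in self.links:
            ET.SubElement(
                root,
                "link",
                label=link.label,
                source=link.source,
                target=link.target,
            )
        return ET.tostring(root, encoding="unicode")


pkg = Package(
    blobs=[
        ContentBlob("urn:example:report", "application/pdf", b"%PDF-"),
        ContentBlob("urn:example:data", "text/csv", b"a,b\n1,2\n"),
    ],
    links=[Link("derived-from", "urn:example:report", "urn:example:data")],
)
manifest = pkg.manifest()
print(manifest)
```

The manifest states what each blob is, but not how a future reader could interpret it, which is the gap the rest of the chapter addresses.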

12.1 Representation Alternatives

We want information representation methods that can be embodied in tools whose use would be practical for information producers and consumers who do not have specialized skills or equipment.

Publication date (per publisher)	21.3.2007
Additional info	XXIII, 319 p.
Place of publication	Berlin
Language	English
Subject areas	Humanities ► Linguistics / Literary Studies
	Mathematics / Computer Science ► Computer Science
	Business ► Business Administration / Management ► Business Informatics
Keywords	Architecture • Archive • authenticity • Computer • Content Management • Data archival • Digital Object Formats • Distributed Content Management • Document • Dublin Core Metadata Initiative - DCMI • Information Object Structure • information system • Long-Term Preservation • Management • Multimedia • Open Archival Information System - OAIS • Technology • Trust
ISBN-10	3-540-37887-1 / 3540378871
ISBN-13	978-3-540-37887-7 / 9783540378877

PDF (watermark)
Size: 3.9 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thus personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables, and figures. A PDF can be displayed on almost all devices but is only partly suitable for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.