Digital Document Processing (eBook)
XX, 464 pages
Springer London (publisher)
978-1-84628-726-8 (ISBN)
This book brings all the major and frontier topics in the field of document analysis together into a single volume, creating a unique reference source that will be invaluable to a large audience of researchers, lecturers and students working in this field. With chapters written by some of the most distinguished researchers active in this field, this book addresses recent advances in digital document processing research and development.
With the advent of the Digital Library initiative, web document processing and biometric aspects of digital document processing, together with new techniques of printed and handwritten Optical Character Recognition (OCR), a good overview of this fast-developing field is invaluable. In this book, all the major and frontier topics in the field of document analysis are brought together into a single volume, creating a unique reference source.
Highlights include: document structure analysis followed by OCR of Japanese, Tibetan and Indian printed scripts; online and offline handwritten text recognition approaches; Japanese postal and Arabic check processing; document image quality modelling, mathematical expression recognition, graphics recognition, document information retrieval, super resolution text, metadata extraction in digital library; biometric and forensic aspects: individuality of handwriting detection; web document analysis, text and hypertext mining and bank check data mining.
Containing chapters written by some of the most eminent researchers active in this field, this book can serve as a handbook for the research scholar as well as a supporting book for advanced graduate students interested in document processing or image analysis.
Preface 6
Contents 8
Contributors 17
1 Reading Systems: An Introduction to Digital Document Processing 20
1.1 Introduction 20
1.2 Text Sensing 22
1.3 Sensor Scope 22
1.4 Sensor Grid 25
1.5 Pre-processing 25
1.6 Invariance to Affine Transforms 26
1.7 Invariance to Ink-Trace Thickness 28
1.8 Shape Features 29
1.9 Processing Type 31
1.10 Computing Architecture 32
1.11 Computing Strategy 32
1.12 Knowledge Base 33
1.13 Cognitive Reliability 34
1.14 Response in Case of Difficult Input 34
1.15 Classification Accuracy 35
1.16 Energy and Mental Concentration 36
1.17 Processing Speed 36
1.18 Volume Processing 36
1.19 Summary of Human Versus Machine Reading 37
1.20 Conclusion 45
References 45
2 Document Structure and Layout Analysis 48
2.1 Introduction 48
2.2 Pre-processing 50
2.3 Representing Document Structure and Layout 53
2.4 Document Layout Analysis 55
2.5 Understanding Document Structure 61
2.6 Performance Evaluation 62
2.7 Handwritten Document Analysis 64
2.8 Summary 65
References 66
3 OCR Technologies for Machine Printed and Hand Printed Japanese Text 68
3.1 Introduction 68
3.2 Pre-Processing 68
3.3 Feature Extraction 77
3.4 Classification 80
3.5 Dimension Reduction 82
3.6 Performance Evaluation of OCR Technologies 83
3.7 Learning Algorithms 86
3.8 Conclusion 88
References 89
4 Multi-Font Printed Tibetan OCR 91
4.1 Introduction 91
4.2 Properties of Tibetan Characters and Scripts 92
4.3 Isolated Tibetan Character Recognition 96
4.4 Tibetan Document Segmentation 106
4.5 Experiment Results 112
4.6 Summary 114
Acknowledgments 114
References 114
5 On OCR of a Printed Indian Script 117
5.1 Introduction 117
5.2 Origin and Properties of Indian Scripts 118
5.3 Document Pre-Processing 122
5.4 Character Recognition 125
5.5 Performance Analysis 132
5.6 Conclusion 135
Acknowledgments 135
References 136
6 A Bayesian Network Approach for On-line Handwriting Recognition 138
6.1 Introduction 138
6.2 Modelling of Character Components and Their Relationships 141
6.3 Recognition and Training Algorithms 147
6.4 Experimental Results and Analysis 149
6.5 Conclusions 156
References 157
7 New Advances and New Challenges in On-Line Handwriting Recognition and Electronic Ink Management 159
7.1 Introduction 159
7.2 On-Line Handwriting Recognition Systems 160
7.3 New Trends in On-Line Handwriting Recognition 160
7.4 New Trends in Electronic Ink Management Systems 164
7.5 Conclusion, Open Problems and New Challenges 172
References 173
8 Off-Line Roman Cursive Handwriting Recognition 181
8.1 Introduction 181
8.2 Methodology 182
8.3 Emerging Topics 187
8.4 Outlook and Conclusions 191
Acknowledgment 192
References 192
9 Robustness Design of Industrial Strength Recognition Systems 200
9.1 Characterization of Robustness 200
9.2 Complex Recognition System: Postal Address Recognition 202
9.3 Performance Influencing Factors 204
9.4 Robustness Design Principles 209
9.5 Robustness Strategy for Implementation 218
9.6 Conclusions 224
Acknowledgments 224
References 225
10 Arabic Cheque Processing System: Issues and Future Trends 228
10.1 Introduction 228
10.2 Datasets 229
10.3 Legal Amount Processing 230
10.4 Courtesy Amount Processing 237
10.5 Conclusion and Future Perspective 245
References 247
11 OCR of Printed Mathematical Expressions 250
11.1 Introduction 250
11.2 Identification of Expressions in Document Images 252
11.3 Recognition of Expression Symbols 256
11.4 Interpretation of Expression Structure 260
11.5 Performance Evaluation 266
11.6 Conclusion and Future Research 270
References 271
12 The State of the Art of Document Image Degradation Modelling 275
12.1 Introduction 275
12.2 Document Image Degradations 276
12.3 The Measurement of Image Quality 278
12.4 Document Image Degradation Models 280
12.5 Applications of Models 284
12.6 Public-Domain Software and Image Databases 286
12.7 Open Problems 287
Acknowledgments 289
References 289
13 Advances in Graphics Recognition 294
13.1 Introduction 294
13.2 Application Scenarios 297
13.3 Early Processing 300
13.4 Symbol Recognition and Indexing 301
13.5 Architectures and Meta-data Modelling 302
13.6 On-Line Graphics Recognition and Sketching Interfaces 304
13.7 Performance Evaluation 306
13.8 An Application Scenario: Interpretation of Architectural Sketches 307
13.9 Conclusions: Sketching the Future 308
Acknowledgment 310
References 310
14 An Introduction to Super-Resolution Text 317
14.1 Introduction 317
14.2 Super-Resolution: An Analytical Model 319
14.3 MISO Super-Resolution: A Closer Look 320
14.4 Case Study: SURETEXT – Camera-Based SRText 330
14.5 Conclusions 337
Acknowledgment 337
References 337
15 Meta-Data Extraction from Bibliographic Documents for the Digital Library 340
15.1 Introduction 340
15.2 The Users’ Needs 341
15.3 Bibliographic Elements as Descriptive Meta-Data 342
15.4 Meta-Data Extraction in Bibliographic Documents 344
15.5 General Overview of the Work 344
15.6 Bibliographic Element Recognition for Library Management 346
15.7 Bibliographic Reference Structure in Technological Watch 352
15.8 Citation Analysis in Research Piloting and Evaluation 355
15.9 Conclusion 360
References 360
16 Document Information Retrieval 362
16.1 Introduction 362
16.2 Document Retrieval Based on the Vector-Space Model 363
16.3 Applications 374
16.4 Summary and Conclusion 386
References 386
17 Biometric and Forensic Aspects of Digital Document Processing 390
17.1 Introduction 390
17.2 Image Pre-processing and Interactive Tools 392
17.3 Discriminating Elements and Their Similarities 395
17.4 Writer Verification 397
17.5 Signature Verification 405
17.6 Concluding Remarks 414
References 414
18 Web Document Analysis 417
18.1 Introduction 417
18.2 Web Content Extraction, Repurposing and Mining 418
18.3 Web Image Analysis 421
18.4 Web Document Modelling and Annotation 424
18.5 Concluding Remarks 426
References 426
19 Semantic Structure Analysis of Web Documents 430
19.1 Introduction 430
19.2 Related Work 431
19.3 Semantic Structure of Web Documents 433
19.4 Vision-based Page Segmentation (VIPS) 435
19.5 Determining Topic Coherency of Web Page Segments 440
19.6 Extracting Semantic Structure by Integrating Visual and Content Information 441
19.7 Advantages of the Integrated Approach 443
19.8 Conclusions and Discussion 443
References 444
20 Bank Cheque Data Mining: Integrated Cheque Recognition Technologies 446
20.1 Introduction 446
20.2 Challenges of the Cheque Processing Industry 447
20.3 Payee Name Recognition 452
20.4 Cheque Mining with A2iA CheckReader™ 460
20.5 Conclusions 466
References 466
Index 468
Publication date (per publisher) | 13 March 2007 |
---|---|
Series | Advances in Computer Vision and Pattern Recognition |
Additional info | XX, 464 p. |
Place of publication | London |
Language | English |
Subject area | Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics |
Keywords | Bayesian Network • Calculus • Cognition • Data Mining • digital document processing • Digital Libraries • Document data mining • graphics • handwriting • Handwriting Recognition • Hypertext • Image Analysis • Information Retrieval • Layout • Mathematica • OCR • Optical character recognition • optical character recognition (OCR) • Text Mining • Web document analysis |
ISBN-10 | 1-84628-726-X / 184628726X |
ISBN-13 | 978-1-84628-726-8 / 9781846287268 |
Digital Rights Management: DRM-free
This eBook contains no DRM or copy protection. Passing it on to third parties is nevertheless not legally permitted, because the purchase grants you rights for personal use only.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables and figures. A PDF can be displayed on almost all devices, but is of only limited use on small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.