Handbook of Signal Processing Systems (eBook)
XIII, 1210 pages
Springer International Publishing (publisher)
978-3-319-91734-4 (ISBN)
In this new edition of the Handbook of Signal Processing Systems, many of the chapters from the previous editions have been updated, and several new chapters have been added. The new contributions include chapters on signal processing methods for light field displays, throughput analysis of dataflow graphs, modeling for reconfigurable signal processing systems, fast Fourier transform architectures, deep neural networks, programmable architectures for histogram of oriented gradients processing, high dynamic range video coding, system-on-chip architectures for data analytics, analysis of finite word-length effects in fixed-point systems, and models of architecture.
There are more than 700 tables and illustrations in this edition, over 300 of them in color.
Shuvra S. Bhattacharyya is a Professor in the Department of Electrical and Computer Engineering at the University of Maryland, College Park. He holds a joint appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS). From January 2015 through December 2018, he was also a part-time visiting professor in the Department of Pervasive Computing at the Tampere University of Technology, Finland, as part of the Finland Distinguished Professor Programme (FiDiPro). His research interests include signal processing, embedded systems, electronic design automation, wireless communication, and wireless sensor networks. He received the B.S. degree from the University of Wisconsin at Madison and the Ph.D. degree from the University of California at Berkeley. He has held industrial positions as a Researcher at the Hitachi America Semiconductor Research Laboratory (San Jose, California) and as a Compiler Developer at Kuck & Associates (Champaign, Illinois), and a visiting research position at the US Air Force Research Laboratory (Rome, New York). He has been a Nokia Distinguished Lecturer (Finland) and a Fulbright Specialist (Austria and Germany), and he has received the NSF CAREER Award (USA). He is a Fellow of the IEEE.
Ed F. Deprettere received the M.Sc. degree in Electrical Engineering from Ghent University (Belgium) in 1968 and the Ph.D. degree in Electrical Engineering from Delft University of Technology (The Netherlands) in 1980. While at Delft, he held visiting research positions at Stanford University, the University of California at Los Angeles, and Philips Research. In 2000 he joined Leiden University as a full professor in the Leiden Institute of Advanced Computer Science, where he has been head of the Leiden Embedded Research Center. While in Leiden, he held visiting professor positions at the Indian Institute of Science (Bangalore, India) and the Technical University of Sofia (Bulgaria), and he founded the Daedalus foundation to coordinate the cooperation with Sofia. He has been an emeritus professor since 2012.
Rainer Leupers received the M.Sc. (Dipl.-Inform.) and Ph.D. (Dr. rer. nat.) degrees in Computer Science with honors from TU Dortmund in 1992 and 1997, respectively. From 1997 to 2001 he was the chief engineer at the Embedded Systems chair at TU Dortmund. In 2002, Dr. Leupers joined RWTH Aachen University as a professor for Software for Systems on Silicon. Since then, he has also been a visiting faculty member at the ALARI Institute in Lugano. His research and teaching activities comprise software development tools, processor architectures, and system-level electronic design automation, with a focus on application-specific multicore systems. He has published numerous books and technical papers and has served on the committees of leading international conferences, including DAC, DATE, and ICCAD, and as a co-chair of the MPSoC Forum and SCOPES. Dr. Leupers has received several scientific awards, including Best Paper Awards at DATE 2000 and 2008 and at DAC 2002, as well as multiple industry awards. He holds several patents on processor design automation technologies and has been a co-founder of LISATek (now with Synopsys), Silexica, and Secure Elements. He has served as a consultant for various companies, as an expert for the European Commission, and on the management boards of large-scale projects such as UMIC, HiPEAC, Eurolab4HPC, and ARTIST. He is the coordinator of the EU projects TETRACOM and TETRAMAX on academia-industry technology transfer.
Jarmo Takala received his M.Sc. (hons) degree in Electrical Engineering and Dr.Tech. degree in Information Technology from Tampere University of Technology (TUT), Tampere, Finland, in 1987 and 1999, respectively. He has held industrial positions as a Research Scientist at VTT-Automation, Tampere, Finland, and as a Senior Research Engineer at Nokia Research Center. Since 2000, he has been Professor in Computer Engineering at TUT and has held various positions, including Head of the Department of Computer Engineering and Dean of the Faculty of Computing and Electrical Engineering; he currently serves as Vice President. Dr. Takala is a Senior Member of the IEEE and has served as Chair of the IEEE Signal Processing Society's Design and Implementation of Signal Processing Systems Technical Committee.
Foreword 6
Preface 8
Contents 10
Part I Applications 13
Signal Processing Methods for Light Field Displays 14
1 Introduction 14
2 Light Field Basics 15
2.1 Plenoptic Function 15
2.2 Light Field Parametrization 15
2.3 Light Ray Propagation 16
2.4 Epipolar Plane Images 18
2.5 Fourier Domain Representation 20
2.6 Plenoptic Sampling 21
2.7 Densely Sampled Light Field 23
3 Light Field Displays 24
3.1 Visual Cues 24
3.2 From Ideal to Real Light Field Display 26
3.3 Overview of Current Light Field (Type) Displays 28
3.3.1 Integral Imaging 28
3.3.2 Super-Multiview Displays 30
3.3.3 Projection-Based Displays 32
3.3.4 Holographic Stereograms 33
3.3.5 Tensor Displays 35
4 Display Specific Light Field Analysis 37
4.1 Display-related Ray Propagation 38
4.2 Display Bandwidth 40
4.3 Display-Camera Setup and Optimization 42
4.3.1 Light Field Display Setup Optimization 42
4.3.2 Camera Setup Optimization 44
5 Reconstruction of Densely Sampled Light Field 47
5.1 Plenoptic Modelling, Depth Layering and Rendering 47
5.2 Reconstruction of DSLF in Directional Transform Domain 49
5.2.1 Directional Transforms 49
5.2.2 Shearlet Transform 50
5.2.3 DSLF Reconstruction in Shearlet Domain 52
5.2.4 Other Sparsifying Transforms 55
6 Conclusions 56
References 57
Inertial Sensors and Their Applications 62
1 Introduction to Inertial Sensors 62
1.1 Accelerometers 63
1.2 Gyroscopes 65
1.3 Areas of Application 65
1.3.1 Navigation 66
1.3.2 Automotive 67
1.3.3 Industrial 67
1.3.4 Consumer Products 67
1.3.5 Sport 68
2 Performance of Inertial Sensors 69
2.1 Effect of Different Sources of Error 70
2.1.1 Calibration of Inertial Sensors 71
2.1.2 Allan Variance 72
2.1.3 Modeling the Measurement Errors 73
2.2 Sensor Quality Grade 74
3 Pedestrian Dead Reckoning 76
3.1 INS Mechanization 76
3.2 Step Detection with Accelerometers 79
3.3 Step Length Estimation 80
3.4 PDR Mechanization 82
3.5 Effect of Sensor Quality Grade to the Accuracy of PDR 83
4 Inferring Context with Inertial Sensors 87
4.1 Pattern Recognition 88
4.2 Feature Extraction 89
4.3 Classification Accuracy 91
4.4 Areas of Application 92
5 Summary 93
References 93
Finding It Now: Networked Classifiers in Real-Time Stream Mining Systems 97
1 Defining Stream Mining 98
1.1 Motivation 98
1.1.1 Application 1: Semantic Concept Detection in Multimedia – Processing Heterogeneous and Dynamic Data in a Resource-Constrained Setting
1.1.2 Application 2: Online Healthcare Monitoring – Processing Data in Real Time
1.1.3 Application 3: Analysis of Social Graphs – Coping with Decentralized Information and Setup
1.2 From Data Mining to Stream Mining 101
1.2.1 Data Mining 101
1.2.2 Changing Paradigm 102
1.3 Problem Formulation 103
1.3.1 Classifiers 103
1.3.2 Axis for Study 104
1.4 Challenges 106
1.4.1 Coping with Complex Data: Large-Scale, Heterogeneous and Time-Varying 106
1.4.2 Immediacy 106
1.4.3 Distributed Information and Knowledge Extraction 107
1.4.4 Resource Constraints 107
2 Proposed Systematic Framework for Stream Mining Systems 108
2.1 Query Process Modeled as Classifier Chain 108
2.1.1 A-Priori Selectivity 108
2.1.2 Classifier Performance 109
2.1.3 Throughput and Goodput of a Chain of Classifiers 109
2.2 Optimization Objective 110
2.2.1 Misclassification Cost 110
2.2.2 Processing Delay Cost 110
2.2.3 Resource Constraints 110
2.2.4 Optimization Problem 111
2.3 Operating Point Selection 111
2.4 Further Research Areas 112
3 Topology Construction 112
3.1 Linear Topology Optimization: Problem Formulation 112
3.2 Centralized Ordering Algorithms for Fixed Operating Points 113
3.2.1 Optimal Order Search 114
3.2.2 Greedy Algorithm 114
3.3 Joint Order and Operating Point Selection 115
3.3.1 Limits of Centralized Algorithms for Order Selection 116
3.4 Multi-Chain Topology 116
3.4.1 Motivations for Using a Multi-Chain Topology: Delay Tradeoff Between Feature Extraction and Intra-Classifier Communication 116
3.4.2 Number of Chains and Tree Configuration 117
4 Decentralized Approach 117
4.1 Limits of Centralized Approaches and Necessity of a Decentralized Approach 117
4.2 Decentralized Decision Framework 119
4.2.1 Users of the Stream Mining System 119
4.2.2 States Observed by Each Classifier 119
4.2.3 Actions of a Classifier 120
4.2.4 Local Utility of a Classifier 120
4.3 Decentralized Algorithms 121
4.3.1 Exhaustive Search Ordering Algorithm 121
4.3.2 Partial Search Ordering Algorithm 122
4.3.3 Decentralized Ordering and Operating Point Selection 125
4.3.4 Robustness of the Partial Search Algorithm and Convergence Speed 125
4.4 Multi-Agent Learning in Decentralized Algorithm 126
4.4.1 Tradeoff Between Efficiency and Computational Time 126
4.4.2 Safe Experimentation 126
4.5 Parametric Partial Search Order and Operating Point Selection Algorithm 128
4.5.1 Controlling the Screening Probability 128
4.5.2 Comparison of Ordering and Operating Point Selection Algorithms 129
4.5.3 Order Selected by Various Classifiers for Different Ordering Algorithms 129
5 Online Learning for Real-Time Stream Mining 130
5.1 Centralized Online Learning 131
5.1.1 Problem Formulation 131
5.1.2 Active Stream Mining 132
5.1.3 Learning Under Accuracy Drift 134
5.1.4 Learning the Relevant Contexts 135
5.2 Decentralized Online Learning 136
5.2.1 Problem Formulation 136
5.2.2 Cooperative Contextual Bandits 136
5.2.3 Hedged Bandits 138
6 Conclusion 138
References 139
Deep Neural Networks: A Signal Processing Perspective 142
1 Introduction 142
2 Building Blocks of a Deep Neural Network 144
2.1 Neural Networks 144
2.2 Convolutional Layer 147
2.3 Pooling Layer 150
2.4 Network Activations 151
3 Network Training 153
3.1 Loss Functions 154
3.2 Optimization 156
4 Implementation 157
4.1 Platforms 157
4.2 Example: Image Categorization 159
5 System Level Deployment 165
6 Further Reading 167
7 Conclusions 168
References 170
High Dynamic Range Video Coding 173
1 Introduction 173
2 Early Work: HDR Coding for Still Images 174
3 Signal Quantization: Gamma, PQ, and HLG 176
4 Backward-Compatible HDR Coding 179
4.1 Dual-Layer Coding 179
4.1.1 Piecewise Linear Model Representation 182
4.1.2 Multivariate Multiple Regression Predictor 183
4.1.3 MPEG Color Gamut Scalability 184
4.1.4 System-Level Design Issues in Dual-Layer Systems 185
4.2 Single-Layer Methods 186
4.2.1 Philips HDR Codec 187
5 Non-Backward-Compatible HDR Coding 188
5.1 Multi-Layer Non-Backward-Compatible Systems 188
5.1.1 Dolby Non-Backward-Compatible 8-Bit Dual-Layer Codec 189
5.1.2 HEVC Range Extension Proposals 190
5.2 Single-Layer Solutions Using Signal Reshaping 191
5.2.1 MPEG Proposals for Reshaping Methods 192
5.2.2 Encoder Optimization for PQ-Coded HDR Signals 193
5.2.3 Perceptual Quality-Based Quantization Models 194
5.3 The Ultra HD Blu-Ray Disc Format 195
6 Conclusions 195
Appendix: List of Abbreviations 196
References 197
Signal Processing for Control 200
1 Introduction 200
2 Brief Introduction to Control 201
2.1 Stability 205
2.2 Sensitivity to Disturbance 205
2.3 Sensitivity Conservation 206
3 Signal Processing in Control 207
3.1 Simple Cases 208
3.2 Demanding Cases 212
3.3 Exemplary Case 216
4 Conclusions 217
5 Further Reading 217
References 218
MPEG Reconfigurable Video Coding 219
1 Introduction 220
2 Requirements and Rationale of the MPEG RVC Framework 220
3 Rationale for Changing the Traditional Specification Paradigm Based on Sequential Model of Computation 223
3.1 Limits of Previous Monolithic Specifications 225
3.2 Reconfigurable Video Coding Specification Requirements 226
4 Description of the Standard or Normative Components of the Framework 227
4.1 The Toolbox Library 228
4.2 The Cal Actor Language 229
4.2.1 Basic Constructs 229
4.2.2 Priorities and State Machines 230
4.2.3 Cal Subset Language for RVC 230
4.2.4 Non-standard Process Language Extension to Cal 232
4.3 FU Network Language for Codec Configurations 234
4.4 Bitstream Syntax Specification Language BSDL 236
4.5 Instantiation of the ADM 238
4.6 Case Study of New and Existing Codec Configurations 238
4.6.1 Commonalities 238
4.6.2 MPEG-4 Simple Profile (SP) Decoder 239
4.6.3 MPEG-4 AVC Decoder 240
5 Tools and Integrated Environments Supporting Development, Analysis and Synthesis of Implementations 242
5.1 OpenDF Framework 242
5.2 Orcc Framework 243
5.3 CAL2HDL Synthesis 244
5.4 CAL2C Synthesis 245
5.5 Integrated Design Flows Including Design Exploration and Full SW/HW Synthesis Capabilities 247
5.5.1 Turnus Design Exploration Framework 248
5.5.2 Xronos System Design Synthesis Framework 249
5.6 The Tÿcho Framework 250
6 Conclusion 251
References 251
Signal Processing for Wireless Transceivers 256
1 Introduction and System Overview 256
2 Equalization and MIMO Processing 259
2.1 System Model 259
2.2 Optimum Detector and Decoding 261
2.3 Suboptimal Equalization 263
2.4 Channel Estimation 267
2.5 Implementations 269
3 Multicarrier Waveforms 270
3.1 Waveform Processing in OFDM Systems 270
3.1.1 OFDM Principle 270
3.1.2 Synchronization, Adaptive Modulation and Coding, and Multiple Access 273
3.2 Enhanced Multicarrier Waveforms 274
3.2.1 Peak-to-Average Power Ratio Issues and SC-FDMA 275
3.2.2 Enhancing Spectral Containment of OFDM 277
3.2.3 Filterbank Multicarrier Waveforms 281
4 Transceiver RF System Fundamentals and I/Q Signal Processing 283
4.1 RF-System Fundamentals 283
4.2 Complex I/Q Signal Processing Fundamentals 284
4.2.1 Basic Definitions and Connection to Bandpass Signals 284
4.2.2 Analytic Signals and Hilbert Transforms 286
4.3 Frequency Translations and Filtering 287
4.3.1 Frequency Translations for Signals 287
4.3.2 Frequency Translations for Linear Systems and Filters 290
4.4 Radio Architecture Basics 294
4.4.1 Superheterodyne Receiver 294
4.4.2 Direct-Conversion Receiver 295
4.4.3 Low-IF Receiver 297
4.4.4 RF/IF Subsampling Receiver 297
4.5 Transceiver Digital Front-End 298
4.5.1 Traditional vs. Software Defined Radio Models 299
4.6 RF Imperfections and DSP 301
4.6.1 I/Q Imbalance and Mirror-Frequency Interference 302
4.6.2 Transmitter Nonlinearities 303
4.6.3 Receiver and ADC Nonlinearities 303
4.6.4 Oscillator Phase Noise 304
4.6.5 Sampling Jitter 305
5 Concluding Remarks 305
References 306
Signal Processing for Radio Astronomy 316
1 Introduction 317
2 Notation 318
3 Basic Concepts of Interferometry; Data Model
3.1 Data Acquisition 320
3.2 Complex Baseband Signal Representation 321
3.3 Data Model 322
3.4 Radio Interferometric Imaging Concepts 325
4 Image Reconstruction 328
4.1 Constructing Dirty Images 329
4.1.1 Beamforming Formulation 329
4.1.2 Constructing Dirty Images by Adaptive Beamforming 331
4.2 Deconvolution 332
4.2.1 The CLEAN Algorithm 332
4.2.2 CLEAN Using Other Dirty Images 334
4.3 Matrix Formulations 335
4.3.1 Matrix Formulation of the Data Model 335
4.3.2 Matrix Formulation of Imaging via Beamforming 338
4.4 Parametric Image Estimation 339
4.4.1 Weighted Least Squares Imaging 339
4.4.2 Estimating the Position of the Sources 342
4.4.3 Preconditioned WLS 343
4.5 Constraints on the Image 345
4.5.1 Non-negativity Constraint 345
4.5.2 Dirty Image as Upper Bound 345
4.5.3 Tightest Upper Bound 346
4.5.4 Constrained WLS Imaging 347
4.5.5 Imaging Using Sparse Reconstruction Techniques 348
4.5.6 Comparison of Regularization Techniques 349
5 Calibration 351
5.1 Non-ideal Measurements 351
5.1.1 Instrumental Effects 351
5.1.2 Propagation Effects 352
5.2 Calibration Algorithms 354
5.2.1 Estimating the Element Gains and Directional Responses 354
5.2.2 Estimating the Ionospheric Perturbation 356
5.2.3 Estimating the General Model 356
6 A Typical Signal Processing Pipeline 357
7 Concluding Remarks and Further Reading 361
References 362
Distributed Smart Cameras and Distributed Computer Vision 366
1 Introduction 366
2 Approaches to Computer Vision 367
3 Early Work in Distributed Smart Cameras 369
4 Challenges 369
5 Camera Calibration 371
6 Tracking 374
6.1 Tracking with Overlapping Fields-of-View 374
6.2 Tracking in Sparse Camera Networks 375
7 Gesture Recognition 377
8 Platform Architectures 378
9 Summary 380
References 380
Part II Architectures 383
Arithmetic 384
1 Number Representation 384
1.1 Binary Representation 385
1.2 Two's Complement Representation 385
1.3 Redundant Representations 385
1.3.1 Signed-Digit Representation 386
1.3.2 Carry-Save Representation 386
1.4 Shifting and Increasing the Word Length 388
1.5 Negation 388
1.6 Finite Word Length Effects 388
1.6.1 Overflow Characteristics 389
1.6.2 Truncation 390
1.6.3 Rounding 390
1.6.4 Magnitude Truncation 390
1.6.5 Quantization of Products 391
2 Addition 391
2.1 Ripple-Carry Addition 392
2.2 Carry-Lookahead Addition 393
2.3 Carry-Select and Conditional Sum Addition 397
2.4 Multi-Operand Addition 398
3 Multiplication 399
3.1 Partial Product Generation 399
3.1.1 Avoiding Sign-Extension 400
3.1.2 Reducing the Number of Rows 401
3.1.3 Reducing the Number of Columns 402
3.2 Summation Structures 403
3.2.1 Sequential Accumulation 403
3.2.2 Array Accumulation 404
3.2.3 Tree Accumulation 404
3.3 Vector Merging Adder 406
3.4 Multiply-Accumulate 407
3.5 Multiplication by Constants 407
3.6 Distributed Arithmetic 409
3.6.1 Reducing the Memory Size 411
3.6.2 Complex Multipliers 413
4 Division 414
4.1 Restoring and Nonrestoring Division 414
4.2 SRT Division 415
4.3 Speeding Up Division 416
4.4 Square Root Extraction 417
5 Floating-Point Representation 417
5.1 Normalized Representations 418
5.2 IEEE Standard for Floating-Point Arithmetic, IEEE 754 418
5.3 Addition and Subtraction 419
5.4 Multiplication 420
5.5 Quantization Error 421
6 Computation of Elementary Functions 421
6.1 CORDIC 421
6.2 Polynomial and Piecewise Polynomial Approximations 424
6.3 Table-Based Methods 426
7 Further Reading 427
References 427
Coarse-Grained Reconfigurable Array Architectures 430
1 Application Domain of Coarse-Grained Reconfigurable Arrays 430
2 CGRA Basics 432
3 CGRA Design Space 435
3.1 Tight Versus Loose Coupling 435
3.2 CGRA Control 438
3.2.1 Reconfigurability 438
3.2.2 Scheduling and Issuing 440
3.2.3 Thread-Level and Data-Level Parallelism 442
3.3 Interconnects and Register Files 443
3.3.1 Connections 443
3.3.2 Register Files 445
3.3.3 Predicates, Events and Tokens 446
3.4 Computational Resources 446
3.5 Memory Hierarchies 449
3.6 Compiler Support 451
3.6.1 Intermediate Code Generation and Optimization 451
3.6.2 CGRA Code Mapping and Scheduling Techniques 452
4 Case Study: ADRES 455
4.1 Mapping Loops on ADRES CGRAs 456
4.1.1 Modulo Scheduling Algorithms for CGRAs 456
4.1.2 Loop Transformations 457
4.1.3 Data Flow Manipulations 461
4.2 ADRES Design Space Exploration 463
4.2.1 Example ADRES Instances 463
4.2.2 Design Space Exploration Example 466
5 Conclusions 467
6 Further Reading 468
References 468
High Performance Stream Processing on FPGA 476
1 Introduction 476
2 The FPGA-Based Processing Element (FPE) 477
3 Case Study: Sphere Decoding for MIMO Communications 479
4 FPE-Based Pre-processing Using SQRD 482
4.1 FPE Coprocessors for Arithmetic Acceleration 482
4.2 SQRD Using FPGA 484
5 FSD Tree-Search for 802.11n 485
5.1 FPE Coprocessors for Data Dependent Operations 487
5.2 SIMD Implementation of 802.11n FSD MCS 488
6 Stream Processing for FPGA Accelerators 490
6.1 Streaming Processing Elements 491
6.2 Instruction Coding 493
7 Streaming Block Processing 494
7.1 Loop Execution Without Overheads 495
7.2 Block Data Memory Access 497
7.3 Off-sFPE Communications 499
7.4 Stream Frame Processing Efficiency 499
8 Experiments 500
9 Summary 503
References 504
Application-Specific Accelerators for Communications 506
1 Introduction 506
1.1 Coarse Grain Versus Fine Grain Accelerator Architectures 509
1.2 Hardware/Software Workload Partition Criteria 510
2 Hardware Accelerators for Communications 512
2.1 MIMO Channel Equalization Accelerator 513
2.2 MIMO Detection Accelerators 515
2.2.1 Maximum-Likelihood (ML) Detection 516
2.2.2 Sphere Detection 516
2.2.3 Computational Complexity of Sphere Detection 518
2.2.4 Depth-First Sphere Detector Architecture 519
2.2.5 K-Best Detector Architecture 521
2.3 Channel Decoding Accelerators 522
2.3.1 Turbo Decoder Accelerator Architecture 522
2.3.2 LDPC Decoder Accelerator Architecture 529
2.4 Digital Predistortion 533
2.4.1 Full-Band DPD Mobile GPU Accelerator Architecture 535
2.4.2 Sub-band FPGA Accelerator Architecture 536
3 Summary 539
4 Further Reading 540
References 540
System-on-Chip Architectures for Data Analytics 545
1 Introduction 545
2 Algorithm/Architecture Co-design: Analytic Architecture for SMART SoC 546
2.1 Architectural Platform 547
2.2 Algorithm/Architecture Co-design: Abstraction at the System Level 548
2.2.1 Levels of Abstraction 548
2.2.2 Joint Exploration of Algorithms and Architecture 550
2.3 Algorithmic Intrinsic Complexity Metrics and Assessment 551
2.3.1 Number of Operations 552
2.3.2 Degree of Parallelism 554
2.3.3 Data Transfer Rate 558
2.3.4 Data Storage Requirement 561
2.4 Intelligent Parallel and Reconfigurable Computing 566
3 AAC Case Studies 567
3.1 Mapping Motion-Compensated Frame Rate Up-Convertor onto Multi-Core Platform via Complexity Metrics Quantification 568
3.2 Reconfigurable Interpolation 570
References 576
Architectures for Stereo Vision 578
1 Introduction 578
2 Algorithms 580
2.1 Epipolar Geometry and Rectification 580
2.2 Stereo Correspondence 583
2.2.1 Classical Disparity Estimation 584
2.2.2 Disparity Estimation Using Deep-Learning 587
2.3 Algorithm Example: Semi-global Matching 587
3 Architectures 589
3.1 GPU-Based Implementations 591
3.2 Dedicated Architectures (FPGA and VLSI) 592
3.3 Other Architectures 593
3.4 Comparison Studies 594
3.5 Current Trends 595
3.6 Implementation Example: Semi-global Matching on the GPU 595
3.6.1 Parallelization Principles 595
3.6.2 Rank Transform and Median Filter Kernel 596
3.6.3 SGM Kernel 597
3.6.4 Performance 601
3.7 Implementation Example: VLSI Architecture for Semi-global Matching 601
3.7.1 Parallelization 601
3.7.2 Architecture 603
3.7.3 Performance 605
4 Summary 606
5 Further Reading 607
References 607
Hardware Architectures for the Fast Fourier Transform 614
1 Introduction 614
2 FFT Algorithms 615
2.1 The Cooley-Tukey Algorithm 615
2.2 Representation Using Flow Graphs 617
2.3 Binary Tree Representation 618
2.4 Triangular Matrix Representation 620
2.5 The Radix in FFTs 621
2.6 Non-power-of-two and Mixed-Radix FFTs 622
3 Building Blocks for FFT Hardware Architectures 622
3.1 Butterflies 623
3.2 Rotators 623
3.2.1 Multiplier-Based General Rotators 625
3.2.2 Multi-Stage General Rotators 626
3.2.3 Simplified Multiplier-Based Rotators 628
3.2.4 Simplified Multi-Stage Rotators 630
3.2.5 Rotators Based on Trigonometric Identities 630
3.3 Shuffling Circuits 630
4 FFT Hardware Architectures 632
4.1 Architecture Selection 632
4.2 Fully Parallel FFT 633
4.3 Iterative FFT Architectures 633
4.4 Pipelined FFT Architectures 635
4.4.1 Serial Pipelined FFT Architectures 636
4.4.2 Parallel Pipelined FFT Architectures 639
5 Bit Reversal for FFT Architectures 641
5.1 The Bit Reversal Algorithm 642
5.2 Bit Reversal for Serial Data 642
5.3 Bit Reversal for Parallel Data 643
6 Conclusions 643
References 644
Programmable Architectures for Histogram of Oriented Gradients Processing 649
1 Introduction 649
1.1 Chapter Breakdown 651
2 HOG Algorithm 651
2.1 Profiling HOG 654
3 IPPro Introduction 654
4 HOG Deployment on IPPro 658
4.1 Algorithm Partitioning 658
4.2 Instruction Mapping and Scheduling on a Single IPPro 661
4.3 Instruction Mapping and Scheduling on Multiple IPPro 662
4.4 Results Generation: Initial Architecture 663
5 Profiling of Initial HOG Implementation 664
5.1 Normalize Gamma and Color 665
5.2 Compute Gradients 666
5.3 Weighted Vote into Spatial and Orientation Cells 667
5.4 Normalize over Overlapping Spatial Blocks 668
5.5 Collect HOGs over Detection Window 670
5.6 Linear SVM 670
5.7 Summary of HOG Profiling 670
6 IPPro Optimisations 671
6.1 Register Size 672
6.2 Mapping Strategy, Input Data Pattern 673
6.3 Coprocessor Development 674
6.4 Implementation of Coprocessor 676
6.4.1 Serial Coprocessor (Temporal Parallelism) 677
6.4.2 Parallel Coprocessor 677
6.4.3 Architecture Choice 678
6.4.4 IPPro Coprocessor Interface Design 679
6.4.5 Summary of Coprocessor Impact 680
7 Conclusions 681
References 681
Part III Design Methods and Tools 683
Methods and Tools for Mapping Process Networks onto Multi-Processor Systems-On-Chip 684
1 Introduction 684
2 KPN Design Flows for Multiprocessor Systems 686
3 Methods 688
3.1 System Specification 689
3.2 System Synthesis 690
3.3 Performance Analysis 692
3.4 Design Space Exploration 695
4 Specification, Synthesis, Analysis, and Optimization in DOL 696
4.1 Distributed Operation Layer 697
4.2 System Specification 698
4.3 System Synthesis 700
4.3.1 Functional Simulation Generation 700
4.3.2 Software Synthesis 702
4.4 Performance Analysis 703
4.4.1 Modular Performance Analysis (MPA) 704
4.4.2 Integration of MPA into the DOL Design Flow 706
4.5 Design Space Exploration 708
4.6 Results of the DOL Framework 711
5 Concluding Remarks 715
References 716
Intermediate Representations for Simulation and Implementation 719
1 The Role of Intermediate Representations 719
1.1 Forms of Representations 720
1.2 Representation for Parallel and Distributed Hardware 720
2 Untimed Representations 722
2.1 Representation of System Property Intervals 723
2.1.1 Specification of Process Mode Changes 724
2.1.2 Specification of Latency Constraints 725
2.1.3 Concluding Remarks on System Property Intervals 726
2.2 Representation of Functions Driven by State Machines 727
2.2.1 Describing an Application in FunState 727
2.2.2 Examples of Representation of Different Models of Computation 729
2.2.3 Representation of Schedules 732
2.3 Concluding Remarks on Untimed Representations 733
3 Timed Representations 733
3.1 Job Configuration Networks 733
3.1.1 Implementation of a Job Configuration Network 733
3.2 IPC Graphs 734
3.2.1 Timing Analysis of IPC Graphs 736
3.3 Timed Configuration Graphs 739
3.4 Set of Models 739
3.4.1 Modeling a Tiled 16 Cores Processor 741
3.5 Construction of Timed Configuration Graphs 743
3.5.1 Abstract Interpretation of TCFGs 744
4 Chapter Summary 747
References 747
Throughput Analysis of Dataflow Graphs 749
1 Introduction 749
2 Terminology 751
2.1 Synchronous and Cyclo-Static Dataflow Graphs 751
2.1.1 Auto-Concurrency and Ordering of Firings 752
2.1.2 Structural Invariants 753
2.1.3 Self-timed Execution and Throughput 754
2.2 Max-plus Algebra 755
3 Maximum Cycle Ratio Analysis 756
3.1 Max-plus Characterization 757
3.2 Computing the Maximum Cycle Ratio 759
3.2.1 The Power Method 759
3.2.2 Policy Iteration 761
3.2.3 Parametric Paths 762
3.3 Discussion 764
4 Single-Rate Approximations 764
4.1 Characterization of CSDF Constraints 764
4.2 Transforming the CSDF Constraints 766
4.2.1 Changing Counting Units 767
4.3 Computing Strictly Periodic Schedules 769
4.4 Discussion 771
5 Unfolding Actor Firings 772
5.1 Multi-Rate Equivalents 773
5.2 A General Transformation 775
5.3 Discussion 777
6 Throughput Analysis 777
6.1 State-Space Exploration 778
6.2 Incremental Unfolding 779
6.3 Comparing the Two Approaches 781
6.4 Discussion 782
References 782
Dataflow Modeling for Reconfigurable Signal Processing Systems 785
1 Reconfigurable Signal Processing Systems 785
2 Reconfigurable Dataflow Models 787
2.1 Reconfiguration Semantics 788
2.2 Reconfigurable Dataflow Models 790
2.2.1 Hierarchy-Based Reconfigurable Dataflow Meta-Models 790
2.2.2 Statically Analyzable Reconfigurable Dataflow Models 792
2.3 Dynamic Dataflow MoC and Reconfigurability 794
2.3.1 Classification of Dynamic Dataflow Graphs 796
2.3.2 Reconfigurable Semantics for Dynamic Dataflow MoC 797
3 Software Implementation Techniques for Reconfigurable Dataflow Specifications 799
3.1 Compile-Time Parameterized Quasi-Static Scheduling 799
3.2 Multicore Runtime for PiSDF Graphs 802
3.3 Compilation Flow for SPDF Graphs 805
3.4 Software Reconfiguration for Dynamic Dataflow Graphs 807
4 Dataflow-Based Techniques for Hardware Reconfigurable Computing Platforms 809
4.1 Dataflow-Driven Coarse Grained Reconfiguration 810
4.1.1 Heterogeneous Coarse-Grained and Runtime Reconfigurable Architectures 812
4.1.2 Coarse-Grained and Runtime Reconfigurable Arrays 815
4.2 Fine-Grained Dataflow-Driven Reconfiguration 816
References 818
Integrated Modeling Using Finite State Machines and Dataflow Graphs 823
1 Introduction 823
2 Modeling Approaches 824
2.1 Dataflow Graphs 824
2.2 *charts 825
2.2.1 Refining Dataflow Actors via FSMs 826
2.2.2 Refining FSM States via Dataflow Graphs 829
2.3 Extended Codesign Finite State Machines 830
2.4 SysteMoC 834
2.5 Further Approaches 837
3 Scheduling Dataflow Graphs 838
3.1 Modeling Static-Order Schedules 838
3.2 Quasi-Static and Dynamic Schedule Modeling 840
3.2.1 Actor Execution Model 841
3.2.2 Cluster Execution Model 842
3.2.3 Scheduling Examples 843
4 Exploiting Static MoCs for Scheduling 845
4.1 Scheduling Overhead 846
4.2 Cluster FSM Computation for QSS 846
5 Quasi-Static Scheduling in the Presence of Bounded Channels 849
5.1 Channel Capacity Adjustment Problem 849
5.2 Channel Capacity Adjustment Algorithm 852
5.2.1 Input-to-Input Back Pressure 853
5.2.2 Output-to-Output Back Pressure 854
5.2.3 Input-to-Output Back Pressure 856
6 Conclusions 859
References 860
Kahn Process Networks and a Reactive Extension 863
1 Introduction 864
1.1 Motivation 864
1.2 Example 865
1.3 Preliminaries 866
2 Denotational Semantics 867
3 Operational Semantics 869
3.1 Labeled Transition Systems 870
3.1.1 Semantics 870
3.1.2 Determinacy 872
3.2 Operational Semantics 873
4 The Kahn Principle 875
5 Analyzability Results 877
6 Implementing Kahn Process Networks 878
6.1 Implementing Atomic Processes 878
6.2 Correctness Criteria 879
6.3 Run-Time Scheduling and Buffer Management 880
7 Extensions of KPN 884
7.1 Events 885
7.2 Time 886
8 Reactive Process Networks 887
8.1 Introduction 887
8.2 A Reactive Process Network Example 890
8.3 Design Considerations of RPN 891
8.3.1 Streams, Events and Time 891
8.3.2 Semantic Model 892
8.3.3 Communicating Events 893
8.4 Operational Semantics of RPN 893
8.5 Implementation Issues 896
8.5.1 Coordinating Streaming and Events 896
8.5.2 Deadlock Detection and Resolution 896
8.6 Analyzable Models Embedded in RPN 897
9 Bibliography 898
References 901
Decidable Signal Processing Dataflow Graphs 905
1 Introduction 905
2 SDF (Synchronous Dataflow) 907
2.1 Static Analysis 908
2.2 Software Synthesis from SDF Graph 910
2.3 Static Scheduling Techniques 912
2.3.1 Scheduling Techniques for Single Processor Implementations 912
2.4 Parallel Scheduling of SDF Graphs 913
2.4.1 Scheduling Objectives 915
2.4.2 Execution Strategies 916
2.4.3 Scheduling of Multiple SDF Graphs 918
2.5 Hardware Synthesis from SDF Graph 919
3 Cyclo-Static Dataflow (CSDF) 919
3.1 Static Analysis 921
3.2 Static Scheduling and Buffer Size Reduction 922
3.3 Hierarchical Composition 923
4 Other Decidable Dataflow Models 924
4.1 FRDF (Fractional Rate Dataflow) 924
4.2 SPDF (Synchronous Piggybacked Dataflow) 928
4.3 SSDF (Scalable SDF) 931
References 933
Systolic Arrays 936
1 Introduction 936
2 Systolic Array Computing Algorithms 938
2.1 Convolution Systolic Array 938
2.2 Linear System Solver Systolic Array 939
2.3 Sorting Systolic Arrays 941
3 Formal Systolic Array Design Methodology 942
3.1 Loop Representation, Regular Iterative Algorithm (RIA), and Index Space 942
3.2 Localized and Single Assignment Algorithm Formulation 944
3.3 Data Dependence and Dependence Graph 945
3.4 Mapping an Algorithm to a Systolic Array 946
3.5 Linear Schedule and Assignment 948
4 Wavefront Array Processors 950
4.1 Synchronous Versus Asynchronous Global On-Chip Communication 950
4.2 Wavefront Array Processor Architecture 951
4.3 Mapping Algorithms to Wavefront Arrays 951
4.4 Example: Wavefront Processing for Matrix Multiplication 952
4.5 Comparison of Wavefront Arrays Against Systolic Arrays 954
5 Hardware Implementations of Systolic Array 955
5.1 Warp and iWARP 955
5.2 SAXPY Matrix-1 956
5.3 Transputer 958
5.4 TMS320C40 959
6 Recent Developments and Real World Applications 960
6.1 Block Motion Estimation 960
6.2 Wireless Communication 963
6.3 Deep Neural Network 966
7 Summary 972
References 973
Compiling for VLIW DSPs 975
1 VLIW DSP Architecture Concepts and Resource Modeling 975
1.1 Resource Modeling 978
1.2 Latency and Register Write Models 979
1.3 Clustered VLIW: Partitioned Register Sets 981
1.4 Control Hazards 982
1.5 Hardware Loops 983
1.6 Examples of VLIW DSP Processors 984
2 Case Study: TI 'C6x DSP Processor Family 984
2.1 TI 'C6201 DSP Processor Architecture 985
2.2 SIMD and Floating-Point Support 989
2.3 Programming Models 990
3 VLIW DSP Code Generation Overview 990
4 Instruction Selection and Resource Allocation 992
5 Cluster Assignment for Clustered VLIW Architectures 994
6 Register Allocation and Generalized Spilling 996
7 Instruction Scheduling 999
7.1 Local Instruction Scheduling 999
7.2 Modulo Scheduling for Loops 1001
7.3 Global Instruction Scheduling 1004
7.4 Generated Instruction Schedulers 1006
8 Integrated Code Generation for VLIW and Clustered VLIW 1007
8.1 Integrated Code Generation at Basic Block Level 1009
8.2 Loop-Level Integrated Code Generation 1010
9 Concluding Remarks 1010
References 1011
Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems 1017
1 Introduction 1017
1.1 MPSoCs and MPSoC Compilers 1017
1.2 Challenges of Building MPSoC Compilers 1019
2 Foundation Elements of MPSoC Compilers 1020
2.1 Programming Models 1021
2.1.1 Mainstream Parallel Programming Models 1023
2.1.2 Dataflow Programming Models 1025
2.2 Platform Description for MPSoC Compilers 1028
2.3 Software Parallelization 1030
2.3.1 Intermediate Representation (IR) 1030
2.3.2 Granularity and Partitioning 1033
2.3.3 Parallelism Patterns 1034
2.3.4 Flow and Dependence Analysis 1035
2.4 Software Distribution 1039
2.4.1 Accelerator Offloading 1039
2.4.2 Mapping and Scheduling of Dataflow MoCs 1040
Scheduling Approaches 1041
Computing a Schedule 1042
Centralized Control Flow 1042
Distributed Control Flow 1043
2.5 Code Generation 1045
3 Case Studies 1046
3.1 Academic Research 1047
3.1.1 Shapes 1047
3.1.2 Daedalus 1048
3.1.3 PREESM 1050
3.2 Industrial Case Studies 1051
3.2.1 TI Keystone Multi-Core DSP Platform 1051
3.2.2 Silexica: SLX Tool Suite 1053
4 Summary 1054
References 1054
Analysis of Finite Word-Length Effects in Fixed-Point Systems 1059
1 Introduction 1059
2 Background 1062
2.1 Floating-Point vs. Fixed-Point Arithmetic 1063
2.2 Finite Word-Length Effects 1063
3 Effect of Signal Quantization 1065
3.1 Error Metrics 1066
3.2 Analytical Evaluation of the Round-Off Noise 1067
3.2.1 Quantization Noise Bounds 1067
3.2.2 Round-Off Noise Power 1073
3.2.3 Probability Density Function 1078
3.3 Simulation-Based and Mixed Approaches 1079
3.3.1 Fixed-Point Simulation-Based Evaluation 1079
3.3.2 Mixed Approach 1081
4 Effect of Coefficient Quantization 1081
4.1 Measurement Parameters 1083
4.2 L2-Sensitivity 1084
4.3 Analytical Approaches to Compute the L2-Sensitivity 1086
5 System Stability Due to Signal Quantization 1087
5.1 Analysis of Limit Cycles in Digital Filters 1088
5.2 Simulation-Based LC Detection Procedures 1089
6 Summary 1090
References 1090
Models of Architecture for DSP Systems 1098
1 Introduction 1098
2 The Context of Models of Architecture 1100
2.1 Models of Architecture in the Y-Chart Approach 1100
2.2 Illustrating Iterative Design Process and Y-Chart on an Example System 1101
2.3 On the Separation Between Application and Architecture Concerns 1104
2.4 Scope of This Chapter 1105
3 The Model of Architecture Concept 1105
3.1 Definition of an MoA 1105
3.2 Example of an MoA: The Linear System-Level Architecture Model (LSLA) 1108
4 Architecture Design Languages and Their Architecture Models 1111
4.1 The AADL Quasi-MoA 1111
4.1.1 The Features of the AADL Quasi-MoA 1112
4.1.2 Combining Application and Architecture in AADL 1114
4.1.3 Conclusions on the AADL Quasi-MoA 1115
4.2 The MCA SHIM Quasi-MoA 1115
4.2.1 Conclusions on MCA SHIM Quasi-MoA 1117
4.3 The UML MARTE Quasi-MoAs 1117
4.3.1 The UML MARTE Quasi-MoAs 1 and 4 1119
4.3.2 The UML MARTE Quasi-MoAs 2 and 3 1120
4.3.3 Conclusions on UML MARTE Quasi-MoAs 1121
4.4 Conclusions on ADL Languages 1122
5 Formal Quasi-MoAs 1122
5.1 The AAA Methodology Quasi-MoA 1122
5.2 The CHARMED Quasi-MoA 1124
5.3 The System-Level Architecture Model (S-LAM) Quasi-MoA 1125
5.4 The MAPS Quasi-MoA 1128
5.5 Evolution of Formal Architecture Models 1129
6 Concluding Remarks on MoA and Quasi-MoAs for DSP Systems 1129
List of Acronyms 1131
References 1132
Optimization of Number Representations 1135
1 Introduction 1135
2 Fixed-Point Data Type and Arithmetic Rules 1136
2.1 Fixed-Point Data Type 1136
2.2 Fixed-Point Arithmetic Rules 1138
2.3 Fixed-Point Conversion Examples 1139
3 Range Estimation for Integer Word-Length Determination 1141
3.1 L1-Norm Based Range Estimation 1141
3.2 Simulation Based Range Estimation 1142
3.3 C++ Class Based Range Estimation Utility 1143
4 Floating-Point to Integer C Code Conversion 1145
4.1 Fixed-Point Arithmetic Rules in C Programs 1145
4.2 Expression Conversion Using Shift Operations 1147
4.3 Integer Code Generation 1148
4.3.1 Shift Optimization 1148
4.4 Implementation Examples 1149
5 Word-Length Optimization 1152
5.1 Finite Word-Length Effects 1153
5.2 Fixed-Point Simulation Using C++ gFix Library 1155
5.3 Word-Length Optimization Method 1156
5.3.1 Signal Grouping 1157
5.3.2 Determination of Sign and Integer Word-Length 1158
5.3.3 Determination of the Minimum Word-Length for Each Group 1158
5.3.4 Determination of the Minimum Hardware Cost Word-Length Vector 1160
5.4 Optimization Example 1161
6 Summary and Related Works 1163
References 1164
Dynamic Dataflow Graphs 1166
1 Motivation for Dynamic DSP-Oriented Dataflow Models 1166
2 Boolean Dataflow 1168
3 CAL 1169
4 Parameterized Dataflow 1171
5 Enable-Invoke Dataflow 1174
6 Scenario-Aware Dataflow 1177
6.1 SADF Graphs 1177
6.2 Analysis 1181
6.3 Synthesis 1183
7 Dynamic Polyhedral Process Networks 1185
7.1 Weakly Dynamic Programs 1185
7.2 Dynamic Loop-Bounds 1190
7.3 Dynamic While-Loops 1191
7.4 Parameterized Polyhedral Process Networks 1195
8 Summary 1199
References 1199
Publication date (per publisher) | 13.10.2018
Additional information | XIII, 1210 p., 611 illus., 225 illus. in color
Place of publication | Cham
Language | English
Subject areas | Mathematics / Computer Science ► Computer Science; Technology ► Electrical Engineering / Energy Technology; Technology ► Communications Engineering
Keywords | Astronomical Signal Processing • audio processing • brain-machine interfaces • Compilers • Dataflow Graphs • Defined Radio • digital signal processing • DSP • Embedded Control • field-programmable gate arrays • FPGAs • Low Power • Medical Imaging • Multiprocessor Modeling • Reconfigurable Video Coding • Smart Camera • System-on-Chip • wireless communication
ISBN-10 | 3-319-91734-X / 331991734X
ISBN-13 | 978-3-319-91734-4 / 9783319917344
Size: 57.8 MB
DRM: digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to its source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but it is only of limited use on small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/tablet: Whether Apple or Android, you can read this eBook. You will need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: online reading
In addition to downloading it, you can also read this eBook online in a web browser.
Buying eBooks from abroad
For tax law reasons we can only sell eBooks within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.