Handbook of Signal Processing Systems (eBook)
XXXVIII, 1117 pages
Springer US (publisher)
978-1-4419-6345-1 (ISBN)
Shuvra S. Bhattacharyya is a Professor in the Department of Electrical and Computer Engineering at the University of Maryland, College Park. He holds a joint appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS). Dr. Bhattacharyya is coauthor or coeditor of five books and the author or coauthor of more than 150 refereed technical articles. He is director of the Maryland DSPCAD Research Group and serves as associate editor for the EURASIP Journal on Embedded Systems, the International Journal of Embedded Systems, and the Journal of Signal Processing Systems. He served as Chair of the IEEE Signal Processing Society Technical Committee on Design and Implementation of Signal Processing Systems (2008-2009), has been a Nokia Distinguished Lecturer (Finland, 2006) and a Fulbright Senior Specialist (Germany, 2005; Austria, 2009), and received the NSF CAREER Award (1998). He has received Best Paper Awards at the International Workshop on Systems, Architectures, Modeling, and Simulation (2008) and the IEEE Workshop on Signal Processing Systems (2007).
It gives me immense pleasure to introduce this timely handbook to the research and development communities in the field of signal processing systems (SPS). This is the first of its kind and represents state-of-the-art coverage of research in this field. The driving force behind information technologies (IT) hinges critically upon major advances in both component integration and system integration. The major breakthrough for the former is undoubtedly the invention of the integrated circuit (IC) in the 1950s by Jack S. Kilby, the Nobel Laureate in Physics in 2000. In an integrated circuit, all components are made of the same semiconductor material. Beginning with the pocket calculator in 1964, increasingly complex applications have followed. In fact, processing gates and memory storage on a chip have since grown at an exponential rate, following Moore's Law. (Moore himself admitted that Moore's Law had turned out to be more accurate, longer lasting, and deeper in impact than he ever imagined.) With greater device integration, various signal processing systems have been realized for many killer IT applications. Further breakthroughs in computer science and Internet technologies have also catalyzed large-scale system integration. All of this has led to today's IT revolution, which has had a profound impact on our lifestyle and on the overall prospects of humanity. (It is hard to imagine life today without mobile phones or the Internet!) The success of SPS requires a well-concerted, integrated approach from multiple disciplines, such as device, design, and application.
Handbook of Signal Processing Systems 3
Foreword 7
Preface 9
Contents 11
List of Contributors 31
Part I Applications Managing Editor: Shuvra Bhattacharyya 39
Signal Processing for Control 40
1 Introduction 40
2 Brief introduction to control 41
2.1 Stability 45
2.2 Sensitivity to disturbance 46
2.3 Sensitivity conservation 46
3 Signal processing in control 47
3.1 Simple cases 49
3.2 Demanding cases 52
3.3 Exemplary case 56
4 Conclusions 57
5 Further reading 57
References 58
Digital Signal Processing in Home Entertainment 60
1 Introduction 60
1.1 The Personal Computer and the Digital Home 61
2 The Digital Photo Pipeline 62
2.1 Image Capture and Compression 62
2.2 Focus and exposure control 63
2.3 Pre-processing of CCD data 63
2.4 White Balance 64
2.5 Demosaicing 64
2.6 Color transformations 64
2.7 Post Processing 64
2.8 Preview display 64
2.9 Compression and storage 64
2.10 Photo Post-Processing 65
2.11 Storage and Search 65
2.12 Digital Display 66
3 The Digital Music Pipeline 66
3.1 Compression 66
3.2 Broadcast Radio 67
3.3 Audio Processing 68
4 The Digital Video Pipeline 68
4.1 Digital Video Broadcasting 70
4.2 New Portable Digital Media 70
4.3 Digital Video Recording 70
4.4 Digital Video Processing 72
4.5 Digital Video Playback and Display 72
5 Sharing Digital Media in the Home 72
5.1 DLNA 73
5.2 Digital Media Server 73
5.3 Digital Media Player 73
6 Copy Protection and DRM 74
6.1 Microsoft Windows DRM 75
7 Summary and Conclusions 77
8 To Probe Further 78
References 78
MPEG Reconfigurable Video Coding 79
1 Introduction 80
2 Requirements and rationale of the MPEG RVC framework 80
2.1 Limits of previous monolithic specifications 82
2.2 Reconfigurable Video Coding specification requirements 83
3 Description of the standard or normative components of the framework 84
3.1 The toolbox library 84
3.2 The CAL Actor Language 85
3.2.1 Basic Constructs 86
3.2.2 Priorities and State Machines 87
3.2.3 CAL subset language for RVC 88
3.3 FU Network language for the codec configurations 89
3.4 Bitstream syntax specification language BSDL 91
3.5 Instantiation of the ADM 93
3.6 Case study of new and existing codec configurations 93
3.6.1 Commonalities 93
3.6.2 MPEG-4 Simple Profile (SP) Decoder 94
3.6.3 MPEG-4 AVC Decoder 95
4 The procedures and tools supporting decoder implementations 97
4.1 OpenDF supporting framework 97
4.2 CAL2HDL synthesis 98
4.3 CAL2C synthesis 100
5 Conclusion 101
References 102
Signal Processing for High-Speed Links 104
1 Introduction 104
2 System Models 107
2.1 The Transmitter 107
2.2 The Channel 109
2.2.1 Back-plane Channel Model 109
2.2.2 Optical Fiber Channel Model 111
2.3 Receiver Models 113
2.3.1 O/E Detectors 114
2.3.2 Analog Front-End (AFE) 115
2.3.3 Analog-to-Digital Converter (ADC) 116
2.3.4 Clock-Recovery Unit 116
2.4 Noise Models 118
2.4.1 Back-plane Noise Models 118
2.4.2 Optical Fiber Noise Models 119
3 Signal Processing Methods 120
3.1 Modulation Formats 120
3.2 Adaptive Equalization 120
3.3 Equalization of Nonlinear Channels 122
3.4 Maximum Likelihood Sequence Estimation (MLSE) 123
4 Case Study of an OC-192 EDC-based Receiver 126
4.1 MLSE Receiver Architecture 126
4.2 MLSE Equalizer Algorithm and VLSI Architecture 127
4.3 Measured Results 129
5 Advanced Techniques 131
5.1 FEC for Low-power Back-Plane Links 131
5.2 Coherent Detection in Optical Links 133
6 Concluding Remarks 134
7 Acknowledgements 135
References 135
Video Compression 137
1 Evolution of Video Coding Standards 137
2 Basic Components of Video Coding Systems 139
2.1 Color Processing 139
2.2 Prediction 140
2.2.1 Temporal Prediction 141
2.2.2 Spatial Prediction 142
2.2.3 Coding Structure 142
2.3 Transform and Quantization 144
2.4 Entropy Coding 145
3 Emergent Video Applications and Corresponding Coding Systems 147
3.1 HDTV Applications and H.264/AVC 147
3.2 Streaming and Surveillance Applications and Scalable Video Coding 149
3.3 3D Video Applications and Multiview Video Coding 151
4 Conclusion and Future Directions 153
References 154
Low-power Wireless Sensor Network Platforms 156
1 Characteristics of Low-power WSNs 156
1.1 Quality of Service Requirements 158
1.2 Services for Signal Processing Application 159
2 Key Standards and Industry Specifications 160
2.1 IEEE 1451 160
2.2 WSN Communication Standards 160
3 Software Components 162
3.1 Sensor Operating Systems 163
3.2 Middlewares 163
4 Hardware Platforms and Components 164
4.1 Communication subsystem 164
4.2 Computing subsystem 165
4.3 Sensing subsystem 166
4.4 Power subsystem 167
4.5 Existing Platforms 168
5 Medium Access Control Features and Services 169
5.1 MAC Technologies 169
5.2 Unsynchronized Low Duty-Cycle MAC Protocols 170
5.3 Synchronized Low Duty-Cycle MAC Protocols 171
5.4 Performance 173
6 Routing and Transport 176
6.1 Services 176
6.2 Routing Paradigms and Technologies 177
6.2.1 Node-centric Routing 177
6.2.3 Location-based Routing 178
6.2.4 Multipath Routing 179
6.2.5 Cost-based Routing 179
6.3 Hybrid Transport and Routing Protocols 180
7 Embedded WSN Services 180
7.1 Localization 181
7.2 Synchronization 182
8 Experiments 184
8.1 Low-Energy TUTWSN 185
8.1.1 Scalability 185
8.1.2 Power Consumption 186
8.1.3 Availability and End-to-end Reliability 186
8.2 Low-Latency TUTWSN 187
8.2.1 Power Consumption and Network Lifetime 188
8.2.2 Delay and Throughput 188
9 Summary 189
References 190
Signal Processing for Cryptography and Security Applications 194
1 Introduction 194
2 Efficient Implementation 195
2.1 Secret-Key Algorithms and Implementations 195
2.1.1 Cryptographic Hash Functions and Implementations 195
2.2 Public-Key Algorithms and Implementations 197
2.2.1 Efficient Modular Multiplication 197
2.2.2 Implementations of Public-Key algorithms 199
2.2.3 Other Ideas for Arithmetic Borrowed from Signal Processing 200
2.3 Architecture 201
2.3.1 Datapath 201
2.3.2 Interface 202
2.3.3 Programmability and Security 202
2.3.4 Low Power Architecture 203
3 Secure Implementations: Side Channel Attacks 203
3.1 DSP Techniques Used in Side Channel Analysis 204
4 Working with Fuzzy Secrets 205
4.1 Fuzzy Secrets: Properties and Applications in Cryptography 205
4.2 Generating Cryptographic Keys from Fuzzy Secrets 206
5 Conclusion and Future Work 207
References 207
High-Energy Physics 211
1 Introduction to High-Energy Physics 211
2 The Compact Muon Solenoid Experiment 214
2.1 Physics Research at the CMS Experiment 215
The Higgs Boson 215
Supersymmetry 215
Extra Dimensions 216
Quark Compositeness 216
2.2 Sensors and Triggering Systems 216
2.3 The Super Large Hadron Collider Upgrade 220
3 Characteristics of High-Energy Physics Applications 221
3.1 Technical Characteristics 221
Number of Devices 221
Data Bandwidth 222
Interconnection 222
End-to-End Latency 223
3.2 Characteristics of the Development Process 224
Wide-Scale Collaboration and Disparate Design Styles 225
Frequent Adjustments and Alterations 225
Periodic Upgrades 225
Long Design Times 226
3.3 Evolving Specifications 226
3.4 Functional Verification and Testing 226
3.5 Unit Modification 227
4 Techniques for Managing Design Challenges 228
4.1 Managing Hardware Cost and Complexity 228
4.2 Managing Variable Latency and Variable Resource Use 230
4.3 Scalable and Modular Designs 233
4.4 Dataflow-Based Tools 234
4.5 Language-Independent Functional Testing 235
4.6 Efficient Design Partitioning 236
5 Example Problems from the CMS Level-1 Trigger 236
5.1 Particle Identification 237
5.2 Jet Reconstruction 238
The Jet Overlap Problem 239
6 Conclusion 240
References 241
Medical Image Processing 244
1 Introduction 244
2 Medical Imaging Modalities 245
2.1 Radiography 245
2.2 Computed Tomography 245
2.3 Nuclear Medicine 246
2.4 Magnetic Resonance Imaging 246
2.5 Ultrasound 247
3 Medical Applications 248
3.1 Imaging Pipeline 248
4 Image Reconstruction 250
4.1 CT Reconstruction 251
4.2 PET/SPECT Reconstruction 254
4.3 Reconstruction in MRI 255
4.4 Ultrasound Reconstruction 255
5 Image Preprocessing 255
6 Image Segmentation 258
6.1 Classification of Segmentation Techniques 258
6.2 Threshold-based 259
6.3 Deformable Model-based 261
7 Image Registration 263
7.1 Geometry-based 264
7.2 Intensity-based 264
7.2.1 Similarity 264
7.3 Transformation Models 265
7.3.1 Optimization Algorithms 266
8 Image Visualization 266
8.1 Surface Rendering 267
8.2 Volume Rendering 267
9 Medical Image Processing Platforms 268
9.1 Computing Requirements 268
9.2 Platforms For Accelerated Implementation 269
10 Summary 270
11 Further Reading 271
References 271
Signal Processing for Audio HCI 274
1 Introduction 274
2 Microphone Arrays 275
2.1 Source Localization 275
2.2 Beamforming 277
2.3 Practical Issues 277
3 Audio Reproduction 278
3.1 Head-related transfer function 279
3.2 Reverberation Simulation 279
3.3 User Motion Tracking 281
3.4 Signal Processing Pipeline 281
4 Sample Application: Fast HRTF Measurement System 283
5 Sample Application: Auditory Scene Capture and Reproduction 287
6 Sample Application: Acoustic Field Visualization 291
References 294
Distributed Smart Cameras and Distributed Computer Vision 297
1 Introduction 297
2 Approaches to Computer Vision 298
3 Early Work in Distributed Smart Cameras 300
4 Approaches to Distributed Computer Vision 300
4.1 Challenges 301
4.2 Platform Architectures 303
4.3 Mid-Level Fusion for Gesture Recognition 305
4.4 Target-Level Fusion for Tracking 305
4.5 Calibration 306
4.6 Sparse Camera Networks 307
5 Summary 308
References 309
Part II Architectures Managing Editor: Jarmo Takala 311
Arithmetic 312
1 Number representation 312
1.1 Binary representation 313
1.2 Two’s complement representation 313
1.3 Redundant representations 313
1.3.1 Signed-digit representation 314
1.3.2 Carry-save representation 314
1.4 Shifting and increasing the wordlength 315
1.5 Negation 316
1.6 Finite-wordlength effects 316
1.6.1 Overflow characteristics 317
1.6.2 Truncation 318
1.6.3 Rounding 318
1.6.4 Magnitude truncation 318
1.6.5 Quantization of products 319
2 Addition 319
2.1 Ripple-carry addition 320
2.2 Carry-lookahead addition 320
2.3 Carry-select and conditional sum addition 324
2.4 Multi-operand addition 325
3 Multiplication 327
3.1 Partial product generation 327
3.1.1 Avoiding sign-extension 328
3.1.2 Reducing the number of rows 328
3.1.3 Reducing the number of columns 330
3.2 Summation structures 330
3.2.1 Sequential accumulation 330
3.2.2 Array accumulation 332
3.2.3 Tree accumulation 332
3.3 Vector merging adder 334
3.4 Multiply-accumulate 334
3.5 Multiplication by a constant 334
3.6 Distributed arithmetic 337
3.6.1 Reducing the memory size 338
3.6.2 Complex Multipliers 340
4 Division 341
4.1 Restoring and nonrestoring division 342
4.2 SRT division 343
4.3 Speeding up division 343
4.4 Square root extraction 344
5 Floating-point representation 344
5.1 Normalized representations 345
5.2 IEEE 754 345
5.3 Addition and subtraction 346
5.4 Multiplication 347
5.5 Quantization error 347
6 Computation of elementary functions 348
6.1 CORDIC 348
6.2 Polynomial and piecewise polynomial approximations 350
6.3 Table-based methods 352
7 Further reading 353
References 354
Application-Specific Accelerators for Communications 357
1 Introduction 358
1.1 Coarse Grain Versus Fine Grain Accelerator Architectures 359
1.2 Hardware/Software Workload Partition Criteria 361
2 Hardware Accelerators for Communications 363
2.1 MIMO Channel Equalization Accelerator 364
2.2 MIMO Detection Accelerators 366
2.2.1 Maximum-Likelihood (ML) Detection 367
2.2.2 Sphere Detection 367
2.2.3 Computational Complexity of Sphere Detection 369
2.2.4 Depth-First Sphere Detector Architecture 371
2.2.5 K-Best Detector Architecture 372
2.3 Channel Decoding Accelerators 372
2.3.1 Viterbi Decoder Accelerator Architecture 373
2.3.2 Turbo Decoder Accelerator Architecture 377
2.3.3 LDPC Decoder Accelerator Architecture 383
3 Summary 386
References 388
FPGA-based DSP 391
1 Introduction 391
2 FPGA Technology 393
2.1 FPGA Fundamentals 393
2.2 The Evolution of FPGA 394
2.2.1 Age 1: Custom Glue Logic 394
2.2.2 Age 2: Mid-density Logic 395
2.2.3 Age 3: Heterogeneous System-on-Programmable-Chip 395
3 State-of-the-art FPGA Technology for DSP Applications 396
3.1 FPGA Programmable Logic Technology 397
3.1.1 Virtex®-5 Configurable Logic Block 397
3.1.2 Stratix®-IV Adaptive Logic Module 398
3.2 FPGA DSP-datapath for DSP 399
3.2.1 Virtex®-5 DSP48 Slices 400
3.2.2 Stratix®-IV DSP Module 401
3.3 FPGA Memory Hierarchy in DSP System Implementation 402
3.3.1 Virtex® Memory Hierarchy 402
3.3.2 Stratix® Memory Hierarchy 403
3.4 Discussion 404
4 Core-Based Implementation of FPGA DSP Systems 405
4.1 Strategy Overview 405
4.2 Core-based Implementation of DSP Applications from Simulink 406
4.3 Other Core-based Development Approaches 407
5 Programmable Processor-Based FPGA DSP System Design 408
5.1 Application Specific Microprocessor Technology for FPGA DSP 409
5.1.1 Hardcore FPGA-based Processors for DSP Systems 409
5.1.2 MicroBlaze®-Centric Reconfigurable Processor Design 410
5.1.3 Nios-Centric Reconfigurable Processor Design 411
5.2 Reconfigurable Processor Design for FPGA 412
5.2.1 Molen 412
5.2.2 VIPERS 413
5.3 Discussion 414
6 System Level Design for FPGA DSP 415
6.1 Daedalus 415
6.2 Compaan/LAURA 416
6.3 Koski 416
6.4 Requirements for FPGA DSP System Design 417
7 Summary 418
References 418
General-Purpose DSP Processors 421
1 Introduction 421
2 Arithmetic Type 423
3 Data Path 423
3.1 Fixed-Point Data Paths 423
3.1.1 Multiplier and ALU 424
3.1.2 Registers 425
3.1.3 Shifters 426
3.1.4 Overflow Management 428
3.2 Floating-Point Data Paths 430
3.3 Special Function Units 431
4 Memory Architecture 432
5 Address Generation 437
6 Program Control 439
7 Conclusions 439
8 Further Reading 440
References 440
Application Specific Instruction Set DSP Processors 442
1 Introduction 442
1.1 ASIP Definition 442
1.2 Difference Between ASIP and General CPU 443
2 DSP ASIP Design Flow 444
3 Architecture and Instruction Set Design 447
3.1 Code Profiling 447
3.2 Using an Available Architecture 449
3.3 Generating a Custom Architecture 452
4 Instruction Set Design 454
4.1 Instruction Set Specification 455
4.2 Instructions for Functional Acceleration 456
4.3 Instruction Coding 457
4.4 Instruction and Architecture Optimization 458
4.5 Design Automation of an Instruction Set Architecture 460
5 ASIP Instruction Set for Radio Baseband Processors 461
6 ASIP Instruction Set for Video and Image Compression 464
7 Programming Toolchain and Benchmarking 466
8 ASIP Firmware Design and Benchmarking 467
8.1 Firmware Design 468
8.2 Benchmarking and Instruction Usage Profiling 469
9 Microarchitecture Design for ASIP 471
10 Further Reading 473
References 473
Coarse-Grained Reconfigurable Array Architectures 475
1 Application Domain of Coarse-Grained Reconfigurable Arrays 475
2 CGRA Basics 477
3 CGRA Design Space 479
3.1 Tight versus Loose Coupling 480
3.2 CGRA Control 482
3.2.1 Reconfigurability 482
3.2.2 Scheduling and Issuing 484
3.2.3 Thread-level and Data-level Parallelism 485
3.3 Interconnects and Register Files 487
3.3.1 Connections 487
3.3.2 Register Files 487
3.3.3 Predicates, Events and Tokens 488
3.4 Computational Resources 489
3.5 Memory Hierarchies 491
3.6 Compiler Support 492
4 Case Study: ADRES 494
4.1 Mapping Loops onto ADRES CGRAs 495
4.1.1 Modulo Scheduling Algorithms for CGRAs 495
4.1.2 Loop Transformations 496
Loop Unrolling 496
Loop Fusion, Loop Interchange, Loop Combination and Data Context Switching 497
Live-in Variables 497
Predication 498
Kernel-Only Loops 499
4.1.3 Data Flow Manipulations 500
4.2 ADRES Design Space Exploration 501
4.2.1 Example ADRES Instances 502
4.2.2 Design Space Exploration Example 505
5 Conclusions 506
References 507
Multi-core Systems on Chip 511
1 Introduction 511
2 Analytical Model 517
2.1 Performance Comparison 518
2.2 Energy Comparison 522
2.3 DSP Applications on a MPSoC Organization 523
3 MPSoC Classification 526
3.1 Architecture View Point 527
3.2 Organization View Point 529
4 Successful MPSoC Designs 530
4.1 OMAP Platform 530
4.2 Samsung S3C6410/S5PC100 532
4.3 Other MPSoCs 534
5 Open Problems 536
5.1 Interconnection Mechanism 536
5.2 MPSoC Programming Models 537
6 Future Research Challenges 538
References 539
DSP Systems using Three-Dimensional (3D) Integration Technology 541
1 Introduction 541
2 Overview of 3D Integration Technology 543
3 3D Logic-Memory Integration Paradigm for DSP Systems: Rationale and Opportunities 544
4 3D DRAM Design 547
5 Case Study I: 3D Integrated VLIW Digital Signal Processor 549
5.1 3D DRAM Stacking in VLIW Digital Signal Processors 549
5.2 3D DRAM L2 Cache 550
5.3 Performance Evaluation 552
6 Case Study II: 3D Integrated H.264 Video Encoder 553
6.1 H.264 Video Encoding Basics 554
6.2 3D DRAM Stacking in H.264 Video Encoder 556
6.2.1 Architecting DRAM for Image Storage 556
6.2.2 Motion Estimation Memory Access 559
6.3 Performance Evaluation 561
7 Conclusion 564
References 564
Mixed Signal Techniques 568
1 Introduction 568
2 General Properties of Data Converters 569
2.1 Sample and Hold Circuits 570
2.2 Quantization 572
2.3 Performance Specifications 573
2.3.1 Static Performance 573
2.3.2 Dynamic Performance 574
3 Analog to Digital Converters 574
3.1 Nyquist-Rate A/D Converters 575
3.1.1 Integrating Converters 575
3.1.2 Successive-Approximation Converters 576
3.1.3 Flash Converters 576
3.1.4 Sub-Ranging Converters 577
3.2 Oversampled A/D Converters 578
3.2.1 Benefits of Oversampling 578
3.2.2 First-Order Sigma-Delta Modulator 579
3.2.3 Higher-Order Sigma-Delta Modulators 580
3.2.4 Sigma-Delta A/D Converters 581
4 Digital to Analog Converters 582
4.1 Nyquist-rate D/A Converters 583
4.1.1 R-2R D/A Converters 583
4.1.2 Current-Steering D/A Converters 583
4.1.3 Thermometer-Code D/A Converters 583
4.1.4 Segmented D/A Converters 584
4.2 Oversampled D/A Converters 585
5 Switched-Capacitor Circuits 586
5.1 Amplifiers 587
5.2 Switched Capacitors 587
5.3 Integrators 588
5.4 First-Order Filter 589
5.5 Second-Order Filter 590
5.6 Differential SC Circuits 591
5.7 Performance Limitations 592
6 Frequency Synthesis 592
6.1 Phase-Locked Loops 592
6.2 Fractional-N Frequency Synthesis 593
6.3 Delay-Locked Loops 594
6.4 Direct Digital Synthesis 595
References 596
Part III Programming and Simulation Tools Managing Editor: Rainer Leupers 597
C Compilers and Code Optimization for DSPs 598
1 Introduction 598
2 C Compilers for DSPs 599
2.1 Industrial Context 599
2.2 Compiler and Language Extensions 600
2.2.1 Compiler-Known Functions 600
2.2.2 DSP-C and Embedded C 601
2.3 Compiler Benchmarks 601
3 Code Optimizations 602
3.1 Address Code Optimizations 603
3.1.1 Pointer-to-Array Conversion 603
3.1.2 Offset Assignment 605
3.2 Control Flow Optimizations 605
3.2.1 Zero Overhead Loops 605
3.2.2 If-Conversion 607
3.3 Loop Optimizations 608
3.3.1 Loop Unrolling 609
3.3.2 Loop Tiling 610
3.3.3 Loop Reversal 610
3.3.4 Loop Vectorization 612
3.3.5 Loop Invariant Code Motion 613
3.3.6 Software Pipelining 614
3.4 Memory Optimizations 615
3.4.1 Dual Memory Bank Assignment 615
3.5 Optimizations for Code Size 617
3.5.1 Generic Optimizations for Code Compaction 618
3.5.2 Coalescing of Repeated Instruction Sequences 620
References 621
Compiling for VLIW DSPs 625
1 VLIW DSP architecture concepts and resource modeling 625
Resource modeling 628
Latency and register write models 629
Clustered VLIW: Partitioned Register Sets 631
Control Hazards 631
Hardware Loops 632
Examples of VLIW DSP Processors 632
2 Case study: TI ’C62x DSP processor family 633
3 VLIW DSP code generation overview 637
4 Instruction selection and resource allocation 639
5 Cluster assignment for clustered VLIW architectures 641
6 Register allocation and generalized spilling 642
7 Instruction scheduling 644
7.1 Local instruction scheduling 645
7.2 Modulo scheduling for loops 646
7.3 Global instruction scheduling 649
7.4 Generated instruction schedulers 651
8 Integrated code generation for VLIW and clustered VLIW 652
9 Concluding remarks 655
Trademarks and Acknowledgements 655
References 656
Software Compilation Techniques for MPSoCs 661
1 Introduction 662
1.1 MPSoCs and MPSoC Compilers 662
1.2 Challenges of Building MPSoC Compilers 663
2 Foundation Elements of MPSoC Compilers 664
2.1 Programming Models 665
2.1.1 Parallel Programming Models 667
2.1.2 Embedded Parallel Programming Models 668
2.2 Granularity, Parallelism and Intermediate Representation 671
2.2.1 Granularity and Partitioning 672
2.2.2 Parallelism 674
2.2.3 Flow and Dependence Analysis 675
Summary 678
2.3 Platform Description for MPSoC compilers 678
Summary 680
2.4 Mapping and Scheduling 681
2.4.1 Scheduling Approaches 682
2.4.2 Computing a Schedule 683
Centralized control flow: 683
Distributed control flow: 684
Summary 686
2.5 Code Generation 686
Summary 687
3 Case Studies 688
SHAPES 688
TI OMAP 690
HOPES 692
Daedalus 693
TCT 694
MPA 695
CoMPSoC/C-HEAP 695
MAPS 696
Summary 696
References 697
DSP Instruction Set Simulation 701
1 Introduction/Overview 701
2 Interpretive Simulation 703
2.1 The Classic Interpreter 704
2.2 Threaded Code 705
2.3 Interpreter Optimizations 707
3 Compiled Simulation 708
3.1 Basic Principles 708
3.2 Simulation of Pipelined Processors 710
3.3 Static Binary Translation 711
4 Dynamically Compiled Simulation 712
4.1 Basic Simulation Principles 712
4.2 Optimizations 712
5 Hardware Supported Simulation 713
5.1 Interfacing Simulators with Hardware 713
5.2 Simulation in Hardware 714
6 Generation of Instruction Set Simulators 716
6.1 Processor Description Languages 716
6.2 Retargetable Simulation 718
7 Related Work 719
Interpretive Simulation 719
Compiled Simulation 720
Dynamically Compiled Simulation 720
Retargetable Simulation 721
Hardware Supported Simulation 723
References 724
Optimization of Number Representations 728
1 Introduction 728
2 Fixed-point Data Type and Arithmetic Rules 729
2.1 Fixed-Point Data Type 729
2.2 Fixed-point Arithmetic Rules 731
2.3 Fixed-point Conversion Examples 732
3 Range Estimation for Integer Word-length Determination 734
3.1 L1-norm based Range Estimation 735
3.2 Simulation based Range Estimation 735
3.3 C++ Class based Range Estimation Utility 736
4 Floating-point to Integer C Code Conversion 738
4.1 Fixed-Point Arithmetic Rules in C Programs 738
4.2 Expression Conversion Using Shift Operations 740
4.3 Integer Code Generation 741
4.3.1 Shift Optimization 741
4.4 Implementation Examples 742
5 Word-length Optimization 747
5.1 Finite Word-length Effects 747
5.2 Fixed-point Simulation using C++ gFix Library 749
5.3 Word-length Optimization Method 750
5.3.1 Signal Grouping 751
5.3.2 Determination of Sign and Integer Word-Length 752
5.3.3 Determination of the Minimum Word-Length for Each Group 753
5.3.4 Determination of the Minimum Hardware Cost Word-Length Vector 753
5.4 Optimization Example 756
6 Summary and Related Works 757
References 758
Intermediate Representations for Simulation and Implementation 760
1 Background 760
2 Untimed Representations 763
2.1 System Property Intervals 763
2.1.1 Specification of Latency Constraints 766
2.2 Functions Driven by State Machines 767
2.2.1 Examples of Representation of Different Models of Computation 769
2.2.2 Representation of Schedules 772
3 Timed Representations 773
3.1 Job Configuration Networks 773
3.2 IPC Graphs 774
3.2.1 Timing Analysis of IPC Graphs 776
3.3 Timed Configuration Graphs 779
3.4 Set of Models 779
3.4.1 Modeling a Tiled 16-Core Processor 780
3.5 Construction of Timed Configuration Graphs 783
3.5.1 Abstract Interpretation of TCFGs 784
4 Summary 786
References 787
Embedded C for Digital Signal Processing 789
1 Introduction 789
2 Typical DSP Architecture 790
3 Fixed Point Types 791
3.1 Fixed Point Sizes 792
3.2 Fixed Point Constants 793
3.3 Fixed Point Conversions And Arithmetical Operations 794
3.4 Fixed Point Support Functions 797
4 Memory Spaces 798
5 Applications 800
5.1 FIR Filter 800
5.2 Sine Function in Fixed Point 801
6 Named Registers 802
7 Hardware I/O Addressing 803
8 History and Future of Embedded C 805
9 Conclusions 806
References 807
Part IV Design Methods Managing Editor: Ed Deprettere 808
Signal Flow Graphs and Data Flow Graphs 809
1 Introduction 809
2 Signal Flow Graphs 810
2.1 Notation 810
2.2 Transfer Function Derivation of SFG 811
2.2.1 Mason’s Gain Formula 811
2.2.2 Equations-solving Based Transfer Function Derivation 813
3 Data Flow Graphs 814
3.1 Notation 814
3.2 Synchronous Data Flow Graph 815
3.3 Construct an Equivalent Single-rate DFG from the Multi-rate DFG 815
3.4 Equivalent Data Flow Graphs 817
3.4.1 Retiming 817
3.4.2 Pipelining 818
4 Applications to Hardware Design 820
4.1 Unfolding 821
4.1.1 The DFG Based Unfolding 821
4.1.2 Applications to Parallel Processing 823
Word-level Parallel Processing 823
Bit-level Parallel Processing 823
4.1.3 Infinite Unfolding of DFG 825
4.2 Folding 826
5 Applications to Software Design 829
5.1 Intra-iteration and Inter-iteration Precedence Constraints 829
5.2 Definition of Critical Path and Iteration Bound 829
5.3 Scheduling 830
5.3.1 The Scheduling Algorithm 831
5.3.2 Minimum Cost Solution 832
5.3.3 Scheduling of Edges With Delays 833
6 Conclusions 834
References 834
Systolic Arrays 835
1 Introduction 835
2 Systolic Array Computing Algorithms 837
2.1 Convolution Systolic Array 837
2.2 Linear System Solver Systolic Array 838
2.3 Sorting Systolic Arrays 840
3 Formal Systolic Array Design Methodology 841
3.1 Loop Representation, Regular Iterative Algorithm (RIA), and Index Space 841
3.2 Localized and Single Assignment Algorithm Formulation 842
3.3 Data Dependence and Dependence Graph 844
3.4 Mapping an Algorithm to a Systolic Array 845
3.5 Linear Schedule and Assignment 847
4 Wavefront Array Processors 849
4.1 Synchronous versus Asynchronous Global On-chip Communication 849
4.2 Wavefront Array Processor Architecture 850
4.3 Mapping Algorithms to Wavefront Arrays 850
4.4 Example: Wavefront Processing for Matrix Multiplication 851
4.5 Comparison of Wavefront Arrays against Systolic Arrays 853
5 Hardware Implementations of Systolic Array 853
5.1 Warp and iWARP 853
5.2 SAXPY Matrix-1 855
5.3 Transputer 857
5.4 TMS320C40 858
6 Recent Developments and Real World Applications 859
6.1 Block Motion Estimation 859
6.2 Wireless Communication 862
7 Conclusions 865
References 866
Decidable Signal Processing Dataflow Graphs: Synchronous and Cyclo-Static Dataflow Graphs 868
1 Introduction 868
2 SDF (Synchronous Dataflow) 869
2.1 Static Analysis 870
2.2 Software Synthesis from SDF graph 872
2.3 Static Scheduling Techniques 875
2.3.1 Scheduling Techniques for Single Processor Implementations 875
2.3.2 Scheduling Techniques for Multiprocessor Implementations 875
3 Cyclo-Static Data Flow (CSDF) 877
3.1 Static Analysis 878
3.2 Static Scheduling and Buffer Size Reduction 879
3.3 Hierarchical Composition 880
4 Other Decidable Dataflow Models 881
4.1 FRDF (Fractional Rate Data Flow) 881
4.2 SPDF (Synchronous Piggybacked Data Flow) 885
4.3 SSDF (Scalable SDF) 888
References 890
Mapping Decidable Signal Processing Graphs into FPGA Implementations 892
1 Introduction 892
1.1 Chapter breakdown 893
2 FPGA hardware platforms 894
2.1 FPGA logic blocks 895
2.2 FPGA DSP functionality 896
2.3 FPGA memory organization 897
2.4 FPGA design strategy 898
3 Circuit architecture derivation 899
3.1 Basic mapping of DSP functionality to FPGA 900
3.2 Retiming 901
3.3 Cut-set theorem 902
3.4 Application to recursive structures 905
4 Circuit architecture optimization 907
4.1 Folding 907
4.2 Unfolding 909
4.3 Adaptive LMS filter example 909
4.4 Towards FPGA-based IP core generation 911
5 Conclusions 912
5.1 Incorporation of FPGA cores into data flow based systems 913
5.2 Application of techniques for low power FPGA 913
References 914
Dynamic and Multidimensional Dataflow Graphs 916
1 Multidimensional synchronous dataflow graphs (MDSDF) 916
1.1 Basics 917
1.2 Arbitrary sampling 918
1.3 A complete example 921
1.4 Initial tokens on arcs 922
2 Windowed synchronous/cyclo-static dataflow 923
2.1 Windowed synchronous/cyclo-static data flow graphs 924
2.2 Balance equations 926
2.2.1 A complete example 928
3 Motivation for dynamic DSP-oriented dataflow models 929
4 Boolean dataflow 931
5 The Stream-based Function Model 933
5.1 The concern for an actor model 933
5.2 The stream-based function actor model 934
5.3 The formal model 935
5.4 Communication and scheduling 936
5.5 Composition of SBF actors 937
6 CAL 938
7 Parameterized dataflow 940
8 Enable-invoke dataflow 943
9 Summary 945
References 946
Polyhedral Process Networks 948
1 Introduction 948
2 Overview 949
3 Polyhedral Concepts 951
3.1 Polyhedral Sets and Relations 951
3.2 Lexicographic Order 954
3.3 Polyhedral Models 954
3.4 Piecewise Quasi-Polynomials 956
3.5 Polyhedral Process Networks 956
4 Polyhedral Analysis Tools 957
4.1 Parametric Integer Programming 958
4.2 Emptiness Check 959
4.3 Parametric Counting 959
4.4 Computing Parametric Upper Bounds 960
4.5 Polyhedral Scanning 961
5 Dataflow Analysis 963
5.1 Standard Dataflow Analysis 963
5.2 Reuse 966
6 Channel Types 967
6.1 FIFOs and Reordering Channels 967
6.2 Multiplicity 968
6.3 Internal Channels 970
6.3.1 Registers 970
6.3.2 Shift Registers 971
7 Scheduling 971
7.1 Two Processes 972
7.2 More than Two Processes 973
7.3 Blocking Writes 974
7.4 Linear Transformations 976
8 Buffer Size Computation 978
8.1 FIFOs 978
8.2 Reordering Channels 979
8.3 Accuracy of the Buffer Sizes 981
9 Summary 981
References 981
Kahn Process Networks and a Reactive Extension 983
1 Introduction 984
2 Denotational Semantics 987
3 Operational Semantics 989
3.1 Labeled transition systems 990
3.2 Operational Semantics 993
4 The Kahn Principle 995
5 Analyzability Results 997
6 Implementing Kahn Process Networks 998
6.1 Implementing Atomic Processes 998
6.2 Correctness Criteria 998
6.3 Run-time Scheduling and Buffer Management 1000
7 Extensions of KPN 1003
Events 1004
Time 1005
8 Reactive Process Networks 1006
8.1 Introduction 1006
A Reactive Process Network Example 1008
8.2 Design Considerations of RPN 1010
Streams, Events and Time 1010
Semantic Model 1011
Communicating Events 1011
8.3 Operational Semantics of RPN 1012
8.4 Implementation Issues 1014
Coordinating Streaming and Events 1014
Deadlock Detection and Resolution 1015
8.5 Analyzable Models Embedded in RPN 1016
9 Bibliography 1016
References 1019
Methods and Tools for Mapping Process Networks onto Multi-Processor Systems-On-Chip 1023
1 Introduction 1023
2 KPN Design Flows for Multiprocessor Systems 1025
3 Methods 1027
3.1 System Specification 1028
3.2 System Synthesis 1029
3.3 Performance Analysis 1030
3.4 Design Space Exploration 1033
4 Specification, Synthesis, Analysis, and Optimization in DOL 1035
4.1 Distributed Operation Layer 1035
4.2 System Specification 1036
4.3 System Synthesis 1038
4.3.1 Functional Simulation Generation 1039
4.3.2 Software Synthesis 1040
4.4 Performance Analysis 1042
4.4.1 Modular Performance Analysis (MPA) 1043
4.4.2 Integration of MPA into the DOL Design Flow 1044
4.5 Design Space Exploration 1046
4.6 Results of the DOL Framework 1049
5 Concluding Remarks 1052
References 1053
Integrated Modeling using Finite State Machines and Dataflow Graphs 1057
1 Introduction 1057
2 Modeling Approaches 1058
2.1 *charts 1058
2.1.1 Heterogeneity 1059
2.1.2 Synchronous Dataflow Graphs 1060
2.1.3 Dynamic Dataflow 1061
2.2 The California Actor Language 1062
2.3 Extended Codesign Finite State Machines 1065
2.4 SysteMoC 1069
3 Design Methodologies 1073
3.1 Ptolemy II 1073
3.2 The OpenDF Design Flow 1074
3.3 SystemCoDesigner 1075
3.3.1 Overview 1075
3.3.2 Exploiting Static MoCs 1077
4 Integrated Finite State Machines for Multidimensional Data Flow 1080
4.1 Array-OL 1081
4.1.1 Mode-Automata 1082
4.1.2 Gaspard2 Design Flow 1083
4.2 Windowed Data Flow 1085
4.2.1 Communication Order 1086
4.2.2 Interaction with a Finite State Machine for Communication Control 1087
4.3 Exploiting Knowledge about the Model of Computation 1089
References 1090
Index 1092
Publication date (per publisher) | 10.9.2010
---|---
Additional info | XXXVIII, 1117 p. 100 illus.
Place of publication | New York
Language | English
Subject areas | Mathematics / Computer Science ► Computer Science ► Databases
| Mathematics / Computer Science ► Computer Science ► Networks
| Computer Science ► Theory / Study ► Algorithms
| Computer Science ► Other Topics ► Hardware
| Engineering ► Electrical Engineering / Energy Technology
| Engineering ► Communications Engineering
Keywords | Astronomical Signal Processing • audio processing • brain-machine interfaces • Coding • Compilers • data structures • digital signal processing • DSP • Embedded Control • field-programmable gate arrays • FPGAs • Image Processing • Low Power • Medical Imaging • Multiprocessor Modeling • Network Signal Processing • Optimization • Reconfigurable • sensor networks • Signal Processing • software-defined radio • System-on-Chip
ISBN-10 | 1-4419-6345-6 / 1441963456 |
ISBN-13 | 978-1-4419-6345-1 / 9781441963451 |
Size: 25.2 MB
DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to its source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to reference books with columns, tables, and figures. A PDF can be displayed on almost all devices, but it is only suitable to a limited extent for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You will need a PDF viewer, e.g. the free Adobe Digital Editions app.
Buying eBooks from abroad
For tax law reasons, we can only sell eBooks within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.