OpenCL Programming Guide
Addison-Wesley Educational Publishers Inc (publisher)
978-0-321-74964-2 (ISBN)
- This title is out of print; there is no new edition.
Written by five leading OpenCL authorities, OpenCL Programming Guide covers the entire specification. It reviews key use cases, shows how OpenCL can express a wide range of parallel algorithms, and offers complete reference material on both the API and the OpenCL C programming language.
Through complete case studies and downloadable code examples, the authors show how to write complex parallel programs that decompose workloads across many different devices. They also present all the essentials of OpenCL software performance optimization, including probing and adapting to hardware. Coverage includes
Understanding OpenCL’s architecture, concepts, terminology, goals, and rationale
Programming with OpenCL C and the runtime API
Using buffers, sub-buffers, images, samplers, and events
Sharing and synchronizing data with OpenGL and Microsoft’s Direct3D
Simplifying development with the C++ Wrapper API
Using OpenCL Embedded Profiles to support devices ranging from cellphones to supercomputer nodes
Case studies dealing with physics simulation; image and signal processing, such as image histograms, edge detection filters, Fast Fourier Transforms, and optical flow; math libraries, such as matrix multiplication and high-performance sparse matrix multiplication; and more
Source code for this book is available at https://code.google.com/p/opencl-book-samples/
Aaftab Munshi is the spec editor for the OpenGL ES 1.1, OpenGL ES 2.0, and OpenCL specifications and coauthor of the book OpenGL ES 2.0 Programming Guide (with Dan Ginsburg and Dave Shreiner, published by Addison-Wesley, 2008). He currently works at Apple.
Benedict R. Gaster is a software architect working on programming models for next-generation heterogeneous processors, in particular looking at high-level abstractions for parallel programming on the emerging class of processors that contain both CPUs and accelerators such as GPUs. Benedict has contributed extensively to OpenCL’s design and has represented AMD at the Khronos Group open standard consortium. Benedict has a Ph.D. in computer science for his work on type systems for extensible records and variants. He has been working at AMD since 2008.
Timothy G. Mattson is an old-fashioned parallel programmer, having started in the mid-eighties with the Caltech Cosmic Cube and continuing to the present. Along the way, he has worked with most classes of parallel computers (vector supercomputers, SMP, VLIW, NUMA, MPP, clusters, and many-core processors). Tim has published extensively, including the books Patterns for Parallel Programming (with Beverly Sanders and Berna Massingill, published by Addison-Wesley, 2004) and An Introduction to Concurrency in Programming Languages (with Matthew J. Sottile and Craig E. Rasmussen, published by CRC Press, 2009). Tim has a Ph.D. in chemistry for his work on molecular scattering theory. He has been working at Intel since 1993.
James Fung has been developing computer vision on the GPU as it progressed from graphics to general-purpose computation. James has a Ph.D. in electrical and computer engineering from the University of Toronto and numerous IEEE and ACM publications in the areas of parallel GPU Computer Vision and Mediated Reality. He is currently a Developer Technology Engineer at NVIDIA, where he examines computer vision and image processing on graphics hardware.
Dan Ginsburg currently works at Children’s Hospital Boston as a Principal Software Architect in the Fetal-Neonatal Neuroimaging and Development Science Center, where he uses OpenCL for accelerating neuroimaging algorithms. Previously, he worked for Still River Systems developing GPU-accelerated image registration software for the Monarch 250 proton beam radiotherapy system. Dan was also Senior Member of Technical Staff at AMD, where he worked for over eight years in a variety of roles, including developing OpenGL drivers, creating desktop and hand-held 3D demos, and leading the development of handheld GPU developer tools. Dan holds a B.S. in computer science from Worcester Polytechnic Institute and an M.B.A. from Bentley University.
Figures
Tables
Listings
Foreword
Preface
Acknowledgments
About the Authors
Part I: The OpenCL 1.1 Language and API
Chapter 1: An Introduction to OpenCL
What Is OpenCL, or . . . Why You Need This Book
Our Many-Core Future: Heterogeneous Platforms
Software in a Many-Core World
Conceptual Foundations of OpenCL
OpenCL and Graphics
The Contents of OpenCL
The Embedded Profile
Learning OpenCL
Chapter 2: HelloWorld: An OpenCL Example
Building the Examples
HelloWorld Example
Checking for Errors in OpenCL
Chapter 3: Platforms, Contexts, and Devices
OpenCL Platforms
OpenCL Devices
OpenCL Contexts
Chapter 4: Programming with OpenCL C
Writing a Data-Parallel Kernel Using OpenCL C
Scalar Data Types
Vector Data Types
Other Data Types
Derived Types
Implicit Type Conversions
Explicit Casts
Explicit Conversions
Reinterpreting Data as Another Type
Vector Operators
Qualifiers
Keywords
Preprocessor Directives and Macros
Restrictions
Chapter 5: OpenCL C Built-In Functions
Work-Item Functions
Math Functions
Integer Functions
Common Functions
Geometric Functions
Relational Functions
Vector Data Load and Store Functions
Synchronization Functions
Async Copy and Prefetch Functions
Atomic Functions
Miscellaneous Vector Functions
Image Read and Write Functions
Chapter 6: Programs and Kernels
Program and Kernel Object Overview
Program Objects
Kernel Objects
Chapter 7: Buffers and Sub-Buffers
Memory Objects, Buffers, and Sub-Buffers Overview
Creating Buffers and Sub-Buffers
Querying Buffers and Sub-Buffers
Reading, Writing, and Copying Buffers and Sub-Buffers
Mapping Buffers and Sub-Buffers
Chapter 8: Images and Samplers
Image and Sampler Object Overview
Creating Image Objects
Creating Sampler Objects
OpenCL C Functions for Working with Images
Transferring Image Objects
Chapter 9: Events
Commands, Queues, and Events Overview
Events and Command-Queues
Event Objects
Generating Events on the Host
Events Impacting Execution on the Host
Using Events for Profiling
Events Inside Kernels
Events from Outside OpenCL
Chapter 10: Interoperability with OpenGL
OpenCL/OpenGL Sharing Overview
Querying for the OpenGL Sharing Extension
Initializing an OpenCL Context for OpenGL Interoperability
Creating OpenCL Buffers from OpenGL Buffers
Creating OpenCL Image Objects from OpenGL Textures
Querying Information about OpenGL Objects
Synchronization between OpenGL and OpenCL
Chapter 11: Interoperability with Direct3D
Direct3D/OpenCL Sharing Overview
Initializing an OpenCL Context for Direct3D Interoperability
Creating OpenCL Memory Objects from Direct3D Buffers and Textures
Acquiring and Releasing Direct3D Objects in OpenCL
Processing a Direct3D Texture in OpenCL
Processing D3D Vertex Data in OpenCL
Chapter 12: C++ Wrapper API
C++ Wrapper API Overview
C++ Wrapper API Exceptions
Vector Add Example Using the C++ Wrapper API
Chapter 13: OpenCL Embedded Profile
OpenCL Profile Overview
64-Bit Integers
Images
Built-In Atomic Functions
Mandated Minimum Single-Precision Floating-Point Capabilities
Determining the Profile Supported by a Device in an OpenCL C Program
Part II: OpenCL 1.1 Case Studies
Chapter 14: Image Histogram
Computing an Image Histogram
Parallelizing the Image Histogram
Additional Optimizations to the Parallel Image Histogram
Computing Histograms with Half-Float or Float Values for Each Channel
Chapter 15: Sobel Edge Detection Filter
What Is a Sobel Edge Detection Filter?
Implementing the Sobel Filter as an OpenCL Kernel
Chapter 16: Parallelizing Dijkstra’s Single-Source Shortest-Path Graph Algorithm
Graph Data Structures
Kernels
Leveraging Multiple Compute Devices
Chapter 17: Cloth Simulation in the Bullet Physics SDK
An Introduction to Cloth Simulation
Simulating the Soft Body
Executing the Simulation on the CPU
Changes Necessary for Basic GPU Execution
Two-Layered Batching
Optimizing for SIMD Computation and Local Memory
Adding OpenGL Interoperation
Chapter 18: Simulating the Ocean with Fast Fourier Transform
An Overview of the Ocean Application
Phillips Spectrum Generation
An OpenCL Discrete Fourier Transform
A Closer Look at the FFT Kernel
A Closer Look at the Transpose Kernel
Chapter 19: Optical Flow
Optical Flow Problem Overview
Sub-Pixel Accuracy with Hardware Linear Interpolation
Application of the Texture Cache
Using Local Memory
Early Exit and Hardware Scheduling
Efficient Visualization with OpenGL Interop
Performance
Chapter 20: Using OpenCL with PyOpenCL
Introducing PyOpenCL
Running the PyImageFilter2D Example
PyImageFilter2D Code
Context and Command-Queue Creation
Loading to an Image Object
Creating and Building a Program
Setting Kernel Arguments and Executing a Kernel
Reading the Results
Chapter 21: Matrix Multiplication with OpenCL
The Basic Matrix Multiplication Algorithm
A Direct Translation into OpenCL
Increasing the Amount of Work per Kernel
Optimizing Memory Movement: Local Memory
Performance Results and Optimizing the Original CPU Code
Chapter 22: Sparse Matrix-Vector Multiplication
Sparse Matrix-Vector Multiplication (SpMV) Algorithm
Description of This Implementation
Tiled and Packetized Sparse Matrix Representation
Header Structure
Tiled and Packetized Sparse Matrix Design Considerations
Optional Team Information
Tested Hardware Devices and Results
Additional Areas of Optimization
Appendix: Summary of OpenCL 1.1
The OpenCL Platform Layer
The OpenCL Runtime
Buffer Objects
Program Objects
Kernel and Event Objects
Supported Data Types
Vector Component Addressing
Preprocessor Directives and Macros
Specify Type Attributes
Math Constants
Work-Item Built-In Functions
Integer Built-In Functions
Common Built-In Functions
Math Built-In Functions
Geometric Built-In Functions
Relational Built-In Functions
Vector Data Load/Store Functions
Atomic Functions
Async Copies and Prefetch Functions
Synchronization, Explicit Memory Fence
Miscellaneous Vector Built-In Functions
Image Read and Write Built-In Functions
Image Objects
Image Formats
Access Qualifiers
Sampler Objects
Sampler Declaration Fields
OpenCL Device Architecture Diagram
OpenCL/OpenGL Sharing APIs
OpenCL/Direct3D 10 Sharing APIs
Index
Publication date (per publisher) | 28 July 2011 |
---|---|
Series | OpenGL |
Place of publication | New Jersey |
Language | English |
Dimensions | 180 x 230 mm |
Weight | 1010 g |
Subject areas | Computer Science ► Graphics / Design ► Film / Video Editing |
 | Mathematics / Computer Science ► Computer Science ► Theory / Studies |
 | Computer Science ► Other Topics ► Hardware |
ISBN-10 | 0-321-74964-2 / 0321749642 |
ISBN-13 | 978-0-321-74964-2 / 9780321749642 |
Condition | New |