Instruction Level Parallelism - Alex Aiken, Utpal Banerjee, Arun Kejariwal, Alexandru Nicolau

Instruction Level Parallelism (eBook)

eBook Download: PDF
2016 | 1st ed. 2016
XXI, 255 pages
Springer US (publisher)
978-1-4899-7797-7 (ISBN)
64,19 incl. VAT
  • Available for immediate download
This book precisely formulates and simplifies the presentation of Instruction Level Parallelism (ILP) compilation techniques. It uniquely offers consistent and uniform descriptions of the code transformations involved. Because ILP is ubiquitous in virtually every processor built today, from general-purpose CPUs to application-specific and embedded processors, this book is useful to the student, the practitioner, and the researcher of advanced compilation techniques. With its emphasis on fine-grain instruction level parallelism, the book will also interest researchers and students of parallelism at large, inasmuch as the techniques described yield insights that go beyond compilation for superscalar and VLIW (Very Long Instruction Word) machines and apply more widely to optimizing compilers in general. ILP techniques have also found wide and crucial application in Design Automation, where they have been used extensively to optimize performance and to minimize the area and power of computer designs.




Alex Aiken is the Alcatel-Lucent Professor and the current chair of the Computer Science Department at Stanford. His research interests include most areas of programming languages and compilers and particularly automated methods of analysis for both high performance and high reliability.

Utpal Banerjee has a PhD in mathematics from Carnegie-Mellon University and a PhD in computer science from the University of Illinois at Urbana-Champaign. He has taught at the University of Cincinnati, Arizona State University and the University of Illinois. Dr. Banerjee has served as a research staff member at Honeywell, Fairchild, Control Data and Intel corporations. His current affiliation is with the Department of Computer Science, University of California at Irvine. He has published a number of papers and books on restructuring compilers, including encyclopedia articles and a series of books on loop transformations. He is a fellow of the IEEE and a fellow of the ACM.

Arun Kejariwal is a Statistical Learning Principal at Machine Zone. He co-founded MZ Research and currently manages a team of research scientists. He leads the research and development of novel algorithms for fraud detection and for anomaly detection in security and operational data. Prior to joining Machine Zone, he was a lead in the Data Fidelity Team at Twitter, where he open-sourced standalone R packages for anomaly detection and breakout detection. He received a Ph.D. in Computer Science from UC Irvine and is a Senior Member of IEEE and ACM.

Alexandru Nicolau's research is in the areas of Parallel Processing/ILP and Embedded Systems/Design Automation. His interests focus on computer performance/power tradeoffs, parallelizing compilers, and GPUs. His current work involves collaborations both within and outside UCI, most recently with researchers at Stanford, University of Michigan, UCLA, and UCSD as part of a flagship NSF Expedition project, and a separate grant with UIUC. He has authored over 300 peer-reviewed papers and several books. He is the Editor-in-Chief of the International Journal of Parallel Processing, and an IEEE Fellow.



Contents
List of Figures
List of Tables
Preface
Foreword
Acknowledgments
1 Introduction
1.1 Scope of the Book
1.2 Instruction-Level Parallelism
1.3 Outline of Topics
2 Overview of ILP Architectures
2.1 Historical Perspective
2.2 Superscalar and VLIW Machines
2.3 Early ILP Architectures
2.4 ILP Architectures in the 80's
2.5 ILP Architectures in the 90's
2.6 Itanium
2.6.1 The EPIC Philosophy
2.6.2 Itanium Architecture
3 Scheduling Basic Blocks
3.1 Introduction
3.2 Basic Concepts
3.3 Unlimited Resources
3.3.1 ASAP Algorithm
3.3.2 ALAP Algorithm
3.4 Limited Resources
3.4.1 List Scheduling
3.4.2 Linear Analysis
3.5 An Example
3.6 More Algorithms
3.6.1 Critical Path Algorithm
3.6.2 Restricted Branch and Bound Algorithm
3.6.3 Force-Directed Scheduling
3.7 Limited Beyond Basic Block Optimization
4 Trace Scheduling
4.1 Introduction
4.2 Basic Concepts
4.2.1 Program Model
4.2.2 Traces
4.2.3 Dependence
4.2.4 Schedules
4.2.5 Program Transformation
4.3 Traces without Joins
4.4 General Traces
4.5 Trace Scheduling Algorithm
4.6 Picking Traces
5 Percolation Scheduling
5.1 Introduction
5.2 The Core Transformations
5.2.1 Delete Transformation
5.2.2 Move-op Transformation
5.2.3 Move-test Transformation
5.2.4 Unify Transformation
5.3 Remarks
5.3.1 Termination
5.3.2 Completeness
5.3.3 Confluence
5.4 Extensions
5.4.1 Migrate Transformation
5.4.2 Trailblazing
5.4.3 Resource-Constrained Percolation Scheduling
6 Modulo Scheduling
6.1 Introduction
6.2 Unrolling
6.3 Preliminaries
6.4 Modulo Scheduling Algorithm
6.4.1 Remarks
Sufficiency of simple cycles
Infeasibility of MII
6.4.2 Limitations
6.5 Modulo Scheduling with Conditionals
6.5.1 Hierarchical Reduction
6.5.2 Enhanced Modulo Scheduling
6.5.3 Modulo Scheduling with Multiple Initiation Intervals
6.6 Iterative Modulo Scheduling
6.6.1 The Algorithm
Determining Scheduling Priority
Determining Earliest Start Time
Determining Candidate Time Slots
6.7 Optimizations
6.7.1 Modulo Variable Expansion
6.7.2 Using Loop Unrolling to Enhance Modulo Scheduling
7 Software Pipelining by Kernel Recognition
7.1 Introduction
7.1.1 Basic Idea
7.2 The URPR Algorithm
7.3 OPT: Optimal Loop Pipelining of Innermost Loops
7.4 General Handling of Conditionals
7.4.1 Perfect Pipelining
Compaction
The Algorithm
7.4.2 Enhanced Pipeline-Percolation Scheduling
7.4.3 Optimal Software Pipelining with Control Flow
7.5 Nested Loops
7.6 Procedure Calls
8 Epilogue
Bibliography
Index

Publication date (per publisher) 26.11.2016
Additional info XXI, 255 p. 78 illus., 30 illus. in color.
Place of publication New York
Language English
Subject areas Mathematics / Computer Science > Computer Science > Programming Languages / Tools
Mathematics / Computer Science > Computer Science > Theory / Study
Engineering > Electrical Engineering / Power Engineering
Engineering > Communications Engineering
Keywords Compilers • GPU architecture scheduling algorithms • GPUs • Instruction-level parallelism • Loop parallelization • Modulo scheduling • Optimization • Parallel Computing • Parallelism • Parallel Processing • Percolation scheduling • Performance • Scheduling • Software pipelining • Superscalar • Trace scheduling • VLIW
ISBN-10 1-4899-7797-X / 148997797X
ISBN-13 978-1-4899-7797-7 / 9781489977977
PDF (watermark)
Size: 4.1 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thus personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
