Workflows for e-Science (eBook)
XXII, 526 pages
Springer London (publisher)
978-1-84628-757-2 (ISBN)
This timely book gives an overview of the current state of the art within established projects and presents many different aspects of workflow, from users to tool builders. It surveys active research from a number of perspectives, includes theoretical aspects of workflow, and deals with workflow for e-Science as opposed to e-Commerce. The topics covered will be of interest to a wide range of practitioners.
Dr. Ian Taylor has been a Lecturer at Cardiff University's School of Computer Science since 2002. He concurrently holds an adjunct Assistant Professorship at the Center for Computation & Technology at Louisiana State University and regularly offers consultations in the USA. He has a Ph.D. in Physics and Music and is the co-ordinator of Triana activities at Cardiff (http://www.trianacode.org). Through this he has been active in many major projects including GridLab, CoreGrid and GridOneD. His research interests include distributed techniques and workflow for Grid and P2P computing, which take in applications ranging from astrophysics and healthcare to distributed audio.
Ian has previously written a professional book for Springer on P2P, Web Services and Grids, and has published over 50 scientific papers. He has also co-edited a special edition of the Journal of Grid Computing on Scientific Workflow.
Dr. Matthew Shields has been a research associate at Cardiff University, jointly in the Schools of Computer Science, Physics and Astronomy, since 2001. He gained his Ph.D. in Computer Science from Cardiff University in the area of problem solving environments. Dr. Shields is one of two lead developers for the Triana project and has been responsible for helping broaden its adoption within new application domains including biodiversity. His interests include problem solving environments, workflow, component and service based computing, Grid and high-performance computing.
Ewa Deelman is a Research Assistant Professor at the USC Computer Science Department and a Research Team Leader at the Center for Grid Technologies at the USC Information Sciences Institute. Dr. Deelman's research interests include the design and exploration of collaborative scientific environments based on Grid technologies, with particular emphasis on workflow management as well as the management of large amounts of data and metadata. At ISI, Dr. Deelman is leading the Pegasus project, which designs and implements workflow mapping techniques for large-scale workflows running in distributed environments. Pegasus is being used day-to-day by scientists in a variety of disciplines including astronomy, gravitational-wave physics, earthquake science and many others. Prior to joining ISI in 2000, she was a Senior Software Developer at UCLA conducting research in the area of performance prediction of large-scale applications on high performance machines.
Dr. Deelman received her PhD from Rensselaer Polytechnic Institute in Computer Science in 1997 in the area of parallel discrete event simulation. Dr. Deelman is an Associate Editor responsible for Grid Computing for the Scientific Programming Journal and a chair of the GGF Workflow Management Research Group.
Dennis Gannon, Department of Computer Science, Lindley Hall, Indiana University, Bloomington, IN 47401 (gannon@cs.indiana.edu). Dr. Gannon is a professor of Computer Science in the School of Informatics at Indiana University. He is also Science Director for the Indiana Pervasive Technology Labs. He received his Ph.D. in Computer Science from the University of Illinois in 1980 and his Ph.D. in Mathematics from the University of California in 1974. From 1980 to 1985, he was on the faculty at Purdue University. From 1997 to 2004 he was Chairman of the Indiana Computer Science Department. His research interests include software tools for high performance parallel and distributed systems and problem solving environments for scientific computation.
This collection of articles on 'Workflows for e-Science' is very timely and important. Increasingly, to attack the next generation of scientific problems, multidisciplinary and distributed teams of scientists need to collaborate to make progress on these new 'Grand Challenges'. Scientists now need to access and exploit computational resources and databases that are geographically distributed through the use of high speed networks. 'Virtual Organizations' or 'VOs' must be established that span multiple administrative domains and/or institutions and which can provide appropriate authentication and authorization services and access controls to collaborating members. Some of these VOs may only have a fleeting existence but the lifetime of others may run into many years. The Grid community is attempting to develop both standards and middleware to enable both scientists and industry to build such VOs routinely and robustly. This, of course, has been the goal of research in distributed computing for many years; but now these technologies come with a new twist: service orientation. By specifying resources in terms of a service description, rather than allowing direct access to the resources, the IT industry believes that such an approach results in the construction of more robust distributed systems. The industry has therefore united around web services as the standard technology to implement such service-oriented architectures and to ensure interoperability between different vendor systems.
Foreword 7
Contents 9
List of Contributors 13
1 Introduction 22
1.1 Background 22
1.2 Application and User Perspective 24
1.3 Workflow Representation and Common Structure 25
1.4 Frameworks and Tools: Workflow Generation, Refinement and Execution 26
2 Scientific versus Business Workflows 30
Part I Application and User Perspective 38
3 Generating Complex Astronomy Workflows 40
3.1 Introduction 40
3.2 The Architecture of Montage 41
3.3 Grid-Enabled Montage 46
3.4 Supporting a Community of Users 52
Acknowledgments 58
4 A Case Study on the Use of Workflow Technologies for Scientific Analysis: Gravitational Wave Data Analysis 60
4.1 Introduction 60
4.2 Gravitational Waves 60
4.3 The LIGO Data Grid Infrastructure 63
4.4 Constructing Workflows with the Grid/LSC User Environment 68
4.5 The Inspiral Analysis Workflow 73
4.6 Concluding Remarks 79
Acknowledgments 80
5 Workflows in Pulsar Astronomy 81
5.1 Introduction 81
5.2 Pulsars and Their Detection 81
5.3 Workflow for Signal Processing 83
5.4 Use of Metacomputing in Dedispersion 88
5.5 Workflows of Online Pulsar Searches 92
5.6 Future Work: Toward a Service-Oriented Approach 97
Acknowledgments 99
6 Workflow and Biodiversity e-Science 101
6.1 Introduction 101
6.2 Background: Biodiversity and e-Science 101
6.3 BiodiversityWorld as an e-Biodiversity Environment 103
6.4 Related Work 107
6.5 Toward an Exploratory Workflow Environment 109
6.6 Conclusions 110
Acknowledgments 111
7 Ecological Niche Modeling Using the Kepler Workflow System 112
7.1 Introduction 112
7.2 Approaches in Ecological Niche Modeling 113
7.3 Data Access via EcoGrid 116
7.4 Hierarchical Decomposition of the ENM Workflow 116
7.5 Modular Component Substitution 119
7.6 Transformation and Data Integration 122
7.7 Grid and Peer-to-Peer Computing 125
7.8 Opportunities for Biodiversity Science Using Scientific Workflows 126
7.9 Advantages of Automated Workflows for Biodiversity and Ecological Science 128
Acknowledgments 129
8 Case Studies on the Use of Workflow Technologies for Scientific Analysis: The Biomedical Informatics Research Network and the Telescience Project 130
8.1 Introduction 130
8.2 Framework for Integrated Workflow Environments 132
8.3 Scientific Process Workflows: Process and State Management Tools 135
8.4 The Role of Portals as Workflow Controllers 136
8.5 Interapplication Workflows: Pipeline-Building Tools 137
8.6 Intrapipeline Workflow: Planners and Execution Engines 139
8.7 Use Cases 140
8.8 The Telescience Project 140
8.9 The Biomedical Informatics Research Network (BIRN) 141
8.10 Discussion 146
Acknowledgments 146
9 Dynamic, Adaptive Workflows for Mesoscale Meteorology 147
9.1 Introduction 147
9.2 The LEAD Data and Service Architecture 149
9.3 LEAD Workflow 151
9.4 Conclusions 162
9.5 Acknowledgments 163
10 SCEC CyberShake Workflows—Automating Probabilistic Seismic Hazard Analysis Calculations 164
10.1 Introduction to SCEC CyberShake Workflows 164
10.2 The SCEC Hardware and Software Computing Environment 167
10.3 SCEC Probabilistic Seismic Hazard Analysis Research 168
10.4 Computational Requirements of CyberShake 169
10.5 SCEC Workflow Solutions to Key Workflow Requirements 172
10.6 Benefits of Modeling CyberShake as Workflows 173
10.7 Cost of Using the SCEC Workflow System 174
10.8 From Computational Pathway to Abstract Workflow 175
10.9 Resource Provisioning in the CyberShake Workflows 181
10.10 CyberShake Workflow Results 182
10.11 Conclusions 183
Acknowledgments 184
Part II Workflow Representation and Common Structure 186
11 Control- Versus Data-Driven Workflows 188
11.1 Introduction 188
11.2 Workflow Representations 189
11.3 Control-Driven Workflows 191
11.4 Data-Driven Workflows 193
11.5 Toward a Common Workflow Language 193
12 Component Architectures and Services: From Application Construction to Scientific Workflows 195
12.1 Introduction 195
12.2 Component Architectures: General Concepts 196
12.3 Models of Composition 199
12.4 Stateful and Stateless Components 206
12.5 Space and Time and the Limits to the Power of Graphical Expression 208
13 Petri Nets 211
13.1 Introduction 211
13.2 Choreography—Using Petri Nets for Modelling Abstract Applications 215
13.3 Orchestration—Using Petri Nets for Mapping Abstract Workflows onto Concrete Resources 222
13.4 Enactment—Using Petri Nets for Executing and Controlling e-Science Applications 223
13.5 Conclusions 227
Acknowledgments 228
14 Adapting BPEL to Scientific Workflows 229
14.1 Introduction 229
14.2 Short Overview of BPEL 229
14.3 Goals and Requirements for Scientific Workflows in Grids 234
14.4 Illustrative Grid Workflow Example 236
14.5 Workflow Life-Cycle on an Example of a GPEL Engine 240
14.6 Challenges in Using BPEL in Grids 246
15 Protocol-Based Integration Using SSDL and π-Calculus 248
15.1 Introduction 248
15.2 Service Orientation 250
15.3 SSDL Overview 252
15.4 The Sequential Constraint Protocol Framework 255
15.5 A Use Case 259
15.6 Related Work 262
15.7 Conclusions 264
Acknowledgments 264
16 Workflow Composition: Semantic Representations for Flexible Automation 265
16.1 Introduction 265
16.2 The Need for Assisted Workflow Composition 265
16.3 From Reusable Templates to Fully Specified Executable Workflows 271
16.4 Semantic Representations of Workflows to Support Assisted Composition 275
16.5 Automatic Completion of Workflows 277
16.6 Conclusions 278
Acknowledgments 278
17 Virtual Data Language: A Typed Workflow Notation for Diversely Structured Scientific Data 279
17.1 Introduction 279
17.2 Related Work 281
17.3 XDTM Overview 282
17.4 Physical and Logical Structure: An Example 282
17.5 Virtual Data Language 283
17.6 An Application Example: Functional MRI 290
17.7 VDL Implementation 294
17.8 Conclusion 296
Acknowledgments 296
Part III Frameworks and Tools: Workflow Generation, Refinement, and Execution 298
18 Workflow-Level Parametric Study Support by MOTEUR and the P-GRADE Portal 300
18.1 Introduction 300
18.2 Task-Based and Service-Based Workflows 301
18.3 Describing Parametric Application Workflows 302
18.4 Efficient Execution of Data-Intensive Workflows 304
18.5 Exploiting Both Task- and Service-Based Approaches in Parametric Data-Intensive Applications 311
18.6 MOTEUR Service-Based Workflow Enactor 312
18.7 P-GRADE Portal 313
18.8 Conclusions 319
18.9 Acknowledgments 320
19 Taverna/myGrid: Aligning a Workflow System with the Life Sciences Community 321
19.1 Introduction 321
19.2 The Bioinformatics Background 324
19.3 Aligning with Life Science 325
19.4 Architecture of Taverna 326
19.5 Discovering Resources and Designing Workflows 331
19.6 Executing and Monitoring Workflows 334
19.7 Managing and Sharing Workflows and Their Results 336
19.8 Related Work 337
19.9 Discussion and Future Directions 339
Acknowledgments 340
20 The Triana Workflow Environment: Architecture and Applications 341
20.1 Introduction 341
20.2 Relation to Other Frameworks 343
20.3 Inside The Triana Framework 344
20.4 Distributed Triana Workflows 345
20.5 Workflow Representation and Generation 351
20.6 Current Triana Applications 353
20.7 Example 1: Distributing GAP Services 354
20.8 Example 2: The Visual GAT 356
20.9 Conclusion 360
20.10 Acknowledgments 360
21 Java CoG Kit Workflow 361
21.1 Introduction 361
21.2 The Java CoG Kit Karajan Workflow Framework 366
21.3 Workflow Support for Experiment Management 376
21.4 Conclusion 376
Acknowledgement 377
22 Workflow Management in Condor 378
22.1 Introduction 378
22.2 DAGMan Design Principles 379
22.3 DAGMan Details 380
22.4 Implementation Status 389
22.5 Interaction with Condor 390
22.6 Integration with Stork 390
22.7 Future Directions 395
22.8 Conclusions 396
23 Pegasus: Mapping Large-Scale Workflows to Distributed Resources 397
23.1 Introduction 397
23.2 Workflow Generation for Pegasus 398
23.3 Pegasus and the Target Workflow Execution Environment 399
23.4 Pegasus and Workflow Refinement 402
23.5 Workflow Execution 406
23.6 Adapting the Workflow Mapping to a Dynamic Execution Environment 406
23.7 Optimizing Workflow Performance with Pegasus 408
23.8 Applications 411
23.9 Related Work 413
23.10 Conclusions 414
Acknowledgements 415
24 ICENI 416
24.1 Introduction 416
24.2 The Workflow Pipeline 423
24.3 Specification 424
24.4 Realization 426
24.5 Execution Environment 431
24.6 Application Interaction 435
24.7 Conclusion 435
25 Expressing Workflow in the Cactus Framework 437
25.1 Introduction 437
25.2 Structure 438
25.3 Basic Workflow in Cactus 438
25.4 Extensions 442
26 Sedna: A BPEL-Based Environment for Visual Scientific Workflow Modeling 449
26.1 Introduction 449
26.2 Modeling Scientific Workflows 451
26.3 Scientific Workflow Editor 457
26.4 Case Study: Polymorph Search 465
26.5 Related Work 468
26.6 Lessons Learned and Future Work 469
26.7 Acknowledgments 470
27 ASKALON: A Development and Grid Computing Environment for Scientific Workflows 471
27.1 Introduction 471
27.2 Workflow Case Study and Grid Infrastructure 472
27.3 Workflow Generation 474
27.4 Resource Manager 477
27.5 Scheduler 479
27.6 Execution Engine 484
27.7 Overhead Analysis 486
27.8 Conclusions 491
27.9 Acknowledgments 492
Part IV Future Requirements 494
Looking into the Future of Workflows: The Challenges Ahead 496
1 User Experience 496
2 Workflow Languages and Representations 498
3 Workflow Compilers 499
4 Workflow Enactors or Executors 500
5 Debugging 501
6 Execution Environments 501
7 The Big Question 502
References 504
Index 536
Publication date (per publisher) | 31 December 2007 |
---|---|
Additional info | XXII, 526 p. |
Place of publication | London |
Language | English |
Subject area | Mathematics / Computer Science ► Computer Science ► Operating Systems / Servers; Mathematics / Computer Science ► Computer Science ► Networks; Mathematics / Computer Science ► Computer Science ► Programming Languages / Tools; Computer Science ► Theory / Studies ► Algorithms; Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics; Computer Science ► Other Topics ► Hardware |
Keywords | Algorithm analysis and problem complexity • BPEL • Complexity • Condor • Dag • Distributed Systems • E-Science • GRID • Jini • JXTA • Modeling • OGSA • OGSI • P2P • Peer-to-Peer • Petri net • Petri Nets • programming • SOA • SOAP • Virtual organizations • VO • Web Services • Workflow • Workflow Management • WSDL • WSRF |
ISBN-10 | 1-84628-757-X / 184628757X |
ISBN-13 | 978-1-84628-757-2 / 9781846287572 |
Size: 12.4 MB
DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized to you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but is of only limited use on small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.