Build Your Own Programming Language - Clinton L. Jeffery

Build Your Own Programming Language (eBook)

A developer's comprehensive guide to crafting, compiling, and implementing programming languages

Clinton L. Jeffery (Autor)

eBook Download: EPUB

2024 | 1. Auflage
556 Seiten
Packt Publishing (Verlag)
978-1-80461-715-1 (ISBN)

There are many reasons to build a programming language: out of necessity, as a learning exercise, or just for fun. Whatever your reasons, this book gives you the tools to succeed.
You'll build the frontend of a compiler for your language and generate a lexical analyzer and parser using Lex and YACC tools. Then you'll explore a series of syntax tree traversals before looking at code generation for a bytecode virtual machine or native code. In this edition, a new chapter has been added to assist you in comprehending the nuances and distinctions between preprocessors and transpilers. Code examples have been modernized, expanded, and rigorously tested, and all content has undergone thorough refreshing. You'll learn to implement code generation techniques using practical examples, including the Unicon Preprocessor and transpiling Jzero code to Unicon. You'll move to domain-specific language features and learn to create them as built-in operators and functions. You'll also cover garbage collection.
Dr. Jeffery's experiences building the Unicon language are used to add context to the concepts, and relevant examples are provided in both Unicon and Java so that you can follow along in your language of choice.
By the end of this book, you'll be able to build and deploy your own domain-specific language.

Embark on a journey through essential components of language design, compiler construction, preprocessors, transpilers, and runtime systems in this second edition, authored by the creator of the Unicon programming language. Purchase of the print or Kindle book includes a free PDF eBookKey FeaturesSolve pain points in your application domain by building a custom programming languageLearn how to create parsers, code generators, semantic analyzers, and interpretersTarget bytecode, native code, and preprocess or transpile code into another high-level languageBook DescriptionThere are many reasons to build a programming language: out of necessity, as a learning exercise, or just for fun. Whatever your reasons, this book gives you the tools to succeed. You ll build the frontend of a compiler for your language with a lexical analyzer and parser, including the handling of parse errors. Then you ll explore a series of syntax tree traversals before looking at code generation for a bytecode virtual machine or native code. In this edition, a new chapter has been added to assist you in comprehending the nuances and distinctions between preprocessors and transpilers. Code examples have been modernized, expanded, and rigorously tested, and all content has undergone thorough refreshing. You ll learn to implement code generation techniques using practical examples, including the Unicon Preprocessor and transpiling Jzero code to Unicon. You'll move to domain-specific language features and learn to create them as built-in operators and functions. You ll also cover garbage collection. Dr. Jeffery s experiences building the Unicon language are used to add context to the concepts, and relevant examples are provided in both Unicon and Java so that you can follow along in your language of choice. By the end of this book, you'll be able to build and deploy your own domain-specific language.What you will learnAnalyze requirements for your language and design syntax and semantics.Write grammar rules for common expressions and control structures.Build a scanner to read source code and generate a parser to check syntax.Implement syntax-coloring for your code in IDEs like VS Code.Write tree traversals and insert information into the syntax tree.Implement a bytecode interpreter and run bytecode from your compiler.Write native code and run it after assembling and linking using system tools.Preprocess and transpile code into another high-level languageWho this book is forThis book is for software developers interested in the idea of inventing their own language or developing a domain-specific language. Computer science students taking compiler design or construction courses will also find this book highly useful as a practical guide to language implementation to supplement more theoretical textbooks. Intermediate or better proficiency in Java or C++ programming languages (or another high-level programming language) is assumed.]]>

Preface

This second edition was begun primarily at the suggestion of a first edition reader, who called me one day and explained that they were using the book for a programming language project. The project was not generating code for a bytecode interpreter or a native instruction set as covered in the first edition. Instead, they were creating a transpiler from a classic legacy programming language to a modern mainstream language. There are many such projects, because there is a lot of old code out there that is still heavily used. The Unicon translator itself started as a preprocessor and then was extended until it became in some sense, a transpiler. So, when Packt asked for a second edition, it was natural to propose a new chapter on that topic; this edition has a new Chapter 11 and all chapters (starting from what was Chapter 11 in the previous edition) have seen their number incremented by one. A second major facet of this second edition was requested by Packt and not my idea at all. They requested that the IDE syntax coloring chapter be extended to deal with the topic of adding syntax coloring to mainstream IDEs that I did not write and do not use, instead of its previous content on syntax coloring in the Unicon IDEs. Although this topic is outside my comfort zone, it is a valuable topic that is somewhat under-documented at present and easily deserves inclusion, so here it is. You, as the reader, can decide whether I have managed to do it any justice as an introduction to that topic.

After 60+ years of high-level language development, programming is still too difficult. The demand for software of ever-increasing size and complexity has exploded due to hardware advances, while programming languages have improved far more slowly. Creating new languages for specific purposes is one antidote for this software crisis.

This book is about building new programming languages. The topic of programming language design is introduced, although the primary emphasis is on programming language implementation. Within this heavily studied subject, the novel aspect of this book is its fusing of traditional compiler-compiler tools (Flex and Byacc) with two higher-level implementation languages. A very high-level language (Unicon) plows through a compiler’s data structures and algorithms like butter, while a mainstream modern language (Java) shows how to implement the same code in a more typical production environment.

One thing I didn’t really understand after my college compiler class was that the compiler is only one part of a programming language implementation. Higher-level languages, including most newer languages, may have a runtime system that dwarfs their compiler. For this reason, the second half of this book spends quality time on a variety of aspects of language runtime systems, ranging from bytecode interpreters to garbage collection.

Who this book is for

This book is for software developers interested in the idea of inventing their own language or developing a domain-specific language. Computer science students taking compiler construction courses will also find this book highly useful as a practical guide to language implementation to supplement more theoretical textbooks. Intermediate-level knowledge and experience of working with a high-level language such as Java or C++ are required in order to get the most out of this book.

What this book covers

Chapter 1, Why Build Another Programming Language?, discusses when to build a programming language, and when to instead design a function library or a class library. Many readers of this book will already know that they want to build their own programming language. Some should design a library instead.

Chapter 2, Programming Language Design, covers how to precisely define a programming language, which is important to know before trying to build a programming language. This includes the design of the lexical and syntax features of the language, as well as its semantics. Good language designs usually use as much familiar syntax as possible.

Chapter 3, Scanning Source Code, presents lexical analysis, including regular expression notation and the tools Ulex and JFlex. By the end, you will be opening source code files, reading them character by character, and reporting their contents as a stream of tokens consisting of the individual words, operators, and punctuation in the source file.

Chapter 4, Parsing, presents syntax analysis, including context-free grammars and the tools iyacc and byacc/j. You will learn how to debug problems in grammars that prevent parsing, and report syntax errors when they occur.

Chapter 5, Syntax Trees, covers syntax trees. The main by-product of the parsing process is the construction of a tree data structure that represents the source code’s logical structure. The construction of tree nodes takes place in the semantic actions that execute on each grammar rule.

Chapter 6, Symbol Tables, shows you how to construct symbol tables, insert symbols into them, and use symbol tables to identify two kinds of semantic errors: undeclared and illegally redeclared variables. In order to understand variable references in executable code, each variable’s scope and lifetime must be tracked. This is accomplished by means of table data structures that are auxiliary to the syntax tree.

Chapter 7, Checking Base Types, covers type checking, which is a major task required in most programming languages. Type checking can be performed at compile time or at runtime. This chapter covers the common case of static compile-time type checking for base types, also referred to as atomic or scalar types.

Chapter 8, Checking Types on Arrays, Method Calls, and Structure Accesses, shows you how to perform type checks for the arrays, parameters, and return types of method calls in the Jzero subset of Java. The more difficult parts of type checking are when multiple or composite types are involved. This is the case when functions with multiple parameter types must be checked, or when arrays, hash tables, class instances, or other composite types must be checked.

Chapter 9, Intermediate Code Generation, shows you how to generate intermediate code by looking at examples for the Jzero language. Before generating code for execution, most compilers turn the syntax tree into a list of machine-independent intermediate code instructions. Key aspects of control flow, such as the generation of labels and goto instructions, are handled at this point.

Chapter 10, Syntax Coloring in an IDE, addresses the challenge of incorporating information from syntax analysis into an IDE in order to provide syntax coloring and visual feedback about syntax errors. A programming language requires more than just a compiler or interpreter – it requires an ecosystem of tools for developers. This ecosystem can include debuggers, online help, or an integrated development environment.

Chapter 11, Preprocessors and Transpilers, gives an overview of generating output intended to be compiled or interpreted by another high-level language. Preprocessors are usually line-oriented and translate lines into very similar output, while transpilers usually translate one high-level language to a different high-level language with a full parse and significant semantic changes.

Chapter 12, Bytecode Interpreters, covers designing the instruction set and the interpreter that executes bytecode. A new domain-specific language may include high-level domain programming features that are not supported directly by mainstream CPUs. The most practical way to generate code for many languages is to generate bytecode for an abstract machine whose instruction set directly supports the domain, and then execute programs by interpreting that instruction set.

Chapter 13, Generating Bytecode, continues with code generation, taking the intermediate code from Chapter 9, Intermediate Code Generation, and generating bytecode from it. Translation from intermediate code to bytecode is a matter of walking through a giant linked list, translating each intermediate code instruction into one or more bytecode instructions. Typically, this is a loop to traverse the linked list, with a different chunk of code for each intermediate code instruction.

Chapter 14, Native Code Generation, provides an overview of generating native code for x86_64. Some programming languages require native code to achieve their performance requirements. Native code generation is like bytecode generation, but more complex, involving register allocation and memory addressing modes.

Chapter 15, Implementing Operators and Built-In Functions, describes how to support very high-level and domain-specific language features by adding operators and functions that are built into the language. Very high-level and domain-specific language features are often best represented by operators and functions that are built into the language, rather than library functions. Adding built-ins may simplify your language, improve its performance, or enable side effects in your language semantics that would otherwise be difficult or impossible. The examples in this chapter are drawn from Unicon, as it is much higher level than Java and implements more complex semantics in its built-ins.

Chapter 16, Domain Control Structures, covers when you need a new control structure, and...

Erscheint lt. Verlag	31.1.2024
Vorwort	Imran Ahmad
Sprache	englisch
Themenwelt	Mathematik / Informatik ► Informatik ► Programmiersprachen / -werkzeuge
	Informatik ► Theorie / Studium ► Compilerbau
	Mathematik / Informatik ► Informatik ► Web / Internet
ISBN-10	1-80461-715-6 / 1804617156
ISBN-13	978-1-80461-715-1 / 9781804617151

Haben Sie eine Frage zum Produkt?

EPUB (Adobe DRM)
Größe: 8,1 MB

Kopierschutz: Adobe-DRM
Adobe-DRM ist ein Kopierschutz, der das eBook vor Mißbrauch schützen soll. Dabei wird das eBook bereits beim Download auf Ihre persönliche Adobe-ID autorisiert. Lesen können Sie das eBook dann nur auf den Geräten, welche ebenfalls auf Ihre Adobe-ID registriert sind.
Details zum Adobe-DRM

Dateiformat: EPUB (Electronic Publication)
EPUB ist ein offener Standard für eBooks und eignet sich besonders zur Darstellung von Belletristik und Sachbüchern. Der Fließtext wird dynamisch an die Display- und Schriftgröße angepasst. Auch für mobile Lesegeräte ist EPUB daher gut geeignet.

Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen eine Adobe-ID und die Software Adobe Digital Editions (kostenlos). Von der Benutzung der OverDrive Media Console raten wir Ihnen ab. Erfahrungsgemäß treten hier gehäuft Probleme mit dem Adobe DRM auf.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen eine Adobe-ID sowie eine kostenlose App.
Geräteliste und zusätzliche Hinweise

Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.

Print-Ausgabe

Buch | Softcover

47,35 €