Vectorization (eBook)
623 pages
Wiley-IEEE Press (publisher)
978-1-394-27295-2 (ISBN)
Enables readers to develop foundational and advanced vectorization skills for scalable data science and machine learning and to address real-world problems.
Offering insights across various domains such as computer vision and natural language processing, Vectorization covers the fundamental topics of vectorization, including array and tensor operations, data wrangling, and batch processing. The book illustrates how the principles discussed lead to successful outcomes in machine learning projects, with each chapter including practical case studies and code implementations using NumPy, TensorFlow, and PyTorch that serve as concrete examples of the theories explained.
Each chapter contains one or both types of content: an introduction to and comparison of the relevant operations in the numerical libraries (presented as tables), and/or case-study examples that apply the concepts introduced to solve a practical problem (as code blocks and figures). Readers can approach the material by reading the text descriptions, running the code blocks, or examining the figures.
Written by the developer of the first recommendation system on the Peacock streaming platform, Vectorization explores sample topics including:
- Basic tensor operations and the art of tensor indexing, elucidating how to access individual or subsets of tensor elements
- Vectorization in tensor multiplications and common linear algebraic routines, which form the backbone of many machine learning algorithms
- Masking and padding, concepts which come into play when handling data of non-uniform sizes, and string processing techniques for natural language processing (NLP)
- Sparse matrices and their data structures and integral operations, and ragged or jagged tensors and the nuances of processing them
From the essentials of vectorization to the subtleties of advanced data structures, Vectorization is an ideal one-stop resource for both beginners and experienced practitioners, including researchers, data scientists, statisticians, and other professionals in industry, who seek academic success and career advancement.
Edward DongBo Cui is a Data Science and Machine Learning Engineering Leader who holds a PhD in Neuroscience from Case Western Reserve University, USA. Edward served as Director of Data Science at NBC Universal, building the first recommendation system on the new Peacock streaming platform. Previously, he was Lead Data Scientist at Nielsen Global Media. He is an expert in ML engineering, research, and MLOps, applying these skills to drive data-centric decision-making and enhance product innovation.
1 Introduction to Vectorization
1.1 What Is Vectorization
Vectorization is a parallel computing paradigm in which arithmetic operations are performed on an array of numbers [Intel, 2022] within a single data processing unit (either a CPU or a GPU). Modern CPUs can perform between 4 and 16 single-precision (float32) computations in parallel, depending on the type of instructions [Intel, 2022] (see Figure 1.1).
Here, SSE, or Streaming SIMD (single-instruction-multiple-data) Extensions, uses 128-bit registers; each register holds 128 bits of data for computation. This is equivalent to performing four single-precision (float32) operations or two double-precision (float64) operations at once. Intel’s AVX-512, or Advanced Vector Extensions 512, on the other hand, uses 512-bit registers and can therefore process 512 bits of data simultaneously, which is equivalent to performing 16 single-precision or 8 double-precision operations.
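As a quick sanity check, the lane counts follow directly from dividing the register width by the element width:

```python
# SIMD lanes per register = register width (bits) // element width (bits)
for register_bits in (128, 512):  # SSE vs. AVX-512 register widths
    print(f"{register_bits}-bit register: "
          f"{register_bits // 32} float32 lanes, {register_bits // 64} float64 lanes")
# 128-bit register: 4 float32 lanes, 2 float64 lanes
# 512-bit register: 16 float32 lanes, 8 float64 lanes
```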
The term “Vectorization” also refers to the process of transforming an algorithm computable on one data point at a time to an operation that calculates a collection of data simultaneously [Khartchenko, 2018]. A classic example is the dot product operation,
$$c = \sum_{i=1}^{n} a_i b_i = \mathbf{a} \cdot \mathbf{b}$$

where $\mathbf{a} = [a_1, a_2, \ldots, a_n]$, $\mathbf{b} = [b_1, b_2, \ldots, b_n]$, and $c$ is a scalar.

The dot product operation on the right-hand side of the equation is the vectorized form of the summation process on the left. Instead of operating on one element of the array at a time, it operates on the collections, or arrays, $\mathbf{a}$ and $\mathbf{b}$.
Figure 1.1 CPU vector computations.
Note
Another definition of the term “vectorization” is related to the more recent advances in large language models: using a vector, or an array of values, to represent a word. This is more commonly known as embedding in the literature, a usage that originated with the word2vec model in natural language processing. In Chapters 6 and 7, we will introduce the idea of representing string or categorical features as numerical values, i.e., integer indices, sparse one-hot encoding vectors, and dense embedding vectors.
1.1.1 A Simple Example of Vectorization in Action
Suppose we want to take the dot product of two arrays of one million random numbers each, drawn from the uniform distribution. Using Python’s for loop, we would typically implement the algorithm as follows:
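(The sketch below is illustrative; the book’s exact code and variable names may differ.)

```python
import random

# Two arrays of one million random numbers drawn from the uniform distribution.
a = [random.random() for _ in range(1_000_000)]
b = [random.random() for _ in range(1_000_000)]


def dot_product_loop(x, y):
    """Compute the dot product one element at a time with a Python for loop."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


result = dot_product_loop(a, b)
```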
On my M1 MacBook Pro under Python 3.11.4, this is the runtime of the above function:
If we implement the algorithm using numpy with vectorization:
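(A sketch of the vectorized version, assuming the same data converted to NumPy arrays.)

```python
import numpy as np

a_np = np.array(a)
b_np = np.array(b)


def dot_product_vectorized(x, y):
    """Compute the dot product as a single vectorized NumPy call."""
    return np.dot(x, y)  # equivalently, x @ y


result = dot_product_vectorized(a_np, b_np)
```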
Under the same setup, we have the following runtime of the vectorized implementation.
This is a ∼37× speed-up by vectorization!
1.1.2 Python Can Still Be Faster!
However, the above example should not obscure the fact that certain native Python operations are still faster than NumPy’s implementations, even though NumPy is often used to vectorize and simplify for-loops in Python.
For example, we would like to split a list of strings delimited by periods (“.”). Using Python’s list comprehension, we have:
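(A sketch with illustrative placeholder strings, not the book’s data.)

```python
# A list of period-delimited strings.
strings = ["a.b.c", "machine.learning.vectorization"] * 500_000

# Split each string on the period using a list comprehension.
tokens = [s.split(".") for s in strings]
```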
The list comprehension runs for
However, in comparison, if we use NumPy’s “vectorized” split operation
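One such operation is np.char.split; the sketch below assumes that is the operation meant and reuses the illustrative data from above.

```python
import numpy as np

strings_np = np.array(strings)
# np.char.split applies str.split element-wise across the array.
tokens_np = np.char.split(strings_np, sep=".")
```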
The NumPy version runs for
which is about 4× slower than native Python.
1.1.3 Memory Allocation of Vectorized Operations
When using vectorized operations, we need to consider the trade-off between memory and speed. Vectorized operations sometimes need to allocate additional memory for the results of intermediate steps, which has implications for the overall performance of the program in terms of space vs. time complexity. Consider a function in which we would like to compute the sum of the squares of the elements in an array. This can be implemented as a for-loop, where we iterate through each element, take its square, and accumulate the squared values into a running total. Alternatively, the function can be implemented as a vectorized operation, where we first take the square of each element, store the results in an intermediate array, and then sum the squared array.
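(A sketch of the two implementations; the function names are illustrative and the book’s listing may differ.)

```python
# memory_allocation_example.py (illustrative sketch)
import numpy as np


def sum_of_squares_loop(x):
    """Iterate over the elements, square each one, and accumulate the total."""
    total = 0.0
    for value in x:
        total += value * value
    return total


def sum_of_squares_vectorized(x):
    """Square all elements at once (allocating an intermediate array), then sum."""
    squared = x ** 2  # temporary array the same size as x
    return np.sum(squared)
```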
Let us use the memory_profiler package to examine the memory usage of each line of the two functions. We can first install the package with pip install memory_profiler==0.61.0 and load it as a plugin in a Jupyter notebook.
We also save the above script into a file (which we call memory_allocation_example.py), then we measure the memory trace of each function in the notebook as follows:
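(A sketch of the notebook cells, assuming the function names from the script above; the array size is chosen for illustration.)

```python
# Run in a Jupyter notebook: load the memory_profiler extension
# and profile the vectorized function line by line.
%load_ext memory_profiler

import numpy as np
from memory_allocation_example import sum_of_squares_loop, sum_of_squares_vectorized

X = np.random.rand(125_000)  # ~1 MB of float64 data
%mprun -f sum_of_squares_vectorized sum_of_squares_vectorized(X)
```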
This gives us the following line-by-line memory trace of the function call:
We can also use the %timeit magic function to measure the speed of the function:
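(Using the same illustrative names.)

```python
%timeit sum_of_squares_vectorized(X)
```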
which gives
Similarly, let us also look at the for-loop implementation:
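(Again a sketch with the same illustrative names.)

```python
%mprun -f sum_of_squares_loop sum_of_squares_loop(X)
```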
We also test its run time as follows:
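(Timing the loop version with the same setup.)

```python
%timeit sum_of_squares_loop(X)
```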
which gives
We can see that although the for-loop function is slower than the vectorized function, the for-loop had almost no memory increase. In contrast, the vectorized solution shows a ∼1 MB increase in memory usage while reducing the run time by about 30%. If X is several orders of magnitude larger than in our current example, even more memory will be consumed in exchange for speed. Hence, we need to weigh the trade-off between speed and memory (or time vs. space) carefully when using vectorized operations.
1.2 Case Study: Dense Layer of a Neural Network
Vectorization is a key skill for implementing various machine learning models, especially for deep learning algorithms. A simple neural network consists of several feed-forward layers, as illustrated in Figure 1.2.
A feed-forward layer (also called a dense layer, linear layer, perceptron layer, or hidden layer, as illustrated in Figure 1.3) has the following expression:

$$\mathbf{y} = f(\mathbf{X}\mathbf{W} + \mathbf{b})$$

Notice that $\mathbf{X}\mathbf{W} + \mathbf{b}$ is a linear model. Here,
- $\mathbf{X}$ is the input, typically of shape (batch_size, num_inputs)
- $\mathbf{W}$ is the weight (a.k.a. kernel) parameter matrix of shape (num_inputs, hidden_size)
- $\mathbf{b}$ is the bias parameter vector of shape (1, hidden_size)
- $f$ is an activation function that is applied to every element of the result of the linear model operation. It can be, e.g., relu, sigmoid, or tanh.
- $\mathbf{y}$ is the output of shape (batch_size, hidden_size)

Figure 1.2 Multilayer perceptron.

Figure 1.3 Feed-forward layer.
Stacking multiple feed-forward layers creates the multilayer perceptron model, as illustrated by our example neural network above.
If we were to implement a feed-forward layer naively using only Python, we would need to run thousands to millions of iterations of the for-loop, depending on the batch_size, num_inputs, and hidden_size. When it comes to large language models like the generative pretrained transformer (GPT), which have hundreds of billions of parameters to optimize, using for loops for such computations can become so slow that training or even making an inference with the model is infeasible. But if we take advantage of vectorization on either the CPU or GPU, we see significant gains in the performance of our model. Furthermore, certain operations like matrix multiplication are simply inefficient when implemented naively by parallelizing a for-loop, so in practice one must rely on special algorithms such as Strassen’s Algorithm [Strassen, 1969] (or, more recently, a new algorithm discovered by AlphaTensor [Fawzi et al., 2022]). Implementing these algorithms as instructions on either the CPU or GPU is a non-trivial task in itself. Practitioners of machine learning and data science should therefore take advantage of pre-existing libraries that implement these algorithms. But to use these libraries efficiently, one needs to be adept at thinking in terms of vectorization.
In the following, let’s take a look at an example of how to implement the above perceptron layer using numpy:
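(A minimal sketch; the function and variable names are illustrative, and the book’s exact listing may differ slightly.)

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes each element of z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def dense_layer(X, W, b):
    """Feed-forward (dense) layer: y = sigmoid(X @ W + b)."""
    z = X @ W + b  # linear model: matrix multiplication plus a broadcast bias
    # apply the activation element-wise to obtain the layer output
    y = sigmoid(z)
    return y


# Example usage with random data (shapes are illustrative):
X = np.random.rand(32, 100)   # (batch_size, num_inputs)
W = np.random.rand(100, 64)   # (num_inputs, hidden_size)
b = np.random.rand(1, 64)     # (1, hidden_size)
y = dense_layer(X, W, b)      # (batch_size, hidden_size)
```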
- Lines 3–5 define the sigmoid activation function, $\sigma(z) = \frac{1}{1 + e^{-z}}$.
- Line 10 implements the linear model $\mathbf{Z} = \mathbf{X}\mathbf{W} + \mathbf{b}$.
  – Here, the @ operator is the matrix multiplication operator that is recognized by typical vector libraries like numpy, tensorflow, and torch. We will discuss more on matrix operations in Chapter 4.
  – The result of multiplying $\mathbf{X}$ and $\mathbf{W}$ should be a matrix of shape (batch_size, hidden_size), but in this implementation, we have $\mathbf{b}$ of shape (1, hidden_size). How can a vector be added to a matrix? In Chapter 2, we will explore the concept called broadcasting.
- Line 12 applies the sigmoid activation we defined previously to compute the final output of this dense layer. This is an element-wise operation, where we apply the sigmoid function to each...
| Publication date | 18.12.2024 |
|---|---|
| Language | English |
| Subject area | Technology ► Construction |
| Keywords | algorithms • Artificial Intelligence • Computer Science • Data Science • data structure • Keras • machine learning • Mathematics • NumPy • Pandas • Python • PyTorch • Research • Scientific Computing • SciPy • string processing • tensorflow |
| ISBN-10 | 1-394-27295-2 / 1394272952 |
| ISBN-13 | 978-1-394-27295-2 / 9781394272952 |