Linear Algebra in ML

Have you ever wondered why every machine learning course starts with linear algebra? Let me show you why this branch of mathematics is the bedrock of modern AI, and how its elegant simplicity enables complex learning systems.

The Mathematics Behind Machine Learning

At its core, machine learning operates on data represented as vectors and matrices. When we train a model, we’re essentially performing sequences of linear transformations, interleaved with simple non-linear functions, on these mathematical objects.

Vector Spaces and Linear Transformations

Consider a dataset with $n$ features. Each data point lives in the $n$-dimensional vector space $\mathbb{R}^n$. When we apply a linear transformation to this data, we’re multiplying it by a matrix $A$:

$$f(x) = Ax$$

This simple operation forms the basis of neural networks, where each layer performs a linear transformation followed by a non-linear activation function.

import numpy as np

def linear_transform(X: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """
    Applies a linear transformation to input data.

    Args:
        X: Input data matrix of shape (n_samples, n_features)
        W: Weight matrix of shape (n_features, n_output)
        b: Bias vector of shape (n_output,)

    Returns:
        Transformed data of shape (n_samples, n_output)
    """
    return np.dot(X, W) + b
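
To make the shapes concrete, here is a small usage sketch; the array sizes below are arbitrary illustrative choices, not values from the original post.

# Hypothetical example: 4 samples, 3 features, mapped to 2 outputs
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))    # (n_samples, n_features)
W = rng.normal(size=(3, 2))    # (n_features, n_output)
b = np.zeros(2)                # (n_output,)

Y = linear_transform(X, W, b)
print(Y.shape)  # -> (4, 2)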

Matrix Operations in Practice

Let’s examine how matrix operations power some fundamental machine learning techniques.

Principal Component Analysis

PCA relies heavily on eigendecomposition to find the directions of maximum variance in our data. For a mean-centered data matrix $X$, the covariance matrix $C$ is computed as:

$$C = \frac{1}{n}X^TX$$

And we solve the eigenvalue equation:

$$Cv = \lambda v$$

def compute_pca(X: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """
    Performs PCA dimensionality reduction.

    Args:
        X: Data matrix of shape (n_samples, n_features)
        n_components: Number of components to keep

    Returns:
        The top n_components eigenvalues and the corresponding eigenvectors
        of the covariance matrix
    """
    # Center the data
    X_centered = X - np.mean(X, axis=0)

    # Compute covariance matrix
    cov_matrix = np.dot(X_centered.T, X_centered) / X.shape[0]

    # Compute eigendecomposition
    eigenvals, eigenvecs = np.linalg.eigh(cov_matrix)

    # Sort in descending order
    idx = np.argsort(eigenvals)[::-1]
    eigenvals = eigenvals[idx][:n_components]
    eigenvecs = eigenvecs[:, idx][:, :n_components]

    return eigenvals, eigenvecs
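
compute_pca returns the principal directions but stops short of projecting the data onto them. Here is a minimal sketch of that final step; the data shape and variable names are illustrative assumptions.

# Hypothetical usage: reduce 5-dimensional data to 2 principal components
X = np.random.default_rng(1).normal(size=(100, 5))
eigenvals, eigenvecs = compute_pca(X, n_components=2)

# Project the centered data onto the principal directions
X_centered = X - np.mean(X, axis=0)
X_projected = X_centered @ eigenvecs          # shape (100, 2)

# Fraction of total variance captured by the kept components
explained = eigenvals.sum() / np.var(X_centered, axis=0).sum()
print(X_projected.shape, round(explained, 3))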

Neural Networks and Linear Algebra

Modern deep learning architectures are essentially compositions of linear transformations and non-linearities. Each layer in a neural network performs:

$$h = \sigma(Wx + b)$$

where $\sigma$ is a non-linear activation function, $W$ is the weight matrix, and $b$ is the bias vector.

def neural_network_layer(
    X: np.ndarray,
    W: np.ndarray,
    b: np.ndarray,
    activation: callable,
) -> np.ndarray:
    """
    Implements a single neural network layer.

    Args:
        X: Input data
        W: Weight matrix
        b: Bias vector
        activation: Activation function

    Returns:
        Layer output after activation
    """
    Z = linear_transform(X, W, b)
    return activation(Z)
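
To show how such layers compose into a forward pass, here is a hedged two-layer sketch; the relu helper, the layer widths, and the identity output activation are assumptions made purely for illustration.

def relu(Z: np.ndarray) -> np.ndarray:
    # Element-wise ReLU non-linearity
    return np.maximum(0.0, Z)

# Hypothetical shapes: 8 samples, 4 input features, hidden width 16, 3 outputs
rng = np.random.default_rng(42)
X = rng.normal(size=(8, 4))
W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)

H = neural_network_layer(X, W1, b1, relu)             # hidden layer: (8, 16)
out = neural_network_layer(H, W2, b2, lambda z: z)    # linear output layer: (8, 3)
print(out.shape)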

Optimization and Gradient Descent

The training process itself relies heavily on linear algebra. When we perform gradient descent, we compute the gradient of the loss with respect to each weight matrix and step the weights in the opposite direction:

$$W_{t+1} = W_t - \alpha \frac{\partial L}{\partial W_t}$$

where $L$ is our loss function and $\alpha$ is the learning rate.
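
As a concrete, hedged sketch of this update, consider gradient descent on a linear least-squares loss $L = \frac{1}{n}\lVert XW - y \rVert^2$, whose gradient has the closed form $\frac{2}{n}X^T(XW - y)$; the single-output weight vector, learning rate, and step count below are arbitrary illustrative choices.

def gradient_descent_linear(
    X: np.ndarray,
    y: np.ndarray,
    lr: float = 0.01,
    n_steps: int = 1000,
) -> np.ndarray:
    """Fits W by minimizing the mean squared error of XW against y."""
    n_samples, n_features = X.shape
    W = np.zeros((n_features, 1))
    for _ in range(n_steps):
        grad = (2.0 / n_samples) * X.T @ (X @ W - y)  # dL/dW for the squared loss
        W = W - lr * grad                             # W_{t+1} = W_t - alpha * dL/dW
    return W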

Advanced Applications

Linear algebra enables sophisticated techniques like Singular Value Decomposition (SVD), which is crucial for recommendation systems and matrix factorization. The SVD of a matrix $A$ is given by:

$$A = U\Sigma V^T$$

where $U$ and $V$ are orthogonal matrices, and $\Sigma$ is a diagonal matrix containing singular values.

def truncated_svd(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Performs truncated SVD for dimensionality reduction.

    Args:
        X: Input matrix
        k: Number of singular values to keep

    Returns:
        The first k columns of U, the top k singular values, and the
        first k rows of V^T
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]
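
As a usage note, multiplying the three factors back together gives the best rank-$k$ approximation of the original matrix in the least-squares sense; the toy "ratings" matrix below is an invented example, not data from the post.

# Hypothetical toy ratings matrix: 6 users x 4 items
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 1.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
    [5.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 5.0],
])

U, s, Vt = truncated_svd(ratings, k=2)
approx = U @ np.diag(s) @ Vt  # best rank-2 reconstruction of the ratings
print(np.round(approx, 2))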

Linear algebra’s power lies in its ability to express complex transformations through simple matrix operations. As we push the boundaries of AI, understanding these fundamentals becomes increasingly important for developing and optimizing new architectures. The next time you train a model, remember that behind those high-level APIs lies a beautiful mathematical framework that makes it all possible.