
Mathematics for Machine Learning - Chapter 3

Overview

This chapter covers analytic geometry: equipping vector spaces with norms and inner products, and using them to define lengths, distances, angles, orthogonality, projections, and rotations. The main learning objectives are understanding what makes a function a norm or an inner product, and how these tools give vectors geometric meaning.


Key Concepts & Definitions

Norms

Norms are functions that assign a length to each vector $x$ in a vector space $V$.

Formal Definition:


\[|| \cdot || : V \rightarrow \mathbb{R}, \\ x \mapsto ||x||,\]

such that for all $\lambda \in \mathbb{R}$ and $x,y \in V$ the following hold:

  • Absolutely homogeneous: $\|\lambda x\| = |\lambda| \, \|x\|$
  • Triangle inequality: $\|x + y\| \le \|x\| + \|y\|$
  • Positive definite: $\|x\| \ge 0$ and $\|x\| = 0 \Longleftrightarrow x = 0$

Intuition:

Norms assign a length to a vector; the function that makes this assignment must satisfy the following:

  • Multiplying the vector by a scalar multiplies the norm by the absolute value of that scalar
  • The norm of the sum of two vectors is at most the sum of the norms of the two vectors
  • Norms are nonnegative, and the norm is 0 only for the zero vector.
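A quick numerical sanity check of these three properties for the Euclidean norm (a sketch using NumPy; the test vectors and scalar are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), rng.standard_normal(3)
lam = -2.5

# Absolute homogeneity: ||lam * x|| == |lam| * ||x||
assert np.isclose(np.linalg.norm(lam * x), abs(lam) * np.linalg.norm(x))

# Triangle inequality: ||x + y|| <= ||x|| + ||y||
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)

# Positive definiteness: ||x|| >= 0, and the zero vector has norm 0
assert np.linalg.norm(x) >= 0 and np.linalg.norm(np.zeros(3)) == 0
```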

Examples:

Manhattan norm (also called the $\ell_1$ norm) on $\mathbb{R}^n$ is:

\(||x||_1 := \sum_{i=1}^{n}{|x_i|},\) where $|\cdot|$ is the absolute value. The set of vectors $x \in \mathbb{R}^2$ with $||x||_1 = 1$ is the $\ell_1$ unit circle (a diamond in the plane).

Euclidean norm (also called the $\ell_2$ norm) of $x \in \mathbb{R}^n$:

\[||x||_2 := \sqrt{\sum_{i=1}^{n}{x_i^2}} = \sqrt{x^Tx}\]
computes the distance of $x$ from the origin. The set of vectors $x \in \mathbb{R}^2$ with $||x||_2 = 1$ is the unit circle.
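Both norms can be computed directly from their definitions or via `np.linalg.norm` (a small NumPy sketch; the example vector is my own):

```python
import numpy as np

x = np.array([3.0, -4.0])

# Manhattan (l1) norm: sum of absolute values
l1 = np.sum(np.abs(x))   # 3 + 4 = 7
# Euclidean (l2) norm: sqrt of sum of squares = sqrt(x^T x)
l2 = np.sqrt(x @ x)      # sqrt(9 + 16) = 5

# Same results via NumPy's built-in, selecting the norm with ord
assert l1 == np.linalg.norm(x, ord=1)
assert l2 == np.linalg.norm(x, ord=2)
print(l1, l2)  # 7.0 5.0
```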

Inner Products

Inner products allow for intuitive geometric concepts such as the length of a vector, the angle between two vectors, and the distance between two vectors. Later we touch on orthogonal vectors.

Dot Product

The dot product multiplies two vectors elementwise and sums the results, yielding a single scalar:

\[x^Ty = \sum_{i=1}^{n}x_iy_i\]

Example:

\(\begin{bmatrix} 2 \\ 3 \end{bmatrix}^T \begin{bmatrix} 4 \\ 5 \end{bmatrix} = 2 \cdot 4 + 3 \cdot 5 = 23\)
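The same computation in NumPy (note the result is a single scalar, not a vector of the elementwise products):

```python
import numpy as np

x = np.array([2, 3])
y = np.array([4, 5])

# Elementwise products 8 and 15 are summed into one scalar
d = x @ y
print(d)               # 23
print(np.dot(x, y))    # 23, equivalent
```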

General Inner Products

A bilinear mapping $\Omega$ is a mapping with two arguments that is linear in each argument.

Linear in the first argument: \(\Omega(\lambda x + \psi y, z) = \lambda \Omega(x,z) + \psi \Omega(y,z)\). Linear in the second argument: \(\Omega(x, \lambda y + \psi z) = \lambda \Omega(x,y) + \psi \Omega(x,z)\)
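These two linearity conditions can be spot-checked numerically; as a sketch, any matrix $A$ induces a bilinear map $\Omega(x, y) = x^T A y$ (the random matrix, vectors, and scalars below are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))       # any matrix A gives a bilinear map
x, y, z = rng.standard_normal((3, 3)) # three arbitrary test vectors
lam, psi = 2.0, -0.5

def omega(u, v):
    return u @ A @ v

# Linear in the first argument
assert np.isclose(omega(lam * x + psi * y, z),
                  lam * omega(x, z) + psi * omega(y, z))
# Linear in the second argument
assert np.isclose(omega(x, lam * y + psi * z),
                  lam * omega(x, y) + psi * omega(x, z))
```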

For $V$ a vector space and $\Omega : V \times V \rightarrow \mathbb{R}$ ($\Omega$ maps a pair of vectors to a single real value), $\Omega$ is called:

  • Symmetric if the order of the arguments does not matter, $\Omega(x,y) = \Omega(y,x)$ for all $x, y \in V$

  • Positive Definite if $\Omega(x,x) > 0$ for all $x \in V \backslash \{0\}$; for the zero vector, $\Omega(0,0) = 0$.

  • A positive definite, symmetric bilinear mapping is called an inner product on $V$, typically written as $\langle x,y \rangle$
  • The pair $(V, \langle \cdot , \cdot \rangle)$ is called an inner product space or (real) vector space with inner product.

Positive definiteness of the inner product implies:

\[\forall x \in V \backslash \{0\}: x^TAx > 0\]
  • A matrix $A \in \mathbb{R}^{n \times n}$ that is symmetric and satisfies the above is called symmetric, positive definite
  • If only $\ge 0$ holds in the above, $A$ is called symmetric, positive semidefinite

Theorem: For a real-valued, finite-dimensional vector space $V$ and an ordered basis $B$ of $V$, it holds that $\langle \cdot, \cdot \rangle : V \times V \rightarrow \mathbb{R}$ is an inner product if and only if there exists a symmetric, positive definite matrix $A \in \mathbb{R}^{n \times n}$ with

\[\langle x , y \rangle = \hat{x}^TA\hat{y},\]

where $\hat{x}, \hat{y}$ are the coordinate representations of $x, y$ with respect to $B$.

With the following properties:

  • The null space (kernel) of $A$ consists only of $0$ because $x^TAx > 0$ for all $x \ne 0$. This implies that $Ax \ne 0$ if $x \ne 0$
  • The diagonal elements $a_{ii}$ are positive because $a_{ii} = e_i^TAe_i > 0$, where $e_i$ is the i-th vector of the standard basis in $\mathbb{R}^n$
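A small NumPy sketch tying these pieces together (the matrix $A$ below is my own example of a symmetric, positive definite matrix):

```python
import numpy as np

# A symmetric, positive definite matrix (all eigenvalues > 0)
A = np.array([[9.0, 6.0],
              [6.0, 5.0]])
assert np.allclose(A, A.T)                 # symmetric
assert np.all(np.linalg.eigvalsh(A) > 0)   # positive definite
assert np.all(np.diag(A) > 0)              # diagonal entries are positive

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Inner product induced by A: <x, y> = x^T A y
ip = x @ A @ y
assert np.isclose(ip, y @ A @ x)  # symmetry of the induced inner product
assert x @ A @ x > 0              # positive definiteness for x != 0
```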

Lengths and Distances

Angles and Orthogonality

Orthonormal Basis

Orthogonal Complement

Inner Product of Functions

Orthogonal Projections

Rotations


Theorems & Important Results

Theorem Name (e.g., Bayes’ Theorem)

Statement:

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

Proof: (optional)

  1. Start with the definition of conditional probability
  2. Apply algebraic manipulation
  3. Arrive at the result ∎

Intuition: What does this theorem tell us? When is it useful?

Applications:

  • Where this theorem is commonly used
  • Example domains or problems

Mathematical Derivations

Show important derivations step-by-step:

Starting from the basic assumption:

\[f(x) = ...\]

Apply technique/transformation:

\[g(x) = ...\]

Final result:

\[h(x) = ...\]

Examples & Problem Solutions

Example 1: Problem Title

Problem Statement:

Clearly state the problem to be solved.

Given:

  • Parameter 1: value
  • Parameter 2: value

Find: What we’re looking for

Solution:

Step 1: Initial setup

Explain the approach and show the math.

Step 2: Apply relevant theorem

\[result = calculation\]

Step 3: Simplify and interpret

Answer: $final_result$

Key Takeaway: What makes this problem important or what technique did we learn?

Example 2: Another Problem

(Follow same structure)


Code Implementation

import numpy as np

def example_function(x):
    """
    Brief description of what this code demonstrates.

    Args:
        x: input description

    Returns:
        output description
    """
    result = x ** 2
    return result

# Example usage
print(example_function(5))

Output:

25

Explanation: What does this code illustrate from the chapter?


Diagrams & Visualizations

graph LR
    A[Concept A] --> B[Concept B]
    B --> C[Result C]
    A --> D[Alternative Path]

Description of what the diagram shows and why it’s helpful


Personal Insights & Commentary

Connections to Previous Material

  • How this chapter relates to Chapter X
  • Links to other concepts or books

Confusing Points & Clarifications

  • What was initially confusing
  • How I understood it after working through examples
  • Common pitfalls to avoid

Practical Applications

  • Real-world use cases
  • Why this matters beyond the textbook

Open Questions

  • Things to explore further
  • Related topics to study next

Practice Problems

Problem 1

From textbook page XXX, exercise Y.Z

Problem 2

Self-generated problem to test understanding:

State the problem


Summary & Key Takeaways

Quick bullet-point recap:

  1. Main Concept 1: One-sentence summary
  2. Main Concept 2: One-sentence summary
  3. Important Formula: $key_equation$
  4. Practical Insight: Why this matters

References & Further Reading

Primary Source

  • Book: Book Title by Author Name, Chapter X
  • Pages: pp. XXX-YYY

Supplementary Resources


Revision History

  • 2025-01-01: Initial notes created
  • 2025-01-05: Added practice problem solutions
  • 2025-01-10: Clarified proof of Theorem X

This post will be updated as I work through problems and gain deeper understanding.

This post is licensed under CC BY 4.0 by the author.