
Mathematics for Machine Learning - Chapter 3

Overview

This chapter covers analytic geometry: equipping vector spaces with norms and inner products, and using them to define lengths, distances, angles, orthogonality, projections, and rotations. The main learning objectives are understanding what makes a function a norm or an inner product, and how these tools give vectors geometric meaning.


Key Concepts & Definitions

Norms

Norms are functions that assign a length to each vector $x$ in a vector space $V$.

Formal Definition:


\[|| \cdot || : V \rightarrow \mathbb{R}, \\ x \mapsto ||x||,\]

such that for all $\lambda \in \mathbb{R}$ and $x,y \in V$ the following hold:

  • Absolutely homogeneous: $\|\lambda x\| = |\lambda| \, \|x\|$
  • Triangle inequality: $\|x + y\| \le \|x\| + \|y\|$
  • Positive definite: $\|x\| \ge 0$ and $\|x\| = 0 \Longleftrightarrow x = 0$

Intuition:

Norms assign a length to a vector; the function that makes this assignment must satisfy the following:

  • Multiplying the vector by a scalar multiplies the norm by the absolute value of that scalar
  • The norm of the sum of two vectors is at most the sum of the norms of the two vectors
  • Norms are nonnegative, and the norm is 0 only for the zero vector.
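A quick numerical sanity check of these three properties for the Euclidean norm (a sketch using NumPy; the test vectors and scalar are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), rng.standard_normal(3)
lam = -2.5

# Absolute homogeneity: ||lam * x|| == |lam| * ||x||
assert np.isclose(np.linalg.norm(lam * x), abs(lam) * np.linalg.norm(x))

# Triangle inequality: ||x + y|| <= ||x|| + ||y||
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)

# Positive definiteness: ||x|| >= 0, and the zero vector has norm 0
assert np.linalg.norm(x) >= 0 and np.linalg.norm(np.zeros(3)) == 0
```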

Examples:

Manhattan norm (also called the $\ell_1$ norm) on $\mathbb{R}^n$ is:

\(||x||_1 := \sum_{i=1}^{n}{|x_i|},\) where $|\cdot|$ is the absolute value. The set of vectors $x \in \mathbb{R}^2$ with $||x||_1 = 1$ is the $\ell_1$ unit circle (a diamond in the plane).

Euclidean norm (also called the $\ell_2$ norm) of $x \in \mathbb{R}^n$:

\[||x||_2 := \sqrt{\sum_{i=1}^{n}{x_i^2}} = \sqrt{x^Tx}\]
computes the distance of $x$ from the origin. The set of vectors $x \in \mathbb{R}^2$ with $||x||_2 = 1$ is the unit circle.
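Both norms can be computed directly from their definitions or via `np.linalg.norm` (a small NumPy sketch; the example vector is my own):

```python
import numpy as np

x = np.array([3.0, -4.0])

# Manhattan (l1) norm: sum of absolute values
l1 = np.sum(np.abs(x))   # 3 + 4 = 7
# Euclidean (l2) norm: sqrt of sum of squares = sqrt(x^T x)
l2 = np.sqrt(x @ x)      # sqrt(9 + 16) = 5

# Same results via NumPy's built-in, selecting the norm with ord
assert l1 == np.linalg.norm(x, ord=1)
assert l2 == np.linalg.norm(x, ord=2)
print(l1, l2)  # 7.0 5.0
```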

Inner Products

Inner products allow for intuitive geometric concepts such as the length of a vector, the angle between two vectors, and the distance between two vectors. Later we touch on orthogonal vectors.

Dot Product

The dot product multiplies two vectors elementwise and sums the results, yielding a single scalar:

\[x^Ty = \sum_{i=1}^{n}x_iy_i\]

Example:

\(\begin{bmatrix} 2 \\ 3 \end{bmatrix}^T \begin{bmatrix} 4 \\ 5 \end{bmatrix} = 2 \cdot 4 + 3 \cdot 5 = 23\)
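The same computation in NumPy (note the result is a single scalar, not a vector of the elementwise products):

```python
import numpy as np

x = np.array([2, 3])
y = np.array([4, 5])

# Elementwise products 8 and 15 are summed into one scalar
d = x @ y
print(d)               # 23
print(np.dot(x, y))    # 23, equivalent
```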

General Inner Products

A bilinear mapping $\Omega$ is a mapping with two arguments that is linear in each argument.

Linear in the first argument: \(\Omega(\lambda x + \psi y, z) = \lambda \Omega(x,z) + \psi \Omega(y,z)\). Linear in the second argument: \(\Omega(x, \lambda y + \psi z) = \lambda \Omega(x,y) + \psi \Omega(x,z)\)
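These two linearity conditions can be spot-checked numerically; as a sketch, any matrix $A$ induces a bilinear map $\Omega(x, y) = x^T A y$ (the random matrix, vectors, and scalars below are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))       # any matrix A gives a bilinear map
x, y, z = rng.standard_normal((3, 3)) # three arbitrary test vectors
lam, psi = 2.0, -0.5

def omega(u, v):
    return u @ A @ v

# Linear in the first argument
assert np.isclose(omega(lam * x + psi * y, z),
                  lam * omega(x, z) + psi * omega(y, z))
# Linear in the second argument
assert np.isclose(omega(x, lam * y + psi * z),
                  lam * omega(x, y) + psi * omega(x, z))
```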

For $V$ a vector space and $\Omega : V \times V \rightarrow \mathbb{R}$ ($\Omega$ maps a pair of vectors to a single real value), $\Omega$ is called:

  • Symmetric if the order of the arguments does not matter, $\Omega(x,y) = \Omega(y,x)$ for all $x, y \in V$

  • Positive Definite if $\Omega(x,x) > 0$ for all $x \in V \backslash \{0\}$; for the zero vector, $\Omega(0,0) = 0$.

  • A positive definite, symmetric bilinear mapping is called an inner product on $V$, typically written as $\langle x,y \rangle$
  • The pair $(V, \langle \cdot , \cdot \rangle)$ is called an inner product space or (real) vector space with inner product.

Positive definiteness of the inner product implies:

\[\forall x \in V \backslash \{0\}: x^TAx > 0\]
  • A matrix $A \in \mathbb{R}^{n \times n}$ that is symmetric and satisfies the above is called symmetric, positive definite
  • If only $\ge 0$ holds in the above, $A$ is called symmetric, positive semidefinite

Theorem: For a real-valued, finite-dimensional vector space $V$ and an ordered basis $B$ of $V$, it holds that $\langle \cdot, \cdot \rangle : V \times V \rightarrow \mathbb{R}$ is an inner product if and only if there exists a symmetric, positive definite matrix $A \in \mathbb{R}^{n \times n}$ with

\[\langle x , y \rangle = \hat{x}^TA\hat{y},\]

where $\hat{x}, \hat{y}$ are the coordinate representations of $x, y$ with respect to $B$.

With the following properties:

  • The null space (kernel) of $A$ consists only of $0$ because $x^TAx > 0$ for all $x \ne 0$. This implies that $Ax \ne 0$ if $x \ne 0$
  • The diagonal elements $a_{ii}$ are positive because $a_{ii} = e_i^TAe_i > 0$, where $e_i$ is the i-th vector of the standard basis in $\mathbb{R}^n$
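A small NumPy sketch tying these pieces together (the matrix $A$ below is my own example of a symmetric, positive definite matrix):

```python
import numpy as np

# A symmetric, positive definite matrix (all eigenvalues > 0)
A = np.array([[9.0, 6.0],
              [6.0, 5.0]])
assert np.allclose(A, A.T)                 # symmetric
assert np.all(np.linalg.eigvalsh(A) > 0)   # positive definite
assert np.all(np.diag(A) > 0)              # diagonal entries are positive

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Inner product induced by A: <x, y> = x^T A y
ip = x @ A @ y
assert np.isclose(ip, y @ A @ x)  # symmetry of the induced inner product
assert x @ A @ x > 0              # positive definiteness for x != 0
```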

Lengths and Distances

Angles and Orthogonality

Orthonormal Basis

Orthogonal Complement

Inner Product of Functions

Orthogonal Projections

Rotations


Theorems & Important Results

Theorem Name (e.g., Bayes’ Theorem)

Statement:

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

Proof: (optional)

  1. Start with the definition of conditional probability
  2. Apply algebraic manipulation
  3. Arrive at the result ∎

Intuition: What does this theorem tell us? When is it useful?

Applications:

  • Where this theorem is commonly used
  • Example domains or problems

Mathematical Derivations

Show important derivations step-by-step:

Starting from the basic assumption:

\[f(x) = ...\]

Apply technique/transformation:

\[g(x) = ...\]

Final result:

\[h(x) = ...\]

Examples & Problem Solutions

Example 1: Problem Title

Problem Statement:

Clearly state the problem to be solved.

Given:

  • Parameter 1: value
  • Parameter 2: value

Find: What we’re looking for

Solution:

Step 1: Initial setup

Explain the approach and show the math.

Step 2: Apply relevant theorem

\[result = calculation\]

Step 3: Simplify and interpret

Answer: $final_result$

Key Takeaway: What makes this problem important or what technique did we learn?

Example 2: Another Problem

(Follow same structure)


Code Implementation

import numpy as np

def example_function(x):
    """
    Brief description of what this code demonstrates.

    Args:
        x: input description

    Returns:
        output description
    """
    result = x ** 2
    return result

# Example usage
print(example_function(5))

Output:

25

Explanation: What does this code illustrate from the chapter?


Diagrams & Visualizations

graph LR
    A[Concept A] --> B[Concept B]
    B --> C[Result C]
    A --> D[Alternative Path]

Description of what the diagram shows and why it’s helpful


Personal Insights & Commentary

Connections to Previous Material

  • How this chapter relates to Chapter X
  • Links to other concepts or books

Confusing Points & Clarifications

  • What was initially confusing
  • How I understood it after working through examples
  • Common pitfalls to avoid

Practical Applications

  • Real-world use cases
  • Why this matters beyond the textbook

Open Questions

  • Things to explore further
  • Related topics to study next

Practice Problems

Problem 1

From textbook page XXX, exercise Y.Z

Problem 2

Self-generated problem to test understanding:

State the problem


Summary & Key Takeaways

Quick bullet-point recap:

  1. Main Concept 1: One-sentence summary
  2. Main Concept 2: One-sentence summary
  3. Important Formula: $key_equation$
  4. Practical Insight: Why this matters

References & Further Reading

Primary Source

  • Book: Book Title by Author Name, Chapter X
  • Pages: pp. XXX-YYY

Supplementary Resources


Revision History

  • 2025-01-01: Initial notes created
  • 2025-01-05: Added practice problem solutions
  • 2025-01-10: Clarified proof of Theorem X

This post will be updated as I work through problems and gain deeper understanding.

This post is licensed under CC BY 4.0 by the author.