Mathematics for Machine Learning - Chapter 3: Analytic Geometry
Overview
This chapter equips vector spaces with geometric structure: norms give vectors a length, and inner products add angles, distances, and orthogonality. The main learning objectives are the axioms of norms and inner products, the matrix representation of inner products, and the geometric operations they enable, such as projections and rotations.
Key Concepts & Definitions
Norms
A norm is a function that assigns a length $||x|| \in \mathbb{R}$ to each vector $x$.
Formal Definition:
\[|| \cdot || : V \rightarrow \mathbb{R}, \\ x \mapsto ||x||,\]
such that for all $\lambda \in \mathbb{R}$ and $x,y \in V$ the following hold:
- Absolutely homogeneous: $||\lambda x|| = |\lambda| \, ||x||$
- Triangle inequality: $||x + y|| \le ||x|| + ||y||$
- Positive definite: $||x|| \ge 0$ and $||x|| = 0 \Longleftrightarrow x = 0$
Intuition:
Norms assign a length to a vector; the function that makes this assignment must satisfy the following (a numerical sanity check follows the list):
- Multiplying the vector by a scalar multiplies the norm by the absolute value of that scalar
- The norm of the sum of two vectors is at most the sum of the norms of the two vectors
- Norms are never negative, and the norm is 0 exactly when the vector is the 0 vector
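As a sanity check, a minimal sketch verifying the three axioms numerically for the Euclidean norm (the vectors and the scalar below are arbitrary choices of mine, not from the book):

import numpy as np

x = np.array([3.0, -4.0])
y = np.array([1.0, 2.0])
lam = -2.5

# Absolute homogeneity: ||lam * x|| == |lam| * ||x||
assert np.isclose(np.linalg.norm(lam * x), abs(lam) * np.linalg.norm(x))

# Triangle inequality: ||x + y|| <= ||x|| + ||y||
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)

# Positive definiteness: norms are nonnegative, and only the zero vector has norm 0
assert np.linalg.norm(x) > 0
assert np.linalg.norm(np.zeros(2)) == 0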
Examples:
Manhattan norm (also called the $\ell_1$ norm) on $\mathbb{R}^n$:
\[||x||_1 := \sum_{i=1}^{n}{|x_i|},\]
where $|\cdot|$ is the absolute value. In $\mathbb{R}^2$, the set of vectors with $||x||_1 = 1$ forms a diamond (a square rotated by 45°), not a circle.
Euclidean norm (also called the $\ell_2$ norm) of $x \in \mathbb{R}^n$:
\[||x||_2 := \sqrt{\sum_{i=1}^{n}{x_i^2}} = \sqrt{x^Tx},\]
which computes the distance of $x$ from the origin. The set of all vectors $x \in \mathbb{R}^2$ with $||x||_2 = 1$ is the unit circle.
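Both norms can be computed with NumPy's np.linalg.norm by choosing the ord parameter; a short sketch with an arbitrary example vector:

import numpy as np

x = np.array([3.0, -4.0])

l1 = np.linalg.norm(x, ord=1)  # |3| + |-4| = 7
l2 = np.linalg.norm(x, ord=2)  # sqrt(3^2 + (-4)^2) = 5

print(l1, l2)  # 7.0 5.0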
Inner Products
Inner products allow for intuitive geometric concepts such as the length of a vector, the angle between two vectors, and the distance between two vectors. Later we touch on orthogonal vectors.
Dot Product
The dot product multiplies two vectors elementwise and sums the results:
\[x^Ty = \sum_{i=1}^{n}x_iy_i\]
Example:
\[\begin{bmatrix} 2 \\ 3 \end{bmatrix}^T \begin{bmatrix} 4 \\ 5 \end{bmatrix} = 2 \cdot 4 + 3 \cdot 5 = 23\]
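The same example in NumPy, where np.dot and the @ operator both compute the dot product:

import numpy as np

x = np.array([2, 3])
y = np.array([4, 5])

print(np.dot(x, y))  # 23
print(x @ y)         # 23, equivalent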
General Inner Products
A bilinear mapping $\Omega$ is a mapping with two arguments that is linear in each argument:
- Linear in the first argument: \(\Omega(\lambda x + \psi y, z) = \lambda \Omega(x,z) + \psi \Omega(y,z)\)
- Linear in the second argument: \(\Omega(x, \lambda y + \psi z) = \lambda \Omega(x,y) + \psi \Omega(x,z)\)
For $V$ a vector space and $\Omega : V \times V \rightarrow \mathbb{R}$ ($\Omega$ maps a pair of vectors to a single real number), $\Omega$ is called:
Symmetric if the order of the arguments does not matter, $\Omega(x,y) = \Omega(y,x)$ for all $x, y \in V$
Positive definite if $\Omega(x,x) > 0$ for all $x \in V \backslash \{0\}$; also $\Omega(0,0) = 0$.
- A positive definite, symmetric bilinear mapping is an inner product on $V$, typically written as $\langle x,y \rangle$
- The pair $(V, \langle \cdot , \cdot \rangle)$ is called an inner product space or a (real) vector space with inner product
When an inner product is expressed through a matrix $A$ (see the theorem below), its positive definiteness implies:
\[\forall x \in V \backslash \{0\}: x^TAx > 0\]
- A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is called symmetric, positive definite if the above holds
- It is symmetric, positive semidefinite if only $\ge 0$ holds
Theorem: For a real-valued, finite-dimensional vector space $V$ and an ordered basis $B$ of $V$, it holds that $\langle \cdot, \cdot \rangle : V \times V \rightarrow \mathbb{R}$ is an inner product if and only if there exists a symmetric, positive definite matrix $A \in \mathbb{R}^{n \times n}$ with
\[\langle x , y \rangle = \hat{x}^TA\hat{y},\]
where $\hat{x}, \hat{y}$ are the coordinates of $x, y$ with respect to $B$. Such a matrix $A$ has the following properties (a NumPy sketch follows the list):
- The null space (kernel) of $A$ consists only of $0$ because $x^TAx > 0$ for all $x \ne 0$. This implies that $Ax \ne 0$ if $x \ne 0$
- The diagonal elements $a_{ii}$ are positive because $a_{ii} = e_i^TAe_i > 0$, where $e_i$ is the i-th vector of the standard basis in $\mathbb{R}^n$
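A minimal NumPy sketch of these facts: check symmetry and positive definiteness of a candidate matrix via its eigenvalues, then evaluate the induced inner product. The matrix $A$ here is an arbitrary symmetric, positive definite example of my own choosing, not one from the book:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric; eigenvalues 1 and 3, so positive definite

# Symmetry check
assert np.allclose(A, A.T)

# Positive definiteness: a symmetric matrix is positive definite
# iff all its (real) eigenvalues are strictly positive
eigvals = np.linalg.eigvalsh(A)
assert np.all(eigvals > 0), "A is not positive definite"

# Diagonal elements are positive, as noted above (a_ii = e_i^T A e_i > 0)
assert np.all(np.diag(A) > 0)

# Inner product induced by A
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(x @ A @ y)  # 1.0 -- differs from the dot product x @ y = 0.0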
Lengths and Distances
Angles and Orthogonality
Orthonormal Basis
Orthogonal Complement
Inner Product of Functions
Orthogonal Projections
Rotations
Theorems & Important Results
Cauchy-Schwarz Inequality
Statement:
For an inner product space $(V, \langle \cdot , \cdot \rangle)$ with induced norm $||x|| := \sqrt{\langle x,x \rangle}$:
\[|\langle x , y \rangle| \le ||x|| \, ||y||\]
Intuition: The magnitude of the inner product of two vectors never exceeds the product of their lengths. This is exactly what makes the angle between two vectors well defined: the ratio $\frac{\langle x,y \rangle}{||x|| \, ||y||}$ is guaranteed to lie in $[-1, 1]$, so it can be interpreted as $\cos \omega$ for a unique $\omega \in [0, \pi]$.
Applications:
- Defining the angle between two vectors, and hence orthogonality
- Proving the triangle inequality for the norm induced by an inner product
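A quick numerical check of the inequality on random vectors under the standard dot product (the dimension and seed below are arbitrary choices):

import numpy as np

rng = np.random.default_rng(42)
for _ in range(1000):
    x = rng.normal(size=5)
    y = rng.normal(size=5)
    # |<x, y>| <= ||x|| * ||y||, with a tiny tolerance for floating point
    assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12
print("Cauchy-Schwarz held for 1000 random pairs")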
Mathematical Derivations
Show important derivations step-by-step:
Starting from the basic assumption:
\[f(x) = ...\]
Apply technique/transformation:
\[g(x) = ...\]
Final result:
\[h(x) = ...\]
Examples & Problem Solutions
Example 1: Problem Title
Problem Statement:
Clearly state the problem to be solved.
Given:
- Parameter 1: value
- Parameter 2: value
Find: What we’re looking for
Solution:
Step 1: Initial setup
Explain the approach and show the math.
Step 2: Apply relevant theorem
\[result = calculation\]
Step 3: Simplify and interpret
Answer: $final_result$
Key Takeaway: What makes this problem important or what technique did we learn?
Example 2: Another Problem
(Follow same structure)
Code Implementation
import numpy as np

def example_function(x):
    """
    Brief description of what this code demonstrates.
    Args:
        x: input description
    Returns:
        output description
    """
    result = x ** 2
    return result

# Example usage
print(example_function(5))
Output:
25
Explanation: What does this code illustrate from the chapter?
Diagrams & Visualizations
graph LR
A[Inner Product] --> B[Norm / Length]
B --> C[Distance]
A --> D[Angle]
D --> E[Orthogonality]
The diagram shows the dependency chain of the chapter: an inner product induces a norm, the norm gives distances, and the inner product together with the norm defines angles, from which orthogonality follows.
Personal Insights & Commentary
Connections to Previous Material
- How this chapter relates to Chapter X
- Links to other concepts or books
Confusing Points & Clarifications
- What was initially confusing
- How I understood it after working through examples
- Common pitfalls to avoid
Practical Applications
- Real-world use cases
- Why this matters beyond the textbook
Open Questions
- Things to explore further
- Related topics to study next
Practice Problems
Problem 1
From textbook page XXX, exercise Y.Z
Problem 2
Self-generated problem to test understanding:
State the problem
Summary & Key Takeaways
Quick bullet-point recap:
- Main Concept 1: One-sentence summary
- Main Concept 2: One-sentence summary
- Important Formula: $key_equation$
- Practical Insight: Why this matters
References & Further Reading
Primary Source
- Book: Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, Chapter 3 (Analytic Geometry)
- Pages: pp. XXX-YYY
Supplementary Resources
- Video Lecture Title - Brief description
- Blog Post/Article - What it covers
- Related Paper - Why it’s relevant
Related Chapters
- Previous: Chapter 2, Linear Algebra
- Next: Chapter 4, Matrix Decompositions
- See also: Related Topic
Revision History
- 2025-01-01: Initial notes created
- 2025-01-05: Added practice problem solutions
- 2025-01-10: Clarified proof of Theorem X
This post will be updated as I work through problems and gain deeper understanding.