Initial commit

2025-12-25 21:13:43 -08:00
commit 9ce7679e9c
40 changed files with 2430 additions and 0 deletions


@@ -0,0 +1,29 @@
#Math #Calculus
# The Problem
If $f(x) = \sum _{k \geq 0} a_k x^k$, and this series converges for $x = x_0$, prove:
$$
\sum _{k \geq 0} a_k x_0^k H_k = \int _0^1 \frac {f(x_0) - f(x_0 y)} {1 - y} dy
$$
where $H_k$ is defined to be the partial sums of the harmonic series ($H_0 = 0$, $H_k = \sum _{i = 1}^k \frac 1 i$ for $k \geq 1$).
(from *The Art of Computer Programming*)
# Solution
Although this problem might seem intimidating with a power series involving the harmonic numbers on the LHS and a summation function inside an integral on the RHS, it is fairly trivial to bring out the summation and express the RHS as a power series:
$$
\int _0^1 \frac {f(x_0) - f(x_0 y)} {1 - y} dy \\
= \int _0^1 \frac {\sum _{k \geq 0} a_k x_0^k - \sum _{k \geq 0} a_k x_0^k y^k} {1 - y} dy \\
= \int _0^1 \sum _{k \geq 1} a_k x_0^k \frac {1 - y^k} {1 - y} dy \\
= \sum _{k \geq 1} a_k x_0^k \int _0^1 \frac {1 - y^k} {1 - y} dy
$$
The $k = 0$ term vanishes since $1 - y^0 = 0$. The integral factor in the last step is now merely Euler's integral representation for the harmonic numbers, which follows from the simple fact that $\frac {1 - y^k} {1 - y} = \sum _{i = 0}^{k - 1} y^i$. Therefore:
$$
\sum _{k \geq 0} a_k x_0^k H_k = \int _0^1 \frac {f(x_0) - f(x_0 y)} {1 - y} dy
$$
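Writing that integral out explicitly as a quick check:
$$
\int _0^1 \frac {1 - y^k} {1 - y} dy = \int _0^1 \sum _{i = 0}^{k - 1} y^i \, dy = \sum _{i = 0}^{k - 1} \frac 1 {i + 1} = H_k
$$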


@@ -0,0 +1,34 @@
#Math #Algebra
# Multiplying an Adjacency Matrix by Itself
Consider adjacency matrix $M$ with $n$ nodes. We know that if there is an edge between $a$ and $b$, $M_{a, b} = 1$; if not, $M_{a, b} = 0$. Now consider all possible nodes $c$ through which to take a path $a \to c \to b$. We see that for $c$ to be a possible node, $M_{a, c} = 1$ and $M_{c, b} = 1$.
Now observe the definition of multiplication of matrices. We know:
$$
MM_{a, b} = \sum _{c = 1}^n M_{a, c} M_{c, b}
$$
For each node $c$, the corresponding term of the sum equals one exactly when $M_{a, c}$ and $M_{c, b}$ are both $1$, i.e. when the path $a \to c \to b$ exists. Therefore, the number of paths of length $2$ between $a$ and $b$ is $MM_{a, b}$.
# Taking an Adjacency Matrix to $l$
Now consider a matrix $N$ where there are $N_{a, b}$ paths of length $l$ between $a$ and $b$. Have $M$ be the adjacency matrix for the graph $N$ models. Now say we want all paths of length $l + 1$ that look like $a \to … \to c \to b$. We know the number of paths $a \to … \to c$ of length $l$ is $N_{a, c}$. Similarly, we know there are $M_{c, b}$ paths that satisfy $c \to b$. Therefore, the number of paths that satisfy $a \to … \to c \to b$ is:
$$
N_{a, c}M_{c, b}
$$
Choosing any node $c$ gives the number of paths of length $l + 1$ satisfying $a \to b$:
$$
\sum _{c = 1}^n N_{a, c}M_{c, b} \\
= NM_{a, b}
$$
Since $M_{a, b}$ is clearly the number of paths of length $1$ between $a$ and $b$, by induction the number of paths of length $l$ between $a$ and $b$ is:
$$
M^l_{a, b}
$$
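As a quick numerical sanity check, here is a minimal numpy sketch (the small example graph is made up for illustration):
``` python
import numpy as np

# Adjacency matrix of a small directed graph: edges 0->1, 0->2, 1->2, 2->0
M = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
])

l = 3
paths = np.linalg.matrix_power(M, l)  # (M^l)[a, b] = number of paths of length l from a to b
print(paths)
print(paths[0, 2])  # 1 here: the only length-3 path from 0 to 2 is 0 -> 2 -> 0 -> 2
```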


@@ -0,0 +1,33 @@
#Math #Calculus
Limit to solve:
$$
\lim _{x\to 0} \frac {e^x-1} {x}
$$
Let $t = e^x - 1$
$$
\lim _{t\to 0} \frac {t} {\ln(t+1)}
$$
$$
\lim _{t\to 0} \frac {1} {\frac {1} {t} \ln(1+t)}
$$
Move the $\frac 1 t$ into the logarithm as an exponent (log power rule)
$$
\lim _{t\to 0} \frac {1} {\ln(1+t)^{\frac {1} {t}}}
$$
By the definition of $e$, $\lim _{t \to 0} (1 + t)^{\frac 1 t} = e$
$$
\frac {1} {\ln e}
$$
$$
1
$$

Basic Category Theory.md Normal file

@@ -0,0 +1,17 @@
#Math #CT
# Categories
Categories contain:
- A collection of **objects**
- A collection of **morphisms** (also called **arrows**) connecting objects denoted by $f: S \to T$, where $f$ is the **morphism**, $S$ is the **source**, and $T$ is the **target**
- Note: $f: A \to B$ and $g: A \to B$ **DOES NOT IMPLY** $f = g$
- Formally this can also be expressed as a relation between a collection of objects and a collection of morphisms
- Morphisms have a notion of **composition**, that being if $f: A \to B$, $g: B \to C$, then $g \circ f: A \to C$
There are three rules for categories:
- **Associativity:** For morphisms $a$, $b$, and $c$, $(a \circ b) \circ c = a \circ (b \circ c)$
- **Closed composition:** If morphisms $a$ and $b$ are composable (the source of $a$ is the target of $b$), then the composite $a \circ b$ must also be a morphism of the category
- **Identity morphisms:** For every object $A$ in a category, there must be an identity morphism $\text{id}_A: A \to A$
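For a concrete example, sets and functions form a category: the objects are sets, a morphism $f: S \to T$ is a function from $S$ to $T$, $\text{id}_A$ is the identity function on $A$, and composition is ordinary function composition, which is associative.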

Bezout’s Identity.md Normal file

@@ -0,0 +1,51 @@
#Math #NT
# Statement
Let $x \in \mathbb{Z}$, $y \in \mathbb{Z}$, $x \neq 0$, $y \neq 0$, and $g = \gcd(x, y)$. Bezout's Identity states that there exist $\alpha \in \mathbb{Z}$ and $\beta \in \mathbb{Z}$ such that:
$$
\alpha x + \beta y = g
$$
Furthermore, $g$ is the least positive integer able to be expressed in this form.
# Proof
## First Statement
Let $x = gx_1$ and $y = gy_1$, and notice $\gcd(x_1, y_1) = 1$ and $\operatorname{lcm} (x_1, y_1) = x_1 y_1$.
Since this is true, the smallest positive integer $\alpha$ with $\alpha x_1 \equiv 0 \mod {y_1}$ is $\alpha = y_1$.
For all integers $0 \leq a < b < y_1$, $ax_1 \not\equiv bx_1 \mod {y_1}$ (otherwise $y_1$ would divide $(b - a)x_1$ with $0 < b - a < y_1$, contradicting the previous statement). The residues $ax_1 \bmod y_1$ are therefore all distinct, so by the pigeonhole principle there exists $\alpha$ such that $\alpha x_1 \equiv 1 \mod {y_1}$.
Therefore, there is an $\alpha$ such that $\alpha x_1 - 1 \equiv 0 \mod {y_1}$, and by extension there exists an integer $\beta$ such that:
$$
\alpha x_1 - 1 = -\beta y_1 \\
\alpha x_1 + \beta y_1 = 1
$$
By multiplying by $g$:
$$
\alpha x + \beta y = g
$$
## Second Statement
To prove $g$ is the minimum, let's consider another positive integer $g\prime$ expressible in this form:
$$
\alpha\prime x + \beta\prime y = g\prime
$$
Since $x$ and $y$ are both multiples of $g$:
$$
0 \equiv \alpha \prime x + \beta \prime y \mod g \\
0 \equiv g\prime \mod g
$$
Since $g$ and $g\prime$ are positive integers, $g\prime \geq g$.
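In practice, such $\alpha$ and $\beta$ can be computed with the extended Euclidean algorithm. Here is a minimal Python sketch for positive inputs (separate from the proof above, just an illustration):
``` python
def extended_gcd(x, y):
    """Return (g, alpha, beta) with alpha*x + beta*y == g == gcd(x, y), for positive x, y."""
    if y == 0:
        return (x, 1, 0)
    g, a, b = extended_gcd(y, x % y)
    # g = a*y + b*(x % y) = a*y + b*(x - (x//y)*y) = b*x + (a - (x//y)*b)*y
    return (g, b, a - (x // y) * b)

g, alpha, beta = extended_gcd(240, 46)
print(g, alpha, beta)  # one valid pair: 2 -9 47
assert alpha * 240 + beta * 46 == g
```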


@@ -0,0 +1,21 @@
#Math #Probability
# Problem
Why does n choose k, or $\frac{n!}{k!(n-k)!}$ generate the coefficient for $x^ky^{n-k}$ in $(x+y)^n$?
# Explanation
Let's see what happens when expanding $(x+y)^4$:
$$
(x+y)^4\\
=(x+y)(x+y)(x+y)(x+y)\\
=xxxx+\\
yxxx+xyxx+xxyx+xxxy+\\
yyxx+yxyx+yxxy+xyyx+xyxy+xxyy+\\
xyyy+yxyy+yyxy+yyyx+\\
yyyy
$$
When expanding, notice that the number of terms with $k$ copies of $x$ (and likewise $4 - k$ copies of $y$) is the number of ways to choose $k$ of the $4$ slots to hold an $x$, i.e. $4 \choose k$. Therefore, $(x+y)^n={n \choose 0}x^0y^n+{n \choose 1}x^1y^{n-1}+\cdots+{n \choose n-1}x^{n-1}y^1+{n \choose n}x^ny^0$
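For instance, the coefficient of $x^2y^2$ above is ${4 \choose 2} = 6$, matching the six terms $yyxx, yxyx, yxxy, xyyx, xyxy, xxyy$ in the expansion.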

Bohr Mollerup Theorem.md Normal file

@@ -0,0 +1,85 @@
#Math #Calculus
# Intro
The Gamma function $\Gamma(x)$ is a way to extend the factorial function, where $\Gamma(n + 1) = n!$. This gives us two conditions defining $\Gamma (x)$:
$$
\Gamma(1) = 1 \\
\Gamma(x + 1) = x \Gamma (x)
$$
However, by adding a third condition stating $\Gamma (x)$ is logarithmically convex ($\log \circ \space \Gamma$ is convex), we can prove that $\Gamma (x)$ is unique!
# Proof
Let $G$ be a function with the properties above. Since $G(x + 1) = xG(x)$, we can define any $G(x + n)$, where $n \in \mathbb{N}$ as:
$$
G(x + n) = G(x)\prod _{i = 0}^{n - 1}(x + i)
$$
This means that it is sufficient to define $G(x)$ on $x \in (0, 1]$ for a unique $G(x)$.
Let $S(x_1, x_2)$ be defined as $\frac {\log (G(x_2)) - \log (G(x_1))} {x_2 - x_1}$, the slope of $\log \circ \space G$ between $x_1$ and $x_2$. Observe that by log-convexity, for all $0 \lt x \leq 1$ and $n \in \mathbb{N}$:
$$
S(n - 1, n) \leq S(n, n +x) \leq S(n, n + 1) \\
\log (G(n)) - \log (G(n-1)) \leq \frac {\log (G(n + x)) - \log (G(n))} {x} \leq \log (G(n + 1)) - \log (G(n)) \\
\log ((n - 1)!) - \log ((n-2)!) \leq \frac {\log (G(x + n)) - \log ((n - 1)!)} {x} \leq \log (n!) - \log ((n - 1)!) \\
\log(n - 1) \leq \frac {\log (\frac{G(x + n)}{(n - 1)!})} {x} \leq \log(n) \\
\log((n - 1)^x) \leq \log (\frac {G(x + n)}{(n - 1)!})\leq \log(n^x) \\
$$
Exponentiating:
$$
(n - 1)^x \leq \frac {G(x + n)}{(n - 1)!}\leq n^x \\
(n - 1)^x(n - 1)! \leq G(x + n)\leq n^x(n - 1)! \\
$$
Using the above work to expand $G(x + n)$:
$$
\frac{(n - 1)^x(n - 1)!} {\prod _{i = 0}^{n - 1}(x + i)} \leq G(x) \leq \frac{n^x(n - 1)!} {\prod _{i = 0}^{n - 1}(x + i)} \\
\frac{(n - 1)^x(n - 1)!} {\prod _{i = 0}^{n - 1}(x + i)} \leq G(x) \leq \frac{n^xn!} {\prod _{i = 0}^n(x + i)}(\frac {n + x} n) \\
$$
Of course, taking the limit as $n$ goes to infinity on both sides by brute force would produce the value of $G(x)$; however, I will present a more elegant solution. Notice we can take the inequalities separately, resulting in:
$$
\frac{(n_1 - 1)^x(n_1 - 1)!} {\prod _{i = 0}^{n_1 - 1}(x + i)} \leq G(x)\\
G(x) \leq \frac{n_2^xn_2!} {\prod _{i = 0}^{n_2}(x + i)}(\frac {n_2 + x} {n_2}) \\
$$
This shows that no matter which $n_1$ and $n_2$ we pick, the inequalities still hold!
Now we can sub in $n_1 = n + 1$, $n_2 = n$, to get:
$$
\frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} \leq G(x) \leq \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} (\frac {n + x} n)\\
$$
Taking a limit to infinity on both sides:
$$
\lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} \leq G(x) \leq \lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} (\frac {n + x} n)\\
\lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} \leq G(x) \leq \lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} \\
$$
<aside>
Therefore there is only a singular function satisfying the conditions on $G(x)$, as it is squeezed on $(0, 1]$.
</aside>
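For the curious, the common value that both bounds converge to is $\lim _{n \to \infty} \frac {n^x \, n!} {x(x+1)\cdots(x+n)}$, which is Gauss's product formula for $\Gamma(x)$.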
# Exercise to the Reader
Prove that the definition:
$$
\Gamma(n) = \int _{0}^{\infty} x^{n - 1} e^{-x} dx
$$
is valid.

Central Limit Theorem.md Normal file

@@ -0,0 +1,245 @@
#Math #Probability
# The Central Limit Theorem
Let us sum $n$ instances from an i.i.d. (independent, identically distributed) sequence with defined first and second moments (mean and variance). Center the resulting distribution on $0$ and scale it by its standard deviation. As $n$ goes to infinity, the distribution of that variable goes toward
$$
\frac 1 {\sqrt {2 \pi}} e^{- \frac {x^2} 2}
$$
or the standard normal distribution
## Mathematical Definition
Let $Y$ be the mean of a sequence of $n$ i.i.d. random variables
$$
Y = \frac 1 n \sum _{i=1}^{n} X_i
$$
Let $\mu=E(X_i)$, the expected value of $X$, and $\sigma = \sqrt {Var(X)}$, the standard deviation of $X$
Calculate the expected value of Y, $E(Y)$, and the variance, $Var(Y)$:
$$
E(Y) \\
= E(\frac 1 n \sum _{i=1}^{n} X_i) \\
= \frac 1 n \sum _{i=1}^{n} E(X_i) \\
= \frac 1 n \sum _{i=1}^{n} \mu \\
= \frac {n \mu} {n} \\
= \mu
$$
$$
Var(Y) \\
= Var(\frac 1 n \sum _{i=1}^n X_i) \\
= \frac 1 {n^2} \sum _{i=1}^n Var(X_i) \\
= \frac {\sigma^2} n
$$
Let $Y^*$ be $Y$ centered by $E(Y)$ and scaled by its standard deviation, $\sqrt {Var(Y)}$
$$
Y^* \\ = \frac {Y - E(Y)} {\sqrt {Var(Y)}} \\ = \frac {Y - \mu} {\sqrt {\frac {\sigma^2} {n}}} \\ = \frac {\sqrt n (Y - \mu)} \sigma \\= \frac {\sqrt n (\frac 1 n \sum _{i=1}^n X_i - \mu)} \sigma \\ = \frac {\frac 1 {\sqrt n} (\sum _{i=1}^n X_i - n\mu)} \sigma
$$
The CLT states
$$
Y^* \overset d \to N(0, 1)
$$
Or $Y^*$ converges in distribution to the standard normal distribution with a mean of 0 and a standard deviation of 1
# Proof
## A Change in Variables
Let $S$ be the sum of our sequence of $n$ i.i.d. random variables
$$
S = \sum _{i=1}^{n} X_i
$$
Let's calculate $E(S)$ and $Var(S)$
$$
E(S) \\
=E(\sum _{i=1}^n X_i) \\
=\sum _{i=1}^n E(X_i) \\
=\sum _{i=1}^n \mu \\
= n\mu
$$
$$
Var(S) \\
=Var(\sum _{i=1}^n X_i) \\
=\sum _{i=1}^n Var(X_i) \\
=\sum _{i=1}^n \sigma^2 \\
=n\sigma^2
$$
Center $S$ by $E(S)$ and scale it by $\sqrt {Var(S)}$ for $S^*$
$$
S^* \\
= \frac {S - E(S)} {\sqrt {Var(S)}} \\
= \frac {S - n\mu} {\sqrt {n\sigma^2}} \\
= \frac {S - n\mu} {\sqrt {n}\sigma} \\
= \frac {\frac 1 {\sqrt n} (S-n\mu)} { \sigma} \\
= \frac {\frac 1 {\sqrt n} (\sum _{i=1}^n X_i - n\mu)} \sigma
$$
From the above, $Y^*=S^*$. In the proof, we will use $S^*$, as it is easier to manipulate.
## MGFs
An MGF (moment generating function) is a function where
$$
M_V(t) = E(e^{tV})
$$
where $V$ is a random variable
(reminder for me to do another notion on this)
### Properties of MGFs
Property 1:
If $A$ and $B$ are independent and
$$
C=A+B
$$
Then
$$
M_C(t) \\
= E(e^{tC}) \\
= E(e^{tA + tB}) \\
= E(e^{tA}e^{tB}) \\
= E(e^{tA}) E(e^{tB}) \\
= M_A(t) M_B(t)
$$
Property 2:
$$
M_V^{(r)}(0) = E(V^r)
$$
The $r$th derivative of $M_V$ at $0$ gives the $r$th moment of $V$
Property 3:
Let $A_1$, $A_2$, … $A_n$, … be a sequence of random variables with MGFs $M_{A_1}$, $M_{A_2}$, … $M_{A_n}$, …
If
$$
M_{A_n}(t) \to M_B(t)
$$
Then
$$
A_n \overset d \to B
$$
### MGF of a Normal Distribution
Let $Z$ be a random variable with a standard normal distribution
$$
Z \sim N(0, 1)
$$
$$
M_Z(t) \\
= E(e^{tZ}) \\
= \int _{-\infty}^{\infty} e^{xt} \frac 1 {\sqrt {2\pi}} e^{-\frac {x^2} 2} dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{tx-\frac 1 2 x^2} dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x^2 - 2tx )} dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x^2 - 2tx + t^2 ) + \frac 1 2 t^2 } dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x - t)^2 + \frac 1 2 t^2 } dx \\
= e ^ {\frac 1 2 t^2} \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x - t)^2 } dx \\
= e ^ {\frac {t^2} 2}
$$
## The Argument
To prove the CLT, we need to prove that $S^*$ converges to $N(0, 1)$ as $n \to \infty$. Our approach will be to prove that the MGF of $S^*$ converges to the MGF of $N(0, 1)$ as $n \to \infty$.
$$
S^* \\
= \frac {S - E(S)} {\sqrt {Var(S)}} \\
= \frac {S - n\mu} {\sqrt {n \sigma^2}} \\
= \frac {\sum _{i=1}^{n} X_i - n\mu} {\sqrt n \sigma} \\
= \sum _{i=1}^{n} \frac {X_i - \mu} {\sqrt n \sigma}
$$
Start manipulating MGF of $S^*$:
$$
M_{S^*}(t) \\
= E(e^{tS^*}) \\
= E(e^{t(\sum _{i=1}^{n} \frac {X_i - \mu} {\sqrt n \sigma})}) \\
= E(\prod _{i=1}^{n} e^{t \frac {X_i - \mu} {\sqrt n \sigma}}) \\
= E(e^{t(\frac {(X-\mu)} {\sqrt n \sigma})})^n \\
= (M_{\frac {(X-\mu)} {\sqrt n \sigma}}(t))^n \\
=(M_{(X - \mu)} (\frac t {\sqrt n \sigma }))^n
$$
Expand out the Taylor series for $(M_{(X-\mu)}(\frac t {\sqrt n \sigma}))^n$ (note $O(t^3)$ means terms of order $t^3$ and above, which tend to zero as $n$ goes to $\infty$):
$$
M_{(X-\mu)}(\frac t {\sqrt n \sigma}) \\
= (M_{(X-\mu)}(0)) + (\frac {M_{(X-\mu)}\prime(0)} {1!})(\frac t {\sqrt n \sigma}) + (\frac {M_{(X-\mu)}\prime\prime(0)} {2!})(\frac t {\sqrt n \sigma})^2 + (\frac {M_{(X-\mu)}\prime\prime\prime(0)} {3!})(\frac t {\sqrt n \sigma})^3 + ...\\
= 1 + (\frac {t} {\sqrt n \sigma})E(X-\mu) + (\frac {t^2} {2 n \sigma^2})E((X-\mu)^2) + (\frac {t^3} {6n ^ {\frac 3 2} \sigma ^ 3})E((X-\mu)^3) + ... \\
= 1 + (\frac t {\sqrt n \sigma})E(X-\mu) + (\frac {t^2}
{2n \sigma^2})E((X-\mu)^2) + O(t^3) \\
\approx 1 + (\frac t {\sqrt n \sigma})E(X-\mu) + (\frac {t^2}
{2n \sigma^2})E((X-\mu)^2)
$$
Remember $E(X-\mu) = 0$ and $E((X-\mu)^2) = \sigma^2$
$$
= 1 + (\frac t {\sqrt n \sigma})(0) + (\frac {t^2} {2n \sigma^2})(\sigma ^ 2) \\
= 1 + \frac {t^2} {2n}
$$
Solve for $M_{S^*} (t)$:
$$
M_{S^*}(t) = (1 + \frac {t^2} {2n})^n
$$
Solve $M_{S^*} (t)$ for $\lim _{n \to \infty}$:
$$
\lim _{n \to \infty} M_{S^*}(t) \\
= \lim _{n \to \infty} (1 + \frac {t^2} {2n})^n \\
= \lim _{n \to \infty} (1 + \frac 1 {(\frac {2n} {t^2})})^{\frac {t^2} 2 (\frac {2n} {t^2})} \\
= e^{\frac {t^2} 2}
$$
Since $\lim _{n \to \infty} M_{S^*} (t) = M_Z(t)$, $S^*$ converges in distribution to $N(0, 1)$ as $n \to \infty$. Therefore:
$$
Y^* \overset d \to N(0, 1)
$$
proving the Central Limit Theorem
## Summary of the Argument
$$
Y^* = S^* \\
\lim _{n \to \infty} M_{S^*}(t) \to M_Z (t) \\
\lim _{n \to \infty} S^* \to N(0, 1) \\
\lim _{n \to \infty} Y^* \to N(0, 1) \\
$$
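As a quick numerical illustration of the statement (a minimal numpy sketch; the choice of Uniform(0, 1) as the i.i.d. distribution is arbitrary):
``` python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 50_000

# X_i ~ Uniform(0, 1): mu = 1/2, sigma^2 = 1/12
samples = rng.random((trials, n))
Y = samples.mean(axis=1)
Y_star = np.sqrt(n) * (Y - 0.5) / np.sqrt(1 / 12)

print(Y_star.mean(), Y_star.std())  # ~0 and ~1
print((Y_star > 1.96).mean())       # ~0.025, matching the standard normal tail
```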


@@ -0,0 +1,98 @@
#Math #NT
# Theorem
Say $m$ and $n$ are two coprime positive integers. The Chicken McNugget Theorem states that the largest number that can't be expressed as $am + bn$ with $a \in \mathbb{Z}$, $b \in \mathbb{Z}$, and $a, b \geq 0$ is:
$$
mn - m - n
$$
# Proof
Call a number purchasable relative to $m$ and $n$ if it can be represented as
$$
am + bn
$$
where $a$ and $b$ are non-negative integers.
## Lemma 1
Let $A_N \subset \mathbb{Z} \times \mathbb{Z}$ be the set of all $(x, y)$ such that $xm + yn = N$. Then for any $(x, y) \in A_N$:
$$
A_N = \{(x + kn, y - km): k \in \mathbb{Z}\}
$$
### Proof
By Bezout's Lemma, there exist integers $x\prime$ and $y\prime$ such that $x\prime m + y\prime n = 1$. Then, $Nx\prime m + Ny\prime n = N$. Thus, $A_N$ is nonempty.
Adding $kn$ to $x$ increases $xm + yn$ by $kmn$, and subtracting $km$ from $y$ decreases it by $kmn$; the two cancel, so every pair $(x + kn, y - km)$ is also in $A_N$.
To prove these are the only solutions, let $(x_1, y_1) \in A_N$ and $(x_2, y_2) \in A_N$. This means:
$$
mx_1 + ny_1 = mx_2 + ny_2 \\
m(x_1 - x_2) = n(y_2 - y_1) \\
$$
Since $m$ and $n$ are coprime, and $m$ divides $n(y_2 - y_1)$:
$$
y_2 - y_1 \equiv 0 \mod m \\
y_2 \equiv y_1 \mod m
$$
Similarly:
$$
x_2 \equiv x_1 \mod n
$$
Let $k_1, k_2 \in \mathbb{Z}$ such that:
$$
x_2 - x_1 = k_1n \\
y_1 - y_2 = k_2m \\
$$
Substituting these into $m(x_1 - x_2) = n(y_2 - y_1)$ gives $-k_1 mn = -k_2 mn$, so $k_1 = k_2$ and the two solutions differ by $(kn, -km)$, proving the lemma.
## Lemma 2
For $N \in \mathbb{Z}$, there is a unique $(a_N, b_N) \in \mathbb{Z} \times \{0, 1, 2… m - 1\}$ such that $a_Nm + b_Nn = N$.
### Proof
By Lemma 1, the solutions of $xm + yn = N$ are exactly the pairs $(x + kn, y - km)$ for $k \in \mathbb{Z}$. There is only one possible $k$ for which
$$
0 \leq y - km \leq m - 1
$$
and that choice of $k$ gives the unique pair $(a_N, b_N)$.
## Lemma 3
$N$ is purchasable if and only if $a_N \geq 0$.
### Proof
If $a_N \geq 0$, we can pick $(a_N, b_N)$, so $N$ is purchasable. If $a_N < 0$, then $a_N + kn < 0$ for every $k \leq 0$, and $b_N - km < 0$ for every $k > 0$, so no representation with both coefficients non-negative exists.
## Putting it Together
Therefore, the set of non-purchasable integers is:
$$
\{xm + yn : x<0, 0 \leq y \leq m -1\}
$$
To find the largest element of this set, we choose $x = -1$ and $y = m - 1$:
$$
-m + (m - 1)n \\
= mn - m - n
$$
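A brute-force check of the formula for small coprime $m$ and $n$ (a quick Python sketch, not part of the proof):
``` python
def largest_non_purchasable(m, n):
    """Largest integer below m*n that is not of the form a*m + b*n with a, b >= 0."""
    limit = m * n
    purchasable = set()
    for a in range(limit // m + 1):
        for b in range(limit // n + 1):
            purchasable.add(a * m + b * n)
    return max(k for k in range(limit) if k not in purchasable)

print(largest_non_purchasable(3, 5))   # 7  == 3*5 - 3 - 5
print(largest_non_purchasable(7, 11))  # 59 == 7*11 - 7 - 11
```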


@@ -0,0 +1,91 @@
#Math #NT
For the proof, let p and q be coprime
# Rearrangement
$$
x \equiv a \pmod p\\
x \equiv b \pmod q
$$
Subtract $a$ from both congruences
$$
x-a \equiv 0 \pmod p\\
x-a \equiv b-a \pmod q
$$
# The Underlying Problem
Let $m$ be an integer from 0 to $q-1$ (inclusive), and $r$ be an integer from 0 to $q-1$ (inclusive)
$$
mp \equiv r \pmod q
$$
There are $q$ possible values of $m$, and $q$ possible values of $r$.
Since $p$ and $q$ are coprime, the remainders cannot repeat until after $m > q-1$
Therefore, there is a unique value of m to produce any remainder r in the above equation.
# Putting it all Together
If we look at the last congruence in *Rearrangement*, we see it matches the one in *The Underlying Problem*, where $b-a$ corresponds to $r$, and $x-a$ corresponds to $mp$.
So, we can see there is one unique solution for $x$ in the interval 0 to $pq-1$ (inclusive).
We can extend this by noting that adding $pq$ to any solution gives another solution, so the full set of solutions is expressible via mod $pq$.
# The Underlying Problem (but rigour)
<aside>
hi LH, this was actually one of the theorems that got me into compo math (Raina can actually vouch). as such the above section is lwk bad, so here is an update since you asked
(oh jeez I did not know how latex worked back then sorry)
</aside>
Again start with
$$
mp \equiv r \mod q
$$
Suppose $m_1$ and $m_2$ are two $m$ that give the same $r$. Then $pm_1 \equiv pm_2 \mod q$. By the cancellation law $m_1 \equiv m_2 \mod q$, since $\gcd(p, q) = 1$.
### Cancellation Law Proof (Brownie Points)
$$
pm_1 - pm_2 = p(m_1 - m_2)
$$
We know $q$ divides $pm_1 - pm_2$ since both sides are congruent mod $q$; therefore $q$ divides the RHS. By Euclid's Lemma $q$ must divide $m_1 - m_2$, meaning $m_1 \equiv m_2 \mod q$.
# Final Theorem
Let $p$ and $q$ be coprime. If:
$$
x \equiv a \pmod p\\
x \equiv b \pmod q
$$
then:
$$
x \: rem \: pq
$$
exists and is unique.
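A small Python sketch of the construction (it solves $mp \equiv b - a \pmod q$ with a modular inverse; `pow(p, -1, q)` assumes Python 3.8+):
``` python
def crt(a, p, b, q):
    """Return the unique 0 <= x < p*q with x % p == a and x % q == b (p, q coprime)."""
    p_inv = pow(p, -1, q)      # inverse of p mod q, exists since gcd(p, q) = 1
    m = ((b - a) * p_inv) % q  # solve m*p = b - a (mod q)
    return (a + m * p) % (p * q)

print(crt(2, 3, 3, 5))  # 8: 8 % 3 == 2 and 8 % 5 == 3
```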
# Notes
$$
x \: = 0 \: mod \: y\\
x \: rem \: y = 0
$$
both mean x is divisible by y.

Choosing Stuff.md Normal file

@@ -0,0 +1,33 @@
#Math #Probability
# Problem
Given $m$ items of one type and $n$ items of another type, what is the probability of choosing $l$ items of type one and $o$ items of type two if you pick $l + o$ items?
# Solution
Total ways to choose the items not considering types:
$$
{m + n} \choose {l + o}
$$
Total ways to choose $l$ items of type one:
$$
m \choose l
$$
Total ways to choose $o$ items of type two:
$$
n \choose o
$$
Multiply the ways to choose both items to get the number of ways to choose $l$ items of type one and $o$ items of type two, divide by total number of combinations:
$$
\frac {{m \choose l} {n \choose o}} {{m + n} \choose {l + o}}
$$
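For example, with $m = 3$, $n = 2$, $l = 2$, and $o = 1$: $\frac {{3 \choose 2} {2 \choose 1}} {{5 \choose 3}} = \frac {3 \cdot 2} {10} = \frac 3 5$.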


@@ -0,0 +1,40 @@
#Math #Probability
# Conditional Probability
Conditional probability, or the probability of $A$ given $B$ is:
$$
P(A|B)
$$
Let's start with the probability $P(A \cap B)$, the probability that both $A$ and $B$ happen. When we write $P(A | B)$, $B$ is given to be true. Therefore, we must divide the probability $P(A \cap B)$ by $P(B)$.
$$
P(A | B) = \frac {P(A \cap B)} {P(B)}
$$
This defines $P(A | B)$ for events. When $P(A | B) = P(A)$, $A$ and $B$ are independent.
# Bayes' Theorem
Let's start with the definitions of conditional probability:
$$
P(A | B) = \frac {P(A \cap B)} {P(B)} \\
P(B | A) = \frac {P(A \cap B)} {P(A)}
$$
Rearrange the second equation to solve for $P(A \cap B)$:
$$
P(A \cap B) = P(A) P(B | A)
$$
Now substitute that equation into the first equation:
$$
P(A | B) = \frac {P(A) P(B | A)} {P(B)}
$$
The above equation is Bayes' Theorem for events.
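As a quick numeric illustration (with made-up numbers): if $P(A) = 0.01$, $P(B | A) = 0.9$, and $P(B) = 0.05$, then $P(A | B) = \frac {0.9 \cdot 0.01} {0.05} = 0.18$.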

Convolutions.md Normal file

@@ -0,0 +1,45 @@
#Math #Probability
# Discrete Case
Let's create a function expressing the probability that the results of two independent discrete random variables, with probability functions $f$ and $g$, have a sum of $s$.
$$
\sum _{x = -\infty}^{\infty} f(x)g(s-x)
$$
Let's unpack this formula. The term inside the sum is the probability of one particular case where the two results add to $s$ (the first result is $x$ and the second is $s - x$). By using a summation, we run through every possible case where this happens.
This operation is called a discrete convolution. Convolutions are notated as
$$
[f * g](s)
$$
# Continuous Case
Extending the previous equation over to a continuous function, we can attain a definition like this:
$$
[f * g](s) = \int _{-\infty}^{\infty} f(x)g(s-x) dx
$$
Naturally, we'd expect this to be the probability density function of the sum of the two variables. This is the same effect as the discrete convolution, except applied to probability densities at an infinitesimally small point.
# Summary
Convolutions return the probability or probability density of the sum of two independent random variables (which one depends on whether the variables are discrete or continuous).
They are defined by:
Discrete:
$$
[f * g](s) = \sum _{x = -\infty}^{\infty} f(x)g(s-x)
$$
Continuous:
$$
[f * g](s) = \int _{-\infty}^{\infty} f(x)g(s-x) dx
$$
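A small numpy sketch of the discrete case, using two fair dice as an example:
``` python
import numpy as np

die = np.full(6, 1 / 6)           # pmf of a fair die on the values 1..6
two_dice = np.convolve(die, die)  # pmf of the sum of two independent dice (values 2..12)
print(two_dice[5])                # P(sum = 7) = 6/36 ≈ 0.1667
```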

Dearrangement.md Normal file

@@ -0,0 +1,45 @@
#Math #Probability
# Problem
How many ways are there to arrange a set of $n$ distinct elements such that no element is in its original position?
# Solution
The number of ways to arrange the set with no restriction on positions is:
$$
n!
$$
Now subtract the arrangements that have a chosen element in its original position:
$$
n! - {n\choose 1}(n - 1)!
$$
However, we subtracted arrangements with two elements in their original position twice:
$$
n! - {n\choose 1}(n - 1)! + {n \choose 2}(n - 2)!
$$
Now, we readded arrangements with three elements in their original position:
$$
n! - {n\choose 1}(n - 1)! + {n \choose 2}(n - 2)! - {n \choose 3}(n - 3)!
$$
This pattern continues by PIE, giving us:
$$
n! - {n\choose 1}(n - 1)! + {n \choose 2}(n - 2)! - {n \choose 3}(n - 3)! ... + (-1)^n{n \choose n}(n - n)!
$$
Since ${n \choose k}(n - k)! = \frac {n!} {k!}$, we can rewrite as:
$$
\frac {n!} {0!} - \frac {n!} {1!} + \frac {n!} {2!} ... + (-1)^n\frac {n!} {n!} \\
= \sum _{k = 0}^n (-1)^k \frac {n!} {k!} \\
= n! \sum _{k = 0}^n \frac {(-1)^k} {k!}
$$
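As $n$ grows, $\sum _{k = 0}^n \frac {(-1)^k} {k!}$ approaches $e^{-1}$, so the number of such arrangements (derangements) is approximately $\frac {n!} e$.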

Derivatives.md Normal file

@@ -0,0 +1,91 @@
#Calculus #Math
# Intuition & Definition
How can an instantaneous rate of change be defined at a point?
Call our function of choice $y$:
Slope of $y$ between $x_1$ and $x_2$:
$$
\frac {y_1 - y_2} {x_1 - x_2}
$$
$$
= \frac {y(x_1) - y(x_2)} {x_1 - x_2}
$$
However, we need $x_1 \neq x_2$ to avoid division by $0$.
## Definitions
Avoid division by $0$ by using a limit in which $x_1 \to x_2$:
$$
\frac {dy} {dx} = \lim_{x_1 \to x_2} \frac {y(x_1) - y(x_2)} {x_1 - x_2}
$$
Changing variables:
$$
\frac {dy} {dx} = \lim_{a \to x} \frac {y(a) - y(x)} {a - x}
$$
Substitute $a = x + \Delta x$, so that $a \to x$ corresponds to $\Delta x \to 0$:
$$
\frac {dy} {dx} = \lim_{\Delta x \to 0} \frac {y(x + \Delta x) - y(x)} {\Delta x}
$$
# Derivative Rules
## Constant Rule
When $y = a$ and $a$ is constant:
$$
\frac {dy} {dx}
$$
$$
= \lim_{\Delta x \to 0} \frac {a - a} {\Delta x}
$$
$$
= \lim_{\Delta x \to 0} \frac {0} {\Delta x}
$$
$$
= 0
$$
## Sum and Difference Rule
$$
\frac {df} {dx} + \frac {dg} {dx}
$$
$$
= \lim_{\Delta x \to 0} \frac {f(x + \Delta x) - f(x)} {\Delta x} + \frac {g(x + \Delta x) - g(x)} {\Delta x}
$$
$$
= \lim_{\Delta x \to 0} \frac {[f(x + \Delta x) + g(x + \Delta x)] - [f(x) + g(x)]}{\Delta x}
$$
$$
= \frac d {dx} (f + g)
$$
## Power Rule
> **Note:** This proof of power rule only extends to $n \in \mathbb{N}$. Power rule can be extended to $n \in \mathbb{Z}$ through the use of the derivative of $\ln$, but this article does not cover such a proof as of now.
$$
\frac {d} {dx} x^n
$$
$$
= \lim_{\Delta x \to 0} \frac {(x + \Delta x)^n - x^n} {\Delta x}
$$
Use a binomial expansion:
$$
= \lim_{\Delta x \to 0} \frac {\sum_{i = 0}^n {n \choose i}x^i{\Delta x}^{n - i} - x^n} {\Delta x}
$$
Take out last term in sum:
$$
= \lim_{\Delta x \to 0} \frac {x^n + \sum_{i = 0}^{n - 1} {n \choose i}x^i{\Delta x}^{n - i} - x^n} {\Delta x}
$$
$$
= \lim_{\Delta x \to 0} \frac {\sum_{i = 0}^{n - 1} {n \choose i}x^i{\Delta x}^{n - i}} {\Delta x}
$$
$$
= \lim_{\Delta x \to 0} \sum_{i = 0}^{n - 1} {n \choose i}x^i{\Delta x}^{n - i - 1}
$$
Bring limit inside sum:
$$
= \sum_{i = 0}^{n - 1} \left[{n \choose i}x^i \lim_{\Delta x \to 0} {\Delta x}^{n - i - 1}\right]
$$
For $i < n - 1$, $\lim _{\Delta x \to 0} {\Delta x}^{n - i - 1} = 0$, so only the case where $i = n - 1$ matters:
$$
= {n \choose {n - 1}} x^{n - 1}
$$
$$
= nx^{n - 1}
$$
>**Therefore:**
>$$ \frac d {dx} x^n = nx^{n - 1} $$


@@ -0,0 +1,34 @@
#Math #Calculus
# Extending the Factorial Function
We know $n!$ has a restricted domain of $n \in \mathbb{N}$, but we want to extend this function to $n \in \mathbb{R}$. To do this, we define two basic properties for the gamma function:
$$
n\Gamma(n) = \Gamma(n + 1) \\
\Gamma(n + 1) = n!, \space n\in \mathbb{N}
$$
# Derivation
We know repeated differentiation can generate a factorial function, so we start by differentiating:
$$
\int _{0}^{\infty} e^{-ax} dx = \frac 1 a
$$
The **Leibniz Integral Rule** allows us to differentiate inside the integral, so by repeated differentiation with respect to $a$ and cancelling out the negative sign we get:
$$
\int _{0}^{\infty} xe^{-ax} dx = \frac 1 {a^2} \\
\int _{0}^{\infty} x^2e^{-ax} dx = \frac 2 {a^3} \\
\int _{0}^{\infty} x^ne^{-ax} dx = \frac {n!} {a^{n + 1}} \\
$$
Plugging in $a = 1$ (and replacing $n$ with $n - 1$) we get:
$$
\Gamma(n) = \int _{0}^{\infty} x^{n - 1} e^{-x} dx
$$
Plugging the definition into the above properties should affirm that this defines the gamma function.


@@ -0,0 +1,55 @@
#Math #Calculus
# Definition
When
$$
\lim _{x \to c} f(x) = L
$$
for every $\epsilon > 0$ there is a value $\delta > 0$ such that
$$
0 < |x - c| < \delta \implies |f(x) - L| < \epsilon
$$
# Proving a Limit
Let's prove:
$$
\lim _{h \to 0} \frac {(x + h)^2 - x^2} h = 2x
$$
Let:
$$
0 < |\frac {(x + h)^2 - x^2} h - 2x| < \epsilon \\
0 < |\frac {x^2 + 2xh + h^2 - x^2} h - 2x| < \epsilon \\
0 < |\frac {2xh + h^2} h - 2x| < \epsilon \\
$$
Simplify the fraction (inside the limit, $h \neq 0$):
$$
0 < |2x + h - 2x| < \epsilon \\
0 < |h| < \epsilon
$$
We have to prove for every $\epsilon$:
$$
0 < |h - 0| < \delta \\
0 < |h| < \delta
$$
These two inequalities are the same, so they are easily satisfied just by setting:
$$
\delta = \epsilon
$$
# Graphical Explanation
[https://www.desmos.com/calculator/tucchymbrq](https://www.desmos.com/calculator/tucchymbrq)


@@ -0,0 +1,57 @@
#Math #Trig
# Euler's Formula
Euler's formula states:
$$
e^{i \theta} = i\sin \theta + \cos \theta
$$
## Proof
$$
\frac d {d \theta} \frac {i \sin \theta + \cos \theta} {e^{i \theta}} \\
= \frac d {d \theta} \left[ e^{-i\theta}(i \sin \theta + \cos \theta) \right] \\
= (e^{-i\theta})(i \sin \theta + \cos \theta)\prime + (e^{-i\theta}) \prime (i \sin \theta + \cos \theta) \\
= (e^{-i\theta})(i \cos \theta - \sin \theta) - i(e^{-i\theta})(i \sin \theta + \cos \theta) \\
= (e^{-i\theta})(i \cos \theta - \sin \theta) - (e^{-i\theta})(i \cos \theta - \sin \theta) \\
= 0
$$
Therefore $\frac {i \sin \theta + \cos \theta} {e^{i \theta}}$ is a constant. Plug in $\theta = 0$, to get $\frac {i \sin \theta + \cos \theta} {e^{i \theta}} = 1$. Multiply both sides by $e^{i\theta}$ to get
$$
e^{i \theta} = i\sin \theta + \cos \theta
$$
## Euler's Identity
Plug $\theta = π$ into Euler's Formula
$$
e^{i \pi} = i\sin \pi + \cos \pi \\
e^{i \pi} = -1
$$
# Trig Functions Redefined
Sine:
$$
e^{i \theta} = i\sin \theta + \cos \theta \\
-e^{-i \theta} = -i\sin -\theta - \cos -\theta \\
-e^{-i \theta} = i\sin \theta - \cos \theta \\
e^{i\theta} - e^{-i\theta} = 2i \sin \theta \\
\sin \theta = \frac {e^{i\theta} - e^{-i\theta}} {2i}
$$
Cosine:
$$
e^{i \theta} = i\sin \theta + \cos \theta \\
e^{-i \theta} = i\sin -\theta + \cos -\theta \\
e^{-i \theta} = -i\sin \theta + \cos \theta \\
e^{i\theta} + e^{-i \theta} = 2\cos \theta \\
\cos \theta = \frac {e^{i\theta} + e^{-i \theta}} 2
$$


@@ -0,0 +1,22 @@
#Math #NT
# Fermat's Little Theorem
If $p$ is a prime integer and $a$ is not divisible by $p$ (the second congruence holds for any integer $a$):
$$
a^{p - 1} \equiv 1 \mod p \\
a^p \equiv a \mod p
$$
# Proof
Let $p$ be a prime integer. Say a necklace has $p$ beads and $a$ possible colors per bead. Except for the necklaces with only one color, each combination of necklace colors corresponds to $p$ distinct rotations, so $p$ divides $a^p - a$. Therefore:
$$
a^p \equiv a \mod p
$$
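For example, with $p = 7$ and $a = 2$: $2^6 = 64 \equiv 1 \mod 7$ and $2^7 = 128 \equiv 2 \mod 7$.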

Fermet Euler Theorem.md Normal file

@@ -0,0 +1,31 @@
#Math #NT
# Theorem
Let $a$ and $m$ be coprime numbers.
$$
a^{\phi(m)} \equiv 1 \mod m
$$
This is a generalization of Fermat's Little Theorem, which is the special case where $m$ is a prime number.
# Proof
Let:
$$
A = \{p_1, p_2, p_3,... p_{\phi(m)} \} \mod m \\
B = \{ap_1, ap_2, ap_3,...ap_{\phi(m)}\} \mod m
$$
Where $p_x$ is the $x$th positive integer less than $m$ that is relatively prime to $m$.
Since $a$ and $p_x$ are coprime to $m$, $ap_x$ is coprime to $m$. Since the $p_x$ are distinct mod $m$ and $a$ is invertible mod $m$, the $ap_x$ are also distinct mod $m$, which makes set $B$ the same as set $A$ when reduced mod $m$.
Since all terms are coprime to $m$:
$$
a^{\phi(m)} \prod _{k = 1}^{\phi(m)} p_k \equiv \prod _{k = 1}^{\phi(m)} p_k \mod m \\
a^{\phi(m)} \equiv 1 \mod m
$$

Fourier Series Proof.md Normal file

@@ -0,0 +1,184 @@
#Math #Calculus
# Starting the Proof Off
The Taylor Series uses $x^n$ as building blocks for a function:
[[Taylor Series Proof]]
However, we can use $\sin (nx)$ and $\cos(nx)$ as well. This will be our starting point to derive the Fourier Series:
$$
f(x) = a_0\cos (0x) + b_0\sin(0x) + a_1\cos (x) + b_1\sin(x) + a_2\cos (2x) + b_2\sin(2x)... \\
f(x) = a_0 + \sum _{n = 1}^\infty (a_n\cos(nx) + b_n\sin(nx))
$$
This will be the basic equation we will use.
# Finding $a_0$
Let's integrate the equation on both sides, bounded by $[-\pi, \pi]$:
$$
\int _{-\pi}^\pi f(x) dx = \int _{-\pi}^\pi a_0 dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi a_n\cos(nx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi b_n\sin(nx) dx
$$
The first integral evaluates to $2\pi a_0$. Since the third integral is an odd function, it evaluates to $0$. The second integral can be expressed as:
$$
a_n \int _{-\pi}^\pi \cos(nx) dx \\
= \frac {a_n} n (\sin(n\pi) - \sin(-n\pi)) \\
= 0
$$
So now we have:
$$
2\pi a_0 = \int _{-\pi}^\pi f(x) dx \\
a_0 = \frac 1 {2\pi} \int _{-\pi}^\pi f(x) dx
$$
# Finding $a_n$
Let's multiply the entire equation by $\cos(mx)$, where $m \in \mathbb{Z}^+$ ($m$ is a positive integer):
$$
f(x)\cos(mx) = a_0\cos(mx) + \sum _{n = 1}^\infty a_n\cos(nx)\cos(mx) + b_n\sin(nx)\cos(mx)
$$
Now integrate on both sides, and bound by $[-\pi, \pi]$:
$$
\int _{-\pi}^\pi f(x)\cos(mx) dx = \int _{-\pi}^\pi a_0\cos(mx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi a_n\cos(nx)\cos(mx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi b_n\sin(nx)\cos(mx) dx
$$
We have three integrals on the right hand side to evaluate:
## First Integral
$$
\int _{-\pi}^\pi a_0 \cos(mx) dx \\
= \frac{a_0} m \sin(m\pi)- \frac{a_0} m \sin(-m\pi)
$$
Since $m\pi$ is always a multiple of $\pi$:
$$
=0
$$
## Second Integral
$$
\int _{-\pi}^\pi a_n\cos(nx)\cos(mx) dx
$$
Using $\cos$ addition formula:
$$
= \frac {a_n} 2 \int _{-\pi}^\pi \cos(nx + mx) + \cos(nx - mx) dx \\
= \frac {a_n} 2 (\int _{-\pi}^\pi \cos(nx + mx) dx + \int _{-\pi}^\pi \cos(nx - mx) dx) \\
= [\frac {a_n} 2 (\frac {\sin(nx + mx)} {n + m} + \frac {\sin(nx - mx)} {n - m})]_{-\pi}^{\pi} \\
$$
Here you will notice that this integral doesn't work for $n = m$. We'll circle back to that later. For $n \neq m$, both $n + m$ and $n - m$ are nonzero integers, so the sines vanish at $x = \pm\pi$ and:
$$
= 0
$$
Now, circling back to the extra case, where $n = m$:
$$
a_m\int _{-\pi}^\pi \cos^2(mx)dx \\
= a_m\int _{-\pi}^\pi \frac {1 + \cos(2mx)} 2 dx \\
= a_m[\frac x 2 + \frac {\sin (2mx)} {4m} ]_{-\pi}^\pi \\
= a_m[(\frac {\pi} 2 + \frac {\sin (2m\pi)} {4m} ) - (\frac {-\pi} 2 + \frac {\sin (-2m\pi)} {4m} )] \\
= a_m\pi
$$
So, the second term in the right hand side evaluates to $a_m\pi$.
## Third Integral
$$
\int _{-\pi}^{\pi} \sin(nx)\cos(mx) dx \\
= \frac 1 2 \int _{-\pi}^{\pi} \sin(nx + mx) dx + \frac 1 2 \int _{-\pi}^\pi \sin(nx - mx) dx \\
= [-\frac 1 2(\frac {\cos(nx + mx)} {n + m} + \frac {\cos(nx - mx)} {n - m})]_{-\pi}^\pi \\
$$
Remember that $\cos$ is even, so the antiderivative takes the same values at $\pi$ and $-\pi$:
$$
= 0
$$
## Putting it Together
Now we have:
$$
\int _{-\pi}^\pi f(x)\cos(mx) dx = a_m\pi \\
\frac 1 \pi \int _{-\pi}^\pi f(x)\cos(mx) dx = a_m
$$
Note in this case $m$ and $n$ both represent any positive integer, and are therefore interchangeable:
$$
a_n = \frac 1 \pi \int _{-\pi}^\pi f(x)\cos(nx) dx \\
$$
# Finding $b_n$
Multiply the equation by $\sin mx$, where $m \in \mathbb{Z}^+$,integrate, and bound between $[-\pi, \pi]$:
$$
\int _{-\pi}^\pi f(x)\sin(mx) dx = \int _{-\pi}^\pi a_0\sin(mx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi a_n\cos(nx)\sin(mx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi b_n\sin(nx)\sin(mx) dx
$$
The first two terms are already covered, so lets focus on the final term.
## Last Integral
$$
\int _{-\pi}^\pi b_n\sin(nx)\sin(mx) dx \\
= \frac {b_n} 2 \int _{-\pi}^\pi \cos(nx - mx) - \cos(nx + mx) dx \\
= \frac {b_n} 2 [\frac {\sin(nx - mx)} {n - m} - \frac {\sin(nx + mx)} {n + m}]_{-\pi}^\pi
$$
Again, there is a special case where $n = m$. Remember $\sin \pi = 0$, so:
$$
= 0
$$
With the special case:
$$
b_m\int _{-\pi}^\pi \sin^2(mx) dx \\
= b_m\int _{-\pi}^\pi \frac {-\cos(2mx) + 1} 2 dx \\
= b_m[\frac 1 2 (x - \frac {\sin(2mx)} {2m})]_{-\pi}^\pi \\
= b_m\pi
$$
## Putting it Together
$$
b_m\pi = \int _{-\pi}^\pi f(x)\sin(mx) dx \\
b_m = \frac 1 \pi \int _{-\pi}^\pi f(x)\sin(mx) dx \\
b_n = \frac 1 \pi \int _{-\pi}^\pi f(x)\sin(nx) dx
$$
# Fourier Series
Using the above, lets express $f(x)$ as a Fourier Series:
$$
f(x) = \frac 1 {2\pi} \int _{-\pi}^\pi f(x) dx + \sum _{n = 1}^\infty \frac {\cos (nx)} \pi \int _{-\pi}^\pi f(x)\cos(nx) dx + \sum _{n = 1}^\infty \frac {\sin (nx)} \pi \int _{-\pi}^\pi f(x)\sin(nx) dx
$$
Note that this representation only works when the function has period $2\pi$ (repeats over $[0, 2\pi]$). Using a similar proof, we can get:
$$
f(x) = \frac 1 P \int _{-\frac P 2}^{\frac P 2} f(x) dx + \sum _{n = 1}^\infty \frac {2 \cos (\frac {2\pi nx} P)} P \int _{-\frac P 2}^{\frac P 2} f(x)\cos(\frac {2\pi nx} P) dx + \sum _{n = 1}^\infty \frac {2 \sin (\frac {2\pi nx} P)} P \int _{-\frac P 2}^{\frac P 2} f(x)\sin(\frac {2\pi nx} P) dx
$$


@@ -0,0 +1,26 @@
#Math #Calculus
# Proof
Let's express a Fourier Series as:
$$
v = \frac {2\pi nx} P \\
f(x) = \sum _{n = 0}^\infty A_n \cos v + B_n \sin v
$$
We can deduce:
$$
f(x) = \sum _{n = 0}^{\infty} \frac {A_n e^{iv} + A_n e^{-iv} - iB_n e^{iv} + iB_n e^{-iv}} 2 \\
= \sum _{n = 0}^{\infty} 0.5(A_n + iB_n)e^{-iv} + 0.5(A_n - iB_n)e^{iv} \\
= \sum _{n = 0}^{\infty} \frac {e^{-iv}} P \int _{-P/2}^{P/2} f(x) (\cos v + i\sin v) dx + \frac {e^{iv}} P \int _{-P/2}^{P/2} f(x) (\cos -v + i\sin -v) dx \\
= \sum _{n = 0}^{\infty} \frac {e^{-iv}} P \int _{-P/2}^{P/2} f(x)e^{iv} dx + \frac {e^{iv}} P \int _{-P/2}^{P/2} f(x)e^{-iv} dx \\
= \sum _{n = -\infty}^{\infty} \frac {e^{iv}} P \int _{-P/2}^{P/2} f(x)e^{-iv} dx
$$
## Definitions
Definitions of $A_n$ and $B_n$:
[[Fourier Series Proof]]

Hockey Stick Identity.md Normal file

@@ -0,0 +1,27 @@
#Math #Probability
# Statement
For $n \geq r$, $n, r \in \mathbb{N}$:
$$
\sum _{i = r}^n {i \choose r} = {n + 1 \choose r + 1}
$$
# Proof
Let us have a base case $n = r$:
$$
{r \choose r} = {r + 1 \choose r + 1} = 1
$$
Now suppose $\sum _{i = r}^n {i \choose r} = {n + 1 \choose r + 1}$ for a certain $n$:
$$
\sum _{i = r}^n {i \choose r} + {n + 1 \choose r} \\
= {n + 1 \choose r + 1} + {n + 1 \choose r} \\
= {n + 2 \choose r + 1}
$$
Since the base case $n = r$ holds, and truth for $n$ implies truth for $n + 1$ (the last step uses Pascal's rule ${n + 1 \choose r + 1} + {n + 1 \choose r} = {n + 2 \choose r + 1}$), the statement is true for all $n \geq r$.
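For example, with $r = 2$ and $n = 4$: ${2 \choose 2} + {3 \choose 2} + {4 \choose 2} = 1 + 3 + 6 = 10 = {5 \choose 3}$.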

Hyperbolic Trig.md Normal file

@@ -0,0 +1,46 @@
#Math #Trig
# Definition
## Definition in terms of $e$
We define $\cosh$ and $\sinh$ to be the even and odd parts of $e^x$ respectively:
$$
\cosh x = \frac {e^x + e^{-x}} 2 \\
\sinh x = \frac {e^x - e^{-x}} 2
$$
Note this gives us:
$$
\sinh x + \cosh x = e^x
$$
similar to Euler's Formula for circular trig functions.
## Definition in terms of a hyperbola
[https://www.desmos.com/calculator/ixmjpfmukk](https://www.desmos.com/calculator/ixmjpfmukk)
Know that the geometric definition of $\cosh$ is that $B = \cosh 2b$, where $b$ is the blue area. To find $b$, we can use:
$$
b = \frac {B\sqrt{B^2 - 1}} 2 -\int _1^B \sqrt {x^2 - 1} dx \\
= \frac {B\sqrt{B^2 - 1}} 2 - \frac {B\sqrt {B^2 - 1} - \ln(B + \sqrt {B^2 - 1})} 2\\
= \frac {\ln(B + \sqrt {B^2 - 1})} 2
$$
Now let $a = 2b = \ln(B + \sqrt {B^2 - 1})$. Now we can solve for $B$ in terms of $a$ to define $\cosh$:
$$
a = \ln(B + \sqrt {B^2 - 1}) \\
B = \frac {e^a + e^{-a}} 2 \\
\cosh x = \frac {e^x + e^{-x}} 2
$$
Now, using the fact that the point $(\cosh x, \sinh x)$ lies on the unit hyperbola (can be proved algebraically), we get:
$$
\sinh x = \frac {e^x - e^{-x}} 2
$$

Laplace Transforms.md Normal file

@@ -0,0 +1,22 @@
#Calculus #Math
# Background - Analytic Continuation
$$
\int _0^\infty e^{-st} dt = \frac 1 {s}
$$
The integral on the left only converges for $\text{Re}(s) > 0$; the expression $\frac 1 s$ is used as an analytic continuation of it to the rest of the plane. For the Laplace Transform to work, most of the integrals used must be extended to analytic continuations in this way.
# Definition - Laplace Transform
$$
F(s) = \int _0^\infty f(t) e^{-st} dt
$$
# Intuition - The $e^{sx}$ Finding Machine
Take $f(t)$ as $\sum c_n e^{a_n t}$. Plugging into the Laplace Transform:
$$
F(s) = \int _0^\infty \sum c_ne^{(a_n - s)t} dt
$$
$$
= \sum c_n \int _0^\infty e^{-(s - a_n)t} dt
$$
$$
= \sum \frac {c_n} {s - a_n}
$$
Therefore the Laplace Transform of such a function reveals both $c_n$ and $a_n$: the poles of $F(s)$ reveal the exponents $a_n$, while the "magnitude" (residue) at each pole reveals the coefficient $c_n$ of the corresponding exponential term.
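For example, if $f(t) = e^{2t} + 3e^{-t}$, then $F(s) = \frac 1 {s - 2} + \frac 3 {s + 1}$: the poles at $s = 2$ and $s = -1$ reveal the exponents, and the residues $1$ and $3$ reveal the coefficients.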

Leibniz Integral Rule.md Normal file

@@ -0,0 +1,44 @@
#Math #Calculus
# Theorem
Let $f(x, t)$ be such that both $f(x, t)$ and its partial derivative $f_x (x, t)$ be continuous in $t$ and $x$ in a region of the $xt$-plane, such that $a(x) \leq t \leq b(x)$, $x_0 \leq x \leq x_1$. Also let $a(x)$ and $b(x)$ be continuous and have continuous derivatives for $x_0 \leq x \leq x_1$. Then, for $x_0 \leq x \leq x_1$:
$$
\frac d {dx} (\int _{a(x)}^{b(x)} f(x, t) dt) = f(x, b(x)) \cdot \frac d {dx} b(x) - f(x, a(x)) \cdot \frac d {dx} a(x) + \int _{a(x)}^{b(x)} \frac \partial {\partial x} f(x, t) dt
$$
Notably, this also means:
$$
\frac d {dx} (\int _{c_1}^{c_2} f(x) dx) = \int _{c_1}^{c_2} \frac d {dx} f(x) dx
$$
# Proof
Let $\varphi(x) = \int _a^b f(x, t) dt$ where $a$ and $b$ are functions of $x$. Define $\Delta a = a(x + \Delta x) - a(x)$ and $\Delta b = b(x + \Delta x) - b(x)$. Then,
$$
\Delta \varphi = \varphi(x + \Delta x)- \varphi(x) \\
= \int _{a + \Delta a}^{b + \Delta b} f(x + \Delta x, t) dt - \int _a^b f(x, t) dt \\
$$
Now expand the first integral by integrating over 3 separate ranges:
$$
\int _{a + \Delta a}^a f(x + \Delta x, t) dt + \int _a^b f(x + \Delta x, t) dt + \int _b^{b + \Delta b} f(x + \Delta x, t) dt - \int _a^b f(x, t) dt \\
= -\int _a^{a + \Delta a} f(x + \Delta x, t) dt + \int _a^b [f(x + \Delta x, t) - f(x, t)]dt + \int _b^{b + \Delta b} f(x + \Delta x, t) dt
$$
From mean value theorem we know $\int _a^b f(t) dt = (b - a)f(\xi)$, which applies to the first and last integrals:
$$
\Delta \varphi = -\Delta a f(x + \Delta x, \xi_1) + \int _a^b [f(x + \Delta x, t) - f(x, t)]dt + \Delta b f(x + \Delta x, \xi_2) \\
\frac {\Delta \varphi} {\Delta x} = -\frac {\Delta a} {\Delta x} f(x + \Delta x, \xi_1) + \int _a^b \frac {f(x + \Delta x, t) - f(x, t)} {\Delta x} dt + \frac {\Delta b} {\Delta x} f(x + \Delta x, \xi_2) \\
$$
Now as we set $\Delta x \to 0$, we can express many of the terms as definitions of derivatives (note we pass the limit sign through the integral via bounded convergence theorem). Note now that $\xi_1 \to a$ and $\xi_2 \to b$, which gives us:
$$
\frac d {dx} \int _a^b f(x, t) dt = -\frac {da} {dx} f(x, a) + \int _a^b \frac {\partial} {\partial x} f(x, t) dt + \frac {db} {dx} f(x, b) \\
$$

Limits.md Normal file

@@ -0,0 +1,33 @@
#Math #Calculus
A limit describes a value that an expression approaches as a variable gets arbitrarily close to another number, without necessarily reaching it. It is notated by
$$
\lim _{x \to y}
$$
where x approaches y.
You can substitute numbers in for limit variables, such as
$$
\lim _{x \to 1} x + 1 = 2
$$
Limits can work around certain constraints. For example,
$$
\frac {1-x} {1-x}
$$
would not be defined at $x=1$, however
$$
\lim _{x \to 1} \frac {1-x} {1-x} = 1
$$
Limits can also approach infinity, to use “infinity” in certain situations.
$$
\lim _{x \to \infty} \frac 1 x = 0
$$

Matrices.md Normal file

@@ -0,0 +1,45 @@
#Math #Algebra
A matrix is an $n$ by $m$ array of values. A $4 \times 3$ matrix can be notated by:
$$
\begin{bmatrix}a_1 & a_2 & a_3 \cr b_1 & b_2 & b_3 \cr c_1 & c_2 & c_3 \cr d_1 & d_2 & d_3 \end{bmatrix}
$$
To get a value from matrix $a$ in row $r$ and column $c$, use:
$$
a_{r, c}
$$
# Addition
With two matrices of the same order, add corresponding elements:
$$
\begin{bmatrix} a_1 & b_1 \cr c_1 & d_1 \end{bmatrix} + \begin{bmatrix} a_2 & b_2 \cr c_2 & d_2 \end{bmatrix} = \begin{bmatrix} a_1 + a_2 & b_1 + b_2 \cr c_1 + c_2 & d_1 + d_2 \end{bmatrix}
$$
# Subtraction
With two matrices of the same order, subtract corresponding elements:
$$
\begin{bmatrix} a_1 & b_1 \cr c_1 & d_1 \end{bmatrix} - \begin{bmatrix} a_2 & b_2 \cr c_2 & d_2 \end{bmatrix} = \begin{bmatrix} a_1 - a_2 & b_1 - b_2 \cr c_1 - c_2 & d_1 - d_2 \end{bmatrix}
$$
# Scalar Multiplication
When multiplying a matrix by a scalar, multiply each element by said scalar:
$$
s\begin{bmatrix} a & b \cr c & d \end{bmatrix} = \begin{bmatrix} sa & sb \cr sc & sd \end{bmatrix}
$$
# Matrix Multiplication
Let $a$ be an $i$ by $j$ matrix and $b$ be a $m$ by $n$ matrix. If $j = m$, $ab$ is defined.
$$
ab_{c, d} = \sum _{k = 1}^{j} a_{c, k}b_{k, d}
$$
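A quick numpy illustration of these operations (a minimal sketch):
``` python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

print(a + b)  # element-wise addition
print(a - b)  # element-wise subtraction
print(3 * a)  # scalar multiplication
print(a @ b)  # matrix multiplication: (a @ b)[c, d] = sum over k of a[c, k] * b[k, d]
```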


@@ -0,0 +1,44 @@
#Math #Probability
# Observing Pascal's Triangle
| n/k | 0 | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | | | | | |
| 1 | 1 | 1 | | | | |
| 2 | 1 | 2 | 1 | | | |
| 3 | 1 | 3 | 3 | 1 | | |
| 4 | 1 | 4 | 6 | 4 | 1 | |
| 5 | 1 | 5 | 10 | 10 | 5 | 1 |
As you can see, Pascal's Triangle generates:
$$
{n \choose k}
$$
or
$$
\frac{n!}{k!(n-k)!}
$$
But how does this work?
First, we can manually verify the top two rows of Pascal's Triangle by plugging the values into the binomial coefficient formula.
Afterward, we can use the defining property of Pascal's Triangle, taking Pascal's Triangle as a function $P$:
$$
P(n + 1, k) = P(n, k) + P(n, k-1)
$$
By proving this property holds for the binomial coefficient formula, we can deduce that Pascal's Triangle generates binomial coefficients.
# The Proof
$$
\frac{n!}{k!(n-k)!}+\frac{n!}{(k-1)!(n-k+1)!}\\=\frac{n!(n-k+1)}{k!(n-k)!(n-k+1)}+\frac{n!k}{(k-1)!(n-k+1)!k}\\=\frac{n!(n-k+1)}{k!(n-k+1)!}+\frac{n!k}{k!(n-k+1)!}\\=\frac{n!(n+k+1-k)}{k!(n-k+1)!}\\=\frac{n!(n+1)}{k!(n-k+1)!}\\=\frac{(n+1)!}{k!(n-k+1)!}
$$
From this, we have proven that we can generate binomial coefficients using Pascal's Triangle.

Poisson Distribution.md Normal file

@@ -0,0 +1,97 @@
#Math #Probability
# The Poisson Distribution
The Poisson Distribution gives the probability that an event occurs some number of times in an interval of time, given the mean number of times the event happens in such an interval.
# Binomial Distribution to Poisson Distribution
Binomial Distribution
$$
\frac {n!} {k!(n-k)!} p^k (1-p)^{n-k}
$$
Binomial Distribution with infinite trials
$$
\lim _{n\to\infty} \frac {n!} {k!(n-k)!} p^k (1-p)^{n-k}
$$
Let $a$ be $np$, the average number of successes in $n$ trials. This gives us the binomial distribution in another form.
$$
\lim _{n\to\infty} \frac {n!} {k!(n-k)!} (\frac {a} {n})^k (1-\frac {a} {n})^{n-k}
$$
$$
\lim _{n\to\infty} \frac {n!} {k!(n-k)!} (\frac {a^k} {n^k}) (1-\frac {a} {n})^n(1-\frac {a} {n})^{-k}
$$
$$
\frac {a^k} {k!} \lim _{n\to\infty} \frac {n!} {n^k(n-k)!} (1-\frac {a} {n})^n(1-\frac {a} {n})^{-k}
$$
Now we have three limits to evaluate
# Evaluating the Limits
## First Limit
$$
\lim _{n \to\infty} \frac {n!} {n^k(n-k)!}
$$
$$
\lim _{n \to\infty} \frac {n(n-1)(n-2)...(n-k)(n-k-1)...(1)} {n^k(n-k)(n-k-1)...(1)}
$$
$$
\lim _{n\to\infty} \frac {n(n-1)...(n-k+1)} {n^k}
$$
$$
\lim _{n\to\infty} (\frac {n} {n})(\frac {n-1} {n})...(\frac {n-k+1} {n})
$$
As n goes to infinity, all the terms tend to 1. Therefore, the limit tends to 1.
## Second Limit
$$
\lim _{n\to\infty} (1-\frac {a} {n})^n
$$
Let $u$ be $-\frac n a$ (note this tends to negative infinity)
$$
\lim _{n\to\infty}(1+\frac {1} {u})^{-au}
$$
Use definition of e
$$
e^{-a}
$$
## Third Limit
$$
\lim _{n\to\infty}(1-\frac{a} {n})^{-k}
$$
a/n tends to 0
$$
1^k
$$
Therefore this limit tends to 1.
# Putting it together
$$
\frac {e^{-a}a^{k}}{k!}
$$
is the formula for the probability of an event happening k times in an interval of time, where a is the mean number of times of the event happening in the interval of time the event ran in. This is the formula for the Poisson Distribution.
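A quick numerical check that the binomial distribution with many trials approaches this formula (a minimal Python sketch; the numbers are arbitrary):
``` python
from math import comb, exp, factorial

a, k = 3.0, 5      # mean number of events and the count we ask about
n = 10_000         # many Bernoulli trials
p = a / n

binomial = comb(n, k) * p**k * (1 - p)**(n - k)
poisson = exp(-a) * a**k / factorial(k)
print(binomial, poisson)  # both ≈ 0.1008
```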


@@ -0,0 +1,37 @@
#Math #NT #Probability
# Problem
Calculate:
$$
P(x, y \in \mathbb{N}: gcd(x, y) = 1)
$$
# Solution
Each number has a $\frac 1 p$ chance to be divisible by prime $p$, so the probability that two numbers do not share prime factor $p$ is
$$
1 - p^{-2}
$$
Therefore, the probability two numbers are coprime is:
$$
\prod _{p \in \mathbb{P}} 1 - p^{-2}
$$
Since $1 - x = (\frac 1 {1 - x})^{-1} = (\sum _{n = 0}^{\infty} x^n)^{-1}$, we can express the above as:
$$
(\prod _{p \in \mathbb{P}} \sum _{n = 0}^{\infty} p^{-2n})^{-1}
$$
We can choose any $n$ for $p^{2n}$ for each prime $p$, so by the Unique Factorization Theorem (any natural number can be prime factored one and only one way), we get:
$$
(\sum _{n = 1}^{\infty} n^{-2})^{-1} \\
= (\frac {\pi^2} 6)^{-1} \\
= \frac 6 {\pi^2}
$$

Rational Root Theorem.md Normal file

@@ -0,0 +1,49 @@
#Math #Algebra
# Proof
Let polynomial
$$
P(x) = \sum _{i = 0}^n c_i x^i
$$
where $c_i \in \mathbb{Z}$ (all values of $c$ are integers).
Now let $P(\frac p q) = 0$, where $p$ and $q$ are coprime integers (let a fraction $\frac p q$ be in simplest form and be a root of $P$).
$$
\sum _{i = 0}^n c_i (\frac p q)^i = 0
$$
Multiplying by $q^n$:
$$
\sum _{i = 0}^n c_i p^i q^{n - i} = 0
$$
Now subtract $c_0 q^n$ from both sides and factor $p$ out to get:
$$
p\sum _{i = 1}^n c_i p^{i - 1} q^{n - i} = -c_0 q^n
$$
Now $p$ must divide $-c_0q^n$. However, we know $p$ cannot divide $q^n$ (since $\frac p q$ is in simplest form / $p$ and $q$ are coprime), so $p$ must divide $c_0$.
Doing the same thing as above but with the $c_n$ term and $q$:
$$
q\sum _{i = 0}^{n - 1} c_i p^i q^{n - i - 1} = -c_n p^n
$$
By the above logic, $q$ must divide $c_n$.
## Conclusion
For all rational roots in simplest form ($\frac p q$ where $p$ and $q$ are coprime integers), $p$ must be a factor of the constant coefficient $c_0$ while $q$ must be a factor of the leading coefficient $c_n$.
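For example, for $P(x) = 2x^2 + 3x - 2$, any rational root $\frac p q$ in simplest form must have $p$ dividing $-2$ and $q$ dividing $2$, giving candidates $\pm 1, \pm 2, \pm \frac 1 2$; the actual roots are $\frac 1 2$ and $-2$.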
## Notes
For the curious, coprime integers $p$ and $q$ mean that $\gcd(p, q) = 1$.
If future me or someone else is wondering about the excess definitions, this was made for a friend.

Skip Gram.md Normal file

@@ -0,0 +1,263 @@
#Coding
# Abstract
> \"No one is going to implement word2vec from scratch\" or sm 🤓
> commentary like that idk
This notebook provides a brief explanation and implementation of a Skip
Gram model, one of the two types of models word2vec refers to.
# Intuition
## Problem
Given a corpus C, map all tokens to a vector such that words with
similar semantics (similar probability of appearing within a context)
are close to each other.
## Idea
**The idea of a skip gram model proceeds from these two observations:**
1. Similar words should appear in similar contexts
2. Similar words should appear together
The intuition behind the Skip Gram model is to map a target token to all
the words appearing in a context window around it.
> The MIMS major **Quentin** is a saber fencer.
In this case the target token **Quentin** should map to all the other
tokens in the window. As such the target token should have similar
mappings to words such as MIMS, saber, and fencer.
Skip Gram treats each vector representation of a token as a set of
weights, and uses a linear-linear-softmax model to optimize them. At the
end, the first set of weights are a list of $n$ vectors that map a token
to a prediction of output tokens - solving the initial mapping problem.
# Code & Detailed Implementation
## Preprocessing
Tokenize all the words, and build training pairs using words in a
context window:
``` python
import numpy as np
class Preproccess:
@staticmethod
def tokenize(text):
"""Returns a list of lowercase tokens"""
return "".join([t for t in text.lower().replace("\n", " ") if t.isalpha() or t == " "]).split(" ")
@staticmethod
def build_vocab(tokens, min_count=1):
"""Create an id to word and a word to id mapping"""
token_counts = {}
for token in tokens:
if token not in token_counts:
token_counts[token] = 0
token_counts[token] += 1
sorted_tokens = sorted(token_counts.items(), key=lambda t:t[1], reverse=True) # Sort tokens by frequency
vocab = {}
id_to_word = [0] * len(sorted_tokens)
for i in range(len(sorted_tokens)):
token, count = sorted_tokens[i]
if count < min_count:
break
id_to_word[i] = token
vocab[token] = i
return vocab, id_to_word
@staticmethod
def build_pairs(tokens, vocab, window_size=5):
"""Generate training pairs"""
pairs = []
token_len = len(tokens)
for center in range(token_len):
tokens_before = tokens[max(0, center-window_size):center]
tokens_after = tokens[(center + 1):min(token_len, center + 1 + window_size)]
context_tokens = tokens_before + tokens_after
for context in context_tokens:
if tokens[center] in vocab and context in vocab:
pairs.append((tokens[center], context))
return pairs
@staticmethod
def build_neg_sample(word, context, vocab, samples=5):
"""Build negative samples"""
neg_samples = []
neg_words = [vocab[w] for w in vocab if (w != word) and (w != context)]
        neg_samples = np.random.choice(neg_words, size=samples, replace=False)
        return neg_samples
```
## Build Model
- 3 layers used as an optimizer:
- $L_1 = XW_1$
- $S = W_2 L_1$
- $P = \text{softmax(S)}$
- Loss function: $-\sum \log(P_{\text{context}} | P_{\text{target}})$
- Negative sampling used to speed up training, compare and update
against \~20 negative vocab terms instead of updating all weights
``` python
class Word2Vec:
def __init__(self, vocab_size, embedding_dim=100):
"""Initialize weights"""
self.vocab_size = vocab_size
self.embedding_dim = embedding_dim
self.W1 = np.random.normal(0, 0.1, (vocab_size, embedding_dim)) # First layer - word encoding
        self.W2 = np.random.normal(0, 0.1, (embedding_dim, vocab_size)) # Second layer - context encoding (referenced below as W2)
def sigmoid(self, x):
"""Numerically stable sigmoid"""
x = np.clip(x, -500, 500)
return 1 / (1 + np.exp(-x))
def cross_entropy_loss(self, probability):
"""Cross entropy loss function"""
return -np.log(probability + 1e-10) # 1e-10 added for numerical stability
def neg_sample_train(self, center_token, context_token, negative_tokens, learning_rate=0.01):
"""Negative sampling training for a single training pair"""
total_loss = 0
total_W1_gradient = 0
# Forward prop for positive case
center_embedding = self.W1[center_token, :] # L₁ = XW₁
context_vector = self.W2[:, context_token]
score = np.dot(center_embedding, context_vector) #L₂ = L₁W₂, but only for the context token vector
sigmoid_score = self.sigmoid(score)
loss = self.cross_entropy_loss(sigmoid_score)
total_loss += loss
# Backward prop for positive case
score_gradient = 1 - sigmoid_score # ∂L/∂S
W2_gradient = center_embedding * score_gradient # ∂L/∂W₂ = ∂L/∂S * ∂S/∂W₂ = XW₁ * ∂L/∂S
W1_gradient = context_vector * score_gradient # ∂L/∂W₁ = ∂L/∂S * ∂S/∂W₁ = W₂ * ∂L/∂S
# Update weights
        self.W2[:, context_token] += learning_rate * W2_gradient  # W2_gradient is the ascent direction for the positive pair, matching the W1 update below
total_W1_gradient += learning_rate * W1_gradient
for neg_token in negative_tokens:
# Forward prop for negative case
neg_vector = self.W2[:, neg_token]
neg_score = np.dot(center_embedding, neg_vector)
neg_sigmoid_score = self.sigmoid(neg_score)
neg_loss = -np.log(1 - neg_sigmoid_score)
total_loss += neg_loss
# Backward prop for negative case
            neg_score_gradient = neg_sigmoid_score  # ∂(-log(1 - σ))/∂S for the negative sample
            neg_W2_gradient = center_embedding * neg_score_gradient
            neg_W1_gradient = neg_vector * neg_score_gradient
# Update weights
self.W2[:, neg_token] -= learning_rate * neg_W2_gradient
total_W1_gradient -= learning_rate * neg_W1_gradient
# Update W1
total_W1_gradient = np.clip(total_W1_gradient, -1, 1)
self.W1[center_token, :] += total_W1_gradient
return total_loss
def find_similar(self, token):
"""Use cos similarity to find similar words"""
word_vec = self.W1[token, :]
similar = []
for i in range(self.vocab_size):
if i != token:
other_vec = self.W1[i, :]
norm_word = np.linalg.norm(word_vec)
norm_other = np.linalg.norm(other_vec)
if norm_word > 0 and norm_other > 0:
cosine_sim = np.dot(word_vec, other_vec) / (norm_word * norm_other)
else:
cosine_sim = 0
similar.append((cosine_sim, i))
similar.sort(key=lambda x:x[0], reverse=True)
return [word[1] for word in similar]
```
## Run Model
``` python
def epoch(model, pairs, vocab):
loss = 0
pair_len = len(pairs)
done = 0
for word, context in pairs:
neg_samples = Preproccess.build_neg_sample(word, context, vocab, samples=5)
        loss += model.neg_sample_train(vocab[word], vocab[context], neg_samples)  # convert tokens to ids before indexing the weight matrices
done += 1
if ((100 * done) / pair_len) // 1 > ((100 * done - 100) / pair_len) // 1:
print("_", end="")
return loss
with open("corpus.txt") as corpus_file:
CORPUS = corpus_file.read()
EPOCHS = 100
tokens = Preproccess.tokenize(CORPUS)
vocab, id_to_token = Preproccess.build_vocab(tokens, min_count=3)
print("~VOCAB LEN~:", len(vocab))
pairs = Preproccess.build_pairs(tokens, vocab, window_size=5)
model = Word2Vec(len(id_to_token), embedding_dim=100)
print("~STARTING TRAINING~")
for i in range(EPOCHS):
print(f"Epoch {i}: {epoch(model, pairs, vocab) / len(id_to_token)}")
print("~FINISHED TRAINING~")
```
# Notes (Pedantic Commentary Defense :P)
1. I use the term \"similar\" and \"related\" in reference to words,
which implies some sort of meaning is encoded. However in practice
word2vec is just looking for words with high probabilities of being
in similar contexts, which happens to correlate to \"meaning\"
decently well.
2. CBOW shares a very similar intuition to Skip Gram, the only
difference is which way you map a target token to context tokens.
3. Of course, a good deal of mathematical pain can be shaved off this
exercise by using Tensorflow (here is a
[Colab](https://colab.research.google.com/github/tensorflow/text/blob/master/docs/tutorials/word2vec.ipynb#scrollTo=iLKwNAczHsKg)
from Tensorflow that does it) - but this is done from scratch so the
inner workings of word2vec can be more easily seen.
4. Results are (very) subpar with a small corpus size, and this isn\'t
optimized for GPUs sooo\... at least the error goes down!
# Sources
1. <https://en.wikipedia.org/wiki/Word2vec>
2. <https://arxiv.org/abs/1301.3781> (worth a read - not a long paper
and def on the less math intensive side of things)
3. <https://ahammadnafiz.github.io/posts/Word2Vec-From-Scratch-A-Complete-Mathematical-and-Implementation-Guide/#implementation>

Taylor Series Proof.md Normal file

@@ -0,0 +1,61 @@
#Math #Calculus
Represent function using power series:
$$
f(x) = \sum _{n=0}^{\infty} c_n (x-a)^n
$$
Find $c_0$
$$
c_0=f(a)
$$
Take derivative of function
$$
\frac d {dx} f(x) = \sum _{n=0}^\infty c_{n+1} (n+1)(x-a)^n
$$
Find $c_1$
$$
c_1=\frac {d} {dx} f(a)
$$
Take second derivative of function
$$
\frac {d^2} {d^2x} f(x) = \sum _{n=0}^\infty c_{n+2} (n+1)(n+2)(x-a)^n
$$
Find $c_2$
$$
c_2=\frac {\frac {d^2} {d^2x} f(a)} {2}
$$
Take third derivative of function
$$
\frac {d^3} {dx^3} f(x) = \sum _{n=0}^\infty c_{n+3} (n+1)(n+2)(n+3)(x-a)^n
$$
Find $c_3$
$$
c_3=\frac {\frac {d^3} {dx^3} f(a)} {6}
$$
Create general formula for $n$th element of $c$
$$
c_n = \frac {\frac {d^n} {dx^n}f(a)} {n!}
$$
Create general formula for function as polynomial
$$
f(x)=\sum _{n=0}^\infty \frac {\frac {d^n} {dx^n}f(a)} {n!} (x-a)^n
$$
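As a quick check, take $f(x) = e^x$ and $a = 0$: every derivative of $e^x$ is $e^x$, so $\frac {d^n} {dx^n} f(0) = 1$ for all $n$, and the formula gives the familiar series:
$$
e^x = \sum _{n=0}^\infty \frac {x^n} {n!} = 1 + x + \frac {x^2} {2} + \frac {x^3} {6} + \dots
$$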

45
The Basel Problem.md Normal file
View File

@@ -0,0 +1,45 @@
#Math #NT
# Basel Problem Solution
## Base Sum
$$
\frac {\pi^2} 4 \\
= \frac {\pi^2} 4 \csc^2 (\frac \pi 2) \\
= \frac {\pi^2} {4^2} (\csc^2 (\frac \pi 4) + \csc^2 (\frac \pi 4 + \frac \pi 2))
$$
Each step uses the identity $\csc^2 \theta = \frac 1 4 (\csc^2 \frac \theta 2 + \csc^2 (\frac \theta 2 + \frac \pi 2))$. Applying it $a$ times in total (the equation above shows $a = 1$) gives:
$$
= \frac {\pi^2} {4^{a + 1}}\sum _{n = 1}^{2^{a}} \csc^2 \left( \frac {(2n - 1)\pi} {2^{a+1}} \right) \\
= \sum _{n = 1}^{2^{a}} \frac {\pi^2} {4^{a + 1}} \csc^2 \left( \frac {(2n - 1)\pi} {2^{a+1}} \right) \\
= \sum _{n = 1}^{2^{a}} \left( \frac {2^{a + 1}} \pi \sin \left( \frac {(2n - 1)\pi} {2^{a+1}} \right) \right)^{-2}
$$
As $a$ approaches $\infty$, each term tends to $(2n - 1)^{-2}$ (since $\frac {2^{a+1}} \pi \sin \frac {(2n - 1)\pi} {2^{a+1}} \to 2n - 1$ for fixed $n$), and since the angles for $n$ and $2^a + 1 - n$ sum to $\pi$, the second half of the sum mirrors the first, contributing a factor of $2$:
$$
= 2\sum _{n=1}^{\infty} (2n - 1)^{-2}
$$
Therefore:
$$
\sum _{n = 1}^{\infty} (2n - 1)^{-2} = \frac {\pi^2} {8}
$$
## Manipulating this Sum
$$
\sum _{n = 1}^{\infty} (2n)^{-2} = \frac 1 4 \sum _{n = 1}^{\infty} n^{-2} \\
\sum _{n = 1}^{\infty} (2n - 1)^{-2} = \sum _{n = 1}^{\infty} n^{-2} - \sum _{n = 1}^{\infty} (2n)^{-2} = \frac 3 4 \sum _{n = 1}^{\infty} n^{-2} \\
\frac {\pi ^2} 8 = \frac 3 4 \sum _{n = 1}^{\infty} n^{-2} \\
\frac {\pi ^2} 6 = \sum _{n = 1}^{\infty} n^{-2}
$$
Therefore
$$
\frac {\pi ^2} 6 = \sum _{n = 1}^{\infty} n^{-2} \\
$$
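As a numeric sanity check, $\frac {\pi^2} 6 \approx 1.6449$, and the partial sums creep toward it (convergence is slow):
$$
\sum _{n = 1}^{4} n^{-2} = 1 + \frac 1 4 + \frac 1 9 + \frac 1 {16} \approx 1.4236
$$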

81
Totient Function.md Normal file
View File

@@ -0,0 +1,81 @@
#Math #NT
# Definition
Euler's totient function returns the number of integers $k$ with $1 \leq k \leq n$ that are coprime to $n$, for a positive integer $n$. It is notated as:
$$
\phi(n)
$$
# $\phi(n)$ for Prime Powers
For a prime power $p^k$, the only positive integers $j \leq p^k$ with $\gcd(j, p^k) > 1$ are the multiples of $p$, i.e. $j = mp$ for $1 \leq m \leq p^{k - 1}$, of which there are $p^{k - 1}$. Therefore:
$$
\phi(p^k) \\ = p^k - p^{k - 1} \\ = p^{k - 1}(p - 1) \\ = p^k(1 - \frac 1 p)
$$
# Multiplicative Property of $\phi$
If $m$ and $n$ are coprime:
$$
\phi(m)\phi(n) = \phi(mn)
$$
Proof: Let set $A$ be all positive integers at most $m$ coprime to $m$, and set $B$ be all positive integers at most $n$ coprime to $n$.
$$
|A| = \phi(m) \\ |B| = \phi(n)
$$
Let set $D = A \times B$ be all ordered pairs with first element from $A$ and second element from $B$. For each element $(k_1, k_2)$ in set $D$, take the value $\theta$ (with $1 \leq \theta \leq mn$) where:
$$
\theta \equiv k_1 \mod m \\ \theta \equiv k_2 \mod n
$$
The Chinese Remainder Theorem (CRT) ensures such a $\theta$ exists and is unique $\bmod \ mn$. Given the fact that $\gcd(x + yz, z) = \gcd(x, z)$, we can say that:
$$
\gcd(\theta, m) = \gcd(k_1, m) = 1 \\ \gcd(\theta, n) = \gcd(k_2, n) = 1 \\ \gcd(\theta, mn) = 1
$$
Put all such $\theta$ in set $C$. Conversely, every $1 \leq \theta \leq mn$ with $\gcd(\theta, mn) = 1$ reduces to a pair $(\theta \bmod m, \theta \bmod n)$ in $D$, so $C$ is exactly the set of integers at most $mn$ coprime to $mn$. Looking at the size of $C$:
$$
|C| = \phi(mn) \\
|C| = |A| \cdot |B| = \phi(m)\phi(n) \\
\phi(mn) = \phi(m)\phi(n)
$$
# Value of $\phi$ for any Number
Let the prime factorization of a positive integer $n$ be:
$$
n = p_1^{k_1}p_2^{k_2}p_3^{k_3}...p_l^{k_l}
$$
Now using the properties above:
$$
\phi(n) \\
= \prod _{i = 1}^l \phi(p_i^{k_i}) \\
= \prod _{i = 1}^l p_i^{k_i}(1 - \frac 1 {p_i})
$$
Multiplying all $p_i^{k_i}$ gives $n$, so factor that out:
$$
= n \prod _{i = 1}^l (1 - \frac 1 {p_i})
$$
(you can derive most textbook definitions from this formula easily)
Final formula:
$$
\phi(n) = n \prod _{i = 1}^l (1 - \frac 1 {p_i})
$$
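As a quick check, take $n = 12 = 2^2 \cdot 3$; the four integers at most $12$ coprime to $12$ are $1, 5, 7, 11$:
$$
\phi(12) = 12 \left(1 - \frac 1 2\right)\left(1 - \frac 1 3\right) = 12 \cdot \frac 1 2 \cdot \frac 2 3 = 4
$$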

97
Vectors.md Normal file
View File

@@ -0,0 +1,97 @@
#Math #Algebra
# Defining Vectors
A vector is a list of components. Vectors can be expressed in $ijk$ (unit vector) notation by:
$$
\mathbf a = 2i + 3j -4k
$$
or
$$
\vec a = 2i + 3j -4k
$$
You can also express a vector as a column or row matrix:
$$
\vec a = \begin {bmatrix}
2 \\
3 \\
-4 \\
\end {bmatrix} \\
\vec a = \begin {bmatrix} 2 & 3 & -4 \end {bmatrix}
$$
# Adding and Subtracting Vectors
To add vectors, add their corresponding components. For example:
$$
\vec a = 4i + 7j - 9k \\
\vec b = 3i - 5j - 8k \\
\vec a + \vec b = 7i + 2j - 17k
$$
Subtracting vectors works in a similar fashion:
$$
\vec a - \vec b = i + 12j - k
$$
Here are the formulas, written component-wise:
$$
(\vec a + \vec b)_i = a_i+b_i \\
(\vec a - \vec b)_i = a_i-b_i
$$
Heres a graph visualizing the addition and subtraction of vectors: [https://www.desmos.com/calculator/gavjpwhnuo](https://www.desmos.com/calculator/gavjpwhnuo)
# Multiplication by Scalar
To multiply a vector by a scalar (regular number), just multiply all the components by that number:
$$
(m\vec a)_i = ma_i
$$
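A quick numeric instance, scaling the vector from the first section by $3$:
$$
3(2i + 3j - 4k) = 6i + 9j - 12k
$$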
# Multiplication by Another Vector: Dot Product
There are two different ways to multiply a vector by another vector. The first is the dot product. Here is the algebraic definition, where $n$ is the number of components (both vectors must have the same length):
$$
\vec a \cdot \vec b = \sum _{i = 1}^n a_ib_i
$$
With two two-dimensional vectors, we can also provide a geometric definition, where $||\vec a||$ is the magnitude of $\vec a$ and $\theta$ is the angle between the vectors:
$$
\vec a \cdot \vec b = ||\vec a|| \: ||\vec b|| \: \cos \theta
$$
As you can see, the dot product returns a single value, or scalar. From the geometric definition, you can see that it describes how much one vector “aligns” with the other.
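For example, with $\vec a = 2i + 3j$ and $\vec b = 4i - j$:
$$
\vec a \cdot \vec b = (2)(4) + (3)(-1) = 5
$$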
## Proving that the Definitions are the Same
Let $\vec a$ have a magnitude of $m$ and an angle of $p$, and let $\vec b$ have a magnitude of $n$ and an angle of $q$.
$$
\vec a \cdot \vec b \\
= m\cos p \: n \cos q + m\sin p \: n \sin q \\
= mn(\cos p \: \cos q + \sin p \: \sin q) \\
= mn\cos(p-q)
$$
Since $p - q$ is the angle between the vectors, starting from the algebraic definition we arrive at the geometric definition shown above.
# Cross Product
Let $\hat n$ be a unit vector perpendicular to both $\vec a$ and $\vec b$ (oriented by the right-hand rule), and let $\theta$ be the angle between them. The cross product, which returns a vector, is:
$$
\vec a \times \vec b = ||\vec a|| \: ||\vec b|| \: \sin \theta \: \hat n
$$
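For example, for the perpendicular unit vectors $i$ and $j$, we have $\theta = 90°$ and (by the right-hand rule) $\hat n = k$:
$$
i \times j = (1)(1) \sin 90° \: k = k
$$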

29
Vieta’s Formulas.md Normal file
View File

@@ -0,0 +1,29 @@
#Math #Algebra
Let polynomial $a$ be:
$$
a = c_n \prod _{i = 1}^n (x - r_i)
$$
where $r_i$ is a root of $a$, and $c_n$ is the leading coefficient of $a$.
We can also represent $a$ as:
$$
a = \sum _{i = 0}^n c_i x^i
$$
By expanding the first definition of $a$, we can define $c_i$ by:
$$
c_{n-i} = (-1)^i c_n\sum _{sym}^i r
$$
This follows from expanding the product of binomials: the coefficient of $x^{n - i}$ picks up $-r_j$ from $i$ of the factors (and $x$ from the rest) in every possible way, giving the $i$th elementary symmetric sum of the roots multiplied by $(-1)^i$.
Rearranging, we can state:
$$
\sum _{sym}^i r = (-1)^i \frac {c_{n-i}} {c_n}
$$
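As a quick check with $n = 2$: the polynomial $x^2 - 5x + 6 = (x - 2)(x - 3)$ has $c_2 = 1$, $c_1 = -5$, $c_0 = 6$, and the formula recovers the two elementary symmetric sums of the roots:
$$
\sum _{sym}^1 r = 2 + 3 = (-1)^1 \frac {c_1} {c_2} = 5 \\
\sum _{sym}^2 r = 2 \cdot 3 = (-1)^2 \frac {c_0} {c_2} = 6
$$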

3
index.md Normal file
View File

@@ -0,0 +1,3 @@
# Welcome! 👋
I am [@craisin](https://craisin.tech), a CS and math enthusiast! These are a set of notes related to anything adjacent to math/CS that I am interested in. Enjoy!

40
sin x = 2.md Normal file
View File

@@ -0,0 +1,40 @@
#Math #Trig
$$
\sin x = 2
$$
$$
\frac {e^{ix} - e^{-ix}} {2i} = 2
$$
$$
e^{ix} - e^{-ix} = 4i \\
$$
$$
e^{ix} - (e^{ix})^{-1} = 4i
$$
Let $u = e^{ix}$:
$$
u - u^{-1} = 4i
$$
$$
u^2 - 1 = 4iu \\
u^2 - 4iu - 1 = 0 \\
u^2 - 4iu - 4 = -3 \\
(u - 2i)^2 = -3 \\
u - 2i = \pm \sqrt {-3} \\
u = 2i \pm \sqrt {-3} \\
u = i(2 \pm \sqrt 3)
$$
Substitute $e^{ix}$ back in for $u$, with $n \in \mathbb{Z}$ accounting for the multivalued logarithm:
$$
e^{ix} = i(2 \pm \sqrt 3) \\
ix = \ln (i(2 \pm \sqrt 3)) \\
ix = \ln i + 2\pi i n + \ln(2 \pm \sqrt 3) \\
ix = \frac {i\pi} 2 + 2\pi i n + \ln(2 \pm \sqrt 3) \\
x = \frac \pi 2 - i\ln(2 \pm \sqrt 3) + 2\pi n
$$
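As a check, substitute the principal value $x = \frac \pi 2 - i\ln(2 + \sqrt 3)$ back in, using $\frac 1 {2 + \sqrt 3} = 2 - \sqrt 3$:
$$
\sin \left(\frac \pi 2 - i\ln(2 + \sqrt 3)\right) = \cos \left(i \ln (2 + \sqrt 3)\right) = \cosh \left(\ln (2 + \sqrt 3)\right) = \frac {(2 + \sqrt 3) + (2 - \sqrt 3)} {2} = 2
$$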