Initial commit

2025-12-25 21:13:43 -08:00
commit 9ce7679e9c
40 changed files with 2430 additions and 0 deletions


@@ -0,0 +1,29 @@
#Math #Calculus
# The Problem
If $f(x) = \sum _{k \geq 0} a_k x^k$, and this series converges for $x = x_0$, prove:
$$
\sum _{k \geq 0} a_k x_0^k H_k = \int _0^1 \frac {f(x_0) - f(x_0 y)} {1 - y} dy
$$
where $H_k$ is defined to be the partial sums of the harmonic series ($H_0 = 0$, $H_k = \sum _{i = 1}^k \frac 1 i$ for $k \geq 1$).
(from *The Art of Computer Programming*)
# Solution
Although this problem might seem intimidating with a power series involving the harmonic numbers on the LHS and a summation function inside an integral on the RHS, it is fairly trivial to bring out the summation and express the RHS as a power series:
$$
\int _0^1 \frac {f(x_0) - f(x_0 y)} {1 - y} dy \\
= \int _0^1 \frac {\sum _{k \geq 0} a_k x_0^k - \sum _{k \geq 0} a_k x_0^k y^k} {1 - y} dy \\
= \int _0^1 \sum _{k \geq 1} a_k x_0^k \frac {1 - y^k} {1 - y} dy \\
= \sum _{k \geq 1} a_k x_0^k \int _0^1 \frac {1 - y^k} {1 - y} dy
$$
The $k = 0$ term vanishes since $1 - y^0 = 0$. The integral factor in the last step is now merely Euler's integral representation for the harmonic numbers, which follows from the simple fact that $\frac {1 - y^k} {1 - y} = \sum _{i = 0}^{k - 1} y^i$. Therefore:
$$
\sum _{k \geq 0} a_k x_0^k H_k = \int _0^1 \frac {f(x_0) - f(x_0 y)} {1 - y} dy
$$
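Writing that integral out explicitly as a quick check:
$$
\int _0^1 \frac {1 - y^k} {1 - y} dy = \int _0^1 \sum _{i = 0}^{k - 1} y^i \, dy = \sum _{i = 0}^{k - 1} \frac 1 {i + 1} = H_k
$$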


@@ -0,0 +1,34 @@
#Math #Algebra
# Multiplying an Adjacency Matrix by Itself
Consider adjacency matrix $M$ with $n$ nodes. We know that if there is an edge between $a$ and $b$, $M_{a, b} = 1$; if not, $M_{a, b} = 0$. Now consider all possible nodes $c$ through which to take a path $a \to c \to b$. We see that for $c$ to be a possible node, $M_{a, c} = 1$ and $M_{c, b} = 1$.
Now observe the definition of multiplication of matrices. We know:
$$
MM_{a, b} = \sum _{c = 1}^n M_{a, c} M_{c, b}
$$
For each node $c$, the corresponding term of the sum equals one exactly when $M_{a, c}$ and $M_{c, b}$ are both $1$, i.e. when the path $a \to c \to b$ exists. Therefore, the number of paths of length $2$ between $a$ and $b$ is $MM_{a, b}$.
# Taking an Adjacency Matrix to $l$
Now consider a matrix $N$ where there are $N_{a, b}$ paths of length $l$ between $a$ and $b$. Have $M$ be the adjacency matrix for the graph $N$ models. Now say we want all paths of length $l + 1$ that look like $a \to … \to c \to b$. We know the number of paths $a \to … \to c$ of length $l$ is $N_{a, c}$. Similarly, we know there are $M_{c, b}$ paths that satisfy $c \to b$. Therefore, the number of paths that satisfy $a \to … \to c \to b$ is:
$$
N_{a, c}M_{c, b}
$$
Choosing any node $c$ gives the number of paths of length $l + 1$ satisfying $a \to b$:
$$
\sum _{c = 1}^n N_{a, c}M_{c, b} \\
= NM_{a, b}
$$
Since $M_{a, b}$ is clearly the number of paths of length $1$ between $a$ and $b$, by induction the number of paths of length $l$ between $a$ and $b$ is:
$$
M^l_{a, b}
$$
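As a quick numerical sanity check, here is a minimal numpy sketch (the small example graph is made up for illustration):
``` python
import numpy as np

# Adjacency matrix of a small directed graph: edges 0->1, 0->2, 1->2, 2->0
M = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
])

l = 3
paths = np.linalg.matrix_power(M, l)  # (M^l)[a, b] = number of paths of length l from a to b
print(paths)
print(paths[0, 2])  # 1 here: the only length-3 path from 0 to 2 is 0 -> 2 -> 0 -> 2
```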


@@ -0,0 +1,33 @@
#Math #Calculus
Limit to solve:
$$
\lim _{x\to 0} \frac {e^x-1} {x}
$$
Let $t = e^x - 1$
$$
\lim _{t\to 0} \frac {t} {\ln(t+1)}
$$
$$
\lim _{t\to 0} \frac {1} {\frac {1} {t} \ln(1+t)}
$$
Move the $\frac 1 t$ into the logarithm as an exponent (log power rule)
$$
\lim _{t\to 0} \frac {1} {\ln(1+t)^{\frac {1} {t}}}
$$
By the definition of $e$, $\lim _{t \to 0} (1 + t)^{\frac 1 t} = e$
$$
\frac {1} {\ln e}
$$
$$
1
$$

Basic Category Theory.md Normal file

@@ -0,0 +1,17 @@
#Math #CT
# Categories
Categories contain:
- A collection of **objects**
- A collection of **morphisms** (also called **arrows**) connecting objects denoted by $f: S \to T$, where $f$ is the **morphism**, $S$ is the **source**, and $T$ is the **target**
- Note: $f: A \to B$ and $g: A \to B$ **DOES NOT IMPLY** $f = g$
- Formally this can also be expressed as a relation between a collection of objects and a collection of morphisms
- Morphisms have a notion of **composition**, that being if $f: A \to B$, $g: B \to C$, then $g \circ f: A \to C$
There are three rules for categories:
- **Associativity:** For morphisms $a$, $b$, and $c$, $(a \circ b) \circ c = a \circ (b \circ c)$
- **Closed composition:** If morphisms $a$ and $b$ are composable (the source of $a$ is the target of $b$), then the composite $a \circ b$ must also be a morphism of the category
- **Identity morphisms:** For every object $A$ in a category, there must be an identity morphism $\text{id}_A: A \to A$
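For a concrete example, sets and functions form a category: the objects are sets, a morphism $f: S \to T$ is a function from $S$ to $T$, $\text{id}_A$ is the identity function on $A$, and composition is ordinary function composition, which is associative.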

Bezout’s Identity.md Normal file

@@ -0,0 +1,51 @@
#Math #NT
# Statement
Let $x \in \mathbb{Z}$, $y \in \mathbb{Z}$, $x \neq 0$, $y \neq 0$, and $g = \gcd(x, y)$. Bezout's Identity states that there exist $\alpha \in \mathbb{Z}$ and $\beta \in \mathbb{Z}$ such that:
$$
\alpha x + \beta y = g
$$
Furthermore, $g$ is the least positive integer able to be expressed in this form.
# Proof
## First Statement
Let $x = gx_1$ and $y = gy_1$, and notice $\gcd(x_1, y_1) = 1$ and $\operatorname{lcm} (x_1, y_1) = x_1 y_1$.
Since this is true, the smallest positive integer $\alpha$ with $\alpha x_1 \equiv 0 \mod {y_1}$ is $\alpha = y_1$.
For all integers $0 \leq a < b < y_1$, $ax_1 \not\equiv bx_1 \mod {y_1}$ (otherwise $y_1$ would divide $(b - a)x_1$ with $0 < b - a < y_1$, contradicting the previous statement). The residues $ax_1 \bmod y_1$ are therefore all distinct, so by the pigeonhole principle there exists $\alpha$ such that $\alpha x_1 \equiv 1 \mod {y_1}$.
Therefore, there is an $\alpha$ such that $\alpha x_1 - 1 \equiv 0 \mod {y_1}$, and by extension there exists an integer $\beta$ such that:
$$
\alpha x_1 - 1 = -\beta y_1 \\
\alpha x_1 + \beta y_1 = 1
$$
By multiplying by $g$:
$$
\alpha x + \beta y = g
$$
## Second Statement
To prove $g$ is the minimum, let's consider another positive integer $g\prime$ expressible in this form:
$$
\alpha\prime x + \beta\prime y = g\prime
$$
Since $x$ and $y$ are both multiples of $g$:
$$
0 \equiv \alpha \prime x + \beta \prime y \mod g \\
0 \equiv g\prime \mod g
$$
Since $g$ and $g\prime$ are positive integers, $g\prime \geq g$.
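In practice, such $\alpha$ and $\beta$ can be computed with the extended Euclidean algorithm. Here is a minimal Python sketch for positive inputs (separate from the proof above, just an illustration):
``` python
def extended_gcd(x, y):
    """Return (g, alpha, beta) with alpha*x + beta*y == g == gcd(x, y), for positive x, y."""
    if y == 0:
        return (x, 1, 0)
    g, a, b = extended_gcd(y, x % y)
    # g = a*y + b*(x % y) = a*y + b*(x - (x//y)*y) = b*x + (a - (x//y)*b)*y
    return (g, b, a - (x // y) * b)

g, alpha, beta = extended_gcd(240, 46)
print(g, alpha, beta)  # one valid pair: 2 -9 47
assert alpha * 240 + beta * 46 == g
```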


@@ -0,0 +1,21 @@
#Math #Probability
# Problem
Why does n choose k, or $\frac{n!}{k!(n-k)!}$ generate the coefficient for $x^ky^{n-k}$ in $(x+y)^n$?
# Explanation
Let's see what happens when expanding $(x+y)^4$:
$$
(x+y)^4\\
=(x+y)(x+y)(x+y)(x+y)\\
=xxxx+\\
yxxx+xyxx+xxyx+xxxy+\\
yyxx+yxyx+yxxy+xyyx+xyxy+xxyy+\\
xyyy+yxyy+yyxy+yyyx+\\
yyyy
$$
When expanding, notice that the number of terms with $k$ copies of $x$ (and likewise $4 - k$ copies of $y$) is the number of ways to choose $k$ of the $4$ slots to hold an $x$, i.e. $4 \choose k$. Therefore, $(x+y)^n={n \choose 0}x^0y^n+{n \choose 1}x^1y^{n-1}+\cdots+{n \choose n-1}x^{n-1}y^1+{n \choose n}x^ny^0$
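For instance, the coefficient of $x^2y^2$ above is ${4 \choose 2} = 6$, matching the six terms $yyxx, yxyx, yxxy, xyyx, xyxy, xxyy$ in the expansion.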

Bohr Mollerup Theorem.md Normal file

@@ -0,0 +1,85 @@
#Math #Calculus
# Intro
The Gamma function $\Gamma(x)$ is a way to extend the factorial function, where $\Gamma(n + 1) = n!$. This gives us two conditions defining $\Gamma (x)$:
$$
\Gamma(1) = 1 \\
\Gamma(x + 1) = x \Gamma (x)
$$
However, by adding a third condition stating $\Gamma (x)$ is logarithmically convex ($\log \circ \space \Gamma$ is convex), we can prove that $\Gamma (x)$ is unique!
# Proof
Let $G$ be a function with the properties above. Since $G(x + 1) = xG(x)$, we can define any $G(x + n)$, where $n \in \mathbb{N}$ as:
$$
G(x + n) = G(x)\prod _{i = 0}^{n - 1}(x + i)
$$
This means that it is sufficient to define $G(x)$ on $x \in (0, 1]$ for a unique $G(x)$.
Let $S(x_1, x_2)$ be defined as $\frac {\log (G(x_2)) - \log (G(x_1))} {x_2 - x_1}$, the slope of $\log \circ \space G$ between $x_1$ and $x_2$. Observe that by log-convexity, for all $0 \lt x \leq 1$ and $n \in \mathbb{N}$:
$$
S(n - 1, n) \leq S(n, n +x) \leq S(n, n + 1) \\
\log (G(n)) - \log (G(n-1)) \leq \frac {\log (G(n + x)) - \log (G(n))} {x} \leq \log (G(n + 1)) - \log (G(n)) \\
\log ((n - 1)!) - \log ((n-2)!) \leq \frac {\log (G(x + n)) - \log ((n - 1)!)} {x} \leq \log (n!) - \log ((n - 1)!) \\
\log(n - 1) \leq \frac {\log (\frac{G(x + n)}{(n - 1)!})} {x} \leq \log(n) \\
\log((n - 1)^x) \leq \log (\frac {G(x + n)}{(n - 1)!})\leq \log(n^x) \\
$$
Exponentiating:
$$
(n - 1)^x \leq \frac {G(x + n)}{(n - 1)!}\leq n^x \\
(n - 1)^x(n - 1)! \leq G(x + n)\leq n^x(n - 1)! \\
$$
Using the above work to expand $G(x + n)$:
$$
\frac{(n - 1)^x(n - 1)!} {\prod _{i = 0}^{n - 1}(x + i)} \leq G(x) \leq \frac{n^x(n - 1)!} {\prod _{i = 0}^{n - 1}(x + i)} \\
\frac{(n - 1)^x(n - 1)!} {\prod _{i = 0}^{n - 1}(x + i)} \leq G(x) \leq \frac{n^xn!} {\prod _{i = 0}^n(x + i)}(\frac {n + x} n) \\
$$
Of course, taking the limit as $n$ goes to infinity on both sides by brute force would produce the value of $G(x)$; however, I will present a more elegant solution. Notice we can take the inequalities separately, resulting in:
$$
\frac{(n_1 - 1)^x(n_1 - 1)!} {\prod _{i = 0}^{n_1 - 1}(x + i)} \leq G(x)\\
G(x) \leq \frac{n_2^xn_2!} {\prod _{i = 0}^{n_2}(x + i)}(\frac {n_2 + x} {n_2}) \\
$$
This shows that no matter which $n_1$ and $n_2$ we pick, the inequalities still hold!
Now we can sub in $n_1 = n + 1$, $n_2 = n$, to get:
$$
\frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} \leq G(x) \leq \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} (\frac {n + x} n)\\
$$
Taking a limit to infinity on both sides:
$$
\lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} \leq G(x) \leq \lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} (\frac {n + x} n)\\
\lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} \leq G(x) \leq \lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} \\
$$
<aside>
Therefore there is only a singular function satisfying the conditions on $G(x)$, as it is squeezed on $(0, 1]$.
</aside>
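For the curious, the common value that both bounds converge to is $\lim _{n \to \infty} \frac {n^x \, n!} {x(x+1)\cdots(x+n)}$, which is Gauss's product formula for $\Gamma(x)$.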
# Exercise to the Reader
Prove that the definition:
$$
\Gamma(n) = \int _{0}^{\infty} x^{n - 1} e^{-x} dx
$$
is valid.

Central Limit Theorem.md Normal file

@@ -0,0 +1,245 @@
#Math #Probability
# The Central Limit Theorem
Let us sum $n$ instances from an i.i.d. (independent, identically distributed) sequence with defined first and second moments (mean and variance). Center the resulting distribution on $0$ and scale it by its standard deviation. As $n$ goes to infinity, the distribution of that variable goes toward
$$
\frac 1 {\sqrt {2 \pi}} e^{- \frac {x^2} 2}
$$
or the standard normal distribution
## Mathematical Definition
Let $Y$ be the mean of a sequence of $n$ i.i.d. random variables
$$
Y = \frac 1 n \sum _{i=1}^{n} X_i
$$
Let $\mu=E(X_i)$, the expected value of $X$, and $\sigma = \sqrt {Var(X)}$, the standard deviation of $X$
Calculate the expected value of Y, $E(Y)$, and the variance, $Var(Y)$:
$$
E(Y) \\
= E(\frac 1 n \sum _{i=1}^{n} X_i) \\
= \frac 1 n \sum _{i=1}^{n} E(X_i) \\
= \frac 1 n \sum _{i=1}^{n} \mu \\
= \frac {n \mu} {n} \\
= \mu
$$
$$
Var(Y) \\
= Var(\frac 1 n \sum _{i=1}^n X_i) \\
= \frac 1 {n^2} \sum _{i=1}^n Var(X_i) \\
= \frac {\sigma^2} n
$$
Let $Y^*$ be $Y$ centered by $E(Y)$ and scaled by its standard deviation, $\sqrt {Var(Y)}$
$$
Y^* \\ = \frac {Y - E(Y)} {\sqrt {Var(Y)}} \\ = \frac {Y - \mu} {\sqrt {\frac {\sigma^2} {n}}} \\ = \frac {\sqrt n (Y - \mu)} \sigma \\= \frac {\sqrt n (\frac 1 n \sum _{i=1}^n X_i - \mu)} \sigma \\ = \frac {\frac 1 {\sqrt n} (\sum _{i=1}^n X_i - n\mu)} \sigma
$$
The CLT states
$$
Y^* \overset d \to N(0, 1)
$$
Or $Y^*$ converges in distribution to the standard normal distribution with a mean of 0 and a standard deviation of 1
# Proof
## A Change in Variables
Let $S$ be the sum of our sequence of $n$ i.i.d. random variables
$$
S = \sum _{i=1}^{n} X_i
$$
Let's calculate $E(S)$ and $Var(S)$
$$
E(S) \\
=E(\sum _{i=1}^n X_i) \\
=\sum _{i=1}^n E(X_i) \\
=\sum _{i=1}^n \mu \\
= n\mu
$$
$$
Var(S) \\
=Var(\sum _{i=1}^n X_i) \\
=\sum _{i=1}^n Var(X_i) \\
=\sum _{i=1}^n \sigma^2 \\
=n\sigma^2
$$
Center $S$ by $E(S)$ and scale it by $\sqrt {Var(S)}$ for $S^*$
$$
S^* \\
= \frac {S - E(S)} {\sqrt {Var(S)}} \\
= \frac {S - n\mu} {\sqrt {n\sigma^2}} \\
= \frac {S - n\mu} {\sqrt {n}\sigma} \\
= \frac {\frac 1 {\sqrt n} (S-n\mu)} { \sigma} \\
= \frac {\frac 1 {\sqrt n} (\sum _{i=1}^n X_i - n\mu)} \sigma
$$
From the above, $Y^*=S^*$. In the proof, we will use $S^*$, as it is easier to manipulate.
## MGFs
An MGF (moment generating function) is a function where
$$
M_V(t) = E(e^{tV})
$$
where $V$ is a random variable
(reminder for me to do another notion on this)
### Properties of MGFs
Property 1:
If $A$ and $B$ are independent and
$$
C=A+B
$$
Then
$$
M_C(t) \\
= E(e^{tC}) \\
= E(e^{tA + tB}) \\
= E(e^{tA}e^{tB}) \\
= E(e^{tA}) E(e^{tB}) \\
= M_A(t) M_B(t)
$$
Property 2:
$$
M_V^{(r)}(0) = E(V^r)
$$
The $r$th derivative of $M_V$ at $0$ gives the $r$th moment of $V$
Property 3:
Let $A_1$, $A_2$, … $A_n$, … be a sequence of random variables with MGFs $M_{A_1}$, $M_{A_2}$, … $M_{A_n}$, …
If
$$
M_{A_n}(t) \to M_B(t)
$$
Then
$$
A_n \overset d \to B
$$
### MGF of a Normal Distribution
Let $Z$ be a random variable with a standard normal distribution
$$
Z \sim N(0, 1)
$$
$$
M_Z(t) \\
= E(e^{tZ}) \\
= \int _{-\infty}^{\infty} e^{xt} \frac 1 {\sqrt {2\pi}} e^{-\frac {x^2} 2} dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{tx-\frac 1 2 x^2} dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x^2 - 2tx )} dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x^2 - 2tx + t^2 ) + \frac 1 2 t^2 } dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x - t)^2 + \frac 1 2 t^2 } dx \\
= e ^ {\frac 1 2 t^2} \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x - t)^2 } dx \\
= e ^ {\frac {t^2} 2}
$$
## The Argument
To prove the CLT, we need to prove that $S^*$ converges to $N(0, 1)$ as $n \to \infty$. Our approach will be to prove that the MGF of $S^*$ converges to the MGF of $N(0, 1)$ as $n \to \infty$.
$$
S^* \\
= \frac {S - E(S)} {\sqrt {Var(S)}} \\
= \frac {S - n\mu} {\sqrt {n \sigma^2}} \\
= \frac {\sum _{i=1}^{n} X_i - n\mu} {\sqrt n \sigma} \\
= \sum _{i=1}^{n} \frac {X_i - \mu} {\sqrt n \sigma}
$$
Start manipulating MGF of $S^*$:
$$
M_{S^*}(t) \\
= E(e^{tS^*}) \\
= E(e^{t(\sum _{i=1}^{n} \frac {X_i - \mu} {\sqrt n \sigma})}) \\
= E(\prod _{i=1}^{n} e^{t \frac {X_i - \mu} {\sqrt n \sigma}}) \\
= E(e^{t(\frac {(X-\mu)} {\sqrt n \sigma})})^n \\
= (M_{\frac {(X-\mu)} {\sqrt n \sigma}}(t))^n \\
=(M_{(X - \mu)} (\frac t {\sqrt n \sigma }))^n
$$
Expand out the Taylor series for $(M_{(X-\mu)}(\frac t {\sqrt n \sigma}))^n$ (note $O(t^3)$ means terms of order $t^3$ and above, which tend to zero as $n$ goes to $\infty$):
$$
M_{(X-\mu)}(\frac t {\sqrt n \sigma}) \\
= (M_{(X-\mu)}(0)) + (\frac {M_{(X-\mu)}\prime(0)} {1!})(\frac t {\sqrt n \sigma}) + (\frac {M_{(X-\mu)}\prime\prime(0)} {2!})(\frac t {\sqrt n \sigma})^2 + (\frac {M_{(X-\mu)}\prime\prime\prime(0)} {3!})(\frac t {\sqrt n \sigma})^3 + ...\\
= 1 + (\frac {t} {\sqrt n \sigma})E(X-\mu) + (\frac {t^2} {2 n \sigma^2})E((X-\mu)^2) + (\frac {t^3} {6n ^ {\frac 3 2} \sigma ^ 3})E((X-\mu)^3) + ... \\
= 1 + (\frac t {\sqrt n \sigma})E(X-\mu) + (\frac {t^2}
{2n \sigma^2})E((X-\mu)^2) + O(t^3) \\
\approx 1 + (\frac t {\sqrt n \sigma})E(X-\mu) + (\frac {t^2}
{2n \sigma^2})E((X-\mu)^2)
$$
Remember $E(X-\mu) = 0$ and $E((X-\mu)^2) = \sigma^2$
$$
= 1 + (\frac t {\sqrt n \sigma})(0) + (\frac {t^2} {2n \sigma^2})(\sigma ^ 2) \\
= 1 + \frac {t^2} {2n}
$$
Solve for $M_{S^*} (t)$:
$$
M_{S^*}(t) = (1 + \frac {t^2} {2n})^n
$$
Solve $M_{S^*} (t)$ for $\lim _{n \to \infty}$:
$$
\lim _{n \to \infty} M_{S^*}(t) \\
= \lim _{n \to \infty} (1 + \frac {t^2} {2n})^n \\
= \lim _{n \to \infty} (1 + \frac 1 {(\frac {2n} {t^2})})^{\frac {t^2} 2 (\frac {2n} {t^2})} \\
= e^{\frac {t^2} 2}
$$
Since $\lim _{n \to \infty} M_{S^*} (t) = M_Z(t)$, $S^*$ converges in distribution to $N(0, 1)$ as $n \to \infty$. Therefore:
$$
Y^* \overset d \to N(0, 1)
$$
proving the Central Limit Theorem
## Summary of the Argument
$$
Y^* = S^* \\
\lim _{n \to \infty} M_{S^*}(t) \to M_Z (t) \\
\lim _{n \to \infty} S^* \to N(0, 1) \\
\lim _{n \to \infty} Y^* \to N(0, 1) \\
$$
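As a quick numerical illustration of the statement (a minimal numpy sketch; the choice of Uniform(0, 1) as the i.i.d. distribution is arbitrary):
``` python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 50_000

# X_i ~ Uniform(0, 1): mu = 1/2, sigma^2 = 1/12
samples = rng.random((trials, n))
Y = samples.mean(axis=1)
Y_star = np.sqrt(n) * (Y - 0.5) / np.sqrt(1 / 12)

print(Y_star.mean(), Y_star.std())  # ~0 and ~1
print((Y_star > 1.96).mean())       # ~0.025, matching the standard normal tail
```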


@@ -0,0 +1,98 @@
#Math #NT
# Theorem
Say $m$ and $n$ are two coprime positive integers. The Chicken McNugget Theorem states that the largest number that can't be expressed as $am + bn$ with $a \in \mathbb{Z}$, $b \in \mathbb{Z}$, and $a, b \geq 0$ is:
$$
mn - m - n
$$
# Proof
Call a number purchasable relative to $m$ and $n$ if it can be represented as
$$
am + bn
$$
where $a$ and $b$ are non-negative integers.
## Lemma 1
Let $A_N \subset \mathbb{Z} \times \mathbb{Z}$ be the set of all $(x, y)$ such that $xm + yn = N$. Then for any $(x, y) \in A_N$:
$$
A_N = \{(x + kn, y - km): k \in \mathbb{Z}\}
$$
### Proof
By Bezout's Lemma, there exist integers $x\prime$ and $y\prime$ such that $x\prime m + y\prime n = 1$. Then, $Nx\prime m + Ny\prime n = N$. Thus, $A_N$ is nonempty.
Adding $kn$ to $x$ increases $xm + yn$ by $kmn$, and subtracting $km$ from $y$ decreases it by $kmn$; the two cancel, so every pair $(x + kn, y - km)$ is also in $A_N$.
To prove these are the only solutions, let $(x_1, y_1) \in A_N$ and $(x_2, y_2) \in A_N$. This means:
$$
mx_1 + ny_1 = mx_2 + ny_2 \\
m(x_1 - x_2) = n(y_2 - y_1) \\
$$
Since $m$ and $n$ are coprime, and $m$ divides $n(y_2 - y_1)$:
$$
y_2 - y_1 \equiv 0 \mod m \\
y_2 \equiv y_1 \mod m
$$
Similarly:
$$
x_2 \equiv x_1 \mod n
$$
Let $k_1, k_2 \in \mathbb{Z}$ such that:
$$
x_2 - x_1 = k_1n \\
y_1 - y_2 = k_2m \\
$$
Substituting these into $m(x_1 - x_2) = n(y_2 - y_1)$ gives $-k_1 mn = -k_2 mn$, so $k_1 = k_2$ and the two solutions differ by $(kn, -km)$, proving the lemma.
## Lemma 2
For $N \in \mathbb{Z}$, there is a unique $(a_N, b_N) \in \mathbb{Z} \times \{0, 1, 2… m - 1\}$ such that $a_Nm + b_Nn = N$.
### Proof
By Lemma 1, the solutions of $xm + yn = N$ are exactly the pairs $(x + kn, y - km)$ for $k \in \mathbb{Z}$. There is only one possible $k$ for which
$$
0 \leq y - km \leq m - 1
$$
and that choice of $k$ gives the unique pair $(a_N, b_N)$.
## Lemma 3
$N$ is purchasable if and only if $a_N \geq 0$.
### Proof
If $a_N \geq 0$, we can pick $(a_N, b_N)$, so $N$ is purchasable. If $a_N < 0$, then $a_N + kn < 0$ for every $k \leq 0$, and $b_N - km < 0$ for every $k > 0$, so no representation with both coefficients non-negative exists.
## Putting it Together
Therefore, the set of non-purchasable integers is:
$$
\{xm + yn : x<0, 0 \leq y \leq m -1\}
$$
To find the largest element of this set, we choose $x = -1$ and $y = m - 1$:
$$
-m + (m - 1)n \\
= mn - m - n
$$
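A brute-force check of the formula for small coprime $m$ and $n$ (a quick Python sketch, not part of the proof):
``` python
def largest_non_purchasable(m, n):
    """Largest integer below m*n that is not of the form a*m + b*n with a, b >= 0."""
    limit = m * n
    purchasable = set()
    for a in range(limit // m + 1):
        for b in range(limit // n + 1):
            purchasable.add(a * m + b * n)
    return max(k for k in range(limit) if k not in purchasable)

print(largest_non_purchasable(3, 5))   # 7  == 3*5 - 3 - 5
print(largest_non_purchasable(7, 11))  # 59 == 7*11 - 7 - 11
```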


@@ -0,0 +1,91 @@
#Math #NT
For the proof, let p and q be coprime
# Rearrangement
$$
x \equiv a \pmod p\\
x \equiv b \pmod q
$$
Subtract $a$ from both congruences
$$
x-a \equiv 0 \pmod p\\
x-a \equiv b-a \pmod q
$$
# The Underlying Problem
Let $m$ be an integer from 0 to $q-1$ (inclusive), and $r$ be an integer from 0 to $q-1$ (inclusive)
$$
mp \equiv r \pmod q
$$
There are $q$ possible values of $m$, and $q$ possible values of $r$.
Since $p$ and $q$ are coprime, the remainders cannot repeat until after $m > q-1$
Therefore, there is a unique value of m to produce any remainder r in the above equation.
# Putting it all Together
If we look at the last congruence in *Rearrangement*, we see it matches the one in *The Underlying Problem*, where $b-a$ corresponds to $r$, and $x-a$ corresponds to $mp$.
So, we can see there is one unique solution for $x$ in the interval 0 to $pq-1$ (inclusive).
We can extend this by noting that adding $pq$ to any solution gives another solution, so the full set of solutions is expressible via mod $pq$.
# The Underlying Problem (but rigour)
<aside>
hi LH, this was actually one of the theorems that got me into compo math (Raina can actually vouch). as such the above section is lwk bad, so here is an update since you asked
(oh jeez I did not know how latex worked back then sorry)
</aside>
Again start with
$$
mp \equiv r \mod q
$$
Suppose $m_1$ and $m_2$ are two $m$ that give the same $r$. Then $pm_1 \equiv pm_2 \mod q$. By the cancellation law $m_1 \equiv m_2 \mod q$, since $\gcd(p, q) = 1$.
### Cancellation Law Proof (Brownie Points)
$$
pm_1 - pm_2 = p(m_1 - m_2)
$$
We know $q$ divides $pm_1 - pm_2$ since both sides are congruent mod $q$; therefore $q$ divides the RHS. By Euclid's Lemma $q$ must divide $m_1 - m_2$, meaning $m_1 \equiv m_2 \mod q$.
# Final Theorem
Let $p$ and $q$ be coprime. If:
$$
x \equiv a \pmod p\\
x \equiv b \pmod q
$$
then:
$$
x \: rem \: pq
$$
exists and is unique.
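A small Python sketch of the construction (it solves $mp \equiv b - a \pmod q$ with a modular inverse; `pow(p, -1, q)` assumes Python 3.8+):
``` python
def crt(a, p, b, q):
    """Return the unique 0 <= x < p*q with x % p == a and x % q == b (p, q coprime)."""
    p_inv = pow(p, -1, q)      # inverse of p mod q, exists since gcd(p, q) = 1
    m = ((b - a) * p_inv) % q  # solve m*p = b - a (mod q)
    return (a + m * p) % (p * q)

print(crt(2, 3, 3, 5))  # 8: 8 % 3 == 2 and 8 % 5 == 3
```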
# Notes
$$
x \: = 0 \: mod \: y\\
x \: rem \: y = 0
$$
both mean x is divisible by y.

Choosing Stuff.md Normal file

@@ -0,0 +1,33 @@
#Math #Probability
# Problem
Given $m$ items of one type and $n$ items of another type, what is the probability of choosing $l$ items of type one and $o$ items of type two if you pick $l + o$ items?
# Solution
Total ways to choose the items not considering types:
$$
{m + n} \choose {l + o}
$$
Total ways to choose $l$ items of type one:
$$
m \choose l
$$
Total ways to choose $o$ items of type two:
$$
n \choose o
$$
Multiply the ways to choose both items to get the number of ways to choose $l$ items of type one and $o$ items of type two, divide by total number of combinations:
$$
\frac {{m \choose l} {n \choose o}} {{m + n} \choose {l + o}}
$$
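For example, with $m = 3$, $n = 2$, $l = 2$, and $o = 1$: $\frac {{3 \choose 2} {2 \choose 1}} {{5 \choose 3}} = \frac {3 \cdot 2} {10} = \frac 3 5$.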


@@ -0,0 +1,40 @@
#Math #Probability
# Conditional Probability
Conditional probability, or the probability of $A$ given $B$ is:
$$
P(A|B)
$$
Let's start with the probability $P(A \cap B)$, the probability that both $A$ and $B$ happen. When we write $P(A | B)$, $B$ is given to be true. Therefore, we must divide the probability $P(A \cap B)$ by $P(B)$.
$$
P(A | B) = \frac {P(A \cap B)} {P(B)}
$$
This defines $P(A | B)$ for events. When $P(A | B) = P(A)$, $A$ and $B$ are independent.
# Bayes' Theorem
Let's start with the definitions of conditional probability:
$$
P(A | B) = \frac {P(A \cap B)} {P(B)} \\
P(B | A) = \frac {P(A \cap B)} {P(A)}
$$
Rearrange the second equation to solve for $P(A \cap B)$:
$$
P(A \cap B) = P(A) P(B | A)
$$
Now substitute that equation into the first equation:
$$
P(A | B) = \frac {P(A) P(B | A)} {P(B)}
$$
The above equation is Bayes' Theorem for events.
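As a quick numeric illustration (with made-up numbers): if $P(A) = 0.01$, $P(B | A) = 0.9$, and $P(B) = 0.05$, then $P(A | B) = \frac {0.9 \cdot 0.01} {0.05} = 0.18$.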

Convolutions.md Normal file

@@ -0,0 +1,45 @@
#Math #Probability
# Discrete Case
Let's create a function expressing the probability that the results of two independent discrete random variables, with probability functions $f$ and $g$, have a sum of $s$.
$$
\sum _{x = -\infty}^{\infty} f(x)g(s-x)
$$
Let's unpack this formula. The term inside the sum is the probability of one particular case where the two results add to $s$ (the first result is $x$ and the second is $s - x$). By using a summation, we run through every possible case where this happens.
This operation is called a discrete convolution. Convolutions are notated as
$$
[f * g](s)
$$
# Continuous Case
Extending the previous equation over to a continuous function, we can attain a definition like this:
$$
[f * g](s) = \int _{-\infty}^{\infty} f(x)g(s-x) dx
$$
Naturally, we'd expect this to be the probability density function of the sum of the two variables. This is the same effect as the discrete convolution, except applied to probability densities at an infinitesimally small point.
# Summary
Convolutions return the probability or probability density of the sum of two independent random variables (which one depends on whether the variables are discrete or continuous).
They are defined by:
Discrete:
$$
[f * g](s) = \sum _{x = -\infty}^{\infty} f(x)g(s-x)
$$
Continuous:
$$
[f * g](s) = \int _{-\infty}^{\infty} f(x)g(s-x) dx
$$
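A small numpy sketch of the discrete case, using two fair dice as an example:
``` python
import numpy as np

die = np.full(6, 1 / 6)           # pmf of a fair die on the values 1..6
two_dice = np.convolve(die, die)  # pmf of the sum of two independent dice (values 2..12)
print(two_dice[5])                # P(sum = 7) = 6/36 ≈ 0.1667
```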

Dearrangement.md Normal file

@@ -0,0 +1,45 @@
#Math #Probability
# Problem
How many ways are there to arrange a set of $n$ distinct elements such that no element is in its original position?
# Solution
The number of ways to arrange the set with no restriction on positions is:
$$
n!
$$
Now subtract the arrangements that have a chosen element in its original position:
$$
n! - {n\choose 1}(n - 1)!
$$
However, we subtracted arrangements with two elements in their original position twice:
$$
n! - {n\choose 1}(n - 1)! + {n \choose 2}(n - 2)!
$$
Now, we readded arrangements with three elements in their original position:
$$
n! - {n\choose 1}(n - 1)! + {n \choose 2}(n - 2)! - {n \choose 3}(n - 3)!
$$
This pattern continues by PIE, giving us:
$$
n! - {n\choose 1}(n - 1)! + {n \choose 2}(n - 2)! - {n \choose 3}(n - 3)! ... + (-1)^n{n \choose n}(n - n)!
$$
Since ${n \choose k}(n - k)! = \frac {n!} {k!}$, we can rewrite as:
$$
\frac {n!} {0!} - \frac {n!} {1!} + \frac {n!} {2!} ... + (-1)^n\frac {n!} {n!} \\
= \sum _{k = 0}^n (-1)^k \frac {n!} {k!} \\
= n! \sum _{k = 0}^n \frac {(-1)^k} {k!}
$$
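As $n$ grows, $\sum _{k = 0}^n \frac {(-1)^k} {k!}$ approaches $e^{-1}$, so the number of such arrangements (derangements) is approximately $\frac {n!} e$.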

Derivatives.md Normal file

@@ -0,0 +1,91 @@
#Calculus #Math
# Intuition & Definition
How can an instantaneous rate of change be defined at a point?
Call our function of choice $y$:
Slope of $y$ between $x_1$ and $x_2$:
$$
\frac {y_1 - y_2} {x_1 - x_2}
$$
$$
= \frac {y(x_1) - y(x_2)} {x_1 - x_2}
$$
However, we need $x_1 \neq x_2$ to avoid division by $0$.
## Definitions
Avoid division by $0$ by using a limit in which $x_1 \to x_2$:
$$
\frac {dy} {dx} = \lim_{x_1 \to x_2} \frac {y(x_1) - y(x_2)} {x_1 - x_2}
$$
Changing variables:
$$
\frac {dy} {dx} = \lim_{a \to x} \frac {y(a) - y(x)} {a - x}
$$
Substitute $a = x + \Delta x$, so that $a \to x$ corresponds to $\Delta x \to 0$:
$$
\frac {dy} {dx} = \lim_{\Delta x \to 0} \frac {y(x + \Delta x) - y(x)} {\Delta x}
$$
# Derivative Rules
## Constant Rule
When $y = a$ and $a$ is constant:
$$
\frac {dy} {dx}
$$
$$
= \lim_{\Delta x \to 0} \frac {a - a} {\Delta x}
$$
$$
= \lim_{\Delta x \to 0} \frac {0} {\Delta x}
$$
$$
= 0
$$
## Sum and Difference Rule
$$
\frac {df} {dx} + \frac {dg} {dx}
$$
$$
= \lim_{\Delta x \to 0} \frac {f(x + \Delta x) - f(x)} {\Delta x} + \frac {g(x + \Delta x) - g(x)} {\Delta x}
$$
$$
= \lim_{\Delta x \to 0} \frac {[f(x + \Delta x) + g(x + \Delta x)] - [f(x) + g(x)]}{\Delta x}
$$
$$
= \frac d {dx} (f + g)
$$
## Power Rule
> **Note:** This proof of power rule only extends to $n \in \mathbb{N}$. Power rule can be extended to $n \in \mathbb{Z}$ through the use of the derivative of $\ln$, but this article does not cover such a proof as of now.
$$
\frac {d} {dx} x^n
$$
$$
= \lim_{\Delta x \to 0} \frac {(x + \Delta x)^n - x^n} {\Delta x}
$$
Use a binomial expansion:
$$
= \lim_{\Delta x \to 0} \frac {\sum_{i = 0}^n {n \choose i}x^i{\Delta x}^{n - i} - x^n} {\Delta x}
$$
Take out last term in sum:
$$
= \lim_{\Delta x \to 0} \frac {x^n + \sum_{i = 0}^{n - 1} {n \choose i}x^i{\Delta x}^{n - i} - x^n} {\Delta x}
$$
$$
= \lim_{\Delta x \to 0} \frac {\sum_{i = 0}^{n - 1} {n \choose i}x^i{\Delta x}^{n - i}} {\Delta x}
$$
$$
= \lim_{\Delta x \to 0} \sum_{i = 0}^{n - 1} {n \choose i}x^i{\Delta x}^{n - i - 1}
$$
Bring limit inside sum:
$$
= \sum_{i = 0}^{n - 1} \left[{n \choose i}x^i \lim_{\Delta x \to 0} {\Delta x}^{n - i - 1}\right]
$$
For $i < n - 1$, $\lim _{\Delta x \to 0} {\Delta x}^{n - i - 1} = 0$, so only the case where $i = n - 1$ matters:
$$
= {n \choose {n - 1}} x^{n - 1}
$$
$$
= nx^{n - 1}
$$
>**Therefore:**
>$$ \frac d {dx} x^n = nx^{n - 1} $$


@@ -0,0 +1,34 @@
#Math #Calculus
# Extending the Factorial Function
We know $n!$ has a restricted domain of $n \in \mathbb{N}$, but we want to extend this function to $n \in \mathbb{R}$. To do this, we define two basic properties for the gamma function:
$$
n\Gamma(n) = \Gamma(n + 1) \\
\Gamma(n + 1) = n!, \space n\in \mathbb{N}
$$
# Derivation
We know repeated differentiation can generate a factorial function, so we start by differentiating:
$$
\int _{0}^{\infty} e^{-ax} dx = \frac 1 a
$$
The **Leibniz Integral Rule** allows us to differentiate inside the integral, so by repeated differentiation with respect to $a$ and cancelling out the negative sign we get:
$$
\int _{0}^{\infty} xe^{-ax} dx = \frac 1 {a^2} \\
\int _{0}^{\infty} x^2e^{-ax} dx = \frac 2 {a^3} \\
\int _{0}^{\infty} x^ne^{-ax} dx = \frac {n!} {a^{n + 1}} \\
$$
Plugging in $a = 1$ (and replacing $n$ with $n - 1$) we get:
$$
\Gamma(n) = \int _{0}^{\infty} x^{n - 1} e^{-x} dx
$$
Plugging the definition into the above properties should affirm that this defines the gamma function.


@@ -0,0 +1,55 @@
#Math #Calculus
# Definition
When
$$
\lim _{x \to c} f(x) = L
$$
for every $\epsilon > 0$ there is a value $\delta > 0$ such that
$$
0 < |x - c| < \delta \implies |f(x) - L| < \epsilon
$$
# Proving a Limit
Let's prove:
$$
\lim _{h \to 0} \frac {(x + h)^2 - x^2} h = 2x
$$
Let:
$$
0 < |\frac {(x + h)^2 - x^2} h - 2x| < \epsilon \\
0 < |\frac {x^2 + 2xh + h^2 - x^2} h - 2x| < \epsilon \\
0 < |\frac {2xh + h^2} h - 2x| < \epsilon \\
$$
Simplify the fraction (inside the limit, $h \neq 0$):
$$
0 < |2x + h - 2x| < \epsilon \\
0 < |h| < \epsilon
$$
We have to prove for every $\epsilon$:
$$
0 < |h - 0| < \delta \\
0 < |h| < \delta
$$
These two inequalities are the same, so they are easily satisfied just by setting:
$$
\delta = \epsilon
$$
# Graphical Explanation
[https://www.desmos.com/calculator/tucchymbrq](https://www.desmos.com/calculator/tucchymbrq)


@@ -0,0 +1,57 @@
#Math #Trig
# Euler's Formula
Euler's formula states:
$$
e^{i \theta} = i\sin \theta + \cos \theta
$$
## Proof
$$
\frac d {d \theta} \frac {i \sin \theta + \cos \theta} {e^{i \theta}} \\
= \frac d {d \theta} \left[ e^{-i\theta}(i \sin \theta + \cos \theta) \right] \\
= (e^{-i\theta})(i \sin \theta + \cos \theta)\prime + (e^{-i\theta}) \prime (i \sin \theta + \cos \theta) \\
= (e^{-i\theta})(i \cos \theta - \sin \theta) - i(e^{-i\theta})(i \sin \theta + \cos \theta) \\
= (e^{-i\theta})(i \cos \theta - \sin \theta) - (e^{-i\theta})(i \cos \theta - \sin \theta) \\
= 0
$$
Therefore $\frac {i \sin \theta + \cos \theta} {e^{i \theta}}$ is a constant. Plug in $\theta = 0$, to get $\frac {i \sin \theta + \cos \theta} {e^{i \theta}} = 1$. Multiply both sides by $e^{i\theta}$ to get
$$
e^{i \theta} = i\sin \theta + \cos \theta
$$
## Euler's Identity
Plug $\theta = π$ into Euler's Formula
$$
e^{i \pi} = i\sin \pi + \cos \pi \\
e^{i \pi} = -1
$$
# Trig Functions Redefined
Sine:
$$
e^{i \theta} = i\sin \theta + \cos \theta \\
-e^{-i \theta} = -i\sin -\theta - \cos -\theta \\
-e^{-i \theta} = i\sin \theta - \cos \theta \\
e^{i\theta} - e^{-i\theta} = 2i \sin \theta \\
\sin \theta = \frac {e^{i\theta} - e^{-i\theta}} {2i}
$$
Cosine:
$$
e^{i \theta} = i\sin \theta + \cos \theta \\
e^{-i \theta} = i\sin -\theta + \cos -\theta \\
e^{-i \theta} = -i\sin \theta + \cos \theta \\
e^{i\theta} + e^{-i \theta} = 2\cos \theta \\
\cos \theta = \frac {e^{i\theta} + e^{-i \theta}} 2
$$


@@ -0,0 +1,22 @@
#Math #NT
# Fermat's Little Theorem
If $p$ is a prime integer and $a$ is not divisible by $p$ (the second congruence holds for any integer $a$):
$$
a^{p - 1} \equiv 1 \mod p \\
a^p \equiv a \mod p
$$
# Proof
Let $p$ be a prime integer. Say a necklace has $p$ beads and $a$ possible colors per bead. Except for the necklaces with only one color, each combination of necklace colors corresponds to $p$ distinct rotations, so $p$ divides $a^p - a$. Therefore:
$$
a^p \equiv a \mod p
$$
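For example, with $p = 7$ and $a = 2$: $2^6 = 64 \equiv 1 \mod 7$ and $2^7 = 128 \equiv 2 \mod 7$.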

Fermet Euler Theorem.md Normal file

@@ -0,0 +1,31 @@
#Math #NT
# Theorem
Let $a$ and $m$ be coprime numbers.
$$
a^{\phi(m)} \equiv 1 \mod m
$$
This is a generalization of Fermat's Little Theorem, which is the special case where $m$ is a prime number.
# Proof
Let:
$$
A = \{p_1, p_2, p_3,... p_{\phi(m)} \} \mod m \\
B = \{ap_1, ap_2, ap_3,...ap_{\phi(m)}\} \mod m
$$
Where $p_x$ is the $x$th positive integer less than $m$ that is relatively prime to $m$.
Since $a$ and $p_x$ are coprime to $m$, $ap_x$ is coprime to $m$. Since the $p_x$ are distinct mod $m$ and $a$ is invertible mod $m$, the $ap_x$ are also distinct mod $m$, which makes set $B$ the same as set $A$ when reduced mod $m$.
Since all terms are coprime to $m$:
$$
a^{\phi(m)} \prod _{k = 1}^{\phi(m)} p_k \equiv \prod _{k = 1}^{\phi(m)} p_k \mod m \\
a^{\phi(m)} \equiv 1 \mod m
$$

Fourier Series Proof.md Normal file

@@ -0,0 +1,184 @@
#Math #Calculus
# Starting the Proof Off
The Taylor Series uses $x^n$ as building blocks for a function:
[[Taylor Series Proof]]
However, we can use $\sin (nx)$ and $\cos(nx)$ as well. This will be our starting point to derive the Fourier Series:
$$
f(x) = a_0\cos (0x) + b_0\sin(0x) + a_1\cos (x) + b_1\sin(x) + a_2\cos (2x) + b_2\sin(2x)... \\
f(x) = a_0 + \sum _{n = 1}^\infty (a_n\cos(nx) + b_n\sin(nx))
$$
This will be the basic equation we will use.
# Finding $a_0$
Let's integrate the equation on both sides, bounded by $[-\pi, \pi]$:
$$
\int _{-\pi}^\pi f(x) dx = \int _{-\pi}^\pi a_0 dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi a_n\cos(nx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi b_n\sin(nx) dx
$$
The first integral evaluates to $2\pi a_0$. Since the third integral is an odd function, it evaluates to $0$. The second integral can be expressed as:
$$
a_n \int _{-\pi}^\pi \cos(nx) dx \\
= \frac {a_n} n (\sin(n\pi) - \sin(-n\pi)) \\
= 0
$$
So now we have:
$$
2\pi a_0 = \int _{-\pi}^\pi f(x) dx \\
a_0 = \frac 1 {2\pi} \int _{-\pi}^\pi f(x) dx
$$
# Finding $a_n$
Let's multiply the entire equation by $\cos(mx)$, where $m \in \mathbb{Z}^+$ ($m$ is a positive integer):
$$
f(x)\cos(mx) = a_0\cos(mx) + \sum _{n = 1}^\infty a_n\cos(nx)\cos(mx) + b_n\sin(nx)\cos(mx)
$$
Now integrate on both sides, and bound by $[-\pi, \pi]$:
$$
\int _{-\pi}^\pi f(x)\cos(mx) dx = \int _{-\pi}^\pi a_0\cos(mx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi a_n\cos(nx)\cos(mx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi b_n\sin(nx)\cos(mx) dx
$$
We have three integrals on the right hand side to evaluate:
## First Integral
$$
\int _{-\pi}^\pi a_0 \cos(mx) dx \\
= \frac{a_0} m \sin(m\pi)- \frac{a_0} m \sin(-m\pi)
$$
Since $m\pi$ is always a multiple of $\pi$:
$$
=0
$$
## Second Integral
$$
\int _{-\pi}^\pi a_n\cos(nx)\cos(mx) dx
$$
Using $\cos$ addition formula:
$$
= \frac {a_n} 2 \int _{-\pi}^\pi \cos(nx + mx) + \cos(nx - mx) dx \\
= \frac {a_n} 2 (\int _{-\pi}^\pi \cos(nx + mx) dx + \int _{-\pi}^\pi \cos(nx - mx) dx) \\
= [\frac {a_n} 2 (\frac {\sin(nx + mx)} {n + m} + \frac {\sin(nx - mx)} {n - m})]_{-\pi}^{\pi} \\
$$
Here you will notice that this integral doesn't work for $n = m$. We'll circle back to that later. For $n \neq m$, both $n + m$ and $n - m$ are nonzero integers, so the sines vanish at $x = \pm\pi$ and:
$$
= 0
$$
Now, circling back to the extra case, where $n = m$:
$$
a_m\int _{-\pi}^\pi \cos^2(mx)dx \\
= a_m\int _{-\pi}^\pi \frac {1 + \cos(2mx)} 2 dx \\
= a_m[\frac x 2 + \frac {\sin (2mx)} {4m} ]_{-\pi}^\pi \\
= a_m[(\frac {\pi} 2 + \frac {\sin (2m\pi)} {4m} ) - (\frac {-\pi} 2 + \frac {\sin (-2m\pi)} {4m} )] \\
= a_m\pi
$$
So, the second term in the right hand side evaluates to $a_m\pi$.
## Third Integral
$$
\int _{-\pi}^{\pi} \sin(nx)\cos(mx) dx \\
= \frac 1 2 \int _{-\pi}^{\pi} \sin(nx + mx) dx + \frac 1 2 \int _{-\pi}^\pi \sin(nx - mx) dx \\
= [-\frac 1 2(\frac {\cos(nx + mx)} {n + m} + \frac {\cos(nx - mx)} {n - m})]_{-\pi}^\pi \\
$$
Remember that $\cos$ is even, so the antiderivative takes the same values at $\pi$ and $-\pi$:
$$
= 0
$$
## Putting it Together
Now we have:
$$
\int _{-\pi}^\pi f(x)\cos(mx) dx = a_m\pi \\
\frac 1 \pi \int _{-\pi}^\pi f(x)\cos(mx) dx = a_m
$$
Note in this case $m$ and $n$ both represent any positive integer, and are therefore interchangeable:
$$
a_n = \frac 1 \pi \int _{-\pi}^\pi f(x)\cos(nx) dx \\
$$
# Finding $b_n$
Multiply the equation by $\sin mx$, where $m \in \mathbb{Z}^+$,integrate, and bound between $[-\pi, \pi]$:
$$
\int _{-\pi}^\pi f(x)\sin(mx) dx = \int _{-\pi}^\pi a_0\sin(mx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi a_n\cos(nx)\sin(mx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi b_n\sin(nx)\sin(mx) dx
$$
The first two terms are already covered, so lets focus on the final term.
## Last Integral
$$
\int _{-\pi}^\pi b_n\sin(nx)\sin(mx) dx \\
= \frac {b_n} 2 \int _{-\pi}^\pi \cos(nx - mx) - \cos(nx + mx) dx \\
= \frac {b_n} 2 [\frac {\sin(nx - mx)} {n - m} - \frac {\sin(nx + mx)} {n + m}]_{-\pi}^\pi
$$
Again, there is a special case where $n = m$. Remember $\sin \pi = 0$, so:
$$
= 0
$$
With the special case:
$$
b_m\int _{-\pi}^\pi \sin^2(mx) dx \\
= b_m\int _{-\pi}^\pi \frac {-\cos(2mx) + 1} 2 dx \\
= b_m[\frac 1 2 (x - \frac {\sin(2mx)} {2m})]_{-\pi}^\pi \\
= b_m\pi
$$
## Putting it Together
$$
b_m\pi = \int _{-\pi}^\pi f(x)\sin(mx) dx \\
b_m = \frac 1 \pi \int _{-\pi}^\pi f(x)\sin(mx) dx \\
b_n = \frac 1 \pi \int _{-\pi}^\pi f(x)\sin(nx) dx
$$
# Fourier Series
Using the above, lets express $f(x)$ as a Fourier Series:
$$
f(x) = \frac 1 {2\pi} \int _{-\pi}^\pi f(x) dx + \sum _{n = 1}^\infty \frac {\cos (nx)} \pi \int _{-\pi}^\pi f(x)\cos(nx) dx + \sum _{n = 1}^\infty \frac {\sin (nx)} \pi \int _{-\pi}^\pi f(x)\sin(nx) dx
$$
Note that this representation only works when the function has period $2\pi$ (repeats over $[0, 2\pi]$). Using a similar proof, we can get:
$$
f(x) = \frac 1 P \int _{-\frac P 2}^{\frac P 2} f(x) dx + \sum _{n = 1}^\infty \frac {2 \cos (\frac {2\pi nx} P)} P \int _{-\frac P 2}^{\frac P 2} f(x)\cos(\frac {2\pi nx} P) dx + \sum _{n = 1}^\infty \frac {2 \sin (\frac {2\pi nx} P)} P \int _{-\frac P 2}^{\frac P 2} f(x)\sin(\frac {2\pi nx} P) dx
$$


@@ -0,0 +1,26 @@
#Math #Calculus
# Proof
Let's express a Fourier Series as:
$$
v = \frac {2\pi nx} P \\
f(x) = \sum _{n = 0}^\infty A_n \cos v + B_n \sin v
$$
We can deduce:
$$
f(x) = \sum _{n = 0}^{\infty} \frac {A_n e^{iv} + A_n e^{-iv} - iB_n e^{iv} + iB_n e^{-iv}} 2 \\
= \sum _{n = 0}^{\infty} 0.5(A_n + iB_n)e^{-iv} + 0.5(A_n - iB_n)e^{iv} \\
= \sum _{n = 0}^{\infty} \frac {e^{-iv}} P \int _{-P/2}^{P/2} f(x) (\cos v + i\sin v) dx + \frac {e^{iv}} P \int _{-P/2}^{P/2} f(x) (\cos -v + i\sin -v) dx \\
= \sum _{n = 0}^{\infty} \frac {e^{-iv}} P \int _{-P/2}^{P/2} f(x)e^{iv} dx + \frac {e^{iv}} P \int _{-P/2}^{P/2} f(x)e^{-iv} dx \\
= \sum _{n = -\infty}^{\infty} \frac {e^{iv}} P \int _{-P/2}^{P/2} f(x)e^{-iv} dx
$$
## Definitions
Definitions of $A_n$ and $B_n$:
[[Fourier Series Proof]]

Hockey Stick Identity.md Normal file

@@ -0,0 +1,27 @@
#Math #Probability
# Statement
For $n \geq r$, $n, r \in \mathbb{N}$:
$$
\sum _{i = r}^n {i \choose r} = {n + 1 \choose r + 1}
$$
# Proof
Let us have a base case $n = r$:
$$
{r \choose r} = {r + 1 \choose r + 1} = 1
$$
Now suppose $\sum _{i = r}^n {i \choose r} = {n + 1 \choose r + 1}$ for a certain $n$:
$$
\sum _{i = r}^n {i \choose r} + {n + 1 \choose r} \\
= {n + 1 \choose r + 1} + {n + 1 \choose r} \\
= {n + 2 \choose r + 1}
$$
Since the base case $n = r$ holds, and truth for $n$ implies truth for $n + 1$ (the last step uses Pascal's rule ${n + 1 \choose r + 1} + {n + 1 \choose r} = {n + 2 \choose r + 1}$), the statement is true for all $n \geq r$.
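For example, with $r = 2$ and $n = 4$: ${2 \choose 2} + {3 \choose 2} + {4 \choose 2} = 1 + 3 + 6 = 10 = {5 \choose 3}$.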

Hyperbolic Trig.md Normal file

@@ -0,0 +1,46 @@
#Math #Trig
# Definition
## Definition in terms of $e$
We define $\cosh$ and $\sinh$ to be the even and odd parts of $e^x$ respectively:
$$
\cosh x = \frac {e^x + e^{-x}} 2 \\
\sinh x = \frac {e^x - e^{-x}} 2
$$
Note this gives us:
$$
\sinh x + \cosh x = e^x
$$
similar to Euler's Formula for circular trig functions.
## Definition in terms of a hyperbola
[https://www.desmos.com/calculator/ixmjpfmukk](https://www.desmos.com/calculator/ixmjpfmukk)
Know that the geometric definition of $\cosh$ is that $B = \cosh 2b$, where $b$ is the blue area. To find $b$, we can use:
$$
b = \frac {B\sqrt{B^2 - 1}} 2 -\int _1^B \sqrt {x^2 - 1} dx \\
= \frac {B\sqrt{B^2 - 1}} 2 - \frac {B\sqrt {B^2 - 1} - \ln(B + \sqrt {B^2 - 1})} 2\\
= \frac {\ln(B + \sqrt {B^2 - 1})} 2
$$
Now let $a = 2b = \ln(B + \sqrt {B^2 - 1})$. Now we can solve for $B$ in terms of $a$ to define $\cosh$:
$$
a = \ln(B + \sqrt {B^2 - 1}) \\
B = \frac {e^a + e^{-a}} 2 \\
\cosh x = \frac {e^x + e^{-x}} 2
$$
Now, using the fact that the point $(\cosh x, \sinh x)$ lies on the unit hyperbola (can be proved algebraically), we get:
$$
\sinh x = \frac {e^x - e^{-x}} 2
$$

Laplace Transforms.md Normal file

@@ -0,0 +1,22 @@
#Calculus #Math
# Background - Analytic Continuation
$$
\int _0^\infty e^{-st} dt = \frac 1 {s}
$$
The integral on the left only converges for $\text{Re}(s) > 0$; the expression $\frac 1 s$ is used as an analytic continuation of it to the rest of the plane. For the Laplace Transform to work, most of the integrals used must be extended to analytic continuations in this way.
# Definition - Laplace Transform
$$
F(s) = \int _0^\infty f(t) e^{-st} dt
$$
# Intuition - The $e^{sx}$ Finding Machine
Take $f(t)$ as $\sum c_n e^{a_n t}$. Plugging into the Laplace Transform:
$$
F(s) = \int _0^\infty \sum c_ne^{(a_n - s)t} dt
$$
$$
= \sum c_n \int _0^\infty e^{-(s - a_n)t} dt
$$
$$
= \sum \frac {c_n} {s - a_n}
$$
Therefore the Laplace Transform of such a function reveals both $c_n$ and $a_n$: the poles of $F(s)$ reveal the exponents $a_n$, while the "magnitude" (residue) at each pole reveals the coefficient $c_n$ of the corresponding exponential term.
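For example, if $f(t) = e^{2t} + 3e^{-t}$, then $F(s) = \frac 1 {s - 2} + \frac 3 {s + 1}$: the poles at $s = 2$ and $s = -1$ reveal the exponents, and the residues $1$ and $3$ reveal the coefficients.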

Leibniz Integral Rule.md Normal file

@@ -0,0 +1,44 @@
#Math #Calculus
# Theorem
Let $f(x, t)$ be such that both $f(x, t)$ and its partial derivative $f_x (x, t)$ be continuous in $t$ and $x$ in a region of the $xt$-plane, such that $a(x) \leq t \leq b(x)$, $x_0 \leq x \leq x_1$. Also let $a(x)$ and $b(x)$ be continuous and have continuous derivatives for $x_0 \leq x \leq x_1$. Then, for $x_0 \leq x \leq x_1$:
$$
\frac d {dx} (\int _{a(x)}^{b(x)} f(x, t) dt) = f(x, b(x)) \cdot \frac d {dx} b(x) - f(x, a(x)) \cdot \frac d {dx} a(x) + \int _{a(x)}^{b(x)} \frac \partial {\partial x} f(x, t) dt
$$
Notably, this also means:
$$
\frac d {dx} (\int _{c_1}^{c_2} f(x) dx) = \int _{c_1}^{c_2} \frac d {dx} f(x) dx
$$
# Proof
Let $\varphi(x) = \int _a^b f(x, t) dt$ where $a$ and $b$ are functions of $x$. Define $\Delta a = a(x + \Delta x) - a(x)$ and $\Delta b = b(x + \Delta x) - b(x)$. Then,
$$
\Delta \varphi = \varphi(x + \Delta x)- \varphi(x) \\
= \int _{a + \Delta a}^{b + \Delta b} f(x + \Delta x, t) dt - \int _a^b f(x, t) dt \\
$$
Now expand the first integral by integrating over 3 separate ranges:
$$
\int _{a + \Delta a}^a f(x + \Delta x, t) dt + \int _a^b f(x + \Delta x, t) dt + \int _b^{b + \Delta b} f(x + \Delta x, t) dt - \int _a^b f(x, t) dt \\
= -\int _a^{a + \Delta a} f(x + \Delta x, t) dt + \int _a^b [f(x + \Delta x, t) - f(x, t)]dt + \int _b^{b + \Delta b} f(x + \Delta x, t) dt
$$
From mean value theorem we know $\int _a^b f(t) dt = (b - a)f(\xi)$, which applies to the first and last integrals:
$$
\Delta \varphi = -\Delta a f(x + \Delta x, \xi_1) + \int _a^b [f(x + \Delta x, t) - f(x, t)]dt + \Delta b f(x + \Delta x, \xi_2) \\
\frac {\Delta \varphi} {\Delta x} = -\frac {\Delta a} {\Delta x} f(x + \Delta x, \xi_1) + \int _a^b \frac {f(x + \Delta x, t) - f(x, t)} {\Delta x} dt + \frac {\Delta b} {\Delta x} f(x + \Delta x, \xi_2) \\
$$
Now as we set $\Delta x \to 0$, we can express many of the terms as definitions of derivatives (note we pass the limit sign through the integral via bounded convergence theorem). Note now that $\xi_1 \to a$ and $\xi_2 \to b$, which gives us:
$$
\frac d {dx} \int _a^b f(x, t) dt = -\frac {da} {dx} f(x, a) + \int _a^b \frac {\partial} {\partial x} f(x, t) dt + \frac {db} {dx} f(x, b) \\
$$

Limits.md Normal file

@@ -0,0 +1,33 @@
#Math #Calculus
A limit describes a value that an expression approaches as a variable gets arbitrarily close to another number, without necessarily reaching it. It is notated by
$$
\lim _{x \to y}
$$
where x approaches y.
You can substitute numbers in for limit variables, such as
$$
\lim _{x \to 1} x + 1 = 2
$$
Limits can work around certain constraints. For example,
$$
\frac {1-x} {1-x}
$$
would not be defined at $x=1$, however
$$
\lim _{x \to 1} \frac {1-x} {1-x} = 1
$$
Limits can also approach infinity, to use “infinity” in certain situations.
$$
\lim _{x \to \infty} \frac 1 x = 0
$$

Matrices.md Normal file

@@ -0,0 +1,45 @@
#Math #Algebra
A matrix is an $n$ by $m$ array of values. A $4 \times 3$ matrix can be notated by:
$$
\begin{bmatrix}a_1 & a_2 & a_3 \cr b_1 & b_2 & b_3 \cr c_1 & c_2 & c_3 \cr d_1 & d_2 & d_3 \end{bmatrix}
$$
To get a value from matrix $a$ in row $r$ and column $c$, use:
$$
a_{r, c}
$$
# Addition
With two matrices of the same order, add corresponding elements:
$$
\begin{bmatrix} a_1 & b_1 \cr c_1 & d_1 \end{bmatrix} + \begin{bmatrix} a_2 & b_2 \cr c_2 & d_2 \end{bmatrix} = \begin{bmatrix} a_1 + a_2 & b_1 + b_2 \cr c_1 + c_2 & d_1 + d_2 \end{bmatrix}
$$
# Subtraction
With two matrices of the same order, subtract corresponding elements:
$$
\begin{bmatrix} a_1 & b_1 \cr c_1 & d_1 \end{bmatrix} - \begin{bmatrix} a_2 & b_2 \cr c_2 & d_2 \end{bmatrix} = \begin{bmatrix} a_1 - a_2 & b_1 - b_2 \cr c_1 - c_2 & d_1 - d_2 \end{bmatrix}
$$
# Scalar Multiplication
When multiplying a matrix by a scalar, multiply each element by said scalar:
$$
s\begin{bmatrix} a & b \cr c & d \end{bmatrix} = \begin{bmatrix} sa & sb \cr sc & sd \end{bmatrix}
$$
# Matrix Multiplication
Let $a$ be an $i$ by $j$ matrix and $b$ be a $m$ by $n$ matrix. If $j = m$, $ab$ is defined.
$$
ab_{c, d} = \sum _{k = 1}^{j} a_{c, k}b_{k, d}
$$
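A quick numpy illustration of these operations (a minimal sketch):
``` python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

print(a + b)  # element-wise addition
print(a - b)  # element-wise subtraction
print(3 * a)  # scalar multiplication
print(a @ b)  # matrix multiplication: (a @ b)[c, d] = sum over k of a[c, k] * b[k, d]
```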


@@ -0,0 +1,44 @@
#Math #Probability
# Observing Pascal's Triangle
| n/k | 0 | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | | | | | |
| 1 | 1 | 1 | | | | |
| 2 | 1 | 2 | 1 | | | |
| 3 | 1 | 3 | 3 | 1 | | |
| 4 | 1 | 4 | 6 | 4 | 1 | |
| 5 | 1 | 5 | 10 | 10 | 5 | 1 |
As you can see, Pascal's Triangle generates:
$$
{n \choose k}
$$
or
$$
\frac{n!}{k!(n-k)!}
$$
But how does this work?
First, we can manually verify the top two rows of Pascal's Triangle by plugging the values into the binomial coefficient formula.
Afterward, we can use the defining property of Pascal's Triangle, taking Pascal's Triangle as a function $P$:
$$
P(n + 1, k) = P(n, k) + P(n, k-1)
$$
By proving this property holds for the binomial coefficient formula, we can deduce that Pascal's Triangle generates binomial coefficients.
# The Proof
$$
\frac{n!}{k!(n-k)!}+\frac{n!}{(k-1)!(n-k+1)!}\\=\frac{n!(n-k+1)}{k!(n-k)!(n-k+1)}+\frac{n!k}{(k-1)!(n-k+1)!k}\\=\frac{n!(n-k+1)}{k!(n-k+1)!}+\frac{n!k}{k!(n-k+1)!}\\=\frac{n!(n+k+1-k)}{k!(n-k+1)!}\\=\frac{n!(n+1)}{k!(n-k+1)!}\\=\frac{(n+1)!}{k!(n-k+1)!}
$$
From this, we have proven that we can generate binomial coefficients using Pascal's Triangle.

Poisson Distribution.md Normal file

@@ -0,0 +1,97 @@
#Math #Probability
# The Poisson Distribution
The Poisson Distribution gives the probability that an event occurs some number of times in an interval of time, given the mean number of times the event happens in such an interval.
# Binomial Distribution to Poisson Distribution
Binomial Distribution
$$
\frac {n!} {k!(n-k)!} p^k (1-p)^{n-k}
$$
Binomial Distribution with infinite trials
$$
\lim _{n\to\infty} \frac {n!} {k!(n-k)!} p^k (1-p)^{n-k}
$$
Let $a$ be $np$, the average number of successes in $n$ trials. This gives us the binomial distribution in another form.
$$
\lim _{n\to\infty} \frac {n!} {k!(n-k)!} (\frac {a} {n})^k (1-\frac {a} {n})^{n-k}
$$
$$
\lim _{n\to\infty} \frac {n!} {k!(n-k)!} (\frac {a^k} {n^k}) (1-\frac {a} {n})^n(1-\frac {a} {n})^{-k}
$$
$$
\frac {a^k} {k!} \lim _{n\to\infty} \frac {n!} {n^k(n-k)!} (1-\frac {a} {n})^n(1-\frac {a} {n})^{-k}
$$
Now we have three limits to evaluate
# Evaluating the Limits
## First Limit
$$
\lim _{n \to\infty} \frac {n!} {n^k(n-k)!}
$$
$$
\lim _{n \to\infty} \frac {n(n-1)(n-2)...(n-k)(n-k-1)...(1)} {n^k(n-k)(n-k-1)...(1)}
$$
$$
\lim _{n\to\infty} \frac {n(n-1)...(n-k+1)} {n^k}
$$
$$
\lim _{n\to\infty} (\frac {n} {n})(\frac {n-1} {n})...(\frac {n-k+1} {n})
$$
As n goes to infinity, all the terms tend to 1. Therefore, the limit tends to 1.
## Second Limit
$$
\lim _{n\to\infty} (1-\frac {a} {n})^n
$$
Let $u$ be $-\frac n a$ (note this tends to negative infinity)
$$
\lim _{n\to\infty}(1+\frac {1} {u})^{-au}
$$
Use definition of e
$$
e^{-a}
$$
## Third Limit
$$
\lim _{n\to\infty}(1-\frac{a} {n})^{-k}
$$
a/n tends to 0
$$
1^k
$$
Therefore this limit tends to 1.
# Putting it together
$$
\frac {e^{-a}a^{k}}{k!}
$$
is the formula for the probability of an event happening k times in an interval of time, where a is the mean number of times of the event happening in the interval of time the event ran in. This is the formula for the Poisson Distribution.
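A quick numerical check that the binomial distribution with many trials approaches this formula (a minimal Python sketch; the numbers are arbitrary):
``` python
from math import comb, exp, factorial

a, k = 3.0, 5      # mean number of events and the count we ask about
n = 10_000         # many Bernoulli trials
p = a / n

binomial = comb(n, k) * p**k * (1 - p)**(n - k)
poisson = exp(-a) * a**k / factorial(k)
print(binomial, poisson)  # both ≈ 0.1008
```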


@@ -0,0 +1,37 @@
#Math #NT #Probability
# Problem
Calculate:
$$
P(x, y \in \mathbb{N}: gcd(x, y) = 1)
$$
# Solution
Each number has a $\frac 1 p$ chance to be divisible by prime $p$, so the probability that two numbers do not share prime factor $p$ is
$$
1 - p^{-2}
$$
Therefore, the probability two numbers are coprime is:
$$
\prod _{p \in \mathbb{P}} 1 - p^{-2}
$$
Since $1 - x = (\frac 1 {1 - x})^{-1} = (\sum _{n = 0}^{\infty} x^n)^{-1}$, we can express the above as:
$$
(\prod _{p \in \mathbb{P}} \sum _{n = 0}^{\infty} p^{-2n})^{-1}
$$
We can choose any $n$ for $p^{2n}$ for each prime $p$, so by the Unique Factorization Theorem (any natural number can be prime factored one and only one way), we get:
$$
(\sum _{n = 1}^{\infty} n^{-2})^{-1} \\
= (\frac {\pi^2} 6)^{-1} \\
= \frac 6 {\pi^2}
$$

Rational Root Theorem.md Normal file

@@ -0,0 +1,49 @@
#Math #Algebra
# Proof
Let polynomial
$$
P(x) = \sum _{i = 0}^n c_i x^i
$$
where $c_i \in \mathbb{Z}$ (all values of $c$ are integers).
Now let $P(\frac p q) = 0$, where $p$ and $q$ are coprime integers (let a fraction $\frac p q$ be in simplest form and be a root of $P$).
$$
\sum _{i = 0}^n c_i (\frac p q)^i = 0
$$
Multiplying by $q^n$:
$$
\sum _{i = 0}^n c_i p^i q^{n - i} = 0
$$
Now subtract $c_0 q^n$ from both sides and factor $p$ out to get:
$$
p\sum _{i = 1}^n c_i p^{i - 1} q^{n - i} = -c_0 q^n
$$
Now $p$ must divide $-c_0q^n$. However, we know $p$ cannot divide $q^n$ (since $\frac p q$ is in simplest form / $p$ and $q$ are coprime), so $p$ must divide $c_0$.
Doing the same thing as above but with the $c_n$ term and $q$:
$$
q\sum _{i = 0}^{n - 1} c_i p^i q^{n - i - 1} = -c_n p^n
$$
By the above logic, $q$ must divide $c_n$.
## Conclusion
For all rational roots in simplest form ($\frac p q$ where $p$ and $q$ are coprime integers), $p$ must be a factor of the constant coefficient $c_0$ while $q$ must be a factor of the leading coefficient $c_n$.
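For example, for $P(x) = 2x^2 + 3x - 2$, any rational root $\frac p q$ in simplest form must have $p$ dividing $-2$ and $q$ dividing $2$, giving candidates $\pm 1, \pm 2, \pm \frac 1 2$; the actual roots are $\frac 1 2$ and $-2$.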
## Notes
For the curious, coprime integers $p$ and $q$ mean that $\gcd(p, q) = 1$.
If future me or someone else is wondering about the excess definitions, this was made for a friend.

Skip Gram.md Normal file

@@ -0,0 +1,263 @@
#Coding
# Abstract
> \"No one is going to implement word2vec from scratch\" or sm 🤓
> commentary like that idk
This notebook provides a brief explanation and implementation of a Skip
Gram model, one of the two types of models word2vec refers to.
# Intuition
## Problem
Given a corpus C, map all tokens to a vector such that words with
similar semantics (similar probability of appearing within a context)
are close to each other.
## Idea
**The idea of a skip gram model proceeds from these two observations:**
1. Similar words should appear in similar contexts
2. Similar words should appear together
The intuition behind the Skip Gram model is to map a target token to all
the words appearing in a context window around it.
> The MIMS major **Quentin** is a saber fencer.
In this case the target token **Quentin** should map to all the other
tokens in the window. As such the target token should have similar
mappings to words such as MIMS, saber, and fencer.
Skip Gram treats each vector representation of a token as a set of
weights, and uses a linear-linear-softmax model to optimize them. At the
end, the first set of weights are a list of $n$ vectors that map a token
to a prediction of output tokens - solving the initial mapping problem.
# Code & Detailed Implementation
## Preprocessing
Tokenize all the words, and build training pairs using words in a
context window:
``` python
import numpy as np
class Preproccess:
@staticmethod
def tokenize(text):
"""Returns a list of lowercase tokens"""
return "".join([t for t in text.lower().replace("\n", " ") if t.isalpha() or t == " "]).split(" ")
@staticmethod
def build_vocab(tokens, min_count=1):
"""Create an id to word and a word to id mapping"""
token_counts = {}
for token in tokens:
if token not in token_counts:
token_counts[token] = 0
token_counts[token] += 1
sorted_tokens = sorted(token_counts.items(), key=lambda t:t[1], reverse=True) # Sort tokens by frequency
vocab = {}
id_to_word = [0] * len(sorted_tokens)
for i in range(len(sorted_tokens)):
token, count = sorted_tokens[i]
if count < min_count:
break
id_to_word[i] = token
vocab[token] = i
return vocab, id_to_word
@staticmethod
def build_pairs(tokens, vocab, window_size=5):
"""Generate training pairs"""
pairs = []
token_len = len(tokens)
for center in range(token_len):
tokens_before = tokens[max(0, center-window_size):center]
tokens_after = tokens[(center + 1):min(token_len, center + 1 + window_size)]
context_tokens = tokens_before + tokens_after
for context in context_tokens:
if tokens[center] in vocab and context in vocab:
pairs.append((tokens[center], context))
return pairs
@staticmethod
def build_neg_sample(word, context, vocab, samples=5):
"""Build negative samples"""
neg_samples = []
neg_words = [vocab[w] for w in vocab if (w != word) and (w != context)]
        neg_samples = np.random.choice(neg_words, size=samples, replace=False)
        return neg_samples
```
## Build Model
- 3 layers used as an optimizer:
- $L_1 = XW_1$
- $S = W_2 L_1$
- $P = \text{softmax(S)}$
- Loss function: $-\sum \log(P_{\text{context}} | P_{\text{target}})$
- Negative sampling used to speed up training, compare and update
against \~20 negative vocab terms instead of updating all weights
``` python
class Word2Vec:
def __init__(self, vocab_size, embedding_dim=100):
"""Initialize weights"""
self.vocab_size = vocab_size
self.embedding_dim = embedding_dim
self.W1 = np.random.normal(0, 0.1, (vocab_size, embedding_dim)) # First layer - word encoding
        self.W2 = np.random.normal(0, 0.1, (embedding_dim, vocab_size)) # Second layer - context encoding (referenced below as W2)
def sigmoid(self, x):
"""Numerically stable sigmoid"""
x = np.clip(x, -500, 500)
return 1 / (1 + np.exp(-x))
def cross_entropy_loss(self, probability):
"""Cross entropy loss function"""
return -np.log(probability + 1e-10) # 1e-10 added for numerical stability
def neg_sample_train(self, center_token, context_token, negative_tokens, learning_rate=0.01):
"""Negative sampling training for a single training pair"""
total_loss = 0
total_W1_gradient = 0
# Forward prop for positive case
center_embedding = self.W1[center_token, :] # L₁ = XW₁
context_vector = self.W2[:, context_token]
score = np.dot(center_embedding, context_vector) #L₂ = L₁W₂, but only for the context token vector
sigmoid_score = self.sigmoid(score)
loss = self.cross_entropy_loss(sigmoid_score)
total_loss += loss
# Backward prop for positive case
score_gradient = 1 - sigmoid_score # ∂L/∂S
W2_gradient = center_embedding * score_gradient # ∂L/∂W₂ = ∂L/∂S * ∂S/∂W₂ = XW₁ * ∂L/∂S
W1_gradient = context_vector * score_gradient # ∂L/∂W₁ = ∂L/∂S * ∂S/∂W₁ = W₂ * ∂L/∂S
# Update weights
        self.W2[:, context_token] += learning_rate * W2_gradient  # W2_gradient is the ascent direction for the positive pair, matching the W1 update below
total_W1_gradient += learning_rate * W1_gradient
for neg_token in negative_tokens:
# Forward prop for negative case
neg_vector = self.W2[:, neg_token]
neg_score = np.dot(center_embedding, neg_vector)
neg_sigmoid_score = self.sigmoid(neg_score)
neg_loss = -np.log(1 - neg_sigmoid_score)
total_loss += neg_loss
# Backward prop for negative case
            neg_score_gradient = neg_sigmoid_score  # ∂(-log(1 - σ))/∂S for the negative sample
            neg_W2_gradient = center_embedding * neg_score_gradient
            neg_W1_gradient = neg_vector * neg_score_gradient
# Update weights
self.W2[:, neg_token] -= learning_rate * neg_W2_gradient
total_W1_gradient -= learning_rate * neg_W1_gradient
# Update W1
total_W1_gradient = np.clip(total_W1_gradient, -1, 1)
self.W1[center_token, :] += total_W1_gradient
return total_loss
def find_similar(self, token):
"""Use cos similarity to find similar words"""
word_vec = self.W1[token, :]
similar = []
for i in range(self.vocab_size):
if i != token:
other_vec = self.W1[i, :]
norm_word = np.linalg.norm(word_vec)
norm_other = np.linalg.norm(other_vec)
if norm_word > 0 and norm_other > 0:
cosine_sim = np.dot(word_vec, other_vec) / (norm_word * norm_other)
else:
cosine_sim = 0
similar.append((cosine_sim, i))
similar.sort(key=lambda x:x[0], reverse=True)
return [word[1] for word in similar]
```
## Run Model
``` python
def epoch(model, pairs, vocab):
loss = 0
pair_len = len(pairs)
done = 0
for word, context in pairs:
neg_samples = Preproccess.build_neg_sample(word, context, vocab, samples=5)
        loss += model.neg_sample_train(vocab[word], vocab[context], neg_samples)  # convert tokens to ids before indexing the weight matrices
done += 1
if ((100 * done) / pair_len) // 1 > ((100 * done - 100) / pair_len) // 1:
print("_", end="")
return loss
with open("corpus.txt") as corpus_file:
CORPUS = corpus_file.read()
EPOCHS = 100
tokens = Preproccess.tokenize(CORPUS)
vocab, id_to_token = Preproccess.build_vocab(tokens, min_count=3)
print("~VOCAB LEN~:", len(vocab))
pairs = Preproccess.build_pairs(tokens, vocab, window_size=5)
model = Word2Vec(len(id_to_token), embedding_dim=100)
print("~STARTING TRAINING~")
for i in range(EPOCHS):
print(f"Epoch {i}: {epoch(model, pairs, vocab) / len(id_to_token)}")
print("~FINISHED TRAINING~")
```
# Notes (Pedantic Commentary Defense :P)
1. I use the term \"similar\" and \"related\" in reference to words,
which implies some sort of meaning is encoded. However in practice
word2vec is just looking for words with high probabilities of being
in similar contexts, which happens to correlate to \"meaning\"
decently well.
2. CBOW shares a very similar intuition to Skip Gram, the only
difference is which way you map a target token to context tokens.
3. Of course, a good deal of mathematical pain can be shaved off this
exercise by using Tensorflow (here is a
[Colab](https://colab.research.google.com/github/tensorflow/text/blob/master/docs/tutorials/word2vec.ipynb#scrollTo=iLKwNAczHsKg)
from Tensorflow that does it) - but this is done from scratch so the
inner workings of word2vec can be more easily seen.
4. Results are (very) subpar with a small corpus size, and this isn\'t
optimized for GPUs sooo\... at least the error goes down!
# Sources
1. <https://en.wikipedia.org/wiki/Word2vec>
2. <https://arxiv.org/abs/1301.3781> (worth a read - not a long paper
and def on the less math intensive side of things)
3. <https://ahammadnafiz.github.io/posts/Word2Vec-From-Scratch-A-Complete-Mathematical-and-Implementation-Guide/#implementation>

Taylor Series Proof.md Normal file

@@ -0,0 +1,61 @@
#Math #Calculus
Represent function using power series:
$$
f(x) = \sum _{n=0}^{\infty} c_n (x-a)^n
$$
Find $c_0$
$$
c_0=f(a)
$$
Take derivative of function
$$
\frac d {dx} f(x) = \sum _{n=0}^\infty c_{n+1} (n+1)(x-a)^n
$$
Find $c_1$
$$
c_1=\frac {d} {dx} f(a)
$$
Take second derivative of function
$$
\frac {d^2} {d^2x} f(x) = \sum _{n=0}^\infty c_{n+2} (n+1)(n+2)(x-a)^n
$$
Find $c_2$
$$
c_2=\frac {\frac {d^2} {d^2x} f(a)} {2}
$$
Take third derivative of function
$$
\frac {d^3} {dx^3} f(x) = \sum _{n=0}^\infty c_{n+3} (n+1)(n+2)(n+3)(x-a)^n
$$
Find $c_3$
$$
c_3=\frac {\frac {d^3} {dx^3} f(a)} {6}
$$
Create general formula for $n$th element of $c$
$$
c_n = \frac {\frac {d^n} {dx^n}f(a)} {n!}
$$
Create general formula for function as polynomial
$$
f(x)=\sum _{n=0}^\infty \frac {\frac {d^n} {dx^n}f(a)} {n!} (x-a)^n
$$
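As a quick check, take $f(x) = e^x$ and $a = 0$: every derivative of $e^x$ is $e^x$, so $\frac {d^n} {dx^n} f(0) = 1$ for all $n$, and the formula gives the familiar series:
$$
e^x = \sum _{n=0}^\infty \frac {x^n} {n!} = 1 + x + \frac {x^2} {2} + \frac {x^3} {6} + \dots
$$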

45
The Basel Problem.md Normal file
View File

@@ -0,0 +1,45 @@
#Math #NT
# Basel Problem Solution
## Base Sum
$$
\frac {\pi^2} 4 \\
= \frac {\pi^2} 4 \csc^2 (\frac \pi 2) \\
= \frac {\pi^2} {4^2} (\csc^2 (\frac \pi 4) + \csc^2 (\frac \pi 4 + \frac \pi 2))
$$
Each step uses the identity $\csc^2 \theta = \frac 1 4 (\csc^2 \frac \theta 2 + \csc^2 (\frac \theta 2 + \frac \pi 2))$. Applying it $a$ times in total (the equation above shows $a = 1$) gives:
$$
= \frac {\pi^2} {4^{a + 1}}\sum _{n = 1}^{2^{a}} \csc^2 \left( \frac {(2n - 1)\pi} {2^{a+1}} \right) \\
= \sum _{n = 1}^{2^{a}} \frac {\pi^2} {4^{a + 1}} \csc^2 \left( \frac {(2n - 1)\pi} {2^{a+1}} \right) \\
= \sum _{n = 1}^{2^{a}} \left( \frac {2^{a + 1}} \pi \sin \left( \frac {(2n - 1)\pi} {2^{a+1}} \right) \right)^{-2}
$$
As $a$ approaches $\infty$, each term tends to $(2n - 1)^{-2}$ (since $\frac {2^{a+1}} \pi \sin \frac {(2n - 1)\pi} {2^{a+1}} \to 2n - 1$ for fixed $n$), and since the angles for $n$ and $2^a + 1 - n$ sum to $\pi$, the second half of the sum mirrors the first, contributing a factor of $2$:
$$
= 2\sum _{n=1}^{\infty} (2n - 1)^{-2}
$$
Therefore:
$$
\sum _{n = 1}^{\infty} (2n - 1)^{-2} = \frac {\pi^2} {8}
$$
## Manipulating this Sum
$$
\sum _{n = 1}^{\infty} (2n)^{-2} = \frac 1 4 \sum _{n = 1}^{\infty} n^{-2} \\
\sum _{n = 1}^{\infty} (2n - 1)^{-2} = \sum _{n = 1}^{\infty} n^{-2} - \sum _{n = 1}^{\infty} (2n)^{-2} = \frac 3 4 \sum _{n = 1}^{\infty} n^{-2} \\
\frac {\pi ^2} 8 = \frac 3 4 \sum _{n = 1}^{\infty} n^{-2} \\
\frac {\pi ^2} 6 = \sum _{n = 1}^{\infty} n^{-2}
$$
Therefore
$$
\frac {\pi ^2} 6 = \sum _{n = 1}^{\infty} n^{-2} \\
$$
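As a numeric sanity check, $\frac {\pi^2} 6 \approx 1.6449$, and the partial sums creep toward it (convergence is slow):
$$
\sum _{n = 1}^{4} n^{-2} = 1 + \frac 1 4 + \frac 1 9 + \frac 1 {16} \approx 1.4236
$$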

81
Totient Function.md Normal file
View File

@@ -0,0 +1,81 @@
#Math #NT
# Definition
Euler's totient function returns the number of integers $k$ with $1 \leq k \leq n$ that are coprime to $n$, for a positive integer $n$. It is notated as:
$$
\phi(n)
$$
# $\phi(n)$ for Prime Powers
For a prime power $p^k$, the only positive integers $j \leq p^k$ with $\gcd(j, p^k) > 1$ are the multiples of $p$, i.e. $j = mp$ for $1 \leq m \leq p^{k - 1}$, of which there are $p^{k - 1}$. Therefore:
$$
\phi(p^k) \\ = p^k - p^{k - 1} \\ = p^{k - 1}(p - 1) \\ = p^k(1 - \frac 1 p)
$$
# Multiplicative Property of $\phi$
If $m$ and $n$ are coprime:
$$
\phi(m)\phi(n) = \phi(mn)
$$
Proof: Let set $A$ be all positive integers at most $m$ coprime to $m$, and set $B$ be all positive integers at most $n$ coprime to $n$.
$$
|A| = \phi(m) \\ |B| = \phi(n)
$$
Let set $D = A \times B$ be all ordered pairs with first element from $A$ and second element from $B$. For each element $(k_1, k_2)$ in set $D$, take the value $\theta$ (with $1 \leq \theta \leq mn$) where:
$$
\theta \equiv k_1 \mod m \\ \theta \equiv k_2 \mod n
$$
The Chinese Remainder Theorem (CRT) ensures such a $\theta$ exists and is unique $\bmod \ mn$. Given the fact that $\gcd(x + yz, z) = \gcd(x, z)$, we can say that:
$$
\gcd(\theta, m) = \gcd(k_1, m) = 1 \\ \gcd(\theta, n) = \gcd(k_2, n) = 1 \\ \gcd(\theta, mn) = 1
$$
Put all such $\theta$ in set $C$. Conversely, every $1 \leq \theta \leq mn$ with $\gcd(\theta, mn) = 1$ reduces to a pair $(\theta \bmod m, \theta \bmod n)$ in $D$, so $C$ is exactly the set of integers at most $mn$ coprime to $mn$. Looking at the size of $C$:
$$
|C| = \phi(mn) \\
|C| = |A| \cdot |B| = \phi(m)\phi(n) \\
\phi(mn) = \phi(m)\phi(n)
$$
# Value of $\phi$ for any Number
Let the prime factorization of a positive integer $n$ be:
$$
n = p_1^{k_1}p_2^{k_2}p_3^{k_3}...p_l^{k_l}
$$
Now using the properties above:
$$
\phi(n) \\
= \prod _{i = 1}^l \phi(p_i^{k_i}) \\
= \prod _{i = 1}^l p_i^{k_i}(1 - \frac 1 {p_i})
$$
Multiplying all $p_i^{k_i}$ gives $n$, so factor that out:
$$
= n \prod _{i = 1}^l (1 - \frac 1 {p_i})
$$
(you can derive most textbook definitions from this formula easily)
Final formula:
$$
\phi(n) = n \prod _{i = 1}^l (1 - \frac 1 {p_i})
$$
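As a quick check, take $n = 12 = 2^2 \cdot 3$; the four integers at most $12$ coprime to $12$ are $1, 5, 7, 11$:
$$
\phi(12) = 12 \left(1 - \frac 1 2\right)\left(1 - \frac 1 3\right) = 12 \cdot \frac 1 2 \cdot \frac 2 3 = 4
$$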

97
Vectors.md Normal file
View File

@@ -0,0 +1,97 @@
#Math #Algebra
# Defining Vectors
A vector is a list of components. Vectors can be expressed in $ijk$ (unit vector) notation by:
$$
\mathbf a = 2i + 3j -4k
$$
or
$$
\vec a = 2i + 3j -4k
$$
You can also express a vector as a column or row matrix:
$$
\vec a = \begin {bmatrix}
2 \\
3 \\
-4 \\
\end {bmatrix} \\
\vec a = \begin {bmatrix} 2 & 3 & -4 \end {bmatrix}
$$
# Adding and Subtracting Vectors
To add vectors, add their corresponding components. For example:
$$
\vec a = 4i + 7j - 9k \\
\vec b = 3i - 5j - 8k \\
\vec a + \vec b = 7i + 2j - 17k
$$
Subtracting vectors works in a similar fashion:
$$
\vec a - \vec b = i + 12j - k
$$
Here are the formulas, written component-wise:
$$
(\vec a + \vec b)_i = a_i+b_i \\
(\vec a - \vec b)_i = a_i-b_i
$$
Heres a graph visualizing the addition and subtraction of vectors: [https://www.desmos.com/calculator/gavjpwhnuo](https://www.desmos.com/calculator/gavjpwhnuo)
# Multiplication by Scalar
To multiply a vector by a scalar (regular number), just multiply all the components by that number:
$$
(m\vec a)_i = ma_i
$$
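A quick numeric instance, scaling the vector from the first section by $3$:
$$
3(2i + 3j - 4k) = 6i + 9j - 12k
$$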
# Multiplication by Another Vector: Dot Product
There are two different ways to multiply a vector by another vector. The first is the dot product. Here is the algebraic definition, where $n$ is the number of components (both vectors must have the same length):
$$
\vec a \cdot \vec b = \sum _{i = 1}^n a_ib_i
$$
With two two-dimensional vectors, we can also provide a geometric definition, where $||\vec a||$ is the magnitude of $\vec a$ and $\theta$ is the angle between the vectors:
$$
\vec a \cdot \vec b = ||\vec a|| \: ||\vec b|| \: \cos \theta
$$
As you can see, the dot product returns a single value, or scalar. From the geometric definition, you can see that it describes how much one vector “aligns” with the other.
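For example, with $\vec a = 2i + 3j$ and $\vec b = 4i - j$:
$$
\vec a \cdot \vec b = (2)(4) + (3)(-1) = 5
$$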
## Proving that the Definitions are the Same
Let $\vec a$ have a magnitude of $m$ and an angle of $p$, and let $\vec b$ have a magnitude of $n$ and an angle of $q$.
$$
\vec a \cdot \vec b \\
= m\cos p \: n \cos q + m\sin p \: n \sin q \\
= mn(\cos p \: \cos q + \sin p \: \sin q) \\
= mn\cos(p-q)
$$
Since $p - q$ is the angle between the vectors, starting from the algebraic definition we arrive at the geometric definition shown above.
# Cross Product
Let $\hat n$ be a unit vector perpendicular to both $\vec a$ and $\vec b$ (oriented by the right-hand rule), and let $\theta$ be the angle between them. The cross product, which returns a vector, is:
$$
\vec a \times \vec b = ||\vec a|| \: ||\vec b|| \: \sin \theta \: \hat n
$$
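For example, for the perpendicular unit vectors $i$ and $j$, we have $\theta = 90°$ and (by the right-hand rule) $\hat n = k$:
$$
i \times j = (1)(1) \sin 90° \: k = k
$$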

29
Vieta’s Formulas.md Normal file
View File

@@ -0,0 +1,29 @@
#Math #Algebra
Let polynomial $a$ be:
$$
a = c_n \prod _{i = 1}^n (x - r_i)
$$
where $r_i$ is a root of $a$, and $c_n$ is the leading coefficient of $a$.
We can also represent $a$ as:
$$
a = \sum _{i = 0}^n c_i x^i
$$
By expanding the first definition of $a$, we can define $c_i$ by:
$$
c_{n-i} = (-1)^i c_n\sum _{sym}^i r
$$
This follows from expanding the product of binomials: the coefficient of $x^{n - i}$ picks up $-r_j$ from $i$ of the factors (and $x$ from the rest) in every possible way, giving the $i$th elementary symmetric sum of the roots multiplied by $(-1)^i$.
Rearranging, we can state:
$$
\sum _{sym}^i r = (-1)^i \frac {c_{n-i}} {c_n}
$$
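As a quick check with $n = 2$: the polynomial $x^2 - 5x + 6 = (x - 2)(x - 3)$ has $c_2 = 1$, $c_1 = -5$, $c_0 = 6$, and the formula recovers the two elementary symmetric sums of the roots:
$$
\sum _{sym}^1 r = 2 + 3 = (-1)^1 \frac {c_1} {c_2} = 5 \\
\sum _{sym}^2 r = 2 \cdot 3 = (-1)^2 \frac {c_0} {c_2} = 6
$$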

3
index.md Normal file
View File

@@ -0,0 +1,3 @@
# Welcome! 👋
I am [@craisin](https://craisin.tech), a CS and math enthusiast! These are a set of notes related to anything adjacent to math/CS that I am interested in. Enjoy!

40
sin x = 2.md Normal file
View File

@@ -0,0 +1,40 @@
#Math #Trig
$$
\sin x = 2
$$
$$
\frac {e^{ix} - e^{-ix}} {2i} = 2
$$
$$
e^{ix} - e^{-ix} = 4i \\
$$
$$
e^{ix} - (e^{ix})^{-1} = 4i
$$
Let $u = e^{ix}$:
$$
u - u^{-1} = 4i
$$
$$
u^2 - 1 = 4iu \\
u^2 - 4iu - 1 = 0 \\
u^2 - 4iu - 4 = -3 \\
(u - 2i)^2 = -3 \\
u - 2i = \pm \sqrt {-3} \\
u = 2i \pm \sqrt {-3} \\
u = i(2 \pm \sqrt 3)
$$
Substitute $e^{ix}$ back in for $u$, with $n \in \mathbb{Z}$ accounting for the multivalued logarithm:
$$
e^{ix} = i(2 \pm \sqrt 3) \\
ix = \ln (i(2 \pm \sqrt 3)) \\
ix = \ln i + 2\pi i n + \ln(2 \pm \sqrt 3) \\
ix = \frac {i\pi} 2 + 2\pi i n + \ln(2 \pm \sqrt 3) \\
x = \frac \pi 2 - i\ln(2 \pm \sqrt 3) + 2\pi n
$$
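As a check, substitute the principal value $x = \frac \pi 2 - i\ln(2 + \sqrt 3)$ back in, using $\frac 1 {2 + \sqrt 3} = 2 - \sqrt 3$:
$$
\sin \left(\frac \pi 2 - i\ln(2 + \sqrt 3)\right) = \cos \left(i \ln (2 + \sqrt 3)\right) = \cosh \left(\ln (2 + \sqrt 3)\right) = \frac {(2 + \sqrt 3) + (2 - \sqrt 3)} {2} = 2
$$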