commit 9ce7679e9c6c57cc4849b0b956a10f16a0a3e303
Author: craisin
Date:   Thu Dec 25 21:13:43 2025 -0800

    Initial commit

diff --git a/A Fun Harmonic Series Problem.md b/A Fun Harmonic Series Problem.md
new file mode 100644
index 0000000..dcfb8e7
--- /dev/null
+++ b/A Fun Harmonic Series Problem.md
@@ -0,0 +1,29 @@
#Math #Calculus

# The Problem

If $f(x) = \sum _{k \geq 0} a_k x^k$, and this series converges for $x = x_0$, prove:

$$
\sum _{k \geq 0} a_k x_0^k H_k = \int _0^1 \frac {f(x_0) - f(x_0 y)} {1 - y} dy
$$

where $H_k$ denotes the $k$th partial sum of the harmonic series ($H_0 = 0$, $H_k = \sum _{i = 1}^k \frac 1 i$ for $k \geq 1$).

(from *The Art of Computer Programming*)

# Solution

Although this problem might seem intimidating, with a power series involving the harmonic numbers on the LHS and a function of a series inside an integral on the RHS, it is straightforward to bring the summation out and express the RHS as a power series (the $k = 0$ term vanishes since $1 - y^0 = 0$, so the sum may start at $k = 1$):

$$
\int _0^1 \frac {f(x_0) - f(x_0 y)} {1 - y} dy \\
= \int _0^1 \frac {\sum _{k \geq 0} a_k x_0^k - \sum _{k \geq 0} a_k x_0^k y^k} {1 - y} dy \\
= \sum _{k \geq 1} a_k x_0^k \int _0^1 \frac {1 - y^k} {1 - y} dy
$$

The integral factor in the last step is merely Euler's integral representation of the harmonic numbers, which follows from the simple fact that $\frac {1 - y^k} {1 - y} = \sum _{i = 0}^{k - 1} y^i$, so $\int _0^1 \frac {1 - y^k} {1 - y} dy = \sum _{i = 0}^{k - 1} \frac 1 {i + 1} = H_k$. Therefore:

$$
\sum _{k \geq 0} a_k x_0^k H_k = \int _0^1 \frac {f(x_0) - f(x_0 y)} {1 - y} dy
$$
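As a quick numerical sanity check (not part of the original note), here is a minimal sketch assuming $f(x) = \frac 1 {1 - x}$, so $a_k = 1$, and $x_0 = \frac 1 2$; both sides should approach $2 \ln 2 \approx 1.386294$:

``` python
import numpy as np

x0 = 0.5

# LHS: sum of x0^k * H_k, truncated at a large K (H_0 = 0 contributes nothing)
K = 200
H = np.cumsum(1.0 / np.arange(1, K + 1))          # H_1 .. H_K
lhs = np.sum(x0 ** np.arange(1, K + 1) * H)

# RHS: midpoint-rule estimate of the integral (the integrand stays bounded)
f = lambda x: 1.0 / (1.0 - x)
y = np.linspace(0, 1, 100001)[:-1] + 0.5e-5       # interval midpoints
rhs = np.mean((f(x0) - f(x0 * y)) / (1.0 - y))

print(lhs, rhs)   # both ≈ 1.386294
```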
diff --git a/Adjacency Matrix to the nth Power.md b/Adjacency Matrix to the nth Power.md
new file mode 100644
index 0000000..0a39a5f
--- /dev/null
+++ b/Adjacency Matrix to the nth Power.md
@@ -0,0 +1,34 @@
#Math #Algebra

# Multiplying an Adjacency Matrix by Itself

Consider an adjacency matrix $M$ on $n$ nodes. If there is an edge from $a$ to $b$, then $M_{a, b} = 1$; if not, $M_{a, b} = 0$. Now consider all possible nodes $c$ through which to form a path $a \to c \to b$ (more precisely, a walk: nodes and edges may repeat). For $c$ to be such a node, we need $M_{a, c} = 1$ and $M_{c, b} = 1$.

Now observe the definition of matrix multiplication. We know:

$$
(MM)_{a, b} = \sum _{c = 1}^n M_{a, c} M_{c, b}
$$

For each intermediate node $c$, the corresponding term equals one exactly when $M_{a, c}$ and $M_{c, b}$ are both $1$, that is, when the path $a \to c \to b$ exists. Therefore, the number of paths of length $2$ from $a$ to $b$ is $(MM)_{a, b}$.

# Taking an Adjacency Matrix to $l$

Now consider a matrix $N$ where $N_{a, b}$ counts the paths of length $l$ from $a$ to $b$, and let $M$ be the adjacency matrix of the graph $N$ models. Say we want all paths of length $l + 1$ that look like $a \to … \to c \to b$. We know the number of paths $a \to … \to c$ of length $l$ is $N_{a, c}$. Similarly, we know there are $M_{c, b}$ ways to finish with the edge $c \to b$. Therefore, the number of paths that satisfy $a \to … \to c \to b$ for a given $c$ is:

$$
N_{a, c}M_{c, b}
$$

Summing over every choice of node $c$ gives the number of paths of length $l + 1$ from $a$ to $b$:

$$
\sum _{c = 1}^n N_{a, c}M_{c, b} = (NM)_{a, b}
$$

Since $M_{a, b}$ is clearly the number of paths of length $1$ from $a$ to $b$, by induction, the number of paths of length $l$ from $a$ to $b$ is:

$$
M^l_{a, b}
$$
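A small illustrative sketch (my own example, not from the note), using a triangle graph:

``` python
import numpy as np
from numpy.linalg import matrix_power

# Adjacency matrix of a triangle graph on nodes 0, 1, 2
M = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

print(matrix_power(M, 2))  # (M^2)[0][0] = 2: the walks 0→1→0 and 0→2→0
print(matrix_power(M, 3))  # (M^3)[0][0] = 2: the walks 0→1→2→0 and 0→2→1→0
```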
diff --git a/Another Way to Define e.md b/Another Way to Define e.md
new file mode 100644
index 0000000..968357b
--- /dev/null
+++ b/Another Way to Define e.md
@@ -0,0 +1,33 @@
#Math #Calculus

Limit to solve:

$$
\lim _{x\to 0} \frac {e^x-1} {x}
$$

Let $t = e^x - 1$, so $x = \ln(t + 1)$ and $t \to 0$ as $x \to 0$:

$$
\lim _{t\to 0} \frac {t} {\ln(t+1)}
$$

$$
\lim _{t\to 0} \frac {1} {\frac {1} {t} \ln(1+t)}
$$

Power rule for logarithms:

$$
\lim _{t\to 0} \frac {1} {\ln((1+t)^{\frac {1} {t}})}
$$

Definition of $e$, namely $e = \lim _{t \to 0} (1+t)^{\frac 1 t}$:

$$
\frac {1} {\ln e}
$$

$$
= 1
$$

diff --git a/Basic Category Theory.md b/Basic Category Theory.md
new file mode 100644
index 0000000..2691732
--- /dev/null
+++ b/Basic Category Theory.md
@@ -0,0 +1,17 @@
#Math #CT

# Categories

Categories contain:

- A collection of **objects**
- A collection of **morphisms** (also called **arrows**) connecting objects, denoted by $f: S \to T$, where $f$ is the **morphism**, $S$ is the **source**, and $T$ is the **target**
    - Note: $f: A \to B$ and $g: A \to B$ **DOES NOT IMPLY** $f = g$
    - Formally this can also be expressed as a relation between a collection of objects and a collection of morphisms
    - Morphisms have a notion of **composition**, that being if $f: A \to B$, $g: B \to C$, then $g \circ f: A \to C$

There are three rules for categories:

- **Associativity:** For composable morphisms $a$, $b$, and $c$, $(a \circ b) \circ c = a \circ (b \circ c)$
- **Closed composition:** Whenever the source of $a$ equals the target of $b$, the composite $a \circ b$ exists and is itself a morphism of the category
- **Identity morphisms:** For every object $A$ in a category, there must be an identity morphism $\text{id}_A: A \to A$, satisfying $f \circ \text{id}_A = f$ and $\text{id}_B \circ f = f$ for every $f: A \to B$

diff --git a/Bezout’s Identity.md b/Bezout’s Identity.md
new file mode 100644
index 0000000..f1a94ea
--- /dev/null
+++ b/Bezout’s Identity.md
@@ -0,0 +1,51 @@
#Math #NT

# Statement

Let $x \in \mathbb{Z}$, $y \in \mathbb{Z}$, $x \neq 0$, $y \neq 0$, and $g = \gcd(x, y)$. Bezout's Identity states that there exist $\alpha \in \mathbb{Z}$ and $\beta \in \mathbb{Z}$ such that:

$$
\alpha x + \beta y = g
$$

Furthermore, $g$ is the least positive integer able to be expressed in this form.

# Proof

## First Statement

Let $x = gx_1$ and $y = gy_1$, and notice $\gcd(x_1, y_1) = 1$ and $\operatorname{lcm} (x_1, y_1) = x_1 y_1$.

Since this is true, the smallest positive integer $\alpha$ with $\alpha x_1 \equiv 0 \mod y_1$ is $\alpha = y_1$.

For all integers $0 \leq a, b < y_1$ with $a \neq b$, $ax_1 \not\equiv bx_1 \mod y_1$. (If not, $y_1$ would divide $(a - b)x_1$ and hence $a - b$, which is impossible since $0 < |a - b| < y_1$.) Thus the $y_1$ products $a x_1$ hit $y_1$ distinct residues, so by the pigeonhole principle, there exists $\alpha$ such that $\alpha x_1 \equiv 1 \mod y_1$.

Therefore, there is an $\alpha$ such that $\alpha x_1 - 1 \equiv 0 \mod y_1$, and by extension, there exists an integer $\beta$ such that:

$$
\alpha x_1 - 1 = -\beta y_1 \\
\alpha x_1 + \beta y_1 = 1
$$

Multiplying by $g$:

$$
\alpha x + \beta y = g
$$

## Second Statement

To prove $g$ is the minimum, let's consider another positive integer $g'$ with:

$$
\alpha' x + \beta' y = g'
$$

Since $x$ and $y$ are both multiples of $g$:

$$
0 \equiv \alpha' x + \beta' y \mod g \\
0 \equiv g' \mod g
$$

Since $g$ and $g'$ are positive integers, $g' \geq g$.
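The proof above is non-constructive; in practice, $\alpha$ and $\beta$ are usually found with the extended Euclidean algorithm. A minimal sketch (my addition, not part of the note):

``` python
def extended_gcd(x, y):
    """Return (g, alpha, beta) with alpha*x + beta*y == g == gcd(x, y)."""
    if y == 0:
        return abs(x), (1 if x >= 0 else -1), 0
    # gcd(x, y) = gcd(y, x mod y); unwind the recursion to build coefficients
    g, a, b = extended_gcd(y, x % y)
    return g, b, a - (x // y) * b

g, alpha, beta = extended_gcd(240, 46)
print(g, alpha, beta, alpha * 240 + beta * 46)  # 2 -9 47 2
```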
diff --git a/Binomial Coefficients and N Choose K.md b/Binomial Coefficients and N Choose K.md
new file mode 100644
index 0000000..b374f4d
--- /dev/null
+++ b/Binomial Coefficients and N Choose K.md
@@ -0,0 +1,21 @@
#Math #Probability

# Problem

Why does n choose k, or $\frac{n!}{k!(n-k)!}$, generate the coefficient of $x^ky^{n-k}$ in $(x+y)^n$?

# Explanation

Let's see what happens when expanding $(x+y)^4$:

$$
(x+y)^4\\
=(x+y)(x+y)(x+y)(x+y)\\
=xxxx+\\
yxxx+xyxx+xxyx+xxxy+\\
yyxx+yxyx+yxxy+xyyx+xyxy+xxyy+\\
xyyy+yxyy+yyxy+yyyx+\\
yyyy
$$

When expanding, notice that the number of terms with $k$ $x$'s (and likewise $4-k$ $y$'s) is the number of combinations 4 choose $k$, as you choose $k$ slots out of $4$ to put the $x$'s in. Therefore, $(x+y)^n={n \choose 0}x^0y^n+{n \choose 1}x^1y^{n-1}...+{n \choose n-1}x^{n-1}y^1+{n \choose n}x^ny^0$

diff --git a/Bohr Mollerup Theorem.md b/Bohr Mollerup Theorem.md
new file mode 100644
index 0000000..0e5f060
--- /dev/null
+++ b/Bohr Mollerup Theorem.md
@@ -0,0 +1,85 @@
#Math #Calculus

# Intro

The Gamma function $\Gamma(x)$ is a way to extend the factorial function, where $\Gamma(n + 1) = n!$. This gives us two conditions defining $\Gamma (x)$:

$$
\Gamma(1) = 1 \\
\Gamma(x + 1) = x \Gamma (x)
$$

However, by adding a third condition stating $\Gamma (x)$ is logarithmically convex ($\log \circ \space \Gamma$ is convex), we can prove that $\Gamma (x)$ is unique!

# Proof

Let $G$ be a function with the properties above. Since $G(x + 1) = xG(x)$, we can define any $G(x + n)$, where $n \in \mathbb{N}$, as:

$$
G(x + n) = G(x)\prod _{i = 0}^{n - 1}(x + i)
$$

This means that it is sufficient to define $G(x)$ on $x \in (0, 1]$ for a unique $G(x)$.

Let $S(x_1, x_2)$ be the chord slope of $\log \circ \space G$, defined as $\frac {\log (G(x_2)) - \log (G(x_1))} {x_2 - x_1}$. Observe that by log-convexity (chord slopes increase), for all $0 \lt x \leq 1$ and $n \in \mathbb{N}$:

$$
S(n - 1, n) \leq S(n, n +x) \leq S(n, n + 1) \\
\log (G(n)) - \log (G(n-1)) \leq \frac {\log (G(n + x)) - \log (G(n))} {x} \leq \log (G(n + 1)) - \log (G(n)) \\
\log ((n - 1)!) - \log ((n-2)!) \leq \frac {\log (G(x + n)) - \log ((n - 1)!)} {x} \leq \log (n!) - \log ((n - 1)!) \\
\log(n - 1) \leq \frac {\log (\frac{G(x + n)}{(n - 1)!})} {x} \leq \log(n) \\
\log((n - 1)^x) \leq \log (\frac {G(x + n)}{(n - 1)!})\leq \log(n^x) \\
$$

Exponentiating:

$$
(n - 1)^x \leq \frac {G(x + n)}{(n - 1)!}\leq n^x \\
(n - 1)^x(n - 1)! \leq G(x + n)\leq n^x(n - 1)! \\
$$

Using the above work to expand $G(x + n)$:

$$
\frac{(n - 1)^x(n - 1)!} {\prod _{i = 0}^{n - 1}(x + i)} \leq G(x) \leq \frac{n^x(n - 1)!} {\prod _{i = 0}^{n - 1}(x + i)} \\
\frac{(n - 1)^x(n - 1)!} {\prod _{i = 0}^{n - 1}(x + i)} \leq G(x) \leq \frac{n^xn!} {\prod _{i = 0}^n(x + i)}(\frac {n + x} n) \\
$$

Of course, taking the limit as $n$ goes to infinity on both sides by brute force will produce the value of $G(x)$; however, I will present a more elegant solution. Notice we can take the inequalities separately, resulting in:

$$
\frac{(n_1 - 1)^x(n_1 - 1)!} {\prod _{i = 0}^{n_1 - 1}(x + i)} \leq G(x)\\
G(x) \leq \frac{n_2^xn_2!} {\prod _{i = 0}^{n_2}(x + i)}(\frac {n_2 + x} {n_2}) \\
$$

This shows that no matter the choice of $n_1$ and $n_2$, the inequality still holds!

Now we can sub in $n_1 = n + 1$, $n_2 = n$, to get:

$$
\frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} \leq G(x) \leq \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} (\frac {n + x} n)\\
$$

Taking a limit to infinity on both sides (note $\frac {n + x} n \to 1$):

$$
\lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} \leq G(x) \leq \lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} (\frac {n + x} n)\\
\lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} \leq G(x) \leq \lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)} \\
$$

By the squeeze theorem, $G(x) = \lim _{n \to \infty} \frac{n^xn!} {\prod _{i = 0}^{n}(x + i)}$ on $(0, 1]$, so any function satisfying all three conditions is uniquely determined, proving $\Gamma(x)$ unique.

# Exercise to the Reader

Prove that the definition:

$$
\Gamma(n) = \int _{0}^{\infty} x^{n - 1} e^{-x} dx
$$

is valid.
diff --git a/Central Limit Theorem.md b/Central Limit Theorem.md
new file mode 100644
index 0000000..6731ee5
--- /dev/null
+++ b/Central Limit Theorem.md
@@ -0,0 +1,245 @@
#Math #Probability

# The Central Limit Theorem

Let us sum $n$ instances of an i.i.d. (independent and identically distributed) random variable with defined first and second moments (mean and variance). Center the distribution on $0$ and scale it by its standard deviation. As $n$ goes to infinity, the distribution of that variable goes toward

$$
\frac 1 {\sqrt {2 \pi}} e^{- \frac {x^2} 2}
$$

or the standard normal distribution.

## Mathematical Definition

Let $Y$ be the mean of a sequence of $n$ i.i.d.s:

$$
Y = \frac 1 n \sum _{i=1}^{n} X_i
$$

Let $\mu=E(X_i)$, the expected value of $X$, and $\sigma = \sqrt {Var(X)}$, the standard deviation of $X$.

Calculate the expected value of $Y$, $E(Y)$, and the variance, $Var(Y)$ (using independence for the variance):

$$
E(Y) \\
= E(\frac 1 n \sum _{i=1}^{n} X_i) \\
= \frac 1 n \sum _{i=1}^{n} E(X_i) \\
= \frac 1 n \sum _{i=1}^{n} \mu \\
= \frac {n \mu} {n} \\
= \mu
$$

$$
Var(Y) \\
= Var(\frac 1 n \sum _{i=1}^n X_i) \\
= \frac 1 {n^2} \sum _{i=1}^n Var(X_i) \\
= \frac {\sigma^2} n
$$

Let $Y^*$ be $Y$ centered by $E(Y)$ and scaled by its standard deviation, $\sqrt {Var(Y)}$:

$$
Y^* \\ = \frac {Y - E(Y)} {\sqrt {Var(Y)}} \\ = \frac {Y - \mu} {\sqrt {\frac {\sigma^2} {n}}} \\ = \frac {\sqrt n (Y - \mu)} \sigma \\= \frac {\sqrt n (\frac 1 n \sum _{i=1}^n X_i - \mu)} \sigma \\ = \frac {\frac 1 {\sqrt n} (\sum _{i=1}^n X_i - n\mu)} \sigma
$$

The CLT states

$$
Y^* \overset d \to N(0, 1)
$$

Or: $Y^*$ converges in distribution to the standard normal distribution, with a mean of $0$ and a standard deviation of $1$.

# Proof

## A Change in Variables

Let $S$ be the sum of our sequence of $n$ i.i.d.s:

$$
S = \sum _{i=1}^{n} X_i
$$

Let's calculate $E(S)$ and $Var(S)$:

$$
E(S) \\
=E(\sum _{i=1}^n X_i) \\
=\sum _{i=1}^n E(X_i) \\
=\sum _{i=1}^n \mu \\
= n\mu
$$

$$
Var(S) \\
=Var(\sum _{i=1}^n X_i) \\
=\sum _{i=1}^n Var(X_i) \\
=\sum _{i=1}^n \sigma^2 \\
=n\sigma^2
$$

Center $S$ by $E(S)$ and scale it by $\sqrt {Var(S)}$ for $S^*$:

$$
S^* \\
= \frac {S - E(S)} {\sqrt {Var(S)}} \\
= \frac {S - n\mu} {\sqrt {n\sigma^2}} \\
= \frac {S - n\mu} {\sqrt {n}\sigma} \\
= \frac {\frac 1 {\sqrt n} (S-n\mu)} { \sigma} \\
= \frac {\frac 1 {\sqrt n} (\sum _{i=1}^n X_i - n\mu)} \sigma
$$

From the above, $Y^*=S^*$. In the proof, we will use $S^*$, as it is easier to manipulate.
## MGFs

An MGF (moment generating function) is a function where

$$
M_V(t) = E(e^{tV})
$$

where $V$ is a random variable

(reminder for me to do another notion on this)

### Properties of MGFs

Property 1:

If $A$ and $B$ are independent and

$$
C=A+B
$$

Then

$$
M_C(t) \\
= E(e^{tC}) \\
= E(e^{tA + tB}) \\
= E(e^{tA}e^{tB}) \\
= E(e^{tA}) E(e^{tB}) \\
= M_A(t) M_B(t)
$$

Property 2:

$$
M_V^{(r)}(0) = E(V^r)
$$

The $r$th derivative of $M_V$ at $0$ gives the $r$th moment of $V$

Property 3:

Let $A_1$, $A_2$, … $A_n$ be a sequence of random variables with MGFs $M_{A_1}$, $M_{A_2}$, … $M_{A_n}$

If, for all $t$,

$$
M_{A_n}(t) \to M_B(t)
$$

Then

$$
A_n \overset d \to B
$$

### MGF of a Normal Distribution

Let a random variable derived from a standard normal distribution be $Z$

$$
Z \sim N(0, 1)
$$

$$
M_Z(t) \\
= E(e^{tZ}) \\
= \int _{-\infty}^{\infty} e^{xt} \frac 1 {\sqrt {2\pi}} e^{-\frac {x^2} 2} dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{tx-\frac 1 2 x^2} dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x^2 - 2tx )} dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x^2 - 2tx + t^2 ) + \frac 1 2 t^2 } dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x - t)^2 + \frac 1 2 t^2 } dx \\
= e ^ {\frac 1 2 t^2} \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x - t)^2 } dx \\
= e ^ {\frac {t^2} 2}
$$

## The Argument

To prove the CLT, we need to prove that $S^*$ converges in distribution to $N(0, 1)$ as $n \to \infty$. Our approach will be to prove that the MGF of $S^*$ converges to the MGF of $N(0, 1)$ as $n \to \infty$.

$$
S^* \\
= \frac {S - E(S)} {\sqrt {Var(S)}} \\
= \frac {S - n\mu} {\sqrt {n \sigma^2}} \\
= \frac {\sum _{i=1}^{n} X_i - n\mu} {\sqrt n \sigma} \\
= \sum _{i=1}^{n} \frac {X_i - \mu} {\sqrt n \sigma}
$$

Start manipulating the MGF of $S^*$ (the expectation of the product factors by independence, and each factor is identical):

$$
M_{S^*}(t) \\
= E(e^{tS^*}) \\
= E(e^{t(\sum _{i=1}^{n} \frac {X_i - \mu} {\sqrt n \sigma})}) \\
= E(e^{t(\frac {(X-\mu)} {\sqrt n \sigma})})^n \\
= (M_{\frac {(X-\mu)} {\sqrt n \sigma}}(t))^n \\
=(M_{(X - \mu)} (\frac t {\sqrt n \sigma }))^n
$$

Expand out the Taylor series for $M_{(X-\mu)}(\frac t {\sqrt n \sigma})$ (note the omitted terms carry factors of $n^{-\frac 3 2}$ and smaller, so they vanish faster than $\frac 1 n$ as $n$ goes to $\infty$):

$$
M_{(X-\mu)}(\frac t {\sqrt n \sigma}) \\
= M_{(X-\mu)}(0) + (\frac {M_{(X-\mu)}'(0)} {1!})(\frac t {\sqrt n \sigma}) + (\frac {M_{(X-\mu)}''(0)} {2!})(\frac t {\sqrt n \sigma})^2 + (\frac {M_{(X-\mu)}'''(0)} {3!})(\frac t {\sqrt n \sigma})^3 + ...\\
= 1 + (\frac {t} {\sqrt n \sigma})E(X-\mu) + (\frac {t^2} {2 n \sigma^2})E((X-\mu)^2) + (\frac {t^3} {6n ^ {\frac 3 2} \sigma ^ 3})E((X-\mu)^3) + ... \\
\approx 1 + (\frac t {\sqrt n \sigma})E(X-\mu) + (\frac {t^2} {2n \sigma^2})E((X-\mu)^2)
$$

Remember $E(X-\mu) = 0$ and $E((X-\mu)^2) = \sigma^2$:

$$
= 1 + (\frac t {\sqrt n \sigma})(0) + (\frac {t^2} {2n \sigma^2})(\sigma ^ 2) \\
= 1 + \frac {t^2} {2n}
$$

Solve for $M_{S^*} (t)$:

$$
M_{S^*}(t) = (1 + \frac {t^2} {2n})^n
$$

Take the limit of $M_{S^*} (t)$ as $n \to \infty$:

$$
\lim _{n \to \infty} M_{S^*}(t) \\
= \lim _{n \to \infty} (1 + \frac {t^2} {2n})^n \\
= \lim _{n \to \infty} (1 + \frac 1 {(\frac {2n} {t^2})})^{\frac {t^2} 2 (\frac {2n} {t^2})} \\
= e^{\frac {t^2} 2}
$$

Since $M_{S^*} (t) \to M_Z(t)$ as $n \to \infty$, Property 3 gives $S^* \overset d \to N(0, 1)$. Therefore:

$$
Y^* \overset d \to N(0, 1)
$$

proving the Central Limit Theorem

## Summary of the Argument

$$
Y^* = S^* \\
M_{S^*}(t) \to M_Z (t) \text{ as } n \to \infty \\
S^* \overset d \to N(0, 1) \\
Y^* \overset d \to N(0, 1) \\
$$
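A minimal simulation sketch of the theorem (my addition, assuming the $X_i$ are Uniform(0, 1), so $\mu = \frac 1 2$ and $\sigma^2 = \frac 1 {12}$):

``` python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 50_000

X = rng.random((trials, n))
S_star = (X.sum(axis=1) - n * 0.5) / (np.sqrt(n) * np.sqrt(1 / 12))

# Moments of S* should approach those of N(0, 1),
# and e.g. P(S* <= 1) should approach Phi(1) ≈ 0.8413
print(S_star.mean(), S_star.std())  # ≈ 0, ≈ 1
print((S_star <= 1).mean())         # ≈ 0.8413
```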
diff --git a/Chicken McNugget Theorem.md b/Chicken McNugget Theorem.md
new file mode 100644
index 0000000..2d534f6
--- /dev/null
+++ b/Chicken McNugget Theorem.md
@@ -0,0 +1,98 @@
#Math #NT

# Theorem

Say $m$ and $n$ are two coprime positive integers. The Chicken McNugget Theorem states that the highest number that can't be expressed as $am + bn$, with $a \in \mathbb{Z}$, $b \in \mathbb{Z}$, and $a, b \geq 0$, is:

$$
mn - m - n
$$

# Proof

Call a number purchasable relative to $m$ and $n$ if it can be represented as

$$
am + bn
$$

where $a$ and $b$ are two non-negative integers.

## Lemma 1

Fix $N$, and let $A_N \subset \mathbb{Z} \times \mathbb{Z}$ be the set of all $(x, y)$ such that $xm + yn = N$. Then for any $(x, y) \in A_N$:

$$
A_N = \{(x + kn, y - km): k \in \mathbb{Z}\}
$$

### Proof

By Bezout's Identity, there exist integers $x'$ and $y'$ such that $x'm + y'n = 1$. Then $Nx'm + Ny'n = N$. Thus, $A_N$ is nonempty.

Adding $kn$ to $x$ adds $kmn$ to $xm + yn$, and subtracting $km$ from $y$ subtracts that $kmn$ back off, so the total is unchanged and all these pairs are in $A_N$.

To prove these are the only solutions, let $(x_1, y_1) \in A_N$ and $(x_2, y_2) \in A_N$. This means:

$$
mx_1 + ny_1 = mx_2 + ny_2 \\
m(x_1 - x_2) = n(y_2 - y_1) \\
$$

Since $m$ and $n$ are coprime, and $m$ divides $n(y_2 - y_1)$:

$$
y_2 - y_1 \equiv 0 \mod m \\
y_2 \equiv y_1 \mod m
$$

Similarly:

$$
x_2 \equiv x_1 \mod n
$$

Let $k_1, k_2 \in \mathbb{Z}$ such that:

$$
x_2 - x_1 = k_1n \\
y_1 - y_2 = k_2m \\
$$

Multiplying by $m$ and $n$ respectively and comparing with $m(x_1 - x_2) = n(y_2 - y_1)$, we get $k_1 = k_2$, proving the lemma.

## Lemma 2

For $N \in \mathbb{Z}$, there is a unique $(a_N, b_N) \in \mathbb{Z} \times \{0, 1, 2, … m - 1\}$ such that $a_Nm + b_Nn = N$.

### Proof

By Lemma 1, every representation of $N$ has the form $(x + kn, y - km)$, and there is only one possible $k$ for which

$$
0 \leq y - km \leq m - 1
$$

which gives the unique pair.

## Lemma 3

$N$ is purchasable if and only if $a_N \geq 0$.

### Proof

If $a_N \geq 0$, we can pick $(a_N, b_N)$, whose entries are both non-negative, so $N$ is purchasable. If $a_N < 0$, then in every other representation $(a_N + kn, b_N - km)$ we have $a_N + kn < 0$ when $k \leq 0$, or $b_N - km < 0$ for $k > 0$, so $N$ is not purchasable.

## Putting it Together

Therefore, the set of non-purchasable integers is:

$$
\{am + bn : a < 0, \ 0 \leq b \leq m - 1\}
$$

To find the largest element of this set, we choose $a = -1$ and $b = m - 1$:

$$
-m + (m - 1)n \\
= mn - m - n
$$
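A brute-force check of the theorem (my own sketch, using $m = 3$, $n = 7$, so the answer should be $21 - 10 = 11$):

``` python
def purchasable(N, m, n):
    return any((N - a * m) >= 0 and (N - a * m) % n == 0
               for a in range(N // m + 1))

m, n = 3, 7
non_purchasable = [N for N in range(1, m * n + 1) if not purchasable(N, m, n)]
print(max(non_purchasable), m * n - m - n)  # 11 11
```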
diff --git a/Chinese Remainder Theorem (+ Cancellation Law Proof).md b/Chinese Remainder Theorem (+ Cancellation Law Proof).md
new file mode 100644
index 0000000..cfc8975
--- /dev/null
+++ b/Chinese Remainder Theorem (+ Cancellation Law Proof).md
@@ -0,0 +1,91 @@
#Math #NT

For the proof, let $p$ and $q$ be coprime.

# Rearrangement

$$
x \equiv a \mod p\\
x \equiv b \mod q
$$

Subtract $a$ from both congruences:

$$
x-a \equiv 0 \mod p\\
x-a \equiv b-a \mod q
$$

# The Underlying Problem

Let $m$ be an integer from $0$ to $q-1$ (inclusive), and $r$ be an integer from $0$ to $q-1$ (inclusive):

$$
mp \equiv r \mod q
$$

There are $q$ possible values of $m$, and $q$ possible values of $r$.

Since $p$ and $q$ are coprime, the remainders cannot repeat while $m \leq q-1$.

Therefore, there is a unique value of $m$ producing any given remainder $r$ in the above congruence.

# Putting it all Together

If we look at the last congruence in *Rearrangement*, we see it matches the one in *The Underlying Problem*, where $b-a$ corresponds to $r$, and $x-a$ corresponds to $mp$.

So, we can see there is exactly one solution for $x$ in the interval $0$ to $pq-1$ (inclusive).

We can extend this by noting that any solution plus $pq$ is again a solution, so the full solution set is a single residue class mod $pq$.

# The Underlying Problem (but rigour)

Again start with

$$
mp \equiv r \mod q
$$

Suppose $m_1$ and $m_2$ are two values of $m$ that give the same $r$. Then $pm_1 \equiv pm_2 \mod q$. By the cancellation law, $m_1 \equiv m_2 \mod q$, since $\gcd(p, q) = 1$.

### Cancellation Law Proof (Brownie Points)

$$
pm_1 - pm_2 = p(m_1 - m_2)
$$

We know $q$ divides $pm_1 - pm_2$, since the two products are congruent mod $q$; therefore $q$ divides the RHS. By Euclid's Lemma (as $\gcd(p, q) = 1$), $q$ must divide $m_1 - m_2$, meaning $m_1 \equiv m_2 \mod q$.

# Final Theorem

Let $p$ and $q$ be coprime. If:

$$
x \equiv a \mod p\\
x \equiv b \mod q
$$

then:

$$
x \: rem \: pq
$$

exists and is unique; that is, there is exactly one solution $x$ with $0 \leq x < pq$.

# Notes

$$
x \equiv 0 \mod y\\
x \: rem \: y = 0
$$

both mean $x$ is divisible by $y$.
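A tiny sketch of the theorem in action (my addition; brute force over one residue class, which the theorem guarantees contains exactly one solution):

``` python
def crt(a, p, b, q):
    """Smallest x >= 0 with x ≡ a (mod p) and x ≡ b (mod q), for coprime p, q."""
    for x in range(p * q):
        if x % p == a % p and x % q == b % q:
            return x

print(crt(2, 3, 3, 5))  # 8: the unique such x in [0, 15)
```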
diff --git a/Choosing Stuff.md b/Choosing Stuff.md
new file mode 100644
index 0000000..fd7244f
--- /dev/null
+++ b/Choosing Stuff.md
@@ -0,0 +1,33 @@
#Math #Probability

# Problem

Given $m$ items of one type and $n$ items of another type, what is the probability of choosing $l$ items of type one and $o$ items of type two if you pick $l + o$ items?

# Solution

Total ways to choose the items, not considering types:

$$
{m + n \choose l + o}
$$

Total ways to choose $l$ items of type one:

$$
{m \choose l}
$$

Total ways to choose $o$ items of type two:

$$
{n \choose o}
$$

Multiply the ways to choose each type to get the number of ways to choose $l$ items of type one and $o$ items of type two, then divide by the total number of combinations:

$$
\frac {{m \choose l} {n \choose o}} {{m + n \choose l + o}}
$$

(This ratio is the hypergeometric probability.)

diff --git a/Conditional Probability and Bayes Theorem.md b/Conditional Probability and Bayes Theorem.md
new file mode 100644
index 0000000..294383f
--- /dev/null
+++ b/Conditional Probability and Bayes Theorem.md
@@ -0,0 +1,40 @@
#Math #Probability

# Conditional Probability

Conditional probability, or the probability of $A$ given $B$, is:

$$
P(A|B)
$$

Let's start with $P(A \cap B)$, the probability of both events happening. In $P(A | B)$, $B$ is given to be true. Therefore, we must rescale $P(A \cap B)$ by dividing by $P(B)$:

$$
P(A | B) = \frac {P(A \cap B)} {P(B)}
$$

This defines $P(A | B)$ for events. When $P(A | B) = P(A)$, $A$ and $B$ are independent.

# Bayes' Theorem

Let's start with the definitions of conditional probability:

$$
P(A | B) = \frac {P(A \cap B)} {P(B)} \\
P(B | A) = \frac {P(A \cap B)} {P(A)}
$$

Rearrange the second equation to isolate $P(A \cap B)$:

$$
P(A \cap B) = P(A) P(B | A)
$$

Now substitute that equation into the first equation:

$$
P(A | B) = \frac {P(A) P(B | A)} {P(B)}
$$

The above equation is Bayes' Theorem for events.
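A short numerical example of the theorem (my addition, with made-up disease-testing numbers):

``` python
# P(disease) = 0.01, sensitivity P(+|disease) = 0.95,
# false positive rate P(+|healthy) = 0.05
p_d, p_pos_d, p_pos_h = 0.01, 0.95, 0.05

p_pos = p_d * p_pos_d + (1 - p_d) * p_pos_h   # total probability of testing +
p_d_pos = p_d * p_pos_d / p_pos               # Bayes' Theorem
print(p_d_pos)  # ≈ 0.161: a positive test is still far from certain
```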
diff --git a/Convolutions.md b/Convolutions.md
new file mode 100644
index 0000000..d2b7ba7
--- /dev/null
+++ b/Convolutions.md
@@ -0,0 +1,45 @@
#Math #Probability

# Discrete Case

Let's create a function expressing the probability that the results of two independent random variables, with PMFs $f$ and $g$, have a sum of $s$:

$$
\sum _{x = -\infty}^{\infty} f(x)g(s-x)
$$

Let's unpack this formula. The inside of the sum is the probability of a single way for the results to add to $s$ (the first comes out to $x$, the second to $s - x$). By using a summation, we run through every possible case where this happens.

This operation is called a discrete convolution. Convolutions are notated as

$$
[f * g](s)
$$

# Continuous Case

Extending the previous equation to continuous functions, we attain the definition:

$$
[f * g](s) = \int _{-\infty}^{\infty} f(x)g(s-x) dx
$$

Naturally, we'd expect this to be the probability density function of the sum. The reasoning is the same as for the discrete convolution, except it applies to probability densities at each infinitesimally small point rather than to probabilities.

# Summary

Convolutions return the probability, or the probability density, of the sum of two independent random variables (which one depends on the type of function you convolve).

They are defined by:

Discrete:

$$
[f * g](s) = \sum _{x = -\infty}^{\infty} f(x)g(s-x)
$$

Continuous:

$$
[f * g](s) = \int _{-\infty}^{\infty} f(x)g(s-x) dx
$$

diff --git a/Dearrangement.md b/Dearrangement.md
new file mode 100644
index 0000000..9af74c0
--- /dev/null
+++ b/Dearrangement.md
@@ -0,0 +1,45 @@
#Math #Probability

# Problem

How many ways are there to arrange a set of $n$ distinct elements such that no element is in its original position?

# Solution

The number of ways to arrange the set without any position restriction is:

$$
n!
$$

Now subtract the arrangements that keep some chosen element in its original position:

$$
n! - {n\choose 1}(n - 1)!
$$

However, we subtracted arrangements with two elements in their original position twice:

$$
n! - {n\choose 1}(n - 1)! + {n \choose 2}(n - 2)!
$$

Now, we re-added arrangements with three elements in their original position:

$$
n! - {n\choose 1}(n - 1)! + {n \choose 2}(n - 2)! - {n \choose 3}(n - 3)!
$$

This pattern continues by the principle of inclusion-exclusion (PIE), giving us:

$$
n! - {n\choose 1}(n - 1)! + {n \choose 2}(n - 2)! - {n \choose 3}(n - 3)! ... + (-1)^n{n \choose n}(n - n)!
$$

Since ${n \choose k}(n - k)! = \frac {n!} {k!}$, we can rewrite this as:

$$
\frac {n!} {0!} - \frac {n!} {1!} + \frac {n!} {2!} ... + (-1)^n\frac {n!} {n!} \\
= \sum _{k = 0}^n (-1)^k \frac {n!} {k!} \\
= n! \sum _{k = 0}^n \frac {(-1)^k} {k!}
$$

Note that the last factor is the truncated series for $e^{-1}$, so the count is approximately $\frac {n!} e$.

diff --git a/Derivatives.md b/Derivatives.md
new file mode 100644
index 0000000..e702cdd
--- /dev/null
+++ b/Derivatives.md
@@ -0,0 +1,91 @@
#Calculus #Math
# Intuition & Definition
How can instant rate of change be defined at a point?
Call our function of choice $y$:
Slope of $y$ between $x_1$ and $x_2$:
$$
\frac {y_1 - y_2} {x_1 - x_2}
$$
$$
= \frac {y(x_1) - y(x_2)} {x_1 - x_2}
$$
However, we cannot simply set $x_1 = x_2$, due to division by $0$.
## Definitions
Avoid division by $0$ by using a limit such that $x_1 \to x_2$:
$$
\frac {dy} {dx} = \lim_{x_1 \to x_2} \frac {y(x_1) - y(x_2)} {x_1 - x_2}
$$
Changing variables:
$$
\frac {dy} {dx} = \lim_{a \to x} \frac {y(a) - y(x)} {a - x}
$$
Substitute $a = x + \Delta x$, so $a \to x$ becomes $\Delta x \to 0$:
$$
\frac {dy} {dx} = \lim_{\Delta x \to 0} \frac {y(x + \Delta x) - y(x)} {\Delta x}
$$
# Derivative Rules
## Constant Rule
When $y = a$ and $a$ is constant:
$$
\frac {dy} {dx}
$$
$$
= \lim_{\Delta x \to 0} \frac {a - a} {\Delta x}
$$
$$
= \lim_{\Delta x \to 0} \frac {0} {\Delta x}
$$
$$
= 0
$$
## Sum and Difference Rule
$$
\frac {df} {dx} + \frac {dg} {dx}
$$
$$
= \lim_{\Delta x \to 0} \frac {f(x + \Delta x) - f(x)} {\Delta x} + \frac {g(x + \Delta x) - g(x)} {\Delta x}
$$
$$
= \lim_{\Delta x \to 0} \frac {[f(x + \Delta x) + g(x + \Delta x)] - [f(x) + g(x)]}{\Delta x}
$$
$$
= \frac d {dx} (f + g)
$$
## Power Rule

> **Note:** This proof of power rule only extends to $n \in \mathbb{N}$. Power rule can be extended to $n \in \mathbb{Z}$ through the use of the derivative of $\ln$, but this article does not cover such a proof as of now.

$$
\frac {d} {dx} x^n
$$
$$
= \lim_{\Delta x \to 0} \frac {(x + \Delta x)^n - x^n} {\Delta x}
$$

Use a binomial expansion:
$$
= \lim_{\Delta x \to 0} \frac {\sum_{i = 0}^n {n \choose i}x^i{\Delta x}^{n - i} - x^n} {\Delta x}
$$
Take out the last term in the sum (the $i = n$ term, which is $x^n$):
$$
= \lim_{\Delta x \to 0} \frac {x^n + \sum_{i = 0}^{n - 1} {n \choose i}x^i{\Delta x}^{n - i} - x^n} {\Delta x}
$$
$$
= \lim_{\Delta x \to 0} \frac {\sum_{i = 0}^{n - 1} {n \choose i}x^i{\Delta x}^{n - i}} {\Delta x}
$$
$$
= \lim_{\Delta x \to 0} \sum_{i = 0}^{n - 1} {n \choose i}x^i{\Delta x}^{n - i - 1}
$$
Bring the limit inside the sum:
$$
= \sum_{i = 0}^{n - 1} \left[{n \choose i}x^i \lim_{\Delta x \to 0} {\Delta x}^{n - i - 1}\right]
$$
For $i < n - 1$, $\lim_{\Delta x \to 0} \Delta x^{n - i - 1} = 0$, so only the case where $i = n - 1$ matters:
$$
= {n \choose {n - 1}} x^{n - 1}
$$
$$
= nx^{n - 1}
$$
>**Therefore:**
>$$ \frac d {dx} x^n = nx^{n - 1} $$

diff --git a/Deriving the Gamma Function.md b/Deriving the Gamma Function.md
new file mode 100644
index 0000000..8db2a05
--- /dev/null
+++ b/Deriving the Gamma Function.md
@@ -0,0 +1,34 @@
#Math #Calculus

# Extending the Factorial Function

We know $n!$ has a restricted domain of $n \in \mathbb{N}$, but we want to extend this function to $n \in \mathbb{R}$. To do this, we define two basic properties for the gamma function:

$$
n\Gamma(n) = \Gamma(n + 1) \\
\Gamma(n + 1) = n!, \space n\in \mathbb{N}
$$

# Derivation

We know repeated differentiation can generate a factorial, so we start by differentiating:

$$
\int _{0}^{\infty} e^{-ax} dx = \frac 1 a
$$

The **Leibniz Integral Rule** allows us to differentiate inside the integral, so by repeatedly differentiating with respect to $a$ and cancelling out the negative signs we get:

$$
\int _{0}^{\infty} xe^{-ax} dx = \frac 1 {a^2} \\
\int _{0}^{\infty} x^2e^{-ax} dx = \frac 2 {a^3} \\
\int _{0}^{\infty} x^ne^{-ax} dx = \frac {n!} {a^{n + 1}} \\
$$

Plugging in $a = 1$ gives $\int _{0}^{\infty} x^n e^{-x} dx = n!$; since we want $\Gamma(n + 1) = n!$, shifting the index by one gives:

$$
\Gamma(n) = \int _{0}^{\infty} x^{n - 1} e^{-x} dx
$$

Plugging this definition into the above properties (integration by parts handles the second) affirms that it defines the gamma function.
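A quick numerical check of the integral (my addition; a midpoint-rule sketch, truncating the upper limit where the tail is negligible):

``` python
import numpy as np

def gamma_integral(n, upper=60.0, steps=600_000):
    """Midpoint-rule estimate of ∫_0^upper x^(n-1) e^(-x) dx."""
    dx = upper / steps
    x = (np.arange(steps) + 0.5) * dx
    return np.sum(x ** (n - 1) * np.exp(-x)) * dx

print(gamma_integral(5))    # ≈ 24 = 4!
print(gamma_integral(1.5))  # ≈ 0.8862 ≈ √π / 2
```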
diff --git a/Epsilon Delta Definition of a Limit.md b/Epsilon Delta Definition of a Limit.md
new file mode 100644
index 0000000..f07172e
--- /dev/null
+++ b/Epsilon Delta Definition of a Limit.md
@@ -0,0 +1,55 @@
#Math #Calculus

# Definition

When

$$
\lim _{x \to c} f(x) = L
$$

for every value of $\epsilon > 0$ there is a value $\delta > 0$ such that whenever

$$
0 < |x - c| < \delta
$$

we have

$$
|f(x) - L| < \epsilon
$$

# Proving a Limit

Let's prove:

$$
\lim _{h \to 0} \frac {(x + h)^2 - x^2} h = 2x
$$

Simplify the quantity that must be smaller than $\epsilon$:

$$
|\frac {(x + h)^2 - x^2} h - 2x| \\
= |\frac {(x^2 + 2xh + h^2 - x^2)} h - 2x| \\
= |\frac {2xh + h^2} h - 2x| \\
= |2x + h - 2x| \\
= |h|
$$

So we have to find, for every $\epsilon$, a $\delta$ such that

$$
0 < |h - 0| < \delta \\
\implies |h| < \epsilon
$$

These two inequalities are the same, so they are easily satisfied just by setting:

$$
\delta = \epsilon
$$

# Graphical Explanation

[https://www.desmos.com/calculator/tucchymbrq](https://www.desmos.com/calculator/tucchymbrq)

diff --git a/Euler's Formula and Complex Trig Functions.md b/Euler's Formula and Complex Trig Functions.md
new file mode 100644
index 0000000..8a6b28e
--- /dev/null
+++ b/Euler's Formula and Complex Trig Functions.md
@@ -0,0 +1,57 @@
#Math #Trig

# Euler's Formula

Euler's formula states:

$$
e^{i \theta} = i\sin \theta + \cos \theta
$$

## Proof

Differentiate the ratio of the two sides, rewritten as a product, using the product rule:

$$
\frac d {d \theta} [e^{-i\theta}(i \sin \theta + \cos \theta)] \\
= (e^{-i\theta})(i \sin \theta + \cos \theta)' + (e^{-i\theta})' (i \sin \theta + \cos \theta) \\
= (e^{-i\theta})(i \cos \theta - \sin \theta) - i(e^{-i\theta})(i \sin \theta + \cos \theta) \\
= (e^{-i\theta})(i \cos \theta - \sin \theta) - (e^{-i\theta})(i \cos \theta - \sin \theta) \\
= 0
$$

Therefore $\frac {i \sin \theta + \cos \theta} {e^{i \theta}}$ is a constant. Plug in $\theta = 0$ to get $\frac {i \sin \theta + \cos \theta} {e^{i \theta}} = 1$. Multiply both sides by $e^{i\theta}$ to get

$$
e^{i \theta} = i\sin \theta + \cos \theta
$$

## Euler's Identity

Plug $\theta = \pi$ into Euler's Formula:

$$
e^{i \pi} = i\sin \pi + \cos \pi \\
e^{i \pi} = -1
$$

# Trig Functions Redefined

Sine:

$$
e^{i \theta} = i\sin \theta + \cos \theta \\
-e^{-i \theta} = -i\sin (-\theta) - \cos (-\theta) \\
-e^{-i \theta} = i\sin \theta - \cos \theta \\
e^{i\theta} - e^{-i\theta} = 2i \sin \theta \\
\sin \theta = \frac {e^{i\theta} - e^{-i\theta}} {2i}
$$

Cosine:

$$
e^{i \theta} = i\sin \theta + \cos \theta \\
e^{-i \theta} = i\sin (-\theta) + \cos (-\theta) \\
e^{-i \theta} = -i\sin \theta + \cos \theta \\
e^{i\theta} + e^{-i \theta} = 2\cos \theta \\
\cos \theta = \frac {e^{i\theta} + e^{-i \theta}} 2
$$

diff --git a/Fermat’s Little Theorem.md b/Fermat’s Little Theorem.md
new file mode 100644
index 0000000..cde9adb
--- /dev/null
+++ b/Fermat’s Little Theorem.md
@@ -0,0 +1,22 @@
#Math #NT

# Fermat's Little Theorem

If $p$ is a prime integer, then for any integer $a$ not divisible by $p$, and for any integer $a$ at all, respectively:

$$
a^{p - 1} \equiv 1 \mod p \\
a^p \equiv a \mod p
$$

# Proof

Let $p$ be a prime integer. Say a necklace has $p$ beads and $a$ possible colors per bead.
There are $a^p$ colorings in total. Except for the $a$ necklaces with only one color, each combination of necklace colors falls into a class of exactly $p$ rotations (since $p$ is prime), so $p$ divides $a^p - a$. Therefore:

$$
a^p \equiv a \mod p
$$

and when $\gcd(a, p) = 1$, cancelling $a$ gives $a^{p - 1} \equiv 1 \mod p$.

diff --git a/Fermet Euler Theorem.md b/Fermet Euler Theorem.md
new file mode 100644
index 0000000..2c11c0b
--- /dev/null
+++ b/Fermet Euler Theorem.md
@@ -0,0 +1,31 @@
#Math #NT

# Theorem

Let $a$ and $m$ be coprime numbers.

$$
a^{\phi(m)} \equiv 1 \mod m
$$

This is a generalization of Fermat's Little Theorem, which is the case where $m$ is prime (so $\phi(m) = m - 1$).

# Proof

Let:

$$
A = \{p_1, p_2, p_3,... p_{\phi(m)} \} \mod m \\
B = \{ap_1, ap_2, ap_3,...ap_{\phi(m)}\} \mod m
$$

where $p_x$ is the $x$th number relatively prime to $m$.

Since $a$ and $p_x$ are coprime to $m$, $ap_x$ is coprime to $m$. And if $ap_i \equiv ap_j \mod m$, then $p_i \equiv p_j \mod m$ by cancellation, so the $ap_x$ are distinct mod $m$, which makes set $B$ the same as set $A$.

Multiplying all the elements of each set together, and noting all terms are coprime to $m$ (so the product can be cancelled):

$$
a^{\phi(m)} \prod _{k = 1}^{\phi(m)} p_k \equiv \prod _{k = 1}^{\phi(m)} p_k \mod m \\
a^{\phi(m)} \equiv 1 \mod m
$$
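A quick exhaustive check of both theorems for small moduli (my addition, using Python's built-in three-argument `pow` for modular exponentiation):

``` python
from math import gcd

p, m = 13, 20
phi_m = sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)  # φ(20) = 8

# Fermat: a^(p-1) ≡ 1 (mod p) for every a not divisible by p
print(all(pow(a, p - 1, p) == 1 for a in range(1, p)))                    # True
# Fermat-Euler: a^φ(m) ≡ 1 (mod m) for every a coprime to m
print(all(pow(a, phi_m, m) == 1 for a in range(1, m) if gcd(a, m) == 1))  # True
```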
diff --git a/Fourier Series Proof.md b/Fourier Series Proof.md
new file mode 100644
index 0000000..2e67694
--- /dev/null
+++ b/Fourier Series Proof.md
@@ -0,0 +1,184 @@
#Math #Calculus

# Starting the Proof Off

The Taylor Series uses $x^n$ as building blocks for a function:
[[Taylor Series Proof]]

However, we can use $\sin (nx)$ and $\cos(nx)$ as well. This will be our starting point to derive the Fourier Series:

$$
f(x) = a_0\cos (0x) + b_0\sin(0x) + a_1\cos (x) + b_1\sin(x) + a_2\cos (2x) + b_2\sin(2x)... \\
f(x) = a_0 + \sum _{n = 1}^\infty (a_n\cos(nx) + b_n\sin(nx))
$$

This will be the basic equation we will use.

# Finding $a_0$

Let's integrate both sides of the equation over $[-\pi, \pi]$:

$$
\int _{-\pi}^\pi f(x) dx = \int _{-\pi}^\pi a_0 dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi a_n\cos(nx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi b_n\sin(nx) dx
$$

The first integral evaluates to $2\pi a_0$. Since the third integrand is an odd function, that integral evaluates to $0$. The second integral can be expressed as:

$$
a_n \int _{-\pi}^\pi \cos(nx) dx \\
= \frac {a_n} n (\sin(n\pi) - \sin(-n\pi)) \\
= 0
$$

So now we have:

$$
2\pi a_0 = \int _{-\pi}^\pi f(x) dx \\
a_0 = \frac 1 {2\pi} \int _{-\pi}^\pi f(x) dx
$$

# Finding $a_n$

Let's multiply the entire equation by $\cos(mx)$, where $m \in \mathbb{Z}^+$ ($m$ is a positive integer):

$$
f(x)\cos(mx) = a_0\cos(mx) + \sum _{n = 1}^\infty a_n\cos(nx)\cos(mx) + b_n\sin(nx)\cos(mx)
$$

Now integrate both sides over $[-\pi, \pi]$:

$$
\int _{-\pi}^\pi f(x)\cos(mx) dx = \int _{-\pi}^\pi a_0\cos(mx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi a_n\cos(nx)\cos(mx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi b_n\sin(nx)\cos(mx) dx
$$

We have three integrals on the right hand side to evaluate:

## First Integral

$$
\int _{-\pi}^\pi a_0 \cos(mx) dx \\
= \frac{a_0} m \sin(m\pi)- \frac{a_0} m \sin(-m\pi)
$$

Since $m\pi$ is always a multiple of $\pi$:

$$
=0
$$

## Second Integral

$$
\int _{-\pi}^\pi a_n\cos(nx)\cos(mx) dx
$$

Using the product-to-sum formula for $\cos$:

$$
= \frac {a_n} 2 \int _{-\pi}^\pi \cos(nx + mx) + \cos(nx - mx) dx \\
= \frac {a_n} 2 (\int _{-\pi}^\pi \cos(nx + mx) dx + \int _{-\pi}^\pi \cos(nx - mx) dx) \\
= [\frac {a_n} 2 (\frac {\sin(nx + mx)} {n + m} + \frac {\sin(nx - mx)} {n - m})]_{-\pi}^{\pi} \\
$$

Here you will notice that this integral doesn't work for $n = m$. We'll circle back to that later. For $n \neq m$, both $n + m$ and $n - m$ are nonzero integers, and the sine of an integer multiple of $\pi$ vanishes:

$$
= 0
$$

Now, circling back to the extra case, where $n = m$:

$$
a_m\int _{-\pi}^\pi \cos^2(mx)dx \\
= a_m\int _{-\pi}^\pi \frac {1 + \cos(2mx)} 2 dx \\
= a_m[\frac x 2 + \frac {\sin (2mx)} {4m} ]_{-\pi}^\pi \\
= a_m[(\frac {\pi} 2 + \frac {\sin (2m\pi)} {4m} ) - (\frac {-\pi} 2 + \frac {\sin (-2m\pi)} {4m} )] \\
= a_m\pi
$$

So, the second term on the right hand side evaluates to $a_m\pi$.

## Third Integral

$$
\int _{-\pi}^{\pi} \sin(nx)\cos(mx) dx \\
= \frac 1 2 \int _{-\pi}^{\pi} \sin(nx + mx) dx + \frac 1 2 \int _{-\pi}^\pi \sin(nx - mx) dx \\
= [-\frac 1 2(\frac {\cos(nx + mx)} {n + m} + \frac {\cos(nx - mx)} {n - m})]_{-\pi}^\pi \\
$$

(For $n = m$ the second integrand is $\sin 0 = 0$, so only the first term appears.) Since $\cos$ is even, the evaluations at $\pi$ and $-\pi$ cancel:

$$
= 0
$$

## Putting it Together

Now we have:

$$
\int _{-\pi}^\pi f(x)\cos(mx) dx = a_m\pi \\
\frac 1 \pi \int _{-\pi}^\pi f(x)\cos(mx) dx = a_m
$$

Note in this case $m$ and $n$ both represent any positive integer, and are therefore interchangeable:

$$
a_n = \frac 1 \pi \int _{-\pi}^\pi f(x)\cos(nx) dx \\
$$

# Finding $b_n$

Multiply the equation by $\sin (mx)$, where $m \in \mathbb{Z}^+$, integrate, and bound between $[-\pi, \pi]$:

$$
\int _{-\pi}^\pi f(x)\sin(mx) dx = \int _{-\pi}^\pi a_0\sin(mx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi a_n\cos(nx)\sin(mx) dx + \sum _{n = 1}^\infty \int _{-\pi}^\pi b_n\sin(nx)\sin(mx) dx
$$

The first two kinds of term are already covered, so let's focus on the final term.

## Last Integral

$$
\int _{-\pi}^\pi b_n\sin(nx)\sin(mx) dx \\
= \frac {b_n} 2 \int _{-\pi}^\pi \cos(nx - mx) - \cos(nx + mx) dx \\
= \frac {b_n} 2 [\frac {\sin(nx - mx)} {n - m} - \frac {\sin(nx + mx)} {n + m}]_{-\pi}^\pi
$$

Again, there is a special case where $n = m$.
For $n \neq m$, remember $\sin(k\pi) = 0$ for any integer $k$, so:

$$
= 0
$$

With the special case:

$$
b_m\int _{-\pi}^\pi \sin^2(mx) dx \\
= b_m\int _{-\pi}^\pi \frac {-\cos(2mx) + 1} 2 dx \\
= b_m[\frac 1 2 (x - \frac {\sin(2mx)} {2m})]_{-\pi}^\pi \\
= b_m\pi
$$

## Putting it Together

$$
b_m\pi = \int _{-\pi}^\pi f(x)\sin(mx) dx \\
b_m = \frac 1 \pi \int _{-\pi}^\pi f(x)\sin(mx) dx \\
b_n = \frac 1 \pi \int _{-\pi}^\pi f(x)\sin(nx) dx
$$

# Fourier Series

Using the above, let's express $f(x)$ as a Fourier Series:

$$
f(x) = \frac 1 {2\pi} \int _{-\pi}^\pi f(x) dx + \sum _{n = 1}^\infty \frac {\cos (nx)} \pi \int _{-\pi}^\pi f(x)\cos(nx) dx + \sum _{n = 1}^\infty \frac {\sin (nx)} \pi \int _{-\pi}^\pi f(x)\sin(nx) dx
$$

Note that this representation only works when the function repeats with period $2\pi$. Using a similar proof, we can get, for period $P$:

$$
f(x) = \frac 1 P \int _{-\frac P 2}^{\frac P 2} f(x) dx + \sum _{n = 1}^\infty \frac {2 \cos (\frac {2\pi nx} P)} P \int _{-\frac P 2}^{\frac P 2} f(x)\cos(\frac {2\pi nx} P) dx + \sum _{n = 1}^\infty \frac {2 \sin (\frac {2\pi nx} P)} P \int _{-\frac P 2}^{\frac P 2} f(x)\sin(\frac {2\pi nx} P) dx
$$

diff --git a/Fourier Series in Terms of e.md b/Fourier Series in Terms of e.md
new file mode 100644
index 0000000..f882f04
--- /dev/null
+++ b/Fourier Series in Terms of e.md
@@ -0,0 +1,26 @@
#Math #Calculus

# Proof

Let's express a Fourier Series as:

$$
v = \frac {2\pi nx} P \\
f(x) = \sum _{n = 0}^\infty A_n \cos v + B_n \sin v
$$

Using $\cos v = \frac {e^{iv} + e^{-iv}} 2$ and $\sin v = -i \frac {e^{iv} - e^{-iv}} 2$, we can deduce:

$$
f(x) = \sum _{n = 0}^{\infty} \frac {A_n e^{iv} + A_n e^{-iv} - iB_n e^{iv} + iB_n e^{-iv}} 2 \\
= \sum _{n = 0}^{\infty} 0.5(A_n + iB_n)e^{-iv} + 0.5(A_n - iB_n)e^{iv} \\
= \sum _{n = 0}^{\infty} \frac {e^{-iv}} P \int _{-P/2}^{P/2} f(x) (\cos v + i\sin v) dx + \frac {e^{iv}} P \int _{-P/2}^{P/2} f(x) (\cos (-v) + i\sin (-v)) dx \\
= \sum _{n = 0}^{\infty} \frac {e^{-iv}} P \int _{-P/2}^{P/2} f(x)e^{iv} dx + \frac {e^{iv}} P \int _{-P/2}^{P/2} f(x)e^{-iv} dx \\
= \sum _{n = -\infty}^{\infty} \frac {e^{iv}} P \int _{-P/2}^{P/2} f(x)e^{-iv} dx
$$

(the $e^{-iv}$ terms fold into the negative-$n$ half of the final sum; the $n = 0$ term must only be counted once when doing so)

## Definitions

Definitions of $A_n$ and $B_n$:

[[Fourier Series Proof]]

diff --git a/Hockey Stick Identity.md b/Hockey Stick Identity.md
new file mode 100644
index 0000000..2d2db0e
--- /dev/null
+++ b/Hockey Stick Identity.md
@@ -0,0 +1,27 @@
#Math #Probability

# Statement

For $n \geq r$, $n, r \in \mathbb{N}$:

$$
\sum _{i = r}^n {i \choose r} = {n + 1 \choose r + 1}
$$

# Proof

Let us take the base case $n = r$:

$$
{r \choose r} = {r + 1 \choose r + 1} = 1
$$

Now suppose $\sum _{i = r}^n {i \choose r} = {n + 1 \choose r + 1}$ for a certain $n$. Then, by Pascal's rule:

$$
\sum _{i = r}^n {i \choose r} + {n + 1 \choose r} \\
= {n + 1 \choose r + 1} + {n + 1 \choose r} \\
= {n + 2 \choose r + 1}
$$

Since the statement holds for $n = r$, and truth for $n$ implies truth for $n + 1$, by induction it is true for all $n \geq r$.

diff --git a/Hyperbolic Trig.md b/Hyperbolic Trig.md
new file mode 100644
index 0000000..987ed3b
--- /dev/null
+++ b/Hyperbolic Trig.md
@@ -0,0 +1,46 @@
#Math #Trig

# Definition

## Definition in terms of $e$

We define $\cosh$ and $\sinh$ to be the even and odd parts of $e^x$ respectively:

$$
\cosh x = \frac {e^x + e^{-x}} 2 \\
\sinh x = \frac {e^x - e^{-x}} 2
$$

Note this gives us:

$$
\sinh x + \cosh x = e^x
$$

similar to Euler's Formula for circular trig functions.
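An immediate algebraic payoff of these definitions (my added check) is the hyperbolic Pythagorean identity, which is exactly what places the points $(\cosh x, \sinh x)$ on the unit hyperbola used below:

$$
\cosh^2 x - \sinh^2 x = \frac {(e^x + e^{-x})^2 - (e^x - e^{-x})^2} 4 = \frac {(e^{2x} + 2 + e^{-2x}) - (e^{2x} - 2 + e^{-2x})} 4 = 1
$$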
## Definition in terms of a hyperbola

[https://www.desmos.com/calculator/ixmjpfmukk](https://www.desmos.com/calculator/ixmjpfmukk)

Know that the geometric definition of $\cosh$ is that $B = \cosh 2b$, where $b$ is the blue area. To find $b$, we can use:

$$
b = \frac {B\sqrt{B^2 - 1}} 2 -\int _1^B \sqrt {x^2 - 1} dx \\
= \frac {B\sqrt{B^2 - 1}} 2 - \frac {B\sqrt {B^2 - 1} - \ln(B + \sqrt {B^2 - 1})} 2\\
= \frac {\ln(B + \sqrt {B^2 - 1})} 2
$$

Now let $a = 2b = \ln(B + \sqrt {B^2 - 1})$. Now we can solve for $B$ in terms of $a$ to define $\cosh$ (note $e^{-a} = B - \sqrt{B^2 - 1}$ by rationalizing):

$$
a = \ln(B + \sqrt {B^2 - 1}) \\
e^a + e^{-a} = 2B \\
B = \frac {e^a + e^{-a}} 2 \\
\cosh x = \frac {e^x + e^{-x}} 2
$$

Now using the fact that $\cosh$ and $\sinh$ lie on a hyperbola (proved algebraically above) we get:

$$
\sinh x = \frac {e^x - e^{-x}} 2
$$

diff --git a/Laplace Transforms.md b/Laplace Transforms.md
new file mode 100644
index 0000000..cd86ade
--- /dev/null
+++ b/Laplace Transforms.md
@@ -0,0 +1,22 @@
#Calculus #Math
# Background - Analytic Continuation
$$
\int _0^\infty e^{-st} dt = \frac 1 {s}
$$
The integral only converges for $\operatorname{Re}(s) > 0$, but the closed form $\frac 1 s$ is used as an analytic continuation of the function beyond that region. For the Laplace Transform to work, most of the integrals used must be extended to their analytic continuations.
# Definition - Laplace Transform
$$
F(s) = \int _0^\infty f(t) e^{-st} dt
$$
# Intuition - The $e^{st}$ Finding Machine
Take $f(t)$ as $\sum c_n e^{a_nt}$. Plugging into the Laplace Transform:
$$
F(s) = \int _0^\infty \sum c_ne^{(a_n - s)t} dt
$$
$$
= \sum c_n \int _0^\infty e^{-(s - a_n)t} dt
$$
$$
= \sum \frac {c_n} {s - a_n}
$$
Therefore the Laplace Transform of a function reveals both the $c_n$ and the $a_n$ in the sum based upon the parts that make up the transform: the poles reveal all the $a_n$ values, while the "magnitude" (residue) of each pole reveals the coefficient $c_n$ of the corresponding $e^{a_nt}$ term.

diff --git a/Leibniz Integral Rule.md b/Leibniz Integral Rule.md
new file mode 100644
index 0000000..4e2abd8
--- /dev/null
+++ b/Leibniz Integral Rule.md
@@ -0,0 +1,44 @@
#Math #Calculus

# Theorem

Let $f(x, t)$ be such that both $f(x, t)$ and its partial derivative $f_x (x, t)$ are continuous in $t$ and $x$ in a region of the $xt$-plane such that $a(x) \leq t \leq b(x)$, $x_0 \leq x \leq x_1$. Also let $a(x)$ and $b(x)$ be continuous and have continuous derivatives for $x_0 \leq x \leq x_1$. Then, for $x_0 \leq x \leq x_1$:

$$
\frac d {dx} (\int _{a(x)}^{b(x)} f(x, t) dt) = f(x, b(x)) \cdot \frac d {dx} b(x) - f(x, a(x)) \cdot \frac d {dx} a(x) + \int _{a(x)}^{b(x)} \frac \partial {\partial x} f(x, t) dt
$$

Notably, with constant bounds this also means:

$$
\frac d {dx} (\int _{c_1}^{c_2} f(x, t) dt) = \int _{c_1}^{c_2} \frac \partial {\partial x} f(x, t) dt
$$

# Proof

Let $\varphi(x) = \int _a^b f(x, t) dt$ where $a$ and $b$ are functions of $x$. Define $\Delta a = a(x + \Delta x) - a(x)$ and $\Delta b = b(x + \Delta x) - b(x)$.
Then,

$$
\Delta \varphi = \varphi(x + \Delta x)- \varphi(x) \\
= \int _{a + \Delta a}^{b + \Delta b} f(x + \Delta x, t) dt - \int _a^b f(x, t) dt \\
$$

Now expand the first integral by integrating over 3 separate ranges:

$$
\int _{a + \Delta a}^a f(x + \Delta x, t) dt + \int _a^b f(x + \Delta x, t) dt + \int _b^{b + \Delta b} f(x + \Delta x, t) dt - \int _a^b f(x, t) dt \\
= -\int _a^{a + \Delta a} f(x + \Delta x, t) dt + \int _a^b [f(x + \Delta x, t) - f(x, t)]dt + \int _b^{b + \Delta b} f(x + \Delta x, t) dt
$$

From the mean value theorem for integrals we know $\int _a^b f(t) dt = (b - a)f(\xi)$ for some $\xi \in [a, b]$, which applies to the first and last integrals:

$$
\Delta \varphi = -\Delta a f(x + \Delta x, \xi_1) + \int _a^b [f(x + \Delta x, t) - f(x, t)]dt + \Delta b f(x + \Delta x, \xi_2) \\
\frac {\Delta \varphi} {\Delta x} = -\frac {\Delta a} {\Delta x} f(x + \Delta x, \xi_1) + \int _a^b \frac {f(x + \Delta x, t) - f(x, t)} {\Delta x} dt + \frac {\Delta b} {\Delta x} f(x + \Delta x, \xi_2) \\
$$

Now as we let $\Delta x \to 0$, we can express many of the terms as definitions of derivatives (note we pass the limit sign through the integral via the bounded convergence theorem). Note now that $\xi_1 \to a$ and $\xi_2 \to b$, which gives us:

$$
\frac d {dx} \int _a^b f(x, t) dt = -\frac {da} {dx} f(x, a) + \int _a^b \frac {\partial} {\partial x} f(x, t) dt + \frac {db} {dx} f(x, b) \\
$$

diff --git a/Limits.md b/Limits.md
new file mode 100644
index 0000000..000052b
--- /dev/null
+++ b/Limits.md
@@ -0,0 +1,33 @@
#Math #Calculus

A limit describes the value an expression approaches as a variable gets arbitrarily close to another number, without ever needing to actually reach it. It is notated by

$$
\lim _{x \to y}
$$

where $x$ approaches $y$.

You can substitute numbers in for limit variables, such as

$$
\lim _{x \to 1} x + 1 = 2
$$

Limits can sidestep certain constraints. For example,

$$
\frac {1-x} {1-x}
$$

is not defined at $x=1$; however,

$$
\lim _{x \to 1} \frac {1-x} {1-x} = 1
$$

Limits can also approach infinity, allowing "infinity" to be used in certain situations:

$$
\lim _{x \to \infty} \frac 1 x = 0
$$

diff --git a/Matrices.md b/Matrices.md
new file mode 100644
index 0000000..1d1bf1a
--- /dev/null
+++ b/Matrices.md
@@ -0,0 +1,45 @@
#Math #Algebra

A matrix is an $n$ by $m$ grid of values. A $4 \times 3$ matrix can be notated by:

$$
\begin{bmatrix}a_1 & a_2 & a_3 \cr b_1 & b_2 & b_3 \cr c_1 & c_2 & c_3 \cr d_1 & d_2 & d_3 \end{bmatrix}
$$

To get a value from matrix $a$ in row $r$ and column $c$, use:

$$
a_{r, c}
$$

# Addition

With two matrices of the same order, add corresponding elements:

$$
\begin{bmatrix} a_1 & b_1 \cr c_1 & d_1 \end{bmatrix} + \begin{bmatrix} a_2 & b_2 \cr c_2 & d_2 \end{bmatrix} = \begin{bmatrix} a_1 + a_2 & b_1 + b_2 \cr c_1 + c_2 & d_1 + d_2 \end{bmatrix}
$$

# Subtraction

With two matrices of the same order, subtract corresponding elements:

$$
\begin{bmatrix} a_1 & b_1 \cr c_1 & d_1 \end{bmatrix} - \begin{bmatrix} a_2 & b_2 \cr c_2 & d_2 \end{bmatrix} = \begin{bmatrix} a_1 - a_2 & b_1 - b_2 \cr c_1 - c_2 & d_1 - d_2 \end{bmatrix}
$$

# Scalar Multiplication

When multiplying a matrix by a scalar, multiply each element by said scalar:

$$
s\begin{bmatrix} a & b \cr c & d \end{bmatrix} = \begin{bmatrix} sa & sb \cr sc & sd \end{bmatrix}
$$

# Matrix Multiplication

Let $a$ be an $i$ by $j$ matrix and $b$ be an $m$ by $n$ matrix. If $j = m$, $ab$ is defined.
$$
(ab)_{c, d} = \sum _{k = 1}^{j} a_{c, k}b_{k, d}
$$

The product $ab$ is then an $i$ by $n$ matrix.

diff --git a/Pascal’s Triangle and Binomial Coefficients.md b/Pascal’s Triangle and Binomial Coefficients.md
new file mode 100644
index 0000000..9b234de
--- /dev/null
+++ b/Pascal’s Triangle and Binomial Coefficients.md
@@ -0,0 +1,44 @@
#Math #Probability

# Observing Pascal's Triangle

| n/k | 0 | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | | | | | |
| 1 | 1 | 1 | | | | |
| 2 | 1 | 2 | 1 | | | |
| 3 | 1 | 3 | 3 | 1 | | |
| 4 | 1 | 4 | 6 | 4 | 1 | |
| 5 | 1 | 5 | 10 | 10 | 5 | 1 |

As you can see, Pascal's Triangle generates:

$$
{n \choose k}
$$

or

$$
\frac{n!}{k!(n-k)!}
$$

But how does this work?

First, we can manually prove the top two rows of Pascal's Triangle by plugging the values into the binomial coefficient formula.

Afterward, we can use the defining property of Pascal's Triangle, taking Pascal's Triangle as a function $P$:

$$
P(n + 1, k) = P(n, k) + P(n, k-1)
$$

By proving this property holds for the binomial coefficient formula, we can deduce that Pascal's Triangle generates binomial coefficients.

# The Proof

$$
\frac{n!}{k!(n-k)!}+\frac{n!}{(k-1)!(n-k+1)!}\\=\frac{n!(n-k+1)}{k!(n-k)!(n-k+1)}+\frac{n!k}{(k-1)!(n-k+1)!k}\\=\frac{n!(n-k+1)}{k!(n-k+1)!}+\frac{n!k}{k!(n-k+1)!}\\=\frac{n!(n-k+1+k)}{k!(n-k+1)!}\\=\frac{n!(n+1)}{k!(n-k+1)!}\\=\frac{(n+1)!}{k!(n-k+1)!}
$$

From this, we have proven that we can generate binomial coefficients using Pascal's Triangle.

diff --git a/Poisson Distribution.md b/Poisson Distribution.md
new file mode 100644
index 0000000..30a3558
--- /dev/null
+++ b/Poisson Distribution.md
@@ -0,0 +1,97 @@
#Math #Probability

# The Poisson Distribution

The Poisson Distribution describes the probability that an event occurs $k$ times in an interval of time, given the mean number of times the event happens in such an interval.

# Binomial Distribution to Poisson Distribution

Binomial Distribution:

$$
\frac {n!} {k!(n-k)!} p^k (1-p)^{n-k}
$$

Binomial Distribution with infinite trials:

$$
\lim _{n\to\infty} \frac {n!} {k!(n-k)!} p^k (1-p)^{n-k}
$$

Let $a$ be $np$, the average number of successes across the $n$ trials, so $p = \frac a n$. This gives us the Poisson Distribution in another form:

$$
\lim _{n\to\infty} \frac {n!} {k!(n-k)!} (\frac {a} {n})^k (1-\frac {a} {n})^{n-k}
$$

$$
\lim _{n\to\infty} \frac {n!} {k!(n-k)!} (\frac {a^k} {n^k}) (1-\frac {a} {n})^n(1-\frac {a} {n})^{-k}
$$

$$
\frac {a^k} {k!} \lim _{n\to\infty} \frac {n!} {n^k(n-k)!} (1-\frac {a} {n})^n(1-\frac {a} {n})^{-k}
$$

Now we have three limits to evaluate.

# Evaluating the Limits

## First Limit

$$
\lim _{n \to\infty} \frac {n!} {n^k(n-k)!}
$$

$$
\lim _{n \to\infty} \frac {n(n-1)(n-2)...(n-k)(n-k-1)...(1)} {n^k(n-k)(n-k-1)...(1)}
$$

$$
\lim _{n\to\infty} \frac {n(n-1)...(n-k+1)} {n^k}
$$

$$
\lim _{n\to\infty} (\frac {n} {n})(\frac {n-1} {n})...(\frac {n-k+1} {n})
$$

As $n$ goes to infinity, all the factors tend to $1$. Therefore, the limit tends to $1$.

## Second Limit

$$
\lim _{n\to\infty} (1-\frac {a} {n})^n
$$

Let $u$ be $-\frac n a$ (note this tends to negative infinity):

$$
\lim _{u\to -\infty}(1+\frac {1} {u})^{-au}
$$

Use the definition of $e$:

$$
e^{-a}
$$

## Third Limit

$$
\lim _{n\to\infty}(1-\frac{a} {n})^{-k}
$$

$\frac a n$ tends to $0$:

$$
(1 - 0)^{-k} = 1
$$

Therefore this limit tends to 1.

# Putting it together

$$
\frac {e^{-a}a^{k}}{k!}
$$

is the probability of an event happening $k$ times in an interval of time, where $a$ is the mean number of times the event happens in such an interval. This is the formula for the Poisson Distribution.
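A numerical sketch of the convergence just derived (my addition): holding $a = np$ fixed, the binomial probabilities approach the Poisson formula as $n$ grows.

``` python
from math import comb, exp, factorial

a, k = 4.0, 6
for n in (10, 100, 10_000):
    p = a / n
    print(n, comb(n, k) * p**k * (1 - p)**(n - k))  # 0.1115..., 0.1052..., 0.1042...

print("poisson", exp(-a) * a**k / factorial(k))     # ≈ 0.1042
```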
diff --git a/Probability of Choosing 2 Coprime Numbers.md b/Probability of Choosing 2 Coprime Numbers.md
new file mode 100644
index 0000000..cf7f1ca
--- /dev/null
+++ b/Probability of Choosing 2 Coprime Numbers.md
@@ -0,0 +1,37 @@
#Math #NT #Probability

# Problem

Calculate (in the natural-density sense):

$$
P(x, y \in \mathbb{N}: \gcd(x, y) = 1)
$$

# Solution

Each number has a $\frac 1 p$ chance to be divisible by the prime $p$, so the probability that two numbers do not share the prime factor $p$ is

$$
1 - p^{-2}
$$

Treating divisibility by distinct primes as independent, the probability that two numbers are coprime is:

$$
\prod _{p \in \mathbb{P}} 1 - p^{-2}
$$

Since $1 - x = (\frac 1 {1 - x})^{-1} = (\sum _{n = 0}^{\infty} x^n)^{-1}$, we can express the above as:

$$
(\prod _{p \in \mathbb{P}} \sum _{n = 0}^{\infty} p^{-2n})^{-1}
$$

Expanding the product chooses an exponent $n$ for $p^{-2n}$ for each prime $p$, so by the Unique Factorization Theorem (any natural number can be prime factored one and only one way), every term $m^{-2}$ appears exactly once and we get:

$$
(\sum _{n = 1}^{\infty} n^{-2})^{-1} \\
= (\frac {\pi^2} 6)^{-1} \\
= \frac 6 {\pi^2}
$$

diff --git a/Rational Root Theorem.md b/Rational Root Theorem.md
new file mode 100644
index 0000000..b48684a
--- /dev/null
+++ b/Rational Root Theorem.md
@@ -0,0 +1,49 @@
#Math #Algebra

# Proof

Let polynomial

$$
P(x) = \sum _{i = 0}^n c_i x^i
$$

where $c_i \in \mathbb{Z}$ (all values of $c$ are integers).

Now let $P(\frac p q) = 0$, where $p$ and $q$ are coprime integers (let a fraction $\frac p q$ be in simplest form and be a root of $P$).

$$
\sum _{i = 0}^n c_i (\frac p q)^i = 0
$$

Multiplying by $q^n$:

$$
\sum _{i = 0}^n c_i p^i q^{n - i} = 0
$$

Now subtract $c_0 q^n$ from both sides and factor $p$ out to get:

$$
p\sum _{i = 1}^n c_i p^{i - 1} q^{n - i} = -c_0 q^n
$$

Now $p$ must divide $-c_0q^n$. However, we know $p$ cannot divide $q^n$ (since $\frac p q$ is in simplest form / $p$ and $q$ are coprime), so $p$ must divide $c_0$.

Doing the same thing as above but with the $c_n$ term and $q$:

$$
q\sum _{i = 0}^{n - 1} c_i p^i q^{n - i - 1} = -c_n p^n
$$

By the above logic, $q$ must divide $c_n$.

## Conclusion

For all rational roots in simplest form ($\frac p q$ where $p$ and $q$ are coprime integers), $p$ must be a factor of the constant coefficient $c_0$, while $q$ must be a factor of the leading coefficient $c_n$.

## Notes

For the curious, coprime integers $p$ and $q$ mean that $\gcd(p, q) = 1$.

If future me or someone else is wondering about the excess definitions, this was made for a friend.
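A small sketch that operationalizes the theorem (my addition; assumes $c_0 \neq 0$, and uses exact `Fraction` arithmetic to test each candidate):

``` python
from fractions import Fraction

def rational_roots(coeffs):
    """coeffs = [c0, c1, ..., cn]; try every ±p/q with p | c0 and q | cn."""
    c0, cn = coeffs[0], coeffs[-1]
    divisors = lambda k: [d for d in range(1, abs(k) + 1) if k % d == 0]
    candidates = {Fraction(sign * p, q)
                  for p in divisors(c0) for q in divisors(cn) for sign in (1, -1)}
    return sorted(r for r in candidates
                  if sum(c * r**i for i, c in enumerate(coeffs)) == 0)

# 2x^3 + 3x^2 - 8x + 3 = (x - 1)(2x - 1)(x + 3)
print(rational_roots([3, -8, 3, 2]))  # [-3, 1/2, 1]
```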
diff --git a/Skip Gram.md b/Skip Gram.md
new file mode 100644
index 0000000..31440ec
--- /dev/null
+++ b/Skip Gram.md
@@ -0,0 +1,263 @@
#Coding
# Abstract

> "No one is going to implement word2vec from scratch" or sm 🤓
> commentary like that idk

This notebook provides a brief explanation and implementation of a Skip
Gram model, one of the two types of models word2vec refers to.

# Intuition

## Problem

Given a corpus C, map all tokens to vectors such that words with
similar semantics (similar probability of appearing within a context)
are mapped close to each other.

## Idea

**The idea of a skip gram model proceeds from these two observations:**

1. Similar words should appear in similar contexts
2. Similar words should appear together

The intuition behind the Skip Gram model is to map a target token to all
the words appearing in a context window around it.

> The MIMS major **Quentin** is a saber fencer.

In this case the target token **Quentin** should map to all the other
tokens in the window. As such the target token should have similar
mappings to words such as MIMS, saber, and fencer.

Skip Gram treats each vector representation of a token as a set of
weights, and uses a linear-linear-softmax model to optimize them. At the
end, the first set of weights is a list of $n$ vectors mapping each token
to a prediction of output tokens - solving the initial mapping problem.

# Code & Detailed Implementation

## Preprocessing

Tokenize all the words, and build training pairs using words in a
context window:

``` python
import numpy as np

class Preproccess:

    @staticmethod
    def tokenize(text):

        """Returns a list of lowercase tokens"""

        return "".join([t for t in text.lower().replace("\n", " ") if t.isalpha() or t == " "]).split(" ")

    @staticmethod
    def build_vocab(tokens, min_count=1):

        """Create an id to word and a word to id mapping"""

        token_counts = {}
        for token in tokens:
            if token not in token_counts:
                token_counts[token] = 0
            token_counts[token] += 1

        sorted_tokens = sorted(token_counts.items(), key=lambda t:t[1], reverse=True) # Sort tokens by frequency
        vocab = {}
        id_to_word = []
        for token, count in sorted_tokens:
            if count < min_count:
                break
            vocab[token] = len(id_to_word)
            id_to_word.append(token)

        return vocab, id_to_word

    @staticmethod
    def build_pairs(tokens, vocab, window_size=5):

        """Generate training pairs"""

        pairs = []
        token_len = len(tokens)

        for center in range(token_len):
            tokens_before = tokens[max(0, center-window_size):center]
            tokens_after = tokens[(center + 1):min(token_len, center + 1 + window_size)]
            context_tokens = tokens_before + tokens_after
            for context in context_tokens:
                if tokens[center] in vocab and context in vocab:
                    pairs.append((tokens[center], context))

        return pairs

    @staticmethod
    def build_neg_sample(word, context, vocab, samples=5):

        """Build negative samples (token ids that are neither target nor context)"""

        neg_words = [vocab[w] for w in vocab if (w != word) and (w != context)]
        return np.random.choice(neg_words, size=samples, replace=False)
```

## Build Model

- 3 layers used as an optimizer:
    - $L_1 = XW_1$
    - $S = W_2 L_1$
    - $P = \text{softmax}(S)$
- Loss function: $-\sum \log(P_{\text{context}} | P_{\text{target}})$
- Negative sampling used to speed up training, compare and update
  against ~20 negative vocab terms instead of updating all weights

``` python
class Word2Vec:

    def __init__(self, vocab_size, embedding_dim=100):

        """Initialize weights"""

        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.W1 = np.random.normal(0, 0.1, (vocab_size, embedding_dim)) # First layer - word encoding
        self.W2 = np.random.normal(0, 0.1, (embedding_dim, vocab_size)) # Second layer - context encoding

    def sigmoid(self, x):

        """Numerically stable sigmoid"""

        x = np.clip(x, -500, 500)
        return 1 / (1 + np.exp(-x))

    def cross_entropy_loss(self, probability):

        """Cross entropy loss function"""

        return -np.log(probability + 1e-10) # 1e-10 added for numerical stability
## Build Model

- 3 layers used as an optimizer:
  - $L_1 = XW_1$ (for a one-hot $X$, this just selects a row of $W_1$)
  - $S = L_1 W_2$
  - $P = \text{softmax}(S)$
- Loss function: $-\sum \log P(\text{context} \mid \text{target})$
- Negative sampling used to speed up training: compare and update against a handful of sampled negative vocab terms (5 below; roughly 5-20 is typical) instead of updating all of $W_2$ through the softmax

``` python
class Word2Vec:

    def __init__(self, vocab_size, embedding_dim=100):
        """Initialize weights"""
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.W1 = np.random.normal(0, 0.1, (vocab_size, embedding_dim))  # First layer - word encoding
        self.W2 = np.random.normal(0, 0.1, (embedding_dim, vocab_size))  # Second layer - context encoding

    def sigmoid(self, x):
        """Numerically stable sigmoid"""
        x = np.clip(x, -500, 500)
        return 1 / (1 + np.exp(-x))

    def cross_entropy_loss(self, probability):
        """Cross entropy loss function"""
        return -np.log(probability + 1e-10)  # 1e-10 added for numerical stability

    def neg_sample_train(self, center_token, context_token, negative_tokens, learning_rate=0.01):
        """Negative sampling training for a single training pair"""
        total_loss = 0
        total_W1_gradient = np.zeros(self.embedding_dim)  # accumulated (lr-scaled) gradient for the center row

        # Forward prop for positive case
        center_embedding = self.W1[center_token, :]       # L₁ = XW₁ (row select, since X is one-hot)
        context_vector = self.W2[:, context_token]
        score = np.dot(center_embedding, context_vector)  # S = L₁W₂, but only for the context token's column
        sigmoid_score = self.sigmoid(score)
        total_loss += self.cross_entropy_loss(sigmoid_score)

        # Backward prop for positive case
        score_gradient = sigmoid_score - 1                # ∂L/∂S for the positive pair
        W2_gradient = center_embedding * score_gradient   # ∂L/∂W₂ = ∂L/∂S * ∂S/∂W₂
        W1_gradient = context_vector * score_gradient     # ∂L/∂W₁ = ∂L/∂S * ∂S/∂W₁

        # Update weights (step against the gradient)
        self.W2[:, context_token] -= learning_rate * W2_gradient
        total_W1_gradient += learning_rate * W1_gradient

        for neg_token in negative_tokens:

            # Forward prop for negative case
            neg_vector = self.W2[:, neg_token]
            neg_score = np.dot(center_embedding, neg_vector)
            neg_sigmoid_score = self.sigmoid(neg_score)
            total_loss += -np.log(1 - neg_sigmoid_score + 1e-10)  # loss for the negative label

            # Backward prop for negative case
            neg_score_gradient = neg_sigmoid_score                   # ∂L/∂S for a negative pair
            neg_W2_gradient = center_embedding * neg_score_gradient
            neg_W1_gradient = neg_vector * neg_score_gradient

            # Update weights
            self.W2[:, neg_token] -= learning_rate * neg_W2_gradient
            total_W1_gradient += learning_rate * neg_W1_gradient

        # Update W1 (clipped so one noisy pair can't blow up the embedding)
        total_W1_gradient = np.clip(total_W1_gradient, -1, 1)
        self.W1[center_token, :] -= total_W1_gradient

        return total_loss

    def find_similar(self, token):
        """Use cosine similarity to find similar words"""
        word_vec = self.W1[token, :]
        similar = []
        for i in range(self.vocab_size):
            if i != token:
                other_vec = self.W1[i, :]
                norm_word = np.linalg.norm(word_vec)
                norm_other = np.linalg.norm(other_vec)
                if norm_word > 0 and norm_other > 0:
                    cosine_sim = np.dot(word_vec, other_vec) / (norm_word * norm_other)
                else:
                    cosine_sim = 0
                similar.append((cosine_sim, i))
        similar.sort(key=lambda x: x[0], reverse=True)
        return [word_id for _, word_id in similar]
```

## Run Model

``` python
def epoch(model, pairs, vocab):
    """One full pass over the training pairs; prints a crude progress bar"""
    loss = 0
    pair_len = len(pairs)
    done = 0
    for word, context in pairs:
        neg_samples = Preprocess.build_neg_sample(word, context, vocab, samples=5)
        loss += model.neg_sample_train(word, context, neg_samples)
        done += 1
        if ((100 * done) / pair_len) // 1 > ((100 * done - 100) / pair_len) // 1:  # once per whole percent
            print("_", end="", flush=True)
    print()
    return loss

with open("corpus.txt") as corpus_file:
    CORPUS = corpus_file.read()

EPOCHS = 100
tokens = Preprocess.tokenize(CORPUS)
vocab, id_to_token = Preprocess.build_vocab(tokens, min_count=3)
print("~VOCAB LEN~:", len(vocab))
pairs = Preprocess.build_pairs(tokens, vocab, window_size=5)
model = Word2Vec(len(id_to_token), embedding_dim=100)
print("~STARTING TRAINING~")
for i in range(EPOCHS):
    print(f"Epoch {i}: {epoch(model, pairs, vocab) / len(pairs)}")  # average loss per pair
print("~FINISHED TRAINING~")
```
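Once training finishes, a quick way to eyeball the embeddings is a nearest-neighbor lookup (a sketch; "fencer" is a hypothetical query - swap in any token that survived the `min_count` filter):

``` python
query = "fencer"  # hypothetical token - use any word actually in vocab
if query in vocab:
    nearest = model.find_similar(vocab[query])[:5]  # ids of the 5 most similar tokens
    print(query, "->", [id_to_token[i] for i in nearest])
```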
# Notes (Pedantic Commentary Defense :P)

1. I use the terms "similar" and "related" in reference to words, which implies some sort of meaning is encoded. However, in practice word2vec is just looking for words with high probabilities of appearing in similar contexts, which happens to correlate with "meaning" decently well.
2. CBOW shares a very similar intuition to Skip Gram; the only difference is which way you map between a target token and its context tokens.
3. Of course, a good deal of mathematical pain can be shaved off this exercise by using Tensorflow (here is a [Colab](https://colab.research.google.com/github/tensorflow/text/blob/master/docs/tutorials/word2vec.ipynb#scrollTo=iLKwNAczHsKg) from Tensorflow that does it) - but this is done from scratch so the inner workings of word2vec can be more easily seen.
4. Results are (very) subpar with a small corpus size, and this isn't optimized for GPUs sooo... at least the error goes down!

# Sources

1. 
2. (worth a read - not a long paper and def on the less math intensive side of things)
3. 

diff --git a/Taylor Series Proof.md b/Taylor Series Proof.md new file mode 100644 index 0000000..3548b81 --- /dev/null +++ b/Taylor Series Proof.md @@ -0,0 +1,61 @@
#Math #Calculus

Represent the function as a power series:

$$
f(x) = \sum _{n=0}^{\infty} c_n (x-a)^n
$$

Plug in $x = a$ to find $c_0$:

$$
c_0=f(a)
$$

Take the derivative of the function:

$$
\frac d {dx} f(x) = \sum _{n=0}^\infty c_{n+1} (n+1)(x-a)^n
$$

Plug in $x = a$ to find $c_1$:

$$
c_1=\frac {d} {dx} f(a)
$$

Take the second derivative of the function:

$$
\frac {d^2} {dx^2} f(x) = \sum _{n=0}^\infty c_{n+2} (n+1)(n+2)(x-a)^n
$$

Find $c_2$:

$$
c_2=\frac {\frac {d^2} {dx^2} f(a)} {2}
$$

Take the third derivative of the function:

$$
\frac {d^3} {dx^3} f(x) = \sum _{n=0}^\infty c_{n+3} (n+1)(n+2)(n+3)(x-a)^n
$$

Find $c_3$:

$$
c_3=\frac {\frac {d^3} {dx^3} f(a)} {6}
$$

Each differentiation multiplies the leading coefficient by the next integer, giving the general formula for the $n$th element of $c$:

$$
c_n = \frac {\frac {d^n} {dx^n}f(a)} {n!}
$$

Create the general formula for the function as a polynomial:

$$
f(x)=\sum _{n=0}^\infty \frac {\frac {d^n} {dx^n}f(a)} {n!} (x-a)^n
$$
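As a quick sanity check, apply the formula to $f(x) = e^x$ at $a = 0$: every derivative of $e^x$ is $e^x$, so $\frac {d^n} {dx^n} f(0) = 1$ for all $n$, recovering the familiar series:

$$
e^x = \sum _{n=0}^\infty \frac {x^n} {n!}
$$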
\ No newline at end of file
diff --git a/The Basel Problem.md b/The Basel Problem.md new file mode 100644 index 0000000..b10b14a --- /dev/null +++ b/The Basel Problem.md @@ -0,0 +1,45 @@
#Math #NT

# Basel Problem Solution

## Base Sum

Start from the identity $\csc^2 \theta = \frac 1 4 (\csc^2 \frac \theta 2 + \csc^2 \frac {\theta + \pi} 2)$ and repeatedly split:

$$
\frac {\pi^2} 4 \\
= \frac {\pi^2} 4 \csc^2 (\frac \pi 2) \\
= \frac {\pi^2} {4^2} (\csc^2 (\frac \pi 4) + \csc^2 (\frac \pi 4 + \frac \pi 2))
$$

Do this operation $a$ times, with the above equation showing $a = 1$:

$$
= \frac {\pi^2} {4^{a + 1}}\sum _{n = 1}^{2^{a}} \csc^2(\frac \pi {2^{a+1}} + \frac {(n - 1)\pi} {2^a}) \\
= \sum _{n = 1}^{2^{a}} \frac {\pi^2} {4^{a + 1}} \csc^2(\frac \pi {2^{a+1}} + \frac {(n - 1)\pi} {2^a}) \\
= \sum _{n = 1}^{2^{a}} (\frac {2^{a + 1}} \pi \sin(\frac \pi {2^{a+1}} + \frac {(n - 1)\pi} {2^a}))^{-2}
$$

The angles are the odd multiples $\frac {(2n - 1)\pi} {2^{a+1}}$, which pair up symmetrically about $\frac \pi 2$ (hence the factor of $2$ below). As $a$ approaches $\infty$, each factor $\frac {2^{a+1}} \pi \sin (\frac {(2n - 1)\pi} {2^{a+1}})$ approaches $2n - 1$, so:

$$
= 2\sum _{n=1}^{\infty} (2n - 1)^{-2}
$$

Therefore:

$$
\sum _{n = 1}^{\infty} (2n - 1)^{-2} = \frac {\pi^2} {8}
$$

## Manipulating this Sum

The even terms of $\sum n^{-2}$ contribute a quarter of the whole sum, so the odd terms contribute the remaining three quarters:

$$
\sum _{n = 1}^{\infty} (2n)^{-2} = \frac 1 4 \sum _{n = 1}^{\infty} n^{-2} \\
\sum _{n = 1}^{\infty} (2n - 1)^{-2} = \frac 3 4 \sum _{n = 1}^{\infty} n^{-2} \\
\frac {\pi ^2} 8 = \frac 3 4 \sum _{n = 1}^{\infty} n^{-2}
$$

Therefore:

$$
\frac {\pi ^2} 6 = \sum _{n = 1}^{\infty} n^{-2}
$$
\ No newline at end of file
diff --git a/Totient Function.md b/Totient Function.md new file mode 100644 index 0000000..e55ab7a --- /dev/null +++ b/Totient Function.md @@ -0,0 +1,81 @@
#Math #NT

# Definition

Euler's totient function returns the number of integers $1 \leq k \leq n$ that are coprime to the positive integer $n$. It is notated as:

$$
\phi(n)
$$

# $\phi(n)$ for Prime Powers

For a prime power $p^k$, the only positive integers $n \leq p^k$ with $\gcd(p^k, n) > 1$ are the multiples of $p$, that is $n = mp$ for $1 \leq m \leq p^{k - 1}$. Therefore:

$$
\phi(p^k) \\ = p^k - p^{k - 1} \\ = p^{k - 1}(p - 1) \\ = p^k(1 - \frac 1 p)
$$

# Multiplicative Property of $\phi$

If $m$ and $n$ are coprime:

$$
\phi(m)\phi(n) = \phi(mn)
$$

Proof: Let set $A$ be all numbers coprime to $m$ below $m$, and set $B$ be all numbers coprime to $n$ below $n$.

$$
|A| = \phi(m) \\ |B| = \phi(n)
$$

Let set $D$ be all ordered pairs whose first element comes from $A$ and whose second comes from $B$. For each element $(k_1, k_2)$ in set $D$, return a value $\theta$ where:

$$
\theta \equiv k_1 \mod m \\ \theta \equiv k_2 \mod n
$$

CRT ensures $\theta$ exists and is unique $\mod mn$. Given the fact $\gcd(x + yz, z) = \gcd(x, z)$, we can say that:

$$
\gcd(\theta, m) = \gcd(k_1, m) = 1 \\ \gcd(\theta, n) = \gcd(k_2, n) = 1 \\ \gcd(\theta, mn) = 1
$$

If we put all $\theta$ in set $C$, then $C$ contains exactly the numbers below $mn$ coprime to $mn$, and the map from $D$ to $C$ is a bijection. Looking at the size of $C$:

$$
|C| = \phi(mn) \\
|C| = |A| \cdot |B| = \phi(m)\phi(n) \\
\phi(mn) = \phi(m)\phi(n)
$$

# Value of $\phi$ for any Number

Let the prime factorization of a positive integer $n$ be:

$$
n = p_1^{k_1}p_2^{k_2}p_3^{k_3}...p_l^{k_l}
$$

Now using the properties above:

$$
\phi(n) \\
= \prod _{i = 1}^l \phi(p_i^{k_i}) \\
= \prod _{i = 1}^l p_i^{k_i}(1 - \frac 1 {p_i})
$$

Multiplying all $p_i^{k_i}$ gives $n$, so factor that out:

$$
= n \prod _{i = 1}^l (1 - \frac 1 {p_i})
$$

(you can derive most textbook definitions from this formula easily)

Final formula:

$$
\phi(n) = n \prod _{i = 1}^l (1 - \frac 1 {p_i})
$$
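For example, with $n = 12 = 2^2 \cdot 3$:

$$
\phi(12) = 12 (1 - \frac 1 2)(1 - \frac 1 3) = 4
$$

which matches the four integers coprime to $12$: namely $1, 5, 7, 11$.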
\ No newline at end of file
diff --git a/Vectors.md b/Vectors.md new file mode 100644 index 0000000..da3d71e --- /dev/null +++ b/Vectors.md @@ -0,0 +1,97 @@
#Math #Algebra

# Defining Vectors

A vector is a list of components. Vectors can be expressed in ij notation by:

$$
\mathbf a = 2i + 3j -4k
$$

or

$$
\vec a = 2i + 3j -4k
$$

You can also express a vector as a matrix:

$$
\vec a = \begin {bmatrix}
2 \\
3 \\
-4 \\
\end {bmatrix} \\
\vec a = \begin {bmatrix} 2 & 3 & -4 \end {bmatrix}
$$

# Adding and Subtracting Vectors

To add vectors, add their corresponding components. For example:

$$
\vec a = 4i + 7j - 9k \\
\vec b = 3i - 5j - 8k \\
\vec a + \vec b = 7i + 2j - 17k
$$

Subtracting vectors works in a similar fashion:

$$
\vec a - \vec b = i + 12j - k
$$

Here are the componentwise formulas:

$$
(\vec a + \vec b)_i = a_i+b_i \\
(\vec a - \vec b)_i = a_i-b_i
$$

Here’s a graph visualizing the addition and subtraction of vectors: [https://www.desmos.com/calculator/gavjpwhnuo](https://www.desmos.com/calculator/gavjpwhnuo)

# Multiplication by Scalar

To multiply a vector by a scalar (regular number), multiply each component by that number:

$$
(m\vec a)_i = ma_i
$$

# Multiplication by Another Vector: Dot Product

There are two different ways to multiply a vector by another vector. The first way is a dot product. Here is the algebraic definition, where $n$ is the number of components in each vector:

$$
\vec a \cdot \vec b = \sum _{i = 1}^n a_ib_i
$$

For two dimensional vectors, we can also provide a geometric definition, where $||\vec a||$ is the magnitude of $\vec a$, and $\theta$ is the angle between the vectors:

$$
\vec a \cdot \vec b = ||\vec a|| \: ||\vec b|| \: \cos \theta
$$

As you can see, the dot product returns a single value, or scalar. From the geometric definition, you can see that it describes how much one vector “aligns” with the other.

## Proving that the Definitions are the Same

Let $\vec a$ have a magnitude of $m$ and an angle of $p$, and let $\vec b$ have a magnitude of $n$ and an angle of $q$, so the angle between them is $\theta = p - q$:

$$
\vec a \cdot \vec b \\
= m\cos p \: n \cos q + m\sin p \: n \sin q \\
= mn(\cos p \: \cos q + \sin p \: \sin q) \\
= mn\cos(p-q)
$$

Starting from the algebraic definition, we recover the geometric definition as shown above.

# Cross Product

Let $\hat n$ be a unit vector perpendicular to both $\vec a$ and $\vec b$, and $\theta$ be the angle between them. The cross product is:

$$
\vec a \times \vec b = ||\vec a|| \: ||\vec b|| \: \sin \theta \: \hat n
$$
\ No newline at end of file
diff --git a/Vieta’s Formulas.md b/Vieta’s Formulas.md new file mode 100644 index 0000000..8b131c3 --- /dev/null +++ b/Vieta’s Formulas.md @@ -0,0 +1,29 @@
#Math #Algebra

Let polynomial $a$ be:

$$
a = c_n \prod _{i = 1}^n (x - r_i)
$$

where $r_1, ..., r_n$ are the roots of $a$, and $c_n$ is the leading coefficient of $a$.

We can also represent $a$ as:

$$
a = \sum _{i = 0}^n c_i x^i
$$

By expanding the first definition of $a$, we can define $c_i$ by:

$$
c_{n-i} = (-1)^i c_n\sum _{sym}^i r
$$

This follows from the nature of multiplying binomials: the coefficient of $x^{n - i}$ collects the sum of all possible products of $i$ roots, which is the $i$th elementary symmetric sum of the set of roots $r$ (written $\sum _{sym}^i r$ here). Each root enters the product as $-r_i$, so choosing $i$ of them contributes the factor $(-1)^i$.

We can refactor to state:

$$
\sum _{sym}^i r = (-1)^i \frac {c_{n-i}} {c_n}
$$
\ No newline at end of file
diff --git a/index.md b/index.md new file mode 100644 index 0000000..dd10e22 --- /dev/null +++ b/index.md @@ -0,0 +1,3 @@
# Welcome! 👋

I am [@craisin](https://craisin.tech), a CS and math enthusiast! These are a set of notes related to anything adjacent to math/CS that I am interested in. Enjoy!
diff --git a/sin x = 2.md b/sin x = 2.md new file mode 100644 index 0000000..ddd484e --- /dev/null +++ b/sin x = 2.md @@ -0,0 +1,40 @@
#Math #Trig

$$
\sin x = 2
$$
$$
\frac {e^{ix} - e^{-ix}} {2i} = 2
$$
$$
e^{ix} - e^{-ix} = 4i
$$
$$
e^{ix} - (e^{ix})^{-1} = 4i
$$

Let $u = e^{ix}$:

$$
u - u^{-1} = 4i
$$

Multiply through by $u$ and complete the square (note $(2i)^2 = -4$):

$$
u^2 - 1 = 4iu
$$
$$
u^2 - 4iu - 1 = 0
$$
$$
u^2 - 4iu - 4 = -3
$$
$$
(u - 2i)^2 = -3
$$
$$
u - 2i = \pm i \sqrt 3
$$
$$
u = 2i \pm i \sqrt 3
$$
$$
u = i(2 \pm \sqrt 3)
$$

Substitute back $u = e^{ix}$; since the complex logarithm is multivalued, include $2\pi i n$ for $n \in \mathbb{Z}$:

$$
e^{ix} = i(2 \pm \sqrt 3)
$$
$$
ix = \ln (i(2 \pm \sqrt 3)) + 2\pi i n
$$
$$
ix = \ln i + \ln(2 \pm \sqrt 3) + 2\pi i n
$$
$$
ix = \frac {i\pi} 2 + \ln(2 \pm \sqrt 3) + 2\pi i n
$$
$$
x = \frac \pi 2 - i\ln(2 \pm \sqrt 3) + 2\pi n
$$
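As a quick numerical check of the $n = 0$, plus-sign branch (a sketch using Python's standard `cmath` module):

``` python
import math
import cmath

x = math.pi / 2 - 1j * math.log(2 + math.sqrt(3))  # n = 0, plus branch
print(cmath.sin(x))  # ~ (2+0j), up to floating point error
```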