#Math #Probability
The Central Limit Theorem
Let us sum n instances from an i.i.d. (independent and identically distributed) sequence with finite first and second moments (mean and variance). Center the sum at its mean and scale it by its standard deviation. As n goes to infinity, the distribution of that standardized variable converges to
\frac 1 {\sqrt {2 \pi}} e^{- \frac {x^2} 2}
the density of the standard normal distribution
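As a quick numeric illustration (the Uniform(0, 1) choice and the sample sizes here are illustrative assumptions, not part of the theorem), standardized sums of i.i.d. draws should show mean near 0 and variance near 1:

```python
import math
import random

# Monte Carlo sketch of the CLT. Distribution choice (Uniform(0, 1)) and
# sample sizes are illustrative assumptions.
random.seed(0)
n, trials = 500, 2000
mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and sd of Uniform(0, 1)

def standardized_sum():
    """Center the sum of n draws at n*mu and scale by sqrt(n)*sigma."""
    s = sum(random.random() for _ in range(n))
    return (s - n * mu) / (math.sqrt(n) * sigma)

samples = [standardized_sum() for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(round(mean, 2), round(var, 2))  # both close to N(0, 1)'s 0 and 1
```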
Mathematical Definition
Let Y be the mean of a sequence of n i.i.d. random variables
Y = \frac 1 n \sum _{i=1}^{n} X_i
Let \mu=E(X_i), the expected value of X_i, and \sigma = \sqrt {Var(X_i)}, the standard deviation of X_i
Calculate the expected value of Y, E(Y), and the variance, Var(Y):
E(Y) \\
= E(\frac 1 n \sum _{i=1}^{n} X_i) \\
= \frac 1 n \sum _{i=1}^{n} E(X_i) \\
= \frac 1 n \sum _{i=1}^{n} \mu \\
= \frac {n \mu} {n} \\
= \mu
Var(Y) \\
= Var(\frac 1 n \sum _{i=1}^n X_i) \\
= \frac 1 {n^2} \sum _{i=1}^n Var(X_i) \\
= \frac {n \sigma^2} {n^2} \\
= \frac {\sigma^2} {n}
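A quick simulation check of E(Y) = \mu and Var(Y) = \frac {\sigma^2} n (the Exponential(1) choice, where \mu = 1 and \sigma^2 = 1, is an illustrative assumption):

```python
import random

# Numeric check that the sample mean Y has E(Y) = mu and Var(Y) = sigma^2 / n.
# Illustrative setup: X_i ~ Exponential(1), so mu = 1 and sigma^2 = 1.
random.seed(1)
n, trials = 50, 20000
means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]
ey = sum(means) / trials
vy = sum((m - ey) ** 2 for m in means) / trials
print(round(ey, 2), round(n * vy, 2))  # E(Y) ~ mu = 1, n * Var(Y) ~ sigma^2 = 1
```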
Let Y^* be Y centered by E(Y) and scaled by its standard deviation, \sqrt {Var(Y)}
Y^* \\ = \frac {Y - E(Y)} {\sqrt {Var(Y)}} \\ = \frac {Y - \mu} {\sqrt {\frac {\sigma^2} {n}}} \\ = \frac {\sqrt n (Y - \mu)} \sigma \\ = \frac {\sqrt n (\frac 1 n \sum _{i=1}^n X_i - \mu)} \sigma \\ = \frac {\frac 1 {\sqrt n} (\sum _{i=1}^n X_i - n\mu)} \sigma
The CLT states
Y^* \overset d \to N(0, 1)
Or Y^* converges in distribution to the standard normal distribution with a mean of 0 and a standard deviation of 1
Proof
A Change in Variables
Let S be the sum of our sequence of n i.i.d. random variables
S = \sum _{i=1}^{n} X_i
Let’s calculate E(S) and Var(S)
E(S) \\
=E(\sum _{i=1}^n X_i) \\
=\sum _{i=1}^n E(X_i) \\
=\sum _{i=1}^n \mu \\
= n\mu
Var(S) \\
=Var(\sum _{i=1}^n X_i) \\
=\sum _{i=1}^n Var(X_i) \\
=\sum _{i=1}^n \sigma^2 \\
=n\sigma^2
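These two facts are easy to check numerically (the fair-die setup, with \mu = 3.5 and \sigma^2 = \frac {35} {12}, is an illustrative assumption):

```python
import random

# Check E(S) = n * mu and Var(S) = n * sigma^2 for the sum S of n i.i.d.
# fair die rolls (illustrative: mu = 3.5, sigma^2 = 35/12).
random.seed(2)
n, trials = 10, 50000
sums = [sum(random.randint(1, 6) for _ in range(n)) for _ in range(trials)]
es = sum(sums) / trials
vs = sum((s - es) ** 2 for s in sums) / trials
print(round(es, 1), round(vs, 1))  # ~ n * 3.5 = 35 and ~ n * 35/12 ~ 29.2
```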
Center S by E(S) and scale it by \sqrt {Var(S)} for S^*
S^* \\
= \frac {S - E(S)} {\sqrt {Var(S)}} \\
= \frac {S - n\mu} {\sqrt {n\sigma^2}} \\
= \frac {S - n\mu} {\sqrt {n}\sigma} \\
= \frac {\frac 1 {\sqrt n} (S-n\mu)} { \sigma} \\
= \frac {\frac 1 {\sqrt n} (\sum _{i=1}^n X_i - n\mu)} \sigma
From the above, Y^*=S^*. In the proof, we will use S^*, as it is easier to manipulate.
MGFs
An MGF (moment-generating function) is a function where
M_V(t) = E(e^{tV})
where V is a random variable
(reminder for me to do another notion on this)
Properties of MGFs
Property 1:
If A and B are independent and
C=A+B
Then
M_C(t) \\
= E(e^{tC}) \\
= E(e^{tA + tB}) \\
= E(e^{tA}e^{tB}) \\
= E(e^{tA})E(e^{tB}) \\
= M_A(t)M_B(t)
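Property 1 can be sanity-checked by Monte Carlo (the distribution choices and the t value are illustrative assumptions):

```python
import math
import random

# Monte Carlo check of Property 1: for independent A and B,
# M_{A+B}(t) = M_A(t) * M_B(t). Illustrative choices:
# A ~ Uniform(0, 1), B ~ Exponential(2), evaluated at t = 0.5.
random.seed(3)
t, trials = 0.5, 100000
a = [random.random() for _ in range(trials)]
b = [random.expovariate(2.0) for _ in range(trials)]

def mgf_estimate(samples, t):
    """Sample-average estimate of E(e^{t V})."""
    return sum(math.exp(t * v) for v in samples) / len(samples)

lhs = sum(math.exp(t * (x + y)) for x, y in zip(a, b)) / trials
rhs = mgf_estimate(a, t) * mgf_estimate(b, t)
print(round(lhs, 2), round(rhs, 2))  # approximately equal
```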
Property 2:
M_V^{(r)}(0) = E(V^r)
The rth derivative of M_V evaluated at 0 gives the rth moment of V
Property 3:
Let A_1, A_2, \dots, A_n, \dots be a sequence of random variables with MGFs M_{A_1}, M_{A_2}, \dots
If, for every t in a neighborhood of 0,
M_{A_n}(t) \to M_B(t)
Then
A_n \overset d \to B
MGF of a Normal Distribution
Let a random variable derived from a standard normal distribution be Z
Z \sim N(0, 1)
M_Z(t) \\
= E(e^{tZ}) \\
= \int _{-\infty}^{\infty} e^{xt} \frac 1 {\sqrt {2\pi}} e^{-\frac {x^2} 2} dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{tx-\frac 1 2 x^2} dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x^2 - 2tx )} dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x^2 - 2tx + t^2 ) + \frac 1 2 t^2 } dx \\
= \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x - t)^2 + \frac 1 2 t^2 } dx \\
= e ^ {\frac 1 2 t^2} \int _{-\infty}^{\infty} \frac 1 {\sqrt {2\pi}} e^{-\frac 1 2 (x - t)^2 } dx \\
= e ^ {\frac {t^2} 2}
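The closed form e^{\frac {t^2} 2} can be checked against a direct Monte Carlo estimate of E(e^{tZ}) (the t value and sample size are illustrative):

```python
import math
import random

# Monte Carlo check of M_Z(t) = E(e^{t Z}) = e^{t^2 / 2} for Z ~ N(0, 1).
random.seed(4)
t, trials = 1.0, 200000
estimate = sum(math.exp(t * random.gauss(0.0, 1.0)) for _ in range(trials)) / trials
exact = math.exp(t * t / 2)
print(round(estimate, 2), round(exact, 2))  # both near e^{1/2} ~ 1.65
```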
The Argument
To prove the CLT, we need to prove that S^* converges in distribution to N(0, 1) as n \to \infty. Our approach will be to prove that the MGF of S^* converges to the MGF of N(0, 1) as n \to \infty; by Property 3, that is enough.
S^* \\
= \frac {S - E(S)} {\sqrt {Var(S)}} \\
= \frac {S - n\mu} {\sqrt {n \sigma^2}} \\
= \frac {\sum _{i=1}^{n} X_i - n\mu} {\sqrt n \sigma} \\
= \sum _{i=1}^{n} \frac {X_i - \mu} {\sqrt n \sigma}
Start manipulating MGF of S^*:
M_{S^*}(t) \\
= E(e^{tS^*}) \\
= E(e^{t \sum _{i=1}^{n} \frac {X_i - \mu} {\sqrt n \sigma}}) \\
= E(\prod _{i=1}^{n} e^{\frac {t(X_i - \mu)} {\sqrt n \sigma}}) \\
= \prod _{i=1}^{n} E(e^{\frac {t(X_i - \mu)} {\sqrt n \sigma}}) \text{ (independence)} \\
= (E(e^{\frac {t(X - \mu)} {\sqrt n \sigma}}))^n \text{ (identically distributed)} \\
= (M_{(X - \mu)}(\frac t {\sqrt n \sigma}))^n
Expand M_{(X-\mu)}(\frac t {\sqrt n \sigma}) as a Taylor series about 0, using Property 2 for the derivatives (the remainder collects the terms of order t^3 and above; for fixed t it is O(n^{-\frac 3 2}), which vanishes faster than \frac 1 n as n goes to \infty):
M_{(X-\mu)}(\frac t {\sqrt n \sigma}) \\
= M_{(X-\mu)}(0) + (\frac {M_{(X-\mu)}'(0)} {1!})(\frac t {\sqrt n \sigma}) + (\frac {M_{(X-\mu)}''(0)} {2!})(\frac t {\sqrt n \sigma})^2 + (\frac {M_{(X-\mu)}'''(0)} {3!})(\frac t {\sqrt n \sigma})^3 + ... \\
= 1 + (\frac {t} {\sqrt n \sigma})E(X-\mu) + (\frac {t^2} {2 n \sigma^2})E((X-\mu)^2) + (\frac {t^3} {6n ^ {\frac 3 2} \sigma ^ 3})E((X-\mu)^3) + ... \\
= 1 + (\frac t {\sqrt n \sigma})E(X-\mu) + (\frac {t^2} {2n \sigma^2})E((X-\mu)^2) + O(n^{-\frac 3 2}) \\
\approx 1 + (\frac t {\sqrt n \sigma})E(X-\mu) + (\frac {t^2} {2n \sigma^2})E((X-\mu)^2)
Recall that E(X-\mu) = 0 and E((X-\mu)^2) = Var(X) = \sigma^2, so
= 1 + (\frac t {\sqrt n \sigma})(0) + (\frac {t^2} {2n \sigma^2})(\sigma ^ 2) \\
= 1 + \frac {t^2} {2n}
Substitute the approximation back into M_{S^*}(t):
M_{S^*}(t) = (1 + \frac {t^2} {2n})^n
Take the limit of M_{S^*}(t) as n \to \infty:
\lim _{n \to \infty} M_{S^*}(t) \\
= \lim _{n \to \infty} (1 + \frac {t^2} {2n})^n \\
= \lim _{n \to \infty} (1 + \frac 1 {(\frac {2n} {t^2})})^{\frac {t^2} 2 (\frac {2n} {t^2})} \\
= e^{\frac {t^2} 2}
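The limit above is easy to verify numerically: (1 + \frac {t^2} {2n})^n closes in on e^{\frac {t^2} 2} as n grows (t = 1.5 is an illustrative choice):

```python
import math

# Numeric check that (1 + t^2 / (2n))^n approaches e^{t^2 / 2} as n grows.
t = 1.5
exact = math.exp(t * t / 2)
for n in (10, 1000, 1000000):
    approx = (1 + t * t / (2 * n)) ** n
    print(n, round(approx, 4), round(exact, 4))
```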
Since M_{S^*}(t) \to M_Z(t) as n \to \infty, Property 3 gives S^* \overset d \to N(0, 1). And since Y^* = S^*:
Y^* \overset d \to N(0, 1)
proving the Central Limit Theorem.
Summary of the Argument
Y^* = S^* \\
M_{S^*}(t) \to M_Z(t) \text{ as } n \to \infty \\
S^* \overset d \to N(0, 1) \\
Y^* \overset d \to N(0, 1)