Differential entropy of the Gaussian distribution

Let $X \sim \mathcal{N}(\mu, \sigma^2)$ with density $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. Its differential entropy is derived as follows:

$$
\begin{aligned}
h(X) &= -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \ln\!\left[\frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right] dx \\
&= -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \left(\ln(2\pi\sigma^2)^{-1/2} - \frac{(x-\mu)^2}{2\sigma^2}\right) dx \\
&= \frac{1}{2}\ln(2\pi\sigma^2) \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx + \int_{-\infty}^{\infty} \frac{(x-\mu)^2}{2\sigma^2} \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx \\
&= \frac{1}{2}\ln(2\pi\sigma^2) + \frac{1}{2\sigma^2} E(X-\mu)^2 \\
&= \frac{1}{2}\ln(2\pi\sigma^2) + \frac{1}{2} \\
&= \frac{1}{2}\ln(2\pi e\sigma^2) \quad \text{(nats)}
\end{aligned}
$$

Moreover, since $H_b(X) = (\log_b a)\, H_a(X)$, taking $b = 2$ and $a = e$ converts nats to bits: if $1\ \text{nat} = x\ \text{bits}$, then $x \ln 2 = 1$, so $x = \frac{1}{\ln 2} = \log_2 e$. Therefore

$$
\frac{1}{2}\ln(2\pi e\sigma^2)\ \text{nats} = \frac{1}{2}\ln(2\pi e\sigma^2) \cdot \log_2 e\ \text{bits} = \frac{1}{2}\log_2(2\pi e\sigma^2)\ \text{bits}.
$$
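As a quick numerical sanity check, a Monte Carlo estimate of $-E[\ln f(X)]$ should agree with the closed form in both units. This is a minimal sketch; the values $\mu = 1$, $\sigma = 2$ and the sample size are arbitrary choices for illustration:

```python
import numpy as np

# Minimal sketch: Monte Carlo check of h(X) = (1/2) ln(2*pi*e*sigma^2).
# mu = 1, sigma = 2, and the sample size are arbitrary illustrative choices.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

# h(X) = -E[ln f(X)], estimated by averaging -ln f over samples of X.
log_f = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
h_mc_nats = -log_f.mean()

h_nats = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # closed form in nats
h_bits = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)  # same quantity in bits

print(h_mc_nats, h_nats, h_bits)  # the first two should agree closely
```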

Probability

Bounds on tail probabilities

Markov’s inequality: For any r.v. $X$ and constant $a > 0$,

$$
P(|X| \ge a) \le \frac{E|X|}{a}.
$$

Let $Y = \frac{|X|}{a}$. We need to show that $P(Y \ge 1) \le E(Y)$. Note that

$I(Y \ge 1) \le Y$, since if $I(Y \ge 1) = 0$ then $Y \ge 0$, and if $I(Y \ge 1) = 1$ then $Y \ge 1$ (because the indicator says so). Taking the expectation of both sides and using $E[I(Y \ge 1)] = P(Y \ge 1)$, we have Markov's inequality.
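A small simulation makes the bound concrete. This is a minimal sketch, not part of the proof; the Exponential(1) distribution and $a = 3$ are arbitrary choices:

```python
import numpy as np

# Minimal sketch: empirical check of Markov's inequality P(|X| >= a) <= E|X| / a.
# Exponential(1) and a = 3 are arbitrary illustrative choices.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
a = 3.0

lhs = np.mean(np.abs(x) >= a)  # P(|X| >= a); exactly e^{-3} ~ 0.0498 here
rhs = np.mean(np.abs(x)) / a   # E|X| / a; exactly 1/3 here
print(lhs, rhs, lhs <= rhs)
```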

Chebyshev’s inequality: Let $X$ have mean $\mu$ and variance $\sigma^2$. Then for any $a > 0$,

$$
P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}.
$$

By Markov’s inequality,

$$
P(|X - \mu| \ge a) = P\big((X - \mu)^2 \ge a^2\big) \le \frac{E(X - \mu)^2}{a^2} = \frac{\sigma^2}{a^2}.
$$
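The same kind of empirical check works here (a minimal sketch; $X \sim \text{Uniform}(0,1)$ and $a = 0.4$ are arbitrary choices):

```python
import numpy as np

# Minimal sketch: empirical check of Chebyshev's inequality
# P(|X - mu| >= a) <= sigma^2 / a^2, with X ~ Uniform(0, 1) (arbitrary choice).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1_000_000)
mu, var = 0.5, 1.0 / 12.0  # mean and variance of Uniform(0, 1)
a = 0.4

lhs = np.mean(np.abs(x - mu) >= a)  # exactly 0.2 here
rhs = var / a**2                    # ~0.52, so the bound holds but is loose
print(lhs, rhs, lhs <= rhs)
```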

Chernoff inequality: For any r.v. $X$ and constants $a > 0$ and $t > 0$,

$$
P(X \ge a) \le \frac{E(e^{tX})}{e^{ta}}.
$$

The transformation $g$ with $g(x) = e^{tx}$ is invertible and strictly increasing (since $t > 0$). So by Markov's inequality, we have

$$
P(X \ge a) = P\big(e^{tX} \ge e^{ta}\big) \le \frac{E(e^{tX})}{e^{ta}}.
$$
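Since the bound holds for every $t > 0$, in practice one minimizes over $t$. A minimal sketch (the standard normal and $a = 2$ are arbitrary choices; for $\mathcal{N}(0,1)$ the optimized bound is $e^{-a^2/2}$):

```python
import numpy as np

# Minimal sketch: Chernoff bound P(X >= a) <= E[e^{tX}] / e^{ta} for X ~ N(0, 1),
# evaluated on a grid of t. The bound holds for every t > 0, so take the min.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)
a = 2.0

ts = np.linspace(0.1, 3.0, 30)
bounds = [np.mean(np.exp(t * x)) / np.exp(t * a) for t in ts]

lhs = np.mean(x >= a)    # P(X >= 2) ~ 0.0228 for a standard normal
print(lhs, min(bounds))  # tightest grid bound is near t = a, about e^{-2} ~ 0.135
```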

Law of large numbers

Assume we have i.i.d. $X_1, X_2, X_3, \dots$ with finite mean $\mu$ and finite variance $\sigma^2$. For all positive integers $n$, let

$$
\bar{X}_n = \frac{X_1 + \cdots + X_n}{n}
$$

be the sample mean of $X_1$ through $X_n$. The sample mean is itself an r.v., with mean $\mu$ and variance $\sigma^2/n$:

$$
E(\bar{X}_n) = \frac{1}{n} E\!\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \mu, \qquad
\mathrm{Var}(\bar{X}_n) = \frac{1}{n^2} \mathrm{Var}\!\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^2}{n}.
$$
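Both identities are easy to confirm by simulation. A minimal sketch, where Exponential(1) draws (so $\mu = \sigma^2 = 1$) and $n = 50$ are arbitrary choices:

```python
import numpy as np

# Minimal sketch: check E(X_bar_n) = mu and Var(X_bar_n) = sigma^2 / n.
# Exponential(1) draws (mu = sigma^2 = 1) and n = 50 are arbitrary choices.
rng = np.random.default_rng(0)
n, trials = 50, 100_000
xbar = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

print(xbar.mean(), xbar.var())  # ~1.0 and ~1/50 = 0.02
```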

Strong law of large numbers: The sample mean $\bar{X}_n$ converges to the true mean $\mu$ pointwise as $n \to \infty$, with probability 1. In other words,

$$
P\!\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1, \quad \text{or} \quad \bar{X}_n \xrightarrow{\text{a.s.}} \mu.
$$
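A single simulated path of the running mean illustrates this behavior (a minimal sketch; Bernoulli(0.3) draws, so $\mu = 0.3$, are an arbitrary choice):

```python
import numpy as np

# Minimal sketch: one sample path of the running mean X_bar_n, illustrating
# almost-sure convergence to mu. Bernoulli(0.3) draws (mu = 0.3) are arbitrary.
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

print(running_mean[[9, 99, 9_999, 99_999]])  # settles toward 0.3 as n grows
```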

Weak law of large numbers: For all $\epsilon > 0$, $P(|\bar{X}_n - \mu| > \epsilon) \to 0$ as $n \to \infty$. (This is called convergence in probability.) In other words,

$$
\lim_{n \to \infty} P(|\bar{X}_n - \mu| \le \epsilon) = 1.
$$

Fix $\epsilon > 0$. By Chebyshev's inequality,

$$
P(|\bar{X}_n - \mu| > \epsilon) \le \frac{\sigma^2}{n\epsilon^2}.
$$

As $n \to \infty$, the right-hand side goes to 0, and so must the left-hand side.
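The shrinking deviation probability, alongside the Chebyshev bound used in the proof, can be observed directly (a minimal sketch; Uniform(0,1) draws and $\epsilon = 0.05$ are arbitrary choices):

```python
import numpy as np

# Minimal sketch: P(|X_bar_n - mu| > eps) shrinking in n, next to the Chebyshev
# bound sigma^2 / (n * eps^2). Uniform(0, 1) draws and eps = 0.05 are arbitrary.
rng = np.random.default_rng(0)
mu, var, eps, trials = 0.5, 1.0 / 12.0, 0.05, 20_000

for n in (10, 100, 1_000):
    xbar = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)
    p_dev = np.mean(np.abs(xbar - mu) > eps)
    print(n, p_dev, var / (n * eps**2))  # empirical probability vs bound
```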
