Differential entropy of the Gaussian distribution

Let $X \sim \mathcal{N}(\mu, \sigma^2)$ with density $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. Its differential entropy is derived as follows:

$$
\begin{aligned}
h(X) &= -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \ln\!\left[\frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right] dx \\
&= -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \left(\ln(2\pi\sigma^2)^{-1/2} - \frac{(x-\mu)^2}{2\sigma^2}\right) dx \\
&= \frac{1}{2}\ln(2\pi\sigma^2) \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx + \int_{-\infty}^{\infty} \frac{(x-\mu)^2}{2\sigma^2} \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx \\
&= \frac{1}{2}\ln(2\pi\sigma^2) + \frac{1}{2\sigma^2} E(X-\mu)^2 \\
&= \frac{1}{2}\ln(2\pi\sigma^2) + \frac{1}{2} \\
&= \frac{1}{2}\ln(2\pi e\sigma^2) \quad \text{(nats)}
\end{aligned}
$$

Moreover, since $H_b(X) = (\log_b a)\, H_a(X)$, taking $b = 2$ and $a = e$ converts nats to bits: if $1\ \text{nat} = x\ \text{bits}$, then $x \ln 2 = 1$, so $x = \frac{1}{\ln 2} = \log_2 e$. Therefore

$$
\frac{1}{2}\ln(2\pi e\sigma^2)\ \text{nats} = \frac{1}{2}\ln(2\pi e\sigma^2) \cdot \log_2 e\ \text{bits} = \frac{1}{2}\log_2(2\pi e\sigma^2)\ \text{bits}.
$$
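As a quick numerical sanity check, a Monte Carlo estimate of $-E[\ln f(X)]$ should agree with the closed form in both units. This is a minimal sketch; the values $\mu = 1$, $\sigma = 2$ and the sample size are arbitrary choices for illustration:

```python
import numpy as np

# Minimal sketch: Monte Carlo check of h(X) = (1/2) ln(2*pi*e*sigma^2).
# mu = 1, sigma = 2, and the sample size are arbitrary illustrative choices.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

# h(X) = -E[ln f(X)], estimated by averaging -ln f over samples of X.
log_f = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
h_mc_nats = -log_f.mean()

h_nats = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # closed form in nats
h_bits = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)  # same quantity in bits

print(h_mc_nats, h_nats, h_bits)  # the first two should agree closely
```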

Probability

Bounds on tail probabilities

Markov’s inequality: For any r.v. $X$ and constant $a > 0$,

$$
P(|X| \ge a) \le \frac{E|X|}{a}.
$$

Let $Y = \frac{|X|}{a}$. We need to show that $P(Y \ge 1) \le E(Y)$. Note that

$I(Y \ge 1) \le Y$, since if $I(Y \ge 1) = 0$ then $Y \ge 0$, and if $I(Y \ge 1) = 1$ then $Y \ge 1$ (because the indicator says so). Taking the expectation of both sides and using $E[I(Y \ge 1)] = P(Y \ge 1)$, we have Markov's inequality.
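A small simulation makes the bound concrete. This is a minimal sketch, not part of the proof; the Exponential(1) distribution and $a = 3$ are arbitrary choices:

```python
import numpy as np

# Minimal sketch: empirical check of Markov's inequality P(|X| >= a) <= E|X| / a.
# Exponential(1) and a = 3 are arbitrary illustrative choices.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
a = 3.0

lhs = np.mean(np.abs(x) >= a)  # P(|X| >= a); exactly e^{-3} ~ 0.0498 here
rhs = np.mean(np.abs(x)) / a   # E|X| / a; exactly 1/3 here
print(lhs, rhs, lhs <= rhs)
```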

Chebyshev’s inequality: Let $X$ have mean $\mu$ and variance $\sigma^2$. Then for any $a > 0$,

$$
P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}.
$$

By Markov’s inequality,

$$
P(|X - \mu| \ge a) = P\big((X - \mu)^2 \ge a^2\big) \le \frac{E(X - \mu)^2}{a^2} = \frac{\sigma^2}{a^2}.
$$
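The same kind of empirical check works here (a minimal sketch; $X \sim \text{Uniform}(0,1)$ and $a = 0.4$ are arbitrary choices):

```python
import numpy as np

# Minimal sketch: empirical check of Chebyshev's inequality
# P(|X - mu| >= a) <= sigma^2 / a^2, with X ~ Uniform(0, 1) (arbitrary choice).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1_000_000)
mu, var = 0.5, 1.0 / 12.0  # mean and variance of Uniform(0, 1)
a = 0.4

lhs = np.mean(np.abs(x - mu) >= a)  # exactly 0.2 here
rhs = var / a**2                    # ~0.52, so the bound holds but is loose
print(lhs, rhs, lhs <= rhs)
```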

Chernoff inequality: For any r.v. $X$ and constants $a > 0$ and $t > 0$,

$$
P(X \ge a) \le \frac{E(e^{tX})}{e^{ta}}.
$$

The transformation $g$ with $g(x) = e^{tx}$ is invertible and strictly increasing (since $t > 0$). So by Markov's inequality, we have

$$
P(X \ge a) = P\big(e^{tX} \ge e^{ta}\big) \le \frac{E(e^{tX})}{e^{ta}}.
$$
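Since the bound holds for every $t > 0$, in practice one minimizes over $t$. A minimal sketch (the standard normal and $a = 2$ are arbitrary choices; for $\mathcal{N}(0,1)$ the optimized bound is $e^{-a^2/2}$):

```python
import numpy as np

# Minimal sketch: Chernoff bound P(X >= a) <= E[e^{tX}] / e^{ta} for X ~ N(0, 1),
# evaluated on a grid of t. The bound holds for every t > 0, so take the min.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)
a = 2.0

ts = np.linspace(0.1, 3.0, 30)
bounds = [np.mean(np.exp(t * x)) / np.exp(t * a) for t in ts]

lhs = np.mean(x >= a)    # P(X >= 2) ~ 0.0228 for a standard normal
print(lhs, min(bounds))  # tightest grid bound is near t = a, about e^{-2} ~ 0.135
```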

Law of large numbers

Assume we have i.i.d. $X_1, X_2, X_3, \dots$ with finite mean $\mu$ and finite variance $\sigma^2$. For all positive integers $n$, let

$$
\bar{X}_n = \frac{X_1 + \cdots + X_n}{n}
$$

be the sample mean of $X_1$ through $X_n$. The sample mean is itself an r.v., with mean $\mu$ and variance $\sigma^2/n$:

$$
E(\bar{X}_n) = \frac{1}{n} E\!\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \mu, \qquad
\mathrm{Var}(\bar{X}_n) = \frac{1}{n^2} \mathrm{Var}\!\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^2}{n}.
$$
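Both identities are easy to confirm by simulation. A minimal sketch, where Exponential(1) draws (so $\mu = \sigma^2 = 1$) and $n = 50$ are arbitrary choices:

```python
import numpy as np

# Minimal sketch: check E(X_bar_n) = mu and Var(X_bar_n) = sigma^2 / n.
# Exponential(1) draws (mu = sigma^2 = 1) and n = 50 are arbitrary choices.
rng = np.random.default_rng(0)
n, trials = 50, 100_000
xbar = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

print(xbar.mean(), xbar.var())  # ~1.0 and ~1/50 = 0.02
```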

Strong law of large numbers: The sample mean $\bar{X}_n$ converges to the true mean $\mu$ pointwise as $n \to \infty$, with probability 1. In other words,

$$
P\!\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1, \quad \text{or} \quad \bar{X}_n \xrightarrow{\text{a.s.}} \mu.
$$
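A single simulated path of the running mean illustrates this behavior (a minimal sketch; Bernoulli(0.3) draws, so $\mu = 0.3$, are an arbitrary choice):

```python
import numpy as np

# Minimal sketch: one sample path of the running mean X_bar_n, illustrating
# almost-sure convergence to mu. Bernoulli(0.3) draws (mu = 0.3) are arbitrary.
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

print(running_mean[[9, 99, 9_999, 99_999]])  # settles toward 0.3 as n grows
```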

Weak law of large numbers: For all $\epsilon > 0$, $P(|\bar{X}_n - \mu| > \epsilon) \to 0$ as $n \to \infty$. (This is called convergence in probability.) In other words,

$$
\lim_{n \to \infty} P(|\bar{X}_n - \mu| \le \epsilon) = 1.
$$

Fix $\epsilon > 0$. By Chebyshev's inequality,

$$
P(|\bar{X}_n - \mu| > \epsilon) \le \frac{\sigma^2}{n\epsilon^2}.
$$

As $n \to \infty$, the right-hand side goes to 0, and so must the left-hand side.
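The shrinking deviation probability, alongside the Chebyshev bound used in the proof, can be observed directly (a minimal sketch; Uniform(0,1) draws and $\epsilon = 0.05$ are arbitrary choices):

```python
import numpy as np

# Minimal sketch: P(|X_bar_n - mu| > eps) shrinking in n, next to the Chebyshev
# bound sigma^2 / (n * eps^2). Uniform(0, 1) draws and eps = 0.05 are arbitrary.
rng = np.random.default_rng(0)
mu, var, eps, trials = 0.5, 1.0 / 12.0, 0.05, 20_000

for n in (10, 100, 1_000):
    xbar = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)
    p_dev = np.mean(np.abs(xbar - mu) > eps)
    print(n, p_dev, var / (n * eps**2))  # empirical probability vs bound
```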
