
2 Sampling and Concentration

2.1 Markov's Inequality

Theorem 2.1 (Markov's inequality). If $X$ is a non-negative random variable, i.e., $\Pr[X \ge 0] = 1$, then for every $a > 0$,

\begin{align*} \Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a}. \end{align*}
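As a quick sanity check (an illustration, not part of the formal development), the following Python sketch samples a non-negative variable, $X \sim \text{Exponential}(1)$ with $\mathbb{E}[X] = 1$, and compares the empirical tail against the Markov bound:

```python
import random

random.seed(0)

# Empirical check of Markov's inequality for a non-negative variable:
# X ~ Exponential(1), so E[X] = 1.
N = 100_000
samples = [random.expovariate(1.0) for _ in range(N)]
mean = sum(samples) / N

for a in (1.0, 2.0, 4.0):
    tail = sum(x >= a for x in samples) / N   # empirical Pr[X >= a]
    bound = mean / a                          # Markov bound E[X] / a
    print(f"a = {a}: Pr[X >= a] ~ {tail:.4f} <= {bound:.4f}")
```

The true tails ($e^{-a}$) are well below the bound here; Markov is loose but never violated.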

FKS Perfect Hashing

At the first level, we set the hash space size to be exactly the number of keys to be stored, which is $n$. We sample a hash function $h^{(1)} : K \to \{0, 1, \dots, n-1\}$ from a universal hash family $\mathcal{H}^{(1)}$ to distribute the keys into $n$ buckets $B_0, \dots, B_{n-1}$ according to the hash values. That is, for each bucket $j$,

\begin{align*} B_j := \{k_i : h^{(1)}(k_i) = j\}. \end{align*}

Let $n_j := |B_j|$ denote the number of keys in bucket $j$. At the second level, bucket $j$ gets its own table of $m_j := n_j^2$ cells, together with a hash function $h^{(2,j)}$ sampled from a universal family and resampled until it maps the keys of $B_j$ to distinct cells. The total number of cells consumed by the construction is

\begin{align*} M := \sum_{j=0}^{n-1} m_j = \sum_{j=0}^{n-1} n_j^2. \end{align*}

If $M \ge 4n$, we resample the hash functions and repeat the construction process until $M < 4n$.

Finally, for each key $k_i$ in bucket $B_j$, we store the key–value pair $(k_i, v_i)$ in cell $h^{(2,j)}(k_i)$ of the second-level table for bucket $j$.

At query time, given a query key $k$, we first compute $j = h^{(1)}(k)$ to locate the corresponding first-level bucket. Then we compute the second-level address $h^{(2,j)}(k)$ inside bucket $j$. Finally, we check whether the cell at that address indeed stores the query key $k$; if so, we return the corresponding value $v_i$.
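The two-level construction can be sketched in Python. Here the Carter–Wegman family $h(x) = ((ax + b) \bmod p) \bmod m$ stands in for the universal families; the function names and the choice of prime are our own illustration:

```python
import random

random.seed(1)

def make_hash(p, m):
    # Carter-Wegman universal family h(x) = ((a*x + b) mod p) mod m,
    # valid for integer keys smaller than the prime p.
    a, b = random.randrange(1, p), random.randrange(p)
    return lambda x: ((a * x + b) % p) % m

def fks_build(keys, values, p=2**31 - 1):
    n = len(keys)
    kv = dict(zip(keys, values))
    # First level: hash the n keys into n buckets; resample until M < 4n.
    while True:
        h1 = make_hash(p, n)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h1(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) < 4 * n:
            break
    # Second level: bucket j gets a table of n_j^2 cells; resample its
    # hash function until the bucket's keys land in distinct cells.
    h2s, tables = [], []
    for b in buckets:
        m = len(b) ** 2
        while True:
            h2 = make_hash(p, m) if m else None
            slots = [None] * m
            ok = True
            for k in b:
                cell = h2(k)
                if slots[cell] is not None:   # collision: resample h2
                    ok = False
                    break
                slots[cell] = (k, kv[k])
            if ok:
                break
        h2s.append(h2)
        tables.append(slots)
    return h1, h2s, tables

def fks_lookup(h1, h2s, tables, k):
    j = h1(k)                      # first-level bucket
    if not tables[j]:              # empty bucket: key absent
        return None
    cell = tables[j][h2s[j](k)]    # second-level cell
    return cell[1] if cell is not None and cell[0] == k else None
```

Both resampling loops terminate after an expected constant number of trials. A lookup of a stored key returns its value; a lookup of an absent key returns `None`, either because its bucket is empty or because the cell it lands in holds a different key.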

Theorem 2.2. Let $X$ be a random variable and $f : \mathbb{R} \to \mathbb{R}$ be a function. If $\Pr[f(X) \ge 0] = 1$, then for any $a > 0$,

\begin{align*} \Pr[f(X) \ge a] \le \frac{\mathbb{E}[f(X)]}{a}. \end{align*}

Theorem 2.3. If $\Pr[X \le u] = 1$ for some $u \in \mathbb{R}$, then for any $a < u$,

\begin{align*} \Pr[X \le a] \le \frac{u - \mathbb{E}[X]}{u - a}. \end{align*}

2.2 Chebyshev’s Inequality

Variance

The variance of a random variable $X$ is defined as

\begin{align*} \text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2]. \end{align*}

Theorem 2.4. For any random variable $X$,

\begin{align*} \text{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2. \end{align*}

Theorem 2.5 (Chebyshev's inequality). For any random variable $X$ and any $\epsilon > 0$,

\begin{align*} \Pr\left[|X - \mathbb{E}[X]| \ge \epsilon\right] \le \frac{\text{Var}(X)}{\epsilon^2}. \end{align*}

Definition (Standard deviation). The standard deviation of a random variable $X$ is defined as

\begin{align*} \sigma(X) = \sqrt{\text{Var}(X)}. \end{align*}

Theorem 2.6. For any random variable $X$ and any $\alpha > 0$,

\begin{align*} \Pr\left[|X - \mathbb{E}[X]| \ge \alpha \cdot \sigma(X)\right] \le \frac{1}{\alpha^2}. \end{align*}

Theorem 2.7. For any random variable $X$ and any constant $c \in \mathbb{R}$,

\begin{align*} \text{Var}(cX) = c^2\text{Var}(X). \end{align*}

Theorem 2.8. If $X$ and $Y$ are independent random variables, then

\begin{align*} \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y). \end{align*}

Theorem 2.9 (Chebyshev for sample averages). Let $X := \frac{1}{n} \sum_{i=1}^{n} X_i$ be the sample average of $n$ i.i.d. random variables $X_1, \dots, X_n$, each with mean $\mu$ and variance $\sigma^2$. Then for every $\epsilon > 0$,

\begin{align*} \Pr\left[|X - \mu| \ge \epsilon\right] \le \frac{\sigma^2}{n\epsilon^2}. \end{align*}
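A small simulation (our own illustration) shows the bound in action for fair coin flips, where $\mu = 1/2$ and $\sigma^2 = 1/4$:

```python
import random

random.seed(0)

# Theorem 2.9 for X_i ~ Bernoulli(1/2): mu = 1/2, sigma^2 = 1/4.
n, trials, eps = 100, 10_000, 0.1
bound = 0.25 / (n * eps ** 2)   # sigma^2 / (n * eps^2) = 0.25

deviations = sum(
    abs(sum(random.random() < 0.5 for _ in range(n)) / n - 0.5) >= eps
    for _ in range(trials)
)
freq = deviations / trials
print(f"empirical Pr[|X - mu| >= {eps}] = {freq:.4f} <= {bound}")
```

The empirical deviation frequency (roughly a two-standard-deviation event here) sits comfortably below the Chebyshev bound of $0.25$.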

2.3 Chernoff Bounds

Moment Generating Functions

The moment generating function (MGF) of a random variable $X$ is defined as

\begin{align*} M_X(t) := \mathbb{E}[e^{tX}] = \mathbb{E}\left[ \sum_{k=0}^{\infty} \frac{t^k}{k!} X^k \right] = \sum_{k=0}^{\infty} \frac{t^k}{k!} \mathbb{E}[X^k]. \end{align*}

Theorem 2.10 (Properties of the MGF). Let $X$ be a random variable. Then its MGF $M(t)$ satisfies the following properties:

  1. $M(0) = 1$;
  2. $M(t) \ge e^{t\mathbb{E}[X]}$ for all $t \in \mathbb{R}$.

Generic Chernoff Bounds

Theorem 2.11 (Generic Chernoff bound, upper tail). Let $X$ be a random variable with mean $\mu$ and MGF $M(t)$. Then for any $a \ge \mu$,

\begin{align*} \Pr[X \ge a] \le \inf_{t \in \mathbb{R}} \{e^{-ta} M(t)\}. \end{align*}

Theorem 2.12 (Generic Chernoff bound, lower tail). Let $X$ be a random variable with mean $\mu$ and MGF $M(t)$. Then for any $a \le \mu$,

\begin{align*} \Pr[X \le a] \le \inf_{t \in \mathbb{R}} \{e^{-ta} M(t)\}. \end{align*}

Definition (Rate function). The rate function of a random variable $X$ is defined as

\begin{align*} I_X(a) := \sup_{t \in \mathbb{R}} \{ta - \log M_X(t)\}, \end{align*}

with the convention that $I_X(a) = +\infty$ if the supremum is unbounded.

Theorem 2.13 (Generic Chernoff bound in terms of the rate function). Let $X$ be a random variable with mean $\mu$ and rate function $I(a)$. Then

\begin{align*} \forall a \ge \mu : \quad \Pr[X \ge a] &\le e^{-I(a)} , \\ \forall a \le \mu : \quad \Pr[X \le a] &\le e^{-I(a)} . \end{align*}

Theorem 2.14 (Properties of the rate function). Let $X$ be a random variable with mean $\mu$ and rate function $I(a)$. Let $\ell$ and $u$ be the minimum and maximum values that $X$ can take, and assume $\ell < u$. Then

  1. $I(a) = +\infty$ if $a < \ell$ or $a > u$;
  2. $0 \le I(a) < +\infty$ for all $a \in [\ell, u]$;
  3. $I(\mu) = 0$;
  4. $I(a)$ is convex on the interval $[\ell, u]$.

Chernoff Bounds for Sample Averages

Lemma. Let $X := \frac{1}{n} \sum_{i=1}^n X_i$ be the sample average of $n$ i.i.d. random variables $X_1, \dots, X_n$, each with rate function $I(a)$. Then

\begin{align*} I_X(a) = n I(a) . \end{align*}

Theorem 2.15 (Generic Chernoff bound for sample averages). Let $X := \frac{1}{n} \sum_{i=1}^n X_i$ be the sample average of $n$ i.i.d. random variables $X_1, \dots, X_n$, each with mean $\mu$ and rate function $I(a)$. Then

\begin{align*} \forall a \ge \mu : \quad \Pr[X \ge a] &\le e^{-n I(a)} , \\ \forall a \le \mu : \quad \Pr[X \le a] &\le e^{-n I(a)} . \end{align*}

Thus

\begin{align*} \Pr[X - \mu \ge \epsilon] &\le e^{-n I(\mu + \epsilon)} , \\ \Pr[X - \mu \le -\epsilon] &\le e^{-n I(\mu - \epsilon)} , \\ \Pr[|X - \mu| \ge \epsilon] &\le e^{-n I(\mu + \epsilon)} + e^{-n I(\mu - \epsilon)} \le 2e^{-n \min\{I(\mu + \epsilon), I(\mu - \epsilon)\}} . \end{align*}
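For a concrete instance, the rate function of $\text{Bernoulli}(p)$ works out to the KL divergence $I(a) = a\ln\frac{a}{p} + (1-a)\ln\frac{1-a}{1-p}$, computed from the MGF $M(t) = 1 - p + pe^t$. The sketch below (our illustration) compares the bound $e^{-nI(\mu+\epsilon)}$ with the empirical upper tail for fair coins:

```python
import math
import random

random.seed(0)

def rate_bernoulli(a, p):
    # Rate function of Bernoulli(p): the KL divergence between
    # Bernoulli(a) and Bernoulli(p), derived from M(t) = 1 - p + p*e^t.
    if a < 0 or a > 1:
        return float("inf")
    if a == 0:
        return math.log(1 / (1 - p))
    if a == 1:
        return math.log(1 / p)
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

# Compare e^{-n I(mu + eps)} with the empirical upper tail of the
# sample average of n fair coins.
n, trials, p, eps = 100, 20_000, 0.5, 0.1
bound = math.exp(-n * rate_bernoulli(p + eps, p))

hits = sum(
    sum(random.random() < p for _ in range(n)) / n >= p + eps
    for _ in range(trials)
)
print(f"empirical {hits / trials:.4f} <= Chernoff bound {bound:.4f}")
```

Note that $I(\mu) = 0$ here, as Theorem 2.14 requires, and the bound decays exponentially in $n$ for fixed $\epsilon$.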

Theorem 2.16 (Cramér's theorem). Let $\mathcal{D}$ be a probability distribution with mean $\mu$ and rate function $I(a)$. Let $X^{(n)} := \frac{1}{n} \sum_{i=1}^n X_i$ be the sample average of $n$ i.i.d. random variables $X_1, \dots, X_n$ drawn from $\mathcal{D}$. Then as $n \to \infty$,

\begin{align*} \text{for any fixed } a \ge \mu : \quad \Pr[X^{(n)} \ge a] &= e^{-n(I(a) + o(1))} , \\ \text{for any fixed } a \le \mu : \quad \Pr[X^{(n)} \le a] &= e^{-n(I(a) + o(1))} . \end{align*}

Chernoff Bounds for Small Deviation

When the deviation $\epsilon$ is as small as $\mathcal{O}\left(\frac{1}{\sqrt{n}}\right)$, we can Taylor-expand the rate function around the mean:

\begin{align*} I(\mu + \epsilon) &= I(\mu) + I'(\mu)\epsilon + \frac{1}{2}I''(\mu)\epsilon^2 + \mathcal{O}(\epsilon^3) \\ &= \frac{1}{2}I''(\mu)\epsilon^2 + \mathcal{O}(n^{-3/2}), \end{align*}

where we used $I(\mu) = 0$, $I'(\mu) = 0$, and $\epsilon = \mathcal{O}\left(\frac{1}{\sqrt{n}}\right)$. Substituting into Theorem 2.15 gives

\begin{align*} \Pr[X \ge \mu + \epsilon] &\le e^{-nI(\mu+\epsilon)} \le e^{-\frac{1}{2} I''(\mu)n\epsilon^2 + \mathcal{O}(n^{-1/2})} , \\ \Pr[X \le \mu - \epsilon] &\le e^{-nI(\mu-\epsilon)} \le e^{-\frac{1}{2} I''(\mu)n\epsilon^2 + \mathcal{O}(n^{-1/2})} . \end{align*}
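As a consistency check (using the identity $I''(\mu) = 1/\text{Var}(X_1)$, which follows from the Legendre duality between $I$ and $\log M$ and is not stated in these notes): for $X_i \sim \text{Bernoulli}(\frac{1}{2})$ we have $\text{Var}(X_1) = \frac{1}{4}$, so

```latex
\begin{align*}
I''(\mu) = \frac{1}{\mathrm{Var}(X_1)} = 4
\quad\Longrightarrow\quad
\Pr[X \ge \mu + \epsilon] \lessapprox e^{-\frac{1}{2}\cdot 4\cdot n\epsilon^2} = e^{-2n\epsilon^2},
\end{align*}
```

which matches the exponent in the symmetric-Bernoulli bounds of Section 2.4 below.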

2.4 Hoeffding’s Inequality for Bounded Variables

Symmetric Bernoulli Variables

Theorem 2.17. Let $X \sim \text{Bernoulli}(\frac{1}{2})$. Then for all $t \in \mathbb{R}$,

\begin{align*} \mathbb{E}[e^{t(X-\mathbb{E}[X])}] \leq e^{t^2/8}. \end{align*}

Theorem 2.18. Let $X \sim \text{Bernoulli}(\frac{1}{2})$. Then for all $\epsilon \in \mathbb{R}$,

\begin{align*} I_X(1/2 + \epsilon) \geq 2\epsilon^2. \end{align*}

Theorem 2.19. Let $X := \frac{1}{n} \sum_{i=1}^n X_i$ be the sample average of $n$ independent random variables $X_1, \dots, X_n \sim \text{Bernoulli}(1/2)$. Then for all $\epsilon \geq 0$,

\begin{align*} \Pr[X \geq 1/2 + \epsilon] \leq e^{-2n\epsilon^2}, \\ \Pr[X \leq 1/2 - \epsilon] \leq e^{-2n\epsilon^2}. \end{align*}

Hoeffding’s Inequality

Theorem 2.20 (Hoeffding's Theorem). Let $X$ be a random variable taking values in $[\ell, u]$ for some $\ell \leq u$. Then for all $t \in \mathbb{R}$,

\begin{align*} \mathbb{E}[e^{t(X-\mathbb{E}[X])}] \leq e^{t^2(u-\ell)^2/8}. \end{align*}

Theorem 2.21. Let $X$ be a random variable taking values in $[\ell, u]$ for some $\ell \leq u$. Then for all $\epsilon \in \mathbb{R}$,

\begin{align*} I_X(\mathbb{E}[X] + \epsilon) \geq 2\epsilon^2 / (u-\ell)^2. \end{align*}

Theorem 2.22 (Hoeffding's inequality, i.i.d. case). Let $X := \frac{1}{n} \sum_{i=1}^n X_i$ be the sample average of $n$ i.i.d. random variables $X_1, \dots, X_n$ taking values in $[\ell, u]$ for some $\ell \leq u$. Then for all $\epsilon \geq 0$,

\begin{align*} \Pr[X \geq \mathbb{E}[X] + \epsilon] \leq e^{-2n\epsilon^2 / (u-\ell)^2}, \\ \Pr[X \leq \mathbb{E}[X] - \epsilon] \leq e^{-2n\epsilon^2 / (u-\ell)^2}. \end{align*}
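A quick simulation (our choice of $\text{Uniform}[0,1]$ variables, for which $u - \ell = 1$) illustrates the bound, and shows it is much sharper than Chebyshev at this scale:

```python
import math
import random

random.seed(0)

# Theorem 2.22 for X_i ~ Uniform[0, 1]: mu = 1/2 and u - l = 1, so
# Pr[X >= 1/2 + eps] <= exp(-2 n eps^2).
n, trials, eps = 200, 10_000, 0.1
hoeffding = math.exp(-2 * n * eps ** 2)   # e^{-4}
chebyshev = (1 / 12) / (n * eps ** 2)     # Theorem 2.9 with Var = 1/12

hits = sum(
    sum(random.random() for _ in range(n)) / n >= 0.5 + eps
    for _ in range(trials)
)
print(f"empirical {hits / trials:.4f} <= Hoeffding {hoeffding:.4f}"
      f" (Chebyshev only gives {chebyshev:.4f})")
```

The key difference: Hoeffding decays exponentially in $n$, while Chebyshev decays only as $1/n$.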

Theorem 2.23 (Hoeffding's inequality). Let $X := \frac{1}{n} \sum_{i=1}^n X_i$ be the sample average of $n$ independent random variables $X_1, \dots, X_n$ taking values in intervals $[\ell_1, u_1], \dots, [\ell_n, u_n]$, respectively. Let $s^2$ be the mean squared interval length:

\begin{align*} s^2 := \frac{1}{n} \sum_{i=1}^n (u_i - \ell_i)^2. \end{align*}

Then for all $\epsilon \geq 0$,

\begin{align*} \Pr[X \geq \mathbb{E}[X] + \epsilon] \leq e^{-2n\epsilon^2 / s^2}, \\ \Pr[X \leq \mathbb{E}[X] - \epsilon] \leq e^{-2n\epsilon^2 / s^2}. \end{align*}

Theorem 2.24. Let $X := \sum_{i=1}^n \Delta_i$ be the sum of $n$ independent random variables $\Delta_1, \dots, \Delta_n$ taking values in intervals $[\tilde{\ell}_1, \tilde{u}_1], \dots, [\tilde{\ell}_n, \tilde{u}_n]$, respectively. Let $L^2$ be the sum of the squared interval lengths:

\begin{align*} L^2 := \sum_{i=1}^n (\tilde{u}_i - \tilde{\ell}_i)^2. \end{align*}

Then for all $\epsilon \geq 0$,

\begin{align*} \Pr[X \geq \mathbb{E}[X] + \epsilon] \leq e^{-2\epsilon^2 / L^2}, \\ \Pr[X \leq \mathbb{E}[X] - \epsilon] \leq e^{-2\epsilon^2 / L^2}. \end{align*}

Azuma–Hoeffding Inequality for Martingales

Definition (Martingale). Let $Z_0, Z_1, \dots, Z_n$ be random variables. We say that $(Z_i)_{i=0}^n$ is a martingale if for every $i \geq 1$,

\begin{align*} \mathbb{E}[Z_i \mid Z_0, \dots, Z_{i-1}] = Z_{i-1}. \end{align*}

Theorem 2.25 (Azuma's lemma). Let $(Z_i)_{i=0}^n$ be a martingale. Assume that for each $i \in [n]$, the increment $\Delta_i := Z_i - Z_{i-1}$ always lies in an interval $[\tilde{\ell}_i, \tilde{u}_i]$ for some $\tilde{\ell}_i, \tilde{u}_i \in \mathbb{R}$. Let $L^2$ be the sum of the squared interval lengths:

\begin{align*} L^2 := \sum_{i=1}^n (\tilde{u}_i - \tilde{\ell}_i)^2. \end{align*}

Then for every $t \in \mathbb{R}$,

\begin{align*} \mathbb{E}[e^{t(Z_n-Z_0)}] \leq e^{t^2 L^2 / 8}. \end{align*}

Theorem 2.26 (Azuma–Hoeffding inequality). Let $(Z_i)_{i=0}^n$ be a martingale. Assume that for each $i \in [n]$, the increment $\Delta_i := Z_i - Z_{i-1}$ always lies in an interval $[\tilde{\ell}_i, \tilde{u}_i]$ for some $\tilde{\ell}_i, \tilde{u}_i \in \mathbb{R}$. Let $L^2$ be the sum of the squared interval lengths:

\begin{align*} L^2 := \sum_{i=1}^n (\tilde{u}_i - \tilde{\ell}_i)^2. \end{align*}

Then for all $\epsilon \geq 0$,

\begin{align*} \Pr[Z_n \geq Z_0 + \epsilon] \leq e^{-2\epsilon^2/L^2}, \\ \Pr[Z_n \leq Z_0 - \epsilon] \leq e^{-2\epsilon^2/L^2}. \end{align*}
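The simplest martingale example is a $\pm 1$ random walk: $Z_0 = 0$ and $Z_i = Z_{i-1} \pm 1$ with equal probability, so each increment lies in $[-1, 1]$ and $L^2 = 4n$. The sketch below (our illustration) checks the resulting bound $e^{-2\epsilon^2/(4n)} = e^{-\epsilon^2/(2n)}$ empirically:

```python
import math
import random

random.seed(0)

# Azuma-Hoeffding for a +/-1 random walk: Z_0 = 0, increments in [-1, 1],
# so L^2 = 4n and Pr[Z_n >= eps] <= exp(-eps^2 / (2n)).
n, trials, eps = 100, 10_000, 20
bound = math.exp(-eps ** 2 / (2 * n))   # e^{-2}

hits = sum(
    sum(random.choice((-1, 1)) for _ in range(n)) >= eps
    for _ in range(trials)
)
print(f"empirical Pr[Z_n >= {eps}] = {hits / trials:.4f} <= {bound:.4f}")
```

Here $\epsilon = 20$ is two standard deviations of $Z_n$ (which has $\sigma(Z_n) = \sqrt{n} = 10$), so the empirical tail sits well below the Azuma–Hoeffding bound of $e^{-2}$.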