
2 Sampling and Concentration

2.1 Markov's Inequality

Theorem 2.1 (Markov's inequality). If $X$ is a non-negative random variable, i.e., $\Pr[X \ge 0] = 1$, then for every $a > 0$,

\begin{align*} \Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a}. \end{align*}
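As a quick sanity check (an illustration, not part of the formal development), the following Python sketch samples a non-negative variable, $X \sim \text{Exponential}(1)$ with $\mathbb{E}[X] = 1$, and compares the empirical tail against the Markov bound:

```python
import random

random.seed(0)

# Empirical check of Markov's inequality for a non-negative variable:
# X ~ Exponential(1), so E[X] = 1.
N = 100_000
samples = [random.expovariate(1.0) for _ in range(N)]
mean = sum(samples) / N

for a in (1.0, 2.0, 4.0):
    tail = sum(x >= a for x in samples) / N   # empirical Pr[X >= a]
    bound = mean / a                          # Markov bound E[X] / a
    print(f"a = {a}: Pr[X >= a] ~ {tail:.4f} <= {bound:.4f}")
```

The true tails ($e^{-a}$) are well below the bound here; Markov is loose but never violated.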

FKS Perfect Hashing

At the first level, we set the hash space size to be exactly the number of keys to be stored, which is $n$. We sample a hash function $h^{(1)} : K \to \{0, 1, \dots, n-1\}$ from a universal hash family $\mathcal{H}^{(1)}$ to distribute the keys into $n$ buckets $B_0, \dots, B_{n-1}$ according to the hash values. That is, for each bucket $j$,

\begin{align*} B_j := \{k_i : h^{(1)}(k_i) = j\}. \end{align*}

Let $n_j := |B_j|$ denote the number of keys in bucket $j$. At the second level, bucket $j$ gets its own table of $m_j := n_j^2$ cells, together with a hash function $h^{(2,j)}$ sampled from a universal family and resampled until it maps the keys of $B_j$ to distinct cells. The total number of cells consumed by the construction is

\begin{align*} M := \sum_{j=0}^{n-1} m_j = \sum_{j=0}^{n-1} n_j^2. \end{align*}

If $M \ge 4n$, we resample the hash functions and repeat the construction process until $M < 4n$.

Finally, for each key $k_i$ in bucket $B_j$, we store the key–value pair $(k_i, v_i)$ in cell $h^{(2,j)}(k_i)$ of the second-level table for bucket $j$.

At query time, given a query key $k$, we first compute $j = h^{(1)}(k)$ to locate the corresponding first-level bucket. Then we compute the second-level address $h^{(2,j)}(k)$ inside bucket $j$. Finally, we check whether the cell at that address indeed stores the query key $k$; if so, we return the corresponding value $v_i$.
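The two-level construction can be sketched in Python. Here the Carter–Wegman family $h(x) = ((ax + b) \bmod p) \bmod m$ stands in for the universal families; the function names and the choice of prime are our own illustration:

```python
import random

random.seed(1)

def make_hash(p, m):
    # Carter-Wegman universal family h(x) = ((a*x + b) mod p) mod m,
    # valid for integer keys smaller than the prime p.
    a, b = random.randrange(1, p), random.randrange(p)
    return lambda x: ((a * x + b) % p) % m

def fks_build(keys, values, p=2**31 - 1):
    n = len(keys)
    kv = dict(zip(keys, values))
    # First level: hash the n keys into n buckets; resample until M < 4n.
    while True:
        h1 = make_hash(p, n)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h1(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) < 4 * n:
            break
    # Second level: bucket j gets a table of n_j^2 cells; resample its
    # hash function until the bucket's keys land in distinct cells.
    h2s, tables = [], []
    for b in buckets:
        m = len(b) ** 2
        while True:
            h2 = make_hash(p, m) if m else None
            slots = [None] * m
            ok = True
            for k in b:
                cell = h2(k)
                if slots[cell] is not None:   # collision: resample h2
                    ok = False
                    break
                slots[cell] = (k, kv[k])
            if ok:
                break
        h2s.append(h2)
        tables.append(slots)
    return h1, h2s, tables

def fks_lookup(h1, h2s, tables, k):
    j = h1(k)                      # first-level bucket
    if not tables[j]:              # empty bucket: key absent
        return None
    cell = tables[j][h2s[j](k)]    # second-level cell
    return cell[1] if cell is not None and cell[0] == k else None
```

Both resampling loops terminate after an expected constant number of trials. A lookup of a stored key returns its value; a lookup of an absent key returns `None`, either because its bucket is empty or because the cell it lands in holds a different key.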

Theorem 2.2. Let $X$ be a random variable and $f : \mathbb{R} \to \mathbb{R}$ be a function. If $\Pr[f(X) \ge 0] = 1$, then for any $a > 0$,

\begin{align*} \Pr[f(X) \ge a] \le \frac{\mathbb{E}[f(X)]}{a}. \end{align*}

Theorem 2.3. If $\Pr[X \le u] = 1$ for some $u \in \mathbb{R}$, then for any $a < u$,

\begin{align*} \Pr[X \le a] \le \frac{u - \mathbb{E}[X]}{u - a}. \end{align*}

2.2 Chebyshev’s Inequality

Variance

The variance of a random variable $X$ is defined as

\begin{align*} \text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2]. \end{align*}

Theorem 2.4. For any random variable $X$,

\begin{align*} \text{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2. \end{align*}

Theorem 2.5 (Chebyshev's inequality). For any random variable $X$ and any $\epsilon > 0$,

\begin{align*} \Pr\left[|X - \mathbb{E}[X]| \ge \epsilon\right] \le \frac{\text{Var}(X)}{\epsilon^2}. \end{align*}

Definition (Standard deviation). The standard deviation of a random variable $X$ is defined as

\begin{align*} \sigma(X) = \sqrt{\text{Var}(X)}. \end{align*}

Theorem 2.6. For any random variable $X$ and any $\alpha > 0$,

\begin{align*} \Pr\left[|X - \mathbb{E}[X]| \ge \alpha \cdot \sigma(X)\right] \le \frac{1}{\alpha^2}. \end{align*}

Theorem 2.7. For any random variable $X$ and any constant $c \in \mathbb{R}$,

\begin{align*} \text{Var}(cX) = c^2\text{Var}(X). \end{align*}

Theorem 2.8. If $X$ and $Y$ are independent random variables, then

\begin{align*} \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y). \end{align*}

Theorem 2.9 (Chebyshev for sample averages). Let $X := \frac{1}{n} \sum_{i=1}^{n} X_i$ be the sample average of $n$ i.i.d. random variables $X_1, \dots, X_n$, each with mean $\mu$ and variance $\sigma^2$. Then for every $\epsilon > 0$,

\begin{align*} \Pr\left[|X - \mu| \ge \epsilon\right] \le \frac{\sigma^2}{n\epsilon^2}. \end{align*}
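A small simulation (our own illustration) shows the bound in action for fair coin flips, where $\mu = 1/2$ and $\sigma^2 = 1/4$:

```python
import random

random.seed(0)

# Theorem 2.9 for X_i ~ Bernoulli(1/2): mu = 1/2, sigma^2 = 1/4.
n, trials, eps = 100, 10_000, 0.1
bound = 0.25 / (n * eps ** 2)   # sigma^2 / (n * eps^2) = 0.25

deviations = sum(
    abs(sum(random.random() < 0.5 for _ in range(n)) / n - 0.5) >= eps
    for _ in range(trials)
)
freq = deviations / trials
print(f"empirical Pr[|X - mu| >= {eps}] = {freq:.4f} <= {bound}")
```

The empirical deviation frequency (roughly a two-standard-deviation event here) sits comfortably below the Chebyshev bound of $0.25$.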

2.3 Chernoff Bounds

Moment Generating Functions

The moment generating function (MGF) of a random variable $X$ is defined as

\begin{align*} M_X(t) := \mathbb{E}[e^{tX}] = \mathbb{E}\left[ \sum_{k=0}^{\infty} \frac{t^k}{k!} X^k \right] = \sum_{k=0}^{\infty} \frac{t^k}{k!} \mathbb{E}[X^k]. \end{align*}

Theorem 2.10 (Properties of the MGF). Let $X$ be a random variable. Then its MGF $M(t)$ satisfies the following properties:

  1. $M(0) = 1$;
  2. $M(t) \ge e^{t\mathbb{E}[X]}$ for all $t \in \mathbb{R}$.

Generic Chernoff Bounds

Theorem 2.11 (Generic Chernoff bound, upper tail). Let $X$ be a random variable with mean $\mu$ and MGF $M(t)$. Then for any $a \ge \mu$,

\begin{align*} \Pr[X \ge a] \le \inf_{t \in \mathbb{R}} \{e^{-ta} M(t)\}. \end{align*}

Theorem 2.12 (Generic Chernoff bound, lower tail). Let $X$ be a random variable with mean $\mu$ and MGF $M(t)$. Then for any $a \le \mu$,

\begin{align*} \Pr[X \le a] \le \inf_{t \in \mathbb{R}} \{e^{-ta} M(t)\}. \end{align*}

Definition (Rate function). The rate function of a random variable $X$ is defined as

\begin{align*} I_X(a) := \sup_{t \in \mathbb{R}} \{ta - \log M_X(t)\}, \end{align*}

with the convention that $I_X(a) = +\infty$ if the supremum is unbounded.

Theorem 2.13 (Generic Chernoff bound in terms of the rate function). Let $X$ be a random variable with mean $\mu$ and rate function $I(a)$. Then

\begin{align*} \forall a \ge \mu : \quad \Pr[X \ge a] &\le e^{-I(a)} , \\ \forall a \le \mu : \quad \Pr[X \le a] &\le e^{-I(a)} . \end{align*}

Theorem 2.14 (Properties of the rate function). Let $X$ be a random variable with mean $\mu$ and rate function $I(a)$. Let $\ell$ and $u$ be the minimum and maximum values that $X$ can take, and assume $\ell < u$. Then

  1. $I(a) = +\infty$ if $a < \ell$ or $a > u$;
  2. $0 \le I(a) < +\infty$ for all $a \in [\ell, u]$;
  3. $I(\mu) = 0$;
  4. $I(a)$ is convex on the interval $[\ell, u]$.

Chernoff Bounds for Sample Averages

Lemma. Let $X := \frac{1}{n} \sum_{i=1}^n X_i$ be the sample average of $n$ i.i.d. random variables $X_1, \dots, X_n$, each with rate function $I(a)$. Then

\begin{align*} I_X(a) = n I(a) . \end{align*}

Theorem 2.15 (Generic Chernoff bound for sample averages). Let $X := \frac{1}{n} \sum_{i=1}^n X_i$ be the sample average of $n$ i.i.d. random variables $X_1, \dots, X_n$, each with mean $\mu$ and rate function $I(a)$. Then

\begin{align*} \forall a \ge \mu : \quad \Pr[X \ge a] &\le e^{-n I(a)} , \\ \forall a \le \mu : \quad \Pr[X \le a] &\le e^{-n I(a)} . \end{align*}

Thus

\begin{align*} \Pr[X - \mu \ge \epsilon] &\le e^{-n I(\mu + \epsilon)} , \\ \Pr[X - \mu \le -\epsilon] &\le e^{-n I(\mu - \epsilon)} , \\ \Pr[|X - \mu| \ge \epsilon] &\le e^{-n I(\mu + \epsilon)} + e^{-n I(\mu - \epsilon)} \le 2e^{-n \min\{I(\mu + \epsilon), I(\mu - \epsilon)\}} . \end{align*}
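For a concrete instance, the rate function of $\text{Bernoulli}(p)$ works out to the KL divergence $I(a) = a\ln\frac{a}{p} + (1-a)\ln\frac{1-a}{1-p}$, computed from the MGF $M(t) = 1 - p + pe^t$. The sketch below (our illustration) compares the bound $e^{-nI(\mu+\epsilon)}$ with the empirical upper tail for fair coins:

```python
import math
import random

random.seed(0)

def rate_bernoulli(a, p):
    # Rate function of Bernoulli(p): the KL divergence between
    # Bernoulli(a) and Bernoulli(p), derived from M(t) = 1 - p + p*e^t.
    if a < 0 or a > 1:
        return float("inf")
    if a == 0:
        return math.log(1 / (1 - p))
    if a == 1:
        return math.log(1 / p)
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

# Compare e^{-n I(mu + eps)} with the empirical upper tail of the
# sample average of n fair coins.
n, trials, p, eps = 100, 20_000, 0.5, 0.1
bound = math.exp(-n * rate_bernoulli(p + eps, p))

hits = sum(
    sum(random.random() < p for _ in range(n)) / n >= p + eps
    for _ in range(trials)
)
print(f"empirical {hits / trials:.4f} <= Chernoff bound {bound:.4f}")
```

Note that $I(\mu) = 0$ here, as Theorem 2.14 requires, and the bound decays exponentially in $n$ for fixed $\epsilon$.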

Theorem 2.16 (Cramér's theorem). Let $\mathcal{D}$ be a probability distribution with mean $\mu$ and rate function $I(a)$. Let $X^{(n)} := \frac{1}{n} \sum_{i=1}^n X_i$ be the sample average of $n$ i.i.d. random variables $X_1, \dots, X_n$ drawn from $\mathcal{D}$. Then as $n \to \infty$,

\begin{align*} \text{for any fixed } a \ge \mu : \quad \Pr[X^{(n)} \ge a] &= e^{-n(I(a) + o(1))} , \\ \text{for any fixed } a \le \mu : \quad \Pr[X^{(n)} \le a] &= e^{-n(I(a) + o(1))} . \end{align*}

Chernoff Bounds for Small Deviation

When the deviation $\epsilon$ is as small as $\mathcal{O}\left(\frac{1}{\sqrt{n}}\right)$, we can Taylor-expand the rate function around the mean:

\begin{align*} I(\mu + \epsilon) &= I(\mu) + I'(\mu)\epsilon + \frac{1}{2}I''(\mu)\epsilon^2 + \mathcal{O}(\epsilon^3) \\ &= \frac{1}{2}I''(\mu)\epsilon^2 + \mathcal{O}(n^{-3/2}), \end{align*}

where we used $I(\mu) = 0$, $I'(\mu) = 0$, and $\epsilon = \mathcal{O}\left(\frac{1}{\sqrt{n}}\right)$. Substituting into Theorem 2.15 gives

\begin{align*} \Pr[X \ge \mu + \epsilon] &\le e^{-nI(\mu+\epsilon)} \le e^{-\frac{1}{2} I''(\mu)n\epsilon^2 + \mathcal{O}(n^{-1/2})} , \\ \Pr[X \le \mu - \epsilon] &\le e^{-nI(\mu-\epsilon)} \le e^{-\frac{1}{2} I''(\mu)n\epsilon^2 + \mathcal{O}(n^{-1/2})} . \end{align*}
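As a consistency check (using the identity $I''(\mu) = 1/\text{Var}(X_1)$, which follows from the Legendre duality between $I$ and $\log M$ and is not stated in these notes): for $X_i \sim \text{Bernoulli}(\frac{1}{2})$ we have $\text{Var}(X_1) = \frac{1}{4}$, so

```latex
\begin{align*}
I''(\mu) = \frac{1}{\mathrm{Var}(X_1)} = 4
\quad\Longrightarrow\quad
\Pr[X \ge \mu + \epsilon] \lessapprox e^{-\frac{1}{2}\cdot 4\cdot n\epsilon^2} = e^{-2n\epsilon^2},
\end{align*}
```

which matches the exponent in the symmetric-Bernoulli bounds of Section 2.4 below.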

2.4 Hoeffding’s Inequality for Bounded Variables

Symmetric Bernoulli Variables

Theorem 2.17. Let $X \sim \text{Bernoulli}(\frac{1}{2})$. Then for all $t \in \mathbb{R}$,

\begin{align*} \mathbb{E}[e^{t(X-\mathbb{E}[X])}] \leq e^{t^2/8}. \end{align*}

Theorem 2.18. Let $X \sim \text{Bernoulli}(\frac{1}{2})$. Then for all $\epsilon \in \mathbb{R}$,

\begin{align*} I_X(1/2 + \epsilon) \geq 2\epsilon^2. \end{align*}

Theorem 2.19. Let $X := \frac{1}{n} \sum_{i=1}^n X_i$ be the sample average of $n$ independent random variables $X_1, \dots, X_n \sim \text{Bernoulli}(1/2)$. Then for all $\epsilon \geq 0$,

\begin{align*} \Pr[X \geq 1/2 + \epsilon] \leq e^{-2n\epsilon^2}, \\ \Pr[X \leq 1/2 - \epsilon] \leq e^{-2n\epsilon^2}. \end{align*}

Hoeffding’s Inequality

Theorem 2.20 (Hoeffding's Theorem). Let $X$ be a random variable taking values in $[\ell, u]$ for some $\ell \leq u$. Then for all $t \in \mathbb{R}$,

\begin{align*} \mathbb{E}[e^{t(X-\mathbb{E}[X])}] \leq e^{t^2(u-\ell)^2/8}. \end{align*}

Theorem 2.21. Let $X$ be a random variable taking values in $[\ell, u]$ for some $\ell \leq u$. Then for all $\epsilon \in \mathbb{R}$,

\begin{align*} I_X(\mathbb{E}[X] + \epsilon) \geq 2\epsilon^2 / (u-\ell)^2. \end{align*}

Theorem 2.22 (Hoeffding's inequality, i.i.d. case). Let $X := \frac{1}{n} \sum_{i=1}^n X_i$ be the sample average of $n$ i.i.d. random variables $X_1, \dots, X_n$ taking values in $[\ell, u]$ for some $\ell \leq u$. Then for all $\epsilon \geq 0$,

\begin{align*} \Pr[X \geq \mathbb{E}[X] + \epsilon] \leq e^{-2n\epsilon^2 / (u-\ell)^2}, \\ \Pr[X \leq \mathbb{E}[X] - \epsilon] \leq e^{-2n\epsilon^2 / (u-\ell)^2}. \end{align*}
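A quick simulation (our choice of $\text{Uniform}[0,1]$ variables, for which $u - \ell = 1$) illustrates the bound, and shows it is much sharper than Chebyshev at this scale:

```python
import math
import random

random.seed(0)

# Theorem 2.22 for X_i ~ Uniform[0, 1]: mu = 1/2 and u - l = 1, so
# Pr[X >= 1/2 + eps] <= exp(-2 n eps^2).
n, trials, eps = 200, 10_000, 0.1
hoeffding = math.exp(-2 * n * eps ** 2)   # e^{-4}
chebyshev = (1 / 12) / (n * eps ** 2)     # Theorem 2.9 with Var = 1/12

hits = sum(
    sum(random.random() for _ in range(n)) / n >= 0.5 + eps
    for _ in range(trials)
)
print(f"empirical {hits / trials:.4f} <= Hoeffding {hoeffding:.4f}"
      f" (Chebyshev only gives {chebyshev:.4f})")
```

The key difference: Hoeffding decays exponentially in $n$, while Chebyshev decays only as $1/n$.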

Theorem 2.23 (Hoeffding's inequality). Let $X := \frac{1}{n} \sum_{i=1}^n X_i$ be the sample average of $n$ independent random variables $X_1, \dots, X_n$ taking values in intervals $[\ell_1, u_1], \dots, [\ell_n, u_n]$, respectively. Let $s^2$ be the mean squared interval length:

\begin{align*} s^2 := \frac{1}{n} \sum_{i=1}^n (u_i - \ell_i)^2. \end{align*}

Then for all $\epsilon \geq 0$,

\begin{align*} \Pr[X \geq \mathbb{E}[X] + \epsilon] \leq e^{-2n\epsilon^2 / s^2}, \\ \Pr[X \leq \mathbb{E}[X] - \epsilon] \leq e^{-2n\epsilon^2 / s^2}. \end{align*}

Theorem 2.24. Let $X := \sum_{i=1}^n \Delta_i$ be the sum of $n$ independent random variables $\Delta_1, \dots, \Delta_n$ taking values in intervals $[\tilde{\ell}_1, \tilde{u}_1], \dots, [\tilde{\ell}_n, \tilde{u}_n]$, respectively. Let $L^2$ be the sum of the squared interval lengths:

\begin{align*} L^2 := \sum_{i=1}^n (\tilde{u}_i - \tilde{\ell}_i)^2. \end{align*}

Then for all $\epsilon \geq 0$,

\begin{align*} \Pr[X \geq \mathbb{E}[X] + \epsilon] \leq e^{-2\epsilon^2 / L^2}, \\ \Pr[X \leq \mathbb{E}[X] - \epsilon] \leq e^{-2\epsilon^2 / L^2}. \end{align*}

Azuma–Hoeffding Inequality for Martingales

Definition (Martingale). Let $Z_0, Z_1, \dots, Z_n$ be random variables. We say that $(Z_i)_{i=0}^n$ is a martingale if for every $i \geq 1$,

\begin{align*} \mathbb{E}[Z_i \mid Z_0, \dots, Z_{i-1}] = Z_{i-1}. \end{align*}

Theorem 2.25 (Azuma's lemma). Let $(Z_i)_{i=0}^n$ be a martingale. Assume that for each $i \in [n]$, the increment $\Delta_i := Z_i - Z_{i-1}$ always lies in an interval $[\tilde{\ell}_i, \tilde{u}_i]$ for some $\tilde{\ell}_i, \tilde{u}_i \in \mathbb{R}$. Let $L^2$ be the sum of the squared interval lengths:

\begin{align*} L^2 := \sum_{i=1}^n (\tilde{u}_i - \tilde{\ell}_i)^2. \end{align*}

Then for every $t \in \mathbb{R}$,

\begin{align*} \mathbb{E}[e^{t(Z_n-Z_0)}] \leq e^{t^2 L^2 / 8}. \end{align*}

Theorem 2.26 (Azuma–Hoeffding inequality). Let $(Z_i)_{i=0}^n$ be a martingale. Assume that for each $i \in [n]$, the increment $\Delta_i := Z_i - Z_{i-1}$ always lies in an interval $[\tilde{\ell}_i, \tilde{u}_i]$ for some $\tilde{\ell}_i, \tilde{u}_i \in \mathbb{R}$. Let $L^2$ be the sum of the squared interval lengths:

\begin{align*} L^2 := \sum_{i=1}^n (\tilde{u}_i - \tilde{\ell}_i)^2. \end{align*}

Then for all $\epsilon \geq 0$,

\begin{align*} \Pr[Z_n \geq Z_0 + \epsilon] \leq e^{-2\epsilon^2/L^2}, \\ \Pr[Z_n \leq Z_0 - \epsilon] \leq e^{-2\epsilon^2/L^2}. \end{align*}
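The simplest martingale example is a $\pm 1$ random walk: $Z_0 = 0$ and $Z_i = Z_{i-1} \pm 1$ with equal probability, so each increment lies in $[-1, 1]$ and $L^2 = 4n$. The sketch below (our illustration) checks the resulting bound $e^{-2\epsilon^2/(4n)} = e^{-\epsilon^2/(2n)}$ empirically:

```python
import math
import random

random.seed(0)

# Azuma-Hoeffding for a +/-1 random walk: Z_0 = 0, increments in [-1, 1],
# so L^2 = 4n and Pr[Z_n >= eps] <= exp(-eps^2 / (2n)).
n, trials, eps = 100, 10_000, 20
bound = math.exp(-eps ** 2 / (2 * n))   # e^{-2}

hits = sum(
    sum(random.choice((-1, 1)) for _ in range(n)) >= eps
    for _ in range(trials)
)
print(f"empirical Pr[Z_n >= {eps}] = {hits / trials:.4f} <= {bound:.4f}")
```

Here $\epsilon = 20$ is two standard deviations of $Z_n$ (which has $\sigma(Z_n) = \sqrt{n} = 10$), so the empirical tail sits well below the Azuma–Hoeffding bound of $e^{-2}$.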