Model Settings
Suppose we have an urn with \(a\) black balls (denoted as \(1\)) and \(b\) white balls (denoted as \(0\)). For each trial, we randomly draw one ball from the urn and put it back together with one extra ball of the same colour (“over-replacing”), and thus, we have:
\[
P(X_1=1)=\frac{a}{a+b},\quad P(X_1=0)=\frac{b}{a+b},
\] and \[
P(X_n=x_n|X_1=x_1,\cdots,X_{n-1}=x_{n-1})=\Big(\frac{a+\sum_{i=1}^{n-1}x_{i}}{a+b+n-1}\Big)^{x_n}\Big(\frac{b+\sum_{i=1}^{n-1}\bar x_{i}}{a+b+n-1}\Big)^{\bar x_n},
\] where \(X_1,\cdots,X_n\) are the results of the successive draws, which can be viewed as (dependent) Bernoulli trials, and \(x_i\in\{0,1\}\), \(\bar x_i=1-x_i\), \(\forall i=1,2,\cdots,n\).
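Before deriving anything, the sampling scheme above can be simulated directly. A minimal Python sketch; the helper name `polya_draw` is ours:

```python
import random

def polya_draw(a, b, n, rng=random):
    """Simulate n draws from a Polya urn that starts with `a` black
    balls (coded 1) and `b` white balls (coded 0).  After each draw the
    ball is returned together with one extra ball of the same colour."""
    black, white = a, b
    xs = []
    for _ in range(n):
        x = 1 if rng.random() < black / (black + white) else 0
        black += x          # over-replace: add a copy of the drawn colour
        white += 1 - x
        xs.append(x)
    return xs

# Usage: one trajectory of 10 draws from Polya(2, 3)
print(polya_draw(2, 3, 10, random.Random(0)))
```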
Joint Probability Distribution
Now, we have a problem: what is the joint distribution of \(X_1,\cdots,X_n\)?
\[
\begin{aligned}
&P(X_1=x_1,\cdots,X_n=x_n)\\
=&P(X_1=x_1)P(X_2=x_2|X_1=x_1)\cdots P(X_n=x_n|X_1=x_1,\cdots,X_{n-1}=x_{n-1})\\
=&\frac{[a^{x_1}(a+x_1)^{x_2}\cdots(a+x_1+\cdots+x_{n-1})^{x_n}][b^{\bar x_1}(b+\bar x_1)^{\bar x_2}\cdots (b+\bar x_1+\cdots+\bar x_{n-1})^{\bar x_n}]}{(a+b)(a+b+1)\cdots(a+b+n-1)}\\
=&\frac{[a(a+1)\cdots(a+\sum_{i=1}^{n}x_i-1)][b(b+1)\cdots(b+ \sum_{i=1}^{n}\bar x_i-1)]}{(a+b)(a+b+1)\cdots(a+b+n-1)}\\
=&\frac{[\Gamma(a+\sum_{i=1}^{n}x_i)/\Gamma(a)][\Gamma(b+\sum_{i=1}^{n}\bar x_i)/\Gamma(b)]}{\Gamma(a+b+n)/\Gamma(a+b)}\\
=&\frac{\Gamma(a+b)\Gamma(a+\sum_{i=1}^{n}x_i)\Gamma(b+n-\sum_{i=1}^{n}x_i)}{\Gamma(a+b+n)\Gamma(a)\Gamma(b)}.
\end{aligned}
\] We can see that the joint probability depends on \((x_1,\cdots,x_n)\) only through \(\sum_{i=1}^nx_i\); denote it by \(f(\sum_{i=1}^nx_i)\).
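The closed form can be sanity-checked against the telescoping product of conditional probabilities. A Python sketch (function names are ours), using `math.lgamma` to evaluate the Gamma ratios stably:

```python
from math import exp, lgamma

def polya_joint_sequential(a, b, xs):
    """P(X_1=x_1,...,X_n=x_n): multiply the conditional probabilities
    of the draws one trial at a time."""
    p, black, white = 1.0, a, b
    for x in xs:
        p *= (black if x == 1 else white) / (black + white)
        black += x
        white += 1 - x
    return p

def polya_joint_gamma(a, b, xs):
    """The closed form above: depends on xs only through s = sum(xs)."""
    n, s = len(xs), sum(xs)
    return exp(lgamma(a + b) + lgamma(a + s) + lgamma(b + n - s)
               - lgamma(a + b + n) - lgamma(a) - lgamma(b))
```

Any permutation of `xs` gives the same value, since both functions depend on `xs` only through its sum.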
Exchangeability
Let \(\pi=(\pi_1,\cdots,\pi_n)\) be a fixed permutation of \(\{1,2,\cdots,n\}\), and \(Y=(Y_1,\cdots,Y_n):=(X_{\pi_1},\cdots,X_{\pi_n})\); that is, \(Y\) is a permutation of \(X=(X_1,\cdots,X_n)\).
Our problem is: what is the probability distribution \(P(Y_1=y_1,\cdots,Y_n=y_n)\)?
We define the inverse permutation \(\phi=(\phi_1,\cdots,\phi_n)\), satisfying \(\phi_i=j\iff\pi_j=i\), and then we have
\[
\begin{aligned}
P(Y_1=y_1,\cdots,Y_n=y_n)&=P(X_1=y_{\phi_1},\cdots,X_n=y_{\phi_n})\\
&=f(\sum_{i=1}^ny_{\phi_i})=f(\sum_{i=1}^ny_i)\\
&=P(X_1=y_1,\cdots,X_n=y_n).
\end{aligned}
\]
Definition If the distribution of \(X_\pi\) is the same as \(X\) for any permutation \(\pi\), then we say \(X\) is exchangeable.
Note: de Finetti viewed the exchangeability as an important way to express uncertainty.
A Fact about Identical Distribution
Exchangeable \(\Rightarrow\) Marginally identically distributed.
A Toy Example
\[
\begin{aligned}
P(X_m=1)&=\sum_{x:x_m=1} P(X_1=x_1,\cdots,X_n=x_n)\\
&=\sum_{x:x_m=1}P(X_1=x_m,\cdots,X_m=x_1,\cdots,X_n=x_n)\\
&=P(X_1=1).
\end{aligned}
\] It means that \(X_1\) and \(X_m\) are (marginally) identically distributed.
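For small \(n\) this marginal identity can also be verified by brute force, summing the closed-form joint probability over all binary sequences. A Python sketch (names ours):

```python
from itertools import product
from math import exp, lgamma

def joint(a, b, xs):
    """Polya joint probability, via the Gamma closed form."""
    n, s = len(xs), sum(xs)
    return exp(lgamma(a + b) + lgamma(a + s) + lgamma(b + n - s)
               - lgamma(a + b + n) - lgamma(a) - lgamma(b))

def marginal_prob_one(a, b, n, m):
    """P(X_m = 1): sum the joint over all sequences with x_m = 1."""
    return sum(joint(a, b, xs)
               for xs in product([0, 1], repeat=n) if xs[m - 1] == 1)
```

For \(a=2,b=3,n=4\), every position \(m\) gives the same marginal \(a/(a+b)=2/5\).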
2-Stage Experiment
Settings
\(Z\sim f\), and, given \(Z=z\), the \(X_i\)'s are i.i.d. with conditional distribution \(g(\cdot\mid z)\) (a hierarchical model).
Fact
Mixtures of conditionally i.i.d. random variables are exchangeable.
Example
Suppose \(X=(X_1,\cdots,X_n)\), where \((X_i|Z=z)\sim \mathrm{Bernoulli}(z)\) and \(X_1,X_2,\cdots,X_n\) are conditionally i.i.d. given \(Z\).
If \(Z\) satisfies \(P(Z=0.1)=P(Z=0.9)=\frac{1}{2}\), then we have \[
\begin{aligned}
P(X=x)&=\frac{1}{2}P(X=x|Z=0.1)+\frac{1}{2}P(X=x|Z=0.9)\\
&=\frac{1}{2}(\frac{1}{10})^{\sum x_i}(\frac{9}{10})^{n-\sum x_i}+\frac{1}{2}(\frac{9}{10})^{\sum x_i}(\frac{1}{10})^{n-\sum x_i}
\end{aligned}
\]
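A quick numerical check that this two-point mixture is a genuine, exchangeable distribution (Python sketch; `mixture_prob` is a hypothetical name):

```python
def mixture_prob(xs):
    """P(X = x) for the half-half mixture of i.i.d. Bernoulli(0.1)
    and i.i.d. Bernoulli(0.9) sequences."""
    n, s = len(xs), sum(xs)
    return 0.5 * 0.1**s * 0.9**(n - s) + 0.5 * 0.9**s * 0.1**(n - s)

# The probability of a sequence depends only on its sum:
assert mixture_prob([1, 0, 0]) == mixture_prob([0, 0, 1])
```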
If \(Z\sim \mathrm{Beta}(a,b)\), i.e., with pdf \(f(z)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}z^{a-1}(1-z)^{b-1}\), then
\[
P(X=x)=\int_0^1P(X=x|Z=z)f(z)dz=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\cdot\frac{\Gamma(a+\sum x_i)\Gamma(b+n-\sum x_i)}{\Gamma(a+b+n)}.
\]
It is easy to see that the Polya urn model \(\mathrm{Polya}(a,b)\) is a Beta mixture of i.i.d. Bernoullis: the right-hand side is exactly the joint probability derived for the urn.
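This identification can be checked by simulation: draw \(Z\) with the standard library's `random.betavariate`, then conditionally i.i.d. Bernoulli(\(Z\)) variables, and compare empirical frequencies to the Polya closed form (a sketch; the function names are ours):

```python
import random
from math import exp, lgamma

def beta_bernoulli_pair(a, b, rng):
    """Draw Z ~ Beta(a, b), then two conditionally i.i.d. Bernoulli(Z)'s."""
    z = rng.betavariate(a, b)
    return (int(rng.random() < z), int(rng.random() < z))

def polya_prob(a, b, xs):
    """Polya(a, b) joint probability of the sequence xs."""
    n, s = len(xs), sum(xs)
    return exp(lgamma(a + b) + lgamma(a + s) + lgamma(b + n - s)
               - lgamma(a + b + n) - lgamma(a) - lgamma(b))

# Empirical frequency of (1, 1) vs Polya's (a/(a+b)) * ((a+1)/(a+b+1)) = 0.2
rng = random.Random(0)
freq = sum(beta_bernoulli_pair(2, 3, rng) == (1, 1) for _ in range(20000)) / 20000
```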
de Finetti Theorem
For an infinite sequence \(X_1,X_2,\cdots\), the variables are exchangeable if and only if \(\exists Z\sim p(z)\), such that the \(X_i\)'s are i.i.d. conditional on \(Z\). (For a finite sequence, only the “if” direction holds in general.)
Moreover, we have \[\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n X_i=Z\ (\mathrm{a.s.}).\]
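The almost-sure limit is visible in simulation: along one urn trajectory the running mean settles down, but its limit is random, a single draw from the \(\mathrm{Beta}(a,b)\) mixing distribution. A Python sketch (names ours):

```python
import random

def polya_running_means(a, b, n, rng):
    """Simulate one Polya(a, b) trajectory and return the running means
    (1/k) * sum_{i<=k} X_i for k = 1, ..., n."""
    black, white, s, means = a, b, 0, []
    for k in range(1, n + 1):
        x = 1 if rng.random() < black / (black + white) else 0
        black += x
        white += 1 - x
        s += x
        means.append(s / k)
    return means
```

Different seeds settle near different limiting values, which is exactly the randomness of \(Z\).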
For extensions, see Hewitt–Savage.
Related Problems
Problem 1
Consider the binary (values 0 and 1) Markov chain \(X_1, X_2, \cdots, X_n\) for which \(P[X_1=1] = \alpha/(\alpha+\beta)\) and, for any \(i=2, 3, \cdots, n\), \(P[X_i=1|X_1=x_1, X_2=x_2, \cdots, X_{i-1} = c]\) equals \(\alpha\) if \(c=0\) and equals \(1-\beta\) if \(c=1\), for transition rates \(\alpha\) and \(\beta\) both in \((0,1)\). Show that \(X_1,X_2,\cdots,X_n\) are not exchangeable, but these variables are identically distributed.
Solution:
We first show that they are identically distributed, by induction.
First, \(P(X_1=1)=\frac{\alpha}{\alpha+\beta}\). Now assume that for some \(n\), \(X_n\) has marginal distribution \[P(X_n=1)=\frac{\alpha}{\alpha+\beta};\] then, for \(X_{n+1}\), we have \[
\begin{aligned}
P(X_{n+1}=1)&=P(X_{n+1}=1|X_n=0)P(X_n=0)+P(X_{n+1}=1|X_n=1)P(X_n=1)\\
&=\alpha\cdot\frac{\beta}{\alpha+\beta}+(1-\beta)\cdot\frac{\alpha}{\alpha+\beta}\\
&=\frac{\alpha}{\alpha+\beta}.
\end{aligned}
\] So, they are identically distributed.
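The induction amounts to iterating the one-step recursion \(p_{k+1}=\alpha(1-p_k)+(1-\beta)p_k\), whose fixed point is \(\alpha/(\alpha+\beta)\). A Python sketch (the function name is ours):

```python
def marginal_p1(alpha, beta, n):
    """Exact P(X_n = 1) for the binary chain, by iterating
    p_{k+1} = alpha * (1 - p_k) + (1 - beta) * p_k from P(X_1 = 1)."""
    p = alpha / (alpha + beta)
    for _ in range(n - 1):
        p = alpha * (1 - p) + (1 - beta) * p
    return p
```

For example, with \(\alpha=0.3,\beta=0.5\) the marginal stays at \(0.3/0.8=0.375\) for every \(n\).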
Now, we show they are not exchangeable. It suffices to consider the case of \(X_1,X_2,X_3\).
\[
\begin{aligned}
&P(X_1=1,X_2=0,X_3=0)\\
=&P(X_1=1)P(X_2=0|X_1=1)P(X_3=0|X_1=1,X_2=0)\\
=&\frac{\alpha}{\alpha+\beta}\cdot \beta\cdot (1-\alpha)
\end{aligned}
\] and \[
\begin{aligned}
&P(X_1=0,X_2=1,X_3=0)\\
=&P(X_1=0)P(X_2=1|X_1=0)P(X_3=0|X_1=0,X_2=1)\\
=&\frac{\beta}{\alpha+\beta}\cdot \alpha\cdot \beta.
\end{aligned}
\] So, we can see that \(P(X_1=1,X_2=0,X_3=0)\neq P(X_1=0,X_2=1,X_3=0)\), and thus, they are not exchangeable.
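The two path probabilities can be computed mechanically; a Python sketch (`chain_prob` is a hypothetical helper). With, say, \(\alpha=0.3,\beta=0.5\) the two values differ, while they coincide when \(\alpha+\beta=1\):

```python
def chain_prob(alpha, beta, xs):
    """Joint probability of the path xs for the binary Markov chain:
    P(X_1=1) = alpha/(alpha+beta); P(next=1 | prev) is alpha if prev=0,
    and 1-beta if prev=1."""
    p = alpha / (alpha + beta) if xs[0] == 1 else beta / (alpha + beta)
    for prev, cur in zip(xs, xs[1:]):
        p_one = alpha if prev == 0 else 1 - beta
        p *= p_one if cur == 1 else 1 - p_one
    return p
```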
Note: the proof above only shows that they are not necessarily exchangeable: the two probabilities coincide when \(1-\alpha=\beta\). In fact, these variables are exchangeable (for any \(n\)) if and only if \(\alpha+\beta=1\), in which case the chain is i.i.d. \(\mathrm{Bernoulli}(\alpha)\).
Problem 2
The so-called Chinese restaurant process is a discrete/continuous variation of Polya’s urn model. In one version, the first variable \(X_1\) has a standard normal distribution [written \(X_1 \sim N(0,1)\)]. There is a single parameter \(m\), say. With probability \(1/(m+1), X_2\) equals \(X_1\) exactly; with probability \(m/(m+1)\), on the other hand, \(X_2\) is an independent \(N(0,1)\) variate. With probability \(m/(m+2)\), then, \(X_3\) is an independent \(N(0,1)\); else it is \(X_1\) w.p. \(1/(m+2)\) or \(X_2\) w.p. \(1/(m+2)\). Let’s stop at three and consider \((X_1,X_2,X_3)\). Work out the marginal distribution of each variable. Also work out the joint distribution of \((X_2,X_3)\) and similarly the conditional distribution of \(X_1\) given \(X_2\) and \(X_3\).
Solution:
From the statement, there are \(5\) possible configurations:
\(E_1=\{X_1=X_2=X_3\}\), and \(P(E_1)=\frac{2}{(m+1)(m+2)}\)
\(E_2=\{X_1=X_2,X_3\perp X_1\}\), and \(P(E_2)=\frac{m}{(m+1)(m+2)}\)
\(E_3=\{X_1=X_3,X_2\perp X_1\}\), and \(P(E_3)=\frac{m}{(m+1)(m+2)}\)
\(E_4=\{X_2=X_3,X_3\perp X_1\}\), and \(P(E_4)=\frac{m}{(m+1)(m+2)}\)
\(E_5=\{X_1,X_2,X_3\mathrm{\ mutually\ independent}\}\), and \(P(E_5)=\frac{m^2}{(m+1)(m+2)}\)
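As a sanity check, the five case probabilities sum to \(1\), since \(2+3m+m^2=(m+1)(m+2)\). A Python sketch with exact rational arithmetic (assuming an integer \(m\) so that `Fraction` applies; names ours):

```python
from fractions import Fraction

def crp_event_probs(m):
    """The probabilities of E_1, ..., E_5 above, as exact fractions."""
    d = Fraction(1, (m + 1) * (m + 2))
    return [2 * d, m * d, m * d, m * d, m * m * d]
```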
First, we consider the marginal distribution of the three variables. In every configuration, each \(X_i\) is either a fresh \(N(0,1)\) draw or an exact copy of one, so each marginal distribution is \(N(0,1)\).
Now, consider the joint distribution of \((X_2,X_3)\). There are two situations, \(X_2=X_3\) and \(X_2\perp X_3\), with probabilities \(1/(m+1)\) and \(m/(m+1)\) respectively. Thus, the joint distribution of \((X_2,X_3)\) is \[
f_{(X_2,X_3)}(x,y)=\frac{1}{(m+1)\sqrt{2\pi}}\exp(-\frac{x^2}{2})\mathbb{I}(y=x)+\frac{m}{(m+1)(2\pi)}\exp(-\frac{x^2+y^2}{2})
\] where the first term is singular: it puts mass \(1/(m+1)\) on the diagonal \(y=x\).
For the conditional distribution of \(X_1\) given \((X_2,X_3)=(x_2,x_3)\), condition on which of the five configurations occurred. If \(x_2\neq x_3\), only \(E_2,E_3,E_5\) are possible, with posterior weights proportional to \(m,m,m^2\), so
\[
f_{X_1|(X_2,X_3)}(x|x_2,x_3)=\frac{1}{m+2}\mathbb{I}(x=x_2)+\frac{1}{m+2}\mathbb{I}(x=x_3)+\frac{m}{m+2}\cdot\frac{1}{\sqrt{2\pi}}\exp(-\frac{x^2}{2}).
\]
If \(x_2=x_3\) (an event of positive probability), only \(E_1\) and \(E_4\) are possible, with posterior weights \(2/(m+2)\) and \(m/(m+2)\), so
\[
f_{X_1|(X_2,X_3)}(x|x_2,x_3)=\frac{2}{m+2}\mathbb{I}(x=x_2)+\frac{m}{m+2}\cdot\frac{1}{\sqrt{2\pi}}\exp(-\frac{x^2}{2}).
\]
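The whole three-variable scheme is easy to simulate, which gives an independent check of, e.g., \(P(X_2=X_3)=1/(m+1)\) (Python sketch; `crp_triple` is a hypothetical name):

```python
import random

def crp_triple(m, rng):
    """Simulate (X_1, X_2, X_3) as described: X_1 ~ N(0,1); X_2 copies
    X_1 w.p. 1/(m+1), else is a fresh N(0,1); X_3 is fresh w.p. m/(m+2),
    else copies X_1 or X_2 with probability 1/(m+2) each."""
    x1 = rng.gauss(0, 1)
    x2 = x1 if rng.random() < 1 / (m + 1) else rng.gauss(0, 1)
    u = rng.random()
    if u < m / (m + 2):
        x3 = rng.gauss(0, 1)
    elif u < (m + 1) / (m + 2):
        x3 = x1
    else:
        x3 = x2
    return x1, x2, x3
```

For \(m=2\), the empirical frequency of \(\{X_2=X_3\}\) should be close to \(1/3\) (ties occur only through exact copying, never by coincidence of continuous draws).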