This note is based on the course STAT 775, taught by Michael A. Newton. The theory here follows the presentation in Jim Berger's 1985 text and, earlier, the work of Abraham Wald.

Settings

Data \(X\in \mathcal X\)

Parameter \(\theta\in \Theta\)

Model \(p(x|\theta)\)

Actions \(a\in \mathcal A\): the possible inferences, depending on the context. For estimation, an action is a specific point in the parameter space; for testing, it is the choice of the null or the alternative hypothesis.

Decision Rules \(\delta:\mathcal X\to\mathcal A\). A decision rule is a statistical method: a way of processing the data to make a certain inference.

Loss \(L(\theta,a):\Theta\times\mathcal A\to [0,+\infty)\). It records the penalty associated with taking action \(a\) when \(\theta\) is the state of nature.

Risk \(R(\theta,\delta)=E(L(\theta,\delta(X)))\), where the expectation is with respect to \(p(x|\theta)\).

A Toy Example

Let \(X\sim N(\theta,1)\), \(\delta(x)=x\), and \(L(\theta,a)=(\theta-a)^2\); the risk is then the mean squared error, \(R(\theta,\delta)=E(\theta-X)^2=1\).

Note: if \(\delta_c(x)=cx\), then \(R(\theta,\delta_c)=E(\theta-cX)^2=(c-1)^2\theta^2+c^2\).
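As a quick sanity check, the formula can be verified by simulation. The sketch below (the particular values of \(\theta\), \(c\), the sample size, and the seed are arbitrary illustrative choices) estimates \(R(\theta,\delta_c)\) by Monte Carlo and compares it with the closed form.

```python
# Minimal Monte Carlo check of R(theta, delta_c) = (c-1)^2 theta^2 + c^2.
import numpy as np

rng = np.random.default_rng(0)

def risk_mc(theta, c, n_sims=1_000_000):
    """Estimate R(theta, delta_c) = E[(theta - c X)^2] with X ~ N(theta, 1)."""
    x = rng.normal(theta, 1.0, size=n_sims)
    return np.mean((theta - c * x) ** 2)

theta, c = 2.0, 0.8
print(risk_mc(theta, c))                    # Monte Carlo estimate, close to 0.8
print((c - 1) ** 2 * theta ** 2 + c ** 2)   # closed form: 0.04*4 + 0.64 = 0.8
```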

Now we face a basic question: what principle should we use to select \(\delta\)? One idea is to restrict the class of procedures and then minimize the risk within that class (e.g., UMVUE restricts attention to unbiased estimators; hypothesis testing restricts the significance level).

Admissible Rules

Definition We say \(\delta_2\) is inadmissible if there exists \(\delta_1\) such that \(R(\theta,\delta_1)\leqslant R(\theta,\delta_2)\) for all \(\theta\in\Theta\), with strict inequality for some \(\theta\). A rule that is not dominated in this way is admissible.
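For instance, in the toy example above, any \(\delta_c\) with \(c>1\) is inadmissible: \(R(\theta,\delta_c)=(c-1)^2\theta^2+c^2\geqslant c^2>1=R(\theta,\delta_1)\) for every \(\theta\), so \(\delta_1\) dominates \(\delta_c\).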

Bayesian Method

Prior \(p(\theta)\)

Bayes Risk By switching the order of integration, we can show that the Bayes risk is an average over the sample space, against the marginal (predictive) distribution \(p(x)\), of the so-called posterior expected loss; i.e. the expected loss against the posterior distribution \(p(\theta|x)=p(x|\theta)\frac{p(\theta)}{p(x)}\), which is obtained from Bayes' theorem.

In fact, \(r(\delta)=E(R(\theta,\delta))\), where the expectation is with respect to \(p(\theta)\):

\[ \begin{aligned} r(\delta)&=\int_\Theta\int_{\mathcal X}L(\theta,\delta(x))p(x|\theta)p(\theta)\,dx\,d\theta\\ &=\int_{\mathcal X}\int_\Theta p(x)L(\theta,\delta(x))\frac{p(x|\theta)p(\theta)}{p(x)}\,d\theta\, dx\\ &=\int_{\mathcal X}p(x)\left(\int_\Theta L(\theta,\delta(x))p(\theta|x)\,d\theta\right)dx, \end{aligned} \] where \(\int_\Theta L(\theta,\delta(x))p(\theta|x)\,d\theta\) is called the posterior expected loss, \(PEL(\delta(x))\), so that \(r(\delta)=E(PEL(\delta(X)))\), the expectation being taken with respect to \(p(x)\).
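The identity \(r(\delta)=E(PEL(\delta(X)))\) can also be checked numerically. The sketch below assumes a conjugate setup, \(\theta\sim N(\mu_0,\tau^2)\) and \(X|\theta\sim N(\theta,1)\), with the rule \(\delta(x)=x\); here \(R(\theta,\delta)=1\) for every \(\theta\), so averaging the posterior expected loss over the marginal \(p(x)\) must also give 1. The particular values of \(\mu_0\) and \(\tau^2\) are arbitrary.

```python
# Verify r(delta) = E[PEL(delta(X))] for delta(x) = x under a conjugate
# normal-normal model: theta ~ N(mu0, tau^2), X | theta ~ N(theta, 1).
import numpy as np

rng = np.random.default_rng(0)
mu0, tau2 = 0.5, 2.0
n_sims = 1_000_000

# Order 1: average the frequentist risk over the prior (trivially 1 here,
# since R(theta, delta) = Var(X | theta) = 1 for every theta).
r_via_risk = 1.0

# Order 2: average the posterior expected loss over the marginal p(x).
x = rng.normal(mu0, np.sqrt(1.0 + tau2), size=n_sims)  # marginal: N(mu0, 1 + tau^2)
post_var = tau2 / (1.0 + tau2)                         # Var(theta | x)
post_mean = (tau2 * x + mu0) / (1.0 + tau2)            # E(theta | x)
pel = post_var + (post_mean - x) ** 2                  # E[(theta - x)^2 | x]
r_via_pel = pel.mean()

print(r_via_risk, r_via_pel)  # both close to 1
```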

Bayes rule Any procedure that minimizes the Bayes risk. In fact, we can compute a Bayes rule by minimizing \(PEL(\delta(x))\) separately for each \(x\) (data point by data point).
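For instance, under squared error loss \(L(\theta,a)=(\theta-a)^2\), the posterior expected loss is a quadratic in \(a\), and differentiating under the integral gives \[ \frac{d}{da}\int_\Theta(\theta-a)^2p(\theta|x)\,d\theta=-2\int_\Theta(\theta-a)p(\theta|x)\,d\theta=0 \iff a=E(\theta|x), \] so the Bayes rule is the posterior mean.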

Fact Under weak regularity conditions, if \(\delta^*\) is a Bayes rule, then \(\delta^*\) is admissible.

Note: this means that, whatever their deficiencies, Bayes rules at least cannot be beaten by any other rule in terms of risk, uniformly over the parameter space.

Bayes Methods in High Dimensions

Decision theory is the source of an interesting result that provides another sort of motivation for Bayesian analysis. It is the first hint that Bayesian modeling has special effects as the dimensionality of \(\theta\) grows.

James-Stein: Let \(X=(X_1,X_2,X_3)^T\), \(\theta=(\theta_1,\theta_2,\theta_3)^T\), and \(X\sim N(\theta,I_3)\). To estimate \(\theta\), take the squared error loss \(L(\theta,a)=\sum_{j=1}^3(\theta_j-a_j)^2\). Then the natural estimator \(\delta(x)=x\) is inadmissible: the James-Stein estimator \(\delta_{JS}(x)=\left(1-\frac{1}{\|x\|^2}\right)x\) has uniformly smaller risk.
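A minimal simulation makes the inadmissibility concrete. The sketch below compares \(\delta(x)=x\) with \(\delta_{JS}\) at one arbitrary choice of \(\theta\); the James-Stein risk comes out strictly below \(3\), the constant risk of \(\delta(x)=x\), and the same holds at any other \(\theta\) one tries.

```python
# Simulate the risk of delta(x) = x versus the James-Stein estimator
# delta_JS(x) = (1 - 1/||x||^2) x for X ~ N(theta, I_3).
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, -0.5, 2.0])   # arbitrary illustrative choice
n_sims = 200_000

x = rng.normal(theta, 1.0, size=(n_sims, 3))
norm2 = np.sum(x ** 2, axis=1, keepdims=True)   # ||x||^2 for each draw
js = (1.0 - 1.0 / norm2) * x                    # shrink toward the origin

risk_mle = np.mean(np.sum((x - theta) ** 2, axis=1))   # close to 3
risk_js = np.mean(np.sum((js - theta) ** 2, axis=1))   # strictly below 3

print(risk_mle, risk_js)
```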