
Back to the Contents

3.1 Introduction

3.2 Linear Regression Models and Least Squares

In this chapter, the linear models are of the form

$f(X) = X\beta,$

where $\beta$ is an unknown parameter vector and the inputs $X_j$ can come from:

quantitative inputs or their transformations;

basis expansions, such as $X_2 = X_1^2,\ X_3 = X_1^3$;

numeric or dummy coding of the levels of qualitative inputs;

interactions between variables.

The basic assumption is that $E(Y|X)$ is linear, or that the linear model is a reasonable approximation.

Minimizing $RSS(\beta) = (y - X\beta)^T(y - X\beta)$, we get $\hat\beta = (X^TX)^{-1}X^Ty$, $\hat y = X(X^TX)^{-1}X^Ty$, and $H = X(X^TX)^{-1}X^T$ is the hat matrix.
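A minimal NumPy sketch of these formulas (the design matrix and true coefficients below are made up for illustration; the normal equations are solved with `np.linalg.solve` rather than forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # intercept + p inputs
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=N)

# Least squares estimate: solve (X^T X) beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values and hat matrix H = X (X^T X)^{-1} X^T
y_hat = X @ beta_hat
H = X @ np.linalg.solve(X.T @ X, X.T)

# H is a projection: H H = H, and y_hat = H y
assert np.allclose(H @ H, H)
assert np.allclose(H @ y, y_hat)
```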

Geometric view of least squares: $\hat y$ is the orthogonal projection of $y$ onto the column space of $X$, and the residual $y - \hat y$ is orthogonal to that space.

What if $X^TX$ is singular ($X$ is not of full column rank)?

Then $\hat\beta$ is not uniquely defined. However, $\hat y = X\hat\beta$ is still the unique projection of $y$ onto the column space of $X$. Why does this happen?

One or more qualitative variables are coded in a redundant fashion.

The dimension $p$ exceeds the number of training cases $N$.

In practice, we can resolve this by filtering (recoding or dropping the redundant columns) or by regularization.
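A small sketch of the rank-deficient case, on a made-up design matrix whose third column duplicates the second: the pseudo-inverse returns one particular $\hat\beta$ (the minimum-norm solution), but the fitted values $\hat y$ are the same for any solution of the normal equations.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 30
x1 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x1, 2 * x1])  # third column duplicates the second
y = 1.0 + 3.0 * x1 + rng.normal(scale=0.3, size=N)

# X^T X is singular, so (X^T X)^{-1} does not exist; np.linalg.pinv gives
# the minimum-norm least squares solution instead.
beta_pinv = np.linalg.pinv(X) @ y

# Another valid solution: drop the redundant column ("filtering"), then pad with 0.
beta_drop = np.append(np.linalg.lstsq(X[:, :2], y, rcond=None)[0], 0.0)

# Different coefficient vectors, same projection y_hat.
print(beta_pinv, beta_drop)
assert np.allclose(X @ beta_pinv, X @ beta_drop)
```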

 

To pin down the sampling properties of $\hat\beta$, we need more assumptions about $Y$:

the $y_i$ are uncorrelated and have constant variance $\sigma^2$, and the $x_i$ are fixed (non-random).

Then, we know $Var(\hat\beta) = (X^TX)^{-1}\sigma^2$, and an unbiased estimate of $\sigma^2$ is $\hat\sigma^2 = \frac{1}{N-p-1}\sum_{i=1}^N (y_i - \hat y_i)^2 = \frac{1}{N-p-1}\, y^T(I - H)y$. To draw inference about the parameters, we need additional assumptions: $E(Y|X) = X\beta$ and $Y = E(Y|X) + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$.

Then $\hat\beta \sim N(\beta, (X^TX)^{-1}\sigma^2)$, and $(N-p-1)\hat\sigma^2 \sim \sigma^2\chi^2_{N-p-1}$.

Simple t test

$z_j = \frac{\hat\beta_j}{\hat\sigma\sqrt{v_j}}$, where $v_j$ is the $j$th diagonal element of $(X^TX)^{-1}$. Under the null hypothesis $\beta_j = 0$, $z_j$ is distributed as $t_{N-p-1}$.

F test

$F = \frac{(RSS_0 - RSS_1)/(p_1 - p_0)}{RSS_1/(N - p_1 - 1)} \sim F_{p_1 - p_0,\, N - p_1 - 1}$, where $RSS_1$ is the residual sum of squares of the bigger model with $p_1 + 1$ parameters and $RSS_0$ that of the nested smaller model with $p_0 + 1$ parameters; the $F$ distribution holds under the null hypothesis that the smaller model is correct.

From these results we can also derive confidence intervals; for example, the $1 - 2\alpha$ confidence interval for $\beta_j$ is $\left(\hat\beta_j - z^{(1-\alpha)}\sqrt{v_j}\,\hat\sigma,\ \hat\beta_j + z^{(1-\alpha)}\sqrt{v_j}\,\hat\sigma\right)$.
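A sketch of these inference quantities in NumPy/SciPy (the data-generating setup is made up; `scipy.stats` supplies the $t$, $F$, and normal reference distributions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta_true = np.array([1.0, 2.0, 0.0, -1.0])
y = X @ beta_true + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
y_hat = X @ beta_hat
sigma2_hat = np.sum((y - y_hat) ** 2) / (N - p - 1)

# t statistics: z_j = beta_hat_j / (sigma_hat * sqrt(v_j))
v = np.diag(XtX_inv)
z = beta_hat / np.sqrt(sigma2_hat * v)
p_values = 2 * stats.t.sf(np.abs(z), df=N - p - 1)

# F test: does dropping the last two inputs hurt the fit?
X0 = X[:, :2]
rss1 = np.sum((y - y_hat) ** 2)
rss0 = np.sum((y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]) ** 2)
F = ((rss0 - rss1) / 2) / (rss1 / (N - p - 1))
F_pvalue = stats.f.sf(F, 2, N - p - 1)

# 95% confidence intervals for each beta_j
half_width = stats.norm.ppf(0.975) * np.sqrt(sigma2_hat * v)
ci = np.column_stack([beta_hat - half_width, beta_hat + half_width])
print(z, p_values, F, F_pvalue, ci, sep="\n")
```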

3.2.1 Example: Prostate Cancer

3.2.2 The Gauss-Markov Theorem

In this subsection, we focus only on estimation of a linear combination of the parameters, $\theta = a^T\beta$.

The least squares estimate of $a^T\beta$ is $\hat\theta = a^T\hat\beta = a^T(X^TX)^{-1}X^Ty$. If we assume that the linear model is correct, then this estimate is unbiased.

Gauss-Markov Theorem: For any other linear estimator $\tilde\theta = c^Ty$ that is unbiased for $a^T\beta$, we have $Var(a^T\hat\beta) \le Var(c^Ty)$. Note: unbiased estimators are not necessarily better than biased estimators, since the unbiased ones may have larger variance (and hence larger mean squared error).
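A small Monte Carlo sketch of this note (the single-input setup and the shrinkage factor of 0.5 below are made-up choices for illustration): a shrunken estimate $c\hat\beta$ with $0 < c < 1$ is biased but has smaller variance, and for a small true coefficient it ends up with smaller mean squared error than the unbiased least squares estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n_sim = 20, 5000
x = rng.normal(size=N)           # fixed design, single input, no intercept
beta, sigma = 0.3, 1.0           # small true coefficient, noisy observations
shrink = 0.5                     # hypothetical shrinkage factor

ols, shrunk = np.empty(n_sim), np.empty(n_sim)
for s in range(n_sim):
    y = beta * x + rng.normal(scale=sigma, size=N)
    b = np.dot(x, y) / np.dot(x, x)   # univariate least squares
    ols[s] = b
    shrunk[s] = shrink * b

for name, est in [("OLS", ols), ("shrunk", shrunk)]:
    bias = est.mean() - beta
    print(f"{name}: bias={bias:+.3f}, var={est.var():.4f}, "
          f"mse={np.mean((est - beta) ** 2):.4f}")
```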

MSE vs. EPE

For $Y_0 = f(x_0) + \varepsilon_0$, the EPE of $\tilde f(x_0) = x_0^T\tilde\beta$ is $EPE = E(Y_0 - \tilde f(x_0))^2 = \sigma^2 + E(x_0^T\tilde\beta - f(x_0))^2 = \sigma^2 + MSE(\tilde f(x_0))$.
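Filling in the step (the only assumptions are $E\varepsilon_0 = 0$ and that $\varepsilon_0$ is independent of the training data used to form $\tilde\beta$, so the cross term vanishes):

$$\begin{aligned}
EPE &= E\big(f(x_0) + \varepsilon_0 - x_0^T\tilde\beta\big)^2 \\
    &= E\,\varepsilon_0^2
       + 2\,E\big[\varepsilon_0\big(f(x_0) - x_0^T\tilde\beta\big)\big]
       + E\big(x_0^T\tilde\beta - f(x_0)\big)^2 \\
    &= \sigma^2 + 0 + MSE\big(\tilde f(x_0)\big).
\end{aligned}$$

So, apart from the irreducible noise $\sigma^2$, expected prediction error and mean squared error differ by nothing: reducing MSE reduces EPE.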

3.2.3 Multiple Regression from Simple Univariate Regression

Univariate model (without intercept)

$Y = X\beta + \varepsilon$, and the least squares estimate is $\hat\beta = \frac{\langle x, y\rangle}{\langle x, x\rangle}$, where $\langle\cdot,\cdot\rangle$ denotes the inner product.

Fact: when inputs are orthogonal, they have no effect on each other’s parameter estimates in the model.

The idea of the following algorithm (regression by successive orthogonalization) is similar to the Gram-Schmidt process, but without normalizing; a sketch is given below.
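A minimal NumPy sketch of this idea, under the same made-up data setup as before: each input is orthogonalized against the previous residuals, and the coefficient from the univariate regression of $y$ on the last residual, $\langle z_p, y\rangle / \langle z_p, z_p\rangle$, matches the multiple regression coefficient of the last input.

```python
import numpy as np

def successive_orthogonalization(X, y):
    """Return the multiple regression coefficient of the last column of X
    by orthogonalizing the columns one at a time (Gram-Schmidt without
    normalization) and doing a simple univariate regression at the end."""
    N, p = X.shape
    Z = np.empty_like(X, dtype=float)
    Z[:, 0] = X[:, 0]
    for j in range(1, p):
        # residual of X_j after regressing it on z_0, ..., z_{j-1}
        Z[:, j] = X[:, j]
        for k in range(j):
            gamma = np.dot(Z[:, k], X[:, j]) / np.dot(Z[:, k], Z[:, k])
            Z[:, j] -= gamma * Z[:, k]
    # univariate regression of y on the last residual z_p
    return np.dot(Z[:, -1], y) / np.dot(Z[:, -1], Z[:, -1])

rng = np.random.default_rng(4)
N = 40
X = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
y = X @ np.array([1.0, 0.5, -2.0, 1.5]) + rng.normal(size=N)

beta_last = successive_orthogonalization(X, y)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.isclose(beta_last, beta_full[-1])   # same coefficient for the last input
```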

   

Back to the Contents