Notes: Principles of Econometrics

SME Note 2

This note is for the statistical methods in economics course and is meant as a quick reference for the important formulae.

(poe) ASSUMPTIONS OF THE SIMPLE LINEAR REGRESSION MODEL-II

SR1. The value of \(y\), for each value of \(x\), is

\[y=\beta_1+\beta_2 x+e \]

SR2. The expected value of the random error \(e\) is

\[E(e)=0 \]

which is equivalent to assuming that

\[E(y)=\beta_1+\beta_2 x \]

SR3. The variance of the random error \(e\) is

\[\operatorname{var}(e)=\sigma^2=\operatorname{var}(y) \]

The random variables \(y\) and \(e\) have the same variance because they differ only by a constant.
SR4. The covariance between any pair of random errors \(e_i\) and \(e_j\) is

\[\operatorname{cov}\left(e_i, e_j\right)=\operatorname{cov}\left(y_i, y_j\right)=0 \]

The stronger version of this assumption is that the random errors \(e\) are statistically independent, in which case the values of the dependent variable \(y\) are also statistically independent.
SR5. The variable \(x\) is not random and must take at least two different values.
SR6. (optional) The values of \(e\) are normally distributed about their mean

\[e \sim N\left(0, \sigma^2\right) \]

if the values of \(y\) are normally distributed, and vice versa.

kw: xi, residual

\(\sum_{i=1}^n x_i e_i=0\): the sum of the residuals weighted by \(x_i\) is 0:

\[\begin{aligned} \sum_{i=1}^n x_i e_i &=\sum_{i=1}^n x_i\left(y_i-b_1-b_2 x_i\right)=\sum_{i=1}^n x_i y_i-n \bar{x} b_1-b_2 \sum_{i=1}^n x_i^2 \\ &=\sum_{i=1}^n x_i y_i-n \bar{x}\left(\bar{y}-b_2 \bar{x}\right)-b_2 \sum_{i=1}^n x_i^2 \\ &=\sum_{i=1}^n x_i y_i-n \bar{x} \bar{y}+b_2 n \bar{x}^2-b_2 \sum_{i=1}^n x_i^2 \\ &=S_{x y}-b_2 S_{x x}=0 \end{aligned} \]

kw: xi, residual

\(x_i\) and \(e_i\) are uncorrelated (in the sample):

\[\operatorname{Cov}\left(x_i, e_i\right)=\frac{1}{n-1} \sum_{i=1}^n\left(x_i-\bar{x}\right)\left(e_i-\bar{e}\right)=\frac{1}{n-1}\left(\sum_{i=1}^n x_i e_i-n \bar{x} \bar{e}\right)=0 \]

kw: yhat, residual

\(\sum_{i=1}^n \hat{y}_i e_i=0\): the sum of the residuals weighted by \(\hat{y}_i\) is 0:

\[\sum_{i=1}^n \hat{y}_i e_i=\sum_{i=1}^n\left(b_1+b_2 x_i\right) e_i=b_1 n \bar{e}+b_2 \sum_{i=1}^n x_i e_i=0 \]

\(\hat{y}_i\) and \(e_i\) are uncorrelated (in the sample):

\[\operatorname{Cov}\left(\hat{y}_i, e_i\right)=\frac{1}{n-1} \sum_{i=1}^n\left(\hat{y}_i-\overline{\hat{y}}\right)\left(e_i-\bar{e}\right)=\frac{1}{n-1}\left(\sum_{i=1}^n \hat{y}_i e_i-n \overline{\hat{y}} \bar{e}\right)=0 \]
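A minimal numpy sketch (simulated data; the variable names and numbers are hypothetical) that checks the four sample orthogonality facts above: \(\sum x_i e_i = 0\), \(\operatorname{Cov}(x_i, e_i)=0\), \(\sum \hat y_i e_i = 0\), and \(\operatorname{Cov}(\hat y_i, e_i)=0\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)     # simulated data with true beta1 = 2, beta2 = 0.5

# least squares estimates b1, b2
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)   # S_xy / S_xx
b1 = y.mean() - b2 * x.mean()

y_hat = b1 + b2 * x
e = y - y_hat                               # least squares residuals

print(np.sum(x * e))                        # ~ 0: x-weighted residuals sum to zero
print(np.cov(x, e)[0, 1])                   # ~ 0: sample covariance of x and e
print(np.sum(y_hat * e))                    # ~ 0: yhat-weighted residuals sum to zero
print(np.cov(y_hat, e)[0, 1])               # ~ 0: sample covariance of yhat and e
```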

kw: le, linear estimators, variance, covariance

The variances and covariance of \(b_1\) and \(b_2\) are given by

\[\begin{aligned} \operatorname{Var}\left(b_1\right) &=\frac{\sigma^2 \sum_{i=1}^n x_i^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2} ; \\ \operatorname{Var}\left(b_2\right) &=\frac{\sigma^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2} ; \\ \operatorname{Cov}\left(b_1, b_2\right) &=\frac{-\bar{x} \sigma^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}, \end{aligned} \]
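A Monte Carlo sketch (hypothetical design and parameter values) comparing these formulas with the variances and covariance of \(b_1\) and \(b_2\) observed over repeated samples.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 30, 4.0
x = np.linspace(1, 10, n)                    # fixed, non-random regressor (SR5)
sxx = np.sum((x - x.mean()) ** 2)

# values implied by the formulas above
var_b1 = sigma2 * np.sum(x ** 2) / (n * sxx)
var_b2 = sigma2 / sxx
cov_b1b2 = -x.mean() * sigma2 / sxx

# repeat the sampling of y many times and re-estimate b1, b2 each time
b1s, b2s = [], []
for _ in range(20000):
    y = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(sigma2), n)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b1 = y.mean() - b2 * x.mean()
    b1s.append(b1)
    b2s.append(b2)

print(var_b1, np.var(b1s))                   # formula vs. simulated Var(b1)
print(var_b2, np.var(b2s))                   # formula vs. simulated Var(b2)
print(cov_b1b2, np.cov(b1s, b2s)[0, 1])      # formula vs. simulated Cov(b1, b2)
```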

kw: sample correlation coefficient

\[r_{x y}=\frac{S_{x y}}{\sqrt{S_{x x} S_{y y}}}=\frac{\operatorname{Cov}\left(x_i, y_i\right)}{\sqrt{\operatorname{Var}\left(x_i\right) \operatorname{Var}\left(y_i\right)}}, \]

kw: sst, ssr, sse, r2, coefficient of determination, sample correlation coefficient

\[ SST = SSR + SSE \]

\[R^2=\frac{\mathrm{SSR}}{\mathrm{SST}}=1-\frac{\mathrm{SSE}}{\mathrm{SST}} \]

  • \(R^2=r_{x y}^2\) : The coefficient of determination \(R^2\) is algebraically equal to the square of the sample correlation coefficient \(r_{x y}\) between \(x\) and \(y\). This result is valid in simple linear regression models;
  • \(R^2=r_{y \hat{y}}^2\) : The coefficient of determination \(R^2\) can also be computed as the square of the sample correlation coefficient between \(y\) and \(\hat{y}\). In this case, it measures the "goodness-of-fit" between the sample data and their predicted values. Therefore, \(R^2\) is sometimes called a measure of "goodness-of-fit". This result is valid not only in simple linear regression models but also in multiple linear regression models.
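A short numpy sketch (simulated data, hypothetical coefficients) illustrating the decomposition SST = SSR + SSE and the two correlation identities in the bullets above.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 40)
y = 1.0 + 0.8 * x + rng.normal(0, 2, 40)     # simulated data

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)
print(sst, ssr + sse)                        # SST = SSR + SSE

r2 = ssr / sst
print(r2, 1 - sse / sst)                     # two equivalent definitions of R^2
print(np.corrcoef(x, y)[0, 1] ** 2)          # = R^2 (simple regression only)
print(np.corrcoef(y, y_hat)[0, 1] ** 2)      # = R^2 (also holds in multiple regression)
```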

kw: unbiased sample variance

\[\sigma_{\bar{y}}^2=\frac{n}{n-1} \sigma_y^2=\frac{1}{n-1} \sum_{i=1}^n\left(y_i-\bar{y}\right)^2=\frac{S_{y y}}{n-1}, \]

kw: mean squared error, mse

The mean squared error (MSE):

\[\hat{\sigma}^2 \equiv \sigma_{\hat{y}}^2=\frac{1}{n-2} \sum_{i=1}^n\left(y_i-\hat{y}_i\right)^2=\frac{1}{n-2} \sum_{i=1}^n e_i^2=\frac{S_{y y}-2 b_2 S_{x y}+b_2^2 S_{x x}}{n-2} \]

In this case, we have to divide by \(n-2\), because we estimated the unknown population intercept \(\beta_1\) and the population slope \(\beta_2\) by \(b_1\) and \(b_2\), respectively, which "costs us two degrees of freedom";

  • The unbiased sample covariance can be similarly defined as

\[\sigma_{\bar{x} \bar{y}}^2=\frac{1}{n-1} \sum_{i=1}^n\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)=\frac{S_{x y}}{n-1}, \]

Why is the denominator \(n-1\) rather than \(n-2\)? Presumably because no fitted value \(\hat{y}_i\) appears here: only the sample means \(\bar{x}\) and \(\bar{y}\) are used, so a single degree of freedom is lost instead of the two lost when estimating \(b_1\) and \(b_2\).

kw: standard errors, se, b1, b2

\[\begin{aligned} &\operatorname{se}\left(b_1\right)=\sqrt{\widehat{\operatorname{Var}}\left(b_1\right)}=\left[\frac{\hat{\sigma}^2 \sum_{i=1}^n x_i^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right]^{1 / 2}, \\ &\operatorname{se}\left(b_2\right)=\sqrt{\widehat{\operatorname{Var}}\left(b_2\right)}=\left[\frac{\hat{\sigma}^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right]^{1 / 2}, \end{aligned} \]
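A small sketch (simulated data as before; names are hypothetical) computing \(\hat\sigma^2\) with the \(n-2\) divisor and then the two standard errors.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
x = rng.uniform(0, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 2, n)      # simulated data

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
e = y - (b1 + b2 * x)

sigma2_hat = np.sum(e ** 2) / (n - 2)        # MSE: divide by n-2 (two estimated coefficients)
sxx = np.sum((x - x.mean()) ** 2)
se_b1 = np.sqrt(sigma2_hat * np.sum(x ** 2) / (n * sxx))
se_b2 = np.sqrt(sigma2_hat / sxx)
print(sigma2_hat, se_b1, se_b2)
```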

kw: Adjusted R Square, radj, r2
Adjusted R Square

\[\bar{R}^2=1-\frac{\left(1-R^2\right)(n-1)}{n-K} \]
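A one-line helper, assuming \(K\) counts all estimated coefficients including the intercept (the usual convention in POE; the numbers below are made up).

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k estimated coefficients (incl. intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

print(adjusted_r2(0.70, 50, 3))              # hypothetical example
```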

kw: r2, R Square

\(R^2\) shows the proportion of variation in the dependent variable explained by variation in the explanatory variables. Since it is desirable to have a model that fits the data well, there can be a tendency to think that the best model is the one with the highest \(R^2\).

kw: p-value

It is standard practice to report the probability value of the test (i.e., the \(p\)-value) when reporting the outcome of a statistical hypothesis test. If we have the \(p\)-value of a test, \(p\), we can determine the outcome of the test by comparing it to the chosen level of significance, \(\alpha\), without looking up or calculating the critical values. The \(p\)-value rule is to reject the null hypothesis when the \(p\)-value is less than or equal to the level of significance: reject \(H_0\) if \(p \leq \alpha\); do not reject \(H_0\) if \(p>\alpha\).
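A minimal sketch of the rule for a two-tailed \(t\)-test of \(H_0: \beta_2=0\) (the estimate, standard error, and sample size are made-up numbers).

```python
from scipy import stats

b2, se_b2, n = 0.83, 0.31, 40                # hypothetical estimate, se, and sample size
t_stat = b2 / se_b2
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - 2))

alpha = 0.05
print(p_value, "reject H0" if p_value <= alpha else "do not reject H0")
```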

kw: forecast error

\[f=y_0-\hat{y}_0=\left(\beta_1+\beta_2 x_0+\epsilon_0\right)-\left(b_1+b_2 x_0\right) \]

kw: Least squares prediction, f, ci for f, variance of forecast error

\[\operatorname{Var}(f)=\operatorname{Var}\left(y_0\right)+\operatorname{Var}\left(\hat{y}_0\right)-2 \operatorname{Cov}\left(y_0, \hat{y}_0\right) \]

Taking into account that \(x_0\) and the unknown parameters \(\beta_1\) and \(\beta_2\) are not random, and that the future error \(\epsilon_0\) is uncorrelated with the sample errors used to compute \(b_1\) and \(b_2\) (so \(\operatorname{Cov}\left(y_0, \hat{y}_0\right)=0\)), we have

\[\operatorname{Var}(f)=\operatorname{Var}\left(\epsilon_0\right)+\operatorname{Var}\left(\hat{y}_0\right)=\sigma^2+\operatorname{Var}\left(\hat{y}_0\right) \]

where

\[\begin{aligned} \operatorname{Var}\left(\hat{y}_0\right)=& \operatorname{Var}\left(b_1+b_2 x_0\right)=\operatorname{Var}\left(b_1\right)+x_0^2 \operatorname{Var}\left(b_2\right)+2 x_0 \operatorname{Cov}\left(b_1, b_2\right) \\ =& \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}+x_0^2 \frac{\sigma^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}+2 x_0 \frac{-\sigma^2 \bar{x}}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2} \\ =& {\left[\frac{\sigma^2 \sum_{i=1}^n x_i^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}-\frac{\sigma^2 n \bar{x}^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right] } \\ &+\left[x_0^2 \frac{\sigma^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}+2 x_0 \frac{-\sigma^2 \bar{x}}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}+\frac{\sigma^2 n \bar{x}^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right] \\ =& \sigma^2\left[\frac{\sum_{i=1}^n x_i^2-n \bar{x}^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}+\frac{x_0^2-2 x_0 \bar{x}+\bar{x}^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right] \\ =& \sigma^2\left[\frac{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}{n \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}+\frac{\left(x_0-\bar{x}\right)^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right] \\ =& \sigma^2\left[\frac{1}{n}+\frac{\left(x_0-\bar{x}\right)^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right] . \end{aligned} \]

Therefore, by replacing \(\sigma^2\) by its estimate \(\hat{\sigma}^2\), the estimated variance of the forecast error is given by

\[\widehat{\operatorname{Var}}(f)=\hat{\sigma}^2\left[1+\frac{1}{n}+\frac{\left(x_0-\bar{x}\right)^2}{\sum_{i=1}^n\left(x_i-\bar{x}\right)^2}\right] \]

the square root of which is the standard error of the forecast error

\[\operatorname{se}(f)=\sqrt{\widehat{\operatorname{Var}}(f)} \]
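A sketch of the resulting 95% prediction interval \(\hat{y}_0 \pm t_c \operatorname{se}(f)\); the data, estimates, and \(x_0\) below are hypothetical.

```python
import numpy as np
from scipy import stats

n, x0 = 40, 7.5                              # hypothetical sample size and forecast point
x = np.linspace(0, 10, n)                    # hypothetical regressor values
sigma2_hat, b1, b2 = 3.8, 1.1, 0.79          # assumed estimates, for illustration only

var_f = sigma2_hat * (1 + 1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
se_f = np.sqrt(var_f)

y0_hat = b1 + b2 * x0
t_crit = stats.t.ppf(0.975, df=n - 2)        # 97.5th percentile for a 95% interval
print(y0_hat - t_crit * se_f, y0_hat + t_crit * se_f)
```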

kw: f test

\[F=\frac{\left(S S E_R-S S E_U\right) / J}{S S E_U /(N-K)} \]

kw: relation between f test and t test

The square of a \(t\) random variable with \(d f\) degrees of freedom is an \(F\) random variable with 1 degree of freedom in the numerator and \(d f\) degrees of freedom in the denominator. It has distribution \(F_{(1, d f)}\).
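A quick check of this relation using scipy critical values: the square of the two-tailed \(t\) critical value equals the \(F_{(1, df)}\) critical value at the same \(\alpha\).

```python
from scipy import stats

df, alpha = 25, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)      # two-tailed t critical value
f_crit = stats.f.ppf(1 - alpha, 1, df)       # F(1, df) critical value
print(t_crit ** 2, f_crit)                   # the two numbers coincide
```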

kw: Heteroskedastic

kw: Autocorrelation, lagged correlation, serial correlation

the three terms in the keyword line above (autocorrelation, lagged correlation, serial correlation) are equivalent

kw: ar(1), ar1, first-order autoregressive model

\[e_t=\rho e_{t-1}+v_t \]

If we assume the \(v_t\) are uncorrelated random errors with zero mean and constant variances,

\[E\left(v_t\right)=0 \quad \operatorname{var}\left(v_t\right)=\sigma_v^2 \quad \operatorname{cov}\left(v_t, v_s\right)=0 \quad \text { for } t \neq s \]

then the equation above describes a first-order autoregressive model or a first-order autoregressive process for \(e_t\). The term AR(1) model is used as an abbreviation for first-order autoregressive model. It is called an autoregressive model because it can be viewed as a regression model in which \(e_t\) depends on its own lagged value, inducing autocorrelation. It is called first-order because the right-hand-side variable is \(e_t\) lagged one period.

kw: variance of ar1

\[-1<\rho<1 \]

In Appendix 9B, we show that the mean and variance of \(e_t\) are

\[E\left(e_t\right)=0 \quad \operatorname{var}\left(e_t\right)=\sigma_e^2=\frac{\sigma_v^2}{1-\rho^2} \]

The AR(1) error \(e_t\) has a mean of zero, and a variance that depends on the variance of \(v_t\) and the magnitude of \(\rho\). The larger the degree of autocorrelation (the closer \(\rho\) is to \(+1\) or \(-1\) ), the larger the variance of \(e_t\). Also, since \(\sigma_v^2 /\left(1-\rho^2\right)\) is constant over time, \(e_t\) is homoskedastic. In Appendix 9B we also discover that the covariance between two errors that are \(k\) periods apart \(\left(e_t\right.\) and \(\left.e_{t-k}\right)\) is

\[\operatorname{cov}\left(e_t, e_{t-k}\right)=\frac{\rho^k \sigma_v^2}{1-\rho^2}, \quad k>0 \]
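A simulation sketch (hypothetical \(\rho\) and \(\sigma_v\)) checking the stationary variance and the \(k\)-period autocovariance formulas.

```python
import numpy as np

rng = np.random.default_rng(4)
rho, sigma_v, T = 0.7, 1.0, 100_000

# simulate e_t = rho * e_{t-1} + v_t; the effect of the e_0 = 0 start fades quickly
v = rng.normal(0, sigma_v, T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = rho * e[t - 1] + v[t]

print(e.var(), sigma_v ** 2 / (1 - rho ** 2))            # var(e_t)
k = 3
print(np.cov(e[k:], e[:-k])[0, 1],
      rho ** k * sigma_v ** 2 / (1 - rho ** 2))          # cov(e_t, e_{t-k})
```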

kw: collinearity

recall:

\[\operatorname{var}\left(b_2\right)=\frac{\sigma^2}{\left(1-r_{23}^2\right) \sum_{i=1}^N\left(x_2-\bar{x}_2\right)^2} \]

Collinearity arises when \(r_{23} \rightarrow 1\) or when \(x\) has no variation (a variable with no variation is perfectly "correlated" with the constant term). A numeric sketch of how the variance blows up follows the list below.

If the variables are perfectly correlated:

  • the variance of the least squares estimator goes to infinity, and the separate least squares estimates cannot be computed.

If the correlation is high but not perfect:

  • By the Gauss–Markov theorem, the least squares estimator is still the best linear unbiased
    estimator.
  • The variance of the least squares estimator is large, so standard errors are large; estimates may not be significantly different from 0 and interval estimates are wide, providing imprecise information about the unknown parameters.
  1. When estimator standard errors are large, it is likely that the usual \(t\)-tests will lead to the conclusion that parameter estimates are not significantly different from zero. This outcome occurs despite possibly high \(R^2\) - or \(F\)-values indicating significant explanatory power of the model as a whole. The problem is that collinear variables do not provide enough information to estimate their separate effects, even though theory may indicate their importance in the relationship.
  2. Estimators may be very sensitive to the addition or deletion of a few observations, or to the deletion of an apparently insignificant variable.
  3. Despite the difficulties in isolating the effects of individual variables from such a sample, accurate forecasts may still be possible if the nature of the collinear relationship remains the same within the out-of-sample observations. For example, in an aggregate production function where the inputs labor and capital are nearly collinear, accurate forecasts of output may be possible for a particular ratio of inputs but not for various mixes of inputs.
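A tiny sketch of the variance formula above as \(r_{23}\) approaches 1; the error variance and \(\sum (x_2-\bar x_2)^2\) are made-up numbers.

```python
# how var(b2) grows as the correlation between x2 and x3 approaches 1
sigma2 = 1.0                                 # hypothetical error variance
s22 = 100.0                                  # hypothetical sum of squared deviations of x2
for r23 in (0.0, 0.5, 0.9, 0.99, 0.999):
    var_b2 = sigma2 / ((1 - r23 ** 2) * s22)
    print(r23, var_b2)
```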

kw: RESET, specification error

  • omitted variables
  • incorrect functional form

Ramsey Regression Equation Specification Error Test

if null hypothesis is rejected,

  • Perhaps other variables could be included in the model.
  • Perhaps the linear functional form is inappropriate.

consequence of specification error:

If we have omitted some important factor, or made any other serious specification error, then assumption SR2 \(E(e)=0\) will be violated, which will have serious consequences.

kw: covariance (weakly) stationary

\[\begin{aligned} E\left(y_t\right) & =\mu & & \text { (constant mean) } \\ \operatorname{var}\left(y_t\right) & =\sigma^2 & & \text { (constant variance) } \\ \operatorname{cov}\left(y_t, y_{t+s}\right)=\operatorname{cov}\left(y_t, y_{t-s}\right) & =\gamma_s & & \text { (covariance depends on } s, \text { not } t) \end{aligned} \]
