Notes: Principles of Econometrics
SME Note 2
These are notes for the Statistical Methods in Economics course, intended as a quick reference for important formulae.
(poe) ASSUMPTIONS OF THE SIMPLE LINEAR REGRESSION MODEL-II
SR1. The value of \(y\), for each value of \(x\), is \(y=\beta_1+\beta_2 x+e\).
SR2. The expected value of the random error \(e\) is \(E(e)=0\),
which is equivalent to assuming that \(E(y)=\beta_1+\beta_2 x\).
SR3. The variance of the random error \(e\) is \(\operatorname{var}(e)=\sigma^2=\operatorname{var}(y)\).
The random variables \(y\) and \(e\) have the same variance because they differ only by a constant.
SR4. The covariance between any pair of random errors \(e_i\) and \(e_j\) is \(\operatorname{cov}(e_i, e_j)=0\).
The stronger version of this assumption is that the random errors \(e\) are statistically independent, in which case the values of the dependent variable \(y\) are also statistically independent.
SR5. The variable \(x\) is not random and must take at least two different values.
SR6. (optional) The values of \(e\) are normally distributed about their mean, \(e \sim N(0, \sigma^2)\),
if the values of \(y\) are normally distributed, and vice versa.
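A minimal sketch (not from the original notes): simulating data that satisfies SR1–SR6 with hypothetical parameter values, then computing the least squares estimates \(b_1\) and \(b_2\).

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2, sigma = 2.0, 0.5, 1.0       # hypothetical population parameters
x = np.linspace(1, 10, 50)                # SR5: x is fixed and takes at least two values
e = rng.normal(0.0, sigma, size=x.size)   # SR2-SR4, SR6: iid N(0, sigma^2) errors
y = beta1 + beta2 * x + e                 # SR1

# Least squares estimates: b2 (slope) and b1 (intercept)
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
print(b1, b2)                             # should be close to beta1 and beta2
```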
kw: xi, residual
\(\sum_{i=1}^n x_i \hat{e}_i=0\): the sum of the residuals weighted by \(x_i\) is 0, so \(x_i\) and the residuals \(\hat{e}_i\) are uncorrelated (in the sample).
kw: yhat, residual
\(\sum_{i=1}^n \hat{y}_i \hat{e}_i=0\): the sum of the residuals weighted by \(\hat{y}_i\) is 0, so \(\hat{y}_i\) and \(\hat{e}_i\) are uncorrelated (in the sample).
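A quick numerical check of both residual identities, reusing the simulated (hypothetical) data from the sketch above:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, x.size)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
yhat = b1 + b2 * x
ehat = y - yhat                # least squares residuals

print(np.sum(x * ehat))        # ~0 up to floating-point error
print(np.sum(yhat * ehat))     # ~0 up to floating-point error
```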
kw: le, linear estimators, variance, covariance
The variances and covariance of \(b_1\) and \(b_2\) are given by
\(\operatorname{var}(b_1)=\sigma^2\left[\dfrac{\sum x_i^2}{n \sum (x_i-\bar{x})^2}\right], \quad \operatorname{var}(b_2)=\dfrac{\sigma^2}{\sum (x_i-\bar{x})^2}, \quad \operatorname{cov}(b_1, b_2)=\dfrac{-\bar{x}\,\sigma^2}{\sum (x_i-\bar{x})^2}\)
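A sketch evaluating these formulas at an assumed error variance \(\sigma^2\); in practice \(\sigma^2\) is replaced by its estimate \(\hat{\sigma}^2\):

```python
import numpy as np

x = np.linspace(1, 10, 50)     # hypothetical fixed regressor values
sigma2 = 1.0                   # assumed error variance
n = x.size
sxx = np.sum((x - x.mean())**2)

var_b1 = sigma2 * np.sum(x**2) / (n * sxx)
var_b2 = sigma2 / sxx
cov_b1_b2 = -sigma2 * x.mean() / sxx
print(var_b1, var_b2, cov_b1_b2)
```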
kw: sample correlation coefficient
\(r_{xy}=\dfrac{s_{xy}}{s_x s_y}\), where \(s_{xy}\) is the sample covariance and \(s_x\), \(s_y\) are the sample standard deviations.
kw: sst, ssr, sse, r2, coefficient of determination, sample correlation coefficient
\(\text{SST}=\text{SSR}+\text{SSE}\): total sum of squares = regression (explained) sum of squares + error (residual) sum of squares;
\(R^2=\dfrac{\text{SSR}}{\text{SST}}=1-\dfrac{\text{SSE}}{\text{SST}}\)
- \(R^2=r_{x y}^2\) : The coefficient of determination \(R^2\) is algebraically equal to the square of the sample correlation coefficient \(r_{x y}\) between \(x\) and \(y\). This result is valid in simple linear regression models;
- \(R^2=r_{y \hat{y}}^2\) : The coefficient of determination \(R^2\) can also be computed as the square of the sample correlation coefficient between \(y\) and \(\hat{y}\). In this case, it measures the "goodness-of-fit" between the sample data and their predicted values. Therefore, \(R^2\) is sometimes called a measure of "goodness-of-fit". This result is valid not only in simple linear regression models but also in multiple linear regression models.
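A sketch on simulated (hypothetical) data checking that \(R^2\) computed as SSR/SST agrees with both \(r_{xy}^2\) and \(r_{y\hat{y}}^2\) in the simple linear regression model:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, x.size)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
yhat = b1 + b2 * x

sst = np.sum((y - y.mean())**2)
ssr = np.sum((yhat - y.mean())**2)

r2 = ssr / sst
r_xy = np.corrcoef(x, y)[0, 1]
r_yyhat = np.corrcoef(y, yhat)[0, 1]
print(r2, r_xy**2, r_yyhat**2)   # all three agree
```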
kw: unbiased sample variance
\(s_x^2=\dfrac{\sum_{i=1}^n (x_i-\bar{x})^2}{n-1}\)
kw: mean squared error, mse
The mean squared error (MSE): \(\hat{\sigma}^2=\dfrac{\sum_{i=1}^n \hat{e}_i^2}{n-2}\)
In this case, we have to divide by \(n-2\), because we estimated the unknown population intercept \(\beta_1\) and the population slope \(\beta_2\) by \(b_1\) and \(b_2\), respectively, which "costs us two degrees of freedom";
- The unbiased sample covariance can be similarly defined as \(s_{xy}=\dfrac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{n-1}\)
Why is the denominator \(n-1\) rather than \(n-2\)?
Presumably because no fitted values \(\hat{y}_i\) are involved: the covariance is centered only on the sample means, not on a fitted regression line.
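A small sketch (hypothetical data) contrasting the two divisors: \(n-2\) for the regression variance estimate \(\hat{\sigma}^2\) versus \(n-1\) for the sample covariance (which is also numpy's default in `np.cov`):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, x.size)
n = x.size

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
ehat = y - (b1 + b2 * x)

sigma2_hat = np.sum(ehat**2) / (n - 2)                    # two parameters estimated
s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)  # unbiased sample covariance
print(sigma2_hat, s_xy, np.cov(x, y)[0, 1])               # np.cov matches s_xy
```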
kw: standard errors, se, b1, b2
\(\operatorname{se}(b_1)=\sqrt{\widehat{\operatorname{var}}(b_1)}, \quad \operatorname{se}(b_2)=\sqrt{\widehat{\operatorname{var}}(b_2)}=\sqrt{\dfrac{\hat{\sigma}^2}{\sum (x_i-\bar{x})^2}}\)
kw: Adjusted R Square, radj, r2
Adjusted R Square
\(\bar{R}^2=1-\dfrac{\text{SSE}/(n-K)}{\text{SST}/(n-1)}\), where \(K\) is the number of estimated parameters; unlike \(R^2\), it does not necessarily increase when an extra regressor is added.
kw: r2, R Square
It shows the proportion of variation in a dependent variable explained by variation in the explanatory variables. Since it is desirable to have a model that fits the data well, there can be a tendency to think that the best model is the one with the highest \(R^2\).
kw: p-value
It is standard practice to report the probability value of the test (i.e., the \(p\)-value) when reporting the outcome of statistical hypothesis tests. If we have the \(p\)-value of a test, \(p\), we can determine the outcome of the test by comparing the \(p\)-value with the chosen level of significance, \(\alpha\), without looking up or calculating the critical values. The \(p\)-value rule is to reject the null hypothesis when the \(p\)-value is less than, or equal to, the level of significance \(\alpha\): reject \(H_0\) if \(p \leq \alpha\), and do not reject \(H_0\) if \(p>\alpha\).
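A sketch of the \(p\)-value rule for a two-tail \(t\) test of \(H_0\!: \beta_2=0\); the test statistic and degrees of freedom below are hypothetical values:

```python
from scipy import stats

t_stat, df, alpha = 2.31, 38, 0.05                 # hypothetical values
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))   # two-tail p-value
print(p_value, "reject H0" if p_value <= alpha else "do not reject H0")
```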
kw: forecast error
kw: Least squares prediction, f, ci for f, variance of forecast error
Taking into account that \(x_0\) and the unknown parameters \(\beta_1\) and \(\beta_2\) are not random, we have
\(\operatorname{var}(f)=\sigma^2\left[1+\dfrac{1}{n}+\dfrac{(x_0-\bar{x})^2}{\sum(x_i-\bar{x})^2}\right]\)
where \(f=y_0-\hat{y}_0\) is the forecast error, \(y_0=\beta_1+\beta_2 x_0+e_0\), and \(\hat{y}_0=b_1+b_2 x_0\).
Therefore, by replacing \(\sigma^2\) by its estimate \(\hat{\sigma}^2\), the estimated variance of the forecast error is given by
\(\widehat{\operatorname{var}}(f)=\hat{\sigma}^2\left[1+\dfrac{1}{n}+\dfrac{(x_0-\bar{x})^2}{\sum(x_i-\bar{x})^2}\right]\),
the square root of which is the standard error of the forecast, \(\operatorname{se}(f)=\sqrt{\widehat{\operatorname{var}}(f)}\).
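A sketch computing \(\operatorname{se}(f)\) and a 95% prediction interval \(\hat{y}_0 \pm t_c \operatorname{se}(f)\) on simulated data; the value \(x_0\) and all parameters are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = np.linspace(1, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, x.size)
n = x.size

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
sigma2_hat = np.sum((y - b1 - b2 * x)**2) / (n - 2)

x0 = 7.5                                     # hypothetical point to forecast at
y0_hat = b1 + b2 * x0
var_f = sigma2_hat * (1 + 1 / n + (x0 - x.mean())**2 / np.sum((x - x.mean())**2))
se_f = np.sqrt(var_f)
tc = stats.t.ppf(0.975, n - 2)
print(y0_hat - tc * se_f, y0_hat + tc * se_f)   # 95% prediction interval
```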
kw: f test
\(F=\dfrac{(\text{SSE}_R-\text{SSE}_U)/J}{\text{SSE}_U/(n-K)} \sim F_{(J,\, n-K)}\) when the null hypothesis (the \(J\) joint restrictions) is true.
kw: relation between f test and t test
The square of a \(t\) random variable with \(d f\) degrees of freedom is an \(F\) random variable with 1 degree of freedom in the numerator and \(d f\) degrees of freedom in the denominator. It has distribution \(F_{(1, d f)}\).
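A quick numerical check of this relationship using critical values (the two-tail \(t\) critical value squared equals the upper-tail \(F(1, df)\) critical value); the degrees of freedom are hypothetical:

```python
from scipy import stats

df = 30
t_c = stats.t.ppf(0.975, df)     # two-tail 5% critical value of t
f_c = stats.f.ppf(0.95, 1, df)   # upper 5% critical value of F(1, df)
print(t_c**2, f_c)               # the two numbers coincide
```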
kw: Heteroskedastic
The error variance is not constant across observations: \(\operatorname{var}(e_i)=\sigma_i^2\) varies with \(i\).
kw: Autocorrelation, lagged correlation, serial correlation
The three terms above are equivalent: the errors are correlated across observations, \(\operatorname{cov}(e_t, e_s) \neq 0\) for some \(t \neq s\).
kw: ar(1), ar1, first-order autoregressive model
\(e_t=\rho e_{t-1}+v_t, \quad |\rho|<1\)  (9.30)
If we assume the \(v_t\) are uncorrelated random errors with zero mean and constant variance, \(E(v_t)=0\), \(\operatorname{var}(v_t)=\sigma_v^2\), \(\operatorname{cov}(v_t, v_s)=0\) for \(t \neq s\),
then (9.30) describes a first-order autoregressive model, or a first-order autoregressive process, for \(e_t\). The term AR(1) model is used as an abbreviation for first-order autoregressive model. It is called an autoregressive model because it can be viewed as a regression model in which \(e_t\) depends on its lagged value, inducing autocorrelation. It is called first-order because the right-hand-side variable is \(e_t\) lagged one period.
kw: variance of ar1
In Appendix 9B, we show that the mean and variance of \(e_t\) are \(E(e_t)=0\) and \(\operatorname{var}(e_t)=\sigma_e^2=\dfrac{\sigma_v^2}{1-\rho^2}\).
The AR(1) error \(e_t\) has a mean of zero, and a variance that depends on the variance of \(v_t\) and the magnitude of \(\rho\). The larger the degree of autocorrelation (the closer \(\rho\) is to \(+1\) or \(-1\)), the larger the variance of \(e_t\). Also, since \(\sigma_v^2 /\left(1-\rho^2\right)\) is constant over time, \(e_t\) is homoskedastic. In Appendix 9B we also discover that the covariance between two errors that are \(k\) periods apart (\(e_t\) and \(e_{t-k}\)) is \(\operatorname{cov}(e_t, e_{t-k})=\dfrac{\rho^k \sigma_v^2}{1-\rho^2}, \quad k>0\).
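A simulation sketch with hypothetical \(\rho\) and \(\sigma_v\): generate a long AR(1) error series and compare the sample variance and lag-\(k\) autocovariance with the formulas above.

```python
import numpy as np

rng = np.random.default_rng(5)
rho, sigma_v, T = 0.8, 1.0, 200_000   # hypothetical values
v = rng.normal(0, sigma_v, T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = rho * e[t - 1] + v[t]      # AR(1) process (9.30)

var_theory = sigma_v**2 / (1 - rho**2)
k = 3
cov_theory = rho**k * var_theory
cov_sample = np.mean(e[k:] * e[:-k])  # mean of e_t is zero
print(np.var(e), var_theory)          # close for large T
print(cov_sample, cov_theory)         # close for large T
```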
kw: collinearity
Recall:
- \(r_{23} \rightarrow 1\) (the explanatory variables \(x_2\) and \(x_3\) are highly correlated) or no variation in an \(x\) -> collinearity
- no variation in an \(x\): the variable is (nearly) collinear with the constant term
If the variables are perfectly correlated:
- the least squares estimator is not defined; its variance, e.g. \(\operatorname{var}(b_2)=\dfrac{\sigma^2}{(1-r_{23}^2)\sum(x_{i2}-\bar{x}_2)^2}\) in the two-regressor model, diverges as \(r_{23}^2 \rightarrow 1\)
If the correlation is high but not perfect:
- By the Gauss–Markov theorem, the least squares estimator is still the best linear unbiased estimator.
- The variances (and hence the standard errors) of the least squares estimators are large, so estimates may not be significantly different from zero, and interval estimates are too wide, providing imprecise information about the unknown parameters.
- When estimator standard errors are large, it is likely that the usual \(t\)-tests will lead to the conclusion that parameter estimates are not significantly different from zero. This outcome occurs despite possibly high \(R^2\) - or \(F\)-values indicating significant explanatory power of the model as a whole. The problem is that collinear variables do not provide enough information to estimate their separate effects, even though theory may indicate their importance in the relationship.
- Estimators may be very sensitive to the addition or deletion of a few observations, or to the deletion of an apparently insignificant variable.
- Despite the difficulties in isolating the effects of individual variables from such a sample, accurate forecasts may still be possible if the nature of the collinear relationship remains the same within the out-of-sample observations. For example, in an aggregate production function where the inputs labor and capital are nearly collinear, accurate forecasts of output may be possible for a particular ratio of inputs but not for various mixes of inputs.
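A sketch of how correlation between \(x_2\) and \(x_3\) inflates \(\operatorname{var}(b_2)\) in the two-regressor model, evaluating the variance formula above at hypothetical values of \(\sigma^2\), the data, and \(r_{23}\):

```python
import numpy as np

sigma2 = 1.0                          # assumed error variance
x2 = np.linspace(1, 10, 50)           # hypothetical regressor values
sxx = np.sum((x2 - x2.mean())**2)

for r23 in (0.0, 0.9, 0.99, 0.999):
    var_b2 = sigma2 / ((1 - r23**2) * sxx)
    print(r23, var_b2)                # variance (and hence se) explodes as r23 -> 1
```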
kw: RESET, specification error
Possible specification errors:
- omitted variables
- incorrect functional form
Ramsey Regression Equation Specification Error Test (RESET): re-estimate the model with powers of the fitted values (e.g., \(\hat{y}_i^2\) and \(\hat{y}_i^3\)) added as extra regressors and test the null hypothesis that their coefficients are zero.
If the null hypothesis is rejected,
- Perhaps other variables could be included in the model.
- Perhaps the linear functional form is inappropriate.
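A sketch of RESET with one added term (\(\hat{y}^2\)), implemented by hand with an F test on hypothetical data generated from a nonlinear model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 100
x = rng.uniform(1, 10, n)
y = 1.0 + 0.8 * x + 0.3 * x**2 + rng.normal(0, 1, n)   # true model is nonlinear

def sse_and_fitted(X, y):
    """Least squares fit: return the sum of squared residuals and fitted values."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b
    return np.sum((y - fitted)**2), fitted

X_r = np.column_stack([np.ones(n), x])            # (mis)specified linear model
sse_r, yhat = sse_and_fitted(X_r, y)

X_u = np.column_stack([np.ones(n), x, yhat**2])   # augmented with yhat^2
sse_u, _ = sse_and_fitted(X_u, y)

J, K = 1, X_u.shape[1]
F = ((sse_r - sse_u) / J) / (sse_u / (n - K))
p = stats.f.sf(F, J, n - K)
print(F, p)   # a small p-value is evidence of misspecification
```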
Consequence of specification error:
If we have omitted some important factor, or made any other serious specification error, then assumption SR2, \(E(e)=0\), will be violated, and the least squares estimator will generally be biased.
kw: covariance (weakly) stationary
A time series \(y_t\) is covariance (weakly) stationary if its mean and variance are constant over time and the covariance \(\operatorname{cov}(y_t, y_{t+k})\) depends only on the lag \(k\), not on \(t\).
This article is from Cnblogs (博客园), author: miyasaka. When reposting, please cite the original link: https://www.cnblogs.com/kion/p/16942691.html