Linear Regression - Subset Selection
1 Best Subset Selection
- \(2^p\) models, \(p\) is the number of predictors
Algorithm:
- Let \(\mathcal{M}_0\) denote the null model, which contains an intercept but no predictors. This model simply predicts the sample mean for each observation.
- For \(k = 1, 2, \cdots, p\):
  - Fit all \(\displaystyle \binom{p}{k}\) models that contain exactly \(k\) predictors.
  - Pick the best among these \(\displaystyle \binom{p}{k}\) models, and call it \(\mathcal{M}_k\). Here best is defined as having the smallest \(\text{RSS}\), or equivalently the largest \(R^2\).
- Select a single best model from among \(\mathcal{M}_0, \mathcal{M}_1, \cdots, \mathcal{M}_p\) using cross-validated prediction error, \(C_p\) (\(\text{AIC}\)), \(\text{BIC}\), or adjusted \(R^2\).
Note: The \(\text{RSS}\) of these \(p + 1\) models decreases monotonically, and the \(R^2\) increases monotonically, as the number of features included in the models increases. Therefore, Step 3 must compare models with different numbers of predictors using cross-validated prediction error, \(C_p\), \(\text{BIC}\), or adjusted \(R^2\), rather than \(\text{RSS}\) or \(R^2\).
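As a concrete illustration, Steps 1 and 2 can be sketched in a few lines of plain Python (a minimal language-neutral sketch, not the leaps implementation; the `rss` helper is a name introduced here, computing ordinary least squares via numpy):

```python
import itertools
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return float(resid @ resid)

def best_subset(X, y):
    """Steps 1-2: the best model M_k (by RSS) for each size k = 0..p."""
    n, p = X.shape
    models = {0: ()}                             # M_0: the null model
    for k in range(1, p + 1):
        # fit all C(p, k) models with exactly k predictors; keep the smallest RSS
        models[k] = min(itertools.combinations(range(p), k),
                        key=lambda S: rss(X[:, list(S)], y))
    return models
```

Step 3 would then compare the \(p + 1\) candidates in `models` using one of the penalized criteria above; comparing them by \(\text{RSS}\) would always pick \(\mathcal{M}_p\).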
2 Forward Stepwise Selection
Algorithm:
- Let \(\mathcal{M}_0\) denote the null model, which contains an intercept but no predictors.
- For \(k = 0, 1, \cdots, p - 1\):
  - Consider all \(p - k\) models that augment the predictors in \(\mathcal{M}_k\) with one additional predictor.
  - Choose the best among these \(p - k\) models, and call it \(\mathcal{M}_{k+1}\). Here best is defined as having the smallest \(\text{RSS}\), or equivalently the largest \(R^2\).
- Select a single best model from among \(\mathcal{M}_0, \mathcal{M}_1, \cdots, \mathcal{M}_p\) using cross-validated prediction error, \(C_p\) (\(\text{AIC}\)), \(\text{BIC}\), or adjusted \(R^2\).
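A minimal Python sketch of the greedy loop above (the `rss` helper is a hypothetical name, an ordinary-least-squares fit via numpy; note that only about \(p(p+1)/2\) models are fit instead of \(2^p\)):

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return float(resid @ resid)

def forward_stepwise(X, y):
    """Greedily grow M_0, M_1, ..., M_p, adding one predictor at a time."""
    n, p = X.shape
    selected, models = [], {0: ()}
    for k in range(p):                           # k = 0, 1, ..., p - 1
        remaining = [j for j in range(p) if j not in selected]
        # among the p - k one-predictor augmentations, keep the best by RSS
        best_j = min(remaining, key=lambda j: rss(X[:, selected + [j]], y))
        selected.append(best_j)
        models[k + 1] = tuple(selected)
    return models
```

Because each \(\mathcal{M}_{k+1}\) must contain all of \(\mathcal{M}_k\), the search is not guaranteed to find the same models as best subset selection.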
3 Backward Stepwise Selection
Algorithm:
- Let \(\mathcal{M}_p\) denote the full model, which contains all \(p\) predictors.
- For \(k = p, p-1, \cdots, 1\):
  - Consider all \(k\) models that contain all but one of the predictors in \(\mathcal{M}_k\), for a total of \(k - 1\) predictors.
  - Choose the best among these \(k\) models, and call it \(\mathcal{M}_{k-1}\). Here best is defined as having the smallest \(\text{RSS}\), or equivalently the largest \(R^2\).
- Select a single best model from among \(\mathcal{M}_0, \mathcal{M}_1, \cdots, \mathcal{M}_p\) using cross-validated prediction error, \(C_p\) (\(\text{AIC}\)), \(\text{BIC}\), or adjusted \(R^2\).
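The mirror-image sketch for backward elimination (again a plain numpy illustration with a hypothetical `rss` helper, not the leaps code; backward selection requires \(n > p\) so that the full model can be fit):

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return float(resid @ resid)

def backward_stepwise(X, y):
    """Start from the full model M_p and drop one predictor at a time."""
    n, p = X.shape
    selected = list(range(p))
    models = {p: tuple(selected)}
    for k in range(p, 0, -1):                    # k = p, p - 1, ..., 1
        # among the k models that drop one predictor from M_k, keep the best
        drop = min(selected,
                   key=lambda j: rss(X[:, [i for i in selected if i != j]], y))
        selected.remove(drop)
        models[k - 1] = tuple(selected)
    return models
```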
4 Hybrid Approaches
Stepwise selection (sequential replacement) combines forward and backward selection. Start with no predictors, then sequentially add the most contributive predictor (as in forward selection). After adding each new variable, remove any variables that no longer provide an improvement in the model fit (as in backward selection).
Such an approach attempts to more closely mimic best subset selection while retaining the computational advantages of forward and backward stepwise selection.
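A sketch of one such hybrid, assuming \(\text{BIC}\) as the comparison criterion (the description above does not fix a criterion, and \(\text{RSS}\) cannot be used here because adding a variable never increases it; `rss`, `bic`, and `hybrid_stepwise` are names introduced for this illustration):

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return float(resid @ resid)

def bic(X, y, subset):
    """Gaussian BIC of the model using the given predictor subset."""
    n = len(y)
    d = len(subset) + 1                          # predictors plus intercept
    return n * np.log(rss(X[:, list(subset)], y) / n) + d * np.log(n)

def hybrid_stepwise(X, y):
    """Greedy add-then-prune search, scoring candidate models by BIC."""
    n, p = X.shape
    selected = []
    current = bic(X, y, selected)
    improved = True
    while improved:                              # each change strictly lowers BIC
        improved = False
        # forward step: add the single predictor that lowers BIC the most
        remaining = [j for j in range(p) if j not in selected]
        if remaining:
            j = min(remaining, key=lambda j: bic(X, y, selected + [j]))
            if bic(X, y, selected + [j]) < current:
                selected.append(j)
                current = bic(X, y, selected)
                improved = True
        # backward step: drop any predictor whose removal lowers BIC
        for j in list(selected):
            rest = [i for i in selected if i != j]
            if bic(X, y, rest) < current:
                selected.remove(j)
                current = bic(X, y, selected)
                improved = True
    return sorted(selected)
```

This is in the spirit of regsubsets(..., method="seqrep") and step(..., direction="both") below, though their exact search rules differ.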
5 Implementation in R
5.1 regsubsets function in leaps package
5.1.1 Installation
install.packages("leaps")
Main arguments of the regsubsets() function:
- nvmax: maximum size of subsets (number of variables) to examine
- intercept: bool; whether to include an intercept
- method: str; one of "exhaustive", "backward", "forward", "seqrep"
- force.in: str vector or bool vector; variables to force into the model
- nbest: int; the number of best models to keep for each subset size
5.1.2 Example
- Load the data

library(ISLR)
names(Hitters) # variable names
Hitters = na.omit(Hitters) # drop observations with missing values
dim(Hitters)
sum(is.na(Hitters))
- Best Subset Selection
regfit.full = regsubsets(Salary ~ ., data=Hitters, nvmax=19)
reg.summary = summary(regfit.full)
reg.summary
View the fit statistics of the model at each step, e.g. \(R^2\) ("rsq"), \(\text{RSS}\) ("rss"), adjusted \(R^2\) ("adjr2"), \(\text{BIC}\) ("bic"), etc.

names(reg.summary)
# Output
# [1] "which" "rsq" "rss" "adjr2" "cp" "bic" "outmat" "obj"
reg.summary$rsq
- Plots

The leaps package provides a built-in plot() method for regsubsets objects:

plot(regfit.full, scale="r2")
plot(regfit.full, scale="adjr2")
plot(regfit.full, scale="Cp")
plot(regfit.full, scale="bic")
- Coefficients and covariance matrix of the models

coef(regfit.full, 1:3) # coefficients of models 1 to 3
vcov(regfit.full, 1)
- Forward, backward, and sequential-replacement selection

regfit.fwd = regsubsets(Salary ~ ., data=Hitters, nvmax=19, method="forward")
summary(regfit.fwd)
regfit.bwd = regsubsets(Salary ~ ., data=Hitters, nvmax=19, method="backward")
summary(regfit.bwd)
regfit.seq = regsubsets(Salary ~ ., data=Hitters, nvmax=19, method="seqrep")
summary(regfit.seq)
5.2 The step Function
Alternatively, the stepAIC function in the MASS package.
Performs stepwise regression based on the AIC criterion (choosing the model with the smallest AIC); stepping stops when AIC no longer decreases.
Main arguments of the step function:
- direction: str; "both", "backward", or "forward"
- trace: bool or int; an integer prints the details of each step of the regression
Example
lm.inter_only = lm(Salary ~ 1, data=Hitters)
lm.all = lm(Salary ~ ., data=Hitters)
# Forward Stepwise Selection
step.fwd = step(lm.inter_only, direction="forward", scope=formula(lm.all), trace=1)
summary(step.fwd)
# Backward Stepwise Selection
step.bwd = step(lm.all, direction="backward", scope=formula(lm.all), trace=1)
summary(step.bwd)
