Quantile Regression

1. Theory

(1) Quantile

For a random variable \(X\), the \(\tau\)-quantile \((0 \leq \tau \leq 1)\) is the value \(x\) that satisfies

\[\tau = \Pr(X \leq x) \]
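
The definition can be checked numerically. The following is a minimal sketch, assuming a synthetic standard normal sample and `numpy.quantile` as the empirical estimator; the sample, the seed, and the choice \(\tau = 0.9\) are illustrative assumptions and not part of the original post.

```python
import numpy as np

# Assumption: a synthetic standard normal sample stands in for draws of X.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=100_000)

tau = 0.9
x_tau = np.quantile(sample, tau)      # empirical tau-quantile of the sample
coverage = np.mean(sample <= x_tau)   # empirical estimate of Pr(X <= x_tau)

print(f"x_tau = {x_tau:.3f}, Pr(X <= x_tau) ~ {coverage:.3f}")
```

The share of sample points at or below `x_tau` should land very close to `tau`, which is exactly the relation stated above.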

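(2) Pinball Loss

By analogy with the Mean Squared Error (MSE, or L2) loss used in least squares, quantile regression defines the pinball loss (or quantile loss) function \(\rho_\tau(u)\), where \(u\) is the residual (observed value minus predicted value):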

\[\begin{cases} \text{When } u \ge 0, & \rho_\tau(u) = \tau \cdot u \\ \text{When } u < 0, & \rho_\tau(u) = (1 - \tau) \cdot (-u) \end{cases} \quad \Rightarrow \quad \rho_\tau (u) = \begin{cases} \tau \cdot u, & u \ge 0 \\ (\tau - 1) u, & u < 0 \end{cases} \]

\[\rho_\tau (u) = \tau \max(u, 0) + (1 - \tau) \max(-u, 0) \]
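
As a quick sanity check, the two forms of \(\rho_\tau(u)\) can be written in NumPy and compared. This is only a sketch; the function names `pinball_loss` and `pinball_loss_max_form` and the sample residuals are hypothetical choices made for illustration.

```python
import numpy as np

def pinball_loss(u, tau):
    """Piecewise form: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def pinball_loss_max_form(u, tau):
    """Equivalent form: tau * max(u, 0) + (1 - tau) * max(-u, 0)."""
    u = np.asarray(u, dtype=float)
    return tau * np.maximum(u, 0) + (1 - tau) * np.maximum(-u, 0)

# Illustrative residuals u = y - prediction
u = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
piecewise = pinball_loss(u, tau=0.25)
max_form = pinball_loss_max_form(u, tau=0.25)

print(piecewise)
print(np.allclose(piecewise, max_form))
```

Both functions return the same non-negative values, and the loss is zero only at \(u = 0\).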

(3) Quantile Regression Model

In quantile regression, for a given quantile level \(\tau \ (0 \leq \tau \leq 1)\), the objective function of the model is:

\[\begin{aligned} \min_{\boldsymbol{\beta}_\tau} L &= \min_{\boldsymbol{\beta}_\tau} \sum_{i=1}^n \rho_\tau \left( y_i - \boldsymbol{\beta}_{\tau}^{\top} \boldsymbol{x}_i \right) \\ &= \min_{\boldsymbol{\beta}_\tau} \left[ \sum_{y_i \geq \boldsymbol{\beta}_{\tau}^{\top} \boldsymbol{x}_i} \tau \left( y_i - \boldsymbol{\beta}_\tau^{\top} \boldsymbol{x}_i \right) + \sum_{y_i < \boldsymbol{\beta}_{\tau}^{\top} \boldsymbol{x}_i} (\tau - 1) \left( y_i - \boldsymbol{\beta}_{\tau}^{\top} \boldsymbol{x}_i \right) \right] \end{aligned} \]

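  • When \(\tau = 0.5\), the model reduces to least absolute deviations (LAD) regression, i.e. it minimizes the Mean Absolute Error (MAE), or L1, loss (up to a constant factor of 1/2).

  • A higher quantile level pushes the fitted values upward (toward overestimating), while a lower quantile level pushes them downward (toward underestimating). For example, at the 0.25 quantile, overestimating \(Y\) carries a loss weight of 0.75 and underestimating carries a weight of 0.25; at the 0.75 quantile the weights are reversed.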
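
The asymmetric weights can be illustrated with the simplest possible model, a single constant prediction \(c\). The sketch below assumes synthetic exponential data and a brute-force search over the data points (a minimizer of a piecewise linear convex loss can always be found among them); the helper name `total_pinball_loss` is hypothetical.

```python
import numpy as np

def total_pinball_loss(y, c, tau):
    """Sum of pinball losses when every y_i is predicted by the constant c."""
    u = y - c  # residual: observed minus predicted
    return float(np.sum(np.where(u >= 0, tau * u, (tau - 1) * u)))

# Asymmetric weights at tau = 0.25 for a one-unit error in each direction
loss_overestimate = total_pinball_loss(np.array([0.0]), 1.0, 0.25)    # prediction too high
loss_underestimate = total_pinball_loss(np.array([0.0]), -1.0, 0.25)  # prediction too low
print(loss_overestimate, loss_underestimate)

# Assumption: synthetic skewed data, so that different quantiles differ clearly.
rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=2001)

best_constant = {}
for tau in (0.25, 0.5, 0.75):
    # Brute force: evaluate the loss at every data point and keep the best one.
    losses = [total_pinball_loss(y, c, tau) for c in y]
    best_constant[tau] = float(y[int(np.argmin(losses))])
    print(tau, best_constant[tau], float(np.quantile(y, tau)))
```

For each \(\tau\), the constant that minimizes the total pinball loss coincides with the empirical \(\tau\)-quantile of the data, and it moves upward as \(\tau\) increases.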

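(4) Parameter Estimation

The main algorithms for estimating the parameters of a quantile regression model are the simplex method, interior-point methods, and smoothing methods.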
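
Simplex and interior-point methods apply because the objective can be rewritten as a linear program: split each residual into non-negative parts \(u_i^+\) and \(u_i^-\), then minimize \(\tau \sum_i u_i^+ + (1 - \tau) \sum_i u_i^-\) subject to \(\boldsymbol{\beta}_\tau^{\top} \boldsymbol{x}_i + u_i^+ - u_i^- = y_i\). The following is a minimal sketch of that formulation using `scipy.optimize.linprog` with the HiGHS solver. The function name `quantile_regression_lp` and the synthetic data are assumptions made for illustration, and this is not necessarily the routine that any particular library runs internally.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression_lp(X, y, tau):
    """Fit linear quantile regression by solving the equivalent linear program.

    Decision variables: [beta (free), u_plus >= 0, u_minus >= 0].
    """
    n, p = X.shape
    # Cost: 0 for beta, tau for positive residual parts, (1 - tau) for negative parts
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    # Equality constraints: X @ beta + u_plus - u_minus = y
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    result = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:p]

# Assumption: synthetic data with true intercept 1 and true slope 2
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=200)
X = np.column_stack([np.ones_like(x), x])  # first column is the intercept

beta_median = quantile_regression_lp(X, y, tau=0.5)
beta_low = quantile_regression_lp(X, y, tau=0.1)
share_below_low = float(np.mean(y <= X @ beta_low + 1e-8))

print("tau=0.5:", beta_median)
print("tau=0.1:", beta_low, "share of y at or below the fit:", share_below_low)
```

The dense constraint matrix keeps the sketch short but grows as \(n \times (p + 2n)\), so a sparse matrix would be a better fit for large samples.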

2. Implementation in Python

2.1 scikit-learn

sklearn.linear_model.QuantileRegressor()

2.2 statsmodels

statsmodels.regression.quantile_regression.QuantReg()

  • Example code: comparing the results of scikit-learn and statsmodels
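import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import QuantileRegressor

# Synthetic data: y = 2x + noise, with noise drawn from (0, 5]
rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
x = np.linspace(0, 10, 100) + (1 - rng.random(100)) * 5
# Compute y while x is still 1-D; adding the noise to a (100, 1) array
# would broadcast to shape (100, 100)
y = 2 * x + (1 - rng.random(100)) * 5
X = np.expand_dims(x, axis=1)  # scikit-learn expects a 2-D feature matrix

# scikit-learn model (alpha=0 disables the L1 penalty)
model = QuantileRegressor(quantile=0.1, alpha=0.0, fit_intercept=True, solver="highs")
model.fit(X, y)
print(model.intercept_, model.coef_)

# statsmodels model (add_constant prepends the intercept column)
model = sm.QuantReg(endog=y, exog=sm.add_constant(X))
res = model.fit(q=0.1)
print(res.params)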


posted @ 2023-05-05 14:27 by veager