Weighted Least Squares (WLS)
When and How to use Weighted Least Squares (WLS) Models
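Main use of WLS:
Handling heteroscedasticity, that is, cases where the variance of y at a given x grows as x increases.
Heteroscedastic data: data whose variability is not constant but changes with the input, so the variance of the response shifts as the input changes.
For example:
As age increases, net worth tends to become more dispersed
As company size grows, revenue tends to become more dispersed
Or, as an infant's height increases, weight tends to become more dispersed
In each case, as x increases, the variance of y at that x increases with it.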
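To make the idea concrete, here is a minimal sketch that simulates heteroscedastic data and compares the spread of the residuals for small and large x. The noise model (standard deviation proportional to `x + 5`), the sample size, and the seed are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-5, 5, n)

# Assumption for illustration: the noise standard deviation grows linearly with x + 5.
noise = rng.standard_normal(n) * (x + 5)
y = 2 * x + noise

# Deviation from the true line y = 2x, split into a low-x half and a high-x half.
residual = y - 2 * x
low_spread = residual[x < 0].std()
high_spread = residual[x >= 0].std()

print("residual std for x < 0 :", low_spread)
print("residual std for x >= 0:", high_spread)
```

The second number should come out clearly larger than the first, which is the pattern WLS is meant to account for.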

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# import and fit an OLS model, check coefficients
from sklearn.linear_model import LinearRegression
# %matplotlib inline  (Jupyter magic; only needed inside a notebook)

# generate random data
np.random.seed(24)
x = np.random.uniform(-5, 5, 25)
ϵ = 2 * np.random.randn(25)
y = 2 * x + ϵ
# alternate error as a function of x (noise scale grows with x + 5)
ϵ2 = ϵ * (x + 5)
y2 = 2 * x + ϵ2

# plot the homoscedastic data (x, y) and the heteroscedastic data (x, y2)
# recent seaborn versions require x and y to be passed as keywords
plt.figure()
sns.regplot(x=x, y=y)
plt.figure()
sns.regplot(x=x, y=y2)

# OLS on the homoscedastic data
model = LinearRegression()
model.fit(x.reshape(-1, 1), y)
print(model.intercept_, model.coef_)

# add a strong outlier for high x
x_high = np.append(x, 5)
y_high = np.append(y2, 160)
# add a strong outlier for low x
x_low = np.append(x, -4)
y_low = np.append(y2, 160)

# calculate weights for sets with low and high outlier
sample_weights_low = [1 / (xi + 5) for xi in x_low]
sample_weights_high = [1 / (xi + 5) for xi in x_high]

# reshape for compatibility
X_low = x_low.reshape(-1, 1)
X_high = x_high.reshape(-1, 1)

# low-x outlier: fit OLS, then fit WLS using sample_weights
model = LinearRegression()
model.fit(X_low, y_low)
WLS = LinearRegression()
WLS.fit(X_low, y_low, sample_weight=sample_weights_low)
plt.figure()
sns.regplot(x=x_low, y=y_low)
print(model.intercept_, model.coef_)
print('WLS')
print(WLS.intercept_, WLS.coef_)

# high-x outlier: same comparison
model = LinearRegression()
model.fit(X_high, y_high)
WLS = LinearRegression()
WLS.fit(X_high, y_high, sample_weight=sample_weights_high)
print(model.intercept_, model.coef_)
print('WLS')
print(WLS.intercept_, WLS.coef_)

# for comparison: the same weighting applied to the homoscedastic data (x, y)
model = LinearRegression()
model.fit(x.reshape(-1, 1), y)
WLS = LinearRegression()
sample_weights = [1 / (i + 5) for i in x]
WLS.fit(x.reshape(-1, 1), y, sample_weight=sample_weights)
print(model.intercept_, model.coef_)
print('WLS')
print(WLS.intercept_, WLS.coef_)

plt.show()
```
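What `sample_weight` does in a linear fit can also be written out by hand: the weighted fit minimizes the sum of `w_i * (y_i - a - b * x_i)**2`, which is the same as ordinary least squares after scaling each row by `sqrt(w_i)`. The sketch below uses only NumPy; the helper name `wls_fit`, the seed, and the sample size are hypothetical and chosen for illustration. Standard WLS theory takes weights proportional to the inverse error variance, which for noise scale `x + 5` would be `1 / (x + 5)**2`, so that choice is shown next to the `1 / (x + 5)` weights used in the listing.

```python
import numpy as np

def wls_fit(x, y, w):
    """Return (intercept, slope) minimizing sum(w * (y - a - b*x)**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.sqrt(np.asarray(w, dtype=float))
    # Design matrix with an intercept column; scale rows by sqrt(weight).
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1]

# Illustrative data (assumed): true line y = 2x, noise scale proportional to x + 5.
rng = np.random.default_rng(1)
x = rng.uniform(-5, 5, 200)
y = 2 * x + rng.standard_normal(200) * (x + 5)

ols = wls_fit(x, y, np.ones_like(x))            # equal weights: plain OLS
wls = wls_fit(x, y, 1 / (x + 5))                # weights as in the listing
wls_inv_var = wls_fit(x, y, 1 / (x + 5) ** 2)   # inverse-variance weights

print("OLS            :", ols)
print("WLS 1/(x+5)    :", wls)
print("WLS 1/(x+5)**2 :", wls_inv_var)
```

Which weighting is preferable depends on how well the assumed noise model matches the data; when the variance structure is unknown, it usually has to be estimated from the residuals first.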
