Spatial Regression

1. Motivation

Spatial heterogeneity: parts of the model may vary systematically with geography, so that relationships differ from place to place.

Spatial dependence: the value observed at one location is related to the values observed at nearby locations.

2. Models

The models below are estimated with spreg, the spatial regression module of PySAL (site):

from pysal.model import spreg

2.1 Ordinary Least Squares

model = spreg.OLS(y, x, name_y=name_y, name_x=name_x, **kargs)
  • Parameters:

    • y (array): dependent variable

    • x (2d array): Two dimensional array with n rows and one column for each independent (exogenous) variable, excluding the constant

    • name_y (str): Name of dependent variable for use in output

    • name_x (list of str): Names of independent variables for use in output

Note: spreg.OLS adds the constant term internally, so x does not need the all-ones column that statsmodels requires for estimating the intercept.
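To make the role of the constant column concrete, here is a minimal NumPy sketch of the manual step that spreg.OLS spares you. The data, the seed and the true coefficients are made up for illustration, and the code is not spreg's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))            # two explanatory variables, no constant column
beta_true = np.array([1.5, -0.7])      # assumed slopes for the synthetic data
y = 3.0 + x @ beta_true + rng.normal(scale=0.1, size=n)

# Add the all-ones column by hand, then solve the least-squares problem
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(coef)  # intercept first, then the two slopes
```

With spreg.OLS the same x (without the ones column) is passed directly.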

2.2 Spatial Fixed Effects

Spatial Fixed Effects model

\[Y_i = \alpha_r + \sum_k X_{ik}\beta_k + \epsilon_i \]

  • Where \(i\) is the index of samples

  • \(k\) is the index of explanatory variables

  • \(r\) is the index of neighborhoods (regions)

  • \(\alpha_r\) is the constant term, which is allowed to vary by neighborhood \(r\)

  • This is equivalent to representing the neighborhood with dummy variables
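The dummy-variable reading of the fixed effects model can be sketched in NumPy as follows. The region labels, the intercepts and the slope are synthetic assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_regions = 300, 3
region = rng.integers(0, n_regions, size=n)   # region label r of each sample
x = rng.normal(size=(n, 1))
alpha_true = np.array([1.0, 4.0, -2.0])       # one intercept per region (assumed)
beta_true = 0.8                               # slope shared by all regions (assumed)
y = alpha_true[region] + beta_true * x[:, 0] + rng.normal(scale=0.1, size=n)

# One dummy column per region replaces the single constant column
D = (region[:, None] == np.arange(n_regions)).astype(float)
X = np.column_stack([D, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

alpha_hat, beta_hat = coef[:n_regions], coef[n_regions]
print(alpha_hat, beta_hat)
```

In spreg, the same specification should be obtainable from OLS_Regimes by letting only the constant vary (constant_regi='many' with cols2regi set to a list of False values).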

2.3 Spatial Regimes

Spatial Regimes model

\[Y_i = \alpha_r + \sum_k X_{ik}\beta_{k,r} + \epsilon_i \]

  • where we are not only allowing the constant term to vary by region (\(\alpha_r\)), but also every other parameter (\(\beta_{k,r}\))

  • This is equivalent to splitting the samples by region (\(r\)) and fitting a separate linear regression to each group
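The equivalence between a regimes model and separate regressions can be checked numerically. The sketch below uses synthetic data with two regimes and compares (a) one regression per regime with (b) a single regression on a block design in which every column is interacted with the regime dummies. It concerns the coefficient estimates only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
regime = rng.integers(0, 2, size=n)            # two regimes, labelled 0 and 1
x = rng.normal(size=n)
y = np.where(regime == 0, 1.0 + 2.0 * x, -1.0 + 0.5 * x) + rng.normal(scale=0.1, size=n)

# (a) Separate regression per regime
separate = []
for r in (0, 1):
    m = regime == r
    Xr = np.column_stack([np.ones(m.sum()), x[m]])
    separate.append(np.linalg.lstsq(Xr, y[m], rcond=None)[0])
separate = np.concatenate(separate)            # [alpha_0, beta_0, alpha_1, beta_1]

# (b) One regression on a block design: constant and x interacted with each regime dummy
cols = []
for r in (0, 1):
    m = (regime == r).astype(float)
    cols += [m, m * x]
X_block = np.column_stack(cols)
pooled = np.linalg.lstsq(X_block, y, rcond=None)[0]

print(separate)
print(pooled)
```

Both approaches return the same four coefficients, because the block design has no column shared across regimes.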

model = spreg.OLS_Regimes(y, x, regimes, 
    constant_regi="many", cols2regi='all', regime_err_sep=True, **kargs)
  • Parameters:

    • constant_regi (str in {'one', 'many'}, default='many'): Switcher controlling the constant term setup.

      • 'one': a vector of ones is appended to x and held constant across regimes

      • 'many': a vector of ones is appended to x and considered different per regime (default)

    • cols2regi (list, or 'all', default='all'): Argument indicating whether each column of x should be considered as different per regime (True) or held constant across regimes (False).

      • If a list : k booleans indicating for each variable the option (True if one per regime, False to be held constant).

      • If 'all' (default) : all the variables vary by regime.

    • regime_err_sep (bool, default=True):

      • True: a separate regression is run for each regime.

Chow test

Null hypothesis: the estimates from the different regimes are indistinguishable

# global one that jointly tests for differences between the two regimes
model.chow.joint
# check whether each of the coefficients in our model differs across regimes
model.chow.regi
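For intuition about what a joint test of this kind measures, here is a sketch of the classic Chow F statistic for two regimes, written in NumPy. It is a textbook variant under synthetic data and hypothetical helper names (rss, chow_f). To my understanding spreg reports a Wald-type statistic, so the numbers are not expected to match model.chow.joint.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return float(resid @ resid)

def chow_f(y, x, regime):
    """Classic Chow F statistic for two regimes (labels 0/1), one regressor plus a constant."""
    X = np.column_stack([np.ones(len(y)), x])
    k = X.shape[1]
    rss_pooled = rss(X, y)                                   # one set of coefficients
    rss_split = sum(rss(X[regime == r], y[regime == r]) for r in (0, 1))
    return ((rss_pooled - rss_split) / k) / (rss_split / (len(y) - 2 * k))

rng = np.random.default_rng(3)
n = 200
regime = np.arange(n) % 2
x = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=n)

y_same = 1.0 + 2.0 * x + noise                               # identical coefficients
y_diff = np.where(regime == 0, 1.0 + 2.0 * x, -1.0 + 0.5 * x) + noise

f_same = chow_f(y_same, x, regime)
f_diff = chow_f(y_diff, x, regime)
print(f_same, f_diff)
```

A large statistic, as for y_diff, is evidence against the null hypothesis that the regimes share the same coefficients.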

2.4 Exogenous effects: The SLX model

The SLX model adds the spatial lag of each explanatory variable to the regression:

\[Y_i = \alpha + \sum^{p}_{k=1}X_{ik}\beta_k + \sum^{p}_{k=1} \left( \sum^{N}_{j=1}w_{ij}X_{jk}\right)\gamma_k + \epsilon_i \]

  • where \(\sum\limits_{j=1}^{N} w_{ij}X_{jk}\) represents the spatial lag of the \(k\)th explanatory variable.

This can be stated in matrix form using the spatial weights matrix \(\mathbf{W}\), as:

\[\boldsymbol{Y} = \alpha + \mathbf{X} \boldsymbol{\beta} + \mathbf{WX} \boldsymbol{\gamma} + \boldsymbol{\epsilon} \]

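This splits the model into two main effects: \(\boldsymbol{\beta}\) and \(\boldsymbol{\gamma}\). The effect \(\boldsymbol{\beta}\) describes the change in \(Y_i\) when \(X_{ik}\) changes by one. The effect \(\boldsymbol{\gamma}\) describes the change in \(Y_i\) when the spatial lag of \(X_k\) changes by one, that is, when the weighted sum of the neighbors' values changes (an average if \(\mathbf{W}\) is row-standardized).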
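Because the SLX model is still linear in its parameters, it can be fitted by ordinary least squares on the stacked design \([\mathbf{1}, \mathbf{X}, \mathbf{WX}]\). The NumPy sketch below assumes a hypothetical ring-shaped neighborhood (each site has two neighbors with weight 0.5) and synthetic coefficients.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
# Hypothetical ring weights: site i neighbours i-1 and i+1, row-standardised
I = np.eye(n)
W = 0.5 * (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1))

X = rng.normal(size=(n, 2))
WX = W @ X                                    # spatial lag of every explanatory variable
beta_true = np.array([1.0, -2.0])             # assumed direct effects
gamma_true = np.array([0.5, 0.3])             # assumed effects of the neighbours' X
y = 2.0 + X @ beta_true + WX @ gamma_true + rng.normal(scale=0.1, size=n)

# Plain least squares on [1, X, WX]
Z = np.column_stack([np.ones(n), X, WX])
coef = np.linalg.lstsq(Z, y, rcond=None)[0]
alpha_hat, beta_hat, gamma_hat = coef[0], coef[1:3], coef[3:5]
print(alpha_hat, beta_hat, gamma_hat)
```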

2.5 Spatial Error

The spatial error model includes a spatial lag in the error term of the equation:

\[Y_i = \alpha + \sum_k \beta_k X_{ik} + u_i \quad \text{where} \quad u_i = \lambda u^{\text{lag}}_{i} + \epsilon_i \]

  • where \(u^{\text{lag}}_{i} = \sum \limits_j w_{i,j} u_j\)

This specification violates the assumptions about the error term in a classical OLS model. Hence, alternative estimation methods are required.
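To see why the classical assumptions fail, note that \(u = \lambda \mathbf{W} u + \epsilon\) implies \(u = (\mathbf{I} - \lambda \mathbf{W})^{-1}\epsilon\), so the errors of neighboring sites are correlated. The NumPy sketch below checks this with a hypothetical ring-shaped weights matrix and an assumed \(\lambda = 0.6\).

```python
import numpy as np

rng = np.random.default_rng(5)
n, lam = 200, 0.6
I = np.eye(n)
W = 0.5 * (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1))   # hypothetical ring weights

eps = rng.normal(size=n)                      # independent innovations
u = np.linalg.solve(I - lam * W, eps)         # solves u = lam * W u + eps for u

# Implied covariance of u for unit-variance eps: [(I - lam W)' (I - lam W)]^{-1}
A = I - lam * W
cov_u = np.linalg.inv(A.T @ A)
print(cov_u[0, :3])                           # off-diagonal entries are not zero
```

The non-zero off-diagonal entries of cov_u are what OLS standard errors ignore.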

model = spreg.GM_Error_Het(y, x, w, **kargs)

The \(\lambda\) estimate is the coefficient labelled lambda in model.name_x:

import pandas

# Build the coefficient table and keep only the row of the spatial error parameter
pandas.DataFrame({"Coeff."     : model.betas.flatten(),
                  "Std. Error" : model.std_err.flatten(),
                  "P-Value"    : [i[1] for i in model.z_stat] },
                 index = model.name_x
).loc[["lambda"]]

2.6 Spatial Lag

The spatial lag model introduces a spatial lag of the dependent variable:

\[Y_i = \alpha + \rho Y^{\text{lag}}_i + \sum_k \beta_k X_{ik} + \epsilon_i \]

  • where \(Y^{\text{lag}}_{i} = \sum \limits_j w_{i,j} Y_j\)

This model violates the exogeneity assumption that OLS relies on, because \(Y_i\) appears on both "sides" of the equals sign: \(Y_i\) enters the spatial lag of its neighbors, whose values in turn enter \(Y^{\text{lag}}_i\). As a result \(Y^{\text{lag}}_i\) is correlated with \(\epsilon_i\), and OLS estimates are biased and inconsistent.

  • Two-stage least squares estimation
# w must be passed by keyword: the third positional argument of GM_Lag is yend
model = spreg.GM_Lag(y, x, w=w, **kargs)
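The two stages can be sketched by hand in NumPy. The example assumes a hypothetical ring-shaped weights matrix, synthetic coefficients, and the spatial lag of x as the instrument for the endogenous \(\mathbf{W}Y\), which is a common choice. It illustrates the estimator and is not spreg's implementation.

```python
import numpy as np

rng = np.random.default_rng(6)
n, rho = 500, 0.5
I = np.eye(n)
W = 0.5 * (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1))   # hypothetical ring weights

x = rng.normal(size=n)
eps = rng.normal(scale=0.05, size=n)
# Reduced form of the lag model: Y = (I - rho W)^{-1} (alpha + x beta + eps)
y = np.linalg.solve(I - rho * W, 1.0 + 2.0 * x + eps)

Z = np.column_stack([np.ones(n), x, W @ y])   # regressors; W @ y is endogenous
H = np.column_stack([np.ones(n), x, W @ x])   # instruments: exogenous columns plus W x

# Stage 1: project the regressors on the instruments
Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]
# Stage 2: regress y on the projected regressors
alpha_hat, beta_hat, rho_hat = np.linalg.lstsq(Z_hat, y, rcond=None)[0]
print(alpha_hat, beta_hat, rho_hat)
```

The estimates should land close to the assumed values (1.0, 2.0 and 0.5), because the instrument is correlated with \(\mathbf{W}Y\) but not with the error term.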

2.7 Other Models

  • Generalized Additive Models

  • Spatial Gaussian Process Models or Kriging

References

Rey, Sergio J., et al. 2022, "Spatial Regression", in Geographic Data Science with Python, site

posted @ 2023-05-14 13:30  veager