Spatial Autocorrelation

Tobler's First Law of Geography:

Everything is usually related to all else but those which are near to each other are more related when compared to those that are further away.

1. Understanding spatial autocorrelation

Spatial randomness: a situation in which the location of an observation gives no information whatsoever about its value.
- In other words, a variable is spatially random if its distribution follows no discernible spatial pattern.

Spatial autocorrelation can thus be defined as the "absence of spatial randomness".

Two main forms of spatial autocorrelation: positive and negative

positive spatial autocorrelation: Similar values are located near each other, while different values tend to be scattered and further away
negative spatial autocorrelation: reflects a situation where similar values tend to be located away from each other.

Global spatial autocorrelation:

Local spatial autocorrelation:

2. Global spatial autocorrelation

2.1 Spatial lag

\[\boldsymbol{Y}_{sl} = \mathbf{W} \boldsymbol{Y} \quad \text{or} \quad y_{sl,i} = \sum_{j} w_{ij} y_j \]

where:

\(w_{ij}\) is the cell in the weight matrix \(\mathbf{W}\) on the \(i\)th row and \(j\)th column, thus capturing the spatial relationship between observations \(i\) and \(j\)
\(y_{sl,i}\) the product of the values and weights of each observation other than \(i\) in the dataset

Because non-neighbors receive a weight of zero, \(y_{sl,i}\) really captures the product of values and weights for \(i\)'s neighbors
For the Weight matrix \(\mathbf{W}\),
- If \(\mathbf{W}\) is binary, \(y_{sl,i}\) will amount to the sum of the values of \(i\)'s neighbors (useful in some contexts, such as studies of market potential)
- If \(\mathbf{W}\) is row standardized, a common transformation, then \(w_ij\)
  is bounded between zero and one; the spatial lag thus then becomes a "local average", the average value of \(\boldsymbol{Y}\) in the neighborhood of each observation

weights.spatial_lag.lag_spatial(w, y)

Parameters:
- w: ibpysal spatial weights object
- y (array): numpy array with dimensionality conforming to w
Returns:
- wy (array): array of numeric values for the spatial lag

2.2 Binary case: join counts

esda.join_counts.Join_Counts(y, w)

Binary Join Counts. "join count" statistic

Parameters:
- y (array): binary variable measured across \(n\) spatial units
- w: spatial weights instance

Statistical inference, based on random spatial permutations

2.3 Continuous case: Moran Plot and Moran's I

2.3.1 Moran's I Index

\[\begin{aligned} I &= \dfrac{N}{ \sum \limits_i^N \sum \limits_j^N w_{ij}} \dfrac{\sum \limits_i^N \sum \limits_j^N w_{ij} \, z_i \, z_j}{ \sum \limits_i z_i^2} \\ &= {\frac {N}{\sum \limits_i^N \sum \limits_j^N w_{ij}}} { \frac {\sum \limits_{i=1}^{N}\sum \limits_{j=1}^{N} w_{ij} (x_i-{\bar{x}} )( x_j - {\bar{x}}) }{ \sum \limits_{i=1}^{N}(x_i - {\bar{x}})^{2}} } \end{aligned} \]

where:

\(N\) is the number of observations
\(z_i\) is the standardized value of the variable of interest at location \(i\)
- \(z_i = x_i - \bar{x}\), where \(\tilde{x}_i\) is the values of original variable, \(\bar{x}\) is the mean of the original variable
\(w_{ij}\) is the cell corresponding to the \(i\)th row and \(j\)th column of a \(\mathbf{W}\) spatial weights matrix.

2.3.2 Expected value

The expected value of Moran's I under the null hypothesis of no spatial autocorrelation is

\[\mathbb{E}[I]={\frac {-1}{N-1}} \]

The null distribution used for this expectation is that the \(x\) input is permuted by a permutation
\(\pi\) picked uniformly at random (and the expectation is over picking the permutation).

At large sample sizes (i.e., as \(N\) approaches infinity), the expected value approaches zero.

Its variance equals

\[\operatorname{Var} [I] = {\frac {NS_{4}-S_{3}S_{5}}{(N-1)(N-2)(N-3)W^{2}}} - (\mathbb{E}[I])^{2} \]

where \(W=\sum \limits_i^N \sum \limits_j^N w_{ij}\)
more details can refer to Moran's I (Wikipedia).

2.3.2 Moran Plot

Moran Plot: a graphical interpretation of Moran's I

The Moran Plot is a way of visualizing a spatial dataset to explore the nature and strength of spatial autocorrelation.
- scatter plot: variable v.s. its 'spatial lag'
- the variable of interest is usually standardized (i.e., x = (x - x.mean()) / x.std())

2.4 Other global indices

Gear's C

The contiguity ratio \(C\), proposed by Geary (1954)

\[C = \dfrac{(n-1)} {2 \sum \limits_i \sum \limits_j w_{ij}} \dfrac{\sum \limits_i \sum \limits_j w_{ij} (y_i - y_{j})^2} {\sum \limits_i (y_i - \bar{y})^2} \]

where

\(n\) is the number of observations
\(w_{ij}\) is the cell in a binary matrix \(\mathbf{W}\) expressing whether \(i\) and \(j\) are neighbors (\(w_{ij}=1\)) or not (\(w_{ij}=0\))
\(y_i\) is the \(i\)th observation of the variable of interest, and \(\bar{y}\) is its sample mean.

When compared to Moran's I, it is apparent both measures compare the relationship of \(\boldsymbol{Y}\) within each observation's local neighborhood to that over the entire sample.

However, there are also subtle differences. While Moran's I takes cross-products on the standardized values, Geary's \(C\) uses differences on the values without any standardization.

Getis and Ord's G

Getis and Ord's \(G\) proposed by (). The \(G\) is the global version of a family of statistics of spatial autocorrelation based on distance.

The \(G\) class of statistics is conceived for points, hence the use of a distance \(\mathbf{W}\)

\[G(d) = \dfrac{ \sum \limits_i \sum \limits_j w_{ij}(d) \, y_i \, y_j } { \sum \limits_i \sum \limits_j y_i \, y_j } \]

where

\(w_{ij}(d)\) is the binary weight assigned on the relationship between observations \(i\) and \(j\)
following a distance band criterion.

\(G\) was originally proposed as a measure of concentration rather than of spatial autocorrelation.

As such, it is well suited to test to what extent similar values (either high or low) tend to co-locate. In other words, the \(G\) is a statistic of positive spatial autocorrelation.

However, it is important to note that, because \(G\) can be understood as a measure of the intensity with which \(\boldsymbol{Y}\) is concentrated, the statistic is not able to pick up cases of negative spatial autocorrelation.

3. Local Spatial Autocorrelation

3.1 Local Moran's I

to identify cases in which the value of an observation and the average of its surroundings is either more similar (HH or LL) or dissimilar (HL, LH)

\[I_{i}={\frac {x_{i}-{\bar {x}}}{m_{2}}}\sum _{j=1}^{N}w_{ij}(x_{j}-{\bar {x}}), \quad \text{where} \quad m_{2}={\frac {\sum \limits_{i=1}^{N}(x_{i}-{\bar {x}})^{2}}{N}} \]

where \(m_2\) represents the second moment (variance) of the distribution of values in the data

4. Implement

4.1 Global Spatial Autocorrelation Index

esda.Moran(y, w, transformation='r', permutations=999, two_tailed=True)

esda.Geary(y, w, transformation='r', permutations=999)

esda.G(y, w, permutations=999)

Paramters:
- y (array): variable measured across \(n\) spatial units
- w (w): spatial weights instance
- transformation (str, default="r") weights transformation,
  - "r": row-standardized
  - "B": binary
  - "D": doubly-standardized
  - "U": untransformed (general weights)
  - "V": variance-stabilizing.
- permutations (int): number of random permutations for calculation of pseudo-p_values
- two_tailed (bool)
Attributes
- .I or .C or .G: the value of Moran's I, or Gear's C, or Getis and Ord's G
- .p_sim: persudo-p-value based on permutations (one-tailed):
  - Null hypothesis: spatial randomness
  - Alternative hypothesis: the observed Moran's I is extreme if it is either extremely greater or extremely lower than the values obtained based on permutations
- .y, .z: original variable and normalized (standardized) variable

Using the random permutation simulation (i.e., Monte Carlo) to complete the statistic testing.

Explaination of .p_sim:

It is an empirical p-value that represents the proportion of realizations in the simulation under spatial randomness that are more extreme than the observed value.
A small enough p-value associated with the Moran's I of a map allows to reject the hypothesis that the map is random. In other words, we can conclude that the map displays more spatial pattern than we would expect if the values had been randomly allocated to a locations.
For example, if .p_sim=0.01: if we generated a large number of maps with the same values but randomly allocated over space, and calculated the Moran's I statistic for each of those maps, only 0.01% of them would display a larger (absolute) value than the one we obtain from the observed data, and the other 99.99% of the random maps would receive a smaller (absolute) value of Moran's I

4.2 Moran Plot

4.2.1 PySAL

related function: splot.esda

from splot.esda import moran_scatterplot
from splot.esda import plot_moran_simulation
from splot.esda import plot_moran

moran_scatterplot(moran, 
                  zstandard=True, p=None, 
                  aspect_equal=True, ax=None, 
                  scatter_kwds=None, fitline_kwds=None)

plot_moran_simulation(moran, 
                      aspect_equal=True, ax=None, 
                      fitline_kwds=None, **kwargs)

plot_moran(moran, 
           zstandard=True, aspect_equal=True, 
           scatter_kwds=None, fitline_kwds=None, **kwargs)

Parameters:
- moran (esda.moran.Moran instance): Values of Moran's I Global Autocorrelation Statistics
- zstandard (bool, default=True): If True, Moran Scatterplot will show z-standardized attribute and spatial lag values.
  - zstandard=False: will plot moran.y v.s. weights.lag_spatial(w, moran.y)
  - zstandard=True: will plot moran.z v.s. weights.lag_spatial(w, moran.z)
- p (float, Default=None): If given, the p-value threshold for significance for Local Autocorrelation analysis. Points will be colored by significance. By default it will not be colored.
- aspect_equal (bool, default=True): If True, Axes will show the same aspect or visual proportions for Moran Scatterplot.
  
  If True, 保持横纵坐标的代表的数量级相同
Returns:
- fig, ax

4.2.2 Code

Moran 散点图

点击查看代码

def moran_scatter_plot(w, var, zstandard=True, ax=None):
    '''
    plot Moran plot: variable v.s. spatial lag
    '''
    import seaborn as sns
    from pysal.lib import weights

    # normalization
    if zstandard:
        var = (var - var.mean()) / var.std()
    # spatial lag
    spatial_lag = weights.lag_spatial(w, var)

    if ax is None:
        f, ax = plt.subplots(1, 1, figsize=(6, 6))

    ax = sns.regplot(
        x = var, y = spatial_lag, ci = None,
        line_kws = {"color": "r"},
        scatter_kws = {"s": 10},
        ax = ax
    )

    ax.axvline(0, c="k", alpha=0.5)
    ax.axhline(0, c="k", alpha=0.5)
    ax.set_ylabel('Spatial lag')
    ax.set_title("Moran Plot")

    return ax

对比

from pysal.lib import weights
from pysal.explore import esda
from splot.esda import moran_scatterplot

# Generate W from the GeoDataFrame
w = weights.distance.KNN.from_dataframe(data, k=8, geom_col='geometry') 
# Row-standardization, equals to: w.transform = "R"
w.set_transform('R')

# compute Moran's I
moran = esda.moran.Moran(data['value'], w, permutations=99)

# plot
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
moran_scatter_plot(w, data['value'], zstandard=True, ax=axes[0])
moran_scatterplot(moran, zstandard=True, ax=axes[1], aspect_equal=False)

4.3 Local Moran's I

esda.Moran_Local(y, w, transformation='r', 
                 permutations=999, geoda_quads=False, n_jobs=1, 
                 keep_simulations=True, seed=None, island_weight=0)

Paramters:
- y (array, shape of (n,1)): attribute array
- w: weight instance assumed to be aligned with y
- transformation (str in {'R', 'B', 'D', 'U', 'V}, default='r'）: weights transformation
- permutations (int, default=999): number of random permutations for calculation of pseudo p_values
- geoda_quads (bool, default=False):
  - If True use GeoDa scheme: HH=1, LL=2, LH=3, HL=4
  - If False use PySAL Scheme: HH=1, LH=2, LL=3, HL=4
- n_jobs (int): Number of cores to be used in the conditional randomisation. If -1, all available cores are used.
- keep_simulations (bool, default=True): If True, the entire matrix of replications under the null is stored in memory and accessible; otherwise, replications are not saved
- seed (int, default=None): Seed to ensure reproducibility of conditional randomizations.
  - Must be set here, and not outside of the function, since numba does not correctly interpret external seeds nor numpy.random.RandomState instances.
- island_weight: value to use as a weight for the "fake" neighbor for every island.
  - If numpy.nan, will propagate to the final local statistic depending on the stat_func.
  - If 0, then the lag is always zero for islands.
Properties:
- y: original variable
- w: original w object
- Is: local Moran's I values
- q: (if permutations>0) values indicate quandrant location 1 HH, 2 LH, 3 LL, 4 HL

Reference

Sergio Rey, Dani Arribas-Bel, Levi John Wolf, 2020, Geographic Data Science with Python, site

posted @ 2023-05-13 20:09 veager 阅读(83) 评论(0) 收藏举报

刷新页面返回顶部