Multi-Scale Detection of Anomalous Spatio-Temporal Trajectories in Evolving Trajectory Datasets

Yes, your understanding of the model is essentially correct. Let me help you work through the details and the logical relationships a bit further.

1. Overview of the Model's Embedding Process

  1. Spatial embedding:
  • A graph convolutional network (GCN) embeds the trajectory's spatial locations, producing spatial embeddings at three scales.
  2. Temporal embedding:
  • A Doc2Vec-style method embeds the temporal information, producing temporal embeddings at three scales.
  3. Cross-attention:
  • The spatial and temporal embeddings are fused to learn their interactions, yielding a joint spatio-temporal embedding.
  4. Multi-scale aggregation:
  • The embeddings at the different scales are aggregated into a final trajectory representation that preserves the trajectory's multi-scale characteristics. A schematic sketch of the whole pipeline follows this list.
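
Read end to end, the four stages can be condensed into a schematic sketch; the layers and dimensions below are illustrative stand-ins, not the actual MST-OATD architecture:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, n_points = 16, 12

spatial = torch.randn(1, n_points, d_model)   # stand-in for GCN output
temporal = torch.randn(1, n_points, d_model)  # stand-in for time embeddings

# Cross-attention: spatial features attend over temporal features.
attn = nn.MultiheadAttention(d_model, num_heads=2, batch_first=True)
att_s, _ = attn(spatial, temporal, temporal)
st = torch.cat([att_s, temporal], dim=-1)     # joint spatio-temporal sequence

# Multi-scale aggregation: mean-pool segments of sizes 1, 2, 4 and
# combine the per-scale summaries with (here random) scale weights.
weights = torch.softmax(torch.randn(3), dim=0)
summaries = [st.reshape(1, n_points // s, s, -1).mean(dim=2).mean(dim=1)
             for s in (1, 2, 4)]
final = sum(w * h for w, h in zip(weights, summaries))
print(final.shape)  # torch.Size([1, 32])
```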

2. Anomaly Detection Process

  1. Modeling the embedding distribution (GMM):
  • A Gaussian mixture model (GMM) models the spatial and temporal distributions of the trajectory embeddings. Suppose the model learns 3 Gaussian components; each component then represents one normal trajectory pattern.
  • The trajectory embeddings have corresponding Gaussian components in both the spatial and the temporal dimension.
  2. Likelihood calculation:
  • For an input trajectory, compute its generation probability under each Gaussian component (spatial and temporal probabilities are computed separately).
  3. Max-likelihood selection:
  • In the spatial and temporal dimensions, select the Gaussian component with the highest generation probability; this indicates which normal pattern the trajectory matches best.
  4. Anomaly score calculation (a toy sketch follows this list):
  • Multiply the maximum spatial and temporal generation probabilities to measure how well the trajectory matches a normal pattern overall.
  • The anomaly score is:
    \( \text{Score}(T) = 1 - \left( \max P_s \times \max P_t \right) \)
    • where $ P_s $ and $ P_t $ are the generation probabilities in the spatial and temporal dimensions, and the maxima are taken over the Gaussian components.
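
As a toy illustration of steps 1-4 (not the paper's code — in MST-OATD the likelihoods actually come from the decoder's generation probabilities), one can fit a GMM over trajectory embeddings and score a point by its maximum per-component density; mixture weights are ignored for brevity:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy 2-D trajectory embeddings drawn from 3 "normal pattern" clusters.
emb = np.concatenate([rng.normal(loc=m, scale=0.3, size=(100, 2))
                      for m in ([0, 0], [3, 0], [0, 3])])
gmm = GaussianMixture(n_components=3, random_state=0).fit(emb)

def max_component_density(x):
    # Density of x under each learned Gaussian; the max identifies the
    # normal pattern the embedding most plausibly belongs to.
    return max(multivariate_normal(m, c).pdf(x)
               for m, c in zip(gmm.means_, gmm.covariances_))

print(max_component_density(np.array([0.1, -0.1])))  # high: fits a pattern
print(max_component_density(np.array([5.0, 5.0])))   # ~0: fits no pattern
```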

3. Worked Example

Suppose the input trajectory is:

  • Spatial points: [1, 3, 5]
  • Timestamps: [10, 20, 30]

Through multi-scale embedding, the model produces the following representations:

| Scale | Spatial embedding $ S $ | Temporal embedding $ T $ |
|-------|-------------------------|--------------------------|
| 1     | [0.1, 0.2]              | [0.05, 0.1]              |
| 2     | [0.3, 0.4]              | [0.1, 0.15]              |
| 3     | [0.5, 0.6]              | [0.15, 0.2]              |

The trajectory's generation probabilities under the 3 Gaussian components:

| Cluster   | Spatial probability $ P_s $ | Temporal probability $ P_t $ |
|-----------|-----------------------------|------------------------------|
| Cluster 1 | 0.75                        | 0.80                         |
| Cluster 2 | 0.60                        | 0.70                         |
| Cluster 3 | 0.90                        | 0.85                         |

Anomaly score calculation:

  1. Maximum spatial probability:
    \( \max P_s = 0.90 \quad (\text{Cluster 3}) \)

  2. Maximum temporal probability:
    \( \max P_t = 0.85 \quad (\text{Cluster 3}) \)

  3. Trajectory anomaly score:
    \( \text{Score}(T) = 1 - (0.90 \times 0.85) = 1 - 0.765 = 0.235 \)

The final anomaly score of 0.235 indicates that the trajectory matches the pattern of Cluster 3 fairly well and is not anomalous.
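
A quick check of the arithmetic:

```python
p_s = [0.75, 0.60, 0.90]  # spatial probabilities for clusters 1-3
p_t = [0.80, 0.70, 0.85]  # temporal probabilities for clusters 1-3
score = 1 - max(p_s) * max(p_t)
print(f"{score:.3f}")  # 0.235
```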


4. Further Intuition

  • If a trajectory's generation probabilities are low under every cluster, the anomaly score approaches 1, indicating an extremely anomalous trajectory.
  • If a trajectory matches some cluster's pattern, its generation probability is high and the anomaly score approaches 0, indicating a normal trajectory.

5. Why Take the Maximum Generation Probability?

  • A real trajectory may follow any one of several normal patterns; the cluster with the maximum generation probability represents the pattern the trajectory most plausibly belongs to. A toy comparison follows this list.
  • Using the mean or the minimum probability instead could misjudge a trajectory as anomalous even though it actually matches one of the patterns well.
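
A toy comparison with made-up probabilities for three clusters:

```python
probs = [0.90, 0.05, 0.05]  # fits cluster 1 well, the others poorly

mean_score = 1 - sum(probs) / len(probs)  # 0.667 -> wrongly looks anomalous
max_score = 1 - max(probs)                # 0.100 -> correctly looks normal
print(f"{mean_score:.3f} vs {max_score:.3f}")
```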

6. The Parts of Your Understanding That Are Correct

  • Multi-scale spatial and temporal embeddings: ✅
  • Cross-attention combining space and time: ✅
  • Gaussian distributions modeling trajectory patterns: ✅
  • Multiplying the maximum probabilities to compute the anomaly score: ✅

If you have more detailed questions, or want to dig deeper into the code logic and the actual implementation, we can analyze the concrete flow of each function!

Code Implementation

From analyzing the code, here's how trajectories are processed and anomaly scores are calculated:

  1. Trajectory embedding process in mst_oatd.py (abridged excerpt from the model class; D, A, lengths, and the scale-specific RNNs are defined elsewhere):

```python
# Initial spatial embedding through graph convolution
H = D.mm(A).mm(self.V).mm(D)  # normalized adjacency propagation
nodes = H.mm(self.embedding(self.nodes))
s_inputs = torch.index_select(nodes, 0, trajs.flatten())

# Temporal embedding
t_inputs = self.d2v(times)

# Combine spatial and temporal features via cross-attention
att_s, att_t = self.co_attention(s_inputs, t_inputs)
st_inputs = torch.concat((att_s, att_t), dim=2)

# Multi-scale processing: full resolution (s1) plus coarser scales built
# from segments of size s1_size (s2) and s2_size (s3)
encoder_inputs_s1 = pack_padded_sequence(self.attention_layer(st_inputs, lengths))
encoder_inputs_s2 = self.scale_process(st_inputs, self.s1_size, [int(i // self.s1_size) for i in lengths])
encoder_inputs_s3 = self.scale_process(st_inputs, self.s2_size, [int(i // self.s2_size) for i in lengths])

# Combine the three scales with learned weights
encoder_final_state = (self.W1 * encoder_final_state_s1 +
                       self.W2 * encoder_final_state_s2 +
                       self.W3 * encoder_final_state_s3)
```
  2. Anomaly score calculation in mst_oatd_trainer.py (abridged excerpt; batch_size, mask, and times_token are prepared elsewhere in the trainer):

```python
def detection(self):
    # Score every batch of candidate outliers against all clusters
    for batch in self.outliers_loader:
        trajs, times, seq_lengths = batch
        c_likelihood_s = []
        c_likelihood_t = []

        # Generation likelihood of the batch under each cluster c
        for c in range(self.n_cluster):
            output_s, _, _, _ = self.MST_OATD_S(trajs, times, seq_lengths, batch_size, "test", c)
            likelihood_s = - self.detec(output_s.reshape(-1, output_s.shape[-1]),
                                        trajs.to(self.device).reshape(-1))
            # Length-normalized: exp of the mean per-token log-probability
            likelihood_s = torch.exp(
                torch.sum(mask * (likelihood_s.reshape(batch_size, -1)), dim=-1) / torch.sum(mask, 1))

            output_t, _, _, _ = self.MST_OATD_T(trajs, times, seq_lengths, batch_size, "test", c)
            likelihood_t = - self.detec(output_t.reshape(-1, output_t.shape[-1]),
                                        times_token.to(self.device).reshape(-1))
            likelihood_t = torch.exp(
                torch.sum(mask * (likelihood_t.reshape(batch_size, -1)), dim=-1) / torch.sum(mask, 1))

            c_likelihood_s.append(likelihood_s)
            c_likelihood_t.append(likelihood_t)

        # Final anomaly score: max over clusters, spatial times temporal
        likelihood_s = torch.cat(c_likelihood_s).max(0)[0]
        likelihood_t = torch.cat(c_likelihood_t).max(0)[0]
        score = 1 - likelihood_s * likelihood_t
```

The code shows that anomaly scores are computed using:

  • the maximum likelihood across all clusters,
  • the product of the spatial and temporal likelihoods,
  • normalization by trajectory length via the mask (a standalone illustration follows below), and
  • the negative cross-entropy loss as the base log-likelihood measure.
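
The mask-based normalization is the exp of the mean per-token log-probability over unpadded positions, i.e. the geometric mean of the per-token generation probabilities. A standalone toy illustration (the tensors here are made up, not the repo's variables):

```python
import torch

# Per-token probabilities for 2 padded trajectories (lengths 3 and 2).
token_p = torch.tensor([[0.9, 0.8, 0.7, 0.0],
                        [0.6, 0.5, 0.0, 0.0]])
mask = torch.tensor([[1., 1., 1., 0.],
                     [1., 1., 0., 0.]])

log_p = torch.log(token_p.clamp(min=1e-9))   # safe log; pads are masked out
likelihood = torch.exp((mask * log_p).sum(-1) / mask.sum(-1))
print(likelihood)  # tensor([0.7958, 0.5477]) -- geometric means per trajectory
```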

Formulas

Let me explain how MST-OATD embeds trajectories at different scales and calculates anomaly scores:

  1. Trajectory Embedding at Different Scales:

The model embeds trajectories at multiple scales through these steps:

a) Initial Embedding:

  • For spatial points: Maps locations to grid cells and generates spatial embeddings using graph convolution
  • For temporal points: Uses a neural network with periodic activations to capture temporal patterns (a Time2Vec-style sketch follows after this list)
  • These are combined using cross-attention to obtain the spatio-temporal embeddings \( e = \langle e_1, e_2, \ldots, e_n \rangle \)
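
The periodic temporal encoder can be sketched in a Time2Vec-style form. This generic module is an assumption for illustration; the repo's actual d2v implementation may differ:

```python
import torch
import torch.nn as nn

class PeriodicTimeEmbedding(nn.Module):
    """One linear (trend) channel plus k sinusoidal (periodic) channels."""
    def __init__(self, k: int):
        super().__init__()
        self.linear = nn.Linear(1, 1)
        self.periodic = nn.Linear(1, k)

    def forward(self, t):                 # t: (batch, seq_len) of timestamps
        t = t.unsqueeze(-1)               # -> (batch, seq_len, 1)
        return torch.cat([self.linear(t), torch.sin(self.periodic(t))], dim=-1)

emb = PeriodicTimeEmbedding(k=7)
print(emb(torch.tensor([[10., 20., 30.]])).shape)  # torch.Size([1, 3, 8])
```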

b) Multi-scale Fusion:

  • Takes the combined embedding sequence \( e \) and generates embeddings at each scale \( k \) using:

\( e^k = \langle \hat{e}^k_1, \hat{e}^k_2, \ldots, \hat{e}^k_{\lceil n/s_k \rceil} \rangle \)

where:

  • \( s_k \) is the segment size at scale \( k \)
  • \( \hat{e}^k_i \) is the mean embedding of the points in segment \( i \) at scale \( k \) (a minimal sketch follows this list)
  • The model uses 3 scales with segment sizes 1, 2, and 4
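
A minimal sketch of the segment-mean operation for a single scale (illustrative only; zero-padding the last partial segment is one simple choice, though it slightly dilutes that segment's mean):

```python
import torch

def scale_embed(e, s):
    """Mean-pool consecutive segments of size s; e is (seq_len, dim)."""
    n, d = e.shape
    pad = (-n) % s                           # pad so seq_len divides evenly
    if pad:
        e = torch.cat([e, e.new_zeros(pad, d)])
    return e.reshape(-1, s, d).mean(dim=1)   # -> (ceil(n/s), dim)

e = torch.arange(12.).reshape(6, 2)          # toy embedding sequence, n = 6
print(scale_embed(e, 4))                     # ceil(6/4) = 2 segment embeddings
```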

c) Attention Between Segments:

  • Applies attention between the segments at each scale to capture their relationships
  • Uses the formula:

\( \tilde{e}^k_i = \sum_j \mathrm{Softmax}(\alpha_{i,j})\, \hat{e}^k_j \)

where \( \alpha_{i,j} \) captures the relationship between segments \( i \) and \( j \); a generic code sketch follows below
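
A generic scaled dot-product version of this segment-level attention; this parameterization of \( \alpha_{i,j} \) is an assumption for illustration and may differ from the paper's:

```python
import torch
import torch.nn.functional as F

def segment_attention(e_k):
    """e_k: (num_segments, dim) -> attention-refined segment embeddings."""
    scores = e_k @ e_k.T / e_k.shape[-1] ** 0.5   # alpha_{i,j} as scaled dots
    return F.softmax(scores, dim=-1) @ e_k        # e~_i = sum_j softmax(a) e_j

e_k = torch.randn(3, 4)                           # 3 segments, dim 4
print(segment_attention(e_k).shape)               # torch.Size([3, 4])
```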

  2. Anomaly Score Calculation:

The anomaly score is NOT just based on reconstruction loss. Instead, it uses a detect-by-generation strategy:

The score for a trajectory T is calculated as:

\( \text{Score}(T) = 1 - \max_{c^{(s)},\, c^{(t)}} \exp\!\left[ \frac{\log p_\gamma\!\left(T^{(s)} \mid \mu^{(s)}_{c^{(s)}}\right) p_\gamma\!\left(T^{(t)} \mid \mu^{(t)}_{c^{(t)}}\right)}{n} \right] \)

Where:

  • \( T^{(s)} \) and \( T^{(t)} \) are the spatial and temporal aspects of the trajectory
  • \( c^{(s)} \) and \( c^{(t)} \) are the route types
  • \( \mu^{(s)} \) and \( \mu^{(t)} \) are the means of the Gaussian distributions for each route type
  • \( n \) is the trajectory length, used for normalization
  • \( p_\gamma \) denotes the generation probability

The key points about the scoring:

  1. It considers both spatial and temporal aspects
  2. Uses Gaussian mixture models to represent different route types
  3. Calculates generation probability rather than just reconstruction loss
  4. Normalizes by trajectory length
  5. Takes the maximum likelihood across all route types

This is more sophisticated than simple reconstruction loss because it:

  • Models multiple normal route patterns through Gaussian mixtures
  • Considers both spatial and temporal anomalies
  • Can handle partial trajectories for online detection
  • Normalizes scores to be comparable across trajectories of different lengths

Would you like me to explain any particular aspect in more detail?

posted @ 2024-12-31 00:02  GraphL