Multi-scale computation
Full process
Let me explain the technical details of route inference and anomaly detection based on the paper:
Route Inference Technical Details:
- Single-scale Inference:
- Uses RNN to process embeddings at each timestamp:
$h_i = g_1(\tilde{e}_i, h_{i-1}), \quad i = 1, 2, \ldots, n$
where $g_1$ is the RNN module and $h_{i-1}$ is the previous hidden state
- Multi-scale Inference:
- Uses Gaussian mixture model with C components
- Parameters calculated through linear transformations:
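$\mu^{(s)} = f_3\big(\sum_k \lambda_k^{(s)} h_k^{(s)}\big), \quad (\sigma^{(s)})^2 = f_4\big(\sum_k \lambda_k^{(s)} h_k^{(s)}\big)$
$\mu^{(t)} = f_5\big(\sum_k \lambda_k^{(t)} h_k^{(t)}\big), \quad (\sigma^{(t)})^2 = f_6\big(\sum_k \lambda_k^{(t)} h_k^{(t)}\big)$
where:
- $h_k^{(s)}$ and $h_k^{(t)}$ are the final hidden states
- $\lambda_k^{(s)}$ and $\lambda_k^{(t)}$ are the scale parameters
- $f_3$ to $f_6$ are fully connected layers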
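The parameter computation above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the hidden states, the scale weights, and the `linear` helper standing in for the fully connected layers are made-up values, and predicting a log-variance that is then exponentiated is an assumption made here to keep the variance positive.

```python
import math

def weighted_sum(lams, hiddens):
    """Compute sum_k lambda_k * h_k over the scales k."""
    dim = len(hiddens[0])
    return [sum(l * h[d] for l, h in zip(lams, hiddens)) for d in range(dim)]

def linear(weights, bias, x):
    """A fully connected layer y = W x + b (stand-in for f3 / f4)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

# Hypothetical final hidden states for three scales (dimension 2) and scale weights.
h = [[0.2, 0.4], [0.6, 0.0], [1.0, 1.0]]
lam = [0.5, 0.3, 0.2]

fused = weighted_sum(lam, h)  # shared input to both layers

# Hypothetical layer parameters, chosen only for readability.
W_mu, b_mu = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W_lv, b_lv = [[0.5, 0.0], [0.0, 0.5]], [-1.0, -1.0]

mu = linear(W_mu, b_mu, fused)
log_var = linear(W_lv, b_lv, fused)
sigma2 = [math.exp(v) for v in log_var]  # exponentiate so the variance is positive
```

The same pattern would be applied once for the spatial branch and once for the temporal branch, each with its own weights.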
Anomaly Detection Technical Details:
- Score Calculation:
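 For a full trajectory:
\[ \mathrm{Score}(T) = 1 - \max_{c^{(s)},\, c^{(t)}} \exp\left[ \frac{\log\big( p_\gamma(T^{(s)} \mid \mu_{c^{(s)}})\, p_\gamma(T^{(t)} \mid \mu_{c^{(t)}}) \big)}{n} \right] \]
For online detection:
\[ \mathrm{Score}(T_{\le i}) = 1 - \max_{c^{(s)},\, c^{(t)}} \exp\left[ \frac{\log\big( p_\gamma(T^{(s)}_{\le i} \mid \mu_{c^{(s)}})\, p_\gamma(e^{(s)}_i \mid T^{(s)}_{\le i}, \mu_{c^{(s)}}) \big)}{i+1} + \frac{\log\big( p_\gamma(T^{(t)}_{\le i} \mid \mu_{c^{(t)}})\, p_\gamma(e^{(t)}_i \mid T^{(t)}_{\le i}, \mu_{c^{(t)}}) \big)}{i+1} \right] \]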
- Ranking Model for Updates:
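\[ \mathrm{rank}_c(T) = \left[ 1 + e^{-(w \cdot p(c \mid r^{(s)}) + b)} \right]^{-1} \cdot N \]
where:
- $w$ and $b$ are the weight and bias
- $p(c \mid r^{(s)})$ is the probability of type $c$
- $N$ is the total number of trajectories
The ranking loss function:
\[ L(w; b) = \sum_{i=1}^{N} \big( y_i - \mathrm{rank}_c(T_i) \big)^2 \]
where $y_i$ is the true rank of trajectory $T_i$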
This enables constant-time complexity for updating trajectory rankings in the model.
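As a rough sketch of the ranking model and its loss, the following plain-Python snippet evaluates the sigmoid rank function and the squared-error loss. The probabilities, true ranks, and the two parameter settings are hypothetical values chosen for illustration; how the parameters are actually fitted is not shown here.

```python
import math

def rank_c(p, w, b, n):
    """Sigmoid ranking model: maps a type probability p to a rank in (0, n)."""
    return n / (1.0 + math.exp(-(w * p + b)))

def ranking_loss(probs, true_ranks, w, b, n):
    """Squared-error loss between the true ranks y_i and the predicted ranks."""
    return sum((y - rank_c(p, w, b, n)) ** 2 for p, y in zip(probs, true_ranks))

# Hypothetical probabilities and their true ranks among N = 4 trajectories.
probs = [0.1, 0.4, 0.7, 0.9]
true_ranks = [1, 2, 3, 4]
N = len(probs)

# With w = 0 and b = 0 every trajectory is ranked N / 2, which fits poorly.
loss_flat = ranking_loss(probs, true_ranks, w=0.0, b=0.0, n=N)
# A steeper sigmoid spreads the predicted ranks and lowers the loss.
loss_steep = ranking_loss(probs, true_ranks, w=4.0, b=-2.0, n=N)
```

Once `w` and `b` are fitted, ranking a new trajectory is a single call to `rank_c`, independent of how many trajectories are stored.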
Full process: worked example
Let me break this down with a concrete example to help explain how route inference and anomaly detection work:
- Route Inference Example:
 Let's say we have a trajectory T with three points:
Spatial points: [(1,1), (2,2), (3,3)]
Timestamps: [0, 10, 20]
After embedding, we get:
Spatial embeddings: [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
Temporal embeddings: [[0.05, 0.1], [0.15, 0.2], [0.25, 0.3]]
- Gaussian Mixture Model:
 Let's say we have 3 route types (Gaussian components):
Component 1: mean = 0.5, std = 0.1
Component 2: mean = 0.7, std = 0.15
Component 3: mean = 0.3, std = 0.05
- Probability Calculation:
 For the first point:
Spatial probabilities:
- Component 1: 0.00034
- Component 2: 0.00034
- Component 3: 0.00034
Temporal probabilities:
- Component 1: 0.000040
- Component 2: 0.000084
- Component 3: 0.0000037
- Anomaly Score Calculation:
Max spatial probability = 0.00034 (all three components tie)
Max temporal probability = 0.000084 (Component 2)
Anomaly score = 1 - (0.00034 * 0.000084) = 1 - 2.856e-8 ≈ 0.99999997
- Ranking Model Example:
 Let's say we're updating the dataset with a new trajectory:
Initial probabilities: [0.00034, 0.00034, 0.00034]
Apply ranking function: $\mathrm{rank}_c(T) = [1 + e^{-(0.00034 \cdot w + b)}]^{-1} \cdot N$
If N = 1000, w = 1, b = 0:
Rank = 1000 / (1 + e^{-0.00034}) ≈ 500 (middle of dataset)
In this example:
- The high anomaly score (close to 1) suggests this trajectory is likely anomalous
- The ranking puts it in the middle of the dataset, suggesting moderate similarity to normal patterns
- For online detection, we would update this score as new points arrive
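The score in this example can be reproduced with a few lines of Python. The sketch below computes both the raw product used in the example and a length-normalized variant in the spirit of the full-trajectory formula. Treating the per-component numbers as whole-trajectory probabilities and taking n = 3 points are assumptions made only for illustration.

```python
import math

def anomaly_score(log_p_spatial, log_p_temporal, n):
    """1 - max over component pairs of exp((log p_s + log p_t) / n)."""
    best = max(
        math.exp((ls + lt) / n)
        for ls in log_p_spatial
        for lt in log_p_temporal
    )
    return 1.0 - best

# Per-component probabilities from the example (three spatial, three temporal).
p_s = [0.00034, 0.00034, 0.00034]
p_t = [0.000040, 0.000084, 0.0000037]

log_s = [math.log(p) for p in p_s]
log_t = [math.log(p) for p in p_t]

raw = 1.0 - max(p_s) * max(p_t)                # unnormalized product, as in the example
normalized = anomaly_score(log_s, log_t, n=3)  # length-normalized over 3 points
```

With n = 1 the function reduces to the raw product. Dividing the log-probability by the trajectory length keeps long trajectories from being penalized merely for having more points.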
Would you like me to explain any specific part in more detail or provide another example with different parameters?
Encoding
I'll walk through a numeric example of how temporal embedding fusion works with multiple scales in MST-OATD.
Let's say we have a trajectory with 8 points, each with a temporal embedding of dimension 4 for simplicity:
# Initial temporal embeddings (8 points x 4 dimensions)
t_inputs = [
    [1.0, 0.5, 0.3, 0.2],  # t1
    [1.2, 0.6, 0.4, 0.3],  # t2
    [1.4, 0.7, 0.5, 0.4],  # t3
    [1.6, 0.8, 0.6, 0.5],  # t4
    [1.8, 0.9, 0.7, 0.6],  # t5
    [2.0, 1.0, 0.8, 0.7],  # t6
    [2.2, 1.1, 0.9, 0.8],  # t7
    [2.4, 1.2, 1.0, 0.9]   # t8
]
# Parameters
s1_size = 2  # Scale 1: groups of 2
s2_size = 4  # Scale 2: groups of 4
# Scale 1 (segments of size 2)
scale1_embeddings = [
    # Mean of t1,t2
    [1.1, 0.55, 0.35, 0.25],  
    # Mean of t3,t4
    [1.5, 0.75, 0.55, 0.45],  
    # Mean of t5,t6
    [1.9, 0.95, 0.75, 0.65],  
    # Mean of t7,t8
    [2.3, 1.15, 0.95, 0.85]   
]
# Scale 2 (segments of size 4)
scale2_embeddings = [
    # Mean of t1,t2,t3,t4
    [1.3, 0.65, 0.45, 0.35],  
    # Mean of t5,t6,t7,t8
    [2.1, 1.05, 0.85, 0.75]   
]
# Weights learned for combining scales
W1 = 0.4  # Weight for original scale
W2 = 0.3  # Weight for scale1
W3 = 0.3  # Weight for scale2
# Final embedding combines all scales with weighted sum
final_state = (
    W1 * original_state +    # Original temporal features
    W2 * scale1_state +      # Scale 1 features (pairs)
    W3 * scale2_state        # Scale 2 features (groups of 4)
)
Let's see how one point's embedding gets updated through the attention mechanism:
# For t4 point, attention weights might look like:
attention_weights = [
    0.1,  # attention to t1-t2 group
    0.5,  # attention to t3-t4 group (highest as it contains t4)
    0.3,  # attention to t5-t6 group
    0.1   # attention to t7-t8 group
]
# Final embedding for t4 after attention: the attention-weighted sum of the
# four scale-1 group means (t1-t2, t3-t4, t5-t6, t7-t8)
t4_attended = [
    sum(w * group[d] for w, group in zip(attention_weights, scale1_embeddings))
    for d in range(len(scale1_embeddings[0]))
]
# t4_attended ≈ [1.66, 0.83, 0.63, 0.53]
This multi-scale approach allows the model to:
- Capture local temporal patterns (original scale)
- Learn patterns between pairs of points (scale 1)
- Learn longer-range patterns (scale 2)
- Use attention to weight the importance of different temporal segments
The final embedding for each point thus contains information about both its immediate temporal neighborhood and longer-range temporal dependencies.
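The segment averaging and weighted fusion in this example can be checked with a short plain-Python sketch. It follows the worked numbers above, where each scale is the mean of consecutive groups; the model's own `scale_process` may aggregate differently, and using the last vector at each scale as a stand-in for the RNN final state is an assumption.

```python
def segment_means(points, size):
    """Average consecutive, non-overlapping groups of `size` embeddings."""
    dim = len(points[0])
    return [
        [sum(p[d] for p in points[i:i + size]) / size for d in range(dim)]
        for i in range(0, len(points) - size + 1, size)
    ]

def fuse(states, weights):
    """Weighted sum of one state vector per scale."""
    dim = len(states[0])
    return [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]

# The same 8 x 4 temporal embeddings as in the example (t1 .. t8).
t_inputs = [[1.0 + 0.2 * i, 0.5 + 0.1 * i, 0.3 + 0.1 * i, 0.2 + 0.1 * i] for i in range(8)]

scale1 = segment_means(t_inputs, 2)  # 4 segments of 2 points
scale2 = segment_means(t_inputs, 4)  # 2 segments of 4 points

# Stand-in for the per-scale RNN final states: the last vector at each scale.
final_state = fuse([t_inputs[-1], scale1[-1], scale2[-1]], [0.4, 0.3, 0.3])
```

Any trailing points that do not fill a complete group are dropped by `segment_means`, which mirrors the integer division of the lengths by the scale size.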
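Example: trajectory inference, GMM classification, and reconstruction, based on the code
Below is a detailed example of inference, GMM classification, and trajectory reconstruction after the trajectory embeddings have been aggregated. The example is based entirely on the mst_oatd.py and mst_oatd_trainer.py code you provided.
Scenario:
Suppose multi-scale fusion of the trajectory embeddings yields the following aggregated embedding vector:
encoder_final_state = torch.tensor([0.7, 0.5, 0.8])
This is the trajectory representation obtained after the three GRU encoders (one per scale) have run and their final states have been fused with the learnable weights $ W1, W2, W3 $.
Suppose the model defines a Gaussian mixture model (GMM) with 3 clusters, each representing a different normal trajectory pattern.
Step 1: Latent representation
Code snippet:
mu = self.fc_mu(encoder_final_state)
logvar = self.fc_logvar(encoder_final_state)
z = self.reparameterize(mu, logvar)
Explanation:
- Map the trajectory embedding to the latent space:
 Linear layers compute the mean $ \mu $ and the log-variance $ \log \sigma^2 $:
mu = fc_mu(encoder_final_state)  # suppose fc_mu outputs [0.6, 0.4, 0.7]
logvar = fc_logvar(encoder_final_state)  # suppose fc_logvar outputs [-0.5, -0.7, -0.3]
- Generate the latent vector $ z $ with the reparameterization trick:
std = torch.exp(0.5 * logvar)  # standard deviation: exp([-0.25, -0.35, -0.15]) ≈ [0.779, 0.705, 0.861]
eps = torch.randn_like(std)  # eps ~ N(0, 1); suppose eps = [0.2, -0.3, 0.1]
z = mu + eps * std  # z = [0.6, 0.4, 0.7] + [0.2, -0.3, 0.1] * [0.779, 0.705, 0.861] ≈ [0.756, 0.189, 0.786]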
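The reparameterization step can be verified without PyTorch. The sketch below uses the example values for `mu`, `logvar`, and `eps`; fixing `eps` instead of sampling it is done here only so that the result is reproducible.

```python
import math

def reparameterize(mu, logvar, eps):
    """z = mu + eps * exp(0.5 * logvar), element-wise."""
    return [m + e * math.exp(0.5 * lv) for m, lv, e in zip(mu, logvar, eps)]

mu = [0.6, 0.4, 0.7]
logvar = [-0.5, -0.7, -0.3]
eps = [0.2, -0.3, 0.1]  # fixed here; drawn from N(0, 1) in practice

z = reparameterize(mu, logvar, eps)
```

Because the randomness enters only through `eps`, gradients can flow through `mu` and `logvar` during training, which is the purpose of the trick.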
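Step 2: GMM classification
Code snippet:
mu_prior = self.mu_prior  # mean of each cluster
log_var_prior = self.log_var_prior  # log-variance of each cluster
Suppose the Gaussian mixture model has 3 clusters with the following means and log-variances:
mu_prior = torch.tensor([
    [0.5, 0.4, 0.6],  # cluster 1
    [0.7, 0.5, 0.8],  # cluster 2
    [0.4, 0.3, 0.5]   # cluster 3
])
log_var_prior = torch.tensor([
    [-0.4, -0.5, -0.6],  # cluster 1
    [-0.3, -0.3, -0.4],  # cluster 2
    [-0.6, -0.7, -0.5]   # cluster 3
])
Compute an unnormalized Gaussian log-score of the latent vector $ z $ under each cluster (higher means closer):
prob_c1 = -0.5 * torch.sum(((z - mu_prior[0]) ** 2) / torch.exp(log_var_prior[0]))
prob_c2 = -0.5 * torch.sum(((z - mu_prior[1]) ** 2) / torch.exp(log_var_prior[1]))
prob_c3 = -0.5 * torch.sum(((z - mu_prior[2]) ** 2) / torch.exp(log_var_prior[2]))
With the $ z $ from step 1, this gives approximately:
prob_c1 = -0.12
prob_c2 = -0.07
prob_c3 = -0.20
Select the cluster with the highest score, cluster 2, which is closest to the trajectory pattern.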
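The cluster selection in this step can be sketched in plain Python as follows. The function evaluates only the quadratic term shown in the snippet above, so it leaves out the log-variance normalization term of a full Gaussian log-density and should be read as a distance-like score rather than a probability. The value of `z` is the rounded latent vector from the example.

```python
import math

def cluster_score(z, mu, log_var):
    """Quadratic term -0.5 * sum((z - mu)^2 / exp(log_var)); higher means closer."""
    return -0.5 * sum(
        (zi - mi) ** 2 / math.exp(lv) for zi, mi, lv in zip(z, mu, log_var)
    )

z = [0.756, 0.187, 0.787]
mu_prior = [[0.5, 0.4, 0.6], [0.7, 0.5, 0.8], [0.4, 0.3, 0.5]]
log_var_prior = [[-0.4, -0.5, -0.6], [-0.3, -0.3, -0.4], [-0.6, -0.7, -0.5]]

scores = [cluster_score(z, m, lv) for m, lv in zip(mu_prior, log_var_prior)]
# 0-based index of the closest cluster (index 1 corresponds to cluster 2).
best = max(range(len(scores)), key=scores.__getitem__)
```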
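Step 3: Trajectory reconstruction
Code snippet:
decoder_outputs, _ = self.decoder(decoder_inputs, z)
In the reconstruction stage, the decoder takes the latent representation $ z $ as its initial state and generates trajectory embeddings. Suppose the output is:
decoder_outputs = torch.tensor([
    [1.05, 0.55, 0.35, 0.25],
    [1.25, 0.65, 0.45, 0.35],
    [1.45, 0.75, 0.55, 0.45]
])  # reconstruction for 3 time steps
A fully connected layer maps the decoder output back to the original trajectory space:
output = self.fc_out(decoder_outputs)
Suppose the output is:
output = torch.tensor([
    [1.1, 0.6, 0.4, 0.3],
    [1.3, 0.7, 0.5, 0.4],
    [1.6, 0.8, 0.6, 0.5]
])
Step 4: Anomaly probability computation
In mst_oatd_trainer.py, anomaly detection is based on the likelihood of the generated trajectory. The following is a simplified squared-error illustration; the trainer's detection method itself uses a negated cross-entropy averaged over the trajectory length: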
likelihood = torch.exp(-torch.sum((output - embeddings) ** 2, dim=-1))
score = 1 - likelihood.max()
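Suppose the original embeddings are:
embeddings = torch.tensor([
    [1.0, 0.5, 0.3, 0.2],
    [1.2, 0.6, 0.4, 0.3],
    [1.4, 0.7, 0.5, 0.4]
])
Compute the reconstruction error and convert it to a likelihood:
error = torch.sum((output - embeddings) ** 2, dim=-1)  # [0.04, 0.04, 0.07]
likelihood = torch.exp(-error)  # ≈ [0.961, 0.961, 0.932]
score = 1 - likelihood.max()  # 1 - 0.961 ≈ 0.04
Final result:
- Latent representation $ z $: ≈ [0.76, 0.19, 0.79]
- Selected GMM cluster: cluster 2
- Reconstructed trajectory: [1.1, 0.6, 0.4, 0.3], [1.3, 0.7, 0.5, 0.4], [1.6, 0.8, 0.6, 0.5]
- Anomaly score: ≈ 0.04 (close to 0, so the trajectory is normal)
Summary:
- GMM classification helps the model select which normal pattern the trajectory belongs to.
- The decoder reconstructs the trajectory, which is compared with the original to compute the reconstruction error.
- An anomaly score close to 1 indicates an anomalous trajectory; a score close to 0 indicates a normal one.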
Code for embedding and detection
From analyzing the code, here's how trajectories are processed and anomaly scores calculated:
- Trajectory Embedding Process in mst_oatd.py:
# Initial spatial embedding through graph convolution
H = D.mm(A).mm(self.V).mm(D)  # Normalize adjacency matrix
nodes = H.mm(self.embedding(self.nodes))
s_inputs = torch.index_select(nodes, 0, trajs.flatten())
# Temporal embedding 
t_inputs = self.d2v(times)
# Combine via cross-attention
att_s, att_t = self.co_attention(s_inputs, t_inputs)
st_inputs = torch.concat((att_s, att_t), dim=2)
# Multi-scale processing via different RNNs at scales s1 and s2
encoder_inputs_s1 = pack_padded_sequence(self.attention_layer(st_inputs, lengths), lengths,
                                         batch_first=True, enforce_sorted=False)
encoder_inputs_s2 = self.scale_process(st_inputs, self.s1_size, [int(i // self.s1_size) for i in lengths])
encoder_inputs_s3 = self.scale_process(st_inputs, self.s2_size, [int(i // self.s2_size) for i in lengths])
# Encode each scale with its own RNN (omitted here), then combine the final states with learned weights
encoder_final_state = (self.W1 * encoder_final_state_s1 + 
                      self.W2 * encoder_final_state_s2 + 
                      self.W3 * encoder_final_state_s3)
- Anomaly Score Calculation in mst_oatd_trainer.py:
def detection(self):
    # Compute likelihood across all clusters
    for batch in self.outliers_loader:
        trajs, times, seq_lengths = batch
        c_likelihood_s = []
        c_likelihood_t = []
        # Get likelihood for each cluster
        for c in range(self.n_cluster):
            output_s, _, _, _ = self.MST_OATD_S(trajs, times, seq_lengths, batch_size, "test", c)
            likelihood_s = - self.detec(output_s.reshape(-1, output_s.shape[-1]),
                                      trajs.to(self.device).reshape(-1))
            likelihood_s = torch.exp(
                torch.sum(mask * (likelihood_s.reshape(batch_size, -1)), dim=-1) / torch.sum(mask, 1))
            
            output_t, _, _, _ = self.MST_OATD_T(trajs, times, seq_lengths, batch_size, "test", c)
            likelihood_t = - self.detec(output_t.reshape(-1, output_t.shape[-1]),
                                      times_token.to(self.device).reshape(-1))
            likelihood_t = torch.exp(
                torch.sum(mask * (likelihood_t.reshape(batch_size, -1)), dim=-1) / torch.sum(mask, 1))
            
            c_likelihood_s.append(likelihood_s)
            c_likelihood_t.append(likelihood_t)
        # Stack to shape (n_cluster, batch) and keep the best-fitting cluster per trajectory
        likelihood_s = torch.stack(c_likelihood_s).max(0)[0]
        likelihood_t = torch.stack(c_likelihood_t).max(0)[0]
        # Final anomaly score combines spatial and temporal likelihoods
        score = 1 - likelihood_s * likelihood_t
The code shows that anomaly scores are computed using:
- Maximum likelihood across all clusters
- Product of spatial and temporal likelihoods
- Normalized by trajectory length via the mask
- Negative cross-entropy loss as the base likelihood measure
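The four points above can be condensed into a small plain-Python sketch for a single trajectory. The per-position log-probabilities, the padding mask, and the number of clusters are hypothetical; in the trainer these values come from the negated `self.detec` loss of the model outputs.

```python
import math

def trajectory_likelihood(token_log_probs, mask):
    """exp of the mean log-probability over the unpadded positions."""
    total = sum(lp * m for lp, m in zip(token_log_probs, mask))
    return math.exp(total / sum(mask))

def anomaly_score(spatial_by_cluster, temporal_by_cluster, mask):
    """1 - (best spatial likelihood) * (best temporal likelihood)."""
    best_s = max(trajectory_likelihood(lp, mask) for lp in spatial_by_cluster)
    best_t = max(trajectory_likelihood(lp, mask) for lp in temporal_by_cluster)
    return 1.0 - best_s * best_t

# Hypothetical trajectory of length 3, padded to 4 positions, scored under 2 clusters.
mask = [1, 1, 1, 0]
spatial = [[-0.1, -0.2, -0.3, -9.0], [-2.0, -2.5, -3.0, -9.0]]
temporal = [[-1.0, -1.0, -1.0, -9.0], [-0.3, -0.3, -0.3, -9.0]]

score = anomaly_score(spatial, temporal, mask)
```

Here the best spatial fit comes from the first cluster and the best temporal fit from the second, and the padded position does not affect either likelihood.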
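Getting the rank directly, without sorting
Yes, your understanding is basically correct!
Core idea:
- The anomaly score is a standalone measure of how anomalous a trajectory is.
- The trajectory rank is computed directly from a formula, with no need to compare or sort the whole trajectory set.
Key point: the rank is obtained by passing the trajectory's type probability through a formula, not by the traditional approach of comparing trajectory scores one by one and sorting them.
Details:
1. Anomaly score:
Once the spatial and temporal probabilities of a trajectory have been computed, the anomaly score is:
\[
\text{Anomaly Score} = 1 - (P_{spatial} \times P_{temporal})
\]
- If the spatial and temporal probabilities are both very low, the score is close to 1, indicating an anomalous trajectory.
- If the probabilities are high, the score is close to 0, meaning the trajectory is close to a normal route.
2. Rank:
The rank is computed with:
\[
rank_c(T) = [1 + e^{-(w \cdot p(c|r^{(s)}) + b)}]^{-1} \times N
\]
- The trajectories do not have to be sorted; the formula maps a trajectory directly to a rank.
- In this formula:
- $ p(c|r^{(s)}) $ is the probability that the trajectory belongs to route type $ c $.
- $ w $ is a learned weight that controls how sensitive the rank is to that probability.
- $ b $ is a bias that shifts the resulting rank.
- $ N $ is the total number of trajectories, which sets the range of the rank.
Why do it this way?
Problem with the traditional approach:
- With ordinary sorting, every update requires re-scoring and re-sorting all trajectories, at a cost of $ O(N \log N) $.
- This is inefficient when the number of trajectories is large.
Advantage of the new approach:
- The ranking formula gives a rank in constant time, $ O(1) $, regardless of the number of trajectories.
- The probability is mapped to a rank without actually sorting the trajectory set.
Example:
Suppose that:
- The trajectory has a very low type probability, for example $ p(c|r^{(s)}) = 0.00034 $.
- With, for example, $ w = 1 $ and $ b = 0 $, the formula gives a rank of about 500 out of 1000 trajectories.
This means:
- The sigmoid maps an input near zero to about $ N/2 $, so with these parameters the trajectory lands in the middle of the range.
- With $ w > 0 $, a larger probability moves the rank toward $ N $; the learned $ w $ and $ b $ determine how ranks spread over the range.
Summary:
- The trajectories are never actually sorted; the formula maps a probability directly to a rank.
- The anomaly score only measures how anomalous a trajectory is, while the ranking formula works on the type probability.
- This is more efficient and particularly suited to online updates and real-time detection.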
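Trajectory score computation
Analysis of the anomaly detection flow
From the code, after the trajectories have been ranked, anomaly detection takes place in the detection method of the train_mst_oatd class. The method works as follows:
1. Trajectory processing
For each trajectory, the model computes two likelihood scores:
- the spatial likelihood (likelihood_s)
- the temporal likelihood (likelihood_t)
The key steps are:
2. For each cluster $ c $ (from 0 to $ n_{cluster} - 1 $):
- The model computes output probabilities with its spatial and temporal components: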
output_s, _, _, _ = self.MST_OATD_S(trajs, times, seq_lengths, batch_size, "test", c)
output_t, _, _, _ = self.MST_OATD_T(trajs, times, seq_lengths, batch_size, "test", c)
- It then computes the spatial and temporal log-likelihoods as the negated self.detec loss:
likelihood_s = - self.detec(output_s.reshape(-1, output_s.shape[-1]), 
                              trajs.to(self.device).reshape(-1))
likelihood_t = - self.detec(output_t.reshape(-1, output_t.shape[-1]),
                              times_token.to(self.device).reshape(-1))
- These log-likelihoods are averaged over the unpadded sequence length and then exponentiated:
likelihood_s = torch.exp(torch.sum(mask * (likelihood_s.reshape(batch_size, -1)), 
                          dim=-1) / torch.sum(mask, 1))
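3. Maximum-likelihood selection
For each trajectory, the maximum spatial and temporal likelihood over all clusters is selected.
4. Final anomaly score
The final anomaly score is computed as: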
1 - likelihood_s * likelihood_t
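- The higher the score, the more likely the trajectory is anomalous.
Reason:
- A normal trajectory should have a high likelihood (close to 1) in both the spatial and the temporal dimension.
- An anomalous trajectory will have a low likelihood in at least one dimension.
- Multiplying the two likelihoods and subtracting the product from 1 gives a score that grows with the degree of anomaly.
5. Model evaluation
The model is evaluated with PR-AUC (area under the precision-recall curve):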
pr_auc = auc_score(self.labels, (1 - likelihood_s * likelihood_t).cpu().detach().numpy())
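6. Key points
- The method effectively combines spatial and temporal patterns to detect anomalies.
- A trajectory is considered anomalous if it deviates significantly from the learned normal patterns in either the spatial or the temporal dimension.
I understand your question better now. Let me explain how ranking and anomaly detection relate:
From the code, the ranking step actually happens before final model testing and serves a different purpose: ranking is used to update the training data, not for the final anomaly detection. The flow is:
1. First, ranking is performed in train_update.py: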
def get_score(z):
    probs = gmm.predict_proba(z)  # probability of each cluster
    
    idx = []
    linear = Linear_Model()
    for label in range(args.n_cluster):
        data = -probs[:, label]  # negate the probabilities
        rank = linear.test(label, torch.Tensor(data).to(args.device))
        idx.append(rank)
    idx = np.array(idx).T
    idxs = np.argsort(idx, axis=1)  # final ordering of clusters per trajectory
    return idxs
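2. This ranking is used in update_data() to update the training dataset:
def update_data(origin_trajs, train_trajs, cats_sample):
    _, z, train_trajs = get_index(train_trajs, cats_sample)
    idxs = get_score(z)  # get the ranking
    
    max_idxs = idxs[:, 0]
    for i, traj in enumerate(train_trajs):
        max_idx = max_idxs[i]
        origin_trajs[max_idx].append(traj)  # add to the best-matching cluster
3. The actual anomaly detection (as described in my previous answer) is done through the likelihood computation, not through ranking.
Conclusion:
- Ranking and anomaly detection are not directly related.
- Ranking is used for data management and training optimization, while the anomaly score is computed separately from the likelihoods.
Note:
Sorry if my earlier answer caused confusion. Let me know if you would like any part explained in more detail!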
GMM generator: so will there be a threshold for the GMM trajectories?
Looking at the code, there isn't actually an explicit threshold being used for the GMM trajectories. Instead, the system uses relative rankings and fixed sizes:
- For data updates in rank mode:
cats_sample = random.sample(cats, args.n_cluster // 4)  # Randomly sample 1/4 of clusters
- The number of trajectories is kept constant - when adding new trajectories, it removes an equal number of old ones:
# In update_data():
origin_trajs[max_idx].append(traj)        # Add new trajectory
origin_trajs[min_idx].pop(0)              # Remove old trajectory
- For anomaly detection, it uses the product of likelihoods rather than a threshold:
1 - likelihood_s * likelihood_t  # Higher score = more likely to be anomalous
The only thresholding happens implicitly through the PR-AUC evaluation, which considers different possible thresholds when calculating the area under the precision-recall curve.
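To make the implicit thresholding concrete, here is a small plain-Python sketch that sweeps a threshold over anomaly scores and records precision and recall at each step; PR-AUC summarizes the resulting curve. The scores and labels are made-up, and the `auc_score` helper used by the trainer is not reproduced here.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold is flagged as anomalous."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical anomaly scores and ground-truth labels (1 = anomalous).
scores = [0.95, 0.80, 0.40, 0.10]
labels = [1, 0, 1, 0]

# One (precision, recall) point per candidate threshold, from strict to lenient.
curve = [precision_recall(scores, labels, t) for t in sorted(set(scores), reverse=True)]
```

A deployed system would still need to pick one operating point on this curve, which is a choice the evaluation code leaves open.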
Would you like me to explain more about how the system maintains balance without explicit thresholds?
Which trajectories count as normal and which as anomalous
Ah yes, now I understand your question!
Looking at the detection code:
for c in range(self.n_cluster):
    # Get likelihood for each cluster
    output_s, _, _, _ = self.MST_OATD_S(trajs, times, seq_lengths, batch_size, "test", c)
    output_t, _, _, _ = self.MST_OATD_T(trajs, times, seq_lengths, batch_size, "test", c)
    # (likelihood_s and likelihood_t are computed from these outputs; omitted here)
    all_likelihood_s.append(likelihood_s)
    all_likelihood_t.append(likelihood_t)
# Take the maximum likelihood across all clusters, per trajectory
likelihood_s = torch.stack(all_likelihood_s, dim=0).max(0)[0]  # Best spatial cluster fit
likelihood_t = torch.stack(all_likelihood_t, dim=0).max(0)[0]  # Best temporal cluster fit
# Final anomaly score
anomaly_score = 1 - likelihood_s * likelihood_t
So yes, if a trajectory is:
- Close to a spatial GMM center (high likelihood_s) AND
- Close to a temporal GMM center (high likelihood_t)
Then:
- Both likelihoods will be close to 1
- Their product will be close to 1
- The anomaly score (1 - product) will be close to 0
- Therefore it will be predicted as normal
It doesn't need to be close to the SAME cluster center in both spatial and temporal GMMs - it just needs to be normal in both dimensions separately.
Loss computation: training vs. prediction
The code actually computes an anomaly-like score in two stages:
1. During training (loss computation):
def Loss(self, x_hat, targets, z_mu, z_sigma2_log, z, mode, mask):
    # 1. Reconstruction loss
    reconstruction_loss = self.crit(x_hat[mask == 1], targets[mask == 1])
    
    # 2. Gaussian loss (how well the latent space matches the GMM prior)
    gaussian_loss = torch.mean(torch.mean(self.gaussian_pdf_log(z, z_mu, z_sigma2_log).unsqueeze(1) - 
                                        self.gaussian_pdfs_log(z, mu_c, log_sigma2_c), dim=1))
    
    # 3. Category loss (cluster assignment)
    logits = - torch.sum(torch.pow(z - mu_c, 2) / torch.exp(log_sigma2_c), dim=-1)
    logits = F.softmax(logits, dim=-1) + 1e-10
    category_loss = torch.mean(torch.sum(logits * (torch.log(logits) - torch.log(pi)), dim=-1))
    # Combined loss
    loss = reconstruction_loss + gaussian_loss / self.hidden_size + category_loss * 0.1
2. During detection/testing:
# Compute the likelihood from the reconstruction error
likelihood_s = - self.detec(output_s.reshape(-1, output_s.shape[-1]),
                           trajs.to(self.device).reshape(-1))
likelihood_t = - self.detec(output_t.reshape(-1, output_t.shape[-1]),
                           times_token.to(self.device).reshape(-1))
# Final anomaly score
anomaly_score = 1 - likelihood_s * likelihood_t
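How they relate:
- During training, the model learns to:
  - reconstruct trajectories well (reconstruction loss)
  - match latent representations to the GMM distribution (Gaussian loss)
  - assign trajectories clearly to clusters (category loss)
- During detection:
  - the learned reconstruction ability and the GMM prior are used to compute the likelihood.
  - the reconstruction error is implicitly affected by how well the trajectory matches the GMM patterns learned in training.
  - the final anomaly score combines the spatial and temporal likelihoods.
Key point:
- There is no direct mathematical relation between the training loss and the final anomaly score, but the reconstruction term helps the model learn better representations, which makes GMM-based anomaly detection more effective.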
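As an illustration of the category loss term, the following plain-Python sketch evaluates it for a single latent vector, following the three lines of the loss code: distance-based logits, a softmax, and a KL-style comparison with the mixture weights `pi`. The cluster parameters and latent vectors are hypothetical, and the batch averaging done by the real loss is omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def category_loss(z, mu_c, log_sigma2_c, pi):
    """sum_k q_k * (log q_k - log pi_k), with q from distance-based logits."""
    logits = [
        -sum((zi - mi) ** 2 / math.exp(lv) for zi, mi, lv in zip(z, mu, lv_row))
        for mu, lv_row in zip(mu_c, log_sigma2_c)
    ]
    q = [p + 1e-10 for p in softmax(logits)]
    return sum(qk * (math.log(qk) - math.log(pk)) for qk, pk in zip(q, pi))

# Two hypothetical clusters with unit variance and equal mixture weights.
mu_c = [[1.0, 0.0], [-1.0, 0.0]]
log_sigma2_c = [[0.0, 0.0], [0.0, 0.0]]
pi = [0.5, 0.5]

loss_between = category_loss([0.0, 0.0], mu_c, log_sigma2_c, pi)  # equidistant from both
loss_on_center = category_loss([1.0, 0.0], mu_c, log_sigma2_c, pi)  # on the first center
```

The term is near zero when the soft assignment matches `pi` and grows as the assignment concentrates on one cluster, so in this form it pulls assignments toward the mixture weights.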
encode-decode
Let me explain the encoding-decoding process from the code in mst_oatd.py:
In the forward() method, here's how encoding and decoding works:
- Encoding Process:
def forward(self, trajs, times, lengths, batch_size, mode, c):
    # 1. Get embeddings
    # Spatial embedding
    adj = self.array2sparse(self.adj)
    d_norm = self.array2sparse(self.d_norm)
    if self.dataset == 'porto':
        H = self.Norm_A(adj, d_norm)
        nodes = H.mm(self.embedding(self.nodes))
    else:
        H = self.Norm_A_N(adj, d_norm)
        nodes = H.mm(self.embedding(self.nodes)).mm(self.V)
    
    s_inputs = torch.index_select(nodes, 0, trajs.flatten()).reshape(batch_size, -1, self.emb_size)
    
    # Temporal embedding
    t_inputs = self.d2v(times.to(self.device))
    # 2. Co-attention between spatial and temporal
    att_s, att_t = self.co_attention(s_inputs, t_inputs)
    st_inputs = torch.concat((att_s, att_t), dim=2)
    # 3. Multi-scale encoding
    encoder_inputs_s1 = pack_padded_sequence(self.attention_layer(st_inputs, lengths), lengths,
                                           batch_first=True, enforce_sorted=False)
    encoder_inputs_s2 = self.scale_process(st_inputs, self.s1_size, [int(i // self.s1_size) for i in lengths])
    encoder_inputs_s3 = self.scale_process(st_inputs, self.s2_size, [int(i // self.s2_size) for i in lengths])
    # Get encoded states at different scales
    _, encoder_final_state_s1 = self.encoder_s1(encoder_inputs_s1)
    _, encoder_final_state_s2 = self.encoder_s2(encoder_inputs_s2)
    _, encoder_final_state_s3 = self.encoder_s3(encoder_inputs_s3)
    # Combine multi-scale encodings
    encoder_final_state = (self.W1 * encoder_final_state_s1 + 
                          self.W2 * encoder_final_state_s2 + 
                          self.W3 * encoder_final_state_s3)
- Decoding Process:
    # Prepare decoder inputs
    d_inputs = torch.cat((
        torch.zeros(batch_size, 1, self.emb_size * 2).to(self.device),
        st_inputs[:, :-1, :]  # Previous timesteps
    ), dim=1)  
    
    decoder_inputs = pack_padded_sequence(d_inputs, lengths, batch_first=True)
    if mode in ("pretrain", "train"):
        # Get latent representation
        mu = self.fc_mu(encoder_final_state)
        logvar = self.fc_logvar(encoder_final_state)
        z = self.reparameterize(mu, logvar)
        
        # Decode
        decoder_outputs, _ = self.decoder(decoder_inputs, z)
        decoder_outputs, _ = pad_packed_sequence(decoder_outputs, batch_first=True)
    elif mode == "test":
        # Use GMM prior during testing
        mu = torch.stack([self.mu_prior] * batch_size, dim=1)[c: c + 1]
        decoder_outputs, _ = self.decoder(decoder_inputs, mu)
        decoder_outputs, _ = pad_packed_sequence(decoder_outputs, batch_first=True)
    # Final output projection
    output = self.fc_out(self.layer_norm(decoder_outputs))
Then the reconstruction loss is calculated:
reconstruction_loss = self.crit(x_hat[mask == 1], targets[mask == 1])
Where:
- x_hat: Decoded output predictions
- targets: Original input sequences
- mask == 1: Only consider non-padded positions
- self.crit: CrossEntropyLoss
Key points:
- The encoder has 3 scales to capture different levels of patterns
- It uses both spatial (trajectory) and temporal (time) information
- During training, it learns a latent space through VAE (variational autoencoder)
- During testing, it uses GMM priors instead of encoded latent space
- The decoder tries to reconstruct the original sequence step by step
- The reconstruction loss measures how well the decoder matches the input
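The way the decoder inputs are prepared (a zero vector followed by all but the last step) can be illustrated for a single trajectory in plain Python. The embeddings below are hypothetical two-dimensional values; in the model the reconstruction targets are the original tokens rather than embeddings, so the pairing here only shows the alignment.

```python
def shift_right(sequence, start):
    """Decoder inputs: a start vector followed by every step except the last."""
    return [start] + sequence[:-1]

# Hypothetical fused spatio-temporal embeddings for a 3-step trajectory.
st_inputs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
start = [0.0, 0.0]

d_inputs = shift_right(st_inputs, start)

# At step i the decoder sees d_inputs[i] (the previous step) and is asked
# to reconstruct step i of the original sequence.
pairs = list(zip(d_inputs, st_inputs))
```

Shifting by one step means the decoder never sees the step it is currently predicting, only the steps before it together with the initial state.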
         浙公网安备 33010602011771号