Summary: 1. Image serialization in ViT (Vision Transformer) \[ z_0 = [x_{class};\, x_p^1 E;\, x_p^2 E;\, \cdots;\, x_p^N E] + E_{pos}, \quad E \in \mathbb{R}^{(P^2 \cdot C) \times D}, \quad E_{pos} \in \mathbb{R}^{(N+1) \times D} \] 2. toke
posted @ 2025-11-12 10:11 ldfm
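To make the shapes in the serialization formula concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the post's own code: the helper names `patchify` and `embed`, the random weights, and the sizes (32×32×3 image, P = 16, D = 8) are all hypothetical choices made for the example.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into N flattened patches of length P*P*C."""
    H, W, C = image.shape
    P = patch_size
    assert H % P == 0 and W % P == 0, "H and W must be divisible by P"
    # (H/P, P, W/P, P, C) -> (H/P, W/P, P, P, C) -> (N, P*P*C)
    patches = image.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, P * P * C)

def embed(image, patch_size, E, x_class, E_pos):
    """Compute z_0 = [x_class; x_p^1 E; ...; x_p^N E] + E_pos."""
    x_p = patchify(image, patch_size)                         # (N, P^2*C)
    tokens = x_p @ E                                          # (N, D)
    z0 = np.concatenate([x_class[None, :], tokens], axis=0)   # (N+1, D)
    return z0 + E_pos

# Illustrative sizes only (assumptions, not values from the post).
rng = np.random.default_rng(0)
H, W, C, P, D = 32, 32, 3, 16, 8
N = (H // P) * (W // P)                    # number of patches

image = rng.standard_normal((H, W, C))
E = rng.standard_normal((P * P * C, D))    # patch projection, (P^2*C) x D
x_class = rng.standard_normal(D)           # class token (learnable in a real model)
E_pos = rng.standard_normal((N + 1, D))    # position embedding, (N+1) x D

z0 = embed(image, P, E, x_class, E_pos)
print(z0.shape)  # (5, 8)
```

With these sizes there are N = 4 patches, so the sequence has N + 1 = 5 rows: the class token followed by the four projected patches, each with the matching row of `E_pos` added.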