Post archive for March 25, 2025 - Gold_stein

March 25, 2025

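Summary: A summary of the three kinds of CUDA __shfl functions. Contents: 1. __shfl_xor_sync (butterfly exchange). Function signature: T __shfl_xor_sync( unsigned mask, // bitmask of participating threads (usually 0xffffffff) T value, // value to exchange (int/float) int lane_m Read more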

posted @ 2025-03-25 23:43 Gold_stein Views (354) Comments (0) Recommendations (0)

Flash Attention & Paged Attention

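Summary: Flash Attention & Paged Attention. Contents: FlashAttention and PagedAttention are two GPU memory optimization techniques for the Transformer attention mechanism, each addressing a performance bottleneck along a different dimension. A side-by-side technical comparison helps you understand them quickly: 1. FlashAttention ( Read more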

posted @ 2025-03-25 15:01 Gold_stein Views (787) Comments (0) Recommendations (0)
