GGUF Quantization Methods
Backend support and relative speed of the various quantization methods under the GGUF format:

| Feature | CPU (AVX2) | CPU (ARM NEON) | Metal | cuBLAS | rocBLAS | SYCL | CLBlast | Vulkan | Kompute |
|---|---|---|---|---|---|---|---|---|---|
| K-quants | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ (slow) | ✅ (slow) | ✅ |
| I-quants | ✅ (slow) | ✅ (slow) | ✅ (slow) | ✅ | ✅ | Partial¹ | ✅ | ✅ | ✅ |
| Multi-GPU | N/A | N/A | N/A | ✅ | ❓ | ✅ | ❓ | ✅ | ❓ |
| K cache quants | ✅ | ❓ | ✅ | ✅ (slow) | Partial⁶ (slow) | ❓ | ✅ | ✅ | ✅ |
| MoE architecture | ✅ | ❓ | ✅ | ✅ | ✅ | ❓ | Partial² | ✅ | ✅ |

Note:
- ✅: Supported
- ❓: Not supported
- N/A: Not applicable
- Partial¹, Partial², Partial⁶: Partially supported
- slow (🐢³, 🐢⁴, 🐢⁵): Limited support; it works, but runs slowly
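
To query the matrix above programmatically, it can be encoded as a small lookup table. The following is only a sketch: the status strings (`yes`, `slow`, `partial`, `no`, `n/a`) and the helper `backends_for` are names made up for this example and are not part of llama.cpp or the GGUF format. The cell values are copied from the table, with ❓ read as "not supported" as in the note.

```python
# Sketch: the support matrix as a lookup table.
# Status strings and function names are hypothetical, chosen for this example only.
BACKENDS = ["CPU (AVX2)", "CPU (ARM NEON)", "Metal", "cuBLAS", "rocBLAS",
            "SYCL", "CLBlast", "Vulkan", "Kompute"]

# One cell per backend, in BACKENDS order. "+" joins flags, e.g. "partial+slow".
ROWS = {
    "K-quants": ["yes", "yes", "yes", "yes", "yes", "yes",
                 "yes+slow", "yes+slow", "yes"],
    "I-quants": ["yes+slow", "yes+slow", "yes+slow", "yes", "yes", "partial",
                 "yes", "yes", "yes"],
    "Multi-GPU": ["n/a", "n/a", "n/a", "yes", "no", "yes",
                  "no", "yes", "no"],
    "K cache quants": ["yes", "no", "yes", "yes+slow", "partial+slow", "no",
                       "yes", "yes", "yes"],
    "MoE architecture": ["yes", "no", "yes", "yes", "yes", "no",
                         "partial", "yes", "yes"],
}


def backends_for(feature, allow_slow=False, allow_partial=False):
    """Return the backends that support `feature` according to the table.

    Slow and partially supported backends are excluded unless explicitly allowed.
    """
    result = []
    for backend, cell in zip(BACKENDS, ROWS[feature]):
        flags = set(cell.split("+"))
        if "no" in flags or "n/a" in flags:
            continue  # not supported or not applicable
        if "slow" in flags and not allow_slow:
            continue
        if "partial" in flags and not allow_partial:
            continue
        result.append(backend)
    return result


if __name__ == "__main__":
    for feature in ROWS:
        print(feature, "->", backends_for(feature))
```

For example, `backends_for("I-quants")` leaves out the three backends marked slow and the partially supported SYCL; passing `allow_slow=True, allow_partial=True` includes them again.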