Installing flash-attention

Some LLMs require the flash-attention library for training or inference; for others it is optional but makes things faster.

flash-attention official repository: https://github.com/Dao-AILab/flash-attention

Check the glibc version:

ldd --version

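If the reported glibc version is below 2.32, the default prebuilt flash-attn wheels for cu12 will not work. Either compile from source or downgrade flash-attn to 2.7.4.post1 or earlier (see https://github.com/Dao-AILab/flash-attention/issues/1762).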
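
The same check can be scripted. The following is a minimal sketch that assumes the 2.32 threshold quoted above; the helper names parse_version and glibc_is_new_enough are made up for this example, while platform.libc_ver() is part of the Python standard library.

```python
import platform

# Threshold taken from the text above; adjust if the upstream issue changes.
MIN_GLIBC = (2, 32)


def parse_version(text):
    """Turn a dotted version string such as '2.31' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_new_enough(version_text, minimum=MIN_GLIBC):
    """Return True if the given glibc version meets the minimum."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    # On Linux with glibc this returns something like ('glibc', '2.31').
    libc_name, libc_version = platform.libc_ver()
    if libc_name != "glibc" or not libc_version:
        print("Could not detect glibc; run `ldd --version` instead.")
    elif glibc_is_new_enough(libc_version):
        print(f"glibc {libc_version}: the default cu12 wheels should load.")
    else:
        print(f"glibc {libc_version}: build from source or use an older flash-attn.")
```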

Method 1: pip install (may compile from source)

After installing PyTorch:

pip install packaging
pip install ninja
MAX_JOBS=4 pip install flash-attn --no-build-isolation

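Adding --no-cache-dir is recommended. Otherwise, if the same flash-attn version was built before, pip reuses the cached wheel, and a wheel built against a different torch/CUDA version will not work in the current environment: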

MAX_JOBS=4 pip install flash-attn --no-build-isolation --no-cache-dir

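Without MAX_JOBS, the official README says the build needs roughly 96 GB of RAM or more. In practice even larger machines can run out: on my 256 GB development machine the build filled the memory and knocked out the VS Code remote server.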

Presumably more CPU cores and more memory make the build faster, but only up to some limit.
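
One way to pick a MAX_JOBS value is to cap it by both core count and memory. This is only a rough sketch: suggest_max_jobs is a hypothetical helper, and the per-job memory budget (gb_per_job) is an assumption you have to estimate yourself, since neither this post nor the README gives a per-job figure.

```python
def suggest_max_jobs(total_ram_gb, cpu_count, gb_per_job):
    """Cap parallel compile jobs by CPU cores and by a memory budget.

    gb_per_job is an assumed peak memory per compile job, not a measured
    or documented number.
    """
    by_memory = int(total_ram_gb // gb_per_job)
    # Always allow at least one job, never more than the number of cores.
    return max(1, min(cpu_count, by_memory))


if __name__ == "__main__":
    # Illustrative numbers only: 64 GB RAM, 32 cores, 8 GB assumed per job.
    print(suggest_max_jobs(total_ram_gb=64, cpu_count=32, gb_per_job=8))
```

The result would then be passed as MAX_JOBS=<value> in front of the pip install command shown above.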

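Method 2: install a prebuilt wheel

Prebuilt wheels are available at https://github.com/Dao-AILab/flash-attention/releases.

For most environments the cxx11abiFALSE wheels are the default choice.

The releases appear to contain only x86 wheels, so on ARM fall back to Method 1.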
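
To find the right file on the releases page, it helps to spell out the name you are looking for. The sketch below follows the naming pattern visible in the "Guessing wheel URL" line of the timeout log later in this post; wheel_filename and current_python_tag are hypothetical helpers, and the exact tags (for example the CUDA tag) may differ between releases, so treat the output as a search hint rather than a guaranteed file name.

```python
import sys


def current_python_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp310'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def wheel_filename(flash_attn_version, cuda_tag, torch_tag, cxx11_abi,
                   python_tag, platform_tag="linux_x86_64"):
    """Compose a wheel name following the pattern seen in the build log."""
    abi = "TRUE" if cxx11_abi else "FALSE"
    return (
        f"flash_attn-{flash_attn_version}+{cuda_tag}torch{torch_tag}"
        f"cxx11abi{abi}-{python_tag}-{python_tag}-{platform_tag}.whl"
    )


if __name__ == "__main__":
    # Values copied from the log in this post, except the Python tag.
    print(wheel_filename("2.5.0", "cu122", "2.5", False, current_python_tag()))
```

After downloading the matching file, it can be installed with pip install followed by the path to the .whl file.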

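Method 3: build from the GitHub source

If neither of the methods above works, the remaining option is to clone the code from GitHub and build it from scratch:

Clone https://github.com/Dao-AILab/flash-attention and check out the tag for the version you need.

Then: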

cd flash-attention
python setup.py install

The build takes a long time and uses a lot of resources.

Verify the installation:

import torch
print(torch.__version__)
print(torch.version.cuda)

import flash_attn
print("flash-attn version:", flash_attn.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else None)
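
Importing the package only proves that the shared library loads. A slightly stronger check is to run one attention call on the GPU. The following is a minimal sketch: flash_attn_smoke_test is a made-up helper name, the tensor sizes are arbitrary, and it assumes the flash_attn_func interface from flash-attn 2.x, which takes fp16 or bf16 CUDA tensors shaped (batch, seqlen, nheads, headdim).

```python
def flash_attn_smoke_test():
    """Run one small attention call and return a short status string."""
    try:
        import torch
        from flash_attn import flash_attn_func
    except ImportError as exc:
        # Covers a missing package as well as undefined-symbol load failures.
        return f"import failed: {exc}"

    if not torch.cuda.is_available():
        return "CUDA not available"

    # Arbitrary small shapes: batch=1, seqlen=128, nheads=8, headdim=64.
    q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    out = flash_attn_func(q, k, v, causal=True)
    return f"ok, output shape {tuple(out.shape)}"


if __name__ == "__main__":
    print(flash_attn_smoke_test())
```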

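Note that flash-attn and PyTorch versions also need to correspond (not strictly, but versions that are too far apart will not work). My other blog post describes how to find matching versions: https://www.cnblogs.com/coldchair/p/18519169

Common errors:

1: Connection timed out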

Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [9 lines of output]
fatal: not a git repository (or any of the parent directories): .git
  torch.__version__  = 2.5.1+cu121
  
  
  running bdist_wheel
  Guessing wheel URL:  https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.0/flash_attn-2.5.0+cu122torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
  error: <urlopen error [Errno 110] Connection timed out>
  [end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Failed to build installable wheels for some pyproject.toml based projects (flash-attn)

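The log shows that the connection timed out while fetching the prebuilt wheel. Retry through a proxy or a different network.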
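
If the build machine cannot reach GitHub at all, another option is to take the URL from the "Guessing wheel URL" line, download the file on a machine that has access, and install it locally. The sketch below only extracts that URL from saved log text; extract_wheel_url is a hypothetical helper, and the regular expression assumes the log line looks like the one above.

```python
import re

# Matches the line that setup.py prints before trying the download.
WHEEL_URL_PATTERN = re.compile(r"Guessing wheel URL:\s*(\S+)")


def extract_wheel_url(log_text):
    """Return the guessed wheel URL from a pip build log, or None."""
    match = WHEEL_URL_PATTERN.search(log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = (
        "  running bdist_wheel\n"
        "  Guessing wheel URL:  https://github.com/Dao-AILab/flash-attention/"
        "releases/download/v2.5.0/flash_attn-2.5.0+cu122torch2.5cxx11abiFALSE"
        "-cp310-cp310-linux_x86_64.whl\n"
        "  error: <urlopen error [Errno 110] Connection timed out>\n"
    )
    print(extract_wheel_url(sample))
```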

2: ABI mismatch

ImportError: ...flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
This is a typical C++ ABI mismatch.
Install the cxx11abiFALSE build instead.
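
The missing symbol itself gives a hint. Mangled names containing __cxx11 refer to std::__cxx11::basic_string, which is what libstdc++ uses when code is compiled with the new C++11 ABI. Below is a minimal sketch for comparing that with the installed PyTorch: uses_cxx11_abi and torch_cxx11_abi are hypothetical helpers, while torch.compiled_with_cxx11_abi() is an existing PyTorch function. A mismatched PyTorch version can produce similar errors, so this is a hint rather than a full diagnosis.

```python
def uses_cxx11_abi(mangled_symbol):
    """Return True if a mangled symbol references the __cxx11 namespace."""
    return "__cxx11" in mangled_symbol


def torch_cxx11_abi():
    """Return PyTorch's ABI flag, or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return torch.compiled_with_cxx11_abi()


if __name__ == "__main__":
    # Symbol copied from the ImportError above.
    symbol = (
        "_ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_string"
        "IcSt11char_traitsIcESaIcEEE"
    )
    print("extension expects the cxx11 ABI:", uses_cxx11_abi(symbol))
    print("torch built with the cxx11 ABI:", torch_cxx11_abi())
```

If the two values disagree, pick the wheel whose cxx11abi tag matches the value reported for torch.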

3: PEP 517

          Requirements should be satisfied by a PEP 517 installer.
          If you are using pip, you can try pip install --use-pep517.
          ********************************************************************************

Force a PEP 517 build:

pip install flash-attn --no-build-isolation --use-pep517

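With --no-build-isolation, pip skips the isolated build environment and builds against the dependencies in your current environment.
This requires a matching version of the NVIDIA CUDA toolkit installed locally and a toolchain that can compile C++/CUDA code.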
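
Before starting a long build, it can be worth confirming that a CUDA compiler is visible at all. This is a rough sketch, assuming nvcc is either on PATH or under the bin directory of CUDA_HOME or CUDA_PATH; find_nvcc is a hypothetical helper and does not check that the toolkit version matches your PyTorch build.

```python
import os
import shutil


def find_nvcc():
    """Return a path to nvcc if one can be located, otherwise None."""
    on_path = shutil.which("nvcc")
    if on_path:
        return on_path
    # Fall back to the environment variables commonly used for CUDA installs.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = os.environ.get(var)
        if root:
            candidate = os.path.join(root, "bin", "nvcc")
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc:
        print("nvcc found at", nvcc)
    else:
        print("nvcc not found; install the CUDA toolkit or set CUDA_HOME")
```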

posted @ 2024-12-18 16:57 Cold_Chair
