OmniParser - Installing on Linux (Part 1)

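1. Install Ubuntu

Download Ubuntu from https://cn.ubuntu.com/download. I used the 24.04 LTS release, as shown in the figure below. Follow the installer prompts all the way through, and make sure to install the SSH service.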

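2. Install Miniconda for Python virtual environments

After logging in to the server over SSH, run the following commands to install Miniconda.

# Download the installer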
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Run the installer
bash Miniconda3-latest-Linux-x86_64.sh

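Once the installation finishes, log out and reconnect over SSH so that the new shell picks up the changes the installer made to your shell profile.

3. Create the Python virtual environment

Run the following commands to create and activate a Python 3.12 environment.

# Create a Python 3.12 environment named py312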
conda create -n py312 python=3.12 

# Activate the py312 environment
conda activate py312
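As a quick sanity check after activation, a short Python script can confirm that the interpreter on the PATH really is the one from the new environment. This is only a minimal sketch: the function name describe_environment is made up for illustration, and it relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Return basic facts about the running interpreter and conda env."""
    return {
        # Version of the interpreter that is running this script
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # conda sets CONDA_DEFAULT_ENV on activation; None if no env is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Full interpreter path, handy for spotting a system Python by mistake
        "executable": sys.executable,
    }


if __name__ == "__main__":
    info = describe_environment()
    for key, value in info.items():
        print(f"{key}: {value}")
    if not info["python_version"].startswith("3.12."):
        print("warning: this is not a Python 3.12 interpreter")
```

If conda_env is not py312, or the executable path does not point inside the Miniconda envs directory, the environment is probably not active in the current shell.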

4. Clone the OmniParser project and install its dependencies

# Clone the OmniParser project
git clone https://github.com/microsoft/OmniParser.git

# Enter the OmniParser directory
cd OmniParser

# Install the dependencies
pip install -r requirements.txt

5. Download the OmniParser v2 model

5.1 Model location: https://huggingface.co/microsoft/OmniParser-v2.0/tree/main

5.2 Install huggingface_hub

Install the Hugging Face Python library with the following command.

# Install the Hugging Face Python library
pip install huggingface_hub

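Configure a Hugging Face mirror in mainland China. (huggingface.co can be hard to reach from there, so pointing the client at a domestic mirror works around the problem.)

# 1. Temporary (applies only to the current terminal session)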
export HF_ENDPOINT=https://hf-mirror.com

# 2. Permanent (recommended; appends the line to ~/.bashrc so that every new terminal picks it up)
echo "export HF_ENDPOINT=https://hf-mirror.com" >> ~/.bashrc
source ~/.bashrc
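The same mirror setting can also be applied from inside a Python script, which helps when the download is driven from code rather than from the shell. The sketch below is an illustration only: use_hf_mirror is a hypothetical helper name, and it assumes the variable is set before huggingface_hub is imported, since the library reads HF_ENDPOINT when it is first imported.

```python
import os

MIRROR_URL = "https://hf-mirror.com"  # the mirror used in this article


def use_hf_mirror(url=MIRROR_URL, override=False):
    """Point Hugging Face tooling at a mirror via the HF_ENDPOINT variable.

    An existing HF_ENDPOINT is left untouched unless override=True.
    Returns the value that is in effect afterwards.
    """
    if override or not os.environ.get("HF_ENDPOINT"):
        os.environ["HF_ENDPOINT"] = url
    return os.environ["HF_ENDPOINT"]


if __name__ == "__main__":
    # Call this before importing huggingface_hub.
    print("HF_ENDPOINT =", use_hf_mirror())
```

Leaving an existing value alone by default means a setting already exported in ~/.bashrc still wins.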

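5.3 Download the OmniParser v2 model

The model download command given in the OmniParser Git repository uses huggingface-cli, as shown in the figure below.

In newer versions the command has been renamed to hf. You can adapt the repository's command yourself, or use the commands below to download the model files. (huggingface.co is sometimes unreachable; in that case you will need to find your own workaround.)

# Enter the OmniParser directory (skip this if you are already inside it)
cd OmniParser

# Create the weights directory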
mkdir -p weights
 
# Download the model checkpoints to the local directory OmniParser/weights/
for f in icon_detect/{train_args.yaml,model.pt,model.yaml} icon_caption/{config.json,generation_config.json,model.safetensors}; do
  hf download microsoft/OmniParser-v2.0 "$f" --local-dir weights
done
mv weights/icon_caption weights/icon_caption_florence
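If the hf command is not available, the same six files can be fetched from Python with hf_hub_download from huggingface_hub. The sketch below is an alternative, not the method used in this article; the helper names download_weights and missing_weights are made up for illustration. The second helper checks the layout that results after the mv step above.

```python
from pathlib import Path

REPO_ID = "microsoft/OmniParser-v2.0"

# The same six files that the shell loop above requests.
MODEL_FILES = [
    "icon_detect/train_args.yaml",
    "icon_detect/model.pt",
    "icon_detect/model.yaml",
    "icon_caption/config.json",
    "icon_caption/generation_config.json",
    "icon_caption/model.safetensors",
]


def download_weights(local_dir="weights"):
    """Download each file with huggingface_hub.hf_hub_download."""
    # Imported lazily so the check below also works without the library.
    from huggingface_hub import hf_hub_download

    for filename in MODEL_FILES:
        hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=local_dir)


def missing_weights(local_dir="weights"):
    """List expected files that are absent after the rename step."""
    root = Path(local_dir)
    expected = [
        name.replace("icon_caption/", "icon_caption_florence/", 1)
        for name in MODEL_FILES
    ]
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Run this after the download and the mv command; an empty list means
    # every expected file is in place.
    print("missing:", missing_weights())
```

Note that download_weights stores the caption model under icon_caption, so the rename to icon_caption_florence is still needed afterwards.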

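6. Launch gradio_demo

6.1 Modify util/utils.py to support Chinese OCR

You can download https://files-cdn.cnblogs.com/files/rslai/utils.zip and replace the existing utils.py with the file inside, or make the two changes described below by hand.

Step 1: Find the following code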

import easyocr
from paddleocr import PaddleOCR
reader = easyocr.Reader(['en'])
paddle_ocr = PaddleOCR(
    lang='en',  # other lang also available
    use_angle_cls=False,
    use_gpu=False,  # using cuda will conflict with pytorch in the same process
    show_log=False,
    max_batch_size=1024,
    use_dilation=True,  # improves accuracy
    det_db_score_mode='slow',  # improves accuracy
    rec_batch_num=1024)

Change it to

import easyocr
import torch

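# Choose between GPU and CPU
use_gpu = torch.cuda.is_available()  # detect automatically whether a GPU is available
# Or set it by hand: use_gpu = False  # force CPU

# Initialize EasyOCR with Chinese and English support
# EasyOCR accepts a gpu parameter:
# - gpu=True: use the GPU (requires CUDA)
# - gpu=False: use the CPU
reader = easyocr.Reader(
    ['ch_sim', 'en'],  # Simplified Chinese + English
    gpu=use_gpu,       # GPU switch
    # model_storage_directory='./ocr_models',  # optional: where to store the models
    download_enabled=True,  # allow model downloads
    verbose=False      # reduce log output
)

print(f"EasyOCR initialized with GPU: {use_gpu}")  # report whether the GPU is enabled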
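To make the shape of the OCR output concrete: with default arguments, reader.readtext returns a list of (box, text, confidence) entries, where box holds four [x, y] corner points. The sketch below filters such entries by confidence. filter_ocr_results is a hypothetical helper, not part of OmniParser or EasyOCR, and the sample data is written by hand so that no OCR model has to be loaded.

```python
def filter_ocr_results(results, min_confidence=0.5):
    """Keep OCR entries whose confidence reaches min_confidence.

    Each entry is expected to look like EasyOCR's default readtext output:
    (box, text, confidence), where box is four [x, y] corner points.
    """
    return [
        (box, text, confidence)
        for box, text, confidence in results
        if confidence >= min_confidence
    ]


if __name__ == "__main__":
    # Hand-written sample data in the same shape as readtext output.
    sample = [
        ([[10, 10], [90, 10], [90, 30], [10, 30]], "设置", 0.93),
        ([[10, 40], [120, 40], [120, 60], [10, 60]], "Settings", 0.88),
        ([[200, 5], [210, 5], [210, 12], [200, 12]], "l", 0.21),
    ]
    for box, text, confidence in filter_ocr_results(sample):
        print(text, confidence, box[0])
```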

Step 2: Find the following code

def check_ocr_box(image_source: Union[str, Image.Image], display_img = True, output_bb_format='xywh', goal_filtering=None, easyocr_args=None, use_paddleocr=False):
    if isinstance(image_source, str):
        image_source = Image.open(image_source)
    if image_source.mode == 'RGBA':
        # Convert RGBA to RGB to avoid alpha channel issues
        image_source = image_source.convert('RGB')
    image_np = np.array(image_source)
    w, h = image_source.size
    if use_paddleocr:
        if easyocr_args is None:
            text_threshold = 0.5
        else:
            text_threshold = easyocr_args['text_threshold']
        result = paddle_ocr.ocr(image_np, cls=False)[0]
        coord = [item[0] for item in result if item[1][1] > text_threshold]
        text = [item[1][0] for item in result if item[1][1] > text_threshold]
    else:  # EasyOCR
        if easyocr_args is None:
            easyocr_args = {}
        result = reader.readtext(image_np, **easyocr_args)
        coord = [item[0] for item in result]
        text = [item[1] for item in result]
    if display_img:
        opencv_img = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)
        bb = []
        for item in coord:
            x, y, a, b = get_xywh(item)
            bb.append((x, y, a, b))
            cv2.rectangle(opencv_img, (x, y), (x+a, y+b), (0, 255, 0), 2)
        #  matplotlib expects RGB
        plt.imshow(cv2.cvtColor(opencv_img, cv2.COLOR_BGR2RGB))
    else:
        if output_bb_format == 'xywh':
            bb = [get_xywh(item) for item in coord]
        elif output_bb_format == 'xyxy':
            bb = [get_xyxy(item) for item in coord]
    return (text, bb), goal_filtering

Replace it with

def check_ocr_box(image_source: Union[str, Image.Image], display_img = True, output_bb_format='xywh', goal_filtering=None, easyocr_args=None, use_paddleocr=False):
    if isinstance(image_source, str):
        image_source = Image.open(image_source)
    if image_source.mode == 'RGBA':
        # Convert RGBA to RGB to avoid alpha channel issues
        image_source = image_source.convert('RGB')
    image_np = np.array(image_source)
    w, h = image_source.size
    
    # Always use EasyOCR and ignore the use_paddleocr argument
    # Set default arguments
    if easyocr_args is None:
        easyocr_args = {'paragraph': False, 'text_threshold': 0.5}

    # Run recognition with EasyOCR
    result = reader.readtext(image_np, **easyocr_args)
    coord = [item[0] for item in result]
    text = [item[1] for item in result]
    
    if display_img:
        opencv_img = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)
        bb = []
        for item in coord:
            x, y, a, b = get_xywh(item)
            bb.append((x, y, a, b))
            cv2.rectangle(opencv_img, (x, y), (x+a, y+b), (0, 255, 0), 2)
        # matplotlib expects RGB
        plt.imshow(cv2.cvtColor(opencv_img, cv2.COLOR_BGR2RGB))
    else:
        if output_bb_format == 'xywh':
            bb = [get_xywh(item) for item in coord]
        elif output_bb_format == 'xyxy':
            bb = [get_xyxy(item) for item in coord]
    
    return (text, bb), goal_filtering  # the return format stays the same as before
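The function above turns each four-point OCR box into either xywh or xyxy form through get_xywh and get_xyxy. To illustrate what those two formats mean, here is a minimal sketch of such a conversion. quad_to_xyxy and quad_to_xywh are hypothetical stand-ins written for this example; they are not OmniParser's own helpers and may differ from them in detail.

```python
def quad_to_xyxy(quad):
    """Convert four [x, y] corner points into (x_min, y_min, x_max, y_max)."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))


def quad_to_xywh(quad):
    """Convert four [x, y] corner points into (x, y, width, height)."""
    x_min, y_min, x_max, y_max = quad_to_xyxy(quad)
    return x_min, y_min, x_max - x_min, y_max - y_min


if __name__ == "__main__":
    # Corners in the order top-left, top-right, bottom-right, bottom-left.
    quad = [[10, 20], [110, 20], [110, 50], [10, 50]]
    print("xyxy:", quad_to_xyxy(quad))
    print("xywh:", quad_to_xywh(quad))
```

Taking the minimum and maximum over all four corners keeps the result valid even when the detected box is slightly rotated.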

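6.2 Change the IP address that gradio_demo.py listens on

By default gradio_demo binds to 127.0.0.1 and cannot be reached from other machines, so 127.0.0.1 has to be changed to 0.0.0.0 as follows.

# Edit gradio_demo.py
vi gradio_demo.py

# Find
demo.launch(share=True, server_port=7861, server_name='127.0.0.1')

# Change it to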
demo.launch(share=True, server_port=7861, server_name='0.0.0.0')
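For anyone who prefers to script this edit instead of opening vi, the substitution can be expressed as a small text transformation. The sketch below is an illustration under the assumption that the launch call looks like the line shown above; bind_all_interfaces is a made-up helper name.

```python
import re

# Matches server_name='127.0.0.1' or server_name="127.0.0.1".
_SERVER_NAME = re.compile(r"""server_name\s*=\s*(['"])127\.0\.0\.1\1""")


def bind_all_interfaces(source):
    """Return source with server_name='127.0.0.1' changed to '0.0.0.0'."""
    return _SERVER_NAME.sub(r"server_name=\g<1>0.0.0.0\g<1>", source)


if __name__ == "__main__":
    line = "demo.launch(share=True, server_port=7861, server_name='127.0.0.1')"
    print(bind_all_interfaces(line))
```

To apply it to the real file, read gradio_demo.py as text, pass the contents through bind_all_interfaces, and write the result back. Keep in mind that 0.0.0.0 exposes the demo on every network interface of the server, and that share=True additionally asks Gradio to create a public share link.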

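6.3 Upgrade transformers

Running python gradio_demo.py reports the following error.

transformers needs to be upgraded (the command below pins it to version 4.38.2). Run the following command.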

pip install transformers==4.38.2 accelerate torch
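After the install it is worth confirming which versions actually ended up in the environment. The sketch below uses the standard library's importlib.metadata; installed_versions is a hypothetical helper name chosen for this example.

```python
from importlib import metadata


def installed_versions(packages=("transformers", "accelerate", "torch")):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```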

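6.4 Install flash_attn

Running the demo again produces the following error.

Because this machine has no GPU, installing flash_attn kept failing, so the only option left was to mock the flash-attn module.

Step 1: Create the mock flash-attn module

# Go to the Miniconda environments directory (adjust to your install location)
cd /root/miniconda3/envs
# Enter the environment (adjust to your environment name)
cd py312
# Create the directory for the mock flash-attn module
mkdir -p lib/python3.12/site-packages/flash_attn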

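lib/python3.12/site-packages/flash_attn: this is the path on Linux. On Windows the path has no python3.12 level.

Step 2: Create the __init__.py file (the command below is for Linux; on Windows, create the file by hand and leave out the first and last lines, that is, the cat line and the EOF line)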

cat > lib/python3.12/site-packages/flash_attn/__init__.py << 'EOF'
"""
Mock flash-attn module for CPU-only environment
"""
import warnings
warnings.warn("Using mock flash-attn module. This is not the real flash-attn and will be slower.")
 
__version__ = "0.0.0"
 
def flash_attn_func(q, k, v, dropout_p=0.0, softmax_scale=None, causal=False, return_attn_probs=False):
    """Mock flash attention function that falls back to standard implementation"""
    import torch
    import torch.nn.functional as F

    # Plain scaled dot-product attention. This assumes q, k, v are laid out as
    # (..., seqlen, head_dim); the real flash-attn uses (batch, seqlen, nheads, head_dim).
    scale = softmax_scale if softmax_scale is not None else q.size(-1) ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    if causal:
        mask = torch.triu(torch.ones(scores.size(-2), scores.size(-1), device=scores.device), diagonal=1).bool()
        scores = scores.masked_fill(mask, float('-inf'))
    attn_weights = F.softmax(scores, dim=-1)
    attn_weights = F.dropout(attn_weights, p=dropout_p)
    output = torch.matmul(attn_weights, v)

    if return_attn_probs:
        return output, attn_weights
    return output
 
# Classes that callers may import
class FlashAttention:
    def __init__(self, attention_dropout=0.0, num_heads=8, head_dim=64):
        self.attention_dropout = attention_dropout
        self.num_heads = num_heads
        self.head_dim = head_dim
     
    def forward(self, q, k, v, causal=False):
        return flash_attn_func(q, k, v, self.attention_dropout, causal=causal)
 
# Other functions that callers may try to import
def flash_attn_with_kvcache(*args, **kwargs):
    return flash_attn_func(*args, **kwargs)
 
__all__ = ['flash_attn_func', 'FlashAttention', 'flash_attn_with_kvcache']
EOF
 
# Verify that the mock module can be imported
python -c "import flash_attn; print('flash_attn mock module loaded successfully')"
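The site-packages path differs between Linux and Windows and between environments, so it can be easier to ask the active interpreter where it lives than to type the path by hand. The sketch below does that with the standard library; mock_package_dir and is_importable are hypothetical helper names, and the script should be run with the Python of the py312 environment.

```python
import importlib.util
import sysconfig
from pathlib import Path


def mock_package_dir(package="flash_attn"):
    """Directory where a pure-Python package would live in this environment."""
    return Path(sysconfig.get_paths()["purelib"]) / package


def is_importable(package="flash_attn"):
    """True if Python can locate the package, without importing it."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("package directory:", mock_package_dir())
    print("importable:", is_importable())
```

The printed directory is where __init__.py belongs; once the file is in place, importable should change to True.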

6.5 Run gradio_demo.py

Run the following command

python gradio_demo.py

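Open the demo in a browser, replacing the IP address with that of your own server

http://172.31.100.27:7861/

The page looks like the figure below. After you upload an image, the analysis result appears as shown in the figure below.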

7. Running on Ubuntu 24.04 LTS + Python 3.12 + RTX 4090 24G

7.1 Install the NVIDIA driver and CUDA

Install the NVIDIA driver

sudo apt install nvidia-driver-580 nvidia-dkms-580

Install CUDA

# Add the official NVIDIA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu$(lsb_release -rs | tr -d '.')/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

# Install CUDA 12.8
sudo apt-get install cuda-toolkit-12-8

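Install the CUDA toolkit package from the Ubuntu repositories, which puts nvcc on the default PATH (its version may differ from the 12.8 toolkit installed above)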

sudo apt install nvidia-cuda-toolkit

7.2 Create the Python virtual environment

Same as section 3 above; follow the steps there.

7.3 Clone the OmniParser project and install its dependencies

Same as section 4 above; follow the steps there.

7.4 Download the OmniParser v2 model

Same as section 5 above; follow the steps there.

7.5 Modify util/utils.py to support Chinese OCR

Same as section 6.1 above; follow the steps there.

7.6 Upgrade transformers

Same as section 6.3 above; follow the steps there.

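7.7 Mock the flash-attn module

Two files need to be mocked, both in the flash_attn package directory.

On Linux, running the commands below creates both mock files automatically.

# Create the flash_attn mock module (adjust /root/miniconda3 and py312 to your setup)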
mkdir -p /root/miniconda3/envs/py312/lib/python3.12/site-packages/flash_attn

cat > /root/miniconda3/envs/py312/lib/python3.12/site-packages/flash_attn/__init__.py << 'EOF'
"""
Mock flash_attn module for Florence-2 model
"""
import torch
import torch.nn.functional as F

__version__ = "2.6.3"

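def flash_attn_func(q, k, v, dropout_p=0.0, softmax_scale=None, causal=False, 
                    window_size=(-1, -1), alibi_slopes=None, deterministic=False):
    """Fallback to PyTorch's scaled_dot_product_attention"""
    # flash-attn passes (batch, seqlen, nheads, headdim), while
    # scaled_dot_product_attention expects (batch, nheads, seqlen, headdim),
    # so swap the two middle axes on the way in and on the way out.
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2),
        dropout_p=dropout_p,
        is_causal=causal,
        scale=softmax_scale
    )
    return out.transpose(1, 2)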

def flash_attn_qkvpacked_func(qkv, dropout_p=0.0, softmax_scale=None, causal=False,
                               window_size=(-1, -1), alibi_slopes=None, deterministic=False):
    """Mock for packed QKV input"""
    # Assume qkv shape: (batch, seqlen, 3, head, dim)
    q, k, v = qkv.unbind(2)
    return flash_attn_func(q, k, v, dropout_p, softmax_scale, causal, 
                           window_size, alibi_slopes, deterministic)

print("✓ Using mock flash_attn (fallback to PyTorch implementation)")
EOF

# Create flash_attn_interface.py
cat > /root/miniconda3/envs/py312/lib/python3.12/site-packages/flash_attn/flash_attn_interface.py << 'EOF'
"""
Flash Attention Interface Mock
"""
from . import flash_attn_func, flash_attn_qkvpacked_func

__all__ = ['flash_attn_func', 'flash_attn_qkvpacked_func']
EOF

echo "✓ Mock flash_attn installed successfully"
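Once both files exist, a quick check from Python can confirm that each module imports and exposes the two function names defined in the mock above. The sketch below is illustrative only; missing_names is a hypothetical helper, and the import of flash_attn will only succeed in an environment where the mock and torch are installed.

```python
import importlib


def missing_names(module_name, names):
    """Return the names from `names` that the module does not define."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    expected = ["flash_attn_func", "flash_attn_qkvpacked_func"]
    for module_name in ("flash_attn", "flash_attn.flash_attn_interface"):
        try:
            print(module_name, "missing:", missing_names(module_name, expected))
        except ImportError as error:
            print(module_name, "could not be imported:", error)
```

An empty list for both modules means the mock exposes everything this check looks for.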

7.8 Appendix: Show the CUDA version

How to check whether CUDA is in use

# Show the overall GPU status
nvidia-smi

# Find GPU usage by Python processes
nvidia-smi | grep python

# Or list every process that has the NVIDIA devices open
fuser -v /dev/nvidia*
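For monitoring from a script, nvidia-smi can also print selected fields as CSV through its --query-gpu and --format options. The sketch below parses that output; parse_gpu_csv and query_gpus are hypothetical helper names, and the sample line in the docstring uses made-up numbers.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse lines like 'NVIDIA GeForce RTX 4090, 1234, 24564' into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        # Split from the right so a comma inside the GPU name is harmless.
        name, used, total = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "used_mib": int(used), "total_mib": int(total)})
    return gpus


def query_gpus():
    """Run nvidia-smi and return parsed rows; raises if the tool is missing."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(output)


if __name__ == "__main__":
    try:
        for gpu in query_gpus():
            print(gpu)
    except (FileNotFoundError, subprocess.CalledProcessError) as error:
        print("nvidia-smi is not usable here:", error)
```

Watching used_mib rise while gradio_demo.py processes an image is a simple way to confirm that the model is running on the GPU.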

CUDA toolkit version

nvcc --version
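nvcc reports the version of the system toolkit, but PyTorch wheels usually ship their own CUDA runtime, so what matters for OmniParser is what PyTorch itself can see. The sketch below collects that information; cuda_report is a hypothetical helper name, and it returns a short result even when torch is not installed.

```python
def cuda_report():
    """Summarise what PyTorch can see; works even if torch is not installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False on a machine with a working driver, the installed torch build is most likely a CPU-only one.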

posted @ 2026-03-26 13:44 rslai


赖荣生
