ConvertLOREToONNX: A Conversion Repository for the LORE Lineless Table Recognition Model

Introduction

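People often ask how Alibaba's lineless table recognition model was converted to ONNX format. I'm a little embarrassed to admit that the existing ONNX model was converted a long time ago; the conversion environment has since been lost, and I took no notes at the time.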

Today I made up my mind to attempt the conversion again, and fortunately it succeeded. The result is this set of conversion notes: ConvertLOREToONNX

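Having learned my lesson, this time I exported the environment file with Anaconda, so the conversion environment is recorded in much more detail. Below is the README of the conversion repository. If you're interested, click "Read the original" at the end of this post to jump to the repository and try it yourself.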

1. Clone the source code.

git clone https://github.com/SWHL/ConvertLOREToONNX.git

2. Install the environment.

conda install --yes --file requirements.txt

3. Run the demo. The converted models are written to the models directory.

python main.py
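
Before moving on, it can be worth confirming that the conversion actually produced the files used in step 5. The following is a minimal sketch, assuming the outputs are named lore_detect.onnx and lore_process.onnx (names taken from the usage example in this README) and that they sit in a models directory under the working directory. `missing_models` is a hypothetical helper written for illustration; it is not part of the repository.

```python
from pathlib import Path

# Assumed file names, taken from the usage example in step 5 of this README.
EXPECTED_MODELS = ("lore_detect.onnx", "lore_process.onnx")


def missing_models(model_dir="models", names=EXPECTED_MODELS):
    """Return the expected model files that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in names:
        path = root / name
        # A zero-byte file is treated as a failed export.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_models()
    print("all models present" if not absent else f"missing: {absent}")
```

This only checks that the files exist and are non-empty; it does not validate that they are well-formed ONNX graphs.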

4. Install lineless_table_rec

pip install lineless_table_rec

5. Usage

from pathlib import Path

from lineless_table_rec import LinelessTableRecognition

# Paths to the two ONNX models produced by the conversion step.
detect_path = "models/lore_detect.onnx"
process_path = "models/lore_process.onnx"
engine = LinelessTableRecognition(
    detect_model_path=detect_path, process_model_path=process_path
)

# Run recognition on a sample image.
img_path = "images/lineless_table_recognition.jpg"
table_str, elapse = engine(img_path)

print(table_str)
print(elapse)

# Save the result as an HTML file named after the image.
with open(f"{Path(img_path).stem}.html", "w", encoding="utf-8") as f:
    f.write(table_str)

print("ok")
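
To process several images, the same call can be wrapped in a small loop. This is a sketch under stated assumptions, not part of lineless_table_rec: `save_tables` is a hypothetical helper, and it relies only on the behaviour shown above, namely that the engine is a callable taking an image path and returning a `(table_str, elapse)` pair.

```python
from pathlib import Path


def save_tables(engine, image_paths, out_dir="."):
    """Run engine on each image and write the returned table string to <stem>.html.

    engine is assumed to be any callable that returns (table_str, elapse),
    as in the usage example above.
    """
    out_root = Path(out_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    written = []
    for img_path in image_paths:
        table_str, _elapse = engine(str(img_path))
        html_path = out_root / f"{Path(img_path).stem}.html"
        html_path.write_text(table_str, encoding="utf-8")
        written.append(html_path)
    return written
```

With the engine built above, a call such as `save_tables(engine, Path("images").glob("*.jpg"), "output")` would write one HTML file per image. Images that share a stem overwrite each other, so distinct file names are assumed.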
posted @ 2024-03-10 15:10  Danno