Notes on OCR

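A collection of OCR references. For production use, I recommend PaddleOCR as the base model, MinerU for layout analysis, and PyMuPDF for text in PDF files; this is the setup we use in our own product. Text layout and table recognition usually need to be trained and tuned on your own data, because the off-the-shelf models rarely cover every application scenario.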
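The division of labor described above can be sketched as a small routing step. This is only an illustration under assumptions: the stage labels ("pymupdf", "mineru_layout", "paddleocr") and the function names are hypothetical, and the rule that PDFs with a text layer skip OCR is one plausible reading of the setup, not a description of the actual product code.

```python
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def pdf_has_text_layer(path: str) -> bool:
    """Return True if any page of the PDF yields extractable text."""
    import fitz  # PyMuPDF, imported lazily so the routing logic has no hard dependency

    with fitz.open(path) as doc:
        return any(page.get_text().strip() for page in doc)


def plan_pipeline(path: str, has_text_layer: bool = False) -> List[str]:
    """Pick the processing stages for one input file (stage labels are hypothetical)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf" and has_text_layer:
        # Text can be read directly from the PDF, so no OCR is needed.
        return ["pymupdf"]
    if suffix == ".pdf" or suffix in IMAGE_SUFFIXES:
        # Scanned PDFs and images: layout analysis first, then text recognition.
        return ["mineru_layout", "paddleocr"]
    raise ValueError(f"unsupported file type: {suffix!r}")
```

In practice `has_text_layer` would come from `pdf_has_text_layer(path)`; it is passed in as a flag here so the routing rule can be checked without any PDF on disk.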

https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/docker/linux-docker.html
https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/quickstart.md official quickstart documentation
https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/ppstructure/docs/quickstart.md structure analysis (PP-Structure) quickstart

docker pull paddlepaddle/paddle:2.6.0-gpu-cuda12.0-cudnn8.9-trt8.6
nvidia-docker run --name paddle -it -v $PWD:/paddle registry.baidubce.com/paddlepaddle/paddle:2.5.2-gpu-cuda10.2-cudnn7.6-trt7.0 /bin/bash
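Inside the container, the quickstart linked above comes down to a few lines of Python. The sketch below assumes the PaddleOCR 2.x Python API from the release/2.7 quickstart (`PaddleOCR(use_angle_cls=True, lang=...)` and `ocr.ocr(path, cls=True)`); newer major versions may differ. The helper names `extract_text` and `run_ocr` and the 0.5 confidence threshold are illustrative choices, not part of PaddleOCR.

```python
from typing import List, Tuple


def extract_text(page_result, min_score: float = 0.5) -> List[Tuple[str, float]]:
    """Keep (text, score) pairs whose confidence is at least min_score.

    page_result follows the 2.x layout: a list of [box, (text, score)] entries,
    where box holds four [x, y] corner points. It may be None for an empty page.
    """
    kept = []
    for _box, (text, score) in page_result or []:
        if score >= min_score:
            kept.append((text, score))
    return kept


def run_ocr(image_path: str, lang: str = "ch") -> List[Tuple[str, float]]:
    """Run PaddleOCR on one image and return the filtered lines."""
    from paddleocr import PaddleOCR  # lazy import: requires `pip install paddleocr`

    ocr = PaddleOCR(use_angle_cls=True, lang=lang)
    result = ocr.ocr(image_path, cls=True)  # one entry per input image
    return extract_text(result[0])
```

Keeping the filtering step separate from the model call makes it easy to tune the threshold on saved results without rerunning inference.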

Data preparation and model training:
https://zhuanlan.zhihu.com/p/686402622
Data annotation: PaddleLabel
https://blog.csdn.net/qq_49627063/article/details/119134847
Annotation tools:
https://github.com/PFCCLab/PPOCRLabel # annotation tool
https://github.com/sohaib023/T-Truth # table annotation tool for table recognition; its output needs format conversion
https://github.com/PaddleCV-SIG/PaddleLabel/blob/v1.0.0/doc/CN/install.md

https://aistudio.baidu.com/modelsdetail/18?modelId=18 official documentation

https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_ch/table_recognition.md table recognition
https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_ch/dataset/table_datasets.md table datasets
https://github.com/PaddlePaddle/PaddleOCR/blob/main/applications application examples

https://gitee.com/paddlepaddle/PaddleOCR/blob/release/2.6/ppstructure/layout/README_ch.md dataset links

https://github.com/WenmuZhou/TableGeneration table data generation

Using GPT
https://learn.microsoft.com/zh-cn/azure/ai-services/openai/how-to/gpt-with-vision?tabs=python%2Csystem-assigned%2Cresource

Algorithm notes:
https://blog.csdn.net/shiwanghualuo/article/details/129132206

https://huggingface.co/datasets/juliozhao/DocSynth300K dataset

========================
PyMuPDF
========================
https://github.com/pymupdf
https://github.com/pymupdf/RAG code
https://pymupdf4llm.readthedocs.io/en/latest/ PyMuPDF4LLM
https://pymupdf.readthedocs.io/en/latest/rag.html # RAG with LLMs
https://pymupdf.readthedocs.io/en/latest/tutorial.html documentation
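As a rough sketch of what the PyMuPDF tutorial covers, text can be pulled page by page with `page.get_text("blocks")`, which returns tuples of the form `(x0, y0, x1, y1, text, block_no, block_type)`. The helper names `sort_blocks` and `pdf_to_text` are illustrative, and the top-to-bottom, left-to-right ordering is a simplifying assumption that will interleave columns on multi-column pages.

```python
def sort_blocks(blocks):
    """Order text blocks top-to-bottom, then left-to-right.

    Each block is a tuple (x0, y0, x1, y1, text, block_no, block_type);
    block_type 0 marks text, so image blocks and empty blocks are dropped.
    """
    text_blocks = [b for b in blocks if b[6] == 0 and b[4].strip()]
    # Rounding y0 lets blocks on nearly the same baseline sort by x0.
    return sorted(text_blocks, key=lambda b: (round(b[1]), b[0]))


def pdf_to_text(path: str) -> str:
    """Concatenate the text blocks of every page in simple reading order."""
    import fitz  # PyMuPDF, imported lazily: requires `pip install pymupdf`

    parts = []
    with fitz.open(path) as doc:
        for page in doc:
            for block in sort_blocks(page.get_text("blocks")):
                parts.append(block[4].strip())
    return "\n\n".join(parts)
```

For complex layouts, a learned reading-order or layout model is usually a better fit than coordinate sorting.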

https://blog.csdn.net/shiwanghualuo/article/details/129132206 SLANet summary

tesseract-ocr
https://github.com/tesseract-ocr/tesseract
https://github.com/tesseract-ocr/tessdoc

IBM-OCR
https://github.com/DS4SD/docling

MinerU:
https://github.com/opendatalab/MinerU
layoutReader: https://github.com/ppaanngggg/layoutreader
DocLayout-YOLO+mesh-candidate_bestfit: https://github.com/opendatalab/DocLayout-YOLO/tree/main/mesh-candidate_bestfit
https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/table_structure_recognition.md

RapidOCR:

https://github.com/RapidAI/RapidOCR

RapidTable: https://github.com/RapidAI/RapidTable

Posted on 2025-02-19 15:35 by Sanny.Liu-CV&&ML
