Parsing PDF files with the pdfplumber library

Reference: https://github.com/jsvine/pdfplumber

Simple PDF-to-text conversion:

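import pdfplumber

path = "example.pdf"  # placeholder: replace with the path to your PDF file

with pdfplumber.open(path) as pdf:
    for page in pdf.pages:
        # extract_text() returns the text of one page
        content = page.extract_text()
        print(content)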

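Note: this only works for text-based PDFs. If a page of the PDF consists only of images, extract_text() returns None (newer pdfplumber releases may return an empty string instead).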
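Because a page without extractable text can yield None, printing or joining the results directly can produce stray "None" output or a TypeError. Below is a minimal sketch of one way to guard against that. The helper names collect_text and extract_pdf_text are made up for this example and are not part of pdfplumber; only pdfplumber.open, pdf.pages and page.extract_text() come from the library.

```python
def collect_text(pages):
    """Join the text of each page, skipping pages with no extractable text."""
    parts = []
    for page in pages:
        content = page.extract_text()
        # Image-only pages give None (or an empty string), so skip them.
        if content:
            parts.append(content)
    return "\n".join(parts)


def extract_pdf_text(path):
    """Open a PDF with pdfplumber and return the text of all its pages."""
    # Imported here so that collect_text can be used without pdfplumber.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return collect_text(pdf.pages)
```

Usage would look like text = extract_pdf_text("example.pdf"), where "example.pdf" is a placeholder path. If the result is an empty string, the PDF is probably made of scanned images and needs OCR rather than text extraction.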

 

Errors when converting a PDF to images:

First, the ImageMagick build (32-bit or 64-bit) must match the Python build (32-bit or 64-bit), even on a 64-bit OS. If they do not match, you will get an "ImageMagick not installed" error.
Second, Ghostscript is required; without it ImageMagick will not work properly.
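Both conditions can be checked quickly before debugging further. The following is a rough sketch that uses only the standard library: it reports whether the running Python is 32-bit or 64-bit and looks for Ghostscript and ImageMagick on PATH. The executable names it searches for (gs, gswin64c, gswin32c, magick, convert) are assumptions about common installs and may differ on your machine. It does not verify the bitness of ImageMagick itself.

```python
import shutil
import struct


def python_bitness():
    """Return 32 or 64 depending on the running Python interpreter."""
    # The size of a C pointer is 4 bytes on 32-bit and 8 bytes on 64-bit builds.
    return struct.calcsize("P") * 8


def find_executable(candidates):
    """Return the path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print("Python build: %d-bit" % python_bitness())
    # Assumed names: gs on Unix-like systems, gswin64c / gswin32c on Windows.
    print("Ghostscript:", find_executable(["gs", "gswin64c", "gswin32c"]))
    # Assumed names: magick (ImageMagick 7) or convert (ImageMagick 6).
    print("ImageMagick:", find_executable(["magick", "convert"]))
```

Note that Windows also ships an unrelated system tool named convert, so a hit on that name alone does not prove that ImageMagick is installed.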

posted @ 2018-11-16 15:22 by 向往前方