How to modify the following code in a Python file to convert a .pdf to a .txt file?

Question:

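I found some code on the web and put it into a Python file. Then I put that file in a folder together with the PDF and ran it from the command prompt to convert the PDF to .txt. However, the code only extracts the text and prints it to the console, and the cmd window is too small to hold all of it; a PDF can be as long as 300 pages. I need the output saved as a .txt file.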

Anyway, here is the code:

import pdf2image
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract


def pdf_to_img(pdf_file):
    # Render every page of the PDF to a PIL image
    return pdf2image.convert_from_path(pdf_file)


def ocr_core(file):
    # Run Tesseract OCR on a single page image
    text = pytesseract.image_to_string(file, lang='eng')
    return text


def print_pages(pdf_file):
    images = pdf_to_img(pdf_file)
    for pg, img in enumerate(images):
        # Print each page's OCR text to the console
        print(ocr_core(img))


print_pages('1.pdf')
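A minimal sketch of one way to write the OCR text to a file instead of printing it, assuming the same pdf2image (which needs Poppler installed) and pytesseract (which needs Tesseract installed) setup as above. The helper names `write_pages_to_txt`, `ocr_pages` and `pdf_to_txt` are hypothetical. It also renders one page at a time, using pdf2image's `pdfinfo_from_path` and the `first_page`/`last_page` arguments of `convert_from_path`, so a 300-page PDF is not held in memory all at once.

```python
from pathlib import Path
from typing import Iterable, Iterator, Optional


def write_pages_to_txt(pages: Iterable[str], txt_file: str) -> int:
    """Write each page's text to txt_file, one after another; return the page count."""
    count = 0
    # UTF-8 so characters outside the console's code page don't raise errors
    with open(txt_file, "w", encoding="utf-8") as out:
        for text in pages:
            out.write(text)
            out.write("\n")  # keep a line break between consecutive pages
            count += 1
    return count


def ocr_pages(pdf_file: str, lang: str = "eng") -> Iterator[str]:
    """Yield the OCR text of each page, rendering only one page at a time."""
    # Imported lazily so write_pages_to_txt works even without these packages
    from pdf2image import convert_from_path, pdfinfo_from_path
    import pytesseract

    page_count = pdfinfo_from_path(pdf_file)["Pages"]
    for page in range(1, page_count + 1):
        # convert_from_path returns a list; with first_page == last_page it has one image
        (image,) = convert_from_path(pdf_file, first_page=page, last_page=page)
        yield pytesseract.image_to_string(image, lang=lang)


def pdf_to_txt(pdf_file: str, txt_file: Optional[str] = None) -> str:
    """OCR pdf_file into txt_file (default: same name with .txt suffix); return its path."""
    if txt_file is None:
        txt_file = str(Path(pdf_file).with_suffix(".txt"))
    write_pages_to_txt(ocr_pages(pdf_file), txt_file)
    return txt_file
```

With this, calling `pdf_to_txt('1.pdf')` in place of `print_pages('1.pdf')` would write the text to `1.txt` next to the PDF. A quicker alternative that keeps the original script unchanged is to redirect its output from the command prompt, e.g. `python script.py > output.txt`, where `script.py` stands for whatever the file is actually called.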

I changed the name of the PDF.

I tried to find YouTube tutorial videos, without much success. I was hoping for a video titled something like "How to convert PDF to TXT with Tesseract".

ocr tesseract txt file-conversion


