PySpark Tabula-Py Read_PDF (ERROR: No module named 'org.apache.commons')

Question:

I have been running a pipeline in Azure for 4 months, and last night it suddenly broke. I have the following code:

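!pip install tabula-py
from io import BytesIO  # needed for BytesIO(pdf_content) below
from tabula.io import read_pdf
import tabula
# pdf_content holds the raw PDF bytes; [0] takes the first table found on page 3
df = tabula.io.read_pdf(BytesIO(pdf_content), pandas_options={'header': None}, pages=3, stream=True)[0]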

I am now suddenly getting this error:

~/cluster-env/env/lib/python3.8/site-packages/tabula/io.py in __init__(self, java_options, silent)
     92 
     93         from java import lang
---> 94         from org.apache.commons import cli
     95         from technology import tabula
     96 

ModuleNotFoundError: No module named 'org.apache.commons'

Any help would be greatly appreciated.
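
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the `pip install tabula-py` line above has no version pin, so every run installs whatever release is newest at that moment. A pipeline that worked for months and then broke overnight could therefore be picking up a new release. The sketch below only reports which version is installed in the current environment; the helper name `installed_version` is hypothetical and is not part of tabula-py.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "tabula-py" is the distribution name used in the pip command in the question.
print("tabula-py:", installed_version("tabula-py"))
```

If the reported version changed around the time the failure started, pinning the previously working version in the pip command is one way to test whether the upgrade is responsible.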

Pandas Azure pyspark tabula tabula-py
