Asked by: Analyst4 Asked: 11/18/2023 Last edited by: WoodfordAnalyst4 Updated: 11/18/2023 Views: 18
Tabula not reading my PDF: all data comes in as blank
Q:
I'm trying to take this PDF: https://www.occ.gov/topics/charters-and-licensing/weekly-bulletin/2023/wb-11052023-11112023.pdf and export it as a CSV with the columns "ACTION", "DATE", "BANK NAME", "LOCATION", "CITY", "STATE".
My code is as follows:
import tabula
import pandas as pd
pdf_path = '*pdf file path*'
# Read PDF into a list of DataFrame
dfs = tabula.read_pdf(pdf_path, pages='2', multiple_tables=True)
# Concatenate DataFrames into a single DataFrame
df = pd.concat(dfs)
# Specify the columns to keep
columns_to_keep = ["ACTION", "DATE", "TYPE", "BANK NAME", "LOCATION", "CITY", "STATE"]
# Select only the relevant columns
df = df[columns_to_keep]
# Drop rows with all NaN values
#df = df.dropna(how='all')
# Write the DataFrame to a CSV file
df.to_csv("output.csv", index=False)
print("CSV file generated successfully.")
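This produces the correct headers in my CSV, but the data is empty. Does anyone have experience with this? For now I'm only testing with page 2, but ideally I need the whole PDF.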
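For the whole-PDF case, a minimal sketch of one thing that could be tried: read every page with pages="all", switch between tabula-py's lattice and stream extraction modes, and only then combine the tables. The helper names combine_tables and read_whole_pdf are hypothetical, and whether either mode actually recovers the rows from this particular PDF is untested.

```python
import pandas as pd

WANTED = ["ACTION", "DATE", "TYPE", "BANK NAME", "LOCATION", "CITY", "STATE"]


def combine_tables(dfs, wanted=WANTED):
    """Stack tables, keep only the wanted columns, and drop fully empty rows."""
    cleaned = []
    for df in dfs:
        df = df.copy()
        # Normalize header text so "bank name " still matches "BANK NAME".
        df.columns = [str(c).strip().upper() for c in df.columns]
        present = [c for c in wanted if c in df.columns]
        if present:
            cleaned.append(df[present])
    if not cleaned:
        return pd.DataFrame(columns=wanted)
    out = pd.concat(cleaned, ignore_index=True)
    # reindex adds any wanted column that no table had, filled with NaN.
    return out.reindex(columns=wanted).dropna(how="all")


def read_whole_pdf(pdf_path, lattice=True):
    """Read every page; lattice=True targets ruled tables, False uses stream mode."""
    import tabula  # imported here so combine_tables works without tabula-py

    dfs = tabula.read_pdf(
        pdf_path,
        pages="all",
        multiple_tables=True,
        lattice=lattice,
        stream=not lattice,
    )
    return combine_tables(dfs)
```

Calling read_whole_pdf(pdf_path, lattice=True) and then read_whole_pdf(pdf_path, lattice=False) and comparing the row counts would show which mode, if either, picks up the table body. Tables that contain none of the wanted columns are skipped instead of raising a KeyError.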
I tried the tabula function, but the output is blank.
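To narrow down where the data goes missing, it may help to look at each table tabula returns before concatenating anything. The sketch below is pandas-only; summarize_tables is a hypothetical helper name, and it assumes dfs is the list returned by tabula.read_pdf.

```python
import pandas as pd


def summarize_tables(dfs):
    """Return one summary row per extracted table."""
    rows = []
    for i, df in enumerate(dfs):
        rows.append(
            {
                "table": i,
                "n_rows": df.shape[0],
                "n_cols": df.shape[1],
                # Cells that actually hold a value (not NaN).
                "non_null_cells": int(df.notna().sum().sum()),
                "columns": list(df.columns),
            }
        )
    return pd.DataFrame(
        rows, columns=["table", "n_rows", "n_cols", "non_null_cells", "columns"]
    )
```

Running print(summarize_tables(dfs)) right after the read_pdf call shows whether the extracted tables already have zero rows or zero non-null cells. If they do, the problem is in the extraction step and not in the pd.concat or column selection that follows.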
A: No answers yet