Asked by: Mark M. Asked: 11/16/2023 Last edited by: BigBenMark M. Updated: 11/17/2023 Views: 63
Extracting Table from CSV file using python
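Q:
I am trying to use Python to extract specific tables from a report and compile the data into a single table. However, once the script finds the required row of column names, it copies everything underneath it, including all the unneeded data that is not machine-readable. The output should be in table format. I am running this in Jupyter Notebook.
Code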
import csv
import pandas as pd

# Function to extract relevant data from the CSV file
def extract_data_from_csv(file_path):
    data_start = False
    headers = []
    rows = []
    with open(file_path, newline='') as csvfile:
        csv_reader = csv.reader(csvfile)
        for row in csv_reader:
            if not data_start:
                if row and row[0] == "SUBSCRIBER NAME":
                    headers = row
                    data_start = True
            elif data_start and row:
                if row[0].strip().upper() == "PRODUCT TOTAL":
                    break
                rows.append(row)
    return headers, rows

# Specify the path to your CSV file
csv_file_path = 'Testing/sample report.csv'
# Extract data
headers, rows = extract_data_from_csv(csv_file_path)
# Create a DataFrame using pandas
df = pd.DataFrame(rows, columns=headers)
# Display the DataFrame as a table
display(df)
# Save the DataFrame to a new CSV file
output_csv_file_path = 'mark_table_1.csv'
df.to_csv(output_csv_file_path, index=False)
For the file, see the Google link: link
A:
1 upvote
Isaac Rene
11/16/2023
#1
I haven't tested it, but you can try changing extract_data_from_csv as follows:
# Function to extract relevant data from the CSV file
def extract_data_from_csv(file_path):
    data_start = False
    headers = []
    rows = []
    with open(file_path, newline='') as csvfile:
        csv_reader = csv.reader(csvfile)
        for row in csv_reader:
            if not data_start:
                if row and row[0] == "SUBSCRIBER NAME":
                    headers = row
                    data_start = True
            elif data_start and row:
                if row[5].strip().upper() == "PRODUCT TOTAL":  # changed row index to 5
                    data_start = False  # in place of break
                rows.append(row)
    return headers, rows
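Two changes were made:
- The index in the check was changed from row[0].strip().upper() == "PRODUCT TOTAL" to row[5].strip().upper() == "PRODUCT TOTAL".
- break was replaced with data_start = False, which temporarily stops collecting rows until a new "SUBSCRIBER NAME" table appears. Note that headers will be overwritten in the if statement each time, but that should not be a problem because the headers are always the same. You could also modify the function further to avoid this.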
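Since the answer above is untested, here is a minimal sketch of the same idea with two adjustments that are not part of the original answer: the total row itself is skipped rather than appended, and rows shorter than the checked column are tolerated instead of raising an IndexError. The function name extract_tables and the total_col parameter are hypothetical, and the position of "PRODUCT TOTAL" (index 5) is taken from the answer, not verified against the actual report.

```python
import csv

def extract_tables(file_path, total_col=5):
    """Collect data rows from every "SUBSCRIBER NAME" table in the file."""
    data_start = False
    headers = []
    rows = []
    with open(file_path, newline='') as csvfile:
        for row in csv.reader(csvfile):
            if not row:
                continue  # skip blank lines
            if not data_start:
                if row[0] == "SUBSCRIBER NAME":
                    if not headers:
                        headers = row  # keep only the first header row
                    data_start = True
            elif len(row) > total_col and row[total_col].strip().upper() == "PRODUCT TOTAL":
                data_start = False  # pause until the next header row
            else:
                rows.append(row)
    return headers, rows
```

The returned pair can be passed to pd.DataFrame(rows, columns=headers) exactly as in the question.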
1 upvote
Jon Clements
11/17/2023
#2
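You might want to consider a custom generator function that loops over the file, starts capturing when a certain condition is met, then consumes and yields the following rows until another condition is met (or the file ends), and then breaks out to resume looking for the first condition. That is not too different from what you have done, but for example: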
import csv
import pandas as pd

def extract(filename):
    with open(filename) as fin:
        csvin = csv.reader(fin)
        for row in csvin:
            if row and row[0] == 'SUBSCRIBER NAME':
                for subrow in csvin:
                    if subrow and subrow[0]:  # change me perhaps?
                        yield dict(zip(row, subrow))
                    else:
                        break
Then get the DataFrame with:
df = pd.DataFrame(extract('your_file_name'))
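The nested loops work because both for statements draw from the same csv.reader iterator: the inner loop consumes the data rows, and after break the outer loop resumes from wherever the inner loop stopped. A minimal sketch on made-up in-memory data, assuming a variant that reads from any iterable of lines (the name extract_lines and the sample rows are hypothetical, not taken from the report):

```python
import csv
import io

def extract_lines(lines):
    """Same idea as extract(), but reads from any iterable of CSV lines."""
    csvin = csv.reader(lines)
    for row in csvin:
        if row and row[0] == 'SUBSCRIBER NAME':
            # The inner loop advances the same iterator as the outer loop.
            for subrow in csvin:
                if subrow and subrow[0]:
                    yield dict(zip(row, subrow))
                else:
                    break  # a blank line or an empty first cell ends the table

# Made-up sample: two tables separated by a blank line and a junk line.
sample = io.StringIO(
    "SUBSCRIBER NAME,AMOUNT\n"
    "Alice,10\n"
    "\n"
    "some footer text\n"
    "SUBSCRIBER NAME,AMOUNT\n"
    "Bob,20\n"
)
records = list(extract_lines(sample))
print(records)
```

Passing records (or the generator itself) to pd.DataFrame gives one row per yielded dict, with the header cells as column names.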
Comments
0 upvotes
Mark M.
11/17/2023
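Hi, thank you for the great help. I figured it out! I wanted to paste the whole code here, but it seems to be too long. Haha. Thanks again.
1 upvote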
Mark M.
11/17/2023
#3
I figured it out.
Sharing the whole code for future reference.
import csv
import pandas as pd

# Function to extract relevant data from the CSV file
def extract_data_from_csv(file_path):
    headers = None
    rows = []
    with open(file_path, newline='') as fin:
        csvin = csv.reader(fin)
        for row in csvin:
            if row and row[0] == 'SUBSCRIBER NAME':
                headers = row
                for subrow in csvin:
                    if subrow and subrow[0].strip().upper() == 'PRODUCT TOTAL':
                        break
                    elif subrow and subrow[0]:
                        rows.append(subrow)
                    else:
                        break
    return headers, rows

# Specify the path to your CSV file
csv_file_path = 'Testing/0976_Bills_Payment_Abstract_of_Collection_11142023_115828.csv'
# Extract data using the new extract function
headers, rows = extract_data_from_csv(csv_file_path)
# Create a DataFrame using pandas
df = pd.DataFrame(rows, columns=headers)
# Display the DataFrame as a table
display(df)
# Save the DataFrame to a new CSV file
output_csv_file_path = 'mark_table_2.csv'
df.to_csv(output_csv_file_path, index=False)