Asked by: hamid mohebzadeh Asked: 10/3/2023 Updated: 10/8/2023 Views: 51
Parsing a CSV file and reading a specific column [closed]
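Q:
I have a CSV file that is the output of a hydrological model. I need to read a specific column of this file, but because of its complex structure, pandas.read_csv cannot handle it. I would appreciate it if anyone could help (for example, I need to read the values of the Rain column).
A:
0 votes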
3ximus
10/3/2023
#1
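This file is not in a standard CSV format. The program that produces it may offer a better way to export the data. If it does not, and this file is your only option, you can parse it manually like this: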
rain_values = []
with open('AnnAGNPS_TBL_Gaging_Station_Data_Hyd.csv', 'r') as f:
    # skip first lines until the data begins
    while not next(f).startswith('Day'):
        pass
    # read lines of data
    for line in f:
        try:
            # append index 6 which is the data you want from the Rain column
            rain_values.append(line.split(',')[6].strip())
        except IndexError:
            # if we got this error it's because we've reached the end of the data section
            break
print(rain_values)
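The values collected above are strings. If numbers are needed, a minimal sketch along the same lines could convert them while reading. Only the "header line starts with Day" and "column index 6" details come from the answer above; the sample text is invented for illustration (the real file layout is not shown in the question), and `read_column` is a hypothetical helper name.

```python
import io


def read_column(lines, index, header_prefix="Day"):
    """Return the values at `index` as floats from the rows after the header line.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    lines = iter(lines)
    # Skip everything up to and including the header line.
    for line in lines:
        if line.startswith(header_prefix):
            break
    else:
        raise ValueError(f"no line starting with {header_prefix!r} found")

    values = []
    for line in lines:
        fields = line.split(",")
        # A blank or short line is taken to mark the end of the data section.
        if len(fields) <= index:
            break
        # float() raises ValueError on non-numeric text, so bad rows fail loudly.
        values.append(float(fields[index]))
    return values


# Invented sample, NOT the real model output.
sample = io.StringIO(
    "Some title line\n"
    "Day,Month,Year,A,B,C,Rain\n"
    "1,1,2000,0,0,0,2.5\n"
    "2,1,2000,0,0,0,0.0\n"
    "\n"
    "Another table,x\n"
)
rain = read_column(sample, index=6)
print(rain)
```

With the real file, the same helper could be called as `read_column(f, index=6)` inside the `with open(...)` block, assuming the Rain column really is at index 6.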
If you want a pandas DataFrame holding the CSV data, you can do something like this to load only the data you need:
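import pandas

with open('AnnAGNPS_TBL_Gaging_Station_Data_Hyd.csv', 'r') as f:
    # skip first lines until the data begins
    while not next(f).startswith('Day'):
        pass
    lines = []
    for line in f:
        # a blank line marks the end of the data section
        if line == '\n':
            break
        # drop the trailing newline so it does not end up in the last field
        lines.append(line.rstrip('\n').split(','))
df = pandas.DataFrame(lines)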
Comments
1 vote
hamid mohebzadeh
10/3/2023
Great answer. I really appreciate it.
0 votes
tdelaney
10/3/2023
#2
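You can scan the file for some well-defined text that tells you where the table of interest starts. Then you can start reading the CSV right away, or save the smaller sample to a temporary file on disk for further processing.
Handling loosely defined data is always a bit dicey, so it is best to add sanity checks as you go. It is better for the script to crash when the input changes than to silently produce bad data.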
import csv

with open("thefile") as infile:
    reader = csv.reader(infile)
    # scan for wanted table marker
    for row in reader:
        if row and row[0] == "POLLUTANT LOADING TABLE- Hydrograph":
            break
    else:
        # the loop ended without finding the marker
        raise ValueError("No data table found")
    # skip 3 line header but add some sanity checks while we're here
    for wanted, row in zip(["simulation", "Gregorian", "Day"], reader):
        if not row or wanted != row[0]:
            raise ValueError("Invalid file")
    # grab a single column to end of file
    # (`whatever` is a placeholder for the index of the column you want)
    my_col = [row[whatever] for row in reader]
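To illustrate the "crash rather than silently produce bad data" point, here is a rough, self-contained sketch of the same approach wrapped in a function and run against an in-memory sample. The marker text and the three header prefixes are taken from the code above; the sample contents, the column index 2, the stop-at-blank-row rule, and the name `read_table_column` are assumptions made only for this example.

```python
import csv
import io

MARKER = "POLLUTANT LOADING TABLE- Hydrograph"
HEADER_STARTS = ["simulation", "Gregorian", "Day"]


def read_table_column(lines, marker, header_starts, column):
    """Return one column (as strings) of the table that follows `marker`."""
    reader = csv.reader(lines)
    # Scan for the row that announces the table.
    for row in reader:
        if row and row[0] == marker:
            break
    else:
        raise ValueError("No data table found")
    # Check each header row instead of skipping it blindly.
    for wanted, row in zip(header_starts, reader):
        if not row or row[0] != wanted:
            raise ValueError(f"Invalid header: expected {wanted!r}, got {row!r}")
    values = []
    for row in reader:
        if not row:
            # Assumption: a blank line ends the table.
            break
        if len(row) <= column:
            raise ValueError(f"Row too short: {row!r}")
        values.append(row[column])
    return values


# Invented sample, NOT the real model output.
sample = (
    "Report header\n"
    "POLLUTANT LOADING TABLE- Hydrograph\n"
    "simulation,x\n"
    "Gregorian,x\n"
    "Day,Month,Rain\n"
    "1,1,2.5\n"
    "2,1,0.0\n"
    "\n"
    "trailer\n"
)
col = read_table_column(io.StringIO(sample), MARKER, HEADER_STARTS, 2)
print(col)

# A file without the marker fails loudly instead of returning an empty list.
error = None
try:
    read_table_column(io.StringIO("nothing here\n"), MARKER, HEADER_STARTS, 2)
except ValueError as exc:
    error = str(exc)
print(error)
```

Passing an iterable of lines rather than a filename keeps the function easy to test; with the real file it could be called with the open file object.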
Comments