Asked by: Kushal Desai · Asked: 7/22/2023 · Last edited by: shaik moeed, Kushal Desai · Updated: 7/22/2023 · Views: 58
How to parse the lines below so that the third column is preserved as a list in Python
Q:
How can I parse these lines using pandas or the csv module?
col1, col2, col3 <br>
name, date, ["data"] <br>
name, date, ["data", "data2", "data3"] <br>
name, date, ["data1", "data2"] <br>
This is the format of the file.
If I use
pd.read_csv(file)
I get this error:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 5
A:
0 votes
Anay
7/22/2023
#1
Since the third column contains the lists in string form, consider using StringIO and the converters parameter to turn the string representation into actual lists.
import pandas as pd
from io import StringIO
import ast
# Your data
data = ...
# Wrap the string in a file-like object so read_csv can consume it
data_file = StringIO(data)
# Converter function to convert the string representation of lists to actual lists
def parse_list(s):
    return ast.literal_eval(s)
df = pd.read_csv(data_file, converters={'col3': parse_list})
print(df)
Comments
0 votes
shaik moeed
7/22/2023
Please test before posting an answer. This still gives the same error.
0 votes
Anay
7/22/2023
Oops! My bad...
0 votes
shaik moeed
7/22/2023
#2
Try ignoring the commas between square brackets by using delimiter=', (?![^\[]*[\]])'
import io
import pandas as pd
data = '''col1, col2, col3 <br>
name, date, ["data"] <br>
name, date, ["data", "data2", "data3"] <br>
name, date, ["data1", "data2"] <br>'''
# Raw string so the regex backslashes are not treated as string escapes
df = pd.read_csv(io.StringIO(data), delimiter=r', (?![^\[]*[\]])', engine="python")
print(df)
Output:
col1 col2 col3 <br>
0 name date ["data"] <br>
1 name date ["data", "data2", "data3"] <br>
2 name date ["data1", "data2"] <br>
To remove the <br> tags:
# To remove <br> tags from each line
df.rename(columns={'col3 <br>':'col3'}, inplace=True)
df['col3'] = df['col3'].apply(lambda x : x.replace(' <br>', '').strip())
>>> output
col1 col2 col3
0 name date ["data"]
1 name date ["data", "data2", "data3"]
2 name date ["data1", "data2"]
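Note that after the steps above col3 still holds strings such as '["data"]', not Python lists. A minimal sketch of the remaining conversion, assuming every col3 value is a valid Python list literal once the <br> tag is stripped (ast.literal_eval is my addition here, borrowed from answer #1, not part of this answer's original code):

```python
import ast
import io

import pandas as pd

data = '''col1, col2, col3 <br>
name, date, ["data"] <br>
name, date, ["data", "data2", "data3"] <br>
name, date, ["data1", "data2"] <br>'''

# Split on ", " only when the comma is not inside square brackets
df = pd.read_csv(io.StringIO(data), delimiter=r', (?![^\[]*[\]])', engine="python")

# Strip the trailing <br> tag from the header names
df.columns = [c.replace('<br>', '').strip() for c in df.columns]

# Strip the tag from each value, then parse the list literal into a real list
# (assumption: every value is a well-formed list literal)
df['col3'] = (
    df['col3']
    .str.replace('<br>', '', regex=False)
    .str.strip()
    .apply(ast.literal_eval)
)

print(df['col3'].map(type).unique())
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for data read from a file, but it raises an exception on malformed rows.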
0 votes
PaulS
7/22/2023
#3
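Another possible solution:
import pandas as pd
from io import StringIO
# `text` is assumed to hold the file contents as a single string
df = pd.read_csv(StringIO(text), sep=r', (?!\")|\s+\<br\>',
                 engine='python').dropna(axis=1)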
Output:
col1 col2 col3
0 name date ["data"]
1 name date ["data", "data2", "data3"]
2 name date ["data1", "data2"]
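If a regex separator feels fragile, the same result can probably be reached without read_csv at all. The following is only a sketch under two assumptions: the list is always the last column, and it is the only field that contains commas. The helper name parse_rows is hypothetical, and `text` is a stand-in for the file contents.

```python
import ast

import pandas as pd

# Stand-in for the file contents (same sample as in the question)
text = '''col1, col2, col3 <br>
name, date, ["data"] <br>
name, date, ["data", "data2", "data3"] <br>
name, date, ["data1", "data2"] <br>'''


def parse_rows(raw):
    """Hypothetical helper: build a DataFrame whose last column holds lists."""
    # Drop the <br> tags and skip blank lines
    lines = [ln.replace('<br>', '').strip() for ln in raw.splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split(',')]
    rows = []
    for ln in lines[1:]:
        # Split on the first two commas only, so the list literal stays intact
        col1, col2, col3 = [part.strip() for part in ln.split(',', 2)]
        rows.append([col1, col2, ast.literal_eval(col3)])
    return pd.DataFrame(rows, columns=header)


df = parse_rows(text)
print(df)
```

Limiting the split to two commas keeps everything after the second comma together, so commas inside the brackets never reach a CSV tokenizer.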