Asked by: Bakira  Asked: 4/28/2023  Last edited by: Mark Tolonen, Bakira  Updated: 4/28/2023  Views: 151
Load csv file to Pandas DataFrame and add a new column with keyword taken from "keyword" rows
Q:
I have a .csv file in the following format:
Cash
Serial,Date,Balance
1,2021-03-05,34
2,2021-05-04,54
Credit
Serial,Date,Balance
18,2021-03-05,898
21,2021-04-01,654
Savings
Serial,Date,Balance
3,2021-03-18,19384
34,2021-12-04,472
I want to load it into a pandas DataFrame with the following structure:
Serial,Asset,Date,Balance
1,Cash,2021-03-05,34
2,Cash,2021-05-04,54
18,Credit,2021-03-05,898
21,Credit,2021-04-01,654
3,Savings,2021-03-18,19384
34,Savings,2021-12-04,472
I can already load the file into a DataFrame using the following code:
import numpy as np
import pandas as pd

FILE = r"/myfile.csv"

# Count the fields on every line so the widest row fits into the frame
with open(FILE, 'r') as temp_f:
    col_count = [len(l.split(",")) for l in temp_f.readlines()]
column_names = [i for i in range(0, max(col_count))]

df = pd.read_csv(FILE, header=None, delimiter=",", names=column_names)
df['Asset'] = np.nan
print(df)
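But I'm now stuck on how to drop the rows containing "Serial,Date,Balance" and fill the Asset column with the corresponding entry ("Cash", "Credit", etc.). Any suggestions are appreciated.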
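One way to read the two steps described here (fill the Asset column from the keyword rows, then drop the keyword and header rows) is a forward fill inside pandas. This is only a sketch: it assumes keyword rows are the only rows with a single field, and it reads from an inline sample string `RAW` rather than the real file.

```python
import io

import pandas as pd

# Inline sample standing in for the real file (assumption for illustration).
RAW = """Cash
Serial,Date,Balance
1,2021-03-05,34
2,2021-05-04,54
Credit
Serial,Date,Balance
18,2021-03-05,898
21,2021-04-01,654
Savings
Serial,Date,Balance
3,2021-03-18,19384
34,2021-12-04,472
"""

# Read everything as text; one-field lines are padded with NaN.
raw = pd.read_csv(io.StringIO(RAW), header=None,
                  names=["Serial", "Date", "Balance"], dtype=str)

is_keyword = raw["Date"].isna()          # "Cash", "Credit", ... rows
is_header = raw["Serial"].eq("Serial")   # repeated "Serial,Date,Balance" rows

# Keep the keyword only on keyword rows, then carry it down to the rows below.
raw["Asset"] = raw["Serial"].where(is_keyword).ffill()

df = (raw.loc[~is_keyword & ~is_header, ["Serial", "Asset", "Date", "Balance"]]
         .reset_index(drop=True)
         .astype({"Serial": int, "Balance": int}))
print(df)
```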
Answers:
1 upvote
J_H
4/28/2023
#1
"I have a .csv file in the following format"
That is clearly not a CSV file. It is three such files:
- cash.csv
- credit.csv
- savings.csv
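Store them in the filesystem that way.
Read three separate dataframes. Then combine them in the usual way to produce a single combined DataFrame. Hint: tacking on a constant text column of "cash", or "credit" or "savings", will ease your task.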
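A minimal sketch of that suggestion, assuming the export has already been split into one regular CSV per asset. The function name `combine_asset_files` and the label-to-source mapping are illustrative and not part of the answer.

```python
import pandas as pd


def combine_asset_files(sources):
    """Read one CSV per asset and stack them, tagging each row with its asset.

    `sources` maps an asset label to anything pd.read_csv accepts
    (a path or an open file-like object).
    """
    frames = []
    for asset, source in sources.items():
        frame = pd.read_csv(source)
        frame.insert(1, "Asset", asset)  # constant text column after Serial
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

With the three files named above, this could be called as `combine_asset_files({"Cash": "cash.csv", "Credit": "credit.csv", "Savings": "savings.csv"})`.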
1 upvote
Corralien
4/28/2023
#2
You can use:
import io
import pandas as pd

# Separate sections
data = {}
with open('data.csv') as fp:
    for row in fp:
        if ',' not in row:
            # A line without a comma starts a new section
            k = row.strip()
            data[k] = []
        else:
            data[k].append(row.strip())

# Build individual dataframes
dfs = []
for asset, values in data.items():
    df = pd.read_csv(io.StringIO('\n'.join(values)))
    df.insert(1, 'Asset', asset)
    dfs.append(df)

# Merge them
df = pd.concat(dfs, ignore_index=True)
Output:
>>> df
   Serial    Asset        Date  Balance
0       1     Cash  2021-03-05       34
1       2     Cash  2021-05-04       54
2      18   Credit  2021-03-05      898
3      21   Credit  2021-04-01      654
4       3  Savings  2021-03-18    19384
5      34  Savings  2021-12-04      472
1 upvote
mozway
4/28/2023
#3
I would use a regex with re.finditer to iterate over the blocks, io.StringIO + pandas.read_csv to load each chunk, and combine them into a single DataFrame:
import re, io
import pandas as pd

with open('myfile.csv') as f:
    # group(1) is the keyword line, group(2) the header plus data rows below it
    out = pd.concat(
        {m.group(1): pd.read_csv(io.StringIO(m.group(2)))
         for m in re.finditer(r'(\w+)\n(.*?)\n(?=\w+\n|$)',
                              f.read(), flags=re.DOTALL)
        }, names=['Asset']).reset_index('Asset')
Output:
     Asset  Serial        Date  Balance
0     Cash       1  2021-03-05       34
1     Cash       2  2021-05-04       54
0   Credit      18  2021-03-05      898
1   Credit      21  2021-04-01      654
0  Savings       3  2021-03-18    19384
1  Savings      34  2021-12-04      472
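The index in this output restarts at 0 for every block because each chunk keeps its own row numbers. If a continuous index is wanted, one option is a second `reset_index(drop=True)`. The sketch below assumes an inline sample string `TEXT` in place of the file, with a trailing newline, which the pattern relies on to terminate the last block.

```python
import io
import re

import pandas as pd

# Inline sample mirroring the question's layout (assumption for illustration).
TEXT = (
    "Cash\nSerial,Date,Balance\n1,2021-03-05,34\n2,2021-05-04,54\n"
    "Credit\nSerial,Date,Balance\n18,2021-03-05,898\n21,2021-04-01,654\n"
)

# One DataFrame per block, keyed by the keyword line that precedes it.
blocks = {
    m.group(1): pd.read_csv(io.StringIO(m.group(2)))
    for m in re.finditer(r"(\w+)\n(.*?)\n(?=\w+\n|$)", TEXT, flags=re.DOTALL)
}

out = (
    pd.concat(blocks, names=["Asset"])
    .reset_index("Asset")      # move the dict keys into an 'Asset' column
    .reset_index(drop=True)    # replace the repeated 0, 1, 0, 1 with 0..n-1
)
print(out)
```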
2 upvotes
Mark Tolonen
4/28/2023
#4
A CSV should have one header, but this will read it as-is:
import pandas as pd
import csv

df = pd.DataFrame(columns='Serial Asset Date Balance'.split())
with open('myfile.csv', 'r', newline='') as temp_f:
    reader = csv.reader(temp_f)
    for line in reader:
        if len(line) == 1:    # Only one thing in the line?
            asset = line[0]   # remember it as the asset type
            next(reader)      # and skip the header line below it
        else:                 # add to the end of the dataframe
            df.loc[len(df.index)] = line[0], asset, line[1], line[2]
print(df)
df.to_csv('output.csv', index=False)
Output:
  Serial    Asset        Date Balance
0      1     Cash  2021-03-05      34
1      2     Cash  2021-05-04      54
2     18   Credit  2021-03-05     898
3     21   Credit  2021-04-01     654
4      3  Savings  2021-03-18   19384
5     34  Savings  2021-12-04     472
output.csv:
Serial,Asset,Date,Balance
1,Cash,2021-03-05,34
2,Cash,2021-05-04,54
18,Credit,2021-03-05,898
21,Credit,2021-04-01,654
3,Savings,2021-03-18,19384
34,Savings,2021-12-04,472
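Assigning through `df.loc` one row at a time enlarges the frame on every iteration and leaves every column as text, since `csv.reader` yields strings. A possible variant, sketched under the assumption that Serial and Balance are always integers, collects the rows in a list and builds the frame once. The helper name `read_sectioned_csv` is hypothetical.

```python
import csv

import pandas as pd


def read_sectioned_csv(lines):
    """Parse sectioned CSV lines into a DataFrame, building the frame once."""
    rows = []
    asset = None
    reader = csv.reader(lines)
    for line in reader:
        if not line:
            continue              # skip blank lines
        if len(line) == 1:        # a lone field marks a new section
            asset = line[0]
            next(reader, None)    # skip the repeated header line under it
        else:
            rows.append((line[0], asset, line[1], line[2]))
    df = pd.DataFrame(rows, columns=["Serial", "Asset", "Date", "Balance"])
    # csv.reader yields strings, so convert the numeric columns explicitly.
    return df.astype({"Serial": int, "Balance": int})
```

It accepts any iterable of lines, so an open file works: `with open('myfile.csv', newline='') as f: df = read_sectioned_csv(f)`.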
Comments
0 upvotes
Bakira
4/28/2023
Yes, unfortunately this is the format I receive from another program, and I am trying to process it into the form of a proper .csv file.