提问人:user22681865 提问时间:10/4/2023 最后编辑:OCauser22681865 更新时间:10/5/2023 访问量:82
用于过滤数据帧的动态 for 循环
Dynamic for loop for filtering dataframe
问:
我正在寻找一种无需手动添加for循环即可过滤数据的方法。
我有以下代码,而我只想指定我希望过滤数据的列(在名为“col_list”的变量中)。有没有一种更简单、更通用的方法来实现这一目标?
import pandas as pd
df = pd.DataFrame({
'Col1':['A','A','A','B','B','B'],
'Col2':['AA','AB','AC','BA','BB','BC'],
'Val':[1,2,3,1,2,3]
})
# Insert all Cols that should be looped through
col_list = ['Col1','Col2']
# Loop through Col1
for i in ['total'] + list(df[col_list[0]].unique()):
d1 = df[(df[col_list[0]] == i)] if i != 'total' else df.copy()
# Loop through Col2
for j in ['total'] + list(d1[col_list[1]].unique()):
d2 = d1[(d1[col_list[1]] == j)] if j != 'total' else d1.copy()
print(f"{i} | {j} | {sum(d2['Val'])}")
输出:
total | total | 12
total | AA | 1
total | AB | 2
total | AC | 3
total | BA | 1
total | BB | 2
total | BC | 3
A | total | 6
A | AA | 1
A | AB | 2
A | AC | 3
B | total | 6
B | BA | 1
B | BB | 2
B | BC | 3
所需方法:
col_list = ['Col1','Col2']
get_data(df,col_list):
'''Formula to return the data'''
答:
0赞
Umar
10/4/2023
#1
这样可以吗
import pandas as pd
def get_data(df, col_list):
filtered_df = df.copy()
for col in col_list:
unique_values = df[col].unique()
if 'total' in unique_values:
filtered_df = filtered_df[(filtered_df[col] == 'total') | (filtered_df[col].isin(unique_values))]
else:
filtered_df = filtered_df[filtered_df[col].isin(unique_values)]
return filtered_df
df = pd.DataFrame({
'Col1':['A','A','A','B','B','B'],
'Col2':['AA','AB','AC','BA','BB','BC'],
'Val':[1,2,3,1,2,3]
})
col_list = ['Col1','Col2']
get_data(df, col_list)
0赞
OCa
10/4/2023
#2
根据代码的打印输出,您试图计算两个(动态指定的)列中每个元素组合的值总和 ('Val')。
下面重现了打印输出,而不对列进行硬编码,并将结果收集到单个数据帧中:
import pandas as pd
import numpy as np
def get_data(df, col_list):
'''Operate sum() over all single and two-column groupings'''
# 1. Grand total
Total = pd.DataFrame({'group_1' : 'Total',
'Sum' : [df['Val'].sum()]})
# 2. Sums by one value
Sums_by_1_col = pd.concat([df[[c, 'Val']].groupby(by=c).sum()
.reset_index()
.set_axis(['group_1','Sum'], axis=1)
for c in col_list], axis=0)
# 3.Sums grouping by values in 2 columns at a time
Sums_by_2_col = df.groupby(by=col_list).sum(
).reset_index(drop=False
).set_axis(['group_1','group_2','Sum'], axis=1)
# Assemble 1, 2 and 3
return pd.concat([Sums_by_2_col,
Sums_by_1_col,
Total ], axis=0
).replace([np.nan], [None] # pd.concat places NaN by default. None better fits a non numerical column.
).reset_index(drop=True)
col_list = ['Col1','Col2']
get_data(df, col_list)
输出:
group_1 group_2 Sum
0 A AA 1
1 A AB 2
2 A AC 3
3 B BA 1
4 B BB 2
5 B BC 3
6 A None 6
7 B None 6
8 AA None 1
9 AB None 2
10 AC None 3
11 BA None 1
12 BB None 2
13 BC None 3
14 Total None 12
在您的问题中,您专门询问了动态迭代列的问题,但根本没有提到 sum 函数。当然,可以实现另一个聚合函数。
引用
评论