Asked by: naga satish  Asked: 11/16/2023  Last edited by: naga satish  Updated: 11/17/2023  Views: 69
How to add a condition in pandas named aggregation
Q:
director_master_id company_master_id designation date_cessation appointment_original_date some_col director_name appt_chng_desig_date t_designation t_dir_category
0 2601721 2465280 Director NaT 2020-08-21 0000000 VINEET NaT director promoter
1 1111111 2465280 Director NaT 2021-09-30 7129633 VIJAY NaT additional director professional
2 2222222 2465280 Director NaT 2022-03-06 9500698 SACHDEV NaT additional director professional
3 3333333 2465280 Director NaT 2023-01-03 9748791 SHUVI NaT additional director professional
4 444444 2465280 Director NaT 2022-09-28 1469375 CHAKRABORTY NaT director independent
5 933052 2465280 Director NaT 2023-02-18 3565167 ANUP NaT NaN NaN
6 2911635 2465280 Managing Director NaT 2020-08-21 7767248 KUMAR NaT managing director promoter
7 779440 2465280 Director NaT 2021-09-30 7298703 TYLER NaT additional director professional
8 804512 2465280 Director NaT 2021-09-30 3559152 KARTIK NaT additional director professional
9 90320 2465280 Director NaT 2021-09-30 177699 GOPAL NaT additional director professional
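How can I apply an aggregate function in pandas named aggregation that depends on a condition on another column? In the code below, my intention is to calculate num_of_promoter_directors by counting only the rows where t_dir_category equals promoter. The result here should be 2.
In the next line, I need to collect a list of names based on a condition on the column t_designation: it should equal managing director, and the result should be [KUMAR].
I know that I am selecting a single column and then aggregating on top of it, and this code results in KeyError: 't_dir_category':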
directors_info = director_history_details.groupby('company_master_id').agg(
    num_of_directors=('director_master_id', 'count'),
    num_of_promoter_directors=('director_master_id', lambda x: x[x['t_dir_category'] == 'promoter'].count()),
    managing_directors=('director_name',
                        lambda x: x[x['t_designation'] == 'managing director']['director_name'].unique())
)
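I also know that I could compute these separately and then join the results, but I am trying to do it in one block. Is there a way to achieve this?
Expected output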
company_master_id num_of_directors num_of_promoter_directors managing_directors
0 2465280 10 2 [Kumar, VINEET]
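A note on the KeyError above: with the `(column, function)` form of named aggregation, each function appears to receive only the selected column of one group as a Series, so `x['t_dir_category']` is treated as a lookup of an index label rather than a column. The sketch below is only an illustration on a made-up three-row frame; `record_type`, `seen_types` and `raised` are hypothetical names, not part of the question.

```python
import pandas as pd

# Made-up miniature of the question's frame (values are illustrative only)
df = pd.DataFrame({
    "company_master_id": [1, 1, 2],
    "director_master_id": [10, 11, 12],
    "t_dir_category": ["promoter", "professional", "promoter"],
})

seen_types = []

def record_type(x):
    # Record what kind of object the aggregation function is handed
    seen_types.append(type(x).__name__)
    return x.count()

df.groupby("company_master_id").agg(n=("director_master_id", record_type))
print(set(seen_types))  # only Series objects, never the full DataFrame

# Looking up a column name on such a Series searches its index labels
try:
    df["director_master_id"]["t_dir_category"]
    raised = False
except KeyError:
    raised = True
print(raised)
```

Because the function never sees the other columns, any condition on `t_dir_category` or `t_designation` has to be brought in some other way.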
A:
-2 votes
Lfppfs
11/16/2023
#1
Use pandas.NamedAgg:
import pandas as pd

df = pd.DataFrame({
    'company_master_id': [1, 1, 1, 2, 2],
    't_dir_category': [
        'promoter', 'professional', 'professional',
        'promoter', 'promoter',
    ],
})

# Count, per company, the rows whose category is 'promoter'
df.groupby('company_master_id').agg(
    num_of_promoter_directors=pd.NamedAgg(
        column='t_dir_category',
        aggfunc=lambda x: (x == 'promoter').sum(),
    )
)
Out[60]:
num_of_promoter_directors
company_master_id
1 1
2 2
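NamedAgg works on one column at a time, so a condition that involves a second column needs some preparation. One possible approach, sketched here under the assumption that adding temporary helper columns is acceptable, is to precompute the conditions with `assign` and then aggregate the helpers. The extra columns in `df` and the names `flagged`, `is_promoter` and `md_name` are invented for this example.

```python
import pandas as pd

# Example data extended with made-up id, name and designation columns
df = pd.DataFrame({
    "company_master_id": [1, 1, 1, 2, 2],
    "director_master_id": [10, 11, 12, 20, 21],
    "director_name": ["A", "B", "C", "D", "E"],
    "t_designation": ["managing director", "director", "director",
                      "director", "director"],
    "t_dir_category": ["promoter", "professional", "professional",
                       "promoter", "promoter"],
})

# Helper columns: a boolean flag, and the name kept only for managing directors
flagged = df.assign(
    is_promoter=df["t_dir_category"].eq("promoter"),
    md_name=df["director_name"].where(df["t_designation"].eq("managing director")),
)

result = flagged.groupby("company_master_id").agg(
    num_of_directors=pd.NamedAgg(column="director_master_id", aggfunc="count"),
    num_of_promoter_directors=pd.NamedAgg(column="is_promoter", aggfunc="sum"),
    # Return a plain list so each group yields a single aggregated value
    managing_directors=pd.NamedAgg(
        column="md_name",
        aggfunc=lambda s: s.dropna().unique().tolist(),
    ),
)
print(result)
```

Each aggregation still reads a single column, which keeps everything inside one `agg` call; the cost is two throwaway columns on a copy of the frame.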
Comments
0 votes
naga satish
11/16/2023
I need to do all 3 aggregations in one go. I also need the logic for the third aggregate function, which depends on 2 columns.
-1 votes
Elvin Jafarov
11/16/2023
#2
directors_info = director_history_details.groupby('company_master_id').agg(
    num_of_directors=('director_master_id', 'count'),
    num_of_promoter_directors=('t_dir_category', lambda x: (x == 'promoter').sum()),
    managing_directors=(
        'director_name',
        lambda x: list(x[director_history_details['t_designation'] == 'managing director'].unique())
        if (director_history_details['t_designation'] == 'managing director').any() else None
    )
).reset_index()
print(directors_info)
Output:
company_master_id num_of_directors num_of_promoter_directors managing_directors
0 2465280 10 2 [KUMAR]
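In the code above, the lambda receives one group's director_name values as a Series, while the boolean mask is built from the whole director_history_details frame and lined up with the group by index. The `.any()` check also looks at the whole frame rather than the current group. A hedged variant, assuming the frame has a unique index, restricts the lookup to the group's own rows through the Series index. The frame `df` and the helper `managing_directors` are made up for this sketch.

```python
import pandas as pd

# Small made-up frame with two companies; only company 1 has a managing director
df = pd.DataFrame({
    "company_master_id": [1, 1, 2, 2],
    "director_name": ["KUMAR", "VINEET", "ANUP", "GOPAL"],
    "t_designation": ["managing director", "director", "director", "director"],
})

def managing_directors(names):
    # `names` is one group's director_name Series; its index labels identify
    # the group's rows, so the other column can be looked up for those rows only
    designations = df.loc[names.index, "t_designation"]
    return names[designations == "managing director"].unique().tolist()

result = df.groupby("company_master_id").agg(
    managing_directors=("director_name", managing_directors),
)
print(result)
```

With this version a company without a managing director gets an empty list, decided per group.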
Comments
0 votes
naga satish
11/16/2023
Thanks!! Can you explain what exactly you are doing here?
0 votes
naga satish
11/17/2023
This code collects only one name, but there are two promoters, and I need the names of both of them. How can I achieve that?
1 vote
Elvin Jafarov
11/17/2023
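Not really: VINEET is a director, not a managing director. If you want to take that into account as well, please phrase your question differently in your post or post a separate question.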
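For the follow-up about listing the promoters by name, one option is to treat it as a separate column next to managing_directors. The sketch below assumes `groupby(...).apply` is acceptable in place of `agg`; it reuses four rows from the question's table, and the `summarize` helper, the `columns` list and the `promoter_directors` column name are hypothetical.

```python
import pandas as pd

# Four rows taken from the question's table (rows 0, 6, 1 and 5)
df = pd.DataFrame({
    "company_master_id": [2465280] * 4,
    "director_master_id": [2601721, 2911635, 1111111, 933052],
    "director_name": ["VINEET", "KUMAR", "VIJAY", "ANUP"],
    "t_designation": ["director", "managing director", "additional director", None],
    "t_dir_category": ["promoter", "promoter", "professional", None],
})

def summarize(group):
    # `group` is a DataFrame here, so conditions can use any of its columns
    is_promoter = group["t_dir_category"] == "promoter"
    is_md = group["t_designation"] == "managing director"
    return pd.Series({
        "num_of_directors": group["director_master_id"].count(),
        "num_of_promoter_directors": int(is_promoter.sum()),
        "managing_directors": group.loc[is_md, "director_name"].unique().tolist(),
        "promoter_directors": group.loc[is_promoter, "director_name"].unique().tolist(),
    })

# Selecting the columns explicitly keeps the grouping key out of each group
columns = ["director_master_id", "director_name", "t_designation", "t_dir_category"]
result = df.groupby("company_master_id")[columns].apply(summarize)
print(result)
```

On these rows, managing_directors comes out as ['KUMAR'] and promoter_directors as ['VINEET', 'KUMAR'], which keeps the two conditions separate.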