How to add condition in pandas named aggregates

Asked by: naga satish Asked: 11/16/2023 Last edited by: naga satish Updated: 11/17/2023 Views: 69

Q:

   director_master_id  company_master_id        designation date_cessation appointment_original_date  some_col            director_name appt_chng_desig_date        t_designation t_dir_category
0             2601721            2465280           Director            NaT                2020-08-21  0000000              VINEET                  NaT             director       promoter
1             1111111            2465280           Director            NaT                2021-09-30  7129633              VIJAY                    NaT  additional director   professional
2             2222222            2465280           Director            NaT                2022-03-06  9500698              SACHDEV                 NaT  additional director   professional
3             3333333            2465280           Director            NaT                2023-01-03  9748791              SHUVI                   NaT  additional director   professional
4              444444            2465280           Director            NaT                2022-09-28  1469375            CHAKRABORTY               NaT             director    independent
5              933052            2465280           Director            NaT                2023-02-18  3565167               ANUP                   NaT                  NaN            NaN
6             2911635            2465280  Managing Director            NaT                2020-08-21  7767248              KUMAR                    NaT    managing director       promoter
7              779440            2465280           Director            NaT                2021-09-30  7298703            TYLER                      NaT  additional director   professional
8              804512            2465280           Director            NaT                2021-09-30  3559152            KARTIK                     NaT  additional director   professional
9               90320            2465280           Director            NaT                2021-09-30   177699            GOPAL                      NaT  additional director   professional

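How can I make an aggregate function in pandas named aggregation depend on another column? In the code below, my intent is to compute num_of_promoter_directors by counting only the rows where t_dir_category equals promoter. The result here should be 2.

In the next line, I need to collect the list of names for which the t_designation column equals managing director; the result should be [KUMAR].

I know that I am selecting a single column on which the aggregation is then performed, and this code raises KeyError: 't_dir_category'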

directors_info = director_history_details.groupby('company_master_id').agg(
                num_of_directors=('director_master_id', 'count'),
                num_of_promoter_directors=('director_master_id', lambda x: x[x['t_dir_category'] == 'promoter'].count()),
                managing_directors=('director_name',
                                    lambda x: x[x['t_designation'] == 'managing director']['director_name'].unique())
)
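
The KeyError most likely comes from the fact that, in named aggregation, each function receives only the selected column as a Series, so x['t_dir_category'] is treated as an index-label lookup rather than a column lookup. A minimal sketch of that behaviour, using a small made-up frame (the data and the inspect helper are hypothetical, not the asker's):

```python
import pandas as pd

# Hypothetical toy data, not the asker's frame
df = pd.DataFrame({
    "company_master_id": [1, 1, 2],
    "director_master_id": [10, 11, 12],
    "t_dir_category": ["promoter", "professional", "promoter"],
})

seen_types = []

def inspect(x):
    # Record what the aggregation function actually receives
    seen_types.append(type(x).__name__)
    return x.count()

df.groupby("company_master_id").agg(n=("director_master_id", inspect))
print(set(seen_types))

error = None
try:
    # Same pattern as in the question: indexing the Series by a column name
    df.groupby("company_master_id").agg(
        n=("director_master_id",
           lambda x: x[x["t_dir_category"] == "promoter"].count())
    )
except KeyError as exc:
    error = exc
    print("KeyError:", exc)
```

Since the function only ever sees one column, any condition on a second column has to be brought in some other way.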

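I also know that I could compute these separately and then join them, but I am trying to do it in one block. Is there a way to achieve this?

Expected output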

   company_master_id  num_of_directors  num_of_promoter_directors managing_directors
0            2465280                10                          2  [Kumar, VINEET]


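python pandas dataframe

Comments

0 votes Elvin Jafarov 11/16/2023
What do you expect to see as the output?
0 votes naga satish 11/16/2023
@ElvinJafarov, I have added the expected output.
0 votes Elvin Jafarov 11/16/2023
I updated my answer; please upvote and accept it so that everyone can see the solution.
0 votes naga satish 11/16/2023
@ElvinJafarov, accepted and upvoted. Thanks!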

A:

-2 votes Lfppfs 11/16/2023 #1

Use pandas.NamedAgg:


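import pandas as pd

# Toy frame: two companies and the category of each of their directors
df = pd.DataFrame({
    'company_master_id': [1, 1, 1, 2, 2],
    't_dir_category': [
        'promoter', 'professional', 'professional',
        'promoter', 'promoter',
    ],
})

# Comparing the column to 'promoter' yields booleans; summing them counts the matches
df.groupby('company_master_id').agg(
    num_of_promoter_directors=pd.NamedAgg(
        column='t_dir_category',
        aggfunc=lambda x: (x == 'promoter').sum(),
    )
)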

Out[60]: 
                   num_of_promoter_directors
company_master_id                           
1                                          1
2                                          2

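Comments

0 votes naga satish 11/16/2023
I need to do all 3 aggregations in one go. I also need the logic for the third aggregation function, which depends on 2 columns.
-1 votes Elvin Jafarov 11/16/2023 #2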
# Mask over the whole frame; each group's Series keeps its original index,
# so the mask can be restricted to the current group with x.index.
is_md = director_history_details['t_designation'] == 'managing director'

directors_info = director_history_details.groupby('company_master_id').agg(
    num_of_directors=('director_master_id', 'count'),
    num_of_promoter_directors=('t_dir_category', lambda x: (x == 'promoter').sum()),
    # Unique managing-director names in this group, or None if the group has none
    managing_directors=('director_name', lambda x: list(x[is_md.loc[x.index]].unique()) or None),
).reset_index()
print(directors_info)

Output:


 company_master_id  num_of_directors  num_of_promoter_directors managing_directors
0            2465280                10                          2            [KUMAR]
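
One way to avoid referring to the outer frame inside the lambda is to derive helper columns first and then aggregate those in a single chained expression. This is only a sketch under assumptions: the frame below is a hypothetical subset of the question's data (with one invented extra company, 999), and the helper column names is_promoter and md_name are made up for illustration.

```python
import pandas as pd

# Hypothetical data mirroring the relevant columns of the question
df = pd.DataFrame({
    "company_master_id": [2465280] * 4 + [999],
    "director_master_id": [2601721, 1111111, 2911635, 444444, 555],
    "director_name": ["VINEET", "VIJAY", "KUMAR", "CHAKRABORTY", "ASHA"],
    "t_designation": ["director", "additional director",
                      "managing director", "director", "director"],
    "t_dir_category": ["promoter", "professional",
                       "promoter", "independent", "professional"],
})

directors_info = (
    df.assign(
        # True where the director is a promoter
        is_promoter=df["t_dir_category"].eq("promoter"),
        # Keep the name only for managing directors; NaN elsewhere
        md_name=df["director_name"].where(
            df["t_designation"].eq("managing director")
        ),
    )
    .groupby("company_master_id")
    .agg(
        num_of_directors=("director_master_id", "count"),
        num_of_promoter_directors=("is_promoter", "sum"),
        managing_directors=("md_name", lambda s: list(s.dropna().unique())),
    )
    .reset_index()
)
print(directors_info)
```

Because every condition is turned into its own column before grouping, each named aggregation only needs the single column it is given, and a company without a managing director simply gets an empty list.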

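Comments

0 votes naga satish 11/16/2023
Thanks!! Can you explain what exactly you are doing here?
0 votes naga satish 11/17/2023
This code collects only one name, but there are two promoters, and I need the names of both. How can I achieve that?
1 vote Elvin Jafarov 11/17/2023
Not really, VINEET is a director, not a managing director. If you want to take that into account as well, please phrase the question differently in your post or post a separate question.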
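
If the goal is the names of the promoter directors rather than the managing directors, one possible approach is groupby.apply with a function that returns a pd.Series, since that function sees the whole sub-frame and can combine several columns. The sketch below rests on assumptions: the data is a hypothetical subset of the question's frame, and summarize and the promoter_directors column name are made up for illustration.

```python
import pandas as pd

# Hypothetical subset of the question's data
df = pd.DataFrame({
    "company_master_id": [2465280] * 4,
    "director_master_id": [2601721, 1111111, 444444, 2911635],
    "director_name": ["VINEET", "VIJAY", "CHAKRABORTY", "KUMAR"],
    "t_dir_category": ["promoter", "professional", "independent", "promoter"],
})

def summarize(g):
    # g is the sub-DataFrame for one company, so every selected column is available
    is_promoter = g["t_dir_category"].eq("promoter")
    return pd.Series({
        "num_of_directors": g["director_master_id"].count(),
        "num_of_promoter_directors": is_promoter.sum(),
        "promoter_directors": g.loc[is_promoter, "director_name"].tolist(),
    })

cols = ["director_master_id", "director_name", "t_dir_category"]
directors_info = (
    df.groupby("company_master_id")[cols]
    .apply(summarize)
    .reset_index()
)
print(directors_info)
```

This trades the declarative named-aggregation syntax for a plain function, which is usually slower on large frames but keeps multi-column conditions in one place.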