Pandas group by columns and unique count and unique values of other columns

Asked by: Md Parvez Alam Asked: 11/16/2023 Last edited by: Md Parvez Alam Updated: 11/16/2023 Views: 55

Q:

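I want to group by two columns and, for several other columns, get the count of unique non-empty values and a comma-separated list of the unique values.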

Sample data:

[{
"vendor": "ven1",
"event": "event",
"code": "sd",
"gender": "M",
"female": "0"
},
{
"vendor": "ven1",
"event": "event",
"code": "sd",
"gender": "M",
"female": ""
},
{
"vendor": "ven1",
"event": "event",
"code": "sd",
"gender": "M",
"female": "0"
},
{
"vendor": "ven1",
"event": "event2",
"code": "sd",
"gender": "M",
"female": "0"
},
{
"vendor": "ven1",
"event": "event2",
"code": "sd",
"gender": "F",
"female": "0"
}

]

I am trying the code below, but it does not give the desired output:

result = df.groupby(['vendor', 'event']).agg({
   'code':  ['nunique', 'unique'],
   'gender':  ['nunique', 'unique'],
   'cancel': ['nunique', 'unique'],
   'female':  ['nunique', 'unique'],
   'male':  ['nunique', 'unique']

   }).reset_index()

I am able to get the counts with the following:

result = df.groupby(['vendor', 'event']).agg({
   'code': lambda x: x.nunique(),
   'gender': lambda x: x.nunique(),
   'cancel': lambda x: x.nunique(),
   'female': lambda x: x.nunique(),
   'male': lambda x: x.nunique()

   }).reset_index()

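But I also need the unique values themselves because, depending on what the unique value is, I need to update all rows of that column for the given vendor and event.

Any help is appreciated. Thanks.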

python pandas

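Comments

0 upvotes mozway 11/16/2023
Please provide a reproducible example df and the matching expected output.
0 upvotes Suraj Shourie 11/16/2023
What does the output look like for this sample data?
0 upvotes user19077881 11/16/2023
The data provided has no male or cancel columns. Apart from that, the code runs and produces the expected result. How does this differ from what you require?
0 upvotes Md Parvez Alam 11/16/2023
@user19077881, nunique does not ignore case.
0 upvotes Md Parvez Alam 11/16/2023
@user19077881 and I need the count to ignore empty values, but the unique values should still include the empty value.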
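
The two points raised in these comments can be checked on a small Series. The sketch below is only an illustration: the sample values and the normalization step (lowercasing, then treating empty strings as missing) are assumptions and do not come from the question.

```python
import pandas as pd

# Hypothetical sample: mixed case, one empty string, one missing value.
s = pd.Series(["M", "m", "", "M", None])

# nunique() skips missing values by default, but "" is an ordinary string
# and is counted, and "M" and "m" are treated as different values.
plain_count = s.nunique()
print(plain_count)  # 3

# Assumed normalization: lowercase first, then turn "" into a missing value
# so that nunique() ignores it.
normalized = s.str.lower().replace("", pd.NA)
normalized_count = normalized.nunique()
print(normalized_count)  # 1
```

Whether case should be folded, and whether empty strings should count as missing, depends on the data, so this may need adjusting.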

A:

0 upvotes Sajadi 11/16/2023 #1
import pandas as pd

data = [
{"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": "0"},
{"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": ""},
{"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": "0"},
{"vendor": "ven1", "event": "event2", "code": "sd", "gender": "M", "female": "0"},
{"vendor": "ven1", "event": "event2", "code": "sd", "gender": "F", "female": "0"},
]

df = pd.DataFrame(data)

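def custom_agg(series):
    # Drop missing values (NaN/None); empty strings are not null, so they are kept
    unique_values = series[series.notnull()].unique()
    unique_count = len(unique_values)
    # Return the count together with the comma-separated unique values
    return unique_count, ', '.join(map(str, unique_values))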

result = df.groupby(['vendor', 'event']).agg({
   'code': custom_agg,
   'gender': custom_agg,
   'female': custom_agg,
}).reset_index()

print(result)

Result:

  vendor   event     code     gender    female
0   ven1   event  (1, sd)     (1, M)  (2, 0, )
1   ven1  event2  (1, sd)  (2, M, F)    (1, 0)

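Comments

0 upvotes user19077881 11/16/2023
The code gives the same results as 'code': ['nunique', 'unique'] etc., except that the results are combined in a tuple instead of being placed in separate columns.
0 upvotes Md Parvez Alam 11/16/2023
I need separate columns for unique_count and unique_value, e.g. code_unique_count, code_unique_value, etc.
0 upvotes Md Parvez Alam 11/16/2023
@user19077881, nunique does not ignore case.
0 upvotes Md Parvez Alam 11/16/2023
I need the count to ignore empty values, but the unique values should still include the empty value.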
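
One possible way to get the separate columns asked for in these comments is pandas named aggregation, where each keyword passed to agg becomes one flat output column. This is a minimal sketch, not the answer author's method. The helper names count_non_empty and join_unique are made up for illustration, and it assumes that "empty" means an empty string or a missing value. It does not fold case.

```python
import pandas as pd

data = [
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": ""},
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event2", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event2", "code": "sd", "gender": "F", "female": "0"},
]
df = pd.DataFrame(data)


def count_non_empty(series):
    """Hypothetical helper: distinct values, ignoring missing values and ""."""
    cleaned = series.dropna()
    return cleaned[cleaned != ""].nunique()


def join_unique(series):
    """Hypothetical helper: comma-separated distinct values, "" is kept."""
    return ", ".join(map(str, series.dropna().unique()))


value_cols = ["code", "gender", "female"]

# Build one (column, function) pair per desired output column.
named_aggs = {}
for col in value_cols:
    named_aggs[f"{col}_unique_count"] = (col, count_non_empty)
    named_aggs[f"{col}_unique_value"] = (col, join_unique)

result = df.groupby(["vendor", "event"]).agg(**named_aggs).reset_index()
print(result)
```

With the sample data, the female column for ("ven1", "event") should give a count of 1 while its unique-value string still contains the empty entry. If case-insensitive counting is also required, the values could be lowercased inside count_non_empty, assuming the columns hold strings.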