Asked by: Md Parvez Alam  Asked: 11/16/2023  Last edited by: Md Parvez Alam  Updated: 11/16/2023  Views: 55
Pandas group by columns and unique count and unique values of other columns
Question:
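I want to group by two columns and, for several other columns, get the count of unique non-empty values along with the unique values as a comma-separated string.
Sample data: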
[{
"vendor": "ven1",
"event": "event",
"code": "sd",
"gender": "M",
"female": "0"
},
{
"vendor": "ven1",
"event": "event",
"code": "sd",
"gender": "M",
"female": ""
},
{
"vendor": "ven1",
"event": "event",
"code": "sd",
"gender": "M",
"female": "0"
},
{
"vendor": "ven1",
"event": "event2",
"code": "sd",
"gender": "M",
"female": "0"
},
{
"vendor": "ven1",
"event": "event2",
"code": "sd",
"gender": "F",
"female": "0"
}
]
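I am trying the code below, but I do not get the desired output:
import pandas as pd

# 'cancel' and 'male' are columns of the full dataset; they are not in the sample above
result = df.groupby(['vendor', 'event']).agg({
    'code': ['nunique', 'unique'],
    'gender': ['nunique', 'unique'],
    'cancel': ['nunique', 'unique'],
    'female': ['nunique', 'unique'],
    'male': ['nunique', 'unique']
}).reset_index()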
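With the comma after the `gender` entry in place, this aggregation runs; what it returns is a frame with two-level column labels such as `('gender', 'nunique')`, and the `unique` cells hold NumPy arrays rather than comma-separated strings. A minimal sketch, restricted to the columns present in the sample data, that flattens the labels into single names (the `<column>_<function>` naming is an arbitrary choice):

```python
import pandas as pd

# The five sample records from the question ('cancel' and 'male' are omitted
# because they do not appear in the sample data)
df = pd.DataFrame([
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": ""},
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event2", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event2", "code": "sd", "gender": "F", "female": "0"},
])

result = df.groupby(["vendor", "event"]).agg({
    "code": ["nunique", "unique"],
    "gender": ["nunique", "unique"],
    "female": ["nunique", "unique"],
})

# Join the two column levels, e.g. ("gender", "nunique") -> "gender_nunique"
result.columns = [f"{col}_{func}" for col, func in result.columns]
result = result.reset_index()
print(result)
```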
I am able to get the counts with this:
result = df.groupby(['vendor', 'event']).agg({
'code': lambda x: x.nunique(),
'gender': lambda x: x.nunique(),
'cancel': lambda x: x.nunique(),
'female': lambda x: x.nunique(),
'male': lambda x: x.nunique()
}).reset_index()
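A detail that matters for these counts: `nunique()` drops only NaN, so an empty string `""` is counted as a distinct value (the first group's `female` column has two, `"0"` and `""`). A small sketch on a three-row excerpt of the sample data that filters out empty strings inside each group before counting:

```python
import pandas as pd

# Three rows of the sample data: the ("ven1", "event") group
df = pd.DataFrame({
    "vendor": ["ven1", "ven1", "ven1"],
    "event": ["event", "event", "event"],
    "female": ["0", "", "0"],
})

keys = ["vendor", "event"]

# nunique() skips NaN only, so "" is counted as a second value
with_empty = df.groupby(keys)["female"].nunique()

# Drop empty strings inside each group before counting
without_empty = df.groupby(keys)["female"].agg(lambda s: s[s != ""].nunique())

print(with_empty.tolist(), without_empty.tolist())
```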
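But I also need the unique values themselves, because depending on what the unique value is, I need to update that column for all rows of the given vendor and event.
Any help is appreciated. Thanks.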
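For updating every row of a vendor/event group, `transform` may be more convenient than `agg`, because it returns one value per original row, aligned with the frame's index. The actual update condition is not stated in the question, so the sketch below assumes a hypothetical rule: if a group has exactly one non-empty `female` value, copy it into that group's empty cells.

```python
import pandas as pd

df = pd.DataFrame({
    "vendor": ["ven1"] * 5,
    "event": ["event", "event", "event", "event2", "event2"],
    "female": ["0", "", "0", "0", "0"],
})

# Treat empty strings as missing so nunique/first ignore them
non_empty = df["female"].where(df["female"] != "")
grouped = non_empty.groupby([df["vendor"], df["event"]])

# transform() returns one value per row, aligned with df's index
n_unique = grouped.transform("nunique")   # distinct non-empty values in the row's group
first_value = grouped.transform("first")  # first non-missing value in the row's group

# Hypothetical rule: in groups with exactly one non-empty value,
# copy that value into the empty cells
to_fill = (n_unique == 1) & (df["female"] == "")
df.loc[to_fill, "female"] = first_value[to_fill]
print(df)
```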
Answer:
0 votes
Sajadi
11/16/2023
#1
import pandas as pd

data = [
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": ""},
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event2", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event2", "code": "sd", "gender": "F", "female": "0"},
]
df = pd.DataFrame(data)

def custom_agg(series):
    # Returns (number of unique non-null values, those values joined with ', ')
    unique_values = series[series.notnull()].unique()
    unique_count = len(unique_values)
    return unique_count, ', '.join(map(str, unique_values))

result = df.groupby(['vendor', 'event']).agg({
    'code': custom_agg,
    'gender': custom_agg,
    'female': custom_agg,
}).reset_index()
print(result)
Result:
  vendor   event     code     gender    female
0   ven1   event  (1, sd)     (1, M)  (2, 0, )
1   ven1  event2  (1, sd)  (2, M, F)    (1, 0)
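Each cell above holds a `(count, values)` tuple. If separate columns are preferred, the tuples can be unpacked after the aggregation. A sketch reusing the answer's `custom_agg`; the `_unique_count`/`_unique_value` suffixes are illustrative:

```python
import pandas as pd

df = pd.DataFrame([
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": ""},
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event2", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event2", "code": "sd", "gender": "F", "female": "0"},
])

def custom_agg(series):
    # Same logic as the answer: (unique non-null count, comma-separated values)
    unique_values = series[series.notnull()].unique()
    return len(unique_values), ", ".join(map(str, unique_values))

cols = ["code", "gender", "female"]
result = df.groupby(["vendor", "event"]).agg({c: custom_agg for c in cols}).reset_index()

# Unpack each (count, values) tuple into two separate columns
for c in cols:
    result[f"{c}_unique_count"] = result[c].map(lambda t: t[0])
    result[f"{c}_unique_value"] = result[c].map(lambda t: t[1])
result = result.drop(columns=cols)
print(result)
```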
Comments
0 votes
user19077881
11/16/2023
The code gives the same results as 'code': ['nunique', 'unique'] etc., except that it combines them in a tuple instead of separate columns.
0 votes
Md Parvez Alam
11/16/2023
I need separate columns for the unique count and the unique values, e.g. code_unique_count, code_unique_value, etc.
0 votes
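Named aggregation (available since pandas 0.25) lets `agg` produce flat, custom-named columns directly, in the form `new_column=(source_column, aggregation)`. A minimal sketch for two columns; `join_unique` is a hypothetical helper, and the plain `"nunique"` used here still counts `""` as a value:

```python
import pandas as pd

df = pd.DataFrame({
    "vendor": ["ven1"] * 5,
    "event": ["event", "event", "event", "event2", "event2"],
    "code": ["sd"] * 5,
    "gender": ["M", "M", "M", "M", "F"],
})

def join_unique(s):
    # Hypothetical helper: distinct values in order of first appearance, comma-separated
    return ", ".join(map(str, s.unique()))

# Named aggregation: new_column=(source_column, aggregation)
result = df.groupby(["vendor", "event"]).agg(
    code_unique_count=("code", "nunique"),
    code_unique_value=("code", join_unique),
    gender_unique_count=("gender", "nunique"),
    gender_unique_value=("gender", join_unique),
).reset_index()
print(result)
```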
Md Parvez Alam
11/16/2023
@user19077881, nunique does not ignore case
0 votes
Md Parvez Alam
11/16/2023
I need the count to ignore empty values, but the unique values should still include the empty value.
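One way to meet both requirements is to give each column two different aggregations: one that drops NaN and empty strings before counting, and one that joins every distinct value, empty string included. A sketch that builds the named-aggregation spec in a loop; `count_non_empty` and `join_unique` are hypothetical helper names:

```python
import pandas as pd

df = pd.DataFrame([
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": ""},
    {"vendor": "ven1", "event": "event", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event2", "code": "sd", "gender": "M", "female": "0"},
    {"vendor": "ven1", "event": "event2", "code": "sd", "gender": "F", "female": "0"},
])

def count_non_empty(s):
    # Count distinct values, ignoring both NaN and empty strings
    return s[s.notna() & (s != "")].nunique()

def join_unique(s):
    # Join every distinct value, empty string included, in order of first appearance
    return ", ".join(map(str, s.unique()))

cols = ["code", "gender", "female"]
spec = {}
for c in cols:
    spec[f"{c}_unique_count"] = (c, count_non_empty)
    spec[f"{c}_unique_value"] = (c, join_unique)

result = df.groupby(["vendor", "event"]).agg(**spec).reset_index()
print(result)
```

For the first group, `female_unique_count` is 1 while `female_unique_value` is `"0, "`, so the empty value stays visible without inflating the count.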