Asked by: hoa tran · Asked: 11/16/2023 · Last edited by: hoa tran · Updated: 11/16/2023 · Views: 59
Count unique value for each group and subtotal
Q:
I have a simple dataframe as below:
import pandas as pd
import numpy as np

df = pd.DataFrame({'BR_NM': ['HN', 'HN', 'HP'],
                   'CUS_ID': ['12345', '12345', '12345'],
                   'ACC_ID': ['12345_1', '12345_2', '12345_3'],
                   'REGION': ['North', 'North', 'North'],
                   'CUS_TYPE': ['Individual', 'Individual', 'Individual']})
df
  BR_NM CUS_ID   ACC_ID REGION    CUS_TYPE
0    HN  12345  12345_1  North  Individual
1    HN  12345  12345_2  North  Individual
2    HP  12345  12345_3  North  Individual
I want to count unique CUS_ID based on BR_NM, then sum based on REGION. In my case it is only one customer with three accounts, but I want to count it as two customers. Here is my desired output:

REGION    CUS_TYPE  North
0       Individual      2

If I use pivot_table with aggfunc=pd.Series.nunique, it only counts as 1:

df2 = pd.pivot_table(df, values='CUS_ID', columns='REGION', index='CUS_TYPE', aggfunc=pd.Series.nunique).reset_index()

Thanks.
A:
1 upvote
Corralien
11/16/2023
#1
"it's only one customer with three accounts, but I want to count it as two customers."

Use drop_duplicates on the BR_NM and CUS_ID columns to get unique records:
>>> (df.drop_duplicates(['BR_NM', 'CUS_ID'])
.value_counts(['REGION', 'CUS_TYPE'])
.unstack('REGION').reset_index())
REGION CUS_TYPE North
0 Individual 2
Intermediate result of drop_duplicates:
>>> df.drop_duplicates(['BR_NM', 'CUS_ID'])
BR_NM CUS_ID ACC_ID REGION CUS_TYPE
0 HN 12345 12345_1 North Individual
2 HP 12345 12345_3 North Individual
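If you prefer to stay close to the pivot_table attempt from the question, the same deduplication idea can be combined with it. This is only a sketch, not part of the accepted answer: after dropping duplicates, aggfunc='count' counts the remaining rows per group (nunique would still return 1, since both rows share the same CUS_ID):

```python
import pandas as pd

df = pd.DataFrame({'BR_NM': ['HN', 'HN', 'HP'],
                   'CUS_ID': ['12345', '12345', '12345'],
                   'ACC_ID': ['12345_1', '12345_2', '12345_3'],
                   'REGION': ['North', 'North', 'North'],
                   'CUS_TYPE': ['Individual', 'Individual', 'Individual']})

# Deduplicate on (BR_NM, CUS_ID) first, then count surviving rows per group.
df2 = pd.pivot_table(df.drop_duplicates(['BR_NM', 'CUS_ID']),
                     values='CUS_ID', columns='REGION', index='CUS_TYPE',
                     aggfunc='count').reset_index()
print(df2)
```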
1 upvote
nithinks
11/16/2023
#2
Here is one way to achieve this: do a group-by, count the unique values, then sum those counts.
import pandas as pd
data = {'BR_NM': ['HN', 'HN', 'HP', 'HP', 'HA', 'HA'],
'CUS_ID': ['12345', '12345', '12345', '12345', '12346', '12347'],
'ACC_ID': ['12345_1', '12345_2', '12345_3', '12345_4', '12345_5', '12345_6'],
'REGION': ['North', 'North', 'North', 'South', 'South', 'West'],
'CUS_TYPE': ['Individual', 'Individual', 'Individual', 'Individual', 'Individual', 'Individual']}
df = pd.DataFrame(data)
# Count unique CUS_ID based on BR_NM and REGION
counts = df.groupby(['REGION', 'BR_NM'])['CUS_ID'].nunique().reset_index()
# Sum counts based on REGION
result = counts.groupby('REGION')['CUS_ID'].sum().reset_index()
result.columns = ['REGION', 'COUNT']
print(result)
Output:
REGION COUNT
0 North 2
1 South 2
2 West 1
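The two groupby steps above can also be chained into a single expression; a sketch using the same sample data:

```python
import pandas as pd

data = {'BR_NM': ['HN', 'HN', 'HP', 'HP', 'HA', 'HA'],
        'CUS_ID': ['12345', '12345', '12345', '12345', '12346', '12347'],
        'ACC_ID': ['12345_1', '12345_2', '12345_3', '12345_4', '12345_5', '12345_6'],
        'REGION': ['North', 'North', 'North', 'South', 'South', 'West'],
        'CUS_TYPE': ['Individual'] * 6}
df = pd.DataFrame(data)

# Unique CUS_ID per (REGION, BR_NM), then summed per REGION, in one chain.
result = (df.groupby(['REGION', 'BR_NM'])['CUS_ID'].nunique()
            .groupby('REGION').sum()
            .reset_index(name='COUNT'))
print(result)
```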
Comments
0 upvotes
hoa tran
11/16/2023
Thanks, it worked, but the accepted answer is simpler.