Asked by: balaji Asked: 11/7/2023 Updated: 11/8/2023 Views: 62
How to calculate counts in a Pandas DataFrame based on specific age ranges
Q:
This is my pandas DataFrame:
S_No G_Id Contact Name Phone Created Time Modified Time G_Source
1 5636657 NIRMALKUMAR 9887515305 2023-06-08 12:28:52 2023-10-30 09:38:19 JMB
2 5748102 Prathapraj 8638333479 2023-07-03 11:41:30 2023-09-12 18:06:32 GDP
3 6123335 Anitha 9840561308 2023-09-11 10:41:51 2023-10-31 22:00:17 CM
4 6132827 Krishnan D 91-9841384408 2023-09-11 14:19:05 2023-10-31 13:34:58 WEB
5 6224152 Kalpana V 91-6399411710 2023-09-30 12:50:11 2023-10-30 15:31:30 WEB
6 6219615 ADIDRAVIDARS 9361071388 2023-10-06 12:49:57 2023-10-28 13:32:41 EMAIL
7 6291239 Samson 9841333789 2023-10-09 11:10:05 2023-10-27 19:14:05 GDP
8 6263310 Kumar 9176573262 2023-10-09 13:44:37 2023-10-09 13:44:38 GDP
9 6347692 Madhumitha 9789403376 2023-10-16 12:46:36 2023-10-16 12:46:38 GDP
10 6346755 Niranjan R 91-9960213231 2023-10-28 23:28:20 2023-10-29 20:10:45 WEB
11 6357752 K.N. Saravanan 9841920213 2023-10-30 10:24:31 2023-10-30 10:24:32 GDP
12 6351222 SusilaDevi 9941813960 2023-10-30 12:16:11 2023-10-30 12:25:01 GDP
13 6366717 Pon Mudhrai Muthu 9956954152 2023-10-30 12:24:55 2023-10-30 12:24:57 GDP
14 6399804 A.Suresh 9677003529 2023-10-30 13:47:22 2023-10-31 18:13:29 GDP
15 6363839 M. Ganesan 9176146184 2023-10-30 17:14:05 2023-10-30 17:14:08 COLLOTH
16 6365628 Niranjan R 91-9943213231 2023-10-31 09:02:30 2023-10-31 14:27:23 WEB
17 6369042 Bhuneshwari 9176375400 2023-10-31 12:13:41 2023-10-31 12:13:43 COLLOTH
18 6369042 Balaji 9177395400 2023-10-31 12:13:41 2023-10-31 12:13:43 COLLOTH
I want the result in the following format, based on the Created Time column of my DataFrame.
Can anyone help?
Thanks.
Answers:
0 votes
JulienV
11/7/2023
#1
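First, you need to parse the dates when reading the CSV:
import datetime
import pandas as pd
# Columns 4 and 5 are "Created Time" and "Modified Time"
df = pd.read_csv("table.csv", parse_dates=[4, 5])
Then I would recommend adding a column that classifies each record into its corresponding range category: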
def get_range_interval(creation_date):
    # Age of the record relative to the current time
    current_date = datetime.datetime.now()
    if current_date - creation_date < pd.Timedelta(30, "D"):
        range = 0
    elif current_date - creation_date < pd.Timedelta(90, "D"):
        range = 1
    ...  # add further elif branches here so every age maps to a range
    return range
df["range_interval"] = df["Created Time"].map(get_range_interval)
Then aggregate over each range group. Something like this:
df_result = df.groupby(["G_Source", "range_interval"]).count()
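As a minimal sketch of how the function above might be completed, the version below gives every age a label and ends with a fallback return, so no code path leaves the result unassigned. The day thresholds (15, 30, 90, 180, 365), the label strings, the extra `now` parameter, the fixed reference date, and the four-row sample are all assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def get_range_interval(creation_date, now):
    """Return a label describing how long before `now` the record was created."""
    age = now - creation_date
    if age < pd.Timedelta(15, "D"):
        return "1-15 days"
    elif age < pd.Timedelta(30, "D"):
        return "16-30 days"
    elif age < pd.Timedelta(90, "D"):
        return "1-3 months"
    elif age < pd.Timedelta(180, "D"):
        return "3-6 months"
    elif age < pd.Timedelta(365, "D"):
        return "6 months - 1 year"
    # Fallback: every path returns a value, which avoids UnboundLocalError
    return "> 1 year"

# Hypothetical sample built from four rows of the question's data
df = pd.DataFrame({
    "G_Source": ["GDP", "GDP", "WEB", "JMB"],
    "Created Time": pd.to_datetime([
        "2023-10-30 10:24:31", "2023-10-09 11:10:05",
        "2023-09-11 14:19:05", "2023-06-08 12:28:52",
    ]),
})

# Fixed reference date (an assumption) so the result is reproducible
now = pd.Timestamp("2023-11-07")
df["range_interval"] = df["Created Time"].map(lambda d: get_range_interval(d, now))

# size() yields one count per (source, range) pair
counts = df.groupby(["G_Source", "range_interval"]).size()
```

Here `size()` is used instead of `count()` because it returns a single number per group, whereas `count()` returns the number of non-null values for each remaining column.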
Comments
0 votes
balaji
11/7/2023
UnboundLocalError: cannot access local variable 'range' where it is not associated with a value - this error is shown as the result for my data frame..............
0 votes
JulienV
11/7/2023
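Yes. This is only an example to be completed. It does not cover all the interval ranges. Please add the elif statements to get_range_interval() that are needed to cover all ranges.
1 vote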
Corralien
11/7/2023
#2
IIUC, you want to pivot your dataframe?
import numpy as np
import pandas as pd
df['Created Time'] = pd.to_datetime(df['Created Time'])
df['Modified Time'] = pd.to_datetime(df['Modified Time'])
bins = [0, 15, 30, 90, 180, 360, np.inf]
labels = ['1-15 days', '16-30 days', '1-3 months', '3-6 months', '6 months - 1 year', '> 1 year']
diff = (pd.Timestamp.today() - df['Created Time']).dt.days
age = pd.cut(diff, bins=bins, labels=labels)
out = pd.crosstab(df['G_Source'], age, margins=True)
Output:
>>> out
Created Time 1-15 days 16-30 days 1-3 months 3-6 months All
G_Source
CM 0 0 1 0 1
COLLOTH 3 0 0 0 3
EMAIL 0 0 1 0 1
GDP 4 3 0 1 8
JMB 0 0 0 1 1
WEB 2 0 2 0 4
All 9 3 4 2 18
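One detail of the binning above may be worth checking: by default `pd.cut` builds right-closed intervals, so the first bin is (0, 15] and an age of exactly 0 days falls outside every bin. The sketch below reuses the same `bins` and `labels` on a hypothetical series of day counts and compares the default behavior with `include_lowest=True`. The sample values are assumptions chosen to land on the bin edges.

```python
import numpy as np
import pandas as pd

bins = [0, 15, 30, 90, 180, 360, np.inf]
labels = ['1-15 days', '16-30 days', '1-3 months', '3-6 months',
          '6 months - 1 year', '> 1 year']

# Hypothetical ages in days, chosen to sit on or next to the bin edges
days = pd.Series([0, 1, 15, 16, 30, 31, 400])

# Default: intervals are (left, right], so 0 is not in (0, 15]
default = pd.cut(days, bins=bins, labels=labels)

# include_lowest=True makes the first interval include its left edge
inclusive = pd.cut(days, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"days": days, "default": default, "inclusive": inclusive}))
```

With the default settings, a record created less than a day ago would get no category and would therefore be missing from the crosstab counts. Whether that matters depends on the data.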
Update
A more accurate version using pd.DateOffset:
today = pd.Timestamp.today()
offsets = [pd.DateOffset(years=1), pd.DateOffset(months=6), pd.DateOffset(months=3),
           pd.DateOffset(months=1), pd.DateOffset(days=15), pd.DateOffset(days=1)]
bins = today - pd.Series(offsets)
labels = ['6 months - 1 year', '3-6 months', '1-3 months', '16-30 days', '1-15 days']
age = pd.cut(df['Created Time'], bins=bins, labels=labels)
out = (pd.crosstab(df['G_Source'], age, margins=True)
         .reindex(columns=labels[::-1] + ['All'], fill_value=0))
Output:
>>> out
Created Time 1-15 days 16-30 days 1-3 months 3-6 months 6 months - 1 year All
G_Source
CM 0 0 1 0 0 1
COLLOTH 3 0 0 0 0 3
EMAIL 0 0 1 0 0 1
GDP 4 3 0 1 0 8
JMB 0 0 0 1 0 1
WEB 2 0 2 0 0 4
All 9 3 4 2 0 18
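To illustrate why the `pd.DateOffset` version can be called more accurate, the sketch below compares a calendar-aware offset of three months with a fixed span of 90 days. The reference date 2023-11-07 is an assumption, picked so the result is reproducible.

```python
import pandas as pd

# Fixed reference date (an assumption) instead of pd.Timestamp.today()
today = pd.Timestamp("2023-11-07")

# Calendar-aware: steps back three calendar months, keeping the day of month
calendar_edge = today - pd.DateOffset(months=3)

# Fixed length: always exactly 90 days, whatever the month lengths are
fixed_edge = today - pd.Timedelta(days=90)

# Difference between the two bin edges, in whole days
gap_days = (fixed_edge - calendar_edge).days

print(calendar_edge.date(), fixed_edge.date(), gap_days)
```

For this date the two edges differ by two days, so a record created on 2023-08-08 would be counted as "1-3 months" with the calendar offset but as "3-6 months" with the 90-day threshold.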
Comments
0 votes
balaji
11/8/2023
Thank you very much for your help, I really appreciate the reply.
0 votes
balaji
11/8/2023
Special thanks to Corralien.........................
0 votes
balaji
11/8/2023
This code works very well.
1 vote
Corralien
11/8/2023
You're welcome. Glad to help :-). If this answer solved your problem, don't forget to accept it to indicate that your question has a solution.
0 votes
balaji
11/8/2023
A small question: the answer above is fine, but how can I view the records behind the All totals (9, 3, 2, 4, 2, 18)?