Asked by: Pete  Asked: 11/13/2023  Updated: 11/13/2023  Views: 57
Using a lambda function, how do I iterate over columns that hold list values in a pandas DataFrame?
Question:
import pandas as pd
mydata = {"Key": [567, 568, 569, 570, 571, 572], "Sprint": ["Max1;Max2", "Max2", "DI001 2", "DI001 25", "DAS 100", "DI001 101"]}
df = pd.DataFrame(mydata)
df["sprintlist"] = df["Sprint"].str.split(";")
print(df)
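From this DataFrame, for each value in the lists in the "sprintlist" column, I want to extract only the number at the end of the string into a new list column "Sprintnumb", as shown below.
Expected output:
In one of my earlier questions, I learned how to extract the number when the "Sprint" column holds only a single value. I tried to use a lambda function to get the desired output, but got the error "'str' object has no attribute 'str'".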
df["Sprint Number"] = df.Sprint.str.extract(r"(\d+)$").astype(int)
Answers:
1 vote
jezrael
11/13/2023
#1
Use Series.explode with Series.str.extractall, convert the matches to integers, and aggregate them back into lists:
df["Sprint Number"] = (df["sprintlist"].explode()
.str.extractall(r"(\d+)$")[0]
.astype(int)
.groupby(level=0)
.agg(list))
print (df)
Key Sprint sprintlist Sprint Number
0 567 Max1;Max2 [Max1, Max2] [1, 2]
1 568 Max2 [Max2] [2]
2 569 DI001 2 [DI001 2] [2]
3 570 DI001 25 [DI001 25] [25]
4 571 DAS 100 [DAS 100] [100]
5 572 DI001 101 [DI001 101] [101]
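To see why groupby(level=0) turns the matches back into one list per row, here is a small sketch that inspects the intermediate results on a two-row Series (sample data chosen for illustration). explode repeats the original row label for each list item, and extractall returns a MultiIndex of (original label, match number), so grouping on level 0 collects all matches that came from the same original row:

```python
import pandas as pd

s = pd.Series([["Max1", "Max2"], ["DI001 2"]])

# explode gives one row per list item; each item keeps its original row label
exploded = s.explode()
print(exploded.index.tolist())  # [0, 0, 1]

# extractall indexes matches by (original label, match number); take group 0
matches = exploded.str.extractall(r"(\d+)$")[0]
print(matches.index.get_level_values(0).tolist())  # [0, 0, 1]

# Group by the original row label (level 0) and collect the ints into lists
result = matches.astype(int).groupby(level=0).agg(list)
print(result.tolist())  # [[1, 2], [2]]
```

A row in which no item ends with digits has no entry in result, so assigning the result back to a DataFrame column leaves NaN for that row.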
Or use a list comprehension with a regex:
df["Sprint Number"] = [[int(re.search('(\d+)$', y).group(0)) for y in x]
for x in df["sprintlist"]]
print (df)
Key Sprint sprintlist Sprint Number
0 567 Max1;Max2 [Max1, Max2] [1, 2]
1 568 Max2 [Max2] [2]
2 569 DI001 2 [DI001 2] [2]
3 570 DI001 25 [DI001 25] [25]
4 571 DAS 100 [DAS 100] [100]
5 572 DI001 101 [DI001 101] [101]
If some strings might not end with a number, add a test for None using the assignment expression operator := (walrus operator):
import re
import pandas as pd

mydata = {"Key": [567, 568, 569, 570, 571, 572],
          "Sprint": ["Max1;Max", "Max2", "DI001 2", "DI001 25", "DAS 100", "DI001 101"]}
df = pd.DataFrame(mydata)
df["sprintlist"] = df["Sprint"].str.split(";")
df["Sprint Number"] = [[int(m.group(0))
                        for y in x if (m := re.search(r'(\d+)$', y)) is not None]
                       for x in df["sprintlist"]]
print(df)
Key Sprint sprintlist Sprint Number
0 567 Max1;Max [Max1, Max] [1]
1 568 Max2 [Max2] [2]
2 569 DI001 2 [DI001 2] [2]
3 570 DI001 25 [DI001 25] [25]
4 571 DAS 100 [DAS 100] [100]
5 572 DI001 101 [DI001 101] [101]
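The := operator requires Python 3.8 or newer. On older versions, a sketch of the same filtering moves the search into a small helper; trailing_number and TRAILING_DIGITS are hypothetical names introduced here for illustration, and the two-row DataFrame is sample data:

```python
import re

import pandas as pd

TRAILING_DIGITS = re.compile(r"(\d+)$")


def trailing_number(s):
    # Hypothetical helper: the trailing integer of s, or None when s does not end in digits
    m = TRAILING_DIGITS.search(s)
    return int(m.group(1)) if m is not None else None


df = pd.DataFrame({"Key": [567, 568], "Sprint": ["Max1;Max", "Max2"]})
df["sprintlist"] = df["Sprint"].str.split(";")
# Call the helper once per item and drop items without a trailing number
df["Sprint Number"] = [[n for n in map(trailing_number, x) if n is not None]
                       for x in df["sprintlist"]]
print(df["Sprint Number"].tolist())  # [[1], [2]]
```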
0 votes
mozway
11/13/2023
#2
Use str.findall with a lookahead:
df['Sprint'].str.findall(r'\d+(?=$|\s*;)')
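str.findall returns a list of strings per row, so converting to integers still needs a per-row step. A minimal sketch using apply with a lambda (as the question wanted), on the "Sprint" values from the question; it works directly on "Sprint" with no split step, since the lookahead accepts either the end of the string or a following ";":

```python
import pandas as pd

df = pd.DataFrame({"Sprint": ["Max1;Max2", "Max2", "DI001 2", "DI001 25", "DAS 100", "DI001 101"]})

# Digits followed by the end of the string, or by optional whitespace and ";"
found = df["Sprint"].str.findall(r"\d+(?=$|\s*;)")
print(found.tolist()[0])  # ['1', '2']

# Each element of found is a Python list, so convert its items inside the lambda
df["Sprintnumb"] = found.apply(lambda lst: [int(v) for v in lst])
print(df["Sprintnumb"].tolist())  # [[1, 2], [2], [2], [25], [100], [101]]
```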
Or, for a custom format (converting to int or joining into a string):
import re
pat = re.compile(r'\d+(?=$|\s*;)')
df['Sprintbumb'] = [';'.join(pat.findall(s)) for s in df['Sprint']]
# or
df['Sprintbumb'] = [list(map(int, pat.findall(s))) for s in df['Sprint']]
Output:
Key Sprint Sprintbumb
0 567 Max1;Max2 [1, 2]
1 568 Max2 [2]
2 569 DI001 2 [2]
3 570 DI001 25 [25]
4 571 DAS 100 [100]
5 572 DI001 101 [101]
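The ';'.join variant returns strings rather than lists, and because of the \s* in the lookahead, whitespace between a number and a following ';' is tolerated. A quick check on plain Python strings, where "Max1 ;Max2" is an extra sample added for illustration:

```python
import re

pat = re.compile(r"\d+(?=$|\s*;)")
sprints = ["Max1;Max2", "Max2", "DI001 2", "DI001 25", "DAS 100", "DI001 101"]

# Join the matches of each string with ";" (strings, not lists of ints)
joined = [";".join(pat.findall(s)) for s in sprints]
print(joined)  # ['1;2', '2', '2', '25', '100', '101']

# \s* lets a space sit between the number and the ';'
print(pat.findall("Max1 ;Max2"))  # ['1', '2']
```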