Asked by: highbury Asked: 7/3/2022 Updated: 7/4/2022 Views: 56
Python pandas DataFrame: create a column if strings from a list are contained in another column
Q:
Given this DataFrame:
import pandas as pd

data = {'Description': ['with milk and orange', 'champagne', 'BANANA', 'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
        'Amount': ['10', '20', '10', '5', '9', '15']}
df = pd.DataFrame(data)
print(df)
and the following list:
Fruits = ['apple','banana','lemon', 'orange']
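How can I get a "Fruit" column? (Search the Description column for every element of the Fruits list and, where an element is contained in the description, add it to the "Fruit" column.) Expected result: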
datanew = {'Description': ['with milk and orange', 'champagne', 'BANANA', 'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
           'Amount': ['10', '20', '10', '5', '9', '15'],
           'Fruit': ['orange', '', 'banana', 'banana-apple', 'lemon', 'lemon'],
           }
df2 = pd.DataFrame(datanew)
print(df2)
Answers:
1 upvote
mozway
7/3/2022
#1
You can use str.extractall and groupby.agg:
import re

df['Fruit'] = (df['Description']
               .str.extractall(f"({'|'.join(Fruits)})", flags=re.I)
               .groupby(level=0).agg('-'.join)[0]
               .str.lower()
               )
Output:
            Description Amount         Fruit
0  with milk and orange     10        orange
1             champagne     20           NaN
2                BANANA     10        banana
3     bananas and apple      5  banana-apple
4          fafsa Lemons      9         lemon
5             GIN LEMON     15         lemon
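Note that the row without a match holds NaN in this output, whereas the question asked for an empty string there. Below is a minimal sketch of one way to close that gap, assuming the same df and Fruits as in the question. The names pattern, matches and joined are illustrative only, and re.escape is added purely as a precaution in case a search term ever contains regex metacharacters.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'Description': ['with milk and orange', 'champagne', 'BANANA',
                    'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
    'Amount': ['10', '20', '10', '5', '9', '15'],
})
Fruits = ['apple', 'banana', 'lemon', 'orange']

# One capture group that matches any of the fruit names
pattern = f"({'|'.join(map(re.escape, Fruits))})"

# extractall returns one row per match, indexed by (original row, match number)
matches = df['Description'].str.extractall(pattern, flags=re.I)

# Lower-case the hits and join them per original row
joined = matches[0].str.lower().groupby(level=0).agg('-'.join)

# Rows with no match are missing from `joined`; reindex fills them with ''
df['Fruit'] = joined.reindex(df.index, fill_value='')
print(df)
```

Reindexing against df.index is one option; calling .fillna('') on the assigned column would presumably work as well.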
1 upvote
rhug123
7/4/2022
#2
Here is another approach, using str.findall():
(df.assign(Fruit=df['Description'].str.lower()
                                   .str.findall('|'.join(Fruits))
                                   .str.join('-')))
            Description Amount         Fruit
0  with milk and orange     10        orange
1             champagne     20
2                BANANA     10        banana
3     bananas and apple      5  banana-apple
4          fafsa Lemons      9         lemon
5             GIN LEMON     15         lemon
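A self-contained variant of the findall idea, offered as a sketch rather than as the answerer's method: it passes flags=re.IGNORECASE to str.findall instead of lower-casing the column first, and it drops repeated hits so that a description mentioning the same fruit twice lists it once. The helper join_unique is a hypothetical name introduced here, not part of pandas.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'Description': ['with milk and orange', 'champagne', 'BANANA',
                    'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
    'Amount': ['10', '20', '10', '5', '9', '15'],
})
Fruits = ['apple', 'banana', 'lemon', 'orange']

# Alternation of the escaped fruit names (no capture group needed for findall)
pattern = '|'.join(map(re.escape, Fruits))


def join_unique(hits):
    """Lower-case the hits, drop duplicates (keeping first-seen order), join with '-'."""
    return '-'.join(dict.fromkeys(h.lower() for h in hits))


# findall yields a list of matches per row; an empty list becomes ''
found = df['Description'].str.findall(pattern, flags=re.IGNORECASE)
df['Fruit'] = found.apply(join_unique)
print(df)
```

Because an empty list joins to an empty string, rows without a match come out as '' here, with no separate fill step.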