Python pandas DataFrame: create a column if strings from a list are contained in another column

Asked by: highbury Asked: 7/3/2022 Updated: 7/4/2022 Views: 56

Q:

Given this DataFrame:

import pandas as pd

data = {'Description':  ['with milk and orange', 'champagne', 'BANANA', 'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
        'Amount': ['10', '20', '10', '5', '9', '15']}
df = pd.DataFrame(data)
print(df)

and the following list:

Fruits = ['apple','banana','lemon', 'orange']

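How can I get the 'Fruit' column? (Search the Description column for every element of the Fruits list and add each one that is contained in the description to the 'Fruit' column.)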

datanew = {'Description':  ['with milk and orange', 'champagne', 'BANANA', 'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
        'Amount': ['10', '20', '10', '5', '9', '15'],
        'Fruit':  ['orange', '', 'banana', 'banana-apple', 'lemon', 'lemon']}
df2 = pd.DataFrame(datanew)
print (df2)
Tags: python pandas string search match

A:

1 upvote mozway 7/3/2022 #1

You can use str.extractall and groupby.agg:

import re
df['Fruit'] = (df['Description']
               .str.extractall(f"({'|'.join(Fruits)})", flags=re.I)
               .groupby(level=0).agg('-'.join)[0]
               .str.lower()
              )

Output:

            Description Amount         Fruit
0  with milk and orange     10        orange
1             champagne     20           NaN
2                BANANA     10        banana
3     bananas and apple      5  banana-apple
4          fafsa Lemons      9         lemon
5             GIN LEMON     15         lemon
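
Note that the row with no match (champagne) comes out as NaN here, while the question's expected frame has an empty string there. One way to close that gap, as a minimal sketch of the same extractall approach, is to reindex the joined result against the original index with a fill value. The intermediate names pattern, matches, and joined are introduced only for illustration.

```python
import re
import pandas as pd

data = {'Description': ['with milk and orange', 'champagne', 'BANANA',
                        'bananas and apple', 'fafsa Lemons', 'GIN LEMON'],
        'Amount': ['10', '20', '10', '5', '9', '15']}
df = pd.DataFrame(data)
Fruits = ['apple', 'banana', 'lemon', 'orange']

# One capture group that matches any of the fruit names
pattern = f"({'|'.join(Fruits)})"

# One row per match, indexed by (original row, match number)
matches = df['Description'].str.extractall(pattern, flags=re.I)

# Join all matches of the same original row with '-', then lowercase
joined = matches.groupby(level=0).agg('-'.join)[0].str.lower()

# Rows without any match are absent from `joined`; fill them with ''
df['Fruit'] = joined.reindex(df.index, fill_value='')
print(df)
```

Calling .fillna('') on the assigned column should give the same result; reindex just makes the missing rows explicit.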

1 upvote rhug123 7/4/2022 #2

Here is another approach, using str.findall():

(df.assign(Fruit=df['Description'].str.lower()
                                  .str.findall('|'.join(Fruits))
                                  .str.join('-')))

            Description Amount         Fruit
0  with milk and orange     10        orange
1             champagne     20              
2                BANANA     10        banana
3     bananas and apple      5  banana-apple
4          fafsa Lemons      9         lemon
5             GIN LEMON     15         lemon
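
Both answers build the regex by joining the list entries directly, and both keep repeated matches. If the list might contain regex metacharacters, or a description might mention the same fruit twice, a small helper along the following lines could be used instead. This is only a sketch: find_fruits and its sep parameter are hypothetical names, not part of either answer, and matching is still by substring, so a word like "pineapple" would also yield "apple".

```python
import re

Fruits = ['apple', 'banana', 'lemon', 'orange']

# re.escape guards against list entries that contain regex metacharacters
pattern = re.compile('|'.join(re.escape(f) for f in Fruits), flags=re.IGNORECASE)

def find_fruits(text, sep='-'):
    """Return the distinct fruits found in text, lowercased,
    in order of first appearance and joined with sep."""
    found = (m.lower() for m in pattern.findall(text))
    # dict.fromkeys drops duplicates while preserving order
    return sep.join(dict.fromkeys(found))

descriptions = ['with milk and orange', 'champagne', 'BANANA',
                'bananas and apple', 'fafsa Lemons', 'GIN LEMON']
result = [find_fruits(d) for d in descriptions]
print(result)
```

With the frame from the question, this could be applied as df['Fruit'] = df['Description'].map(find_fruits).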