Asked by: PineNuts0 Asked: 5/11/2017 Last edited by: PineNuts0 Updated: 11/23/2021 Views: 126037
Conditional If Statement: If value in row contains string ... set another column equal to string
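Q:
Edit:
My "Activity" column is filled with strings, and I want to derive the values in the "Activity_2" column using an if statement.
So Activity_2 shows the desired result. Essentially, I want to label what type of activity is taking place.
I tried to do this with the code below, but it won't run (see the error below). Any help is greatly appreciated!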
for i in df2['Activity']:
    if i contains 'email':
        df2['Activity_2'] = 'email'
    elif i contains 'conference'
        df2['Activity_2'] = 'conference'
    elif i contains 'call'
        df2['Activity_2'] = 'call'
    else:
        df2['Activity_2'] = 'task'
Error: if i contains 'email':
^
SyntaxError: invalid syntax
A:
38 votes
Psidom
5/11/2017
#1
I assume you are using pandas, in which case you can use numpy.where, which is a vectorized version of if/else, with the conditions built from str.contains:
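import numpy as np  # the pd.np alias was removed in pandas 2.0, so import numpy directly
# Nested np.where calls act as if / elif / else: the first matching condition wins
df['Activity_2'] = np.where(df.Activity.str.contains("email"), "email",
                   np.where(df.Activity.str.contains("conference"), "conference",
                   np.where(df.Activity.str.contains("call"), "call", "task")))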
df
# Activity Activity_2
#0 email personA email
#1 attend conference conference
#2 send email email
#3 call Sam call
#4 random text task
#5 random text task
#6 lwantto call call
3 votes
Prakash Palnati
5/11/2017
#2
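The syntax you use to check the string is invalid.
Try using
for i in df2['Activity']:
    if 'email' in i:
        # Caution: this assigns 'email' to the whole column, not only the current row
        df2['Activity_2'] = 'email'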
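A fuller sketch of the loop approach with the `in` operator, for comparison. The sample `df2` and the `labels` list are illustrative assumptions, not part of the original post. Labels are collected one per row and assigned to the column once at the end, so no iteration overwrites the whole column.

```python
import pandas as pd

# Hypothetical sample data modelled on the question
df2 = pd.DataFrame({'Activity': ['email personA', 'attend conference',
                                 'call Sam', 'random text']})

labels = []  # one label per row, in row order
for text in df2['Activity']:
    if 'email' in text:
        labels.append('email')
    elif 'conference' in text:
        labels.append('conference')
    elif 'call' in text:
        labels.append('call')
    else:
        labels.append('task')

# Assign the whole column once, after the loop
df2['Activity_2'] = labels
print(df2)
```

This is fine for small frames. The vectorized answers in this thread will usually be faster on large ones.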
14 votes
moshfiqur
12/8/2017
#3
This also works:
df.loc[df['Activity'].str.contains('email'), 'Activity_2'] = 'email'
df.loc[df['Activity'].str.contains('conference'), 'Activity_2'] = 'conference'
df.loc[df['Activity'].str.contains('call'), 'Activity_2'] = 'call'
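Two details are worth keeping in mind with the `.loc` approach: rows that match no keyword stay NaN unless a default is set first, and when a row matches several keywords the last assignment wins. A minimal sketch under those assumptions follows. The sample data and the reversed keyword order are illustrative choices, made so that 'email' takes priority as in the question's if/elif chain.

```python
import pandas as pd

# Hypothetical sample data; the last row matches both 'call' and 'email'
df = pd.DataFrame({'Activity': ['email personA', 'attend conference',
                                'call Sam', 'random text',
                                'call about email']})

# Start from the fallback label so unmatched rows are not left as NaN
df['Activity_2'] = 'task'

# Apply keywords from lowest to highest priority: later assignments overwrite earlier ones
for keyword in ['call', 'conference', 'email']:
    df.loc[df['Activity'].str.contains(keyword), 'Activity_2'] = keyword

print(df)
```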
11 votes
DovaX
5/13/2019
#4
The current solutions behave incorrectly if df contains NaN values. In that case, I suggest the following code, which worked for me
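import numpy as np  # the pd.np alias was removed in pandas 2.0, so import numpy directly
# Replace NaN with the placeholder "0" so that str.contains returns booleans only
# Caution: any Activity text that itself contains "0" is also labelled "None"
temp = df.Activity.fillna("0")
df['Activity_2'] = np.where(temp.str.contains("0"), "None",
                   np.where(temp.str.contains("email"), "email",
                   np.where(temp.str.contains("conference"), "conference",
                   np.where(temp.str.contains("call"), "call", "task"))))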
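An alternative sketch that avoids a placeholder string: `str.contains` accepts an `na` argument, so missing values can be treated as non-matches, and `isna()` can flag them explicitly. The sample data is hypothetical, and the "None" label is kept only to mirror the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
df = pd.DataFrame({'Activity': ['email personA', np.nan, 'call Sam', 'random text']})

activity = df['Activity']
# na=False makes missing values count as "no match" instead of propagating NaN
df['Activity_2'] = np.where(activity.isna(), 'None',
                   np.where(activity.str.contains('email', na=False), 'email',
                   np.where(activity.str.contains('conference', na=False), 'conference',
                   np.where(activity.str.contains('call', na=False), 'call', 'task'))))
print(df)
```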
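1 vote
Hedge92
6/4/2021
#5
Another solution can be found in a post by @unutbu. It also works well for creating conditional columns. To match your question, I changed the example from that post, replacing df['Set'] == Z with df['Activity'].str.contains('yourtext'). See the example below: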
import pandas as pd
import numpy as np
df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call foo']})
conditions = [
    df['Activity'].str.contains('email'),
    df['Activity'].str.contains('conference'),
    df['Activity'].str.contains('call')]
values = ['email', 'conference', 'call']
df['Activity_2'] = np.select(conditions, values, default='task')
print(df)
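You can find the original post here: Pandas conditional creation of a series/dataframe column
2 votes
Dave Liu
11/12/2021
#6
- Your code has errors: the "elif" lines are missing colons.
- You didn't mention that you're using Pandas, but that's the assumption I'm going with.
- My answer handles the default value, uses proper Python conventions, and is the most efficient, up to date, and easy to adapt for other activities.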
import pandas as pd

DEFAULT_ACTIVITY = 'task'

def assign_activity(todo_item):
    """Assign activity to raw text TODOs
    """
    activities = ['email', 'conference', 'call']
    for activity in activities:
        if activity in todo_item:
            return activity
    # Default value when no activity keyword matches
    return DEFAULT_ACTIVITY

df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call Charly'],
                   'Colleague': ['Knor', 'Koen', 'Hedge']})
# You should really come up with a better name than 'Activity_2', like 'Labels' or something.
df["Activity_2"] = df["Activity"].apply(assign_activity)
Comments
if i == 'email': df2['Activity_2'] = 'email'