Create new columns based on multiple conditions [duplicate]

Question:

Closed 17 days ago.

I have a dataframe with 3 columns. My goal is to create 2 new columns based on the conditions listed below.

ID  A   B     C  New COL1  New COL2
1   99  0.1   5
2   35  0.4   6
3   60  0.9   3
4   99  0.04  0

# Simplified sketch of the attempted logic (not working code)
if(df['a'] > 50 & df['b'] >= 0.04 and df['c'] ==0):
    df['nc1'] = 'message1'
    df['nc2'] = 'message2'
elif(df['a'] >= 50 & df['b'] < 0.04 and df['c'] > 0):
    df['nc1'] = 'message3'
    df['nc2'] = 'message4'
else:
    df['nc1'] = None
    df['nc2'] = None

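When I use only one condition in the if/else structure it works, but when I add more conditions the code fails with the error 'The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'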
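A minimal reproduction of that error, assuming the same column names a, b and c: the and keyword (like a plain if) asks Python for a single True/False from a whole Series, which pandas refuses to guess. Comparing rows means building an element-wise boolean mask instead. Note also that & binds more tightly than > and >= in Python, so each comparison needs its own parentheses before the results are combined.

```python
import pandas as pd

df = pd.DataFrame({'a': [99, 35, 60, 99], 'b': [0.1, 0.4, 0.9, 0.04], 'c': [5, 6, 3, 0]})

# `and` needs one True/False for the whole Series, so pandas raises ValueError
error_message = None
try:
    if (df['a'] > 50) and (df['b'] >= 0.04):
        pass
except ValueError as exc:
    error_message = str(exc)

# Parenthesize each comparison, then combine them element-wise with & instead of `and`
mask = (df['a'] > 50) & (df['b'] >= 0.04) & (df['c'] == 0)

print(error_message)
print(mask.tolist())
```

The mask holds one boolean per row, which is the kind of selector that .loc or np.where expects in place of an if statement.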

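I tried changing the code to build the conditions first and then use np.where to create the new columns, but I don't think that is the right way to solve it.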
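For comparison, a sketch of a vectorized version using numpy.select, which extends the np.where idea to several conditions (swapped in here; the question's own np.where attempt is not shown). Each entry of conditions is a boolean Series, and for every row np.select takes the choice belonging to the first true condition, or default when none match. Using None as the "null" value is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [99, 35, 60, 99, 51],
                   'b': [0.1, 0.4, 0.9, 0.04, 0.02],
                   'c': [5, 6, 3, 0, 1]})

# One boolean Series per branch of the original if/elif
conditions = [
    (df['a'] > 50) & (df['b'] >= 0.04) & (df['c'] == 0),
    (df['a'] >= 50) & (df['b'] < 0.04) & (df['c'] > 0),
]

# np.select uses the choice of the first matching condition; rows matching none get `default`
df['nc1'] = np.select(conditions, ['message1', 'message3'], default=None)
df['nc2'] = np.select(conditions, ['message2', 'message4'], default=None)

print(df)
```

The order of the conditions list plays the role of if/elif priority: a row that satisfies both conditions gets the messages of the first one.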

Can anyone help with this? This is for work, so I can't paste the actual code, but this gives the overall idea. Thanks in advance.

python if-statement conditional-statements

Answer:

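import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[99, 35, 60, 99, 51], 'b':[0.1, 0.4, 0.9, 0.04, 0.02], 'c':[5, 6, 3, 0, 1]})
# Start the new columns as object dtype filled with NaN, so string messages can be stored in them
df['nc1'] = pd.Series(np.nan, index=df.index, dtype=object)
df['nc2'] = pd.Series(np.nan, index=df.index, dtype=object)

# Each comparison gives one boolean per row
a_mask1 = df['a'] > 50
b_mask1 = df['b'] >= 0.04
c_mask1 = df['c'] == 0

a_mask2 = df['a'] >= 50
b_mask2 = df['b'] < 0.04
c_mask2 = df['c'] > 0

# Combine element-wise with & (not `and`) to get one mask per if/elif branch
mask1 = a_mask1 & b_mask1 & c_mask1
mask2 = a_mask2 & b_mask2 & c_mask2

# Use .loc; chained indexing such as df['nc1'][mask1] = ... may not write back to df
# set the values for nc1
df.loc[mask1, 'nc1'] = 'message1'
df.loc[mask2, 'nc1'] = 'message3'
# set the values for nc2
df.loc[mask1, 'nc2'] = 'message2'
df.loc[mask2, 'nc2'] = 'message4'

df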

In response to your comment about having a long list of conditions, maybe this will solve your problem?

import operator

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[99, 35, 60, 99, 51], 'b':[0.1, 0.4, 0.9, 0.04, 0.02], 'c':[5, 6, 3, 0, 1]})

# Snapshot the source columns before any new columns are added
source_col_names = list(df.columns)

n_new_cols = 3
new_col_names = [f'nc{x+1}' for x in range(n_new_cols)]
# Object dtype so the new columns can hold string messages alongside NaN
for new_col_name in new_col_names:
    df[new_col_name] = pd.Series(np.nan, index=df.index, dtype=object)

# Map the operator strings used in `conditions` to comparison functions
operators = {
    '<': operator.lt,
    '<=': operator.le,
    '==': operator.eq,
    '>=': operator.ge,
    '>': operator.gt,
}

conditions = [
    {
        'a': {'operator':'>','threshold':50},
        'b': {'operator':'>=','threshold':0.04},
        'c': {'operator':'==','threshold':0},
        'messages': ['message1', 'message2', 'rtyu'], # one for every new column added
    },
    {
        'a': {'operator':'>=','threshold':50},
        'b': {'operator':'<','threshold':0.04},
        'c': {'operator':'>','threshold':0},
        'messages': ['message3', 'message4', '231234'], # one for every new column added
    },
    {
        'a': {'operator':'==','threshold':60},
        'messages': ['xyz', 'rghyt', 'r642'], # one for every new column added
    },
]

for condition in conditions:
    # Start with every row selected, then AND in one mask per source column named in the condition
    composite_mask = pd.Series(True, index=df.index)
    for source_col_name in source_col_names:
        if source_col_name in condition:
            op_str = condition[source_col_name]['operator']
            threshold = condition[source_col_name]['threshold']

            op_func = operators[op_str]
            mask = op_func(df[source_col_name], threshold)

            composite_mask = composite_mask & mask
    # Write one message per new column; a later condition overwrites an earlier one where both match
    for new_col_num, new_col_name in enumerate(new_col_names):
        df.loc[composite_mask, new_col_name] = condition['messages'][new_col_num]

df

There is probably a more efficient or Pythonic way to write the loops; this is a brute-force approach.

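This assumes you will have to write out your conditions at some point anyway, so you can add each one as a new entry (a dictionary) in the conditions list.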
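As an aside, if the conditions have to be written out anyway, another option (my suggestion, not part of the code above) is to store each one as a pandas expression string and evaluate it with DataFrame.eval, which returns a boolean Series for comparison expressions. The rules list and its contents are hypothetical; each rule pairs an expression with one message per new column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [99, 35, 60, 99, 51],
                   'b': [0.1, 0.4, 0.9, 0.04, 0.02],
                   'c': [5, 6, 3, 0, 1]})

new_col_names = ['nc1', 'nc2']

# Hypothetical rules: (pandas expression, one message per new column)
rules = [
    ('(a > 50) & (b >= 0.04) & (c == 0)', ['message1', 'message2']),
    ('(a >= 50) & (b < 0.04) & (c > 0)', ['message3', 'message4']),
    ('a == 60', ['xyz', 'rghyt']),
]

# Object dtype so string messages can be stored alongside NaN
for name in new_col_names:
    df[name] = pd.Series(np.nan, index=df.index, dtype=object)

for expr, messages in rules:
    mask = df.eval(expr)  # boolean Series, one value per row
    # Later rules overwrite earlier ones on rows where both match
    for name, message in zip(new_col_names, messages):
        df.loc[mask, name] = message

print(df)
```

The trade-off is that the conditions become strings, so a misspelled column name only surfaces as an error when that rule is evaluated.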

You can also use any number of new columns: each condition supplies one message per new column, taken from the messages list in that entry of conditions.

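This should work for any number of source data columns, and you don't need to give a condition for every source column. Because of the loops, though, a large number of source and result columns will make it somewhat slow.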
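A sketch that keeps the same conditions structure but builds each condition's mask only once and fills every new column with numpy.select, assuming the operator/threshold dictionaries shown above. Because the loop version lets later conditions overwrite earlier ones, the list is reversed here so that np.select, which takes the first match, gives the same priority. The helper name build_mask is hypothetical.

```python
import operator

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [99, 35, 60, 99, 51],
                   'b': [0.1, 0.4, 0.9, 0.04, 0.02],
                   'c': [5, 6, 3, 0, 1]})

operators = {'<': operator.lt, '<=': operator.le, '==': operator.eq,
             '>=': operator.ge, '>': operator.gt}

conditions = [
    {'a': {'operator': '>', 'threshold': 50},
     'b': {'operator': '>=', 'threshold': 0.04},
     'c': {'operator': '==', 'threshold': 0},
     'messages': ['message1', 'message2', 'rtyu']},
    {'a': {'operator': '>=', 'threshold': 50},
     'b': {'operator': '<', 'threshold': 0.04},
     'c': {'operator': '>', 'threshold': 0},
     'messages': ['message3', 'message4', '231234']},
    {'a': {'operator': '==', 'threshold': 60},
     'messages': ['xyz', 'rghyt', 'r642']},
]
new_col_names = [f'nc{i + 1}' for i in range(len(conditions[0]['messages']))]


def build_mask(frame, condition):
    """Hypothetical helper: AND together one comparison per column named in `condition`."""
    mask = pd.Series(True, index=frame.index)
    for col, rule in condition.items():
        if col == 'messages':
            continue
        mask &= operators[rule['operator']](frame[col], rule['threshold'])
    return mask


# Reverse so the last matching condition wins, as in the loop-based version
ordered = conditions[::-1]
masks = [build_mask(df, cond) for cond in ordered]
for i, name in enumerate(new_col_names):
    df[name] = np.select(masks, [cond['messages'][i] for cond in ordered], default=None)

print(df)
```

Each mask is built once and reused for every new column, instead of being rebuilt inside the new-column loop.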
