Filter and replace strings with NaN in a DataFrame

Q:

I want to keep only the fruits that contain %G% or %P%. Any value that is not like %G% or %P% should be replaced with NaN.

Unique ID	Fruit1	Fruit2	Fruit3
1234	Banana	Peach	Guava
1235	Orange	Grape	NaN
1236	Pear	Papaya	Apricot
1237	Guava	NaN	NaN
1238	Kiwi	Cherry	Peach

My result needs to look like this:

Unique ID	Fruit1	Fruit2	Fruit3
1234	NaN	Peach	Guava
1235	NaN	Grape	NaN
1236	Pear	Papaya	NaN
1237	Guava	NaN	NaN
1238	NaN	NaN	Peach

python pandas numpy null nan

Comments

3 votes Michael Butscher 4/22/2023

Please show your own effort (code) as properly formatted text in the question.

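Answers:

0 votes Cryosin 4/22/2023 #1

import numpy as np

# Substrings to look for (the match is case-sensitive)
to_search = ["G", "P"]
# Keep a value only if it is a string containing "G" or "P"; otherwise return NaN.
# The isinstance check avoids a TypeError on existing NaN cells and on integers.
rep = lambda value: value if isinstance(value, str) and any(s in value for s in to_search) else np.nan
# Apply the function to the fruit columns only, so "Unique ID" is left untouched
# (DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
fruit_cols = ["Fruit1", "Fruit2", "Fruit3"]
df[fruit_cols] = df[fruit_cols].applymap(rep)

This should do the trick, assuming your DataFrame is called df.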
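For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. The sample data is reconstructed from the question, with English column names taken from the output shown in answer #3. The names `keep_if_match`, `fruit_cols`, and `result` are hypothetical and introduced only for illustration. It uses `Series.map` per column instead of `applymap`, which is a swapped-in choice rather than the answerer's code.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "Unique ID": [1234, 1235, 1236, 1237, 1238],
    "Fruit1": ["Banana", "Orange", "Pear", "Guava", "Kiwi"],
    "Fruit2": ["Peach", "Grape", "Papaya", np.nan, "Cherry"],
    "Fruit3": ["Guava", np.nan, "Apricot", np.nan, "Peach"],
})

to_search = ["G", "P"]
fruit_cols = ["Fruit1", "Fruit2", "Fruit3"]  # assumed column names


def keep_if_match(value):
    """Return value if it is a string containing any search term, else NaN."""
    if isinstance(value, str) and any(s in value for s in to_search):
        return value
    return np.nan


# Work on a copy and transform one fruit column at a time
result = df.copy()
for col in fruit_cols:
    result[col] = result[col].map(keep_if_match)

print(result)
```

The check is case-sensitive, so "Orange" (lowercase g) and "Apricot" (lowercase p) are replaced, which matches the result requested in the question.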

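Comments

0 votes Community 4/25/2023

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.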

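1 vote maximdu 4/22/2023 #2

If you only want to change certain columns, you can do the following:

to_replace = ["Fruit1", "Fruit2", "Fruit3"]  # these columns will be changed
for column in to_replace:
    # Keep values containing "G" or "P"; everything else becomes NaN.
    # na=False treats existing NaN cells as non-matches so the mask stays boolean.
    df[column] = df[column].where(
        df[column].str.contains("G|P", na=False)
    )

str.contains uses regular expressions by default, so this may be more convenient if you are familiar with them.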
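The same filtering can also be written without looping over columns. The following is a sketch under the same assumptions (sample data reconstructed from the question; `cols`, `mask`, `out`, and `loose` are illustrative names): it builds one boolean mask for all fruit columns and passes it to `DataFrame.where`.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "Unique ID": [1234, 1235, 1236, 1237, 1238],
    "Fruit1": ["Banana", "Orange", "Pear", "Guava", "Kiwi"],
    "Fruit2": ["Peach", "Grape", "Papaya", np.nan, "Cherry"],
    "Fruit3": ["Guava", np.nan, "Apricot", np.nan, "Peach"],
})

cols = ["Fruit1", "Fruit2", "Fruit3"]

# One boolean mask for all fruit columns; missing values count as non-matches
mask = df[cols].apply(lambda s: s.str.contains("G|P", na=False))

# where() keeps cells where the mask is True and sets the rest to NaN
out = df.copy()
out[cols] = df[cols].where(mask)

# Case-insensitive variant on a single column, for comparison
loose = df["Fruit1"].str.contains("G|P", case=False, na=False)

print(out)
```

Matching is case-sensitive by default, which is why "Orange" is dropped here. With `case=False`, as in `loose`, "Orange" would count as a match because of its lowercase g.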

0 votes mozway 4/22/2023 #3

You can select the "Fruit" columns, then use a regex to replace the values that do not start with P or G:

import numpy as np

cols = list(df.filter(like='Fruit'))
# ['Fruit1', 'Fruit2', 'Fruit3']

out = (df.drop(columns=cols)
         .join(df[cols].replace('^(?![PG])', np.nan, regex=True))
         [df.columns]
      )

Output:

   Unique ID Fruit1  Fruit2 Fruit3
0       1234    NaN   Peach  Guava
1       1235    NaN   Grape    NaN
2       1236   Pear  Papaya    NaN
3       1237  Guava     NaN    NaN
4       1238    NaN     NaN  Peach
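One detail worth noting: the pattern `^(?![PG])` tests only the first character, while `str.contains("G|P")` from answer #2 matches anywhere in the string. On the question's data both give the same result. The sketch below uses made-up values (hypothetical, not from the question) to show where the two approaches diverge.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen only to show the difference; not from the question
s = pd.Series(["Grape", "Red Grape", "Kiwi"], dtype=object)

# Anchored negative lookahead: values NOT starting with P or G become NaN
starts_with = s.replace(r"^(?![PG])", np.nan, regex=True)

# Substring match: values NOT containing P or G anywhere become NaN
contains = s.where(s.str.contains("G|P", na=False))

print(starts_with)
print(contains)
```

Here "Red Grape" is replaced by the anchored pattern but kept by the substring match, so the choice depends on whether %G% in the question is meant as "starts with" or "contains".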
