Asked by: sushi  Asked: 11/15/2023  Last edited by: wjandreasushi  Updated: 11/15/2023  Views: 67
Pandas compare columns and consider NULLS equal
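Q:
I am using Pandas to join two dataframes together and want to compare column values for equality.
However, whenever I encounter NULL (or NA?) values, my comparison returns False.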
import pandas as pd
# create test dataframes
df_1 = pd.DataFrame({'key': [1, 2, 3, 4, 5], 'field1': ['foo', pd.NA, None, None, 6]})
df_2 = pd.DataFrame({'pk': [1, 2, 3, 4, 5], 'field2': ['foo', pd.NA, pd.NA, None, 6.0]})
# left join
df_joined = df_1.merge(df_2, 'left', left_on='key', right_on='pk')
# calculate comparison field
df_joined['compare'] = df_joined['field1'] == df_joined['field2']
print(df_joined)
which yields:
   key field1  pk field2  compare
0    1    foo   1    foo     True
1    2   <NA>   2   <NA>    False
2    3   None   3   None    False
3    4   <NA>   4   None    False
4    5     10   5   10.0     True
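I would expect every value in the compare column to be True.
I know this could be done with a function and Pandas apply, but I am hoping for a nice one-line comparison similar to the one already in my example.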
A:
2 votes
PaulS
11/15/2023
#1
A possible solution:
df2 = df_joined[['field1', 'field2']].dropna(how = 'all')
df_joined['compare'] = pd.NA
df_joined.update(df2.assign(compare = df2.field1.eq(df2.field2)))
df_joined['compare'] = df_joined['compare'].fillna(True)
Output:
   key field1  pk field2  compare
0    1    foo   1    foo     True
1    2   <NA>   2   <NA>     True
2    3   None   3   <NA>     True
3    4   None   4   None     True
4    5      6   5    6.0     True
A possible solution for the previous version of the question:
df_joined.assign(compare = df_joined.field1.eq(df_joined.field2))
Output:
   key field1  pk field2  compare
0    1    foo   1    foo     True
1    2    bar   2    bar     True
2    3    baz   3    baz     True
3    4   None   4   None     True
1 vote
louis
11/15/2023
#2
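Andrej's and Paul's answers are more concise than mine, but they do not return True when one value is None and the other is pd.NA. If you want that case to return True as well...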
df_joined['compare'] = (df_joined['field1'] == df_joined['field2']) | (df_joined['field1'].isnull() & df_joined['field2'].isnull())
print(df_joined)
prints
   key field1  pk field2  compare
0    1    foo   1    foo     True
1    2    bar   2    bar     True
2    3    baz   3    baz     True
3    4   None   4   <NA>     True
I also used...
df_1 = pd.DataFrame({'key': [1, 2, 3, 4], 'field1': ['foo', 'bar', 'baz', None]})
df_2 = pd.DataFrame({'pk': [1, 2, 3, 4], 'field2': ['foo', 'bar', 'baz', pd.NA]})