Asked by: Pro Q | Asked: 4/30/2023 | Updated: 4/30/2023 | Views: 72
How to compare two Pandas Dataframes with text, numerical, and None values
Q:
I have two dataframes, `df1` and `df2`, that both contain text and numerical data, in addition to `None`s. However, `df1` has integers and `df2` has floats.
I tried comparing them for equality with `df1.equals(df2)`, but this fails because of the type difference (integers vs. floats). I also tried `np.allclose(df1, df2, equal_nan=True)`, but that fails too (I assume because of the text data):
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
How can I check whether `df1` and `df2` contain the same data?
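For reference, a minimal sketch that reproduces both failures. The sample data is an assumption (the question does not show the actual frames): `df1` holds integers, `df2` holds the same values as floats, and the text column contains a `None`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frames are not shown in the question.
df1 = pd.DataFrame({'text': ['hello', 'world', None], 'num': [1, 2, 3]})
df2 = pd.DataFrame({'text': ['hello', 'world', None], 'num': [1.0, 2.0, 3.0]})

# equals() requires matching dtypes, so int64 vs. float64 compares unequal.
equals_result = df1.equals(df2)

# allclose() cannot do numeric work on the text column and raises TypeError.
try:
    np.allclose(df1, df2, equal_nan=True)
    allclose_failed = False
except TypeError:
    allclose_failed = True

print(equals_result, allclose_failed)
```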
A:
0 votes
Pro Q
4/30/2023
#1
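Unfortunately, there doesn't seem to be any simple function that checks for equality in this situation, so we have to build our own.
To do the check, we split the columns according to whether they hold text data ("object") or numerical data. We can then compare the numerical columns with each other and the text columns with each other. We also handle `None`s by converting them to numpy `NaN`s, which numpy handles better.
Here is the code: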
import numpy as np

def compare_mixed_dataframes(df1, df2) -> bool:
    # (This code was written by GPT-4, but I've tested it and it works)
    # Get the column names of numerical columns (taken from df1)
    num_cols = df1.select_dtypes(include=[np.number]).columns
    # Convert numerical columns to float and replace None with NaN
    df1_num = df1[num_cols].astype(float).fillna(np.nan)
    df2_num = df2[num_cols].astype(float).fillna(np.nan)
    # Compare numerical columns with a tolerance value using numpy.allclose()
    num_comparison = np.allclose(df1_num, df2_num, rtol=1e-05, atol=1e-08, equal_nan=True)
    # Compare text columns using pandas.DataFrame.equals()
    string_cols = df1.select_dtypes(include=['object']).columns
    str_comparison = df1[string_cols].equals(df2[string_cols])
    # Combine the results of the numerical and text column comparisons
    return num_comparison and str_comparison
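One possible hardening of the function above, offered only as a sketch. The name `compare_mixed_dataframes_safe` and the choice to return `False` early (rather than raise) when shapes or column labels differ are assumptions for illustration, not part of the original code. It also treats every non-numerical column as "text", instead of selecting only `object` columns.

```python
import numpy as np
import pandas as pd

def compare_mixed_dataframes_safe(df1, df2, rtol=1e-05, atol=1e-08) -> bool:
    # Hypothetical variant: bail out early if the frames cannot match at all.
    if df1.shape != df2.shape or not df1.columns.equals(df2.columns):
        return False
    # Numerical columns are decided by df1; everything else is compared exactly.
    num_cols = df1.select_dtypes(include=[np.number]).columns
    other_cols = df1.columns.difference(num_cols, sort=False)
    try:
        a = df1[num_cols].astype(float).to_numpy()
        b = df2[num_cols].astype(float).to_numpy()
    except (TypeError, ValueError):
        # df2 holds something non-numerical where df1 holds numbers.
        return False
    if not np.allclose(a, b, rtol=rtol, atol=atol, equal_nan=True):
        return False
    return bool(df1[other_cols].equals(df2[other_cols]))

# Assumed sample data for a quick check.
a = pd.DataFrame({'text': ['hello', 'world', None], 'num': [1, 2, 3]})
b = pd.DataFrame({'text': ['hello', 'world', None], 'num': [1.0, 2.0, 3.0]})
c = pd.DataFrame({'text': ['hello', 'world', None], 'num': [1, 2, 4]})
d = pd.DataFrame({'text': ['hello', 'world', None], 'other': [1, 2, 3]})
print(compare_mixed_dataframes_safe(a, b))
print(compare_mixed_dataframes_safe(a, c))
print(compare_mixed_dataframes_safe(a, d))
```

Note that, like the original, this compares numerical values positionally, while the exact comparison of the remaining columns also takes the index into account.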
If you want to test the code yourself, here is a quick script that does so:
# Also written by GPT-4, but edited by me to contain a more advanced test case
# I have also checked to make sure that this works
import numpy as np
import pandas as pd

def compare_mixed_dataframes(df1, df2):
    # Get the column names of numerical columns
    num_cols = df1.select_dtypes(include=[np.number]).columns
    # Convert numerical columns to float and replace None with NaN
    df1_num = df1[num_cols].astype(float).fillna(np.nan)
    df2_num = df2[num_cols].astype(float).fillna(np.nan)
    # Compare numerical columns with a tolerance value using numpy.allclose()
    num_comparison = np.allclose(df1_num, df2_num, rtol=1e-05, atol=1e-08, equal_nan=True)
    # Compare sentence columns using pandas.DataFrame.equals()
    string_cols = df1.select_dtypes(include=['object']).columns
    str_comparison = df1[string_cols].equals(df2[string_cols])
    # Combine the results of numerical and sentence columns comparisons
    return num_comparison and str_comparison

# Create example DataFrames with mixed types (ints, floats, text, and Nones)
data1 = {'text': ['hello', 'world', None],
         'num': [None, 2, 3]}
df1 = pd.DataFrame(data1)
data2 = {'text': ['hello', 'world', None],
         'num': [None, 2.0, 3.0]}
df2 = pd.DataFrame(data2)

# DataFrames with different numbers
data3 = {'text': ['hello', 'world', None],
         'num': [None, 2, 4]}
df3 = pd.DataFrame(data3)

# Test the custom function with same and different DataFrames
print(compare_mixed_dataframes(df1, df2))  # True
print(compare_mixed_dataframes(df1, df3))  # False
1 vote
Panda Kim
4/30/2023
#2
Example:
data1 = {'text': ['hello', 'world', None],
         'num': [None, 2, 3]}
df1 = pd.DataFrame(data1)
data2 = {'text': ['hello', 'world', None],
         'num': [None, 2.0, 3.0]}
df2 = pd.DataFrame(data2)
Code:
df1.equals(df2.astype(df1.dtypes))
Output:
True
If you are worried about errors being raised while converting the dtypes, use the code below instead.
df1.equals(df2.astype(df1.dtypes, errors='ignore'))
If the dtypes cannot be converted to match (the case where errors are ignored), the DataFrames are not the same anyway.
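A small sketch of the cast-then-compare idea, using assumed sample data in which `df1` really has an `int64` column. It also illustrates a caveat: casting floats to an integer dtype truncates fractional parts, so a frame containing `2.5` can compare equal to one containing `2`. Casting toward the float side instead may be the safer direction.

```python
import pandas as pd

# Hypothetical sample data: df1 has ints, df2 the same values as floats,
# and df3 differs from df1 only by a fractional part.
df1 = pd.DataFrame({'text': ['hello', 'world', None], 'num': [1, 2, 3]})
df2 = pd.DataFrame({'text': ['hello', 'world', None], 'num': [1.0, 2.0, 3.0]})
df3 = pd.DataFrame({'text': ['hello', 'world', None], 'num': [1.0, 2.5, 3.0]})

# Without casting, the dtype mismatch alone makes equals() report a difference.
plain = df1.equals(df2)
# Casting df2 to df1's dtypes removes the mismatch.
cast = df1.equals(df2.astype(df1.dtypes))
# Caveat: float -> int truncates 2.5 to 2, hiding a real difference.
lossy = df1.equals(df3.astype(df1.dtypes))
# Casting df1 to df3's (float) dtypes keeps the difference visible.
safer = df1.astype(df3.dtypes).equals(df3)

print(plain, cast, lossy, safer)
```

Unlike `np.allclose`, `equals` applies no tolerance, so values that differ only by floating-point rounding will still compare unequal.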