Asked by: Boni Srinu  Asked: 11/17/2023  Updated: 11/17/2023  Views: 18
How to fill missing values in a DataFrame with the most frequent value?
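Q:
I have a pandas DataFrame with two columns: toy and color. The color column contains missing values.
How can I fill each missing color with the most frequent color for that particular toy?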
Here is the code to create the sample dataset:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
              'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
})
Here is the sample dataset:
      toy  color
0     car    red
1     car   blue
2     car   blue
3     car    NaN
4   train  green
5   train    NaN
6   train    red
7   train    red
8   train    NaN
9    ball   blue
10   ball    red
11   ball    NaN
12  truck  green
A:
0 votes
Gabriel Ramuglia
11/17/2023
#1
You can use groupby to find the most frequent color for each toy, then use apply with a lambda function to fill the NaN values with that color.
Try this:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan, 'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
})

# Most frequent color per toy. mode() ignores NaN; on a tie, [0] picks the first value in sorted order.
most_frequent_color = df.groupby('toy')['color'].apply(lambda x: x.mode()[0])

# Replace a missing color with the most frequent color for that row's toy; keep existing colors.
df['color'] = df.apply(lambda row: most_frequent_color[row['toy']] if pd.isna(row['color']) else row['color'], axis=1)
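As a possible alternative, the same per-toy fill can be sketched without a row-wise apply, by combining groupby(...).transform with fillna. This is only a sketch: the helper name most_frequent is made up for illustration and is not part of pandas. It also returns NaN for a toy whose colors are all missing, a case where x.mode()[0] would raise an error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
              'red', 'red', np.nan, 'blue', 'red', np.nan, 'green'],
})


def most_frequent(values: pd.Series):
    """Hypothetical helper: most frequent non-missing value, or NaN if none exists."""
    modes = values.mode()  # mode() drops NaN by default and returns ties in sorted order
    return modes.iloc[0] if not modes.empty else np.nan


# Broadcast each toy's most frequent color onto every row of that toy.
fill_values = df.groupby('toy')['color'].transform(most_frequent)

# Only missing entries are replaced; existing colors are left untouched.
df['color'] = df['color'].fillna(fill_values)

print(df)
```

With the sample data, ball has one blue and one red, so the tie is resolved to blue because mode() returns tied values in sorted order. If a different tie-breaking rule is needed, the helper would have to be adjusted.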