How do I fill missing values in a DataFrame with the most frequent value?

Asked by: Boni Srinu Asked: 11/17/2023 Updated: 11/17/2023 Views: 18

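Q:

I have a pandas DataFrame with two columns: toy and color. The color column contains missing values.

How can I fill the missing color values with the most frequent color for that particular toy?

Here is the code to create the sample dataset: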

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'toy':['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color':['red', 'blue', 'blue', np.nan, 'green', np.nan,
             'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
    })

Here is the sample dataset:

      toy  color
0     car    red
1     car   blue
2     car   blue
3     car    NaN
4   train  green
5   train    NaN
6   train    red
7   train    red
8   train    NaN
9    ball   blue
10   ball    red
11   ball    NaN
12  truck  green

A:

0 votes Gabriel Ramuglia 11/17/2023 #1

You should use groupby to find the most frequent color for each toy, then use apply with a lambda function to fill the NaN values.

Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan, 'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
})

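# Most frequent color per toy; mode() ignores NaN, and [0] picks the first mode on ties
most_frequent_color = df.groupby('toy')['color'].apply(lambda x: x.mode()[0])
# Replace each missing color with the most frequent color for that row's toy
df['color'] = df.apply(lambda row: most_frequent_color[row['toy']] if pd.isna(row['color']) else row['color'], axis=1)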

Source: my article https://ioflood.com/blog/python-groupby/
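
A vectorized variant of the same idea may be worth considering on larger frames, since it avoids the row-wise apply. The following is only a sketch, not part of the original answer: the names first_mode, fill_values and filled are illustrative, and it assumes that a toy whose colors are all missing should simply stay NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
              'red', 'red', np.nan, 'blue', 'red', np.nan, 'green'],
})


def first_mode(s):
    """Return the first mode of a Series, or NaN if it has no non-missing values."""
    modes = s.mode()  # drops NaN by default; the modes come back sorted
    return modes.iloc[0] if not modes.empty else np.nan


# transform broadcasts each group's mode back onto the original row index
fill_values = df.groupby('toy')['color'].transform(first_mode)

# fillna with a Series aligns on the index, so only the missing rows change
filled = df['color'].fillna(fill_values)
print(df.assign(color=filled))
```

On the sample data this fills row 3 (car) with blue, rows 5 and 8 (train) with red, and row 11 (ball) with blue. The ball group is a tie between blue and red; because mode() returns its values sorted, taking the first one resolves the tie alphabetically, which is the same behavior as x.mode()[0] in the answer above.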