Asked by: enriicoo   Asked: 11/7/2022   Updated: 11/7/2022   Views: 60
How to get the minimum value from a nested-list column in pandas? Why doesn't numpy.min() work in a situation where numpy.mean() works?
Question:
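I have a small piece of code that I need to modify, and I can't figure out why np.min() doesn't work in this specific case, where a pandas column consists of nested lists, while np.mean() does. Maybe someone here can clarify?
This code snippet works perfectly: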
```python
import pandas as pd
import numpy as np

def transformation(custom_df):
    # Map each customer id to its (possibly missing) value.
    dic = dict(zip(custom_df['customers'], custom_df['values']))
    # Fill missing values that have at least one valid neighbor
    # with the mean of the neighbors' values.
    custom_df['values'] = np.where(custom_df['values'].isna() & (custom_df['valid_neighbors'] >= 1),
                                   custom_df['neighbors'].apply(
                                       lambda row: np.mean([dic[v] for v in row if dic.get(v)])),
                                   custom_df['values'])
    return custom_df

customers = [1, 2, 3, 4, 5, 6]
values = [np.nan, np.nan, 10, np.nan, 11, 12]
neighbors = [[6], [3], [], [3, 5], [6], [5]]
vn = [1, 1, 0, 2, 1, 1]
df2 = pd.DataFrame({'customers': customers, 'values': values, 'neighbors': neighbors, 'valid_neighbors': vn})
```
```
   customers  values neighbors  valid_neighbors
0          1     NaN       [6]                1
1          2     NaN       [3]                1
2          3    10.0        []                0
3          4     NaN    [3, 5]                2
4          5    11.0       [6]                1
5          6    12.0       [5]                1
```
```python
df2 = transformation(df2)
```
Result:
```
   customers  values neighbors  valid_neighbors
0          1    12.0       [6]                1
1          2    10.0       [3]                1
2          3    10.0        []                0
3          4    10.5    [3, 5]                2
4          5    11.0       [6]                1
5          6    12.0       [5]                1
```
However, if I change np.mean() to np.min() in the transformation() function, it raises a ValueError, which makes me wonder why this doesn't happen when I call np.mean():
ValueError: zero-size array to reduction operation minimum which has no identity
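The phrase "has no identity" in the error is the key difference. A reduction over an empty array can only return a result if the underlying ufunc has an identity element: np.add has one (0), np.minimum does not. np.mean is computed as a sum divided by a count, so an empty input becomes 0.0 / 0, i.e. nan with a RuntimeWarning rather than an exception. A small sketch illustrating this with the public ufunc.identity attribute:

```python
import warnings

import numpy as np

# ufunc.identity is what a reduction returns for an empty input.
print(np.add.identity)      # 0    -> np.sum([]) is well defined
print(np.minimum.identity)  # None -> np.min([]) has nothing to return

print(np.sum([]))           # 0.0

# np.mean divides the sum by the element count, so an empty input gives
# 0.0 / 0 -> nan, reported only as a RuntimeWarning instead of an exception.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty_mean = np.mean([])
print(empty_mean)                   # nan
print(caught[0].category.__name__)  # RuntimeWarning

try:
    np.min([])
except ValueError as exc:
    # Same ValueError as in the question.
    print(exc)
```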
I'd like to know which condition I'm not meeting and what I can do to get the expected result, which would be:
```
   customers  values neighbors  valid_neighbors
0          1    12.0       [6]                1
1          2    10.0       [3]                1
2          3    10.0        []                0
3          4    10.0    [3, 5]                2
4          5    11.0       [6]                1
5          6    12.0       [5]                1
```
Answers:
1 vote
Abhi
11/7/2022
#1
There is an empty list in your `neighbors` column, which makes `np.min` throw an error, whereas `np.mean` works even on an empty list.
```python
import numpy as np

print(np.mean([]))
# Output
# nan  (with a RuntimeWarning: Mean of empty slice)

print(np.min([]))
# Throws error
# ValueError: zero-size array to reduction operation minimum which has no identity
```
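Why does the empty list matter at all, given that the row with `neighbors == []` has `valid_neighbors == 0`? Because `np.where` does not short-circuit: its arguments, including the `.apply(...)` call, are fully evaluated for every row before `np.where` selects between them. A minimal sketch on a trimmed copy of the question's data; `record_len` is a hypothetical stand-in for the `np.min` lambda that only records what it was called with:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "neighbors": [[6], [3], [], [3, 5], [6], [5]],
    "valid_neighbors": [1, 1, 0, 2, 1, 1],
})

seen = []  # every list the function is called with

def record_len(row):
    seen.append(row)
    return len(row)

# np.where only *selects* between two finished arrays; the apply() argument
# is evaluated for all six rows first, including the row whose list is empty.
picked = np.where(df["valid_neighbors"] >= 1,
                  df["neighbors"].apply(record_len),
                  -1)

print(len(seen))         # 6
print([] in seen)        # True
print(picked.tolist())   # [1, 1, -1, 2, 1, 1]
```

With `np.mean` in place of `record_len`, that empty row silently produces nan (later discarded by `np.where`); with `np.min` it raises before `np.where` ever runs.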
1 vote
Panda Kim
11/7/2022
#2
Use the following code to get the result:
```python
df3 = df2.set_index('customers')
df2['values'].fillna(df2['neighbors'].apply(lambda x: df3.loc[x, 'values'].mean()))
```
Output (mean):
```
0    12.00
1    10.00
2    10.00
3    10.50
4    11.00
5    12.00
Name: values, dtype: float64
```
You can change `mean` to `min`:
```python
df2['values'].fillna(df2['neighbors'].apply(lambda x: df3.loc[x, 'values'].min()))
```
Output (min):
```
0    12.00
1    10.00
2    10.00
3    10.00
4    11.00
5    12.00
Name: values, dtype: float64
```
Assign this result to the `values` column to get the desired output.
Comments:
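A self-contained version of this approach on the question's sample data, with the result assigned back to `values`. It relies on pandas' `Series.min()` skipping NaN and returning NaN for an empty selection instead of raising, so the row with `neighbors == []` needs no special case:

```python
import numpy as np
import pandas as pd

customers = [1, 2, 3, 4, 5, 6]
values = [np.nan, np.nan, 10, np.nan, 11, 12]
neighbors = [[6], [3], [], [3, 5], [6], [5]]
vn = [1, 1, 0, 2, 1, 1]
df2 = pd.DataFrame({'customers': customers, 'values': values,
                    'neighbors': neighbors, 'valid_neighbors': vn})

# Index by customer id so each neighbor list can be used directly with .loc.
df3 = df2.set_index('customers')

# For each row, take the minimum of the neighbors' values (empty list -> NaN),
# and use it only where the row's own value is missing.
df2['values'] = df2['values'].fillna(
    df2['neighbors'].apply(lambda x: df3.loc[x, 'values'].min())
)
print(df2)
```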
0 votes
enriicoo
11/9/2022
Worked perfectly. Cleaner and faster than the original approach. Thanks!
1 vote
tejash popate
11/7/2022
#3
It's better to update the `transformation` function so that it handles the empty arrays in the `neighbors` column.
Below is a workaround that may work.
```python
import numpy as np

def transformation(custom_df):
    dic = dict(zip(custom_df['customers'], custom_df['values']))
    custom_df['values'] = np.where(custom_df['values'].isna() & (custom_df['valid_neighbors'] >= 1),
                                   custom_df['neighbors'].apply(
                                       # Placeholder 0 for an empty neighbor list; in the sample data
                                       # such rows have valid_neighbors == 0, so np.where discards it.
                                       lambda row: np.min([dic[v] for v in row if dic.get(v)]) if len(row) else 0),
                                   custom_df['values'])
    return custom_df
```
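A further variant, not taken from any of the answers: apply the lookup only to the rows selected by the condition, so the function never sees the empty list, and guard the *filtered* list rather than the raw one. `fill_with_neighbor_min` is a hypothetical name, and this sketch swaps the `if dic.get(v)` filter for an explicit NaN check (the original filter skips a neighbor whose value is 0 but lets NaN through):

```python
import numpy as np
import pandas as pd

def fill_with_neighbor_min(custom_df):
    """Hypothetical helper: fill missing values with the minimum neighbor value."""
    dic = dict(zip(custom_df['customers'], custom_df['values']))
    # Only rows that are missing a value AND have at least one valid neighbor.
    mask = custom_df['values'].isna() & (custom_df['valid_neighbors'] >= 1)

    def neighbor_min(row):
        found = [dic[v] for v in row if v in dic and pd.notna(dic[v])]
        # Guard the filtered list: it can be empty even if `row` is not.
        return min(found) if found else np.nan

    custom_df.loc[mask, 'values'] = custom_df.loc[mask, 'neighbors'].apply(neighbor_min)
    return custom_df

df2 = pd.DataFrame({'customers': [1, 2, 3, 4, 5, 6],
                    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
                    'neighbors': [[6], [3], [], [3, 5], [6], [5]],
                    'valid_neighbors': [1, 1, 0, 2, 1, 1]})
result = fill_with_neighbor_min(df2)
print(result)
```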
Comments:
initial
np.min([])
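The comment above survives only as the fragments `initial` and `np.min([])`; it appears to refer to the `initial` argument that `np.min` accepts (NumPy 1.15 and later). A minimal sketch under that assumption: seed the reduction with +inf so an empty input has a defined result, then map that sentinel back to NaN. `min_or_nan` is a hypothetical helper name:

```python
import numpy as np

def min_or_nan(values):
    """Hypothetical helper: minimum of `values`, or NaN when it is empty."""
    # initial=np.inf gives the reduction a starting value, so np.min no longer
    # raises on an empty input and returns +inf instead.
    # Caveat: an input whose true minimum is +inf is also reported as NaN.
    result = np.min(values, initial=np.inf)
    return np.nan if np.isinf(result) else float(result)

print(min_or_nan([10.0, 11.0]))  # 10.0
print(min_or_nan([]))            # nan
```

Passing `initial=np.nan` instead would not work: `np.minimum` propagates NaN, so every result would be nan.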