Asked by: majid Hakimi · Asked: 10/31/2023 · Updated: 10/31/2023 · Views: 51
I defined a function that should return zero when the denominator is zero, but it returns a nonzero number
Q:
I searched Stack Overflow but did not find the same problem.
I have a data frame of financial data. Its info is shown below.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44691 entries, 0 to 44690
Data columns (total 22 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 44691 non-null int64
1 title 44691 non-null object
2 tagline 20284 non-null object
3 release_date 44657 non-null datetime64[ns]
4 genres 42586 non-null object
5 belongs_to_collection 4463 non-null object
6 original_language 44681 non-null object
7 budget_musd 8854 non-null float64
8 revenue_musd 7385 non-null float64
9 production_companies 33356 non-null object
10 production_countries 38835 non-null object
11 vote_count 44691 non-null float64
12 vote_average 42077 non-null float64
13 popularity 44691 non-null float64
14 runtime 43179 non-null float64
15 overview 43740 non-null object
16 spoken_languages 41094 non-null object
17 poster_path 44467 non-null object
18 cast 42502 non-null object
19 cast_size 44691 non-null int64
20 crew_size 44691 non-null int64
21 director 43960 non-null object
dtypes: datetime64[ns](1), float64(6), int64(3), object(12)
memory usage: 7.5+ MB
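Two variables matter to me:
• budget_musd: the movie's budget, in millions of US dollars.
• revenue_musd: the movie's total revenue, in millions of US dollars.
I want to create a new column named ROI, equal to revenue_musd / budget_musd. I use the following function and apply it to build the ROI column: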
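import pandas as pd

def calculate_roi(row):
    # A missing budget or revenue gives an ROI of 0
    if pd.isna(row['budget_musd']) or pd.isna(row['revenue_musd']):
        return 0
    # A budget or revenue that is exactly zero gives an ROI of 0
    elif row['budget_musd'] == 0.00 or row['revenue_musd'] == 0.00:
        return 0
    else:
        return row['revenue_musd'] / row['budget_musd']

# Apply the function row by row
df['ROI'] = df.apply(calculate_roi, axis=1)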
But I get a logic error. I specified that ROI should be zero when the denominator of the fraction is zero, yet the result is wrong.
Here is row 3976 of the data frame, together with its ROI result:
id 13703
title Less Than Zero
tagline In Beverly Hills, you can have anything your heart desires. You just can't have it the way it used to be.
release_date 1987-11-06 00:00:00
genres Drama|Crime|Romance
belongs_to_collection NaN
original_language en
budget_musd 0.00
revenue_musd 12.40
production_companies Twentieth Century Fox Film Corporation|Amercent Films|American Entertainment Partners L.P.
production_countries United States of America
vote_count 77.00
vote_average 6.10
popularity 4.03
runtime 98.00
overview A college freshman returns to Los Angeles for the holidays at his ex-girlfriend's request, but discovers that his former best friend has an out-of-control drug habit.
spoken_languages English|Deutsch|Español
poster_path <img src='http://image.tmdb.org/t/p/w185//1GY0ZhAxOR2RgxGnOkeKoKb2mFM.jpg' style='height:100px;'>
cast Andrew McCarthy|Jami Gertz|Robert Downey Jr.|James Spader|Brad Pitt|Tony Bill|Nicholas Pryor|Michael Bowen|Sarah Buxton|Donna Mitchell
cast_size 10
crew_size 3
director Marek Kanievska
profit 12.40
ROI 12396383.00
Name: 3965, dtype: object
According to the function, I should get an ROI of 0, but the result is wrong!
Where is my mistake?
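One thing worth checking, offered as a sketch under an assumption rather than a confirmed diagnosis: the printed 0.00 may be a rounded display of a tiny nonzero budget. The ROI shown, 12396383.00, is what a revenue of about 12.396383 divided by a budget of 0.000001 (1 dollar expressed in millions) would give. The two values below are assumed for illustration and are not read from the real data set.

```python
import pandas as pd

# Assumed values for illustration: a budget of 1 dollar, expressed in millions.
row = pd.Series({"budget_musd": 0.000001, "revenue_musd": 12.396383})

# Rounded to two decimals, the budget looks like zero.
shown = f"{row['budget_musd']:.2f}"
print(shown)

# The stored value is not equal to zero, so an `== 0` check does not fire
# and the division goes ahead.
is_zero = row["budget_musd"] == 0
roi = row["revenue_musd"] / row["budget_musd"]
print(is_zero, roi)
```

If this is what is happening, printing the raw value (for example with repr) or comparing the budget against a small threshold instead of exactly 0 would confirm it.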
A:
0 upvotes
Gabriel Santello
10/31/2023
#1
If you want to handle the NaN values first, you can do:
df = df.fillna(0)
Then you can use np.divide, and the zeros will be handled by the function.
df['ROI'] = np.divide(df['revenue_musd'],df['budget_musd'])
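A caveat, with a sketch: on float data a plain np.divide returns inf (or NaN for 0 / 0) where the denominator is zero. It does not return 0 by itself. If 0 is the value you want there, one option is np.divide's out and where arguments, which skip the division for masked entries. The sample frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data, not taken from the question's data set.
df = pd.DataFrame({
    "revenue_musd": [12.4, np.nan, 5.0, 0.0],
    "budget_musd": [0.0, 10.0, 2.0, 0.0],
})

# Fill missing values only in the two numeric columns used here.
filled = df[["revenue_musd", "budget_musd"]].fillna(0)
revenue = filled["revenue_musd"].to_numpy(dtype=float)
budget = filled["budget_musd"].to_numpy(dtype=float)

# Divide only where the budget is nonzero; other entries keep the 0 from `out`.
roi = np.divide(revenue, budget, out=np.zeros_like(revenue), where=budget != 0)
df["ROI"] = roi
print(df)
```

Filling only the two numeric columns avoids writing 0 into text or date columns, which df.fillna(0) on the whole frame would do.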
0 upvotes
Code Different
10/31/2023
#2
You can use Series.div and replace the anomalous results with 0:
df["ROI"] = df["revenue_musd"].div(df["budget_musd"]).replace([np.nan, np.inf, -np.inf], 0)
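How Series.div works:
- nan / anything or anything / nan is nan
- anything / 0 is inf or -inf (0 / 0 is nan)
- normal division in all other cases
So you only need to replace the results of the first 2 cases with 0 to get what you need.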
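A small sketch of these cases on made-up data (the values are illustrative, not from the question's data set), showing the raw result of Series.div and the result after the replacement:

```python
import numpy as np
import pandas as pd

# Made-up sample data covering each case.
df = pd.DataFrame({
    "revenue_musd": [12.4, np.nan, 5.0, 0.0, -3.0],
    "budget_musd": [0.0, 10.0, 2.0, 0.0, 0.0],
})

# Raw division: positive/0 is inf, nan/x is nan, 0/0 is nan, negative/0 is -inf.
raw = df["revenue_musd"].div(df["budget_musd"])
print(raw)

# Replace every anomalous result with 0, as in the answer.
df["ROI"] = raw.replace([np.nan, np.inf, -np.inf], 0)
print(df["ROI"])
```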
Comments
0 upvotes
majid Hakimi
11/5/2023
If possible, please rewrite my function using this line.
0赞
Code Different
11/6/2023
You don't need a function at all. apply is a lot slower than the vectorized code above.
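To see that the vectorized one-liner gives the same results as the row-wise function, here is a minimal sketch on made-up data. The helper name roi_vectorized is hypothetical and only wraps the line from the answer. It makes no claim about timing.

```python
import numpy as np
import pandas as pd

def calculate_roi(row):
    """Row-wise version, as in the question."""
    if pd.isna(row["budget_musd"]) or pd.isna(row["revenue_musd"]):
        return 0.0
    if row["budget_musd"] == 0 or row["revenue_musd"] == 0:
        return 0.0
    return row["revenue_musd"] / row["budget_musd"]

def roi_vectorized(df):
    """Hypothetical helper wrapping the vectorized line from the answer."""
    ratio = df["revenue_musd"].div(df["budget_musd"])
    return ratio.replace([np.nan, np.inf, -np.inf], 0)

# Made-up sample data, not taken from the question's data set.
df = pd.DataFrame({
    "revenue_musd": [12.4, np.nan, 5.0, 0.0, 7.0],
    "budget_musd": [0.0, 10.0, 2.0, 3.0, np.nan],
})

via_apply = df.apply(calculate_roi, axis=1)
via_vector = roi_vectorized(df)

# Both approaches agree on every row of this sample.
same = np.allclose(via_apply, via_vector)
print(same)
```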