Asked by: Qendrim Krasniqi · Asked: 9/8/2022 · Last edited by: Michael S., Qendrim Krasniqi · Updated: 9/9/2022 · Views: 861
How to populate data frame based on previous row values with pandas?
Question:
I have the following data frame:
import pandas as pd

df = pd.DataFrame(data={
    "id": ['a', 'd'],
    "amount": [3000, 4000],
    "rate": [0.2, 0.3],
    "date": ["2022-07-21", "2022-08-11"],
    "months": [4, 5],
})
I want to create the installments for each transaction. The output should look like this:
id interest principal date balance installment
0 a 50.000000 731.508250 2022-07-21 2.268492e+03 781.508250
1 a 37.808196 743.700055 2022-08-21 1.524792e+03 781.508250
2 a 25.413195 756.095055 2022-09-21 7.686966e+02 781.508250
3 a 12.811611 768.696640 2022-10-21 0.00000 781.508250
4 d 100.000000 760.987444 2022-08-11 3.239013e+03 860.987444
5 d 80.975314 780.012130 2022-09-11 2.459000e+03 860.987444
6 d 61.475011 799.512433 2022-10-11 1.659488e+03 860.987444
7 d 41.487200 819.500244 2022-11-11 8.399877e+02 860.987444
8 d 20.999694 839.987750 2022-12-11 0.00000 860.987444
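The key point here is that each row's `balance` depends on the previous row's `balance`. For the first installment of a loan, the previous balance is the `amount` from the source data frame.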
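Written out, this dependency is the standard fixed-payment amortization recurrence. In the notation below (chosen here for illustration), $L$ is `amount`, $n$ is `months`, $r$ is the monthly rate, $A$ is the installment (the value `npf.pmt(rate=r, nper=n, pv=-L)` returns for end-of-period payments), and $I_k$, $P_k$, $B_k$ are the interest, principal and balance of installment $k$:

```latex
\begin{aligned}
r &= \frac{\text{rate}}{12}, \qquad A = \frac{L\,r}{1-(1+r)^{-n}}, \qquad B_0 = L,\\
I_k &= r\,B_{k-1}, \qquad P_k = A - I_k, \qquad B_k = B_{k-1} - P_k = (1+r)\,B_{k-1} - A,\\
B_k &= L\,(1+r)^k - A\,\frac{(1+r)^k - 1}{r}, \qquad k = 1,\dots,n.
\end{aligned}
```

With this choice of $A$, $B_n = 0$, which is the final 0 balance in the expected output (up to floating-point rounding). The last line gives $B_k$ directly from $k$, so the balances can in principle be computed without first computing the previous row.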
My current solution:
import numpy_financial as npf
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame(data={
    "id": ['a', 'd'],
    "amount": [3000, 4000],
    "rate": [0.2, 0.3],
    "date": ["2022-07-21", "2022-08-11"],
    "months": [4, 5],
})

def get_output_df(df):
    columns = ["id", "interest", "principal", "date", "balance", "installment"]
    loan_installment_data = []
    for _, row in df.iterrows():
        rate = row["rate"] / 12  # annual rate -> monthly rate
        months = row["months"]
        amount = row["amount"]
        date = pd.to_datetime(row["date"]).date()
        # Fixed monthly installment for a loan of `amount` repaid over `months`
        installment_amount = npf.pmt(rate=rate, nper=months, pv=-amount)
        prior_balance = amount
        for _ in range(months):
            interest_amount = rate * prior_balance
            principal_amount = installment_amount - interest_amount
            balance = prior_balance - principal_amount
            loan_installment_data.append(
                {
                    "id": row["id"],
                    "interest": interest_amount,
                    "principal": principal_amount,
                    "date": date,
                    "installment": installment_amount,
                    "balance": balance,
                }
            )
            prior_balance = balance  # the next row starts from this row's balance
            date += relativedelta(months=1)
    # Build the frame once at the end; DataFrame.append was removed in pandas 2.0
    return pd.DataFrame(loan_installment_data, columns=columns)

output_df = get_output_df(df)
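For comparison, here is a loop-free sketch of the same schedule. It is not the asker's code, and it rests on a few assumptions: the monthly rate is `rate / 12` and nonzero (the formula divides by it), payments fall at the end of each period as with `npf.pmt`, and each balance comes from the closed-form amortization formula instead of from the previous row. `Index.repeat` creates one row per installment and `groupby(...).cumcount()` numbers the installments within each loan; `amortization_schedule` is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame(data={
    "id": ['a', 'd'],
    "amount": [3000, 4000],
    "rate": [0.2, 0.3],
    "date": ["2022-07-21", "2022-08-11"],
    "months": [4, 5],
})


def amortization_schedule(df):
    # One row per installment: repeat each loan row `months` times.
    rep = df.loc[df.index.repeat(df["months"])]
    # Installment number k = 1..months within each original loan row.
    k = rep.groupby(level=0).cumcount().to_numpy() + 1
    s = rep.reset_index(drop=True)

    r = s["rate"] / 12          # monthly rate (assumed nonzero)
    n = s["months"]
    loan = s["amount"]
    # Fixed installment; equals npf.pmt(rate=r, nper=n, pv=-loan) for end-of-period payments.
    installment = loan * r / (1 - (1 + r) ** (-n))

    # Closed-form balance before installment k, so no row needs the previous row.
    growth = (1 + r) ** (k - 1)
    prior_balance = loan * growth - installment * (growth - 1) / r
    interest = r * prior_balance
    principal = installment - interest
    balance = prior_balance - principal

    # Installment k falls k-1 months after the start date.
    start = pd.to_datetime(s["date"])
    dates = [d + pd.DateOffset(months=int(m)) for d, m in zip(start, k - 1)]

    return pd.DataFrame({
        "id": s["id"],
        "interest": interest,
        "principal": principal,
        "date": dates,
        "balance": balance,
        "installment": installment,
    })


output_df = amortization_schedule(df)
print(output_df)
```

Dates here are computed as "start date plus k-1 months". For these inputs that matches the loop, but for start days near the end of a month (e.g. the 31st) it can differ from adding one month repeatedly, because `relativedelta` clamps to the shorter month and the loop then carries the clamped day forward. The `date` column also comes back as `datetime64` rather than Python `date` objects.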
Is there any pandas functionality that can be used to achieve the same result?
Answer:
0 votes
Adam Jaamour
9/8/2022
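You could try the pandas `shift` function, which shifts row values by a specified number of periods (so it can give you the previous row). With `shift` you can make the previous row's value (the amount, for example) available in the next row, and do the calculation without iterating over each row and storing every previous value in a temporary variable.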
Here is an example with your data:
import pandas as pd

df = pd.DataFrame(data={
    "id": ['a', 'd'],
    "amount": [3000, 4000],
    "rate": [0.2, 0.3],
    "date": ["2022-07-21", "2022-08-11"],
    "months": [4, 5],
})

# Move each amount down one row; the first row has no previous amount, so it gets 0
df["prior_balance"] = df["amount"].shift(1, fill_value=0)
print(df)
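Output: the new `prior_balance` column is 0 for the first row (`a`) and 3000, the previous row's `amount`, for the second row (`d`).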
With a plain `shift`, the first row would be NaN because there is no previous amount, so you can pass `fill_value=0` to set that value to 0 instead.
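One limitation worth keeping in mind: `shift` only moves values that already exist in a column. Here each `balance` is computed from the previous computed `balance`, so `shift` alone cannot generate that column; it becomes useful once the column exists. Even then, the shift should be done per loan with `groupby`, otherwise a value leaks from one `id` into the next. A small sketch, using a few balances rounded from the expected output in the question (the `prior_plain` column exists only to show the problem):

```python
import pandas as pd

# Partial schedule: balances rounded from the question's expected output.
sched = pd.DataFrame({
    "id": ["a", "a", "d", "d"],
    "amount": [3000, 3000, 4000, 4000],
    "balance": [2268.49, 1524.79, 3239.01, 2459.00],
})

# A plain shift crosses loan boundaries: the first "d" row receives loan "a"'s balance.
sched["prior_plain"] = sched["balance"].shift(1)

# Shifting within each loan keeps the loans apart; each loan's first row
# (NaN after the shift) falls back to that loan's original amount.
sched["prior_balance"] = (
    sched.groupby("id")["balance"].shift(1).fillna(sched["amount"])
)
print(sched)
```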
Comments:
1 vote
BeRT2me
9/9/2022
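Even better, `shift` has a `fill_value` parameter: `.shift(fill_value=0)`. This also prevents the integer column from being converted to float, so you don't need to call `.astype(int)` either!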
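The dtype point in this comment can be checked directly. A minimal sketch on an integer `Series` shaped like the `amount` column (the variable names are illustrative):

```python
import pandas as pd

amount = pd.Series([3000, 4000])

shifted = amount.shift(1)               # first value becomes NaN, so the dtype is float64
refilled = amount.shift(1).fillna(0)    # NaN replaced afterwards, but still float64
direct = amount.shift(1, fill_value=0)  # filled during the shift, stays int64

print(shifted.dtype, refilled.dtype, direct.dtype)
print(direct.tolist())
```

The `fillna(0)` route would need an extra `.astype(int)` to get back to integers; `fill_value=0` never introduces a NaN, so the integer dtype is kept.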