Asked by: PeterBe · Asked: 6/6/2023 · Last edited by: PeterBe · Updated: 8/5/2023 · Views: 193
How to find the Pareto-optimal solutions in a pandas dataframe
Q:
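I have a pandas dataframe named df_merged_population_current_iteration, which can be downloaded as a csv file here: https://easyupload.io/bdqso4
Now I would like to create a new dataframe, pareto_df, that contains all Pareto-optimal solutions of df_merged_population_current_iteration with respect to minimizing the two objectives "Costs" and "Peak Load". It should also make sure that no duplicates are stored: if several solutions have the same values for both "Costs" and "Peak Load", only one of them should be saved. In addition, the value of "Thermal Discomfort" has to be smaller than 2. If it is not, the solution should not be included in pareto_df.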
For this purpose I came up with the following code:
import pandas as pd
df_merged_population_current_iteration = pd.read_csv("C:/Users/wi9632/Desktop/sample_input.csv", sep=";")
# create a new DataFrame to store the Pareto-optimal solutions
pareto_df = pd.DataFrame(columns=df_merged_population_current_iteration.columns)
for i, row in df_merged_population_current_iteration.iterrows():
    is_dominated = False
    is_duplicate = False
    for j, other_row in df_merged_population_current_iteration.iterrows():
        if i == j:
            continue
        # Check if the other solution dominates the current solution
        if (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] < row['Peak Load']) or \
           (other_row['Costs'] <= row['Costs'] and other_row['Peak Load'] < row['Peak Load']) or \
           (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] <= row['Peak Load']):
            # The other solution dominates the current solution
            is_dominated = True
            break
        # Check if the other solution is a duplicate
        if (other_row['Costs'] == row['Costs'] and other_row['Peak Load'] == row['Peak Load']):
            is_duplicate = True
            break
    if not is_dominated and not is_duplicate and row['Thermal Discomfort'] < 2:
        # The current solution is Pareto-optimal, not a duplicate, and meets the discomfort threshold
        row_df = pd.DataFrame([row])
        pareto_df = pd.concat([pareto_df, row_df], ignore_index=True)
print(pareto_df)
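In most cases the code works fine. However, in some cases no Pareto-optimal solution is added to the new dataframe pareto_df, although Pareto-optimal solutions that fulfil the conditions exist. This can be seen with the data I posted above: the solutions with "id of the run" 7 and 8 are Pareto-optimal (and fulfil the thermal discomfort constraint), yet the current code adds neither of them to the new dataframe. It should add one of them (but not both, because that would be a duplicate). I have to admit that I have tried a lot and looked closely at the code, but I could not find the mistake.
Here is the actual output for the uploaded data: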
Empty DataFrame
Columns: [Unnamed: 0, id of the run, Costs, Peak Load, Thermal Discomfort, Combined Score]
Index: []
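Do you see what the mistake might be and how I have to adjust the code so that it actually finds all Pareto-optimal solutions without adding duplicates?
Reminder: Does anyone know why the code does not find all Pareto-optimal solutions? I would highly appreciate any comments.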
A:
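The condition that tests for dominance should be written more strictly. The culprit, however, seems to be your last if clause, where you check for non-dominance and non-duplication at the same time.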
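Your old code had a bug: it added a row to the output DataFrame (pareto_df) only if the row was non-dominated and, at the same time, not a duplicate. This condition does not work when the input DataFrame contains duplicated rows. If two rows are duplicates, one of them should be added, because neither dominates the other. In the old code each of the two rows flags the other as a duplicate, so both are skipped, hence the empty DataFrame.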
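To make the mutual exclusion concrete, here is a minimal sketch of the same control flow on two hypothetical rows that tie on both objectives. The values are made up and not taken from the uploaded csv, and `kept_by_original_logic` is only an illustrative helper name. The dominance test is written in a compact form that is logically equivalent to the three-clause condition in the question.

```python
# Hypothetical toy data: two rows with identical objective values.
rows = [
    {"id": 7, "Costs": 10.0, "Peak Load": 5.0},
    {"id": 8, "Costs": 10.0, "Peak Load": 5.0},
]


def kept_by_original_logic(rows):
    """Mimic the question's loop: a row is kept only if no other row
    dominates it AND no other row has the same objective values."""
    kept = []
    for i, row in enumerate(rows):
        is_dominated = False
        is_duplicate = False
        for j, other in enumerate(rows):
            if i == j:
                continue
            no_worse = (other["Costs"] <= row["Costs"]
                        and other["Peak Load"] <= row["Peak Load"])
            better = (other["Costs"] < row["Costs"]
                      or other["Peak Load"] < row["Peak Load"])
            if no_worse and better:
                is_dominated = True
                break
            if (other["Costs"] == row["Costs"]
                    and other["Peak Load"] == row["Peak Load"]):
                is_duplicate = True
                break
        if not is_dominated and not is_duplicate:
            kept.append(row["id"])
    return kept


result = kept_by_original_logic(rows)
print(result)  # [] because each row marks the other as a duplicate
```

With a single row there is nothing to compare against, so the row is kept. The problem only shows up once an exact tie exists in the input.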
Keep in mind that a point is added to the Pareto dataframe only if it remains non-dominated. Duplicates in the output are then handled by drop_duplicates.
import pandas as pd
df_merged_population_current_iteration = pd.read_csv("C:/Users/wi9632/Desktop/sample_input.csv", sep=";")
# create a new DataFrame to store the Pareto-optimal solutions
pareto_df = pd.DataFrame(columns=df_merged_population_current_iteration.columns)
for i, row in df_merged_population_current_iteration.iterrows():
    is_dominated = False
    for j, other_row in df_merged_population_current_iteration.iterrows():
        if i == j:
            continue
        # Check if the other solution dominates the current solution.
        # Strict '<' on both objectives: a row that only ties on one objective
        # and is worse on the other is not treated as dominated here.
        if (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] < row['Peak Load']):
            # The other solution dominates the current solution and hence row cannot be added to pareto set.
            is_dominated = True
            break
        # No duplicate check (and no early break) inside the loop: a tie must not
        # stop the search for a dominating row. Duplicates are dropped below.
    if not is_dominated and row['Thermal Discomfort'] < 2:
        # The current solution is Pareto-optimal, and meets the discomfort threshold
        row_df = pd.DataFrame([row])
        # Compare only the two objectives, since duplicates differ in their id columns
        pareto_df = pd.concat([pareto_df, row_df], ignore_index=True).drop_duplicates(subset=['Costs', 'Peak Load'])
print(pareto_df)
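For larger inputs the nested iterrows loops can get slow. The following is a rough vectorized sketch of the same idea, not a drop-in replacement. It assumes the column names from the question, uses the usual dominance definition (no worse in every objective, strictly better in at least one), evaluates dominance against all rows before applying the discomfort threshold (as the loop version does), and builds an n x n comparison matrix, so memory grows quadratically with the number of rows. `pareto_front` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def pareto_front(df, objectives=("Costs", "Peak Load"), max_discomfort=2.0):
    """Return the non-dominated rows (minimization), one row per objective tie."""
    cols = list(objectives)
    values = df[cols].to_numpy(dtype=float)
    # no_worse[i, j]: row j is no worse than row i in every objective
    no_worse = (values[None, :, :] <= values[:, None, :]).all(axis=2)
    # better[i, j]: row j is strictly better than row i in at least one objective
    better = (values[None, :, :] < values[:, None, :]).any(axis=2)
    dominated = (no_worse & better).any(axis=1)
    # Constraint check happens after the dominance check, as in the loop version
    feasible = (df["Thermal Discomfort"] < max_discomfort).to_numpy()
    front = df[~dominated & feasible]
    # Keep a single representative for rows with identical objective values
    return front.drop_duplicates(subset=cols).reset_index(drop=True)


# Made-up sample data, not the uploaded csv
sample = pd.DataFrame({
    "id of the run": [1, 2, 7, 8],
    "Costs": [12.0, 9.0, 10.0, 10.0],
    "Peak Load": [6.0, 8.0, 5.0, 5.0],
    "Thermal Discomfort": [0.5, 1.0, 0.3, 0.4],
})
front = pareto_front(sample)
print(front)
```

In this sample, run 1 is dominated by run 7, runs 7 and 8 tie on both objectives so only the first of them is kept, and run 2 stays because it has the lowest cost.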