How to find the Pareto-optimal solutions in a pandas DataFrame

Q:

I have a pandas DataFrame named df_merged_population_current_iteration, whose data can be downloaded as a CSV file here: https://easyupload.io/bdqso4

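Now I want to create a new DataFrame pareto_df that contains all Pareto-optimal solutions from df_merged_population_current_iteration with respect to minimizing the two objectives "Costs" and "Peak Load". It should also make sure that no duplicates are stored: if several solutions have identical values for both "Costs" and "Peak Load", only one of them should be saved. In addition, there is a check whether the value of "Thermal Discomfort" is smaller than 2. If it is not, the solution is not included in the new pareto_df.

For this purpose, I came up with the following code: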

import pandas as pd

df_merged_population_current_iteration = pd.read_csv("C:/Users/wi9632/Desktop/sample_input.csv", sep=";")

# create a new DataFrame to store the Pareto-optimal solutions
pareto_df = pd.DataFrame(columns=df_merged_population_current_iteration.columns)

for i, row in df_merged_population_current_iteration.iterrows():
    is_dominated = False
    is_duplicate = False
    for j, other_row in df_merged_population_current_iteration.iterrows():
        if i == j:
            continue
        # Check if the other solution dominates the current solution
        if (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] < row['Peak Load']) or \
                (other_row['Costs'] <= row['Costs'] and other_row['Peak Load'] < row['Peak Load']) or \
                (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] <= row['Peak Load']):
            # The other solution dominates the current solution
            is_dominated = True
            break
        # Check if the other solution is a duplicate
        if (other_row['Costs'] == row['Costs'] and other_row['Peak Load'] == row['Peak Load']):
            is_duplicate = True
            break

    if not is_dominated and not is_duplicate and row['Thermal Discomfort'] < 2:
        # The current solution is Pareto-optimal, not a duplicate, and meets the discomfort threshold
        row_df = pd.DataFrame([row])
        pareto_df = pd.concat([pareto_df, row_df], ignore_index=True)

print(pareto_df)

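In most cases the code works fine. However, in some cases no Pareto-optimal solution is added to the new DataFrame pareto_df, although Pareto-optimal solutions that fulfil the conditions exist. This can be seen with the data I posted above: the solutions with "id of the run" 7 and 8 are Pareto-optimal (and satisfy the Thermal Discomfort constraint), yet the current code adds neither of them to the new DataFrame. It should add one of them (but not both, since that would be a duplicate). I have to admit that I have tried a lot and looked through the code closely, but I could not find the mistake.

Here is the actual output for the uploaded data: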

Empty DataFrame
Columns: [Unnamed: 0, id of the run, Costs, Peak Load, Thermal Discomfort, Combined Score]
Index: []

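Here is the desired output (one Pareto-optimal solution):

Do you see what the mistake might be and how I have to adjust the code so that it actually finds all Pareto-optimal solutions without adding duplicates?

Reminder: does anyone know why the code does not find all Pareto-optimal solutions? I would highly appreciate any comment.

Python pandas Pareto optimality

@itprorh66 This is definitely not an opinion-based question. I provided everything you asked for and made clear what the output should be. I also supplied sample input data. Your statement "because open-ended and opinion-based questions come down to subjective responses" is simply wrong. Voting to close the question after I provided everything you asked for is also really rude. If you don't want to answer this clear question, that's fine. But voting to close it, so that others cannot answer it, is rather mean (especially given your incorrect reasoning).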

A:

0 votes MSS 8/5/2023 #1

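The condition that tests for dominance should be written more strictly. The culprit appears to be your last if clause, where you check non-dominance and duplication at the same time.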

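Your old code had a bug: it only added a row to the output DataFrame (pareto_df) when the row was both non-dominated and not a duplicate. That condition fails as soon as the input DataFrame contains duplicate rows. If two rows are duplicates, we should add one of them, because neither dominates the other. The old code rejected both, hence the empty DataFrame.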
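To see why both copies of a duplicated optimum get rejected, here is a minimal sketch on made-up data. The values and the helper name `flagged_as_duplicate` are hypothetical and are not taken from the uploaded CSV. Each of the two identical rows finds the other one, so both are flagged and neither passes a `not is_duplicate` check.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 1 are identical in both objectives.
toy = pd.DataFrame({
    "Costs": [10.0, 10.0, 12.0],
    "Peak Load": [5.0, 5.0, 6.0],
})

def flagged_as_duplicate(df, i):
    """Return True if any OTHER row has the same Costs and Peak Load as row i."""
    row = df.loc[i]
    others = df.drop(index=i)
    same = (others["Costs"] == row["Costs"]) & (others["Peak Load"] == row["Peak Load"])
    return bool(same.any())

# Both identical rows are flagged, so a "not is_duplicate" filter keeps neither.
flags = [flagged_as_duplicate(toy, i) for i in toy.index]
print(flags)
```

Deduplicating the result afterwards, instead of flagging rows during the pairwise comparison, avoids this mutual exclusion.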

Keep in mind that a point is added to the Pareto DataFrame only if it remains non-dominated. Duplicates in the output are then handled by drop_duplicates.

import pandas as pd

df_merged_population_current_iteration = pd.read_csv("C:/Users/wi9632/Desktop/sample_input.csv", sep=";")

# collect the rows that are non-dominated and meet the discomfort threshold
pareto_rows = []

for i, row in df_merged_population_current_iteration.iterrows():
    is_dominated = False
    for j, other_row in df_merged_population_current_iteration.iterrows():
        if i == j:
            continue
        # Check if the other solution dominates the current solution.
        # The test is strict in both objectives, so a row that only ties in one
        # objective is not treated as dominated here.
        if (other_row['Costs'] < row['Costs'] and other_row['Peak Load'] < row['Peak Load']):
            # The other solution dominates the current solution and hence row cannot be added to pareto set.
            is_dominated = True
            break
        # No early exit on duplicates: stopping here would skip the dominance
        # check against the remaining rows.

    if not is_dominated and row['Thermal Discomfort'] < 2:
        # The current solution is non-dominated and meets the discomfort threshold
        pareto_rows.append(row)

pareto_df = pd.DataFrame(pareto_rows, columns=df_merged_population_current_iteration.columns)
# Rows count as duplicates when both objectives match, even if other columns such as the run id differ
pareto_df = pareto_df.drop_duplicates(subset=['Costs', 'Peak Load']).reset_index(drop=True)

print(pareto_df)
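As an alternative to the nested iterrows loop, the same idea can be sketched with NumPy broadcasting. This is only a sketch under stated assumptions: the function name `pareto_front`, its parameters, and the toy values are hypothetical. It also uses the usual Pareto dominance test (no worse in every objective and strictly better in at least one) rather than the strict-in-both test in the loop above, so rows that tie in one objective and lose in the other are dropped. As in the loop, the Thermal Discomfort filter is applied after the dominance test, which means a row that fails the threshold can still dominate others.

```python
import numpy as np
import pandas as pd

def pareto_front(df, objectives, constraint_col=None, limit=None):
    """Non-dominated rows of df (all objectives minimized), one row per distinct objective vector."""
    values = df[objectives].to_numpy(dtype=float)
    # Axis 0 indexes the candidate row i, axis 1 the competing row j.
    no_worse = (values[None, :, :] <= values[:, None, :]).all(axis=2)
    better = (values[None, :, :] < values[:, None, :]).any(axis=2)
    # Row i is dominated if some row j is no worse everywhere and better somewhere.
    dominated = (no_worse & better).any(axis=1)
    keep = ~dominated
    if constraint_col is not None:
        # Assumption: the threshold is checked after the dominance test.
        keep &= (df[constraint_col] < limit).to_numpy()
    # Keep the first row of each group that shares the same objective values.
    return df[keep].drop_duplicates(subset=objectives).reset_index(drop=True)

# Hypothetical toy data, not the uploaded CSV.
toy = pd.DataFrame({
    "id of the run": [1, 2, 3, 4, 5],
    "Costs": [10.0, 10.0, 12.0, 8.0, 10.0],
    "Peak Load": [5.0, 5.0, 6.0, 9.0, 7.0],
    "Thermal Discomfort": [1.0, 1.5, 0.5, 3.0, 1.0],
})

front = pareto_front(toy, ["Costs", "Peak Load"], "Thermal Discomfort", 2)
print(front)
```

In this toy data, runs 1 and 2 are duplicates, run 3 is dominated, run 4 is non-dominated but fails the discomfort threshold, and run 5 ties run 1 on Costs while having a higher Peak Load. The pairwise comparison builds n × n intermediate arrays, so for very large populations a sort-based approach may be preferable.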
