DQN model choosing actions with variable parameters

Q:

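I am trying to write a DQN model for a card game. My problem is that the agent first chooses an action and then, depending on that action, has to choose either no card or a card. The q_values are always computed over the actions the agent can choose from, so I cannot use those q_values to pick a card. I do not think creating one action per card is feasible. One reason is that the game is very complex and can contain a lot of cards; the second is that the number of cards available to choose from is always different. It can range from 25 to 70 cards. The cards are also hard to categorize in any way. This would make the action space different every time the agent chooses an action, which I think would hurt the model's learning.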

I know it does not work properly right now, but to show you, this is how I currently have it set up:

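import random
import time

import numpy as np

# Fragment from inside the agent's action-selection method;
# q_values, epsilon and start are set earlier in that method.
if random.random() > epsilon:
    # Exploit: take the action with the highest Q-value.
    action_index = int(np.argmax(q_values))
    action = self.actions_to_chose_from[action_index]
else:
    # Explore: take a random action.
    action = random.choice(self.actions_to_chose_from)
end = time.time()
# print(f"chose_action took: {end - start}")
return action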

def take_action(self, action):
    last_state = self.get_state()
    chosen_cards = []
    # These Q-values are over actions, not cards, so using them to index
    # into the card lists below is the part that does not work.
    q_values = agent.get_q_values(last_state)

    all_cards = self.clickable_cards
    all_cards_2 = []

    try:
        # Case 1: the action takes no card.
        action()

    except Exception:
        try:
            # Case 2: the action takes a list of cards.
            # Removing cards while iterating shortens this loop.
            for _ in all_cards:
                if random.random() > epsilon:
                    try:
                        card_index = np.argmax(q_values)
                        card = all_cards[card_index]
                    except IndexError:
                        card = None
                else:
                    try:
                        card = random.choice(all_cards)
                    except IndexError:
                        card = None

                if card is not None:
                    chosen_cards.append(card)
                    all_cards.remove(card)

            action(chosen_cards)

        except Exception:
            # Case 3: the action takes a single card.
            self.reset_cards_elements()
            for card in self.clickable_cards_e:
                card = self.card_to_int(card)
                if card is not None:
                    all_cards_2.append(card)

            if random.random() > epsilon:
                try:
                    card_index = np.argmax(q_values)
                    card = all_cards_2[card_index]
                except IndexError:
                    card = None
            else:
                try:
                    card = random.choice(all_cards_2)
                except IndexError:
                    card = None

            if card is not None:
                action(card)

Possible solutions I have thought of:

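First of all, I am not sure whether there is a better way than deep Q-learning to create an AI for a card game. It is the best approach I found while googling, but if you know a better one, please tell me.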

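I have considered having the DQN model produce two sets of q_values, one for the action and one for the parameter (the card). I have also considered creating a second agent just for the parameter. I have not tried either idea, because I am not sure whether they are actually possible and, if they are, whether they are a good idea.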
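To make the first idea concrete, this is roughly what I picture: a minimal, untrained sketch of a network with a shared layer and two output heads, written with plain NumPy. The class name `TwoHeadQNetwork`, the parameters `state_dim`, `n_actions` and `max_cards`, and the sizes used are all made up for illustration; 70 is just the upper bound on cards mentioned above. It only shows the forward pass, not training.

```python
import numpy as np


class TwoHeadQNetwork:
    """Hypothetical two-head Q-network: shared layer, action head, card head."""

    def __init__(self, state_dim, n_actions, max_cards, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        # Randomly initialised weights; a real model would learn these.
        self.w_shared = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.w_action = rng.normal(0.0, 0.1, (hidden, n_actions))
        self.w_card = rng.normal(0.0, 0.1, (hidden, max_cards))

    def forward(self, state):
        # Shared representation with a ReLU non-linearity.
        hidden = np.maximum(0.0, state @ self.w_shared)
        # One vector of Q-values per head.
        return hidden @ self.w_action, hidden @ self.w_card


net = TwoHeadQNetwork(state_dim=10, n_actions=4, max_cards=70)
action_q, card_q = net.forward(np.ones(10))
action_index = int(np.argmax(action_q))  # which action to take
card_slot = int(np.argmax(card_q))       # which card slot to pass to it
```

With this layout the card head always has 70 outputs, so slots that do not correspond to a card in the current state would still need to be ignored somehow.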

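Finally, I have wondered whether there is a way to let the agent choose from an action space whose size keeps changing. I could not find anything and, as I said, I am not sure the model could learn with such an approach. If there is a way, though, I would probably create a specific action for each card.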
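One way I can imagine handling the changing size is to fix the output at the maximum number of cards and mask out the slots that are not available in the current state. The following is only a sketch under that assumption: `MAX_CARDS`, `masked_epsilon_greedy` and `valid_mask` are hypothetical names, and the Q-values are random numbers standing in for a network output.

```python
import numpy as np

MAX_CARDS = 70  # assumed upper bound, taken from the 25 to 70 range above


def masked_epsilon_greedy(card_q, valid_mask, epsilon, rng):
    """Pick a card slot, considering only slots where valid_mask is True."""
    valid_slots = np.flatnonzero(valid_mask)
    if valid_slots.size == 0:
        return None  # no card can be chosen in this state
    if rng.random() < epsilon:
        # Explore: uniform choice among the valid slots only.
        return int(rng.choice(valid_slots))
    # Exploit: invalid slots get -inf so argmax can never select them.
    masked_q = np.where(valid_mask, card_q, -np.inf)
    return int(np.argmax(masked_q))


rng = np.random.default_rng(0)
card_q = rng.normal(size=MAX_CARDS)   # stand-in for the network output
valid_mask = np.zeros(MAX_CARDS, dtype=bool)
valid_mask[:25] = True                # only 25 cards are clickable right now
card_q[60] = 100.0                    # invalid slot with the highest raw Q-value
slot = masked_epsilon_greedy(card_q, valid_mask, epsilon=0.0, rng=rng)
```

Here slot 60 has the largest raw Q-value but is masked out, so the greedy choice falls on one of the 25 valid slots. Whether a model learns well with such a mask is exactly what I am unsure about.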

Tags: python, machine-learning, artificial-intelligence

A: No answers yet
