What is the most resource-efficient way to test whether two std::lists contain the same unique elements?

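Question:

In my code I have to compare the keys of structures, which are returned as lists in random order. I need to check whether two structures have the same key elements, ignoring order and comparing only the unique elements.

Currently I use code like the following example: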

#include <cassert>
#include <list>
#include <set>
#include <string>


template<typename T>
auto areListsAsSetsEqual(const std::list<T> &a, const std::list<T> &b) -> bool {
    auto aSet = std::set<T>{a.begin(), a.end()};
    auto bSet = std::set<T>{b.begin(), b.end()};
    return aSet == bSet;
}


auto main() -> int {
    auto x = std::list<std::string>{"red", "blue", "yellow", "green", "green"};
    auto y = std::list<std::string>{"blue", "green", "yellow", "red", "red"};
    auto z = std::list<std::string>{"green", "red", "yellow"};

    auto xyEqual = areListsAsSetsEqual(x, y);
    assert(xyEqual == true);
    auto xzEqual = areListsAsSetsEqual(x, z);
    assert(xzEqual == false);
    return 0;
}

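It works and is short, robust code, but every comparison has to create two new sets and copy all elements of both lists.

Is there a more efficient and elegant way to compare two lists for the same unique elements, using less CPU and/or memory?

Tags: c++, list, c++17, comparison, c++-standard-library

If the sets differ in size, the comparison takes constant time in both cases. An unordered-set comparison is therefore probably at its worst when the sets are equal or very similar (when the keys really differ, I would expect the comparison to terminate almost immediately in both cases), so I think the slower comparison can only be a problem when the sets are equal or similar.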

Answers:

0 votes sweenish 2/16/2023 #1

Here is a different take. It needs only one new container and copies only one of the lists.

#include <cassert>
#include <list>
#include <string>
#include <unordered_map>

template <typename T>
auto areListsAsSetsEqual(const std::list<T>& a, const std::list<T>& b) -> bool {
  std::unordered_map<T, bool> firstListKeys;
  for (const auto& i : a) {
    firstListKeys[i] = true;
  }

  for (const auto& i : b) {
    if (firstListKeys.find(i) != firstListKeys.end()) {
      firstListKeys[i] = false;
    } else {
      return false;
    }
  }

  for (const auto& p : firstListKeys) {
    if (p.second == true) {
      return false;
    }
  }

  return true;
}

auto main() -> int {
  auto x = std::list<std::string>{"red", "blue", "yellow", "green", "green"};
  auto y = std::list<std::string>{"blue", "green", "yellow", "red", "red"};
  auto z = std::list<std::string>{"green", "red", "yellow"};

  auto xyEqual = areListsAsSetsEqual(x, y);
  assert(xyEqual == true);
  auto xzEqual = areListsAsSetsEqual(x, z);
  assert(xzEqual == false);
  return 0;
}

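The first list is copied into a std::unordered_map, and every key is set to true. The second list is then iterated and each element is looked up in the map (O(1) on average). If an element is not found, we can return immediately. If it is found, its key is set to false. Finally, we have to scan the unordered_map to see whether any element is left in the true state.

I haven't run any benchmarks, so testing is needed to see whether this solution is actually more efficient. In theory both make about three passes over the data (O(3N)); note that each std::set insertion costs O(log N) while unordered_map operations are O(1) on average, so actual run times need to be measured.

However, while the gain in run-time efficiency is uncertain, this approach does offer a clear space-efficiency advantage. Your current algorithm needs ~2N space, while mine is closer to N.

I'd suggest using std::unordered_map<T, unsigned int> instead of std::unordered_map<T, bool>, with the value acting as a counter. Otherwise you cannot account for repeated elements. It still takes three passes: iterate over list a incrementing the counters, then iterate over list b decrementing the counters, and finally check that every value in the map is 0; otherwise the two lists are not equal.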
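The counter idea in the comment above could be sketched roughly as follows. This is only an illustration, not code from the thread: the names areListsAsMultisetsEqual and checkMultisetExamples are made up here, and the function deliberately compares the lists as multisets (duplicates count), which is a different question from the one originally asked.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical name: compares two lists as multisets, so duplicates matter.
template <typename T>
auto areListsAsMultisetsEqual(const std::list<T>& a, const std::list<T>& b) -> bool {
  std::unordered_map<T, unsigned int> counts;
  for (const auto& key : a) {
    ++counts[key];  // pass 1: count every occurrence in a
  }
  for (const auto& key : b) {
    const auto it = counts.find(key);
    if (it == counts.end() || it->second == 0) {
      return false;  // key missing from a, or b holds more copies than a
    }
    --it->second;  // pass 2: consume one occurrence
  }
  for (const auto& entry : counts) {
    if (entry.second != 0) {
      return false;  // pass 3: a holds copies that b never matched
    }
  }
  return true;
}

// Usage sketch based on the lists from the question.
inline void checkMultisetExamples() {
  const auto x = std::list<std::string>{"red", "blue", "yellow", "green", "green"};
  const auto y = std::list<std::string>{"blue", "green", "yellow", "red", "red"};
  const auto x2 = std::list<std::string>{"green", "red", "green", "yellow", "blue"};
  assert(areListsAsMultisetsEqual(x, x2));  // same keys, same multiplicities
  assert(!areListsAsMultisetsEqual(x, y));  // same unique keys, different duplicates
}
```

With this variant, x and y from the question compare as unequal because their duplicated keys differ, which is why it does not match the set semantics of the original code.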

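0 votes sweenish 2/16/2023

@JorgeLópez That is not what was asked. The OP only cares about the same unique values being present in both lists. Look at their example code again, specifically the main() function. And to hammer the point home, std::set can only hold unique values.

0 votes Jorge 2/16/2023

Oops, you're right, my mistake.

0 votes Jorge 2/16/2023

In fact, I'd suggest you update your answer to actually reflect the intent of the original question. I'd also suggest using an unordered set.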
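One possible reading of the unordered-set suggestion, sketched under assumptions: only the first list is copied into a std::unordered_set, and a second set of pointers records which of those keys were seen in the second list. The function and helper names below are hypothetical, and T is assumed to be hashable with std::hash.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_set>

// Hypothetical name: set-style comparison that copies only the keys of a.
template <typename T>
auto areListsAsSetsEqualUnordered(const std::list<T>& a, const std::list<T>& b) -> bool {
  // Duplicates in a collapse on insertion.
  const std::unordered_set<T> aKeys(a.begin(), a.end());
  // Addresses of the keys in aKeys that also occur in b (no key copies).
  std::unordered_set<const T*> matched;
  for (const auto& key : b) {
    const auto it = aKeys.find(key);
    if (it == aKeys.end()) {
      return false;  // b holds a key that a lacks
    }
    matched.insert(&*it);  // duplicates in b map to the same address
  }
  // Equal as sets only if every unique key of a was seen in b.
  return matched.size() == aKeys.size();
}

// Usage sketch based on the lists from the question.
inline void checkUnorderedExamples() {
  const auto x = std::list<std::string>{"red", "blue", "yellow", "green", "green"};
  const auto y = std::list<std::string>{"blue", "green", "yellow", "red", "red"};
  const auto z = std::list<std::string>{"green", "red", "yellow"};
  assert(areListsAsSetsEqualUnordered(x, y));
  assert(!areListsAsSetsEqualUnordered(x, z));
}
```

Whether the pointer set pays off compared with the bool map is not something this sketch can show; as the answer says, that needs measuring with representative data.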

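1 vote Flovdis 2/17/2023

@sweenish See my answer for a solution based on your principle that does not need to copy the keys. For my use case, it gave a speed-up of almost ten times over the original approach. However, with large sets of short keys, your solution is superior.

0 votes kaba 2/16/2023 #2

For completeness: you could also try the standard algorithms. Assuming you do not want to modify the input, you need copies. (For a solution with only one copy, see @sweenish's answer.)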

#include <list>
#include <string>
#include <cassert>
#include <algorithm>
#include <iterator>
#include <vector>

template<typename T>
auto areListsEqual(std::list<T> const &a, std::list<T> const &b) -> bool {
    std::vector<T> aVector(a.size());
    std::partial_sort_copy(a.cbegin(), a.cend(), aVector.begin(), aVector.end());
    auto aEnd = std::unique(aVector.begin(), aVector.end());
    
    std::vector<T> bVector(b.size());
    std::partial_sort_copy(b.cbegin(), b.cend(), bVector.begin(), bVector.end());
    auto bEnd = std::unique(bVector.begin(), bVector.end());

    return std::distance(aVector.begin(),aEnd) == std::distance(bVector.begin(),bEnd)
         ? std::equal(aVector.begin(), aEnd, bVector.begin())
         : false;
}

auto main() -> int {
    auto x = std::list<std::string>{"red", "blue", "yellow", "green", "green"};
    auto y = std::list<std::string>{"blue", "green", "yellow", "red", "red"};
    auto z = std::list<std::string>{"green", "red", "yellow"};
    auto w = std::list<std::string>{"green", "red", "yellow", "black"};

    auto xyEqual = areListsEqual(x, y);
    assert(xyEqual == true);
    auto xzEqual = areListsEqual(x, z);
    assert(xzEqual == false);
    auto xwEqual = areListsEqual(x, w);
    assert(xwEqual == false);
    return 0;
}

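In terms of big-O this solution is no faster. But it uses sequential containers as intermediate storage, which may be more efficient on modern hardware. As always with optimization these days, you have to measure with representative data.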
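A slightly more compact variant of the same sort-and-deduplicate idea is sketched below. It is not kaba's code: it swaps std::partial_sort_copy for a range-constructed std::vector followed by std::sort, which avoids requiring T to be default-constructible, and the names sortedUniqueKeys and areListsEqualSorted are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: sorted copy of a list with duplicates removed.
template <typename T>
auto sortedUniqueKeys(const std::list<T>& source) -> std::vector<T> {
  std::vector<T> keys(source.begin(), source.end());  // contiguous copy
  std::sort(keys.begin(), keys.end());
  // std::unique moves duplicates to the tail; erase drops that tail.
  keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
  return keys;
}

// Hypothetical name: two lists are equal as sets if their sorted unique keys match.
template <typename T>
auto areListsEqualSorted(const std::list<T>& a, const std::list<T>& b) -> bool {
  // std::vector's operator== compares the sizes first, then the elements.
  return sortedUniqueKeys(a) == sortedUniqueKeys(b);
}

// Usage sketch based on the lists from the answer above.
inline void checkSortedExamples() {
  const auto x = std::list<std::string>{"red", "blue", "yellow", "green", "green"};
  const auto y = std::list<std::string>{"blue", "green", "yellow", "red", "red"};
  const auto w = std::list<std::string>{"green", "red", "yellow", "black"};
  assert(areListsEqualSorted(x, y));
  assert(!areListsEqualSorted(x, w));
}
```

Like the version above, this still copies both lists, so it trades memory for cache-friendly storage.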

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <vector>

template<typename T>
auto areListsAsSetsEqual(const std::list<T> &a, const std::list<T> &b) -> bool {
    auto keyMap = std::vector<bool>(a.size(), false);
    const auto srcEndA = a.end();
    const auto srcEndB = b.end();
    std::size_t keyIndex = 0;
    for (auto it = a.begin(); it != srcEndA; ++it, ++keyIndex) {
        if (std::find(a.begin(), it, *it) == it) {
            keyMap[keyIndex] = true;
        }
    }
    for (const auto &element : b) {
        auto foundIt = std::find(a.begin(), a.end(), element);
        if (foundIt == a.end()) {
            return false;
        }
        keyMap[std::distance(a.begin(), foundIt)] = false;
    }
    return std::all_of(keyMap.begin(), keyMap.end(), [](bool flag) -> bool { return !flag; });
}

auto main() -> int {
    auto x = std::list<std::string>{"red", "blue", "yellow", "green", "green"};
    auto y = std::list<std::string>{"blue", "green", "yellow", "red", "red"};
    auto z = std::list<std::string>{"green", "red", "yellow"};

    auto xyEqual = areListsAsSetsEqual(x, y);
    assert(xyEqual == true);
    auto xzEqual = areListsAsSetsEqual(x, z);
    assert(xzEqual == false);
    return 0;
}

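I ran this against my extensive test data, which uses std::string keys of 4 to 128 characters and key sets of 0 to 32 elements. Over 200,000 comparisons of sets with small mutations, I get the following results:

Implementation	Average time per call	Average time when equal	Speed gain
Original	0.005232 ms	0.005444 ms	1×
kaba	0.004337 ms	0.004275 ms	1.316×
sweenish	0.002796 ms	0.003919 ms	1.87×
This solution	0.000566 ms	0.001305 ms	9.24×

This algorithm also has the lowest memory usage.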
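The benchmark code itself is not shown in the thread. A minimal timing harness along these lines could be used to reproduce such a comparison; everything in it (TimingResult, measureAverageCallTime, compareViaSets, the use of std::chrono::steady_clock) is an assumption made for illustration, not the setup that produced the numbers above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>
#include <set>
#include <string>
#include <utility>
#include <vector>

using KeyList = std::list<std::string>;

// Hypothetical result type for the sketch.
struct TimingResult {
  double averageMs;        // mean wall-clock time per call in milliseconds
  std::size_t equalCount;  // how many calls returned true
};

// Reference implementation: the set-based comparison from the question.
inline auto compareViaSets(const KeyList& a, const KeyList& b) -> bool {
  return std::set<std::string>(a.begin(), a.end()) ==
         std::set<std::string>(b.begin(), b.end());
}

// Hypothetical harness: calls compare on every case, repeated `rounds` times.
template <typename Compare>
auto measureAverageCallTime(Compare compare,
                            const std::vector<std::pair<KeyList, KeyList>>& cases,
                            std::size_t rounds) -> TimingResult {
  if (cases.empty() || rounds == 0) {
    return TimingResult{0.0, 0};
  }
  std::size_t equalCount = 0;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t round = 0; round < rounds; ++round) {
    for (const auto& testCase : cases) {
      if (compare(testCase.first, testCase.second)) {
        ++equalCount;  // returned to the caller so the result is used
      }
    }
  }
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::milli> elapsed = stop - start;
  const auto calls = static_cast<double>(rounds * cases.size());
  return TimingResult{elapsed.count() / calls, equalCount};
}
```

Passing each candidate function to measureAverageCallTime with the same cases gives comparable averages; a serious benchmark would also need warm-up runs, optimized builds, and representative key data.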
