使用具有并行执行策略的 C++ transform_reduce（）函数的 MapReduce 字数统计-解网

问：

我有一串单词，我想计算每个单词的出现次数，并将结果存储在地图中。我想使用 std：：transform_reduce（）来利用它的并行处理选项并在更大的数据集上使用它。

例如：

std::string text = "apple orange banana apple apple orange";

std::istringstream iss(text);

// Put these words into a vector
// Define a vector (CTAD), use its range constructor and the std::istream_iterator as iterator
std::vector words(std::istream_iterator<std::string>(iss), {});

// Aim: Use transform_reduce() to populate an unordered_map that maps each word to its word count

std::unordered_map<std::string, int> wordCount;

// Count the occurrences using transform_reduce
std::transform_reduce(
    words.begin(),
    words.end(), 
    // use the unordered_map wordCount somehow?
    // [](){ lambda for reduce to populate the unordered_map wordCount}
    // [](){ lambda for transforming the words vector of data to pairs: e.g. ("apple", 1) }
);

如何在 C++17 或更高版本中使用 transform_reduce（）实现此目的？

注意：chatGPT 对它进行了尝试，但它的代码没有编译，我看不出如何让它工作。

C++ MapReduce C++17 标准

chatGPT 对它进行了尝试，但它的代码没有编译。它不能编写任何代码，它会在其代码集合中搜索，随机排列和组合找到的代码片段并给出结果，如果 ChatGPT 找不到与问题完全匹配的代码，则很少能奏效。没有人用于计算向量中的单词，因此你得到了一个被 ChatGPT 猜到的不起作用的代码。std::transform_reduce

0赞 Will 5/29/2023

我希望它能够并行地为更大的数据集进行计算，因此希望使用transform_reduce。

0赞 273K 5/29/2023

这个问题与“我希望它为更大的数据集并行进行计算”的目标完全不同。至少输入不会是向量，而是词频的哈希图。

0赞 Will 6/1/2023

我假设MapReduce的定义是一个编程模型和一个相关的实现，用于使用并行的分布式算法处理和生成大数据集。我认为这就是它的重点，我认为 transform_reduce（）是一样的——它确实有一个将 ExecutionPolicy 作为第一个参数的版本。

答：

3赞 Martin York 5/29/2023 #1

我会使用一个然后使用一个基于 for 循环的范围。std::ranges::subrange

#include <unordered_map>
#include <string>
#include <sstream>
#include <iostream>
#include <ranges>
#include <iterator>



int main()
{
    using StreamWordIter = std::istream_iterator<std::string>;
    using SubRange       = std::ranges::subrange;

    std::string        text = "apple orange banana apple apple orange";    
    std::istringstream iss(text);

    std::unordered_map<std::string, int> wordCount;
    for (auto const& word: SubRange(StreamWordIter{iss}, StreamWordIter{})) {
        ++wordCount[word];
    }

    for (auto const& [word, count]: wordCount) {
        std::cout << "  Word: " << word << " => " << count << "\n";
    }
}

上一个：为什么 C++ 标准中没有平行的“std：：accumulate”？

下一个：使用 STL 排序功能对列表进行排序

使用具有并行执行策略的 C++ transform_reduce（）函数的 MapReduce 字数统计

MapReduce word count using C++ transform_reduce() function with a parallel execution policy

评论

使用具有并行执行策略的 C++ transform_reduce（） 函数的 MapReduce 字数统计

MapReduce word count using C++ transform_reduce() function with a parallel execution policy

评论

使用具有并行执行策略的 C++ transform_reduce（）函数的 MapReduce 字数统计