Asked by: Will  Asked: 5/29/2023  Last edited by: Will  Updated: 6/6/2023  Views: 161
MapReduce word count using C++ transform_reduce() function with a parallel execution policy
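Q:
I have a string of words. I want to count the occurrences of each word and store the result in a map. I would like to use std::transform_reduce() so that I can take advantage of its parallel execution option and apply it to larger datasets.
For example: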
std::string text = "apple orange banana apple apple orange";
std::istringstream iss(text);
// Put these words into a vector
// Define a vector (CTAD), use its range constructor and the std::istream_iterator as iterator
std::vector words(std::istream_iterator<std::string>(iss), {});
// Aim: Use transform_reduce() to populate an unordered_map that maps each word to its word count
std::unordered_map<std::string, int> wordCount;
// Count the occurrences using transform_reduce
std::transform_reduce(
words.begin(),
words.end(),
// use the unordered_map wordCount somehow?
// [](){ lambda for reduce to populate the unordered_map wordCount}
// [](){ lambda for transforming the words vector of data to pairs: e.g. ("apple", 1) }
);
How can I achieve this with transform_reduce() in C++17 or later?
Note: ChatGPT attempted this, but its code did not compile, and I can't see how to make it work.
A:
3 votes
Martin York
5/29/2023
#1
I would use a std::ranges::subrange (C++20) and then a range-based for loop.
#include <unordered_map>
#include <string>
#include <sstream>
#include <iostream>
#include <ranges>
#include <iterator>

int main()
{
    using StreamWordIter = std::istream_iterator<std::string>;
    // subrange is a class template, so the alias has to name the iterator type
    using SubRange       = std::ranges::subrange<StreamWordIter>;

    std::string text = "apple orange banana apple apple orange";
    std::istringstream iss(text);

    std::unordered_map<std::string, int> wordCount;
    // Read words straight from the stream and count each occurrence
    for (auto const& word: SubRange(StreamWordIter{iss}, StreamWordIter{})) {
        ++wordCount[word];
    }

    for (auto const& [word, count]: wordCount) {
        std::cout << " Word: " << word << " => " << count << "\n";
    }
}
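For readers who still want the std::transform_reduce() form asked about in the question, here is a minimal sketch of one way it could be written. It is not taken from the answer above. The idea is to transform each word into a one-entry map and to reduce by merging two partial maps, which keeps the reduction associative and commutative as the algorithm requires. The names WordCount and countWords are hypothetical and introduced only for illustration.

```cpp
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical alias, not from the original post.
using WordCount = std::unordered_map<std::string, int>;

// Hypothetical helper: counts words by mapping every word to a
// one-entry map and then merging the partial maps pairwise.
WordCount countWords(const std::vector<std::string>& words)
{
    return std::transform_reduce(
        // To request parallel execution, add #include <execution> and pass
        // std::execution::par as the first argument, before words.begin().
        words.begin(), words.end(),
        WordCount{},  // initial value: an empty map
        // Reduce step: merge two partial maps by summing counts per word.
        [](WordCount lhs, const WordCount& rhs) {
            for (const auto& [word, count] : rhs) {
                lhs[word] += count;
            }
            return lhs;
        },
        // Transform step: turn a single word into the map {word: 1}.
        [](const std::string& word) {
            return WordCount{{word, 1}};
        });
}
```

Calling countWords on the six words from the question should yield apple: 3, orange: 2, banana: 1. Two caveats: with some toolchains (GCC's libstdc++, for example) the parallel policies need an extra threading backend such as TBB at link time, and building a temporary map per word is allocation-heavy, so the plain loop above may well be faster unless the input is large.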
Comments
transform_reduce
for(auto& word : words) ++wordCount[word];
std::transform_reduce