提问人:johnco3 提问时间:7/17/2023 最后编辑:康桓瑋johnco3 更新时间:7/18/2023 访问量:35
使用 Ranges-v3 将平面表转换为树形结构
converting a flat table to a tree structure using ranges-v3
问:
我有一个平面表示,如下表所示。 未排序的数据 std::vector 为:
unsorted vector
(id) (path) (fn) (line) (extra)
1 /abc/file3.c foo0 10 1
2 /abc/file3.c foo0 15 2
3 /abc/file3.c foo0 20 1
4 /abc/file3.c foo1 30 1
5 /abc/file3.c foo1 35 2
6 /abc/file3.c foo1 40 1
7 /abc/file1.c foo2 10 1
8 /abc/file1.c foo2 15 2
9 /abc/file1.c foo2 20 1
10 /abc/file3.c baz1 70 1
11 /abc/file3.c baz1 75 2
12 /abc/file3.c baz1 80 1
13 /abc/file2.c bat 10 1
14 /abc/file2.c bat 15 2
15 /abc/file2.c bat 17 2
16 /abc/file2.c bat 20 1
17 /def/file2.c baz 70 1
18 /def/file2.c baz 71 1
19 /def/file2.c baz 72 1
20 /def/file2.c baz 73 1
这些列表示“ID”、“path”、“function”、“linenumber”和“extra”。树形数据按层次结构排序为 path->funcion->lineNumber(每个路径包含多个函数,其中包含多个感兴趣线(探测点))。
此表中的每一行都用以下结构表示:
using Type = enum class Type : unsigned {
One = 1,
Two = 2
};
using MyStruct = struct MyStruct {
unsigned id;
std::string filename;
std::string function;
unsigned lineNum;
Type type;
};
使用上述层次结构对此数据进行排序后(通过以下比较器)
// comparator used for unique
static const auto customComp = [](const auto& lhs, const auto& rhs) {
return std::tie(lhs.filename, lhs.function, lhs.lineNum, lhs.type) <
std::tie(rhs.filename, rhs.function, rhs.lineNum, rhs.type);
};
我们最终得到正确排序的向量:
sorted vector
(id) (path) (fn) (line) (extra)
7 /abc/file1.c foo2 10 1
8 /abc/file1.c foo2 15 2
9 /abc/file1.c foo2 20 1
13 /abc/file2.c bat 10 1
14 /abc/file2.c bat 15 2
15 /abc/file2.c bat 17 2
16 /abc/file2.c bat 20 1
10 /abc/file3.c baz1 70 1
11 /abc/file3.c baz1 75 2
12 /abc/file3.c baz1 80 1
1 /abc/file3.c foo0 10 1
2 /abc/file3.c foo0 15 2
3 /abc/file3.c foo0 20 1
4 /abc/file3.c foo1 30 1
5 /abc/file3.c foo1 35 2
6 /abc/file3.c foo1 40 1
17 /def/file2.c baz 70 1
18 /def/file2.c baz 71 1
19 /def/file2.c baz 72 1
20 /def/file2.c baz 73 1
我需要使用新的 ranges 或 ranges-v3 API 解析此数据,以有效地重新创建表来源的树结构。我在这里指定范围首先是因为我正在学习这个复杂的 API,但也因为 API 似乎显示了一种通过延迟计算处理大型数据集的非常有效的方法)。
以下代码有效(也在 godbolt 中),但似乎是错误的。我正在使用一对嵌套范围chunk_by循环来解析数据。我需要提前中断外循环。
代码的主体如下:
// comparator used for unique
static const auto customComp = [](const auto& lhs, const auto& rhs) {
return std::tie(lhs.filename, lhs.function, lhs.lineNum, lhs.type) <
std::tie(rhs.filename, rhs.function, rhs.lineNum, rhs.type);
};
int
main() {
print("unsorted vector", structs);
// split the sorted probes into chunks
actions::sort(structs, customComp);
const auto outerComp = [](auto&& lhs, auto&& rhs) {
return lhs.filename == rhs.filename;
};
const auto innerComp = [](auto&& lhs, auto&& rhs) {
return lhs.function == rhs.function;
};
print("sorted vector", structs);
std::cout << std::endl;
// split sorted list of probes into chunks by filename
for (const auto& sources : structs | views::chunk_by(outerComp)) {
auto foo = sources.size();
for (const auto& next : sources) {
auto outcomes = 0;
for (const auto& functions : sources | views::chunk_by(innerComp)) {
for (const auto& probe : functions) {
outcomes += (probe.type == Type::Two) ? 2 : 1;
std::cout << std::format("{}\n", probe);
}
}
std::cout << next.filename << " outcomes [" << outcomes << "]\n";
break;
}
std::cout << "\n";
}
}
是否可以在单个 for 循环上执行排序和双分块?理想情况下,我想使用范围 API 的组合形式来获得最佳结果。
答:
1赞
Jerry Coffin
7/18/2023
#1
鉴于每条记录都包含执行此操作所需的所有信息,我只需创建一些东西来表示树结构,并在其中插入记录,而不是排序,然后从排序的记录中解析出范围。
#include <iostream>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <map>
#include <string>
// Keep the code self-contained, though in real use you undoubtedly want to
// read the raw data from a file, or something on that order.
char const *rawData = R"(
1 /abc/file3.c foo0 10 1
2 /abc/file3.c foo0 15 2
3 /abc/file3.c foo0 20 1
4 /abc/file3.c foo1 30 1
5 /abc/file3.c foo1 35 2
6 /abc/file3.c foo1 40 1
7 /abc/file1.c foo2 10 1
8 /abc/file1.c foo2 15 2
9 /abc/file1.c foo2 20 1
10 /abc/file3.c baz1 70 1
11 /abc/file3.c baz1 75 2
12 /abc/file3.c baz1 80 1
13 /abc/file2.c bat 10 1
14 /abc/file2.c bat 15 2
15 /abc/file2.c bat 17 2
16 /abc/file2.c bat 20 1
17 /def/file2.c baz 70 1
18 /def/file2.c baz 71 1
19 /def/file2.c baz 72 1
20 /def/file2.c baz 73 1
)";
struct record {
int id;
std::string path;
std::string fn;
int lineNumber;
int type;
bool operator<(record const &rhs) const {
return std::tie(path, fn, lineNumber, type) < std::tie(rhs.path, rhs.fn, rhs.lineNumber, rhs.type);
}
friend std::istream &operator>>(std::istream &is, record &r) {
return is >> r.id >> r.path >> r.fn >> r.lineNumber >> r.type;
}
friend std::ostream &operator<<(std::ostream &os, record const &r) {
return os << r.id << "\t" << r.path << "\t" << r.fn << "\t" << r.lineNumber << "\t" << r.type;
}
};
struct Probe {
int line;
int type;
friend std::ostream &operator<<(std::ostream &os, Probe const &p) {
return os << "\t\t" << p.line << " " << p.type;
}
};
class FuncRec {
std::vector<Probe> probes;
public:
void insert(record const &rec) {
probes.push_back(Probe{rec.lineNumber, rec.type});
}
friend std::ostream &operator<<(std::ostream &os, FuncRec const &f) {
for (auto const &p : f.probes) {
os << p << "\n";
}
return os;
}
};
class FileRec {
std::map<std::string, FuncRec> functions;
public:
void insert(record const &rec) {
functions[rec.fn].insert(rec);
}
friend std::ostream &operator<<(std::ostream &os, FileRec const &f) {
for (auto const &f : f.functions) {
os << "\t" << f.first << "\n";
os << f.second;
}
return os;
}
};
class Tree {
std::map<std::string, FileRec> files;
void insert(record const &rec) {
files[rec.path].insert(rec);
}
public:
Tree(std::vector<record> const &in) {
for (auto const &r : in)
insert(r);
}
friend std::ostream &operator<<(std::ostream &os, Tree const &t) {
for (auto const &f : t.files) {
os << f.first << "\n";
os << f.second;
}
return os;
}
};
int main() {
std::stringstream infile(rawData);
std::vector<record> recs { std::istream_iterator<record>(infile), {}};
Tree tree{recs};
std::cout << "Tree struture:\n";
std::cout << tree;
// In case you also want to show sorted structs:
std::cout << "\nSorted records:\n";
std::sort(recs.begin(), recs.end());
for (auto const &r : recs) {
std::cout << r << "\n";
}
std::cout << "\n";
}
这可能比实际需要的要复杂一些。例如,并没有真正完成太多。我们可以将 s 的向量嵌入到它的父级中(但我假设这是更复杂的东西的简化版本,其中可能有更多用途。FuncRec
Probe
FuncRec
评论
using Fn = std::map<std::string, int>; std::multimap<Path, Fn> records;
records