使用 Ranges-v3 将平面表转换为树形结构

converting a flat table to a tree structure using ranges-v3

提问人:johnco3 提问时间:7/17/2023 最后编辑:康桓瑋johnco3 更新时间:7/18/2023 访问量:35

问:

我有一个平面表示,如下表所示。 未排序的数据 std::vector 为:

unsorted vector
(id)    (path)          (fn)    (line)  (extra)
1       /abc/file3.c    foo0    10      1
2       /abc/file3.c    foo0    15      2
3       /abc/file3.c    foo0    20      1
4       /abc/file3.c    foo1    30      1
5       /abc/file3.c    foo1    35      2
6       /abc/file3.c    foo1    40      1
7       /abc/file1.c    foo2    10      1
8       /abc/file1.c    foo2    15      2
9       /abc/file1.c    foo2    20      1
10      /abc/file3.c    baz1    70      1
11      /abc/file3.c    baz1    75      2
12      /abc/file3.c    baz1    80      1
13      /abc/file2.c    bat     10      1
14      /abc/file2.c    bat     15      2
15      /abc/file2.c    bat     17      2
16      /abc/file2.c    bat     20      1
17      /def/file2.c    baz     70      1
18      /def/file2.c    baz     71      1
19      /def/file2.c    baz     72      1
20      /def/file2.c    baz     73      1

这些列表示“ID”、“path”、“function”、“linenumber”和“extra”。树形数据按层次结构排序为 path->funcion->lineNumber(每个路径包含多个函数,其中包含多个感兴趣线(探测点))。

此表中的每一行都用以下结构表示:

using Type = enum class Type : unsigned {
    One = 1,
    Two = 2
};

using MyStruct = struct MyStruct {
    unsigned id;
    std::string filename;
    std::string function;
    unsigned lineNum;
    Type type;
};

使用上述层次结构对此数据进行排序后(通过以下比较器)

// comparator used for unique
static const auto customComp = [](const auto& lhs, const auto& rhs) {
    return std::tie(lhs.filename, lhs.function, lhs.lineNum, lhs.type) <
        std::tie(rhs.filename, rhs.function, rhs.lineNum, rhs.type);
    };

我们最终得到正确排序的向量:

sorted vector
(id)    (path)          (fn)    (line)  (extra)
7       /abc/file1.c    foo2    10      1
8       /abc/file1.c    foo2    15      2
9       /abc/file1.c    foo2    20      1
13      /abc/file2.c    bat     10      1
14      /abc/file2.c    bat     15      2
15      /abc/file2.c    bat     17      2
16      /abc/file2.c    bat     20      1
10      /abc/file3.c    baz1    70      1
11      /abc/file3.c    baz1    75      2
12      /abc/file3.c    baz1    80      1
1       /abc/file3.c    foo0    10      1
2       /abc/file3.c    foo0    15      2
3       /abc/file3.c    foo0    20      1
4       /abc/file3.c    foo1    30      1
5       /abc/file3.c    foo1    35      2
6       /abc/file3.c    foo1    40      1
17      /def/file2.c    baz     70      1
18      /def/file2.c    baz     71      1
19      /def/file2.c    baz     72      1
20      /def/file2.c    baz     73      1

我需要使用新的 rangesranges-v3 API 解析此数据,以有效地重新创建表来源的树结构。我在这里指定范围首先是因为我正在学习这个复杂的 API,但也因为 API 似乎显示了一种通过延迟计算处理大型数据集的非常有效的方法)。

以下代码有效(也在 godbolt 中),但似乎是错误的。我正在使用一对嵌套范围chunk_by循环来解析数据。我需要提前中断外循环。

代码的主体如下:

// comparator used for unique
static const auto customComp = [](const auto& lhs, const auto& rhs) {
    return std::tie(lhs.filename, lhs.function, lhs.lineNum, lhs.type) <
        std::tie(rhs.filename, rhs.function, rhs.lineNum, rhs.type);
    };

int
main() {
    print("unsorted vector", structs);
    // split the sorted probes into chunks
    actions::sort(structs, customComp);
    const auto outerComp = [](auto&& lhs, auto&& rhs) {
            return lhs.filename == rhs.filename;
        };
    const auto innerComp = [](auto&& lhs, auto&& rhs) {
            return lhs.function == rhs.function;
        };
    print("sorted vector", structs);
    std::cout << std::endl;
    // split sorted list of probes into chunks by filename
    for (const auto& sources : structs | views::chunk_by(outerComp)) {
        auto foo = sources.size();
        for (const auto& next : sources) {
            auto outcomes = 0;
            for (const auto& functions : sources | views::chunk_by(innerComp)) {
                for (const auto& probe : functions) {
                    outcomes += (probe.type == Type::Two) ? 2 : 1;
                    std::cout << std::format("{}\n", probe);
                }
            }
            std::cout << next.filename << " outcomes [" << outcomes << "]\n";
            break;
        }
        std::cout << "\n";
    }
}

是否可以在单个 for 循环上执行排序和双分块?理想情况下,我想使用范围 API 的组合形式来获得最佳结果。

C++ 排序 C++-标准库 std-ranges range-v3

评论

1赞 Jerry Coffin 7/18/2023
如果要表示树结构,则更直接地表示它可能更有意义,例如:因此具有路径,每个路径都包含一些函数,每个函数都有一个名称和一些行号。有了这样的东西,你应该独立地插入每一行。using Fn = std::map<std::string, int>; std::multimap<Path, Fn> records;records
0赞 johnco3 7/18/2023
@JerryCoffin 谢谢 Jerry,这是一个很好的建议,但重点是我想使用范围 - 特别是如果数据集变得很大。此外,据我所知,范围具有懒惰评估的优势。我对范围很陌生,我在 api 上苦苦挣扎,同时也欣赏它对大容量数据的潜在能力。多地图将需要预先分配,这违背了我试图实现的目的。

答:

1赞 Jerry Coffin 7/18/2023 #1

鉴于每条记录都包含执行此操作所需的所有信息,我只需创建一些东西来表示树结构,并在其中插入记录,而不是排序,然后从排序的记录中解析出范围。

#include <iostream>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <map>
#include <string>

// Keep the code self-contained, though in real use you undoubtedly want to 
// read the raw data from a file, or something on that order.
char const *rawData = R"(
1       /abc/file3.c    foo0    10      1
2       /abc/file3.c    foo0    15      2
3       /abc/file3.c    foo0    20      1
4       /abc/file3.c    foo1    30      1
5       /abc/file3.c    foo1    35      2
6       /abc/file3.c    foo1    40      1
7       /abc/file1.c    foo2    10      1
8       /abc/file1.c    foo2    15      2
9       /abc/file1.c    foo2    20      1
10      /abc/file3.c    baz1    70      1
11      /abc/file3.c    baz1    75      2
12      /abc/file3.c    baz1    80      1
13      /abc/file2.c    bat     10      1
14      /abc/file2.c    bat     15      2
15      /abc/file2.c    bat     17      2
16      /abc/file2.c    bat     20      1
17      /def/file2.c    baz     70      1
18      /def/file2.c    baz     71      1
19      /def/file2.c    baz     72      1
20      /def/file2.c    baz     73      1
)";

struct record {
    int id;
    std::string path;
    std::string fn;
    int lineNumber;
    int type;

    bool operator<(record const &rhs) const { 
        return std::tie(path, fn, lineNumber, type) < std::tie(rhs.path, rhs.fn, rhs.lineNumber, rhs.type);
    }

    friend std::istream &operator>>(std::istream &is, record &r) { 
        return is >> r.id >> r.path >> r.fn >> r.lineNumber >> r.type;
    }
    friend std::ostream &operator<<(std::ostream &os, record const &r) { 
        return os << r.id << "\t" << r.path << "\t" << r.fn << "\t" << r.lineNumber << "\t" << r.type;
    }
};

struct Probe {
    int line;
    int type;

    friend std::ostream &operator<<(std::ostream &os, Probe const &p) { 
        return os << "\t\t" << p.line << " " << p.type;
    }
};

class FuncRec { 
    std::vector<Probe> probes;
public:

    void insert(record const &rec) { 
        probes.push_back(Probe{rec.lineNumber, rec.type});
    }

    friend std::ostream &operator<<(std::ostream &os, FuncRec const &f) { 
        for (auto const &p : f.probes) {
            os << p << "\n";
        }
        return os;
    }
};

class FileRec { 
    std::map<std::string, FuncRec> functions;
public:
    void insert(record const &rec) { 
        functions[rec.fn].insert(rec);
    }

    friend std::ostream &operator<<(std::ostream &os, FileRec const &f) {
        for (auto const &f : f.functions) { 
            os << "\t" << f.first << "\n";
            os << f.second;
        }
        return os;
    }
};

class Tree {
    std::map<std::string, FileRec> files;

    void insert(record const &rec) { 
        files[rec.path].insert(rec);
    }

public:

    Tree(std::vector<record> const &in) {
        for (auto const &r : in)
            insert(r);
    }

    friend std::ostream &operator<<(std::ostream &os, Tree const &t) {
        for (auto const &f : t.files) { 
            os << f.first << "\n";
            os << f.second;
        }
        return os;
    }
};

int main() {
    std::stringstream infile(rawData);

    std::vector<record> recs { std::istream_iterator<record>(infile), {}};

    Tree tree{recs};

    std::cout << "Tree struture:\n";
    std::cout << tree;

    // In case you also want to show sorted structs:
    std::cout << "\nSorted records:\n";
    std::sort(recs.begin(), recs.end());
    for (auto const &r : recs) {
        std::cout << r << "\n";
    }
    std::cout << "\n";

}

这可能比实际需要的要复杂一些。例如,并没有真正完成太多。我们可以将 s 的向量嵌入到它的父级中(但我假设这是更复杂的东西的简化版本,其中可能有更多用途。FuncRecProbeFuncRec