How could one implement parsing a statement in boost::spirit, which in essence switches parsers?

Asked by: Frank Puck  Asked: 6/26/2023  Last edited by: sehe, Frank Puck  Updated: 6/27/2023  Views: 33

Q:

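The language I'm trying to write a parser for has a statement which in essence sets the properties for the text that follows it. These properties include

  • case sensitivity
  • format (including different comment styles)

I can only imagine implementing this by switching to a different parser. I think this would require successfully terminating the current parser and returning, via its attribute, how to deal with the remaining unmatched input. How can this be done?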

c++ parsing boost-spirit case-sensitive


A:

1 vote  Syed Muhammad Ali Raza  6/26/2023  #1

Use semantic actions and the qi::lazy directive in the statement parser to invoke the appropriate parser based on the specified properties.

0 votes  sehe  6/26/2023  #2

Switching to a different parser is one way to do it.

The most notable pattern for this is the Nabialek trick, which builds on the qi::lazy directive.

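However, since you already mention multiple flags, this may not scale well, because it can lead to unwanted duplication and/or combinatorial explosion.

I'd suggest using some parser state instead. You can do that with semantic actions that contain the logic, but it implies mutable state inside your parser, which can hurt re-entrancy, thread safety and reusability. These are very general downsides of semantic actions.

Instead, Qi offers local attributes, which live in the runtime parser context.

For example, let's toggle case sensitivity:

Sample coming up, also making dinner

Post-dinner update

As always, time is a good teacher. I tried to actually make local/inherited attributes work for re-entrancy, but it did not work the way I remembered.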

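So let's embrace mutable state and put the option state inside the grammar instance. That keeps things at a workable level of complexity, although you cannot always share parser instances.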
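
What "state inside the instance" means can be sketched without Spirit. The following is only an illustration under simplifying assumptions: `StatefulRunner`, `same_text` and `run_all` are hypothetical names, statements are assumed to be pre-split and separated by single spaces, and only the case option is modelled.

```cpp
#include <algorithm>
#include <cctype>
#include <string_view>
#include <vector>

// Compares two strings, optionally ignoring ASCII case.
inline bool same_text(std::string_view a, std::string_view b, bool case_sensitive) {
    if (a.size() != b.size()) return false;
    return std::equal(a.begin(), a.end(), b.begin(), [case_sensitive](char x, char y) {
        if (case_sensitive) return x == y;
        return std::tolower(static_cast<unsigned char>(x)) ==
               std::tolower(static_cast<unsigned char>(y));
    });
}

// Hypothetical stand-in for a grammar object that owns its option state.
struct StatefulRunner {
    bool case_sensitive = true; // mutable state stored in the instance

    // Handles one pre-split statement; option statements modify the instance.
    bool run(std::string_view stmt) {
        bool const cs = case_sensitive; // mode in effect for this statement
        if (same_text(stmt, "Option case insensitive", cs)) { case_sensitive = false; return true; }
        if (same_text(stmt, "Option case sensitive", cs))   { case_sensitive = true;  return true; }
        return same_text(stmt, "Hello", cs);
    }
};

// Runs a whole input on a fresh instance, so no state leaks between inputs.
inline bool run_all(std::vector<std::string_view> const& stmts) {
    StatefulRunner runner;
    for (auto stmt : stmts)
        if (!runner.run(stmt)) return false;
    return true;
}
```

A single `StatefulRunner` reused across inputs would carry `case_sensitive` over from the previous input unless it is reset. The listing below avoids that by resetting both options at the start of `demo` and by constructing a new `DemoParser` for each input; using one instance from several threads at the same time would still be unsafe.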

Live On Coliru

// #define BOOST_SPIRIT_DEBUG
#include <boost/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <string_view>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
using namespace std::string_literals;

template <typename It> struct DemoParser : qi::grammar<It> {
    DemoParser() : DemoParser::base_type(start) {
        using namespace qi::labels;

        // shorthand mnemonics for accessing option state
        auto _case_option   = px::ref(case_opt);
        auto _strict_option = px::ref(strict_opt);
        qi::_r1_type kw_text; // another mnemonic, for the inherited attribute

        // handy phoenix actor (custom "directives")
        auto const _cs = qi::eps(_case_option == Sensitive);
        auto const _ci = qi::eps(_case_option == Insensitive);
     // auto const _sm = qi::eps(_strict_option == StrictOn);

        start = qi::skip(qi::space)[demo];

        demo = qi::eps[_case_option = Case::Sensitive]    // initialize
                      [_strict_option = Strict::StrictOn] // defaults?
            >> -(option | hello) % ';'                    //
            >> qi::eoi;

        option = kw("Option"s) >> (switch_case | switch_strict);
        hello                             //
            = _cs >> "Hello"              //
            | _ci >> qi::no_case["hello"] //
            ;

        _case_sym.add("sensitive", Case::Sensitive)("insensitive", Case::Insensitive);
        _strict_sym.add("on", Strict::StrictOn)("off", Strict::StrictOff);

        _case         = _cs >> _case_sym | _ci >> qi::no_case[_case_sym];
        _strict       = _cs >> _strict_sym | _ci >> qi::no_case[_strict_sym];
        switch_case   = kw("case"s) >> _case[_case_option = _1];
        switch_strict = kw("strict"s) >> _strict[_strict_option = _1];

        px::function c_str = [](std::string const& s) { return s.c_str(); };

        kw = (_cs >> qi::lit(c_str(kw_text))                 // case sensitive
              | _ci >> qi::no_case[qi::lit(c_str(kw_text))]) // case insensitive
            >> !qi::char_("a-zA-Z0-9._"); // lookahead assertion to avoid parsing partial identifiers

        BOOST_SPIRIT_DEBUG_NODES((start)(demo)(option)(hello)(switch_case)(switch_strict)(_case)(_strict)(kw))
    }

  private:
    qi::rule<It> start;

    enum Case { Sensitive, Insensitive } case_opt = Sensitive;
    enum Strict { StrictOff, StrictOn } strict_opt        = StrictOn;
    qi::symbols<char, Case>   _case_sym;
    qi::symbols<char, Strict> _strict_sym;

    using Skipper = qi::space_type;
    qi::rule<It, Skipper> demo, hello, option, switch_case, switch_strict;

    // lexeme
    qi::rule<It, Case()> _case;
    qi::rule<It, Strict()> _strict;
    qi::rule<It, std::string(std::string kw_text)> kw; // using inherited attribute
};

int main() {
    for (std::string_view input :
         {
             "",
             "bogus;", // FAIL
             "Hello;",
             "hello;",
             "Option case insensitive; heLlO;",
             "Option strict off;",
             "Option STRICT off;",
             "Option case insensitive; Option STRICT off;",
             "Option case insensitive; oPTION STRICT off;",
             "Option case insensitive; oPTION STRICT ON;",
             "Option case insensitive; HeLlO; OPTION CASE SENSitive ; HelLO;", // FAIL
             "Option case insensitive; HeLlO; OPTION CASE SENSitive ; Hello;",
         }) //
    {
        DemoParser<std::string_view::const_iterator> p; // mutable instance now
                                                        //
        bool ok = parse(begin(input), end(input), p);
        std::cout << quoted(input) << " -> " << (ok ? "PASS" : "FAIL") << std::endl;
    }
}

Prints the expected output for the test cases:

"" -> PASS
"bogus;" -> FAIL
"Hello;" -> PASS
"hello;" -> FAIL
"Option case insensitive; heLlO;" -> PASS
"Option strict off;" -> PASS
"Option STRICT off;" -> FAIL
"Option case insensitive; Option STRICT off;" -> PASS
"Option case insensitive; oPTION STRICT off;" -> PASS
"Option case insensitive; oPTION STRICT ON;" -> PASS
"Option case insensitive; HeLlO; OPTION CASE SENSitive ; HelLO;" -> FAIL
"Option case insensitive; HeLlO; OPTION CASE SENSitive ; Hello;" -> PASS

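The `kw` rule in the listing above combines two checks: it matches the keyword text under the current case mode, and it then uses a negative lookahead (`!qi::char_("a-zA-Z0-9._")`) so that a keyword is not accepted when it is only the prefix of a longer identifier. A library-free sketch of the same logic follows; `match_keyword`, `is_ident_char` and the `Case` enum here are hypothetical helpers, not part of Spirit, and ASCII input in the default C locale is assumed.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

enum class Case { Sensitive, Insensitive };

// True if c could continue an identifier (same set as the grammar: a-zA-Z0-9._).
inline bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '.' || c == '_';
}

// Tries to match `kw` at the start of `input` under the given case mode.
// Returns the number of characters consumed, or 0 when there is no match.
inline std::size_t match_keyword(std::string_view input, std::string_view kw, Case mode) {
    if (kw.empty() || input.size() < kw.size()) return 0;
    for (std::size_t i = 0; i < kw.size(); ++i) {
        unsigned char const a = static_cast<unsigned char>(input[i]);
        unsigned char const b = static_cast<unsigned char>(kw[i]);
        bool const same = (mode == Case::Sensitive)
                              ? a == b
                              : std::tolower(a) == std::tolower(b);
        if (!same) return 0;
    }
    // Negative lookahead: reject when the keyword is only a prefix of an identifier.
    if (input.size() > kw.size() && is_ident_char(input[kw.size()])) return 0;
    return kw.size();
}
```

For instance, `match_keyword("OPTION case", "Option", Case::Insensitive)` consumes 6 characters, whereas `match_keyword("Optional", "Option", Case::Insensitive)` returns 0 because of the lookahead.
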
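Improving Compile Times: X3

Honestly, I think X3 is a bit more convenient for dynamically parameterized/composed rules. It also compiles much faster, and it is easier to add some debugging side effects if needed:

Live On Coliru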

// #define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
#include <string_view>
namespace x3 = boost::spirit::x3;
using namespace std::string_literals;

namespace DemoParser {
    enum Case { Insensitive, Sensitive };
    enum Strict { StrictOff, StrictOn };
    struct Options {
        enum Case   case_opt   = Sensitive;
        enum Strict strict_opt = StrictOn;
    };

    // custom "directives"
    auto const _cs = x3::eps[([](auto& ctx) { _pass(ctx) = get<Options>(ctx).case_opt == Sensitive; })];
    auto const _ci = x3::eps[([](auto& ctx) { _pass(ctx) = get<Options>(ctx).case_opt == Insensitive; })];
 // auto const _sm = x3::eps[([](auto& ctx) { _pass(ctx) = get<Options>(ctx).strict_opt == StrictOn; })];

    auto set_opt = [](auto member) {
        return [member](auto& ctx) {
            auto& opt = get<Options>(ctx).*member;
            x3::traits::move_to(_attr(ctx), opt); 
        };
    };

    static inline auto variable_case(auto p, char const* name = "variable_case") {
        using Attr = x3::traits::attribute_of<decltype(p), x3::unused_type, void>::type;
        return x3::rule<struct _, Attr, true>{name} = //
            (_cs >> x3::as_parser(p) |                //
             _ci >> x3::no_case[x3::as_parser(p)]);
    }

    static inline auto kw(char const* kw_text) {
        // using lookahead assertion to avoid parsing partial identifiers
        return x3::rule<struct kw, std::string>{kw_text} = x3::lexeme[ //
                   variable_case(x3::lit(kw_text), kw_text)            //
                   >> !x3::char_("a-zA-Z0-9._")                        //
        ];
    }

    auto _case_sym = x3::symbols<Case>{}.add("sensitive", Case::Sensitive)("insensitive", Case::Insensitive).sym;
    auto _strict_sym = x3::symbols<Strict>{}.add("on", Strict::StrictOn)("off", Strict::StrictOff).sym;

    auto switch_case   = kw("case") >> variable_case(_case_sym)[set_opt(&Options::case_opt)];
    auto switch_strict = kw("strict") >> variable_case(_strict_sym)[set_opt(&Options::strict_opt)];

    auto option = kw("Option") >> (switch_case | switch_strict);
    auto hello  = _cs >> "Hello"      //
        | _ci >> x3::no_case["hello"] //
        ;

    auto demo  = -(option | hello) % ';' >> x3::eoi;
    auto start = x3::skip(x3::space)[demo];
}

int main() {
    auto const p = DemoParser::start; // stateless parser
    using DemoParser::Options;

    for (std::string_view input :
         {
             "",
             "bogus;", // FAIL
             "Hello;",
             "hello;",
             "Option case insensitive; heLlO;",
             "Option strict off;",
             "Option STRICT off;",
             "Option case insensitive; Option STRICT off;",
             "Option case insensitive; oPTION STRICT off;",
             "Option case insensitive; oPTION STRICT ON;",
             "Option case insensitive; HeLlO; OPTION CASE SENSitive ; HelLO;", // FAIL
             "Option case insensitive; HeLlO; OPTION CASE SENSitive ; Hello;",
         }) //
    {
        Options opts;

        bool ok = parse(begin(input), end(input), x3::with<Options>(opts)[p]);
        std::cout << quoted(input) << " -> " << (ok ? "PASS" : "FAIL") << std::endl;
    }
}

Still prints the same test output:

"" -> PASS
"bogus;" -> FAIL
"Hello;" -> PASS
"hello;" -> FAIL
"Option case insensitive; heLlO;" -> PASS
"Option strict off;" -> PASS
"Option STRICT off;" -> FAIL
"Option case insensitive; Option STRICT off;" -> PASS
"Option case insensitive; oPTION STRICT off;" -> PASS
"Option case insensitive; oPTION STRICT ON;" -> PASS
"Option case insensitive; HeLlO; OPTION CASE SENSitive ; HelLO;" -> FAIL
"Option case insensitive; HeLlO; OPTION CASE SENSitive ; Hello;" -> PASS

I've kind of run out of time for the qi::lazy approach, so I'll refer you to my existing examples on this site.

Comments

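0 votes  Frank Puck  6/26/2023
In essence there are just 3 formats: format_1&&case_sensitive, format_1&&!case_sensitive, format_2&&!case_sensitive. So I would appreciate it if you could show the usage of lazy.
0 votes  sehe  6/27/2023
Yes. I found that doing the whole locals/inherited attributes thing became unwieldy. In my memory, locals would "automatically" pass from parent rules to child rules. Since that apparently doesn't happen (?! coliru.stacked-crooked.com/a/99bbba6d8cfb3a0b) I would indeed suggest lazy rules. Due to other work I'll have to get back to this in a few hours. In the meantime you can of course search my existing answers for examples.
0 votes  sehe  6/27/2023
I updated my answer with the work I've done since. Hope it helps.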