ANTLR4: Need assistance with conditional handling of whitespace within grammar files

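Asked by: 84292048274956 Asked: 7/12/2023 Updated: 7/14/2023 Views: 53

Q:

I'm new to ANTLR, but I have written a largely functional (4.13.0) parser/lexer for a specific access control syntax.

Intro

Just for context, the end goal of this grammar implementation is to parse compliant text expressions into object-based representations that administrators can analyze programmatically.

Setup/Environment

At present, my work is limited to using java on the (Ubuntu) command line via the antlr4 bash aliases, as described on the ANTLR site.

I'm testing with the attached grammar files, and I use echo/cat to "pipe" test input to grun. Essentially, I do things like the following example (other values reside in the attached parser grammar file):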

$ antlr4 ACI*.g4 ; javac *.java
$ echo -n '(ssf>="128")' | grun ACI bindSecurityStrengthFactor -tree -trace
<desirable result output>

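This solution as a whole is very (very) close to what I want. Considering I only started learning ANTLR about three days ago, I'm quite happy with the result. The parts that work are absolutely rock solid.

However, there is one rather significant problem.

Whitespace

I have read (what I believe to be) the relevant sections of the PDF titled "The Definitive ANTLR4 Reference - By Terence Parr", and I have read many similar posts on this forum, yet I find myself unable to devise a reliable solution for the problem at hand.

Generally speaking, in the terms of this parser, whitespace should largely be forgiven. There are exceptions, of course (e.g.: keywords must not contain whitespace), and several "value definitions" do need to allow whitespace, such as distinguishedName, aVAOrRDN and attributeTypeOrValue (when used in the context of a value). The good news is that (value-wise) I've already achieved what I need. For example, I can specify a DN such as: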

ldap:///cn=Courtney Tolana,ou=People,dc=example,dc=com

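...and it works as expected (the first and last name above, including the space, are preserved as the so-called "common name" string value).

The problem

What does not work is whitespace in all the places where it should not matter. Specifically, I'm referring to the following: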

     this too
     v
key = "value"
   ^
   this

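Users should be able to sloppily specify key="value", as well as any other permutation involving inter-field whitespace. However, key = "value" fails unless the whitespace is stripped beforehand: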

$ echo -n 'userdn = "ldap:///uid=Courtney Tolana,ou=People,dc=example,dc=com"' | grun ACI bindUserDN -tree

...produces:

line 1:0 mismatched input 'userdn ' expecting {'(', 'userdn'}
(bindUserDN userdn  =   " ldap:/// uid = Courtney Tolana , ou = People , dc = example , dc = com ")

When the above is proactively cleaned of whitespace, it's smooth as glass:

$ echo -n 'userdn="ldap:///uid=Courtney Tolana,ou=People,dc=example,dc=com"' | grun ACI  bindUserDN -tree

...produces:

(bindUserDN userdn (equalTo =) (distinguishedNames " (distinguishedName ldap:/// (aVAOrRDN (attributeTypeOrValue uid) (attributeComparisonOperator (equalTo =)) (attributeTypeOrValue Courtney Tolana)) , (aVAOrRDN (attributeTypeOrValue ou) (attributeComparisonOperator (equalTo =)) (attributeTypeOrValue People)) , (aVAOrRDN (attributeTypeOrValue dc) (attributeComparisonOperator (equalTo =)) (attributeTypeOrValue example)) , (aVAOrRDN (attributeTypeOrValue dc) (attributeComparisonOperator (equalTo =)) (attributeTypeOrValue com))) "))

Remediation attempts

Naturally, it occurred to me that the simple placement of the WHSP redirect might be the culprit.

// this.
WHITESPACE
  : [ \t\r\n\u000C]+ -> skip
  ;

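So I moved it, first to the beginning of the file, then to the end. It seems every suggestion I tried either made no discernible difference or caused previously working rules to suddenly break.

I also tried switching to the hidden channel instead of "skip" (e.g.: channel(HIDDEN);), but that did not help.

I also tried polluting the grammar with optional whitespace expressions between known component tokens (e.g.: KEYWORD WHSP? OPERATOR ...), and while this worked some of the time, it did not offer a global solution (and it is ugly, so I'm glad it didn't work).

It seems that every time I try to solve this specific problem (admittedly, sometimes in desperation), I end up reversing a large amount of earlier progress, because I unknowingly introduce new problems while trying the solutions I was able to glean. I think I'm spinning my wheels in the mud, and I just want to know what's going on so I can finish this project. I'm so close!

Parser grammar

Here is the parser grammar, from which I had to remove many helpful comments due to SO's character limit and the lack of file uploads.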

// ACIv3 Parser Grammar - work in progress

parser grammar ACIParser;
options { tokenVocab=ACILexer; }
parse
  : instruction EOF
  ;

instruction
  : targetRules LPAREN ANCHOR DQUOTE attributeTypeOrValue DQUOTE WHSP*? SEMI WHSP*? permissionBindRules WHSP? RPAREN # aci
  ;

// permissionBindRules describes one (1) or more permissionBindRule
// values. Values of this kind appear within the top-level of an ACI.
permissionBindRules
  : permissionBindRule*     # permission_bind_rules
  ;

// permissionBindRule describes a permission and Bind Rule pair.
// Values of this kind appear within permissionBindRules values.
permissionBindRule
  : permission bindRule WHSP? SEMI WHSP?    # permission_and_bind_rule_pair
  ;

// permission describes a complete permissive statement for an ACI, which
// may either grant or deny certain privileges.
//
// e.g.: allow(read,search,compare)
permission
  : permissionDisposition WHSP*? LPAREN ( WHSP*? accessPrivileges ( COMMA accessPrivileges WHSP*?)* ) WHSP*? RPAREN # permission_expression
  ;

// permissionDisposition describes the disposition of a given ACI permission
// statement, which may be either 'allow' or 'deny'.
permissionDisposition
  : ALLOW_ACCESS    # allow_access
  | DENY_ACCESS     # deny_access
  ;

accessPrivileges
  : SEARCH_PRIVILEGE    # search_privilege
  | READ_PRIVILEGE  # read_privilege
  | COMPARE_PRIVILEGE   # compare_privilege
  | ADD_PRIVILEGE   # add_privilege
  | DELETE_PRIVILEGE    # delete_privilege
  | SELFWRITE_PRIVILEGE # selfwrite_privilege
  | PROXY_PRIVILEGE # proxy_privilege
  | IMPORT_PRIVILEGE    # import_privilege
  | EXPORT_PRIVILEGE    # export_privilege
  | ALL_PRIVILEGES  # all_privileges
  ;

///////////////////////////////////////////////////////////////////////////////
// Begin TARGET RULES

// targetRules defines a sequence of zero (0) or more
// targetRule instances; use of Target Rules is optional
// in ACIs, however Target Rule statements are *ALWAYS*
// parenthetical, unlike Bind Rules which may be either.
targetRules
  : targetRule*              # target_rules
  ;

// targetRule defines any one (1) of nine (9) possible
// Target Rule types.
targetRule
  : targetControl       # rule_is_targetcontrol
  | targetExtendedOperation # rule_is_extop
  | targetFilter        # rule_is_targetfilter
  | targetAttrFilters       # rule_is_targattrfilters
  | targetScope         # rule_is_targetscope
  | targetAttributes        # rule_is_targetattr
  | target          # rule_is_target
  | targetTo            # rule_is_target_to
  | targetFrom          # rule_is_target_from
  ;

target
  : LPAREN TARGET (equalTo|notEqualTo) distinguishedNames RPAREN                        # target_dn_rule
  ;

targetTo
  : LPAREN TARGET_TO (equalTo|notEqualTo) DQUOTE distinguishedName DQUOTE RPAREN                # target_to_rule
  ;

targetFrom
  : LPAREN TARGET_FROM (equalTo|notEqualTo) DQUOTE distinguishedName DQUOTE RPAREN              # target_from_rule
  ;

targetFilter
  : LPAREN TARGET_FILTER (equalTo|notEqualTo) DQUOTE lDAPFilter DQUOTE RPAREN                   # targetfilter_rule
  ;

targetAttrFilters
  : LPAREN TARGET_ATTR_FILTERS equalTo DQUOTE targetAttrFiltersValue DQUOTE RPAREN              # targattrfilters_rule
  ;

targetScope
  : LPAREN TARGET_SCOPE equalTo DQUOTE targetSearchScopes DQUOTE RPAREN                 # targetscope_rule
  ;

targetAttributes
  : LPAREN TARGET_ATTR (equalTo|notEqualTo) targetedAttributes RPAREN                   # targetattr_rule
  ;

targetControl
  : LPAREN TARGET_CONTROL (equalTo|notEqualTo) objectIdentifiers RPAREN                 # targetcontrol_rule
  ;

targetExtendedOperation
  : LPAREN TARGET_EXTENDED_OPERATION (equalTo|notEqualTo) objectIdentifiers RPAREN          # targetextop_rule
  ;

targetSearchScopes
  : BASE_OBJECT_SCOPE       # base_object_targetscope
  | ONE_LEVEL_TARGET_SCOPE      # one_level_targetscope
  | SUB_TREE_TARGET_SCOPE   # sub_tree_targetscope
  | SUBORDINATE_TARGET_SCOPE    # subordinate_targetscope
  ;

objectIdentifiers
  : DQUOTE ( objectIdentifier ( oRDelimiter objectIdentifier )* ) DQUOTE                        # quoted_object_identifier_list
  | ( DQUOTE objectIdentifier DQUOTE ( oRDelimiter (DQUOTE objectIdentifier DQUOTE) )* )        # list_of_quoted_object_identifiers
  ;

targetedAttributes
  : DQUOTE ( attributeTypeOrValue ( oRDelimiter attributeTypeOrValue )* ) DQUOTE                # quoted_targeted_attributes_list
  | ( DQUOTE attributeTypeOrValue DQUOTE ( oRDelimiter (DQUOTE attributeTypeOrValue DQUOTE) )* )    # list_of_quoted_attributes
  | DQUOTE STAR DQUOTE                                          # all_attributes
  ;

// e.g.: 2.16.840.1.113730.3.4.18
objectIdentifier
  : ( numberForm ( DOT numberForm )+ )                          # object_identifier
  ;

numberForm
  : INT                                                                 # number_form
  ;

// Note this is behaving a little quirky, not sure I've got it nailed down yet ;/
targetAttrFiltersValue
  : attributeFilters           # attribute_filters_sets
  | attributeFilterSet         # attribute_filters_set
  | attributeFilter            # attribute_filter_single
  ;

attributeFilters
  : attributeFilterSet (COMMA|SEMI) attributeFilterSet     # attribute_filters
  ;

attributeFilterSet
  : attributeFilterOperation ( attributeFilter ( aNDDelimiter attributeFilter )* )? # attribute_filter_set
  ;

attributeFilterOperation
  : ADD_PRIVILEGE equalTo   # add_filter_operation
  | DELETE_PRIVILEGE equalTo    # delete_filter_operation
  ;

attributeFilter
  : attributeTypeOrValue COLON lDAPFilter   # attribute_filter
  ;

// Bind Rule Boolean statements
//
// e.g.:
//  - (timeofday >= "1730" AND timeofday < "2400")
//  - authmethod = "SASL"
//
bindRule
  : bindRuleExpr                                                        # bind_rule
  | bindRuleExprParen ((BOOLEAN_AND|BOOLEAN_OR|BOOLEAN_NOT) bindRuleExprParen)* # parenthetical_bind_rule
  ;

// Parenthetical Bind Rule expressions
bindRuleExprParen
  : LPAREN bindRuleExpr ((BOOLEAN_AND|BOOLEAN_OR|BOOLEAN_NOT) bindRuleExpr)* RPAREN # parenthetical_bind_rule_req_bool_op
  | <assoc=right> BOOLEAN_NOT bindRuleExpr                      # negated_bind_rule_expression
  | LPAREN bindRuleExpr RPAREN                                                  # parenthetical_bind_rule_expression
  | bindRuleExpr                                                                # bind_rule_expression_recursion
  ;

// bindRuleExpr contains a single Bind Rule
//
// e.g.: ssf >= "128"
bindRuleExpr
  : LPAREN bindRuleExpr RPAREN   # rule_is_parenthetical
  | bindUserDN                   # rule_is_userdn
  | bindUserAttr                 # rule_is_userattr
  | bindGroupDN                  # rule_is_groupdn
  | bindGroupAttr                # rule_is_groupattr
  | bindRoleDN                   # rule_is_roledn
  | bindDNS                      # rule_is_dns
  | bindIP                       # rule_is_ip
  | bindTimeOfDay                # rule_is_timeofday
  | bindDayOfWeek                # rule_is_dayofweek
  | bindSecurityStrengthFactor   # rule_is_ssf
  | bindAuthMethod               # rule_is_authmethod
  ;

// 'dayofweek' Bind Rule syntax
//
// e.g.: dayofweek="Mon,Tues,Fri"
bindDayOfWeek
  : LPAREN bindDayOfWeek RPAREN                                         # parenthetical_dayofweek_bind_rule
  | BIND_DAY_OF_WEEK (equalTo|notEqualTo) DQUOTE ( doW ( COMMA doW )* ) DQUOTE  # dayofweek_bind_rule
  ;

doW
  : SUNDAY  # Sun
  | MONDAY  # Mon
  | TUESDAY # Tues
  | WEDNESDAY   # Wed
  | THURSDAY    # Thur
  | FRIDAY  # Fri
  | SATURDAY    # Sat
  ;

// e.g.: authmethod != "none"
bindAuthMethod
  : LPAREN bindAuthMethod RPAREN                                            # parentheticalAuthenticationMethod
  | BIND_AUTH_METHOD (equalTo|notEqualTo) DQUOTE authenticationMethods DQUOTE   # authentication_method
  ;

authenticationMethods
  : ANONYMOUS   # none
  | SIMPLE  # simple
  | SSL     # ssl
  | SASL    # sasl
  ;

// e.g.: userdn="ldap:///uid=someone,ou=People,dc=example,dc=com"
bindUserDN
  : LPAREN bindUserDN RPAREN                                # parenthetical_bind_userdn 
  | BIND_USER_DN (equalTo|notEqualTo) WHSP? (distinguishedNames|DQUOTE lDAPURI DQUOTE)  # bind_userdn
  ;

// e.g.: roledn="ldap:///uid=someone,ou=People,dc=example,dc=com"
bindRoleDN
  : LPAREN bindRoleDN RPAREN                    # parenthetical_bind_roledn
  | BIND_ROLE_DN (equalTo|notEqualTo) distinguishedNames    # bind_roledn
  ;

// e.g.: groupdn="ldap:///cn=X.500 Administrators,ou=Groups,dc=example,dc=com"
bindGroupDN
  : LPAREN bindGroupDN RPAREN                                   # parenthetical_bind_groupdn
  | BIND_GROUP_DN (equalTo|notEqualTo) (distinguishedNames|DQUOTE lDAPURI DQUOTE)   # bind_groupdn
  ;

// e.g.: userattr="owner#USERDN"
bindUserAttr
  : LPAREN bindUserAttr RPAREN                                  # parenthetical_bind_userattr
  | BIND_USER_ATTR (equalTo|notEqualTo) DQUOTE (attributeBindTypeOrValue|inheritance) DQUOTE  # bind_userattr
  ;

// e.g.: groupattr="manager#LDAPURL"
bindGroupAttr
  : LPAREN bindGroupAttr RPAREN                                             # parenthetical_bind_groupattr
  | BIND_GROUP_ATTR (equalTo|notEqualTo) DQUOTE (attributeBindTypeOrValue|inheritance) DQUOTE   # bind_groupattr
  ;

// e.g.: ssf != "0"
bindSecurityStrengthFactor
  : LPAREN bindSecurityStrengthFactor RPAREN                                                # parenthetical_ssf
  | BIND_SSF (equalTo|notEqualTo|greaterThan|greaterThanOrEqual|lessThan|lessThanOrEqual) DQUOTE INT DQUOTE # bind_ssf
  ;

// e.g.: (timeofday >= "1730" AND timeofday < "2400")
bindTimeOfDay
  : LPAREN bindTimeOfDay RPAREN                                             # parenthetical_bind_timeofday
  | BIND_TIME_OF_DAY (equalTo|notEqualTo|greaterThan|greaterThanOrEqual|lessThan|lessThanOrEqual) DQUOTE INT DQUOTE # bind_timeofday
  ;

// 'ip' Bind Rule syntax
//
// e.g.: ip = "192.168.0,12.3.45.*,10.0.0.0/8"
bindIP
  : LPAREN bindIP RPAREN                                # parenthetical_bind_ip
  | BIND_IP (equalTo|notEqualTo) DQUOTE iPAddresses DQUOTE  # bind_ip
  ;

// e.g.: dns = "www.example.com"
bindDNS
  : LPAREN bindDNS RPAREN                                   # parenthetical_bind_dns
  | BIND_DNS (equalTo|notEqualTo) DQUOTE fQDN DQUOTE            # dns_bind_rule
  ;

// e.g.: '192.168.0,12.3.45.*,10.0.0.0/8'
iPAddresses
  : ( iPAddress ( COMMA iPAddress )* )+?                    # ips
  ;

// iPAddress describes any single IPv4 or IPv6 address, and may include
// STAR for octet wildcard statements.
iPAddress
  : iPv4Address                             # ipv4_address
  | iPv6Address                             # ipv6_address
  ;

// iPv4Address describes a single IPv4 address, which may include a
// STAR for octet wildcard statements.
//
// e.g.: '192.168.*'
iPv4Address
  : ( INT ( DOT (INT|STAR)* ) )                     # ipv4
  ;

// iPv6Address describes a single IPv6 address, which may include a
// STAR for octet wildcard statements.
//
// e.g.: '2001:470:dead:beef::'
iPv6Address
  : ( attributeTypeOrValue ( COLON attributeTypeOrValue )+ COLON? ) # ipv6
  ;

// fQDN describes a single fully-qualified domain name, which may 
// include a STAR for label wildcard statements.
//
// e.g.: 'www.example.com' or '*.example.com'
fQDN
  : ( attributeTypeOrValue ( DOT attributeTypeOrValue )+ )      # fqdn
  ;

///////////////////////////////////////////////////////////////////////////////
// Begin LDAP related rules

lDAPURI
  : distinguishedName uRIAttributeList uRISearchScopes uRISearchFilter  # fully_qualified_ldapuri
  | distinguishedName QMARK attributeBindTypeOrValue            # fully_qualified_ldapuri_attr_bindtype_or_value
  ;

uRISearchFilter
  : QMARK lDAPFilter                                                 # uriSearchFilter
  ;

uRISearchScopes
  : QMARK (BASE_OBJECT_SCOPE|ONE_LEVEL_SCOPE|SUB_TREE_SCOPE)?        # uriSearchScopes
  ;

uRIAttributeList
  : QMARK ( attributeTypeOrValue ( COMMA attributeTypeOrValue )* )?  # uriAttributeList
  ;

distinguishedNames
  : DQUOTE ( distinguishedName ( oRDelimiter distinguishedName )* ) DQUOTE                  # quoted_distinguished_name_list
  | ( DQUOTE distinguishedName DQUOTE ( oRDelimiter (DQUOTE distinguishedName DQUOTE) )* )  # list_of_quoted_distinguished_names
  ;

// distinguishedName is a sequence of aVAOrRDN values. Macro
// variable declarations for [$dn], ($dn) and ($attr.<atname>)
// are supported.
//
// e.g.: "ldap:///uid=courtney,ou=People,dc=example,dc=com"
distinguishedName
  : ( LOCAL_LDAP_SCHEME aVAOrRDN ( COMMA (aVAOrRDN|rDNMacros) )* )  # dn
  | LOCAL_LDAP_SCHEME ANYONE                        # anonymous_dn_alias
  | LOCAL_LDAP_SCHEME ALL_USERS                     # any_user_dn_alias
  | LOCAL_LDAP_SCHEME SELF                      # self_dn_alias
  | LOCAL_LDAP_SCHEME PARENT                        # parent_dn_alias
  ;

// rDNMacros contains macro variables for DSA interpolation. 
// e.g.: "ldap:///uid=courtney,($attr.ou),ou=People,dc=example,dc=com"
rDNMacros
  : RDN_MACROS                              # rdn_macro
  ;

// e.g.: "(&(objectClass=employee)(terminated=FALSE))"
lDAPFilter
  : LPAREN lDAPFilterExpr RPAREN        # parenthetical_filter_expression
  | lDAPFilterExpr*                     # filter_expressions
  ;

lDAPFilterExpr
  : (LPAREN (FILTER_AND|FILTER_OR|FILTER_NOT)? lDAPFilterExpr RPAREN)+?    # parenthetical_filter_expression_opt_bool
  | <assoc=right> FILTER_NOT lDAPFilterExpr                            # not_filter_expression
  | aVAOrRDN                                                   # ava_expression
  ;

// This is an absolutely critical parser component.
aVAOrRDN
  : LPAREN attributeTypeOrValue attributeComparisonOperator attributeTypeOrValue RPAREN # parenthetical_ava_or_rdn
  | attributeTypeOrValue attributeComparisonOperator attributeTypeOrValue       # ava_or_rdn
  ;

// e.g.: 'parent[0,1,3].owner#USERDN'
inheritance
  : ( PARENT inheritanceLevels DOT attributeBindTypeOrValue )       # inheritance_expression
  ;

inheritanceLevels
  : LBRAK ( INT ( COMMA INT )* )+? RBRAK    # inheritance_levels
  ;

// e.g.: 'manager#GROUPDN' or 'nickname#squatcobbler'
attributeBindTypeOrValue
  : attributeTypeOrValue HASH (bindTypes|attributeTypeOrValue)      # attr_bind_type_or_value
  ;

bindTypes
  : BINDTYPE_USER_DN    # USERDN
  | BINDTYPE_GROUP_DN   # GROUPDN
  | BINDTYPE_ROLE_DN    # ROLEDN
  | BINDTYPE_SELF_DN    # SELFDN
  | BINDTYPE_LDAP_URL   # LDAPURL
  ;

// See the lexer KEY_OR_VALUE definition for additional notes
// on the topic of this value.
attributeTypeOrValue
  : KEY_OR_VALUE    # key_or_value
  | STAR        # presence_key_or_value
  ;

// attributeComparisonOperator describes one (1) of eight (8)
// possible comparison operators to be used in LDAP AVAs.
//
// Note that gt/lt are not valid operators for LDAP AVAs, and
// thus they are not listed here. Only ge/le are available for
// (numerical) ordering matches.
attributeComparisonOperator
  : equalTo     # equal_to
  | greaterThanOrEqual  # greater_than_or_equal
  | lessThanOrEqual # less_than_or_equal
  | approximate     # approx
  | extensibleRule  # extensible_rule
  | extensibleRuleDN    # extensible_rule_with_dn
  | extensibleRuleAttr  # extensible_rule_with_attrs
  | extensibleRuleDNOID # extensible_rule_with_dn_oid
  ;

equalTo             : EQ;
notEqualTo          : NE;
greaterThan         : GT;
lessThan            : LT;
greaterThanOrEqual      : GE;
lessThanOrEqual         : LE;
approximate         : APX;
extensibleRule          : COLON;
extensibleRuleDNOID     : EXO;
extensibleRuleDN        : EXD;
extensibleRuleAttr      : EXA;
oRDelimiter         : SYMBOLIC_OR;
aNDDelimiter            : SYMBOLIC_AND;

Lexer grammar

// ACIv3 Lexer Grammar - work in progress

lexer grammar ACILexer;

WHSP: ' '+?;
QMARK: '?';
DQUOTE: '"';
LBRAK: '[';
LPAREN: '(';
RBRAK: ']';
RPAREN: ')';
DOT: '.';
COLON: ':';
TILDE: '~';
EQ: '=';
NE: BANG EQ;
GT: '>';
LT: '<';
APX: TILDE EQ;
GE: GT EQ;
LE: LT EQ;
EXA: COLON EQ;
EXO: COLON 'dn' COLON;
EXD: COLON 'dn' COLON EQ;
HASH: '#';

// Symbolic ANDs (&&) are used as delimiter literals within
// ANDed attributeFilterSet instances.
SYMBOLIC_AND: AMPERSAND AMPERSAND;

fragment AMPERSAND: '&';

// Symbolic ORs (||) are used as delimiter literals within
// ORed lists of attributeTypes, objectIdentifiers and 
// distinguishedNames.
SYMBOLIC_OR: PIPE PIPE;

fragment PIPE: '|';
fragment BANG: '!';

FILTER_AND: AMPERSAND;
FILTER_OR: PIPE;
FILTER_NOT: BANG;
FILTER_OPERATOR
  : FILTER_AND
  | FILTER_OR
  | FILTER_NOT
  ;

COMMA: ',';
SEMI: ';';
STAR: '*';

LOCAL_LDAP_SCHEME: 'ldap:///';
PARENT
  : [Pp][Aa][Rr][Ee][Nn][Tt]
  ;
ANYONE
  : [Aa][Nn][Yy][Oo][Nn][Ee]
  ;

ALL_USERS
  : [Aa][Ll][Ll]
  ;

SELF
  : [Ss][Ee][Ll][Ff]
  ;

ANCHOR
  : 'version 3.0; acl '
  ;

//////////////////////////////////////
// Day Of Week components

// Sunday is day one (1) and is used within 'dayofweek' Bind Rules.
SUNDAY
  : [Ss][Uu][Nn]
  ;

// Monday is day two (2) and is used within 'dayofweek' Bind Rules.
MONDAY
  : [Mm][Oo][Nn]
  ;

// Tuesday is day three (3) and is used within 'dayofweek' Bind Rules.
TUESDAY
  : [Tt][Uu][Ee][Ss]
  ;

// Wednesday is day four (4) and is used within 'dayofweek' Bind Rules
WEDNESDAY
  : [Ww][Ee][Dd]
  ;

// Thursday is day five (5) and is used within 'dayofweek' Bind Rules.
THURSDAY
  : [Tt][Hh][Uu][Rr]
  ;

// Friday is day six (6) and is used within 'dayofweek' Bind Rules.
FRIDAY
  : [Ff][Rr][Ii]
  ;

// Saturday is day seven (7) and is used within 'dayofweek' Bind Rules.
SATURDAY
  : [Ss][Aa][Tt]
  ;

//////////////////////////////////////
// Authentication Method string literals

// 'none' describes an ANONYMOUS LDAP bind.
ANONYMOUS
  : [Nn][Oo][Nn][Ee]
  ;

// 'simple' describes an authenticated LDAP bind
// using weak authentication (DN + clear-text).
SIMPLE
  : [Ss][Ii][Mm][Pp][Ll][Ee]
  ;

// 'ssl' describes an authenticated LDAP bind 
// using weak authentication (DN + clear-text)
// using TLS confidentiality.
SSL
  : [Ss][Ss][Ll]
  ;

// 'sasl' describes an authenticated LDAP bind
// using strong authentication (TLS mutual auth,
// Kerberos, et al) and (almost certainly) using
// TLS confidentiality.
SASL
  : [Ss][Aa][Ss][Ll]
  ;

//////////////////////////////////////
// Target Rule keywords

// 'target' Target Rule keyword
TARGET
  : [Tt][Aa][Rr][Gg][Ee][Tt]
  ;

// 'target_to' Target Rule keyword
TARGET_TO
  : [Tt][Aa][Rr][Gg][Ee][Tt] '_' [Tt][Oo]
  ;

// 'target_from' Target Rule keyword
TARGET_FROM
  : [Tt][Aa][Rr][Gg][Ee][Tt] '_' [Ff][Rr][Oo][Mm]
  ;

// 'targetscope' Target Rule keyword
TARGET_SCOPE
  : [Tt][Aa][Rr][Gg][Ee][Tt][Ss][Cc][Oo][Pp][Ee]
  ;

// 'targetattr' Target Rule keyword
TARGET_ATTR
  : [Tt][Aa][Rr][Gg][Ee][Tt][Aa][Tt][Tt][Rr]
  ;

// 'targetfilter' Target Rule keyword
TARGET_FILTER
  : [Tt][Aa][Rr][Gg][Ee][Tt][Ff][Ii][Ll][Tt][Ee][Rr]
  ;

// 'targattrfilters' Target Rule keyword
// NOTE: Yes, it (targ) is the correct Target keyword
// prefix, unlike most others.
TARGET_ATTR_FILTERS
  : [Tt][Aa][Rr][Gg][Aa][Tt][Tt][Rr][Ff][Ii][Ll][Tt][Ee][Rr][Ss]
  ;

// 'targetcontrol' Target Rule keyword
TARGET_CONTROL
  : [Tt][Aa][Rr][Gg][Ee][Tt][Cc][Oo][Nn][Tt][Rr][Oo][Ll]
  ;

// 'extop' keyword
TARGET_EXTENDED_OPERATION
  : [Ee][Xx][Tt][Oo][Pp]
  ;

//////////////////////////////////////
// Bind Rule keywords

// 'userdn' Bind Rule keyword
BIND_USER_DN
  : 'userdn'
  ;

// 'groupdn' Bind Rule keyword
BIND_GROUP_DN
  : 'groupdn'
  ;

// 'roledn' Bind Rule keyword
BIND_ROLE_DN
  : 'roledn'
  ;

// 'userattr' Bind Rule keyword
BIND_USER_ATTR
  : 'userattr'
  ;

// 'groupattr' Bind Rule keyword
BIND_GROUP_ATTR
  : 'groupattr'
  ;

// 'ssf' Bind Rule keyword
BIND_SSF
  : 'ssf'
  ;

// 'dns' Bind Rule keyword
BIND_DNS
  : 'dns'
  ;

// 'ip' Bind Rule keyword
BIND_IP
  : 'ip'
  ;

// 'authmethod' Bind Rule keyword
BIND_AUTH_METHOD
  : 'authmethod'
  ;

// 'timeofday' Bind Rule keyword
BIND_TIME_OF_DAY
  : 'timeofday'
  ;

// 'dayofweek' Bind Rule keyword
BIND_DAY_OF_WEEK
  : 'dayofweek'
  ;

//////////////////////////////////////
// Bind Type keywords

// USERDN string literal is used within 'userattr' and 'groupattr'
// Bind Rule statements.
BINDTYPE_USER_DN
  : 'USERDN'
  ;

// GROUPDN string literal is used within 'userattr' and 'groupattr'
// Bind Rule statements.
BINDTYPE_GROUP_DN
  : 'GROUPDN'
  ;

// ROLEDN string literal is used within 'userattr' and 'groupattr'
// Bind Rule statements.
BINDTYPE_ROLE_DN
  : 'ROLEDN'
  ;

// SELFDN string literal is used within 'userattr' and 'groupattr'
// Bind Rule statements.
BINDTYPE_SELF_DN
  : 'SELFDN'
  ;

// LDAPURL string literal is used within 'userattr' and 'groupattr'
// Bind Rule statements.
BINDTYPE_LDAP_URL
  : 'LDAPURL'
  ;

//////////////////////////////////////
// LDAP and Target Rule Search Scopes

// BASE is the same for 'targetscope' Target Rules as for lDAPURI
// search parameters and is used the same in either scenario.
BASE_OBJECT_SCOPE
  : [Bb][Aa][Ss][Ee]
  ;

// This is used exclusively within LDAP Search Parameter statements,
// such as those that appear within an lDAPURI. This is not used
// within 'targetscope' Target Rules.
ONE_LEVEL_SCOPE
  : [Oo][Nn][Ee]
  ;

// This is used exclusively within 'targetscope' Target
// Rules and NOT lDAPURI instances.
ONE_LEVEL_TARGET_SCOPE
  : [Oo][Nn][Ee][Ll][Ee][Vv][Ee][Ll]
  ;

// This is used exclusively within LDAP Search Parameter statements,
// such as those that appear within an lDAPURI. This is not used
// within 'targetscope' Target Rules.
SUB_TREE_SCOPE
  : [Ss][Uu][Bb]
  ;

// This is used exclusively within 'targetscope' Target
// Rules and NOT lDAPURI instances.
SUB_TREE_TARGET_SCOPE
  : [Ss][Uu][Bb][Tt][Rr][Ee][Ee]
  ;

// This is used exclusively within 'targetscope' Target
// Rules and NOT lDAPURI instances.
SUBORDINATE_TARGET_SCOPE
  : [Ss][Uu][Bb][Oo][Rr][Dd][Ii][Nn][Aa][Tt][Ee]
  ;

//////////////////////////////////////
// Permission and Access Rights components

// The disposition of a permission is to grant some level(s)
// of access to the directory.
ALLOW_ACCESS
  : WHSP? [Aa][Ll][Ll][Oo][Ww] WHSP?
  ;

// The disposition of a permission is to deny some level(s)
// of access to the directory.
DENY_ACCESS
  : [Dd][Ee][Nn][Yy]
  ;

// Grant or withhold LDAP search access to the DSA.
SEARCH_PRIVILEGE
  : [Ss][Ee][Aa][Rr][Cc][Hh]
  ;

// Grant or withhold LDAP read access to the DSA.
READ_PRIVILEGE
  : [Rr][Ee][Aa][Dd]
  ;

// Grant or withhold LDAP compare access to the DSA.
COMPARE_PRIVILEGE
  : [Cc][Oo][Mm][Pp][Aa][Rr][Ee]
  ;

// Grant or withhold LDAP entry-creation access to the DSA.
ADD_PRIVILEGE
  : [Aa][Dd][Dd]
  ;

// Grant or withhold LDAP entry-deletion access to the DSA.
DELETE_PRIVILEGE
  : [Dd][Ee][Ll][Ee][Tt][Ee]
  ;

// Grant or withhold LDAP modifications to ones own entry within the DSA.
SELFWRITE_PRIVILEGE
  : [Ss][Ee][Ll][Ff][Ww][Rr][Ii][Tt][Ee]
  ;

// Grant or withhold LDAP remote proxy capabilities within the DSA.
PROXY_PRIVILEGE
  : [Pp][Rr][Oo][Xx][Yy]
  ;

// Grant or withhold LDAP DIT import capabilities within the DSA.
IMPORT_PRIVILEGE
  : [Ii][Mm][Pp][Oo][Rr][Tt]
  ;

// Grant or withhold LDAP DIT export capabilities within the DSA.
EXPORT_PRIVILEGE
  : [Ee][Xx][Pp][Oo][Rr][Tt]
  ;

// Grant or withhold all privileges within the DSA **EXCEPT** for
// proxy privileges.
ALL_PRIVILEGES
  : [Aa][Ll][Ll]
  ;

RDN_MACROS
  : '[$dn]'
  | '($dn)'
  | '($attr' DOT KEY_OR_VALUE ')'
  ;

BOOLEAN_AND
  : [Aa][Nn][Dd]
  ;

BOOLEAN_OR
  : [Oo][Rr]
  ;

BOOLEAN_NOT
  : [Aa][Nn][Dd] ' ' [Nn][Oo][Tt]
  ;

// Whitespace characters are dumped from here on out. I
// know this is supposed to be at the bottom of the lexer
// file (or so I read somewhere), but all hell breaks loose
// when it is :(
WHITESPACE
  : [ \t\r\n\u000C]+ -> skip
  ;

INT
  : [0-9]+
  ;

// KEY_OR_VALUE can more or less be anything, but will be
// verified in the Go visitor.
//
// I REALLY wish I could split this into two (2) lexers that
// WON'T collide, e.g.:
//
// - KEY:   [a-z][a-zA-z0-9\-]* [a-z]*
//
//    ... and ...
//
// - VALUE: ~["\\,.:=!?[\]()#|&<>~\t\r\n]+
//
// ... but I've given up on that for the moment. Every attempt
// to do so wreaks havoc within this otherwise functional setup.
// 
// The (negated!) characters below are specified due to their
// special nature elsewhere in this implementation, i.e.: '&'
// in Boolean lists, and (probably?) shouldn't appear in values
// such as the 'acl' (ACI label), though I'm not 100% certain.
//
// To be honest, I'm quite sure this is NOT an ideal solution
// (likely will barf on certain otherwise harmless characters
// in a value), but it DOES seem to work for the moment ...
KEY_OR_VALUE
  : ~["\\,.:=!?[\]()#|&<>~\t\r\n]+
  ;
parsing whitespace antlr4 grammar

A:

0 votes Mike Cargal 7/14/2023 #1

From a fairly cursory examination of the ACI spec (which certainly looks like what you're trying to parse; hopefully I'm right):

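  • It appears that every place with significant whitespace is enclosed in "s (i.e. it sits inside a string).
  • Additionally, some content within strings treats whitespace as significant (distinguishedName), while other contexts may have string content where whitespace is not significant when parsing that content.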

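Given that these are context dependent, you won't be able to handle the differences in the Lexer (it does not maintain that sort of context).

I believe this leaves you with 2 choices:

1 - Make whitespace a normal token (i.e. neither skip it nor send it to the HIDDEN channel), and continue to cover every place where whitespace might be allowed.
2 - Treat these "strings" as ACTUAL STRINGs in the grammar, and parse their content in a subsequent phase.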

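As you've noticed, trying to work optional whitespace into your parser rules is the "road to madness". Not only does it clutter your parser rules, but in some cases values that can contain whitespace will arrive at your parser as multiple tokens (and be made up of multiple nodes in the resulting parse tree).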
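To make the "multiple tokens" point concrete, here is a minimal sketch in Python. It is not ANTLR and not the ACI lexer: the token set (WHSP, EQ, COMMA, WORD) is a toy assumed only for illustration. It shows that once whitespace is an ordinary token, a value such as "Courtney Tolana" no longer reaches the parser as a single unit.

```python
import re

# Toy token set (an assumption, not the ACI lexer): whitespace is an
# ordinary token, as in option 1, and WORD cannot contain spaces.
TOKEN_RE = re.compile(r"(?P<WHSP> +)|(?P<EQ>=)|(?P<COMMA>,)|(?P<WORD>[^ =,]+)")

def tokenize(text):
    """Return (token_type, text) pairs covering the whole input."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Every character belongs to exactly one alternative, so this always matches.
        match = TOKEN_RE.match(text, pos)
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("cn=Courtney Tolana"))
# [('WORD', 'cn'), ('EQ', '='), ('WORD', 'Courtney'), ('WHSP', ' '), ('WORD', 'Tolana')]
```

The common name arrives as three tokens (WORD, WHSP, WORD), so every parser rule that accepts such a value would have to stitch them back together.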

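I would significantly simplify the main grammar by tokenizing strings as STRING tokens. Then, once you have the parse tree and know which type of string you should have (depending on its context), use a grammar specific to its content (e.g.: Distinguished Names per RFC 2253) to parse the content of the string. (Note: I'm pretty sure this will eliminate the need for the KEY_OR_VALUE Lexer rule, which, as written, is likely to be the source of a great deal of pain.)

A fairly simple example of how this can start to simplify things: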

bindDayOfWeek
    : LPAREN bindDayOfWeek RPAREN # parenthetical_dayofweek_bind_rule
    | BIND_DAY_OF_WEEK (equalTo | notEqualTo) DQUOTE (
        doW ( COMMA doW)*
    ) DQUOTE # dayofweek_bind_rule
    ;

doW
    : SUNDAY    # Sun
    | MONDAY    # Mon
    | TUESDAY   # Tues
    | WEDNESDAY # Wed
    | THURSDAY  # Thur
    | FRIDAY    # Fri
    | SATURDAY  # Sat
    ;

By changing the bindDayOfWeek rule to:

bindDayOfWeek
    : LPAREN bindDayOfWeek RPAREN # parenthetical_dayofweek_bind_rule
    | BIND_DAY_OF_WEEK (equalTo | notEqualTo) STRING # dayofweek_bind_rule
    ;

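You'll eliminate the need for the doW rule as well as the Lexer rules for SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY. You'll just have a string, and while walking the parse tree you validate that it holds a comma-separated list of those values. In this case, there is probably no reason to write a full grammar for the string content at all, since it is easy to validate in semantic processing. (Bonus: by NOT trying to put everything into the grammar itself, you'll be able to provide better error messages than you'd get from ANTLR.)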
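The semantic check described above could look roughly like the following Python sketch. The function name validate_day_of_week is hypothetical, the input is assumed to be the raw text of the STRING token including its quotes, and tolerating spaces around the commas is an assumption rather than something the ACI syntax is known to allow. The day names come from the lexer rules in the question, which match case-insensitively.

```python
# Day names taken from the question's lexer rules (SUNDAY .. SATURDAY),
# which match case-insensitively.
VALID_DAYS = ("sun", "mon", "tues", "wed", "thur", "fri", "sat")

def validate_day_of_week(string_token):
    """Check the raw text of a hypothetical STRING token, quotes included.

    Returns the normalized day names, or raises ValueError naming the bad entry.
    """
    if len(string_token) < 2 or string_token[0] != '"' or string_token[-1] != '"':
        raise ValueError(f"expected a double-quoted string, got {string_token!r}")
    content = string_token[1:-1]
    days = []
    for position, item in enumerate(content.split(","), start=1):
        day = item.strip().lower()  # assumption: spaces around commas are tolerated
        if day not in VALID_DAYS:
            raise ValueError(
                f"dayofweek entry {position} is {item.strip()!r}; "
                f"expected one of {', '.join(VALID_DAYS)}"
            )
        days.append(day)
    return days
```

Calling validate_day_of_week('"Mon,Tues,Fri"') returns ['mon', 'tues', 'fri'], while '"Mon,Funday"' raises a ValueError that names the second entry, which is the kind of targeted message a generic parser error cannot give.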

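In effect, you parse the top level of the document and treat each string as separate content with its own parse rules, and you identify which rules apply by the string's position in the parse tree.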
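A minimal Python sketch of that second phase follows, assuming the visitor knows which parser rule a string belongs to. The dictionary CONTENT_PARSERS and the helpers parse_dn, parse_ssf and parse_string_content are hypothetical names, and the DN splitter is deliberately naive: it ignores the escaping rules of RFC 2253 and does not handle aliases such as ldap:///anyone. A dedicated grammar could replace any of these helpers.

```python
def parse_dn(content):
    """Naive DN split: 'ldap:///' followed by comma-separated attr=value pairs.

    Simplification (assumption): ignores the escaping rules of RFC 2253.
    """
    scheme = "ldap:///"
    if not content.startswith(scheme):
        raise ValueError(f"DN must start with {scheme!r}: {content!r}")
    pairs = []
    for rdn in content[len(scheme):].split(","):
        attr, sep, value = rdn.partition("=")
        if not sep or not attr.strip() or not value.strip():
            raise ValueError(f"malformed RDN {rdn!r}")
        pairs.append((attr.strip(), value.strip()))  # spaces inside values are kept
    return pairs

def parse_ssf(content):
    """Security strength factor: a plain non-negative integer."""
    if not (content.isascii() and content.isdigit()):
        raise ValueError(f"ssf must be an integer: {content!r}")
    return int(content)

# Keys are parser rule names from the question; the mapping itself is hypothetical.
CONTENT_PARSERS = {
    "bindUserDN": parse_dn,
    "bindSecurityStrengthFactor": parse_ssf,
}

def parse_string_content(rule_name, string_token):
    """Strip the surrounding quotes, then dispatch on the enclosing rule."""
    return CONTENT_PARSERS[rule_name](string_token[1:-1])

print(parse_string_content(
    "bindUserDN",
    '"ldap:///uid=Courtney Tolana,ou=People,dc=example,dc=com"',
))
# [('uid', 'Courtney Tolana'), ('ou', 'People'), ('dc', 'example'), ('dc', 'com')]
```

The space in "Courtney Tolana" survives because the main lexer never looks inside the string; only the content parser chosen for that context decides what whitespace means.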


A few things I noticed while looking at your grammar:

attributeFilter
  : attributeTypeOrValue COLON lDAPFilter   # attribute_filter
  ;

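The # someID syntax only has value for labeling rule alternatives. For rules with only one alternative (no | in the rule), it just adds unnecessary complexity.

Also, your lexer defines a lot of keywords. That is fine if those values are keywords in all contexts, but you may run into cases where a "keyword" is valid in another context where it is not a keyword. (It seems many of these may only occur inside strings, so that problem would go away with the refactoring suggested above.)

Comments

0 votes 84292048274956 7/16/2023
Thank you. I'm leaning toward suggestion #2, though I'm sad about it, because one of the features I wanted to extend is the ability to decompile certain values, namely DNs and (more importantly) search filters. In my tests, ANTLR seemed to do a fantastic job at this. While I realize I could do this manually in package code "after" the ANTLR processing phase, I was hoping to avoid that for a few different reasons. Either way, I sincerely appreciate your insight :)
0 votes Mike Cargal 7/16/2023
If parsing the string content is fairly complex (as the distinguished name syntax may be), you'll probably still find ANTLR suitable for it, and those separate grammars will be much more manageable than trying to put everything into one.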