提问人:Tyler Crompton 提问时间:12/2/2015 最后编辑:CinCoutTyler Crompton 更新时间:12/18/2021 访问量:412
如何在常规语法的模型中实现通配符、字符类、否定字符类等?
How does one implement wildcards, character classes, negated character classes, etc. in a model for a regular grammar?
问:
TL;博士:
如何对语法的产生进行计算建模,使同一左手边存在无限数量的乘积?
我正在做一个关于形式语言理论的项目,并试图编写一个用于构建常规语法对象的类,这些对象可以传递给有限状态机。我幼稚的尝试是创建一个 API,用于为每个允许的输入添加一个产品。我的尝试的精简版本如下(基于形式语法的形式定义):G = (N, Σ, P, S)
class ContextFreeGrammar:
def __init__(self, variables, alphabet, production_rules, start_variable):
self.variables = variables
self.alphabet = alphabet
self.production_rules = production_rules
self.start_variable = start_variable
def __repr__(self):
return '{}({}, {}, {}, {})'.format(
self.__class__.__name__,
self.variables,
self.alphabet,
self.production_rules,
self.start_variable
)
class RegularGrammar(ContextFreeGrammar):
_regular_expression_grammar = None # TODO
@classmethod
def from_regular_expression(cls, regular_expression):
raise NotImplementedError()
我还没有真正写有限状态自动机或下推自动机。
正则表达式的语法与上下文无关,因此我在下面的 WSN 中包含了我的定义:
syntax = expression .
expression = term "|" expression .
expression = term .
term = factor repetition term .
term = factor term .
term = .
repetition = "*" .
repetition = "+" .
repetition = "?" .
repetition = "{" nonnegative_integer "," nonnegative_integer "}" .
repetition = "{" nonnegative_integer ",}" .
repetition = "{," nonnegative_integer "}" .
nonnegative_integer = nonzero_arabic_numeral arabic_numerals .
nonnegative_integer = arabic_numeral .
nonzero_arabic_numeral = "1" .
nonzero_arabic_numeral = "2" .
nonzero_arabic_numeral = "3" .
nonzero_arabic_numeral = "4" .
nonzero_arabic_numeral = "5" .
nonzero_arabic_numeral = "6" .
nonzero_arabic_numeral = "7" .
nonzero_arabic_numeral = "8" .
nonzero_arabic_numeral = "9" .
arabic_numeral = nonzero_arabic_numeral .
arabic_numeral = "0" .
arabic_numerals = arabic_numeral .
arabic_numerals = arabic_numeral arabic_numerals .
factor = "(" expression ")" .
factor = character_class .
factor = character .
escaped_character = "\\." .
escaped_character = "\\(" .
escaped_character = "\\)" .
escaped_character = "\\+" .
escaped_character = "\\*" .
escaped_character = "\\?" .
escaped_character = "\\[" .
escaped_character = "\\]" .
escaped_character = "\\\\" .
escaped_character = "\\{" .
escaped_character = "\\}" .
escaped_character = "\\|" .
character -> TODO ;
character_class = TODO .
人们可以很容易地看出,我明确地将交替分成单独的作品。我这样做是为了便于实施。但是我被困在我应该如何去做角色课程之类的事情上。我想成为一张地图,从每个左手边到一组对应的右手边。但现在看来,这不可行。production_rules
答:
0赞
Enteleform
3/14/2016
#1
我不完全理解你的问题,但从评论来看,你似乎正在尝试在预定义的字符集中工作,该字符集不包括杂项Unicode和ASCII字符。
这是我最近实现的用于处理类似约束的方法:
[正则表达式]字符组
下面是实现上述定义的示例:
global rx_Trim_FromAlphaNumeric
rx_Trim_FromAlphaNumeric = \
"[" + rx_AlphaNumeric + "]+" + \
"[" + rx_ValidCharacters_WithLineSpace + "]*"
global rx_StartsWithSymbol
rx_StartsWithSymbol = \
"[^" + rx_AlphaNumeric + "]" + \
"[" + rx_Symbols + "]+" + \
"[" + rx_LineSpace + rx_Symbols + "]*" + \
"[" + rx_AlphaNumeric + "]+" + \
"[" + rx_ValidCharacters_WithLineSpace + "]*"
global rx_StartsWithLetter
rx_StartsWithLetter = \
"^[" + rx_Alphabetic + "]+" + \
"[" + rx_ValidCharacters_WithLineSpace + "]+"
global rx_StartsWithNumber
rx_StartsWithNumber = \
"^[" + rx_Numeric + "]+" + \
"[" + rx_ValidCharacters_WithLineSpace + "]+"
global rx_WordSegments
rx_WordSegments = \
"([" + rx_Symbols + "]+|" + \
"[" + rx_Numeric + "]+|" + \
"[" + rx_Alphabetic + "]+|" + \
"[" + rx_LineSpace + "]+)"
注意:我更喜欢转义所有符号,因为某些字符(例如 ^
)具有上下文转义要求。如果它们总是被转义,则遇到问题的可能性较小。
评论
ab*
a
b
ab
character
.
set