Asked by: Tyler D  Asked: 11/20/2019  Last edited by: Michał Turczyn, Tyler D  Updated: 11/20/2019  Views: 47
Regex: Match arbitrary number of parentheses preceding arbitrary word
Q:
I have a bunch of strings of the following form, where X denotes an arbitrary word:
This is a string ((X.address)) test
This is a string ((X address)) test
This is a string (X address) test
This is a string (X.address) test
I want to remove everything from the string once X.address or X address is found (including the preceding parentheses), yielding:
This is a string
This is a string
This is a string
This is a string
This is my starting point:
import re
regex = r"\(X.address"
s = "This is a string ((X.address)) test"
re.split(regex, s)[0]
>> 'This is a string ('
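It works, but I need to generalize it so that it searches for an arbitrary word instead of X, and so that it accounts for one or more parentheses preceding the word.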
Answers:
2 votes
Michał Turczyn
11/20/2019
#1
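You can use .+(?=\s\(+X(?:\.|\s)address)
Explanation:
.+ - matches one or more of any character
(?=...) - positive lookahead
\s - a whitespace character
\(+ - matches one or more ( characters
X - matches X literally
(?:...) - non-capturing group
\.|\s - matches a dot or a whitespace character
address - matches address literally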
Comments
0 votes
D.A.
11/20/2019
Replace X with .+ or [a-zA-Z]+ and this is the correct answer.
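Following this comment, a sketch with X replaced by [a-zA-Z]+ might look like the following. The helper name text_before_address is hypothetical, and returning the input unchanged when nothing matches is an assumption about the desired behaviour.

```python
import re

# Same lookahead as in the answer, with the literal X replaced by [a-zA-Z]+
# so that any alphabetic word is accepted before ".address" / " address".
PATTERN = re.compile(r".+(?=\s\(+[a-zA-Z]+(?:\.|\s)address)")

def text_before_address(s):
    """Return the text preceding '(word.address' or '(word address'."""
    m = PATTERN.search(s)
    # Assumption: leave the string untouched when the pattern is absent.
    return m.group() if m else s

print(text_before_address("This is a string ((foo.address)) test"))
print(text_before_address("This is a string (Bar address) test"))
```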
0 votes
ArunJose
11/20/2019
#2
Use
regex = r"(This is a string)\s+\(+.+\)"
s = "This is a string ((X.address)) test"
re.split(regex, s)[1]
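For context, when a pattern contains a capturing group, re.split also places the captured text in the result list, which is why index 1 holds the prefix here. The sketch below prints the whole list; note that this pattern hard-codes the literal prefix This is a string, so it does not generalize to other leading text.

```python
import re

regex = r"(This is a string)\s+\(+.+\)"
s = "This is a string ((X.address)) test"

# With a capturing group, the captured text is included in the split result.
parts = re.split(regex, s)
print(parts)     # full list, including the captured prefix
print(parts[1])  # the captured prefix
```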
2 votes
Wiktor Stribiżew
11/20/2019
#3
You can use
re.sub(r'\s*\(+[^()]*\baddress.*', '', s, flags=re.S)
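Details:
\s* - 0+ whitespace characters
\(+ - 1+ ( characters
[^()]* - any 0+ characters other than ( and )
\b - a word boundary (address cannot be immediately preceded by another letter, digit, or underscore)
address - a literal word
.* - any 0+ characters up to the end of the string (with re.S, the dot also matches line breaks).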
See the Python demo:
import re
strs = [ 'This is a string ((X.address)) test', 'This is a string ((X address)) test', 'This is a string (X address) test', 'This is a string (X.address) test', 'This is a string ((X and Y and Z address)) test' ]
for s in strs:
    print(s, '=>', re.sub(r'\s*\(+[^()]*\baddress.*', '', s, flags=re.S))
Output:
This is a string ((X.address)) test => This is a string
This is a string ((X address)) test => This is a string
This is a string (X address) test => This is a string
This is a string (X.address) test => This is a string
This is a string ((X and Y and Z address)) test => This is a string
Comments
0 votes
Tyler D
11/20/2019
Thanks! Is it possible to generalize it so that it looks for words other than address? So, e.g., address or house. I tried replacing address with (address|house), but that didn't work.
1 vote
Wiktor Stribiżew
11/20/2019
@TylerD It seems to work for me. Still, I'd suggest using a non-capturing group: (?:address|house). Maybe you also want to check for the end of the word: \b(?:address|house)\b
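The suggestion above can be sketched as a small helper that builds the alternation from a list of words. The function name strip_from_keyword and its parameters are hypothetical, introduced only for illustration; re.escape is used so that the words are treated literally.

```python
import re

def strip_from_keyword(s, keywords=("address", "house")):
    """Remove everything from the opening parentheses before a keyword onward."""
    # Non-capturing alternation with a word boundary on each side.
    alternation = "|".join(re.escape(k) for k in keywords)
    pattern = r"\s*\(+[^()]*\b(?:" + alternation + r")\b.*"
    return re.sub(pattern, "", s, flags=re.S)

for s in [
    "This is a string ((X.address)) test",
    "This is a string (X house) test",
    "This is a string (X addresses) test",  # trailing \b rejects "addresses"
]:
    print(s, "=>", strip_from_keyword(s))
```

The trailing \b is what keeps a longer word such as addresses from being treated as address; without it, the third string would be truncated as well.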
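Comments
With \w+\W+ before address, as in (+\w+\W+address.*', '', s), only a single word is allowed before address, so This is a string ((X and Y and Z address)) test is not matched. Replacing \w+\W+ with [^()]*\b covers that case.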
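To illustrate the difference between \w+\W+ and [^()]*\b before address, here is a small comparison. The single-word pattern below is an assumption: it is written with the same \s*\(+ prefix as the answer's pattern, since the fragment above is truncated. The names SINGLE_WORD and ANY_TEXT are hypothetical.

```python
import re

# Assumed reconstruction: exactly one word, then non-word characters, then "address".
SINGLE_WORD = r"\s*\(+\w+\W+address.*"
# Pattern from the answer: any run of non-parenthesis characters before "address".
ANY_TEXT = r"\s*\(+[^()]*\baddress.*"

simple = "This is a string ((X.address)) test"
multi = "This is a string ((X and Y and Z address)) test"

single_simple = re.sub(SINGLE_WORD, "", simple)
single_multi = re.sub(SINGLE_WORD, "", multi)  # no match, so nothing is removed
any_multi = re.sub(ANY_TEXT, "", multi)

print(single_simple)
print(single_multi)
print(any_multi)
```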