Regex: Match arbitrary number of parentheses preceding arbitrary word

Asked by: Tyler D Asked: 11/20/2019 Last edited by: Michał Turczyn, Tyler D Updated: 11/20/2019 Views: 47

Q:

I have a bunch of strings of the following form, where X denotes an arbitrary word:

This is a string ((X.address)) test
This is a string ((X address)) test
This is a string (X address) test
This is a string (X.address) test

I want to remove everything in the string from the point where X.address or X address is found (including the preceding parentheses), yielding:

This is a string
This is a string
This is a string
This is a string

This is my starting point:

import re
regex = r"\(X.address"
s = "This is a string ((X.address)) test"
re.split(regex, s)[0]

>> 'This is a string ('

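It works, but I need to generalize it so that it searches for an arbitrary word instead of X, and so that it accounts for 1 or more parentheses preceding the word.

python regex

Comments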

2 upvotes Wiktor Stribiżew 11/20/2019
Use re.sub(r'\s*\(+\w+\W+address.*', '', s)
0 upvotes Tyler D 11/20/2019
@WiktorStribiżew Is it possible to generalize this to 1 or more words preceding address? For example: This is a string ((X and Y and Z address)) test
1 upvote Wiktor Stribiżew 11/20/2019
Yes, replace \w+\W+ with [^()]*\b
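
The difference between the two patterns in these comments can be checked with a small sketch. The variable names single_word and multi_word are only illustrative and do not come from the thread:

```python
import re

s = "This is a string ((X and Y and Z address)) test"

# First suggestion: exactly one word, then non-word characters, then "address".
single_word = r"\s*\(+\w+\W+address.*"
# Suggested replacement: any run of non-parenthesis characters before "address".
multi_word = r"\s*\(+[^()]*\baddress.*"

# The first pattern finds no match here, because "X" is followed by " and"
# rather than by "address", so the string comes back unchanged.
print(repr(re.sub(single_word, "", s)))
# The second pattern allows several words inside the parentheses.
print(repr(re.sub(multi_word, "", s)))
```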

Answers:

2 upvotes Michał Turczyn 11/20/2019 #1

You could use .+(?=\s\(+X(?:\.|\s)address)

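Explanation:

.+ - match one or more of any character

(?=...) - positive lookahead

\s - whitespace

\(+ - match one or more (

X - match X literally

(?:...) - non-capturing group

\.|\s - match a dot . or whitespace

address - match address literally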

Demo
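
A minimal sketch of how this lookahead pattern might be applied in Python. It assumes the literal X is swapped for \w+ to cover an arbitrary word, which goes beyond the pattern as posted; the helper name keep_prefix is hypothetical:

```python
import re

# Lookahead pattern from the answer, with the literal X replaced by \w+
# (an assumption, following the question's "arbitrary word" requirement).
pattern = re.compile(r".+(?=\s\(+\w+(?:\.|\s)address)")

samples = [
    "This is a string ((X.address)) test",
    "This is a string ((X address)) test",
    "This is a string (X address) test",
    "This is a string (X.address) test",
]

def keep_prefix(s):
    # Return the text before the parenthesised "<word> address" part,
    # or the string unchanged when the pattern does not match.
    m = pattern.match(s)
    return m.group(0) if m else s

results = [keep_prefix(s) for s in samples]
for s, r in zip(samples, results):
    print(repr(s), "=>", repr(r))
```

Because the lookahead consumes no characters, the match itself is the part to keep, so nothing has to be split or deleted afterwards.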

Comments

0 upvotes D.A. 11/20/2019
Replace X with .+ or [a-zA-Z]+ and this is the correct answer

0 upvotes ArunJose 11/20/2019 #2

import re
regex = r"(This is a string)\s+\(+.+\)"
s = "This is a string ((X.address)) test"
re.split(regex, s)[1]

2 upvotes Wiktor Stribiżew 11/20/2019 #3

You can use

re.sub(r'\s*\(+[^()]*\baddress.*', '', s, flags=re.S)

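  • \s* - 0+ whitespace characters
  • \(+ - 1+ ( characters
  • [^()]* - any 0+ characters other than ( and )
  • \b - a word boundary (address cannot be preceded by another letter, digit, or underscore)
  • address - the literal word
  • .* - any 0+ characters up to the end of the string (with re.S, the dot also matches newlines).

See the Python demo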

import re
strs = [
    'This is a string ((X.address)) test',
    'This is a string ((X address)) test',
    'This is a string (X address) test',
    'This is a string (X.address) test',
    'This is a string ((X and Y and Z address)) test',
]
for s in strs:
    print(s, '=>', re.sub(r'\s*\(+[^()]*\baddress.*', '', s, flags=re.S))

Output:

This is a string ((X.address)) test => This is a string
This is a string ((X address)) test => This is a string
This is a string (X address) test => This is a string
This is a string (X.address) test => This is a string
This is a string ((X and Y and Z address)) test => This is a string

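Comments

0 upvotes Tyler D 11/20/2019
Thanks! Is it possible to generalize it so that it looks for words other than address? So, for example, address or house. I tried replacing address with (address|house), but it did not work
1 upvote Wiktor Stribiżew 11/20/2019
@TylerD That seems to work well for me. I would suggest using a non-capturing group, though. Perhaps you also want to check for the end of the word: (?:address|house)\b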
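
The non-capturing alternation from the last comment could be generalized to a list of words, roughly as sketched here. The function name strip_from_paren and the sample strings containing house are assumptions made for illustration, not part of the thread:

```python
import re

def strip_from_paren(s, words):
    # Build a non-capturing alternation such as (?:address|house);
    # re.escape guards against regex metacharacters in the words.
    alternation = "|".join(re.escape(w) for w in words)
    pattern = r"\s*\(+[^()]*\b(?:" + alternation + r")\b.*"
    return re.sub(pattern, "", s, flags=re.S)

examples = [
    "This is a string ((X.address)) test",
    "This is a string (X house) test",
    "This is a string ((X and Y and Z house)) test",
    "This is a string (X addresses) test",  # no whole-word match, left unchanged
]
cleaned = [strip_from_paren(s, ["address", "house"]) for s in examples]
for s, c in zip(examples, cleaned):
    print(repr(s), "=>", repr(c))
```

The trailing \b is what keeps addresses from being treated as address in the last example.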