Asked by: AshRoss  Asked: 11/6/2023  Last edited by: ddaAshRoss  Updated: 11/7/2023  Views: 83
How to match an entire string using Regexes in Python?
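Q:
I am trying to build a regex pattern in Python that will match strings such as:
"THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)", "VEHICLE - STOLEN", "TRANSPORTATION FACILITY (AIRPORT)", "5600 N FIGUEROA ST" and "400 WORLD WY".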
import re
hello = {"meta": 1, "reza": [[ "row-f696.af3d.c3v9", "00000000-0000-0000-2D2F-EA38F9F11DB9", 0, 1642111191, 1642111191, "{ }", "201412343", "2020-06-15T00:00:00", "2020-06-15T00:00:00", "0700", "14", "Pacific", "1494", "1", "331", "THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)", "1606 0344 1300 1402", "60", "F", "W", "212", "TRANSPORTATION FACILITY (AIRPORT)", "IC", "Invest Cont", "331", "998", "400 WORLD WY", "33.9433", "-118.4072" ] ,
[ "row-f2wh.yte2-zhv8", "00000000-0000-0000-0BF4-2A6281C66DEF", 0, 1636553859, 1636553859, "{ }", "201107194", "2020-03-11T00:00:00", "2020-03-11T00:00:00", "1100", "11", "Northeast", "1118", "1", "510", "VEHICLE - STOLEN", "0", "108", "PARKING LOT", "IC", "Invest Cont", "510", "5600 N FIGUEROA ST", "34.114", "-118.1949" ]]}
crime = []
for items in hello["reza"]:
    for item in items:
        pattern = re.compile(r'[A-Z].*')
        crime = re.findall(pattern,str(item))
print(crime)
A:
1 upvote
Tranbi
11/6/2023
#1
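The most obvious problem in your code is that crime is overwritten on every iteration of the nested loop, so you only print the result of the last findall call. Since findall returns a list of all matches, you end up with an empty list, because the last str(item) contains no match.
Furthermore, you don't describe how you want to filter your results. Your pattern [A-Z].* matches from the first uppercase letter onward, so it cannot return 5600 N FIGUEROA ST in full: the leading digits are dropped.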
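Both effects can be reproduced in isolation. The following is a minimal sketch; the list `items` is a hypothetical three-element excerpt of one row rather than the full data, and the variable names `overwritten` and `collected` are introduced only for illustration.

```python
import re

pattern = re.compile(r'[A-Z].*')

# Hypothetical excerpt of one row; the last element has no uppercase letter.
items = ["VEHICLE - STOLEN", "5600 N FIGUEROA ST", "-118.1949"]

# Overwriting: only the result for the last item survives the loop.
for item in items:
    overwritten = pattern.findall(str(item))

# Accumulating: extend a list that was created before the loop.
collected = []
for item in items:
    collected.extend(pattern.findall(str(item)))

print(overwritten)  # []
print(collected)    # ['VEHICLE - STOLEN', 'N FIGUEROA ST']
```

The second print also shows the leading digits of the street address being dropped, because the match can only begin at an uppercase letter.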
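Here is a suggestion that keeps a string if it contains at least three consecutive uppercase letters and does not start with digits followed by a hyphen (it also collapses runs of whitespace into a single space):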
import re
hello = {"meta": 1, "reza": [[ "row-f696.af3d.c3v9", "00000000-0000-0000-2D2F-EA38F9F11DB9", 0, 1642111191, 1642111191, "{ }", "201412343", "2020-06-15T00:00:00", "2020-06-15T00:00:00", "0700", "14", "Pacific", "1494", "1", "331", "THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)", "1606 0344 1300 1402", "60", "F", "W", "212", "TRANSPORTATION FACILITY (AIRPORT)", "IC", "Invest Cont", "331", "998", "400 WORLD WY", "33.9433", "-118.4072" ] ,
[ "row-f2wh.yte2-zhv8", "00000000-0000-0000-0BF4-2A6281C66DEF", 0, 1636553859, 1636553859, "{ }", "201107194", "2020-03-11T00:00:00", "2020-03-11T00:00:00", "1100", "11", "Northeast", "1118", "1", "510", "VEHICLE - STOLEN", "0", "108", "PARKING LOT", "IC", "Invest Cont", "510", "5600 N FIGUEROA ST", "34.114", "-118.1949" ]]}
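crime = []
# Skip strings starting with digits and a hyphen; require 3+ consecutive uppercase letters.
pattern = re.compile(r'(?!\d+-).*[A-Z]{3,}')
for items in hello["reza"]:
    for item in items:
        # Only strings can be matched; the rows also contain integers.
        if isinstance(item, str) and re.match(pattern, item):
            # Collapse runs of whitespace into a single space.
            crime.append(re.sub(r'\s+', ' ', item))
print(crime)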
Output:
['THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)', 'TRANSPORTATION FACILITY (AIRPORT)', '400 WORLD WY', 'VEHICLE - STOLEN', 'PARKING LOT', '5600 N FIGUEROA ST']
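Note that re.match only anchors the pattern at the start of the string; it does not require the pattern to reach the end. Since the question title asks about matching the entire string, one possible variant is re.fullmatch with a trailing `.*` added to the pattern. This is only a sketch: the names `whole`, `samples` and `kept` are hypothetical, and for a pure keep-or-drop filter like the one above it should select the same strings.

```python
import re

# Variant of the pattern above, written so that it must cover the whole string:
#   (?!\d+-)       reject strings that start with digits followed by a hyphen
#   .*[A-Z]{3,}.*  require a run of at least three uppercase letters somewhere
whole = re.compile(r'(?!\d+-).*[A-Z]{3,}.*')

# Hypothetical sample values taken from the rows in the question.
samples = ["400 WORLD WY", "Invest Cont", "00000000-0000-0000-0BF4-2A6281C66DEF"]

# fullmatch returns a match object only if the entire string matches.
kept = [s for s in samples if whole.fullmatch(s)]
print(kept)  # ['400 WORLD WY']
```

The last sample contains the uppercase run DEF, so it is the negative lookahead, not the uppercase check, that filters it out.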