How to match an entire string using Regexes in Python?

Asked by: AshRoss  Asked: 11/6/2023  Last edited by: ddaAshRoss  Updated: 11/7/2023  Views: 83

Question:

I'm trying to build a regex pattern in Python that will match strings such as:

"THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)", "VEHICLE - STOLEN", "TRANSPORTATION FACILITY (AIRPORT)", "5600 N FIGUEROA ST", and "400 WORLD WY".

import re

hello = {"meta": 1, "reza": [[ "row-f696.af3d.c3v9", "00000000-0000-0000-2D2F-EA38F9F11DB9", 0, 1642111191, 1642111191, "{ }", "201412343", "2020-06-15T00:00:00", "2020-06-15T00:00:00", "0700", "14", "Pacific", "1494", "1", "331", "THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)", "1606 0344 1300 1402", "60", "F", "W", "212", "TRANSPORTATION FACILITY (AIRPORT)", "IC", "Invest Cont", "331", "998", "400    WORLD                        WY", "33.9433", "-118.4072" ] ,
        [ "row-f2wh.yte2-zhv8", "00000000-0000-0000-0BF4-2A6281C66DEF", 0, 1636553859, 1636553859, "{ }", "201107194", "2020-03-11T00:00:00", "2020-03-11T00:00:00", "1100", "11", "Northeast", "1118", "1", "510", "VEHICLE - STOLEN", "0", "108", "PARKING LOT", "IC", "Invest Cont", "510", "5600 N  FIGUEROA                     ST", "34.114", "-118.1949" ]]}
crime = []
for items in hello["reza"]:
    for item in items:
        pattern = re.compile(r'[A-Z].*')
        crime = re.findall(pattern,str(item))

print(crime)
python-3.x regex list python-re

Comments

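0 votes Community 11/6/2023
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking.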

Answers:

1 vote Tranbi 11/6/2023 #1

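The most obvious problem in your code is that you overwrite `crime` on every iteration of the nested loop, so you only print the result of the last call to `findall`. Since `findall` returns a list (containing all matches), you end up with an empty list, because there is no match in the last `str(item)` (`-118.1949`).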
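A minimal sketch of the accumulation fix, assuming you want every match from every item collected in one flat list. The small `rows` list is a hypothetical stand-in for `hello["reza"]`; the pattern is compiled once outside the loop, and `list.extend` adds each `findall` result instead of replacing the previous one.

```python
import re

# Hypothetical sample standing in for hello["reza"]; the last item has no uppercase letter.
rows = [
    ["row-f696.af3d.c3v9", 0, "VEHICLE - STOLEN", "-118.4072"],
    ["PARKING LOT", "-118.1949"],
]

pattern = re.compile(r"[A-Z].*")  # compile once, not on every iteration

# The bug: reassigning on every iteration keeps only the last findall() result.
crime = []
for items in rows:
    for item in items:
        crime = pattern.findall(str(item))
print(crime)  # findall() finds nothing in "-118.1949", so this prints []

# The fix: extend one list with the matches found in every item.
crime = []
for items in rows:
    for item in items:
        crime.extend(pattern.findall(str(item)))
print(crime)  # ['VEHICLE - STOLEN', 'PARKING LOT']
```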

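Also, you didn't describe how you want to filter your results. Your pattern `[A-Z].*` matches from the first uppercase letter to the end of the string, and `findall` looks for it anywhere in the string. So for the address `5600 N FIGUEROA ST` you would only capture the part starting at `N`, never the full address, and values such as `2020-06-15T00:00:00` would produce fragments like `T00:00:00`.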
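Since the title asks about matching the *entire* string, here is a small sketch contrasting `findall` with `re.fullmatch` (via the compiled pattern's `fullmatch` method), which succeeds only when the pattern covers the whole string. It reuses the question's own `[A-Z].*` pattern; the sample values come from the data above, with the address's long run of spaces shortened.

```python
import re

pattern = re.compile(r"[A-Z].*")

samples = [
    "VEHICLE - STOLEN",
    "5600 N  FIGUEROA ST",
    "2020-06-15T00:00:00",
    "00000000-0000-0000-0BF4-2A6281C66DEF",
]

for s in samples:
    # findall() scans the whole string and returns every non-overlapping match,
    # so it can return a fragment that starts in the middle of the value.
    found = pattern.findall(s)
    # fullmatch() returns a match object only if the pattern spans the entire string.
    whole = pattern.fullmatch(s)
    print(repr(s), found, whole is not None)
```

Only `VEHICLE - STOLEN` is matched in full; the other three values yield fragments from `findall` but no full match.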

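Here is a suggestion that keeps strings containing at least three consecutive uppercase letters, as long as they don't start with digits immediately followed by `-` (it also replaces runs of whitespace with a single space):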

import re

hello = {"meta": 1, "reza": [[ "row-f696.af3d.c3v9", "00000000-0000-0000-2D2F-EA38F9F11DB9", 0, 1642111191, 1642111191, "{ }", "201412343", "2020-06-15T00:00:00", "2020-06-15T00:00:00", "0700", "14", "Pacific", "1494", "1", "331", "THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)", "1606 0344 1300 1402", "60", "F", "W", "212", "TRANSPORTATION FACILITY (AIRPORT)", "IC", "Invest Cont", "331", "998", "400    WORLD                        WY", "33.9433", "-118.4072" ] ,
        [ "row-f2wh.yte2-zhv8", "00000000-0000-0000-0BF4-2A6281C66DEF", 0, 1636553859, 1636553859, "{ }", "201107194", "2020-03-11T00:00:00", "2020-03-11T00:00:00", "1100", "11", "Northeast", "1118", "1", "510", "VEHICLE - STOLEN", "0", "108", "PARKING LOT", "IC", "Invest Cont", "510", "5600 N  FIGUEROA                     ST", "34.114", "-118.1949" ]]}
crime = []
pattern = re.compile(r'(?!\d+-).*[A-Z]{3,}')  # no "digits-" at the start; needs 3+ consecutive capitals
for items in hello["reza"]:
    for item in items:
        if isinstance(item, str) and pattern.match(item):
            crime.append(re.sub(r'\s+', ' ', item))  # collapse runs of whitespace to one space

print(crime)

Output:

['THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)', 'TRANSPORTATION FACILITY (AIRPORT)', '400 WORLD WY', 'VEHICLE - STOLEN', 'PARKING LOT', '5600 N FIGUEROA ST']
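To make the pattern less opaque, here is a small sketch that runs the same filter on a few values taken from the data (`keep` is a hypothetical helper name introduced for illustration). `(?!\d+-)` is a negative lookahead that rejects strings starting with digits immediately followed by `-`, such as the IDs and timestamps. This matters because the second ID ends in `DEF`, which `.*[A-Z]{3,}` alone would accept. `.*[A-Z]{3,}` then requires a run of at least three uppercase letters somewhere in the string.

```python
import re

# The pattern from the answer, compiled once.
pattern = re.compile(r"(?!\d+-).*[A-Z]{3,}")


def keep(value):
    """Hypothetical helper: True if value is a string the filter would keep."""
    # pattern.match() behaves like re.match(pattern, value): it only
    # tries to match at the start of the string.
    return isinstance(value, str) and pattern.match(value) is not None


samples = [
    "00000000-0000-0000-0BF4-2A6281C66DEF",  # digits then "-": rejected by (?!\d+-)
    "2020-03-11T00:00:00",                   # digits then "-": rejected by (?!\d+-)
    "IC",                                    # only two consecutive capitals
    "Northeast",                             # no run of three capitals
    "5600 N  FIGUEROA ST",                   # digits then a space, so the lookahead allows it
    "VEHICLE - STOLEN",
]
for value in samples:
    print(f"{value!r}: {keep(value)}")

# Collapsing whitespace: for strings without leading/trailing blanks,
# " ".join(s.split()) gives the same result as re.sub(r"\s+", " ", s).
s = "400    WORLD                        WY"
print(re.sub(r"\s+", " ", s))
print(" ".join(s.split()))
```

One design note: `" ".join(s.split())` also strips leading and trailing whitespace, whereas `re.sub(r'\s+', ' ', s)` turns it into a single space; for this data the two give the same result.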