I want to extract all JSON objects from this text file and create a dictionary. As you can see, my text contains nested objects as key values

Asked by: Mukhammadsodik Khabibulloev  Asked: 9/27/2023  Last edited by: Wiktor Stribiżew, Mukhammadsodik Khabibulloev  Updated: 9/27/2023  Views: 60

Q:

import json
import re

text = """Autotune exists! Hoorah! You can use microbolus-related features. {"iob":0.121,
"activity":0.0079,
"basaliob":-1.447,
"bolusiob":1.568,
"netbasalinsulin":-1.9,
"bolusinsulin":6.5,
"time":"2022-12-25T21:17:45.000Z",

"iobWithZeroTemp":
{"iob":0.121,
"activity":0.0079,
"basaliob":-1.447,
"bolusiob":1.568,
"netbasalinsulin":-1.9,
"bolusinsulin":6.5,
"time":"2022-12-25T21:17:45.000Z"},
"lastBolusTime":1671999216000,

"lastTemp":
{"rate":0,
"timestamp":"2022-12-25T23:56:14+03:00",
"started_at":"2022-12-25T20:56:14.000Z",
"date":1672001774000,
"duration":22.52}}"""

# Regular expression pattern to match nested JSON objects
pattern = r'(?<=\{)\s*[^{]*?(?=[\},])'

# Collect every substring the pattern matches
matches = re.findall(pattern, text)

# Parse each match as JSON (this is the line that raises the error)
parsed_objects = [json.loads(match) for match in matches]

for obj in parsed_objects:
    print(obj)

JSONDecodeError: Extra data: line 1 column 6 (char 5)

python json parsing text

Comments

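2 upvotes Tim Roberts 9/27/2023
This is hopeless. You can easily strip any text before the initial {, but scanning random text to find JSON is not trivial.
0 upvotes Barmar 9/27/2023
Your JSON has nested objects. The regular expression only matches objects without nesting.
0 upvotes Barmar 9/27/2023
Using lookarounds to match { and } means you only get the middle of the object. But that is not valid JSON by itself.
0 upvotes Mukhammadsodik Khabibulloev 9/27/2023
@Barmar can you help fix the pattern?
0 upvotes Barmar 9/27/2023
No, this is not an appropriate use of regular expressions, for the reason @TimRoberts explained.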
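
To make the point in the comments concrete, here is a minimal sketch of what the question's pattern captures. The `sample` string is a shortened stand-in for the original text (an assumption for illustration), and `first` and `error_message` are names introduced only for this example.

```python
import json
import re

# Shortened stand-in for the question's text (hypothetical sample).
sample = 'Autotune exists! {"iob":0.121,\n"lastTemp":\n{"rate":0,\n"duration":22.52}}'

# Pattern copied unchanged from the question.
pattern = r'(?<=\{)\s*[^{]*?(?=[\},])'

matches = re.findall(pattern, sample)
first = matches[0]
print(repr(first))  # a bare key/value fragment, without the surrounding braces

try:
    json.loads(first)
    error_message = None
except json.JSONDecodeError as exc:
    # json.loads parses the string "iob" and then finds leftover characters.
    error_message = str(exc)

print(error_message)
```

Because the lookbehind and lookahead exclude the braces, the first match is the fragment `"iob":0.121`. `json.loads` reads `"iob"` as a complete JSON string and then reports the remaining `:0.121` as extra data, which appears to be the same failure the question shows.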

A:

1 upvote Andrej Kesely 9/27/2023 #1

Here is an attempt to get all valid JSON dictionaries from the text using JSONDecoder.raw_decode():

text = """\
text = Autotune exists! Hoorah! You can use microbolus-related features. {"iob":0.121,
"activity":0.0079,
"basaliob":-1.447,
"bolusiob":1.568,
"netbasalinsulin":-1.9,
"bolusinsulin":6.5,
"time":"2022-12-25T21:17:45.000Z",

"iobWithZeroTemp":
{"iob":0.121,
"activity":0.0079,
"basaliob":-1.447,
"bolusiob":1.568,
"netbasalinsulin":-1.9,
"bolusinsulin":6.5,
"time":"2022-12-25T21:17:45.000Z"},
"lastBolusTime":1671999216000,

"lastTemp":
{"rate":0,
"timestamp":"2022-12-25T23:56:14+03:00",
"started_at":"2022-12-25T20:56:14.000Z",
"date":1672001774000,
"duration":22.52}}

This is some other text with { not valid JSON }

{"another valid JSON object": [1, 2, 3]}
"""

import json

decoder = json.JSONDecoder()

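decoded_objs, idx = [], 0
while True:
    try:
        # Jump to the next opening brace; stop when there are none left
        idx = text.index("{", idx)
    except ValueError:
        break

    while True:
        try:
            # raw_decode() parses one JSON value and ignores trailing text;
            # new_idx is the offset where that value ended in the slice
            obj, new_idx = decoder.raw_decode(text[idx:])
            decoded_objs.append(obj)
            idx += new_idx
        except json.decoder.JSONDecodeError:
            # Nothing decodable here: step one character forward and search again
            idx += 1
            break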


print(decoded_objs)

Prints:

[
    {
        "iob": 0.121,
        "activity": 0.0079,
        "basaliob": -1.447,
        "bolusiob": 1.568,
        "netbasalinsulin": -1.9,
        "bolusinsulin": 6.5,
        "time": "2022-12-25T21:17:45.000Z",
        "iobWithZeroTemp": {
            "iob": 0.121,
            "activity": 0.0079,
            "basaliob": -1.447,
            "bolusiob": 1.568,
            "netbasalinsulin": -1.9,
            "bolusinsulin": 6.5,
            "time": "2022-12-25T21:17:45.000Z",
        },
        "lastBolusTime": 1671999216000,
        "lastTemp": {
            "rate": 0,
            "timestamp": "2022-12-25T23:56:14+03:00",
            "started_at": "2022-12-25T20:56:14.000Z",
            "date": 1672001774000,
            "duration": 22.52,
        },
    },
    {"another valid JSON object": [1, 2, 3]},
]
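
The same idea can be packaged as a reusable generator. This is only a sketch under stated assumptions: `iter_json_objects` is a hypothetical helper name, not part of the `json` module, and the `sample` string is made up for the demonstration. It passes the start index to `raw_decode()` as its second argument instead of slicing the text, which avoids copying the remainder of the string on every attempt.

```python
import json


def iter_json_objects(text):
    """Yield each JSON object that decodes cleanly starting at a '{' in text.

    Hypothetical helper for illustration; not part of the json module.
    """
    decoder = json.JSONDecoder()
    idx = 0
    while True:
        idx = text.find("{", idx)
        if idx == -1:
            return  # no more opening braces
        try:
            # raw_decode(s, idx) returns the value and the index in s where it ended
            obj, end = decoder.raw_decode(text, idx)
        except json.JSONDecodeError:
            idx += 1  # not valid JSON at this brace; try the next one
            continue
        yield obj
        idx = end  # resume after the object so nested objects are not reported twice


# Made-up sample text for the demonstration.
sample = 'noise {"a": {"b": 1}} more { not json } {"c": [1, 2, 3]}'
found = list(iter_json_objects(sample))
print(found)
```

With this sample, `found` holds the two top-level objects and skips the invalid `{ not json }` span. Nested objects stay inside their parents rather than being returned separately, which matches the behaviour of the loop in the answer.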