Is there a better way to access values in a JSON file than a 'for' loop?

Q:

I have a JSON file that looks like this:

[{'data': [{'text': 'add '},
   {'text': 'Stani, stani Ibar vodo', 'entity': 'entity_name'},
   {'text': ' songs in '},
   {'text': 'my', 'entity': 'playlist_owner'},
   {'text': ' playlist '},
   {'text': 'música libre', 'entity': 'playlist'}]},
 {'data': [{'text': 'add this '},
   {'text': 'album', 'entity': 'music_item'},
   {'text': ' to '},
   {'text': 'my', 'entity': 'playlist_owner'},
   {'text': ' '},
   {'text': 'Blues', 'entity': 'playlist'},
   {'text': ' playlist'}]},
 {'data': [{'text': 'Add the '},
   {'text': 'tune', 'entity': 'music_item'},
   {'text': ' to the '},
   {'text': 'Rage Radio', 'entity': 'playlist'},
   {'text': ' playlist.'}]}]

For each 'data' entry in this list, I want to concatenate the 'text' values into a single string.

I tried the following:

lst = []

for item in data:
    p = item['data']
    p_st = ''
    for item_1 in p:
        p_st += item_1['text'] + ' '
    lst.append(p_st)

print(lst)

Out: ['add  Stani, stani Ibar vodo  songs in  my  playlist  música libre ', 'add this  album  to  my   Blues  playlist ', 'Add the  tune  to the  Rage Radio  playlist. ']

It works, but I am new to JSON and wonder whether there is a better way to do this. Perhaps some built-in JSON methods or a library?

Tags: python, json, list, readlines

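The built-in json module provides a way to parse a JSON file into a Python dictionary, which you seem to have used already. Once you have the dictionary, the only way to operate on all of its elements is a loop. Why do you think that is not a good approach? What specifically are you looking for when you say "a better way"?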
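To illustrate the comment above, here is a minimal sketch of parsing JSON text with the standard json module and then reading the values with a plain loop. The string `raw` is a hypothetical stand-in for the file contents; note that valid JSON uses double quotes, unlike the Python repr shown in the question.

```python
import json

# Hypothetical JSON text standing in for the file contents.
raw = '[{"data": [{"text": "add this "}, {"text": "album", "entity": "music_item"}]}]'

# json.loads parses a string; json.load does the same for an open file object.
parsed = json.loads(raw)

# The result is ordinary Python lists and dicts, so a plain loop reads the values.
texts = []
for entry in parsed:
    for piece in entry["data"]:
        texts.append(piece["text"])

print(texts)  # ['add this ', 'album']
```

For this input the top-level object is a list of dictionaries rather than a single dictionary, but the same looping applies.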

0 upvotes pho 10/5/2023

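The only change I can think of is to use str.join to concatenate the strings, so replace your inner loop with p_st = ' '.join([item_1['text'].strip() for item_1 in p]), but that still has a "loop" in the list comprehension.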
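As a rough sketch of that suggestion, the snippet below applies the strip-and-join line to one entry from the question's sample data. The surrounding loop is kept from the question; only the inner loop is replaced.

```python
data = [
    {"data": [{"text": "Add the "}, {"text": "tune", "entity": "music_item"},
              {"text": " to the "}, {"text": "Rage Radio", "entity": "playlist"},
              {"text": " playlist."}]},
]

lst = []
for item in data:
    p = item["data"]
    # strip() removes the spaces already embedded in each fragment,
    # so joining with ' ' leaves exactly one space between fragments.
    p_st = " ".join([item_1["text"].strip() for item_1 in p])
    lst.append(p_st)

print(lst)  # ['Add the tune to the Rage Radio playlist.']
```

One caveat: a fragment that consists only of whitespace, such as {'text': ' '} in the second sample entry, strips to an empty string and still gets a separator on each side, so that spot ends up with a double space unless empty fragments are filtered out.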

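Answers:

2 upvotes Nidheesh R 10/5/2023 #1

Your code works perfectly well for extracting the text values from the JSON data. If you want a more concise way to get the same result, however, you can use a list comprehension in Python, which makes the code shorter and more readable. Here is how:

Using the json module and a list comprehension: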

import json

data = [{'data': [{'text': 'add '}, {'text': 'Stani, stani Ibar vodo', 'entity': 'entity_name'}, {'text': ' songs in '}, {'text': 'my', 'entity': 'playlist_owner'}, {'text': ' playlist '}, {'text': 'música libre', 'entity': 'playlist'}]},
        {'data': [{'text': 'add this '}, {'text': 'album', 'entity': 'music_item'}, {'text': ' to '}, {'text': 'my', 'entity': 'playlist_owner'}, {'text': ' '}, {'text': 'Blues', 'entity': 'playlist'}, {'text': ' playlist'}]},
        {'data': [{'text': 'Add the '}, {'text': 'tune', 'entity': 'music_item'}, {'text': ' to the '}, {'text': 'Rage Radio', 'entity': 'playlist'}, {'text': ' playlist.'}]}]

text_values = [' '.join(item['text'] for item in entry['data']) for entry in data]

print(text_values)

Using pandas:

import pandas as pd

data = [{'data': [{'text': 'add '}, {'text': 'Stani, stani Ibar vodo', 'entity': 'entity_name'}, {'text': ' songs in '}, {'text': 'my', 'entity': 'playlist_owner'}, {'text': ' playlist '}, {'text': 'música libre', 'entity': 'playlist'}]},
        {'data': [{'text': 'add this '}, {'text': 'album', 'entity': 'music_item'}, {'text': ' to '}, {'text': 'my', 'entity': 'playlist_owner'}, {'text': ' '}, {'text': 'Blues', 'entity': 'playlist'}, {'text': ' playlist'}]},
        {'data': [{'text': 'Add the '}, {'text': 'tune', 'entity': 'music_item'}, {'text': ' to the '}, {'text': 'Rage Radio', 'entity': 'playlist'}, {'text': ' playlist.'}]}]

# Create a DataFrame from the data
df = pd.DataFrame(data)

# Extract and join the 'text' values for each 'data' entry
text_values = df['data'].apply(lambda x: ' '.join(item['text'] for item in x))

print(text_values.tolist())

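If you plan to do further analysis or manipulation of the JSON data, the pandas approach is the better fit, since it offers a powerful and flexible way to work with structured data.

-1 upvotes Christopher Hatton 10/5/2023 #2

This will work: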

import json

with open(filename, 'r+') as file:
    # open the file and load its JSON content into a Python object
    file_data = json.load(file)
    # append new data (the key is left blank in the source)
    file_data[].append(new_data)
    # move the file position back to the start of the file
    file.seek(0)
    # convert back to JSON and write it out
    json.dump(file_data, file, indent=4)

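The approach uses a list comprehension to produce the outer list, where each element produced is the space-separated join of all the 'text' values of that item's 'data'. Using a listcomp for the outer part makes things a little faster (a micro-optimization that takes advantage of interpreter optimizations for listcomps, but not a big-O improvement). Using ' '.join, however, is a big-O algorithmic improvement; repeated string concatenation is O(n²) (CPython sometimes optimizes it to nearly O(n), but not that well, and not reliably), while bulk concatenation via ' '.join is guaranteed O(n). If the data is only a handful of strings, as shown, the difference is negligible, but the code is simpler and easier to read and maintain. If the data has many strings to join, this can speed things up significantly.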
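The code this explanation refers to is not present on this page. The following is only a sketch of what a nested list comprehension around ' '.join might look like, reusing the variable names from the question (`data`, `item`, `item_1`, `lst`) and a shortened copy of the sample data.

```python
data = [
    {"data": [{"text": "add this "}, {"text": "album", "entity": "music_item"},
              {"text": " to "}, {"text": "my", "entity": "playlist_owner"}]},
    {"data": [{"text": "Add the "}, {"text": "tune", "entity": "music_item"}]},
]

# Outer list comprehension: one joined string per entry.
# Inner list comprehension: the 'text' values of that entry's 'data'.
lst = [" ".join([item_1["text"] for item_1 in item["data"]]) for item in data]

print(lst)  # ['add this  album  to  my', 'Add the  tune']
```

The double spaces come from the sample fragments already carrying their own spaces, just as in the question's original output.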

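Note: this does mean the joined string will not end with a space. You most likely did not want that trailing space anyway, but if you really do, you can always add it back; a single extra concatenation does not break the big-O.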
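A small check of the trailing-space point, assuming a hypothetical flat list `pieces` taken from one sample entry: the += loop from the question and str.join differ only by the final space.

```python
pieces = ["Add the ", "tune", " to the ", "Rage Radio", " playlist."]

# Original approach: repeated += leaves a trailing space.
looped = ""
for piece in pieces:
    looped += piece + " "

# str.join puts the separator only between elements, so nothing trails.
joined = " ".join(pieces)

# One extra concatenation restores the trailing space if it is really wanted.
assert joined + " " == looped
assert not joined.endswith(" ")
```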

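For the curious: as an implementation detail of CPython (subject to change, but unchanged for years), passing the result of a listcomp to str.join is faster than passing a raw generator expression, because any iterable passed to it, other than a list or tuple, is slurped into an internal list before the join begins, so you pay the CPU overhead of a genexpr without saving anything on peak memory or temporary lists. ' '.join(item_1['text'] for item_1 in item['data']) works fine; it is just a little slower.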
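One way to check this claim on your own interpreter is a quick timeit comparison. This is only a sketch: the `fragments` workload and both function names are made up for illustration, and the timings will vary by machine and Python version, so no numbers are quoted.

```python
import timeit

# Hypothetical workload: many short fragments, to make any difference measurable.
fragments = [{"text": f"word{i}"} for i in range(10_000)]

def join_listcomp():
    # List comprehension: str.join receives a ready-made list.
    return " ".join([f["text"] for f in fragments])

def join_genexpr():
    # Generator expression: str.join has to materialize it first.
    return " ".join(f["text"] for f in fragments)

# Both produce the same string; only the speed may differ.
assert join_listcomp() == join_genexpr()

print("listcomp:", timeit.timeit(join_listcomp, number=200))
print("genexpr: ", timeit.timeit(join_genexpr, number=200))
```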
