How to access dictionary values faster from a list of arrays in Python?

Question:

I would like to know how to quickly access dictionary values from a list of arrays.

Here is my toy example:

import numpy as np

my_list = [np.array([ 1,  2,  3,  4,  5,  6,  8,  9, 10]), np.array([ 1,  3,  5,  6,  7, 10]), np.array([ 1,  2,  3,  4,  6,  8,  9, 10]), np.array([ 1,  3,  4,  7, 15]), np.array([ 1,  2,  4,  5, 10, 16]), np.array([6, 10, 15])]

my_dict = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 2, 11: 2, 12: 2, 13: 2, 14: 2, 15: 3, 16: 3}

Each key in my_dict corresponds to a value that appears in the arrays of the list my_list.

I use the following code to get the desired set of outputs:

unique_string_parents = {' '.join(map(str, set(map(my_dict.get, sublist)))) for sublist in my_list}
# output: {'0 1 2', '0 1 3', '0 1 2 3', '1 2 3'}
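For readers less used to nested map calls, the one-liner above can be unrolled into an explicit loop. This is only a restatement of the same logic on the toy data; sorting each set before joining is an addition here so the strings come out in a fixed order. Written this way, it is easy to see that the cost is one Python-level dictionary lookup per array element.

```python
import numpy as np

my_list = [np.array([1, 2, 3, 4, 5, 6, 8, 9, 10]), np.array([1, 3, 5, 6, 7, 10]),
           np.array([1, 2, 3, 4, 6, 8, 9, 10]), np.array([1, 3, 4, 7, 15]),
           np.array([1, 2, 4, 5, 10, 16]), np.array([6, 10, 15])]
my_dict = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1,
           10: 2, 11: 2, 12: 2, 13: 2, 14: 2, 15: 3, 16: 3}

unique_string_parents = set()
for sublist in my_list:
    # One dict lookup per element of the array (this is the slow part).
    parents = {my_dict[int(x)] for x in sublist}
    # Join the distinct values into a string such as '0 1 2'.
    unique_string_parents.add(" ".join(map(str, sorted(parents))))

print(sorted(unique_string_parents))
# ['0 1 2', '0 1 2 3', '0 1 3', '1 2 3']
```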

Suppose my_list is very large. The real data for my_list and my_dict can be found here: https://gitlab.com/Schrodinger168/practice/-/tree/master/practice_dictionary

Here is the code for reading the real files:

import ast

import numpy as np

# One integer array per line of the list file.
file_list = 'list_array.txt'
with open(file_list, 'r') as file:
    lines = file.readlines()

my_list = [np.array(list(map(int, line.split()))) for line in lines]

# Parse the dict literal stored in the dictionary file.
file_dictionary = "dictionary_example.txt"
with open(file_dictionary, 'r') as file_dict:
    content = file_dict.read()

my_dict = ast.literal_eval(content)
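The reading code implies that list_array.txt holds one whitespace-separated row of integers per line and that dictionary_example.txt holds a single Python dict literal; both formats are inferred from the code, not checked against the linked files. The sketch below runs the same parsing on small, made-up in-memory strings (io.StringIO stands in for the files) so it can be tried without downloading anything. Skipping blank lines is an assumption added here.

```python
import ast
import io

import numpy as np

# Hypothetical sample contents standing in for the two real files.
list_text = "1 2 3 4 5 6 8 9 10\n1 3 5 6 7 10\n6 10 15\n"
dict_text = "{1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 2, 15: 3}"

# One NumPy array per non-empty line.
my_list = [np.array(list(map(int, line.split())))
           for line in io.StringIO(list_text) if line.strip()]

# ast.literal_eval only evaluates Python literals, so it cannot run arbitrary code.
my_dict = ast.literal_eval(io.StringIO(dict_text).read())

print(len(my_list), my_list[-1], my_dict[15])
```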

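Using the one-liner above, it takes about 16.70 seconds to get the desired output. How can I speed up this algorithm, or is there a different algorithm that would give me the result faster in this case?

Any help or suggestions would be appreciated! Thank you very much!

python arrays numpy dictionary

Answer: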

import ast

file_list = "list_array.txt"
file_dictionary = "dictionary_example.txt"

with open(file_dictionary, "r") as file_dict:
    my_dict = ast.literal_eval(file_dict.read())

# convert the keys to strings so the tokens of each line can be looked up without converting them to int
my_dict = {str(k): v for k, v in my_dict.items()}

out = set()
with open(file_list, "r") as file:
    for line in file:
        out.add(frozenset(my_dict[i] for i in line.split()))

out = [" ".join(map(str, s)) for s in out]
print(out)
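The approach above skips int conversion and NumPy arrays entirely: the dict keys are turned into strings once, and the whitespace-separated tokens of each line are looked up directly. Below is a self-contained sketch of the same idea on the toy data, with io.StringIO standing in for list_array.txt (an assumption for illustration); sorting each frozenset before joining is an addition to make the output order deterministic. It relies on the file's tokens being written exactly as str(key) would print them (no leading zeros or "+" signs).

```python
import io

my_dict = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1,
           10: 2, 11: 2, 12: 2, 13: 2, 14: 2, 15: 3, 16: 3}
list_text = ("1 2 3 4 5 6 8 9 10\n1 3 5 6 7 10\n1 2 3 4 6 8 9 10\n"
             "1 3 4 7 15\n1 2 4 5 10 16\n6 10 15\n")

# Convert the keys once so tokens like "10" can be looked up without int().
str_dict = {str(k): v for k, v in my_dict.items()}

out = set()
for line in io.StringIO(list_text):
    # frozenset is hashable, so identical value sets are stored only once.
    out.add(frozenset(str_dict[tok] for tok in line.split()))

result = sorted(" ".join(map(str, sorted(s))) for s in out)
print(result)
```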

Edit 2: comparison of the two approaches:

import ast
from timeit import timeit

import numpy as np

# read sample data:

file_list = "list_array.txt"
file_dictionary = "dictionary_example.txt"

with open(file_dictionary, "r") as file_dict:
    my_dict = ast.literal_eval(file_dict.read())

my_list = []
with open(file_list, "r") as file:
    for line in file:
        my_list.append(np.array(list(map(int, line.split()))))


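def fn1(my_list, my_dict):
    # Lookup table: arr[k - 1] == my_dict[k] (assumes the keys are exactly 1..max(my_dict)).
    arr = np.array([my_dict[i] for i in range(1, max(my_dict) + 1)])
    # Fancy indexing maps a whole sublist at once; np.unique drops duplicates.
    out = {frozenset(np.unique(arr[l - 1])) for l in my_list}
    return [" ".join(map(str, s)) for s in out]


def fn2(my_list, my_dict):
    # np.vectorize is a convenience wrapper: it still calls my_dict.get once per element.
    out = {frozenset(np.unique(np.vectorize(my_dict.get)(l))) for l in my_list}
    return [" ".join(map(str, s)) for s in out]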


assert len(fn1(my_list, my_dict)) == len(fn2(my_list, my_dict))

t1 = timeit("fn1(my_list, my_dict)", number=1, globals=globals())
t2 = timeit("fn2(my_list, my_dict)", number=1, globals=globals())

print(t1)
print(t2)

Prints:

1.7291587160434574
11.316183750052005
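A likely reason for the gap: the NumPy documentation describes np.vectorize as provided for convenience rather than performance, essentially a Python for loop, so fn2() still pays one my_dict.get call per element. fn1() instead converts the dict to an array once and replaces all lookups in a sublist with a single fancy-indexing operation (arr[l - 1]). The sketch below shows that step on the toy data. Unlike fn1(), it indexes by the key itself (slot 0 unused) and fills keys missing from the dict with -1; that sentinel is an assumption chosen here so gaps in the key range do not raise KeyError.

```python
import numpy as np

my_dict = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1,
           10: 2, 11: 2, 12: 2, 13: 2, 14: 2, 15: 3, 16: 3}
sublist = np.array([1, 3, 4, 7, 15])

# Lookup table: lut[k] == my_dict[k]; -1 marks keys absent from the dict.
lut = np.full(max(my_dict) + 1, -1, dtype=np.int64)
keys = np.fromiter(my_dict.keys(), dtype=np.int64, count=len(my_dict))
vals = np.fromiter(my_dict.values(), dtype=np.int64, count=len(my_dict))
lut[keys] = vals

# One vectorized gather replaces len(sublist) separate dict lookups.
mapped = lut[sublist]
print(mapped)             # [0 0 0 1 3]
print(np.unique(mapped))  # [0 1 3]
```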

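@Andrej Kesely The text files I provided are only a way to access the real data; I don't want to read from them and do the processing directly. I start timing once I already have my_dict and my_list. The code I provided only reads the real data so that I can obtain my_dict and my_list. What matters is the time of the computation after my_dict and my_list are available.

Andrej Kesely, 11/11/2023

@Erwin I've updated my answer with a comparison of the two approaches. The fn1() version seems to be the fastest (the key is using np.unique()).

Erwin, 11/11/2023

@Andrej Kesely Thank you very much! In my case this one seems to work the fastest. I accepted the answer.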
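A possible further step beyond fn1(), not benchmarked here and not part of the accepted answer: since the dict values in the example are small integers, each sublist's set of values can be encoded as a bitmask and computed for all sublists at once with np.bitwise_or.reduceat, leaving a Python loop only over the distinct masks. The function name unique_value_sets_bitmask is hypothetical. It assumes non-negative integer keys, values with 0 <= value < 63 (so 1 << value fits in int64), and no empty arrays in my_list (reduceat does not handle empty segments the way this code needs).

```python
import numpy as np


def unique_value_sets_bitmask(my_list, my_dict):
    """Hypothetical helper: distinct value sets as strings like '0 1 2'.

    Assumes non-negative int keys, int values with 0 <= value < 63,
    and that no array in my_list is empty.
    """
    # Key -> value lookup table, as in fn1() but indexed by the key itself.
    lut = np.zeros(max(my_dict) + 1, dtype=np.int64)
    for k, v in my_dict.items():
        lut[k] = v

    # Start offset of each sublist inside the concatenated array.
    lengths = np.array([len(a) for a in my_list], dtype=np.int64)
    offsets = np.concatenate(([0], np.cumsum(lengths)[:-1])).astype(np.intp)

    # Map every element at once, then encode each value v as the bit 1 << v.
    values = lut[np.concatenate(my_list)]
    bits = np.left_shift(np.int64(1), values)

    # OR the bits inside each sublist: one integer mask per sublist.
    masks = np.bitwise_or.reduceat(bits, offsets)

    # Decode only the distinct masks back into strings.
    out = set()
    for m in np.unique(masks):
        m = int(m)
        out.add(" ".join(str(b) for b in range(m.bit_length()) if (m >> b) & 1))
    return out


my_list = [np.array([1, 2, 3, 4, 5, 6, 8, 9, 10]), np.array([1, 3, 5, 6, 7, 10]),
           np.array([1, 2, 3, 4, 6, 8, 9, 10]), np.array([1, 3, 4, 7, 15]),
           np.array([1, 2, 4, 5, 10, 16]), np.array([6, 10, 15])]
my_dict = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1,
           10: 2, 11: 2, 12: 2, 13: 2, 14: 2, 15: 3, 16: 3}

print(sorted(unique_value_sets_bitmask(my_list, my_dict)))
```

Whether this beats fn1() on the real data is untested; the trade-off is a hard limit on the range of dict values in exchange for removing the per-sublist np.unique call.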
