
Compare objects in a list to identify those with certain identical key/value pairs and those without

Asked by: David Gard · Asked: 4/6/2023 · Modified: 4/6/2023 · Views: 43

Q:

Using Python, how can I find objects in a list that share certain key/value pairs, and then build two separate lists: one for the objects that share those particular key/value pairs, and one for the objects that do not?

For example, take the following simple list:

[
    {
        "id": "111",
        "host": "aaa",
        "path": "/b/c/d"
    },
    {
        "id": "222",
        "host": "bbb",
        "path": "/x/y/z"
    },
    {
        "id": "333",
        "host": "aaa",
        "path": "/b/c/d"
    },
    {
        "id": "444",
        "host": "aaa",
        "path": "/b/c/d"
    }
]

In the end I'd like to have two lists:

  • Objects with duplicate `host` and `path`

    [
        {
            "host": "aaa",
            "path": "/b/c/d",
            "ids": [
                "111",
                "333",
                "444"
            ]
        }
    ]
    
  • Objects without duplicate `host` and `path`

    [
        {
            "id": "222",
            "host": "bbb",
            "path": "/x/y/z"
        }
    ]
    
    

My best attempt so far produces the two lists, but every object from the original list gets added to `dups_list`, regardless of whether it is actually a duplicate.

Note that I tried using a `deepcopy` of `main_list` in the second `for` statement, but that produced exactly the same result.

>>> import jsonpickle
>>> main_list = list((dict(Id="111",host="aaa",path="/b/c/d"),dict(Id="222",host="bbb",path="/x/y/z"),dict(Id="333",host="aaa",path="/b/c/d"),dict(Id="444",host="aaa",path="/b/c/d")))
>>> dups_list = list()
>>> non_dups_list = list()
>>> for o in main_list:
...     is_duplicate = False
...     for o2 in main_list:
...         if o2['host'] == o['host'] and o2['path'] == o['path']:
...             is_duplicate = True
...             break
...     if is_duplicate:
...         dups_list.append(o)
...     else:
...         non_dups_list.append(o)
... 
>>> print(jsonpickle.encode(non_dups_list, indent=4))
[]
>>> print(jsonpickle.encode(dups_list, indent=4))
[
    {
        "Id": "111",
        "host": "aaa",
        "path": "/b/c/d"
    },
    {
        "Id": "222",
        "host": "bbb",
        "path": "/x/y/z"
    },
    {
        "Id": "333",
        "host": "aaa",
        "path": "/b/c/d"
    },
    {
        "Id": "444",
        "host": "aaa",
        "path": "/b/c/d"
    }
]
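(For context: the attempt above fails because the inner loop also compares each object with itself, so `is_duplicate` is always set. A minimal sketch of a fix, keeping the same structure, is to skip the identity comparison. Note this still doesn't produce the grouped `ids` output the question asks for.)

```python
main_list = [
    {"id": "111", "host": "aaa", "path": "/b/c/d"},
    {"id": "222", "host": "bbb", "path": "/x/y/z"},
    {"id": "333", "host": "aaa", "path": "/b/c/d"},
    {"id": "444", "host": "aaa", "path": "/b/c/d"},
]

dups_list = []
non_dups_list = []
for o in main_list:
    is_duplicate = False
    for o2 in main_list:
        # Skip comparing the object with itself; without "o2 is not o",
        # every object trivially matches and lands in dups_list.
        if o2 is not o and o2["host"] == o["host"] and o2["path"] == o["path"]:
            is_duplicate = True
            break
    if is_duplicate:
        dups_list.append(o)
    else:
        non_dups_list.append(o)
```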
python · list · object · comparison

Comments


A:

1 upvote · JLamHK · 4/6/2023 · #1

I'd suggest using itertools.

from itertools import groupby

def get_key(d):
    # Define a custom key function to group by multiple keys
    return d['host'], d['path']

data = [...your data here]

grouped_data = []
for k, g in groupby(sorted(data, key=get_key), key=get_key):
    grouped_data.append({'host': k[0], 'path': k[1], 'ids': [i['id'] for i in list(g)]})
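(Applied to the sample data from the question, this sketch groups every `(host, path)` pair, singletons included, which is why the non-duplicate object also appears in the result:)

```python
from itertools import groupby

def get_key(d):
    # Group by both host and path
    return d["host"], d["path"]

data = [
    {"id": "111", "host": "aaa", "path": "/b/c/d"},
    {"id": "222", "host": "bbb", "path": "/x/y/z"},
    {"id": "333", "host": "aaa", "path": "/b/c/d"},
    {"id": "444", "host": "aaa", "path": "/b/c/d"},
]

grouped_data = []
# groupby only groups adjacent items, so the input must be sorted by the same key
for k, g in groupby(sorted(data, key=get_key), key=get_key):
    grouped_data.append({"host": k[0], "path": k[1], "ids": [i["id"] for i in g]})

print(grouped_data)
# → [{'host': 'aaa', 'path': '/b/c/d', 'ids': ['111', '333', '444']},
#    {'host': 'bbb', 'path': '/x/y/z', 'ids': ['222']}]
```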

Comments

0 upvotes · David Gard · 4/6/2023
Thanks for your reply. While this works for the duplicates, all non-duplicates are also included in the list, rather than being in a separate list. Additionally, other keys are excluded from both results (not included in my examples, will add a note to say they exist), and the non-duplicates are transformed where they should not be (see my example output). But thanks for introducing me to `itertools`.
1 upvote · Timeless · 4/6/2023 · #2

I would use `defaultdict` with dict/list comprehensions:

from collections import defaultdict

g = defaultdict(list)

for obj in list_objs:
    g[(obj["host"], obj["path"])].append(obj["id"])

dups = [{"host": k[0], "path": k[1], "ids": v} for k, v in g.items() if len(v) > 1]

uniqs = [obj for obj in list_objs if (obj["host"], obj["path"])
          not in [k for k, v in g.items() if len(v) > 1]]

#12.2 µs ± 316 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Output:

>>> print(dups)
#[{'host': 'aaa', 'path': '/b/c/d', 'ids': ['111', '333', '444']}]

>>> print(uniqs)
#[{'id': '222', 'host': 'bbb', 'path': '/x/y/z'}]
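(One possible refinement, not part of the answer above: the `uniqs` comprehension rebuilds the list of duplicated keys on every iteration. A sketch that precomputes them once as a set keeps each membership test O(1):)

```python
from collections import defaultdict

list_objs = [
    {"id": "111", "host": "aaa", "path": "/b/c/d"},
    {"id": "222", "host": "bbb", "path": "/x/y/z"},
    {"id": "333", "host": "aaa", "path": "/b/c/d"},
    {"id": "444", "host": "aaa", "path": "/b/c/d"},
]

g = defaultdict(list)
for obj in list_objs:
    g[(obj["host"], obj["path"])].append(obj["id"])

# Compute the duplicated (host, path) keys once, as a set
dup_keys = {k for k, v in g.items() if len(v) > 1}

dups = [{"host": k[0], "path": k[1], "ids": g[k]} for k in dup_keys]
uniqs = [obj for obj in list_objs if (obj["host"], obj["path"]) not in dup_keys]
```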

Comments

1 upvote · Timeless · 4/6/2023
NB: `list_objs` is the variable holding the list of objects/dictionaries.
1 upvote · David Gard · 4/6/2023
Perfect, and so much simpler than what I was trying. Thanks very much.