没有交换元组的笛卡尔积-解网

问：

给出了 N 个迭代器。任务是获取这些迭代器的笛卡尔乘积（以每个 N 个元素宽的元组集合的形式），同时丢弃可以通过简单地重新排列其中的元素来从其他元组派生的元组。例如，如果我们有和，我们可能会得到和。这些元组具有相同的元素，只是顺序不同。任务是只保留其中一个并丢弃所有其他。(1, 2, 3)(1, 2)(7, 8, 9)(1, 2, 7)(2, 1, 7)

我熟悉其他组合函数，但在测试了所有这些函数之后，它们并没有提供我想要的结果。我希望我能原谅我没有附上我尝试这些函数并观察结果中存在交换对的代码示例。itertools.product

我还需要说，涉及我回顾集合并搜索交换对的各种方法都是不可行的。所有输入集都是发电机，笛卡尔积也必须是发电机。我处理过巨大的笛卡尔产品（带有 3 个元素元组的列表，在磁盘上占据 70+GB），不用说，无法加载到内存中并进行搜索。

所以我希望的是一些我不知道的聪明的数学技巧，或者也许是一些输入迭代器next()

编辑：根据问题的注释添加必要的代码示例：

from itertools import product as p
one, two, three = (1, 2, 3), (1, 2), (7, 8, 9)
output = list(p(one, two, three))

输出：

(1, 1, 7)
[...]
(1, 2, 7)
[...]
(2, 1, 7)
[...]

正如我们所看到的，我们在实例化后至少得到 2 个交换对

蟒 python-itertools 组合数学

对不起，我有点慢，但只是为了确认。是否给出了所需的结果（在本例中，它会导致一组 15 个唯一元组）？因此，目标是计算此结果，但效率更高，即无需在内存中存储一个大集合，或在计算过程中重复检查集合隶属度，或者通过找到解析解？output = {tuple(sorted(comb)) for comb in p(one, two, three)}

1赞 Mateen Ulhaq 9/20/2023

据我所知，用很少的内存使用量来做到这一点似乎是合理的。i.imgur.com/Ded3PjA.png只需预先计算长度≥为 2 的所有成对子集，将集合排序为规范排序序列，然后暂时禁止在相应的成对子集中已经完成的重复项，同时按规范顺序遍历它们。它有一些空间/时间开销，随着您添加更多序列而增加，但我认为这无论如何都不是您的瓶颈。从理论上讲，一个时空更优的解决方案可以使用这种方法和典型的内存使用方法的混合......

答：

1赞 Kenny Ostrom 9/20/2023 #1

你说产品，但订单在输出中并不重要。将其限制为三个可迭代对象会有所帮助，我假设巨大的尺寸仅来自组合它们。

假设每个输入本身都很容易适应内存，那么创建每个输入的集合并计算它们的交集是公平的。我们可以检查这些交叉点上的重复项。

但我们也发现，这三个值中的循环将导致重复，而不会触发这些交叉检查。我通过计算每个期望周期值并消除它来消除它们，从而只保留其重复项。

将其扩展到更多可迭代对象是......这不是我现在想做的事情。

from itertools import product

def unique_combinations(one, two, three):
    x_values = set(one)
    y_values = set(two)
    z_values = set(three)
    xy = x_values & y_values
    xz = x_values & z_values
    yz = y_values & z_values

    # possible memory concern here, need OP's input
    cycles = set(product(xy, yz, xz))

    for candidate in product(x_values, y_values, z_values):
        x, y, z = candidate
        # check for violation of intersection constraint
        if x in xy and y in xy and x > y:
            continue
        if x in xz and z in xz and x > z:
            continue
        if y in yz and z in yz and y > z:
            continue
        # check violation of cycle constraint
        if candidate in cycles:
            continue
        yield candidate

data = [
    (1, 2, 3),
    (1, 2),
    (7, 8, 9),
]

data_with_cycle = [
    (1, 2, 3),
    (1, 4),
    (1, 2, 4),
]

def test_case(data):
    results = list(unique_combinations(*data))
    check = set(tuple(sorted(result)) for result in results)
    assert len(results) == len(check)

test_case(data)
test_case(data_with_cycle)
print('pass')

感谢 Mateen Ulhaq（可能还有 Bill）对交集约束有同样的想法，而 Bill 发现了最初想法的问题，所以我不得不添加循环约束。

from itertools import combinations, product

def is_unique(comb, shared_values):
    for (a, b), values in shared_values.items():
        if comb[a] in values and comb[b] in values:
            if values.index(comb[a]) > values.index(comb[b]):
                return False
    return True

def non_comm_combs_generator(iterables):

    # Find the values that appear in two of the iterables
    shared_values = {
        (a, b): tuple(set(iterables[a]).intersection(iterables[b]))
        for a, b in combinations(range(len(iterables)), 2)
    }

    # Generate all cartesian products
    all_combs_generator = product(* iterables)

    # Iterate over all but only print the unique ones
    return (comb for comb in all_combs_generator 
            if is_unique(comb, shared_values))

# Example
iterables = [(1, 2, 3), (1, 2), (7, 8, 9)]
output = list(non_comm_combs_generator(iterables))
[print(comb) for comb in output];

# This should be the right answer (see comment on the question)
desired_output = list(set(tuple(sorted(comb)) for comb in product(*iterables)))

# Check they are essentially the same
assert sorted(tuple(sorted(x)) for x in output) == sorted(desired_output)

输出：

(1, 1, 7)
(1, 1, 8)
(1, 1, 9)
(1, 2, 7)
(1, 2, 8)
(1, 2, 9)
(2, 2, 7)
(2, 2, 8)
(2, 2, 9)
(3, 1, 7)
(3, 1, 8)
(3, 1, 9)
(3, 2, 7)
(3, 2, 8)
(3, 2, 9)

虽然它适用于该示例和我尝试过的其他一些示例，但它并非在所有情况下都有效，例如：

iterables = [(1, 2, 3), (1, 4), (2, 4)]

在本例中，它会返回两次。[1, 2, 4]

没有交换元组的笛卡尔积

Cartesian product without commutative tuples

评论

评论

评论