BigQuery 比较两个列表

BigQuery compare two lists

提问人:user19212632 提问时间:7/4/2023 更新时间:7/5/2023 访问量:72

问:

我想使用 BigQuery 比较两个字符串列表。

声明列表数组 ; DECLARE list2 数组 ;

SET list1 = ['橙色','红色','黑色' ]; SET list2 = ['红色','黑色' ];

因此,我想得到“橙色”,因为它是 list2 中缺少的值。

谁能帮我?

我已经尝试了这个解决方案,但需要太长时间。我的列表实际上包含数百个值,所以我想要比在每个值前面添加“作为值并集全部选择”更有效的东西。

WITH list1 AS (
  SELECT 'orange' AS value UNION ALL
  SELECT 'red' AS value UNION ALL
  SELECT 'black' AS value
)
,
list2 AS (
  SELECT 'red' AS value UNION ALL
  SELECT 'black' AS value 
)

SELECT value
FROM (
  SELECT value, 'list1' AS list_name
  FROM list1
  UNION ALL
  SELECT value, 'list2' AS list_name
  FROM list2
)
GROUP BY value
HAVING COUNT(*) = 1
ORDER BY value
列出 Google-BigQuery 的比较

评论


答:

1赞 Mikhail Berlyant 7/4/2023 #1

请考虑以下内容 (BigQuery Standard SQL)

DECLARE list1, list2 ARRAY<string>;
SET list1 = ['orange','red','black' ]; SET list2 = ['red','black' ];
SELECT * FROM (
  SELECT DISTINCT el FROM UNNEST(list1) el
  UNION ALL
  SELECT DISTINCT el FROM UNNEST(list2) el
)
GROUP BY el
HAVING COUNT(*) = 1;    

带输出

enter image description here

还可以考虑以下方法

DECLARE list1, list2 ARRAY<string>;
SET list1 = ['orange','red','black', 'green' ]; SET list2 = ['red','black', 'pink' ];
SELECT list, ARRAY_AGG(el) el FROM (
  SELECT el, MIN(list) list FROM (
    SELECT DISTINCT el, 'list1' list FROM UNNEST(list1) el
    UNION ALL
    SELECT DISTINCT el, 'list2' FROM UNNEST(list2) el
  )
  GROUP BY el
  HAVING COUNT(*) = 1
)
GROUP BY list

带输出

enter image description here

0赞 Oleg Solovyev 7/5/2023 #2

上一个答案的一点变化,不确定它会比前面的代码更快,因为您的列表包含 100+ 个值。

DECLARE list1, list2 ARRAY<string>;
SET list1 = ['orange','red','black' ]; SET list2 = ['red','black' ];
SELECT *
FROM (
  SELECT *
  FROM UNNEST(list1) as list1
)
WHERE list1 not in (
  SELECT * FROM UNNEST(list2)
)