Bigquery,查找组的滚动最新值并查找组的最小值

Bigquery, find rolling latest value of groups and find minimum of group

提问人:armamut 提问时间:11/13/2023 最后编辑:armamut 更新时间:11/15/2023 访问量:72

问:

假设我们在 bigquery 中有一个产品、卖家、价格和库存数量变化数据,如下所示。此数据来自产品列表表的更改 (CDC)。

一个产品可能有多个卖家,不同的卖家对该产品有不同的价格。有时卖家缺货,我们会从最低限度的计算中省略该卖家。因此,当时该产品的最低价格取决于有库存的卖家。


WITH a AS
(
  SELECT 1 AS dt, 'p' as product_id, 'A' AS seller, 10 as stock, 100 as price
  UNION ALL
  SELECT 2 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 120 as price
  UNION ALL
  SELECT 3 AS dt, 'p' as product_id, 'C' AS seller, 10 as stock, 150 as price
  UNION ALL
  SELECT 4 AS dt, 'p' as product_id, 'D' AS seller, 10 as stock, 300 as price
  UNION ALL
  SELECT 5 AS dt, 'p' as product_id, 'E' AS seller, 10 as stock, 400 as price
  UNION ALL
  SELECT 6 AS dt, 'p' as product_id, 'F' AS seller, 10 as stock, 500 as price
  UNION ALL
  SELECT 7 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 600 as price
  UNION ALL
  SELECT 8 AS dt, 'p' as product_id, 'A' AS seller, 0 as stock, 100 as price
  UNION ALL
  SELECT 9 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 110 as price
  UNION ALL
  SELECT 10 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 190 as price
  UNION ALL
  SELECT 11 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 800 as price
  UNION ALL
  SELECT 12 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 100 as price
)

SELECT *
FROM a

我想每次计算该产品的最低价格和卖家的最低价格,例如:

期望输出:

DT的 product_id minimum_price seller_with_minimum_price
1 p 100 一个
2 p 100 一个
3 p 100 一个
4 p 100 一个
5 p 100 一个
6 p 100 一个
7 p 100 一个
8 p 120 B
9 p 110 B
10 p 150 C
11 p 150 C
12 p 100 G

在时间 1 时,该商品只有一个卖家。所以最低价格是 100,最低价格的卖家是 A。在第 2 时间,出现了第二个卖家,但卖家 A 的最低价格仍然是 100。 直到第 7 次,状态是相同的。

在第 8 时间,卖家 A 缺货,因此卖家 B 的此产品的最低价格为 120。在第 9 时间,卖家 B 降低价格,因此卖家 B 的最低价格为 110。在第 10 时间,B 缺货。在第 11 时间,G 提高了价格,因此没有效果。在第 12 时间,卖家 G 降低了价格。因此,时间 12 的最低价格(对于手头有库存的卖家)是 100,卖家是 G。

简而言之,我想为有库存的卖家找到产品在不同时间的最低价格。

为了计算这个逻辑,我找到了一个包含交叉连接的解决方案,但它需要太长时间和太多资源。我想知道是否有更好的解决方案。我搜索了stackoverflow + google,但找不到一个好的解决方案。

sql google-bigquery 窗口函数 聚合滚动 计算

评论

0赞 Mikhail Berlyant 11/14/2023
请解释逻辑 - 不清楚!请从 dt=8 开始!
0赞 armamut 11/14/2023
谢谢,你是对的。我试图解释更多
0赞 Mikhail Berlyant 11/15/2023
是的。现在有意义了。看我的答案:o)

答:

1赞 Mikhail Berlyant 11/15/2023 #1

使用以下方法

WITH previous_prices AS (
  SELECT *, ARRAY_AGG(STRUCT(dt, seller, stock, price)) OVER win prev
  FROM your_data  
  WINDOW win AS (PARTITION BY product_id ORDER BY dt)
), last_qualifying_prices AS (
  SELECT dt, product_id, 
    ARRAY(
      SELECT AS STRUCT price, seller
      FROM t.prev 
      QUALIFY ROW_NUMBER() OVER win = 1 AND stock > 0
      WINDOW win AS (PARTITION BY seller ORDER BY dt DESC)
    ) candidates
  FROM previous_prices t
)
SELECT dt, product_id, candidate.*
FROM last_qualifying_prices, 
UNNEST(candidates) candidate
QUALIFY ROW_NUMBER() OVER win = 1
WINDOW win AS (PARTITION BY dt, product_id ORDER BY candidate.price)
ORDER BY dt    

如果应用于问题中的示例数据 - 输出为

enter image description here