提问人:armamut 提问时间:11/13/2023 最后编辑:armamut 更新时间:11/15/2023 访问量:72
Bigquery,查找组的滚动最新值并查找组的最小值
Bigquery, find rolling latest value of groups and find minimum of group
问:
假设我们在 bigquery 中有一个产品、卖家、价格和库存数量变化数据,如下所示。此数据来自产品列表表的更改 (CDC)。
一个产品可能有多个卖家,不同的卖家对该产品有不同的价格。有时卖家缺货,我们会从最低限度的计算中省略该卖家。因此,当时该产品的最低价格取决于有库存的卖家。
WITH a AS
(
SELECT 1 AS dt, 'p' as product_id, 'A' AS seller, 10 as stock, 100 as price
UNION ALL
SELECT 2 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 120 as price
UNION ALL
SELECT 3 AS dt, 'p' as product_id, 'C' AS seller, 10 as stock, 150 as price
UNION ALL
SELECT 4 AS dt, 'p' as product_id, 'D' AS seller, 10 as stock, 300 as price
UNION ALL
SELECT 5 AS dt, 'p' as product_id, 'E' AS seller, 10 as stock, 400 as price
UNION ALL
SELECT 6 AS dt, 'p' as product_id, 'F' AS seller, 10 as stock, 500 as price
UNION ALL
SELECT 7 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 600 as price
UNION ALL
SELECT 8 AS dt, 'p' as product_id, 'A' AS seller, 0 as stock, 100 as price
UNION ALL
SELECT 9 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 110 as price
UNION ALL
SELECT 10 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 190 as price
UNION ALL
SELECT 11 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 800 as price
UNION ALL
SELECT 12 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 100 as price
)
SELECT *
FROM a
我想每次计算该产品的最低价格和卖家的最低价格,例如:
期望输出:
DT的 | product_id | minimum_price | seller_with_minimum_price |
---|---|---|---|
1 | p | 100 | 一个 |
2 | p | 100 | 一个 |
3 | p | 100 | 一个 |
4 | p | 100 | 一个 |
5 | p | 100 | 一个 |
6 | p | 100 | 一个 |
7 | p | 100 | 一个 |
8 | p | 120 | B |
9 | p | 110 | B |
10 | p | 150 | C |
11 | p | 150 | C |
12 | p | 100 | G |
在时间 1 时,该商品只有一个卖家。所以最低价格是 100,最低价格的卖家是 A。在第 2 时间,出现了第二个卖家,但卖家 A 的最低价格仍然是 100。 直到第 7 次,状态是相同的。
在第 8 时间,卖家 A 缺货,因此卖家 B 的此产品的最低价格为 120。在第 9 时间,卖家 B 降低价格,因此卖家 B 的最低价格为 110。在第 10 时间,B 缺货。在第 11 时间,G 提高了价格,因此没有效果。在第 12 时间,卖家 G 降低了价格。因此,时间 12 的最低价格(对于手头有库存的卖家)是 100,卖家是 G。
简而言之,我想为有库存的卖家找到产品在不同时间的最低价格。
为了计算这个逻辑,我找到了一个包含交叉连接的解决方案,但它需要太长时间和太多资源。我想知道是否有更好的解决方案。我搜索了stackoverflow + google,但找不到一个好的解决方案。
答:
1赞
Mikhail Berlyant
11/15/2023
#1
使用以下方法
WITH previous_prices AS (
SELECT *, ARRAY_AGG(STRUCT(dt, seller, stock, price)) OVER win prev
FROM your_data
WINDOW win AS (PARTITION BY product_id ORDER BY dt)
), last_qualifying_prices AS (
SELECT dt, product_id,
ARRAY(
SELECT AS STRUCT price, seller
FROM t.prev
QUALIFY ROW_NUMBER() OVER win = 1 AND stock > 0
WINDOW win AS (PARTITION BY seller ORDER BY dt DESC)
) candidates
FROM previous_prices t
)
SELECT dt, product_id, candidate.*
FROM last_qualifying_prices,
UNNEST(candidates) candidate
QUALIFY ROW_NUMBER() OVER win = 1
WINDOW win AS (PARTITION BY dt, product_id ORDER BY candidate.price)
ORDER BY dt
如果应用于问题中的示例数据 - 输出为
上一个:大查询 SQL 中的数组拆分
下一个:创建总和为 12 的数字组
评论