Asked by: Paul Samsotha Asked: 11/14/2023 Last edited by: Paul Samsotha Updated: 11/19/2023 Views: 243
Is there a SQL window function to use a calculated value from a previous row?
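Q:
I need to calculate a value for a row based on the calculated value of the previous row. I am looking for a window function (or another solution), if one exists, that can help me do this. For context, we have the following data: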
CREATE OR REPLACE TABLE example_data (
SALES_DATE DATE,
CURR_DATE_UNITS_PURCHASED NUMBER(10,1),
LTD_UNITS_PURCHASED NUMBER(10,1),
CURR_DATE_EXTRAP_NUM_SOLD NUMBER(10,1),
LTD_EXTRAP_NUM_SOLD NUMBER(10,1)
);
INSERT INTO example_data VALUES
('2023-11-01', 1000, 1000, 0, 0),
('2023-11-02', 0, 1000, 0, 0),
('2023-11-03', 0, 1000, 0, 0),
('2023-11-04', 200, 1200, 0, 0),
('2023-11-05', 0, 1200, 0, 0),
('2023-11-06', 0, 1200, 0, 0),
('2023-11-07', 50, 1250, 0, 0),
('2023-11-08', 0, 1250, 0, 0);
SALES_DATE | CURR_DATE_UNITS_PURCHASED | LTD_UNITS_PURCHASED | CURR_DATE_EXTRAP_NUM_SOLD | LTD_EXTRAP_NUM_SOLD |
---|---|---|---|---|
2023-11-01 | 1,000 | 1,000 | 0 | 0 |
2023-11-02 | 0 | 1,000 | 0 | 0 |
2023-11-03 | 0 | 1,000 | 0 | 0 |
2023-11-04 | 200 | 1,200 | 0 | 0 |
2023-11-05 | 0 | 1,200 | 0 | 0 |
2023-11-06 | 0 | 1,200 | 0 | 0 |
2023-11-07 | 50 | 1,250 | 0 | 0 |
2023-11-08 | 0 | 1,250 | 0 | 0 |
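Each day we purchase some units, CURR_DATE_UNITS_PURCHASED. The total number of units purchased to date is LTD_UNITS_PURCHASED.
Note: LTD stands for live-to-date, meaning a running total.
I am doing some extrapolation and need to calculate the CURR_DATE_EXTRAP_NUM_SOLD and LTD_EXTRAP_NUM_SOLD fields (these are the projected units sold, based on the rules below).
- CURR_DATE_EXTRAP_NUM_SOLD: the extrapolated number of units sold on the current day = 10% of the remaining units.
- LTD_EXTRAP_NUM_SOLD: the running total of the extrapolated units.
- Remaining units = LTD_UNITS_PURCHASED - LTD_EXTRAP_NUM_SOLD (of the previous day).
Below is the result I am looking for (note: example_data_expected is data that needs to be created; it is not available. I only created a table so you can see it more easily).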
CREATE OR REPLACE TABLE example_data_expected (
SALES_DATE DATE,
CURR_DATE_UNITS_PURCHASED NUMBER(10,1),
LTD_UNITS_PURCHASED NUMBER(10,1),
CURR_DATE_EXTRAP_NUM_SOLD NUMBER(10,1),
LTD_EXTRAP_NUM_SOLD NUMBER(10,1)
);
INSERT INTO example_data_expected VALUES
('2023-11-01', 1000, 1000, 100, 100), -- 1,000 left
('2023-11-02', 0, 1000, 90, 190), -- 900 left (1,000 - 100)
('2023-11-03', 0, 1000, 81, 271), -- 810 left (1,000 - 190)
('2023-11-04', 200, 1200, 92.9, 363.9), -- 929 left (1,200 - 271)
('2023-11-05', 0, 1200, 83.6, 447.5), -- 836.1 left (1,200 - 363.9)
('2023-11-06', 0, 1200, 75.3, 522.8), -- 752.5 left (1,200 - 447.5)
('2023-11-07', 50, 1250, 72.7, 595.5), -- 727.2 left (1,250 - 522.8)
('2023-11-08', 0, 1250, 65.5, 661.0); -- 654.5 left (1,250 - 595.5)
SALES_DATE | CURR_DATE_UNITS_PURCHASED | LTD_UNITS_PURCHASED | CURR_DATE_EXTRAP_NUM_SOLD | LTD_EXTRAP_NUM_SOLD |
---|---|---|---|---|
2023-11-01 | 1,000 | 1,000 | 100 | 100 |
2023-11-02 | 0 | 1,000 | 90 | 190 |
2023-11-03 | 0 | 1,000 | 81 | 271 |
2023-11-04 | 200 | 1,200 | 92.9 | 363.9 |
2023-11-05 | 0 | 1,200 | 83.6 | 447.5 |
2023-11-06 | 0 | 1,200 | 75.3 | 552.8 |
2023-11-07 | 50 | 1,250 | 72.7 | 595.5 |
2023-11-08 | 0 | 1,250 | 65.5 | 661.0 |
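You can see that on the first day the remaining units are whatever was purchased that day, 1,000, so 10% of that is 100, and the running extrapolated live-to-date (LTD) total is 100.
On the second day no units were purchased, so the remaining units are 1,000 - 100 (the previous day's LTD_EXTRAP) = 900. The extrapolated units for that day are therefore 900 * 0.1 = 90. That day's LTD_EXTRAP is 100 + 90 = 190.
It continues like this.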
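To make the rule concrete outside of SQL, here is a minimal Python sketch of the row-by-row calculation described above. The function name extrapolate and the RATE constant are only illustrative names, not anything from Databricks; the input values are the LTD_UNITS_PURCHASED column from example_data.

```python
from decimal import Decimal

# (SALES_DATE, LTD_UNITS_PURCHASED) copied from example_data
rows = [
    ("2023-11-01", 1000),
    ("2023-11-02", 1000),
    ("2023-11-03", 1000),
    ("2023-11-04", 1200),
    ("2023-11-05", 1200),
    ("2023-11-06", 1200),
    ("2023-11-07", 1250),
    ("2023-11-08", 1250),
]

RATE = Decimal("0.1")  # illustrative name: 10% of the remaining units per day


def extrapolate(rows, rate=RATE):
    """Apply the rule one row at a time, carrying the running total forward."""
    out = []
    ltd_extrap = Decimal(0)  # LTD_EXTRAP_NUM_SOLD of the previous day
    for sales_date, ltd_purchased in rows:
        remaining = Decimal(ltd_purchased) - ltd_extrap
        curr_extrap = remaining * rate      # CURR_DATE_EXTRAP_NUM_SOLD
        ltd_extrap = ltd_extrap + curr_extrap  # LTD_EXTRAP_NUM_SOLD
        out.append((sales_date, curr_extrap, ltd_extrap))
    return out


result = extrapolate(rows)
for sales_date, curr_extrap, ltd_extrap in result:
    print(sales_date, curr_extrap, ltd_extrap)
```

The sketch does not round, so later rows carry more decimals than the one-decimal figures in example_data_expected (for example 83.61 rather than 83.6 on 2023-11-05).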
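I cannot figure out any SQL to extrapolate these two fields (CURR_DATE_EXTRAP_NUM_SOLD and LTD_EXTRAP_NUM_SOLD). I tried using window functions such as LAG in a SELECT, but that requires the previous number to be static (I think), or I am missing something. Or is there some way to use UPDATE to update these fields?
Any help would be greatly appreciated.
Note: I am using Databricks SQL, which supports all of ANSI SQL but also has its own set of available functions. So a solution using Databricks SQL would also work.
Update
FYI, this is what I tried with LAG, but it does not seem to work because it uses the current EXTRAP_LTD (0.00).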
SELECT
sales_date,
curr_date_units_purchased,
ltd_units_purchased,
(ltd_units_purchased - LAG(ltd_extrap_num_sold, 1, 0) OVER (ORDER BY sales_date)) AS remaining_units,
(remaining_units * 0.10) as curr_date_extrap_num_sold,
SUM(curr_date_extrap_num_sold) OVER (ORDER BY sales_date) AS ltd_extrap_num_sold
FROM example_data;
This is the result:
SALES_DATE | CURR_DATE_UNITS_PURCHASED | LTD_UNITS_PURCHASED | REMAINING_UNITS | CURR_DATE_EXTRAP_NUM_SOLD | LTD_EXTRAP_NUM_SOLD |
---|---|---|---|---|---|
2023-11-01 | 1,000 | 1,000 | 1,000 | 100 | 0 |
2023-11-02 | 0 | 1,000 | 1,000 | 100 | 0 |
2023-11-03 | 0 | 1,000 | 1,000 | 100 | 0 |
2023-11-04 | 200 | 1,200 | 1,200 | 120 | 0 |
2023-11-05 | 0 | 1,200 | 1,200 | 120 | 0 |
2023-11-06 | 0 | 1,200 | 1,200 | 120 | 0 |
2023-11-07 | 50 | 1,250 | 1,250 | 125 | 0 |
2023-11-08 | 0 | 1,250 | 1,250 | 125 | 0 |
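A small Python sketch may help show why the attempt yields these numbers. It assumes, as the result table suggests, that LAG reads the stored LTD_EXTRAP_NUM_SOLD column (all zeros) rather than the value being computed in the same query. The helper lag is a hypothetical stand-in for LAG(col, 1, 0), not a real library call.

```python
# LTD_UNITS_PURCHASED from example_data, and the stored (all-zero) extrapolated column
ltd_units_purchased = [1000, 1000, 1000, 1200, 1200, 1200, 1250, 1250]
stored_ltd_extrap = [0] * len(ltd_units_purchased)


def lag(values, default=0):
    """Hypothetical stand-in for LAG(col, 1, default): previous row's STORED value."""
    return [default] + values[:-1]


# remaining_units = ltd_units_purchased - LAG(ltd_extrap_num_sold, 1, 0)
remaining_units = [
    purchased - prev
    for purchased, prev in zip(ltd_units_purchased, lag(stored_ltd_extrap))
]

# 10% of the remaining units (division by 10 keeps the floats exact here)
curr_date_extrap_num_sold = [r / 10 for r in remaining_units]

print(remaining_units)
print(curr_date_extrap_num_sold)
```

Because every lagged value is 0, the remaining units never shrink, which matches the REMAINING_UNITS and CURR_DATE_EXTRAP_NUM_SOLD columns in the table above. A value that depends on its own previous result needs either recursion or a closed-form expression.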
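A:
Here is a solution using a recursive CTE in SQL Server. However, I am not sure about your value in row 6. Is the value 552.8 correct? If so, could you explain it? See the approach below. You can adapt the code to run against other databases.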
with base_cte as
(
select min(SALES_DATE) as minDate , max(SALES_DATE) as maxdate from example_data
) ,
recursive_cte as
(
select
SALES_DATE ,
CURR_DATE_UNITS_PURCHASED ,
LTD_UNITS_PURCHASED ,
cast(CURR_DATE_UNITS_PURCHASED as decimal(10,2)) as PREV_DAY_LTD_EXTRAP ,
cast(CURR_DATE_UNITS_PURCHASED*0.1 as decimal(10,1)) as
CURR_DATE_EXTRAP_NUM_SOLD
from example_data where SALES_DATE = (select minDate from base_cte)
union all
select
b.SALES_DATE ,
b.CURR_DATE_UNITS_PURCHASED ,
b.LTD_UNITS_PURCHASED ,
cast(a.PREV_DAY_LTD_EXTRAP+b.CURR_DATE_UNITS_PURCHASED-a.CURR_DATE_EXTRAP_NUM_SOLD as decimal(10,2)) as PREV_DAY_LTD_EXTRAP ,
cast((a.PREV_DAY_LTD_EXTRAP+b.CURR_DATE_UNITS_PURCHASED-a.CURR_DATE_EXTRAP_NUM_SOLD)*0.1 as decimal(10,1)) as CURR_DATE_EXTRAP_NUM_SOLD
from recursive_cte as a inner join example_data as b on dateadd(day,1,a.SALES_DATE) = b.SALES_DATE
where b.SALES_DATE <= (select maxdate from base_cte)
)
select SALES_DATE , CURR_DATE_UNITS_PURCHASED , LTD_UNITS_PURCHASED , CURR_DATE_EXTRAP_NUM_SOLD ,
sum(CURR_DATE_EXTRAP_NUM_SOLD) over(order by SALES_DATE) as LTD_EXTRAP_NUM_SOLD from recursive_cte;
Here is another solution.
with params as(select cast(0.9 as decimal(38,10)) x)
,cte1 as(
select t.*
,row_number()over(order by sales_date) k
,sum(CURR_DATE_UNITS_PURCHASED)over(order by sales_date) roll_sum_purchased
,params.x
from example_data_expected t
cross apply params
)
,cte2 as(
select *
,roll_sum_purchased
- (sum(CURR_DATE_UNITS_PURCHASED*power(x,1-k))over(order by sales_date))
*power(x,k) extrap_sum
from cte1
)
select SALES_DATE,k,CURR_DATE_UNITS_PURCHASED,roll_sum_purchased
,extrap_sum-lag(extrap_sum,1,0)over(order by SALES_DATE) curr_day_extrap
,CURR_DATE_EXTRAP_NUM_SOLD
,extrap_sum
,LTD_EXTRAP_NUM_SOLD
,LTD_EXTRAP_NUM_SOLD - extrap_sum check_extrap_sum
from cte2
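The parameter x=0.9 can be passed in any other way, or written as a constant anywhere in the query.
The way power(x,k) is computed could also be improved.
I added some data to the test table: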
INSERT INTO example_data_expected VALUES
('2023-11-09', 0, 1250, 58.906521, 719.841311)
,('2023-11-10', 0, 1250, 53.0158689, 772.8571799)
,('2023-11-11', 0, 1250, 47.71428201, 820.5714619)
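I recommend checking these values separately. I compared them against @Saikat's query; the differences are acceptable.
Draft of the query output with your data and the additional rows: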
SALES_DATE | k | roll_sum_purchased | curr_day_extrap | CURR_DATE_EXTRAP_NUM_SOLD | extrap_sum | LTD_EXTRAP_NUM_SOLD | check_extrap_sum | CURR_DATE_UNITS_PURCHASED |
---|---|---|---|---|---|---|---|---|
2023-11-01 | 1 | 1000.0000000000 | 100.000000 | 100.0000000000 | 100.000000 | 100.0000000000 | 0.000000 | 1000.0000000000 |
2023-11-02 | 2 | 1000.0000000000 | 90.000000 | 90.0000000000 | 190.000000 | 190.0000000000 | 0.000000 | 0.0000000000 |
2023-11-03 | 3 | 1000.0000000000 | 81.000000 | 81.0000000000 | 271.000000 | 271.0000000000 | 0.000000 | 0.0000000000 |
2023-11-04 | 4 | 1200.0000000000 | 92.900000 | 92.9000000000 | 363.900000 | 363.9000000000 | 0.000000 | 200.0000000000 |
2023-11-05 | 5 | 1200.0000000000 | 83.610000 | 83.6000000000 | 447.510000 | 447.5000000000 | -0.010000 | 0.0000000000 |
2023-11-06 | 6 | 1200.0000000000 | 75.249000 | 75.3000000000 | 522.759000 | 522.8000000000 | 0.041000 | 0.0000000000 |
2023-11-07 | 7 | 1250.0000000000 | 72.724100 | 72.7000000000 | 595.483100 | 595.5000000000 | 0.016900 | 50.0000000000 |
2023-11-08 | 8 | 1250.0000000000 | 65.451690 | 65.5000000000 | 660.934790 | 661.0000000000 | 0.065210 | 0.0000000000 |
2023-11-09 | 9 | 1250.0000000000 | 58.906521 | 58.9065210000 | 719.841311 | 719.8413110000 | 0.000000 | 0.0000000000 |
2023-11-10 | 10 | 1250.0000000000 | 53.015869 | 53.0158689000 | 772.857180 | 772.8571799000 | 0.000000 | 0.0000000000 |
2023-11-11 | 11 | 1250.0000000000 | 47.714282 | 47.7142820100 | 820.571462 | 820.5714619000 | 0.000000 | 0.0000000000 |
Upd1.
About using power().
1000 units are purchased; the extrapolated sale is
(extrap)=1000*0.1=100 and what remains =1000*(1-0.1)=1000*0.9=1000*power(0.9,1)=900.
Next day
extrap=900*0.1=90 and what remains =900*(1-0.1)=900*0.9=1000*0.9*0.9=1000*power(0.9,2)=1000*0.81=810
For day k the remainder is
1000*power(0.9,k)
tod -> a1,a2,a3 - current day purchased values (CURR_DATE_UNITS_PURCHASED); here a1=1000, a2=200, a3=50
a1k=1, a2k=4, a3k=7 - the day numbers on which a1, a2, a3 were purchased
rest - what remains of the purchased units
LTD_EXTRAP_NUM_SOLD - rolling sum = (rolling sum purchased) - rest
xk=power(x,k), where k - day number
rest from a1 after the first day -> a1*power(x,1) -> 1000*power(0.9,1) -> 1000*0.9=900
extrap for the first day for a1 = 1000.0-900.0=100
rest from a1 for day num=k = a1*power(x,k-a1k+1) -> 1000*power(0.9,k)
rest from a2 for day num=k = a2*power(x,k-a2k+1) -> 200*power(0.9,k-3)
rest from a3 for day num=k = a3*power(x,k-a3k+1) -> 50*power(0.9,k-6)
rest from the sum (a1+a2) for day number k=4 is
a1*power(x,k)+a2*power(x,1)
->a1*power(x,k)+a2*power(x,1-k)*power(x,k)
->(a1+a2*power(x,1-k))*power(x,k)
->(a1*power(x,1-a1k)+a2*power(x,1-a2k))*power(x,k)
Note that for a1 with k=1: a1*power(x,1-k)*power(x,k)
->a1*power(x,0)*power(x,1)->a1*1.0*0.9
For the row with k=7:
(a1*power(x,1-a1k)+a2*power(x,1-a2k)+a3*power(x,1-a3k))*power(x,k)
So: rest = sum(tod*power(x,1-todK))over(order by ...)*power(x,k)
SALES_DATE | k | xk | tod | rest from a1 | rest from a2 | rest from a3 | rest_sum |
---|---|---|---|---|---|---|---|
2023-11-01 | 1 | 0.9 | a1=1000 | a1*xk=900 | 0 | 0 | 900 |
2023-11-02 | 2 | 0.81 | 0 | 810 | 0 | 0 | 810 |
2023-11-03 | 3 | 0.729 | 0 | 729 | 0 | 0 | 729 |
2023-11-04 | 4 | 0.6561 | a2=200 | 656.1 | 180 | 0 | 836.1 |
2023-11-05 | 5 | 0.59049 | 0 | 590.49 | 162 | 0 | 752.49 |
2023-11-06 | 6 | 0.531441 | 0 | 531.441 | 145.8 | 0 | 677.241 |
2023-11-07 | 7 | 0.4782969 | a3=50 | 478.2969 | 131.22 | 45 | 654.5169 |
2023-11-08 | 8 | 0.43046721 | 0 | 430.46721 | 118.098 | 40.5 | 589.06521 |
2023-11-09 | 9 | 0.387420489 | 0 | 387.420489 | 106.2882 | 36.45 | 530.158689 |
2023-11-10 | 10 | 0.34867844 | 0 | 348.6784401 | 95.65938 | 32.805 | 477.14282 |
2023-11-11 | 11 | 0.313810596 | 0 | 313.8105961 | 86.093442 | 29.5245 | 429.428538 |
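As a sanity check on the power() approach, here is a small Python sketch comparing the day-by-day rule with the closed form the query uses, LTD_EXTRAP = (rolling sum purchased) - sum(tod*power(x,1-k))*power(x,k). The function names are only illustrative, and Fraction is used so the comparison is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

# CURR_DATE_UNITS_PURCHASED per day (k = 1..8), from example_data
purchased = [1000, 0, 0, 200, 0, 0, 50, 0]
x = Fraction(9, 10)  # share of the remaining units that is left after each day


def ltd_extrap_step_by_step(purchased, x):
    """Day-by-day rule: sell (1 - x) of whatever remains each day."""
    out, total, sold = [], 0, Fraction(0)
    for a in purchased:
        total += a
        sold += (total - sold) * (1 - x)
        out.append(sold)
    return out


def ltd_extrap_closed_form(purchased, x):
    """Closed form: rolling purchased minus sum(a * x^(1-k)) * x^k."""
    out, total, weighted = [], 0, Fraction(0)
    for k, a in enumerate(purchased, start=1):
        total += a                    # rolling sum purchased
        weighted += a * x ** (1 - k)  # running sum of tod*power(x,1-k)
        rest = weighted * x ** k      # units still unsold after day k
        out.append(total - rest)
    return out


step = ltd_extrap_step_by_step(purchased, x)
closed = ltd_extrap_closed_form(purchased, x)
print([float(v) for v in closed])
```

Both functions return the same values for this data, and the rest term for day 8 is 589.06521, matching rest_sum in the table above.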
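Note that to use this method you need to examine how the length of the row sequence affects the calculation. As the value of k increases, the part of the multiplier's mantissa that is actually used shrinks. For example, power(0.9,80)=0.000218474500528393.
This can reduce the accuracy of the calculation (if accuracy is required).
For example, with 1000 units purchased, 0.218474500528393 units remain after 80 days, and the extrapolated sale is 0.0218474500528393 units.
It seems you do not need to use all the quantities, that is, all previous rows. You can replace sum(...)over (order by date of sale) with sum(...)over(order by sales_date rows ... nearest 100). In that case the same change must be made to the calculation.
Next variant. The accuracy of the calculation should not be a problem here.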
with cte0 as(
select t.*
,row_number()over(order by sales_date) k
,cast(0.9 as decimal(38,20)) x
from example_data_expected t
)
,cte1 as(
select t.*
,cast(CURR_DATE_UNITS_PURCHASED*power(x,1-k) as decimal(38,20)) dayP1_K
,cast(k*log(x) as decimal(38,20)) kxL
from cte0 t
)
,cte2 as(
select *
,sum(CURR_DATE_UNITS_PURCHASED)over(order by sales_date) -- roll_sum_purchased
-cast(exp(log((sum(dayP1_K)over(order by sales_date)))+kxL) as decimal(38,20)) extrap_sum
from cte1
)
,cte3 as(
select *
,extrap_sum-lag(extrap_sum,1,0)over(order by SALES_DATE) curr_day_extrap
from cte2
)
-- final projection assumed here; the original post ends after cte3
select * from cte3
The following equalities are used there:
x=exp(log(x))
log(x*y)=log(x)+log(y)
log(power(x,k))=k*log(x)
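A quick numeric check of these identities in Python, as a sketch only: it computes the rest term for day k=8 both directly and via exp(log(sum) + k*log(x)), mirroring the shape of the expression in cte2. The variable names are illustrative, and it assumes the running sum is positive (log is undefined otherwise), which holds here because the first day has a purchase.

```python
import math

purchased = [1000, 0, 0, 200, 0, 0, 50, 0]  # CURR_DATE_UNITS_PURCHASED, k = 1..8
x = 0.9
k = 8

# Running sum of tod*power(x,1-j) up to day k (dayP1_K summed in the query)
weighted_sum = sum(a * x ** (1 - j) for j, a in enumerate(purchased, start=1))

# Direct form: sum * power(x,k)
rest_direct = weighted_sum * x ** k

# Log form: exp(log(sum) + k*log(x)), using log(x*y)=log(x)+log(y)
rest_via_logs = math.exp(math.log(weighted_sum) + k * math.log(x))

print(rest_direct, rest_via_logs)
```

Both forms agree to floating-point precision for this data. The log form avoids multiplying a very large power(x,1-k) by a very small power(x,k) directly, which is the accuracy concern raised earlier.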