SQL通过AVG在每个组中填充缺失值

SQL to fill missing value by AVG in each Group

提问人:PyBoss 提问时间:6/12/2023 最后编辑:PyBoss 更新时间:6/12/2023 访问量:115

问:

我想用下表的平均值填充缺失的日期和价格

enter image description here

示例数据

if OBJECT_ID('tempdb..#mytable') is not null DROP TABLE #mytable
CREATE TABLE #mytable (CreateDate datetime, Product Varchar(4), Price money)
INSERT INTO #mytable VALUES
('2023-06-09 17:01:00.000','Tree',1),
('2023-06-09 17:02:00.000','Tree',2),
('2023-06-09 17:03:00.000','Tree',3),
('2023-06-09 17:04:00.000','Tree',4),
('2023-06-09 17:01:00.000','BB',20),
('2023-06-09 17:02:00.000','BB',40),
('2023-06-09 17:04:00.000','BB',60),
('2023-06-09 17:01:00.000','Car',20),
('2023-06-09 17:03:00.000','Car',30),
('2023-06-09 17:04:00.000','Car',50)

第 1 步是创建一个完整的日期时间集

if OBJECT_ID('tempdb..#DateRange') is not null DROP TABLE #DateRange
Create Table #DateRange(CreateDate datetime Primary key Clustered)
GO

Declare @startdate datetime = '2023-06-09 17:01:00', @endtime datetime = '2023-06-09 17:04:00'
While (@startdate <= @endtime)
BEGIN
Insert into #DateRange values (@startdate)
set @startdate = DATEADD(MINUTE, 1, @startdate)
END

第 2 步用 Only ONE upperOnly ONE lower number 的平均值填充 NULL,而不是所有行的平均值

SELECT d.CreateDate,
COALESCE(Price, AVG(Price) OVER ()) as New_Price,
m.*
From #DateRange d
LEFT OUTER JOIN #mytable m on d.CreateDate = m.CreateDate
and Product = 'BB'

我的问题是,如何遍历每个产品组,这样我就不需要显式设置 product = BB?

我的预期结果是

enter image description here

sql sql-server while-loop

评论


答:

1赞 abolfazl sadeghi 6/12/2023 #1

您可以使用 CTE。 首先创建列表日期并创建列表唯一产品

然后将列表Date_uniqueProduct与表mytable联接,以查找与日期的差距

Declare @startdate datetime = (select  min(CreateDate) from #mytable)
, @endtime datetime =(select  max(CreateDate) from #mytable)


;with Listunique as (
select   distinct Product
from #mytable
),

_List (date_,Product) as ( 
        select @startdate as date_,Product
        from (select distinct Product from  Listunique)a
        union all
        select  DATEADD(minute,1, date_) as date_,Product
        from _List
        where date_ <@endtime

)
select a.*,f.price
from _List a

full join #mytable b on a.date_=b.CreateDate and a.Product=b.Product
outer apply (
                select sum(ISNULL(p,0) +ISNULL(n,0)) /2 as price
                from (
                        select  Price as p,0 as n from #mytable l 
                        where l.Product=a.Product and l.CreateDate=DATEADD(minute,-1, a.date_)
                        union 
                        select 0 as p,Price as n from #mytable l 
                        where l.Product=a.Product and l.CreateDate=DATEADD(minute,1, a.date_)
                )d
            )f
where b.CreateDate is null

演示

评论

0赞 PyBoss 6/12/2023
@abolfazi您错过了只有一个上限和只有一个下限的平均价格
0赞 abolfazl sadeghi 6/12/2023
@PyBoss,我编辑了我的代码并添加了 Price
1赞 SelVazi 6/12/2023 #2

用于创建日期范围,将此日期列表联接到表中以识别缺失值,然后仅使用每个组的第一条和最后一条记录的平均值( 和 )填充缺失值:GENERATE_SERIESFIRST_VALUELAST_VALUE

Declare @startdate datetime = (select min(CreateDate) from #mytable)
,@endtime datetime =(select max(CreateDate) from #mytable);

with daterange as (
  SELECT DATEADD(minute,value,@startdate) as CreateDate, Product
  FROM GENERATE_SERIES(datepart(minute, @startdate) - 1, 
                     datepart(minute, @endtime) - 1,1)
  CROSS JOIN (SELECT DISTINCT Product FROM #mytable) p
),
cte as (
  SELECT d.*, t.Price
  FROM daterange d
  LEFT JOIN #mytable t on d.CreateDate = t.CreateDate AND d.Product = t.Product
),
avg_cte as (
  select *,(COALESCE(first_value(price) ignore nulls over (partition by product order by CreateDate range between current row and unbounded following), 0) +
           COALESCE(last_value(price) ignore nulls over (partition by product order by CreateDate range between unbounded preceding and current row), 0)) /2.0 as avg_price
  from cte
)
select CreateDate, Product, avg_price
from avg_cte
where price is null

结果:

CreateDate              Product Price
2023-06-09 17:03:00.000 BB      50.0000
2023-06-09 17:02:00.000 Car     25.0000

演示在这里

评论

0赞 PyBoss 6/12/2023
数学是错误的,用上限和下限的平均值填写 NULL,你取所有行的平均值
0赞 SelVazi 6/12/2023
我已经编辑了我的答案,你能看看吗
2赞 ahmed 6/12/2023 #3

请尝试以下操作(如果使用的是 SQL Server 2022):

declare @start_time datetime = '2023-06-09 17:01:00';
declare @end_time datetime = '2023-06-09 17:04:00';

with Calendar as 
(-- step1: build a calender table based on your start and end time.
  select top(datediff(minute, @start_time, @end_time) +1)
         dateadd(minute, row_number() over (order by number) -1, @start_time) dt
  from master..spt_values
)
select all_dts.dt as CreateDate,
       p.product,
       -- use first_value and last_value functions with the (ignore nulls) option to get the next and previous not null price value.
       isnull(tbl.price, 
              (isnull(first_value(tbl.price) ignore nulls over (partition by p.product order by all_dts.dt range between current row and unbounded following), 0) +
              isnull(last_value(tbl.price) ignore nulls over (partition by p.product order by all_dts.dt range between unbounded preceding and current row), 0)) /2.0
            ) as price 
from Calendar all_dts
-- step2: do a cross join between the generated dates in step1 and the distinct values of products to get all possisble combinations of (dates and products)
cross join (select distinct product from #mytable) p
-- step3: do a left join with the table to get the missing dates
left join #mytable tbl
on all_dts.dt = tbl.CreateDate and
   p.product = tbl.product
order by p.product, all_dts.dt

演示

对于以前版本的 SQL Server,我们可以实现一个子查询,而不是 and 函数,其中包含:first_valuelast_valueignore null

declare @start_time datetime = '2023-06-09 17:01:00';
declare @end_time datetime = '2023-06-09 17:04:00';

with Calendar as 
(-- step1: build a calender table based on your start and end time.
  select top(datediff(minute, @start_time, @end_time) +1)
         dateadd(minute, row_number() over (order by number) -1, @start_time) dt
  from master..spt_values
)
select all_dts.dt as CreateDate,
       p.product,
       -- Use subquery to get the next and previous not null price value.
       isnull(tbl.price, 
              (isnull((select top 1 price from #mytable mtbl where mtbl.product = p.product and mtbl.CreateDate > all_dts.dt order by CreateDate), 0) +
               isnull((select top 1 price from #mytable mtbl where mtbl.product = p.product and mtbl.CreateDate < all_dts.dt order by CreateDate desc), 0)) /2.0
            ) as price 
from Calendar all_dts
-- step2: do a cross join between the generated dates in step1 and the distinct values of products to get all possisble combinations of (dates and products)
cross join (select distinct product from #mytable) p
-- step3: do a left join with the table to get the missing dates
left join #mytable tbl
on all_dts.dt = tbl.CreateDate and
   p.product = tbl.product
order by p.product, all_dts.dt

演示

评论

0赞 ahmed 6/12/2023
如果只想要缺少的行,则可以将查询包装在另一个 CTE 中并仅选择行(在演示中进行了说明)where tbl.product is null
0赞 PyBoss 6/12/2023
我应该替换什么师父..spt_values
0赞 ahmed 6/12/2023
您仍然可以使用 .您可以通过多种方式构建自己的数字表,它可能只是一个包含一列和特定数量的行的表。master..spt_values
0赞 ahmed 6/12/2023
例如,您可以尝试这种方式 dbfiddle.uk/9dCjdaYt 并以这种方式使用它 dbfiddle.uk/eZhRXkgq
0赞 PyBoss 6/12/2023
对于 AVG 价格,我想计算 Only One upper 和 Only ONE lower 数字的平均值,而不是整行 AVG