提问人:PyBoss 提问时间:6/12/2023 最后编辑:PyBoss 更新时间:6/12/2023 访问量:115
SQL通过AVG在每个组中填充缺失值
SQL to fill missing value by AVG in each Group
问:
我想用下表的平均值填充缺失的日期和价格
示例数据
if OBJECT_ID('tempdb..#mytable') is not null DROP TABLE #mytable
CREATE TABLE #mytable (CreateDate datetime, Product Varchar(4), Price money)
INSERT INTO #mytable VALUES
('2023-06-09 17:01:00.000','Tree',1),
('2023-06-09 17:02:00.000','Tree',2),
('2023-06-09 17:03:00.000','Tree',3),
('2023-06-09 17:04:00.000','Tree',4),
('2023-06-09 17:01:00.000','BB',20),
('2023-06-09 17:02:00.000','BB',40),
('2023-06-09 17:04:00.000','BB',60),
('2023-06-09 17:01:00.000','Car',20),
('2023-06-09 17:03:00.000','Car',30),
('2023-06-09 17:04:00.000','Car',50)
第 1 步是创建一个完整的日期时间集
if OBJECT_ID('tempdb..#DateRange') is not null DROP TABLE #DateRange
Create Table #DateRange(CreateDate datetime Primary key Clustered)
GO
Declare @startdate datetime = '2023-06-09 17:01:00', @endtime datetime = '2023-06-09 17:04:00'
While (@startdate <= @endtime)
BEGIN
Insert into #DateRange values (@startdate)
set @startdate = DATEADD(MINUTE, 1, @startdate)
END
第 2 步用 Only ONE upper 和 Only ONE lower number 的平均值填充 NULL,而不是所有行的平均值
SELECT d.CreateDate,
COALESCE(Price, AVG(Price) OVER ()) as New_Price,
m.*
From #DateRange d
LEFT OUTER JOIN #mytable m on d.CreateDate = m.CreateDate
and Product = 'BB'
我的问题是,如何遍历每个产品组,这样我就不需要显式设置 product = BB?
我的预期结果是
答:
1赞
abolfazl sadeghi
6/12/2023
#1
您可以使用 CTE。 首先创建列表日期并创建列表唯一产品
然后将列表Date_uniqueProduct与表mytable联接,以查找与日期的差距
Declare @startdate datetime = (select min(CreateDate) from #mytable)
, @endtime datetime =(select max(CreateDate) from #mytable)
;with Listunique as (
select distinct Product
from #mytable
),
_List (date_,Product) as (
select @startdate as date_,Product
from (select distinct Product from Listunique)a
union all
select DATEADD(minute,1, date_) as date_,Product
from _List
where date_ <@endtime
)
select a.*,f.price
from _List a
full join #mytable b on a.date_=b.CreateDate and a.Product=b.Product
outer apply (
select sum(ISNULL(p,0) +ISNULL(n,0)) /2 as price
from (
select Price as p,0 as n from #mytable l
where l.Product=a.Product and l.CreateDate=DATEADD(minute,-1, a.date_)
union
select 0 as p,Price as n from #mytable l
where l.Product=a.Product and l.CreateDate=DATEADD(minute,1, a.date_)
)d
)f
where b.CreateDate is null
评论
0赞
PyBoss
6/12/2023
@abolfazi您错过了只有一个上限和只有一个下限的平均价格
0赞
abolfazl sadeghi
6/12/2023
@PyBoss,我编辑了我的代码并添加了 Price
1赞
SelVazi
6/12/2023
#2
用于创建日期范围,将此日期列表联接到表中以识别缺失值,然后仅使用每个组的第一条和最后一条记录的平均值( 和 )填充缺失值:GENERATE_SERIES
FIRST_VALUE
LAST_VALUE
Declare @startdate datetime = (select min(CreateDate) from #mytable)
,@endtime datetime =(select max(CreateDate) from #mytable);
with daterange as (
SELECT DATEADD(minute,value,@startdate) as CreateDate, Product
FROM GENERATE_SERIES(datepart(minute, @startdate) - 1,
datepart(minute, @endtime) - 1,1)
CROSS JOIN (SELECT DISTINCT Product FROM #mytable) p
),
cte as (
SELECT d.*, t.Price
FROM daterange d
LEFT JOIN #mytable t on d.CreateDate = t.CreateDate AND d.Product = t.Product
),
avg_cte as (
select *,(COALESCE(first_value(price) ignore nulls over (partition by product order by CreateDate range between current row and unbounded following), 0) +
COALESCE(last_value(price) ignore nulls over (partition by product order by CreateDate range between unbounded preceding and current row), 0)) /2.0 as avg_price
from cte
)
select CreateDate, Product, avg_price
from avg_cte
where price is null
结果:
CreateDate Product Price
2023-06-09 17:03:00.000 BB 50.0000
2023-06-09 17:02:00.000 Car 25.0000
评论
0赞
PyBoss
6/12/2023
数学是错误的,用上限和下限的平均值填写 NULL,你取所有行的平均值
0赞
SelVazi
6/12/2023
我已经编辑了我的答案,你能看看吗
2赞
ahmed
6/12/2023
#3
请尝试以下操作(如果使用的是 SQL Server 2022):
declare @start_time datetime = '2023-06-09 17:01:00';
declare @end_time datetime = '2023-06-09 17:04:00';
with Calendar as
(-- step1: build a calender table based on your start and end time.
select top(datediff(minute, @start_time, @end_time) +1)
dateadd(minute, row_number() over (order by number) -1, @start_time) dt
from master..spt_values
)
select all_dts.dt as CreateDate,
p.product,
-- use first_value and last_value functions with the (ignore nulls) option to get the next and previous not null price value.
isnull(tbl.price,
(isnull(first_value(tbl.price) ignore nulls over (partition by p.product order by all_dts.dt range between current row and unbounded following), 0) +
isnull(last_value(tbl.price) ignore nulls over (partition by p.product order by all_dts.dt range between unbounded preceding and current row), 0)) /2.0
) as price
from Calendar all_dts
-- step2: do a cross join between the generated dates in step1 and the distinct values of products to get all possisble combinations of (dates and products)
cross join (select distinct product from #mytable) p
-- step3: do a left join with the table to get the missing dates
left join #mytable tbl
on all_dts.dt = tbl.CreateDate and
p.product = tbl.product
order by p.product, all_dts.dt
对于以前版本的 SQL Server,我们可以实现一个子查询,而不是 and 函数,其中包含:first_value
last_value
ignore null
declare @start_time datetime = '2023-06-09 17:01:00';
declare @end_time datetime = '2023-06-09 17:04:00';
with Calendar as
(-- step1: build a calender table based on your start and end time.
select top(datediff(minute, @start_time, @end_time) +1)
dateadd(minute, row_number() over (order by number) -1, @start_time) dt
from master..spt_values
)
select all_dts.dt as CreateDate,
p.product,
-- Use subquery to get the next and previous not null price value.
isnull(tbl.price,
(isnull((select top 1 price from #mytable mtbl where mtbl.product = p.product and mtbl.CreateDate > all_dts.dt order by CreateDate), 0) +
isnull((select top 1 price from #mytable mtbl where mtbl.product = p.product and mtbl.CreateDate < all_dts.dt order by CreateDate desc), 0)) /2.0
) as price
from Calendar all_dts
-- step2: do a cross join between the generated dates in step1 and the distinct values of products to get all possisble combinations of (dates and products)
cross join (select distinct product from #mytable) p
-- step3: do a left join with the table to get the missing dates
left join #mytable tbl
on all_dts.dt = tbl.CreateDate and
p.product = tbl.product
order by p.product, all_dts.dt
评论
0赞
PyBoss
6/12/2023
我应该替换什么师父..spt_values
0赞
ahmed
6/12/2023
您仍然可以使用 .您可以通过多种方式构建自己的数字表,它可能只是一个包含一列和特定数量的行的表。master..spt_values
0赞
PyBoss
6/12/2023
对于 AVG 价格,我想计算 Only One upper 和 Only ONE lower 数字的平均值,而不是整行 AVG
评论