Asked by: justnewbie89 Asked: 3/24/2019 Updated: 3/25/2019 Views: 1775
How to filter data in SQL based on percentile
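Q:
I have 2 tables. The first contains customer information such as id, age, and name. The second contains the customer ID, the product they purchased, and the purchase_date (dates range from 2016 to 2018).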
Table 1
-------
customer_id
customer_age
customer_name
Table 2
------
customer_id
product
purchase_date
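The result I want is a table containing the customer_name and product for customers who made a purchase in 2017 and who are older than 75% of the customers who made a purchase in 2016.
A:
0 votes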
radbrt
3/24/2019
#1
Depending on your SQL flavor, you can get quartiles with the more general ntile analytic function. This basically adds a new column to the query.
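SELECT MIN(customer_age) AS min_age FROM (
  SELECT customer_id, customer_age, ntile(4) OVER (ORDER BY customer_age) AS q4 FROM table1
  WHERE customer_id IN (
    -- purchase_date is a date, so filter on a date range instead of comparing it to 2016
    SELECT customer_id FROM table2
    WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01')
) q
WHERE q4 = 4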
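This returns the minimum age among customers in the 4th quartile, which can then be used in a subquery against the customers who made purchases in 2017.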
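To make that last step concrete, here is a minimal sketch that plugs the quartile threshold into a 2017 query as a scalar subquery. It runs through Python's built-in sqlite3 module (which needs SQLite 3.25 or later for window functions) purely so it can be tried on toy data. The table and column names follow the question, the rows are made up, and using >= for "older than 75%" is an assumption.

```python
import sqlite3

# Toy data: table/column names follow the question, the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (customer_id INTEGER, customer_age INTEGER, customer_name TEXT);
CREATE TABLE table2 (customer_id INTEGER, product TEXT, purchase_date TEXT);
INSERT INTO table1 VALUES
  (1, 20, 'Ann'), (2, 30, 'Bob'), (3, 40, 'Cid'),
  (4, 50, 'Dee'), (5, 25, 'Eve'), (6, 60, 'Fay');
INSERT INTO table2 VALUES
  (1, 'pen', '2016-02-01'), (2, 'ink', '2016-05-10'),
  (3, 'pad', '2016-07-20'), (4, 'cap', '2016-11-30'),
  (5, 'bag', '2017-03-15'), (6, 'hat', '2017-06-01'),
  (4, 'mug', '2017-09-09'), (2, 'map', '2018-01-05');
""")

rows = conn.execute("""
SELECT t1.customer_name, t2.product
FROM table1 t1
JOIN table2 t2 ON t2.customer_id = t1.customer_id
WHERE t2.purchase_date >= '2017-01-01' AND t2.purchase_date < '2018-01-01'
  AND t1.customer_age >= (
    -- minimum age in the top quartile of 2016 buyers
    SELECT MIN(customer_age) FROM (
      SELECT customer_age, ntile(4) OVER (ORDER BY customer_age) AS q4
      FROM table1
      WHERE customer_id IN (
        SELECT customer_id FROM table2
        WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01')
    ) q
    WHERE q4 = 4)
ORDER BY t1.customer_name
""").fetchall()

for name, product in rows:
    print(name, product)
```

With these rows the 2016 buyers are aged 20, 30, 40 and 50, so the threshold is 50 and only Dee and Fay qualify among the 2017 buyers.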
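The argument to ntile is the number of buckets you want to divide into. In this case, 75%+ corresponds to the 4th quartile, so 4 buckets will do. The OVER() clause specifies what to order by (customer_age in our case), and it also lets us partition (group) the data, for example to create separate rankings for different years or countries.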
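As a rough illustration of the partitioning part, the sketch below computes quartiles separately per year with PARTITION BY. It again uses Python's sqlite3 module (SQLite 3.25 or later) just to be runnable. The buyers table and its purchase_year column are hypothetical and not part of the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "buyers" is a made-up table for illustration only.
conn.executescript("""
CREATE TABLE buyers (purchase_year INTEGER, customer_age INTEGER);
INSERT INTO buyers VALUES
  (2016, 20), (2016, 30), (2016, 40), (2016, 50),
  (2017, 25), (2017, 35), (2017, 45), (2017, 55);
""")

rows = conn.execute("""
SELECT purchase_year, customer_age,
       -- the bucket numbering restarts at 1 for every purchase_year
       ntile(4) OVER (PARTITION BY purchase_year ORDER BY customer_age) AS q4
FROM buyers
ORDER BY purchase_year, customer_age
""").fetchall()

for row in rows:
    print(row)
```

Each year gets its own buckets 1 to 4, so a filter on q4 = 4 would pick the oldest quarter within each year rather than across all years.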
0 votes
Gordon Linoff
3/24/2019
#2
Age is a terrible field to store in a database. It changes every day. You should store a date of birth or something similar instead.
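To sketch what storing a date of birth buys you: the age can then be derived for any reference date, such as the purchase date. The helper below is hypothetical (the question's table only has customer_age) and is written in plain Python for illustration.

```python
from datetime import date


def age_on(birth_date: date, as_of: date) -> int:
    """Return the age in whole years on the date as_of."""
    years = as_of.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# The same person has a different age the day before and on their birthday.
print(age_on(date(1990, 6, 15), date(2016, 6, 14)))
print(age_on(date(1990, 6, 15), date(2016, 6, 15)))
```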
To get the 75% oldest value in 2016, there are several possibilities. I usually go with row_number() and count(*):
select min(customer_age)
from (select c.*,
             row_number() over (order by customer_age) as seqnum,
             count(*) over () as cnt
      from customers c
      where exists (select 1
                    from customer_products cp
                    where cp.customer_id = c.customer_id and
                          cp.purchase_date >= '2016-01-01' and
                          cp.purchase_date < '2017-01-01'
                   )
     ) c
where seqnum >= 0.75 * cnt;
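To see what seqnum and cnt look like, here is a small sketch on toy data, run through Python's sqlite3 module (SQLite 3.25 or later). It assumes every row in the made-up customers table is a 2016 buyer, so the exists filter is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented rows; assume all eight customers bought something in 2016.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, customer_age INTEGER);
INSERT INTO customers VALUES
  (1, 20), (2, 30), (3, 40), (4, 50), (5, 60), (6, 70), (7, 80), (8, 90);
""")

# Intermediate result: each row carries its position and the total row count.
ranked = conn.execute("""
SELECT customer_age,
       row_number() OVER (ORDER BY customer_age) AS seqnum,
       count(*) OVER () AS cnt
FROM customers
ORDER BY seqnum
""").fetchall()
for row in ranked:
    print(row)

# Youngest age among the rows at or above the 75% position.
threshold = conn.execute("""
SELECT min(customer_age)
FROM (SELECT c.*,
             row_number() OVER (ORDER BY customer_age) AS seqnum,
             count(*) OVER () AS cnt
      FROM customers c
     ) c
WHERE seqnum >= 0.75 * cnt
""").fetchone()[0]
print(threshold)
```

With 8 rows, 0.75 * cnt is 6, so rows 6 through 8 pass the filter and the threshold is 70. The alias after the closing parenthesis of the derived table is kept because some databases, SQL Server among them, reject a derived table without one.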
Then, use this in a query for 2017:
with a2016 as (
      select min(customer_age) as customer_age
      from (select c.*,
                   row_number() over (order by customer_age) as seqnum,
                   count(*) over () as cnt
            from customers c
            where exists (select 1
                          from customer_products cp
                          where cp.customer_id = c.customer_id and
                                cp.purchase_date >= '2016-01-01' and
                                cp.purchase_date < '2017-01-01'
                         )
           ) c
      where seqnum >= 0.75 * cnt
     )
select c.*, cp.product_id
from customers c join
     customer_products cp
     on cp.customer_id = c.customer_id and
        cp.purchase_date >= '2017-01-01' and
        cp.purchase_date < '2018-01-01' join
     a2016 a
     on c.customer_age >= a.customer_age;
Comments
0 votes
justnewbie89
3/25/2019
Thanks for your reply. After running your query I got this message, which seems to come from the line seqnum >= 0.75 * cnt: Incorrect syntax near the keyword 'where'.