Asked by: Wells | Asked: 1/25/2010 | Last edited by: Salman A, Wells | Updated: 11/10/2022 | Views: 540598
Using LIMIT within GROUP BY to get N results per group?
Q:
The following query:
SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC
yields:
year id rate
2006 p01 8
2003 p01 7.4
2008 p01 6.8
2001 p01 5.9
2007 p01 5.3
2009 p01 4.4
2002 p01 3.9
2004 p01 3.5
2005 p01 2.1
2000 p01 0.8
2001 p02 12.5
2004 p02 12.4
2002 p02 12.2
2003 p02 10.3
2000 p02 8.7
2006 p02 4.6
2007 p02 3.3
What I'd like is only the top 5 results for each id:
2006 p01 8
2003 p01 7.4
2008 p01 6.8
2001 p01 5.9
2007 p01 5.3
2001 p02 12.5
2004 p02 12.4
2002 p02 12.2
2003 p02 10.3
2000 p02 8.7
Is there a way to do this using some kind of LIMIT-like modifier that works within the GROUP BY?
A:
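No, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQL versions, but not for 5 results per group).
This is a groupwise-maximum type query, which is not trivial to do in SQL. There are various ways to tackle it that can be more efficient in some cases, but for top-n in general you'll want to look at Bill's answer to a similar previous question.
As with most solutions to this problem, it can return more than five rows if several rows share the same rate value, so you may still need some post-processing to check for that.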
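The following post, "sql: selecting top N record per group", describes a complicated way of achieving this without subqueries.
It improves on the other solutions offered here by:
- Doing everything in a single query
- Being able to properly utilize indexes
- Avoiding subqueries, which are notorious for producing bad execution plans in MySQL
It is, however, not pretty. A good solution would be achievable if window functions (aka analytic functions) were available in MySQL, but at the time of writing they were not (they arrived in MySQL 8.0). The trick used in that post relies on GROUP_CONCAT, sometimes described as the "poor man's window functions for MySQL".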
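This requires a series of subqueries to rank the values, limit them, then perform the sum while grouping:
SET @Rnk := 0;
SET @last_id := NULL;
SET @N := 2;
select
  c.id,
  sum(c.val)
from (
  select
    b.id,
    b.val
  from (
    select
      -- restart the counter at 1 whenever id changes
      @Rnk := if(@last_id = a.id, @Rnk + 1, 1) as Rnk,
      a.id,
      a.val,
      @last_id := a.id as last_id
    from (
      select
        id,
        val
      from list
      order by id, val desc) as a) as b
  where b.Rnk <= @N) as c   -- keep the top @N rows per id
group by c.id;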
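SELECT year, id, rate
FROM (SELECT
        year, id, rate,
        row_number() over (partition by id order by rate DESC) AS rn
      FROM h
      WHERE year BETWEEN 2000 AND 2009
        AND id IN (SELECT rid FROM table2)
      GROUP BY id, year) AS subquery
WHERE rn <= 5
ORDER BY id, rate DESC;
The subquery is almost identical to your query. The only change is the addition of
row_number() over (partition by id order by rate DESC) AS rn
The column needs an alias (rn here) so that the outer query can filter on it, and the ORDER BY belongs on the outer query, because the ordering of a derived table is not guaranteed to be preserved.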
For those like me whose queries timed out: I wrote the following procedure to apply a limit (and anything else) to each specific group.
DELIMITER $$
CREATE PROCEDURE count_limit200()
BEGIN
    DECLARE a INT Default 0;
    DECLARE stop_loop INT Default 0;
    DECLARE domain_val VARCHAR(250);
    DECLARE domain_list CURSOR FOR SELECT DISTINCT domain FROM db.one;
    OPEN domain_list;
    SELECT COUNT(DISTINCT(domain)) INTO stop_loop
    FROM db.one;
    -- BEGIN LOOP
    loop_thru_domains: LOOP
        FETCH domain_list INTO domain_val;
        SET a=a+1;
        INSERT INTO db.two(book,artist,title,title_count,last_updated)
        SELECT * FROM
        (
            SELECT book,artist,title,COUNT(ObjectKey) AS titleCount, NOW()
            FROM db.one
            WHERE book = domain_val
            GROUP BY artist,title
            ORDER BY book,titleCount DESC
            LIMIT 200
        ) t ON DUPLICATE KEY UPDATE title_count = titleCount, last_updated = NOW();
        IF a = stop_loop THEN
            LEAVE loop_thru_domains;
        END IF;
    END LOOP loop_thru_domains;
    CLOSE domain_list;
END $$
DELIMITER ;
It loops through the list of domains and inserts at most 200 rows for each one.
Try this:
SELECT h.year, h.id, h.rate
FROM (SELECT h.year, h.id, h.rate, IF(@lastid = (@lastid:=h.id), @index:=@index+1, @index:=0) indx
      FROM (SELECT h.year, h.id, h.rate
            FROM h
            WHERE h.year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2)
            GROUP BY id, h.year
            ORDER BY id, rate DESC
            ) h, (SELECT @lastid:='', @index:=0) AS a
    ) h
WHERE h.indx < 5;  -- indx is zero-based, so < 5 keeps five rows per id
You could use the GROUP_CONCAT aggregate function to collect all years into a single column, grouped by id and ordered by rate:
SELECT id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM yourtable
GROUP BY id
Result:
-----------------------------------------------------------
| ID | GROUPED_YEAR |
-----------------------------------------------------------
| p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
| p02 | 2001,2004,2002,2003,2000,2006,2007 |
-----------------------------------------------------------
You could then use FIND_IN_SET, which returns the position of the first argument inside the second one, e.g.
SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
1
SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
6
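Using a combination of GROUP_CONCAT and FIND_IN_SET, and filtering by the position returned by FIND_IN_SET, you could then use this query, which returns only the first 5 years for every id: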
SELECT
yourtable.*
FROM
yourtable INNER JOIN (
SELECT
id,
GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM
yourtable
GROUP BY id) group_max
ON yourtable.id = group_max.id
AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
ORDER BY
yourtable.id, yourtable.year DESC;
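Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate) on the rate column instead of the year column.
The maximum length of the string returned by GROUP_CONCAT is limited (by the group_concat_max_len system variable, 1024 by default), so this works well if you only need to select a few records for every group.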
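The limit mentioned above is controlled by the group_concat_max_len system variable. As a small sketch, you can inspect it and raise it for the current session before running the query; the value 100000 is only an illustrative choice, not a recommendation:

```sql
-- Show the current limit on the length of a GROUP_CONCAT() result
SHOW VARIABLES LIKE 'group_concat_max_len';

-- Raise it for this session only (100000 is an arbitrary example value)
SET SESSION group_concat_max_len = 100000;
```

When a result is longer than the limit, MySQL truncates it and raises a warning, so entries near the end of a long list may be lost.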
For me, something like
SUBSTRING_INDEX(group_concat(col_name order by desired_col_order_name), ',', N)
works perfectly. No complicated query.
For example, to get the top 1 for each group:
SELECT
*
FROM
yourtable
WHERE
id IN (SELECT
SUBSTRING_INDEX(GROUP_CONCAT(id
ORDER BY rate DESC),
',',
1) id
FROM
yourtable
GROUP BY year)
ORDER BY rate DESC;
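To extend the top-1 example to the top N rows per group, one possible sketch combines SUBSTRING_INDEX with FIND_IN_SET against the h table from the question. It assumes that (id, year) is unique and that N = 5; the aliases g and top_years are names made up for this illustration:

```sql
SELECT h.year, h.id, h.rate
FROM h
JOIN (
    -- For each id, keep only the first 5 years of the rate-ordered list
    SELECT id,
           SUBSTRING_INDEX(GROUP_CONCAT(year ORDER BY rate DESC), ',', 5) AS top_years
    FROM h
    WHERE year BETWEEN 2000 AND 2009
    GROUP BY id
) AS g
  ON g.id = h.id
 -- FIND_IN_SET returns 0 when the year is not in the shortened list
 AND FIND_IN_SET(h.year, g.top_years) > 0
ORDER BY h.id, h.rate DESC;
```

Because the list is cut to five entries before the join, ties at the cutoff are broken arbitrarily rather than returning extra rows.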
Try this:
SET @num := 0, @type := '';
SELECT `year`, `id`, `rate`,
    @num := if(@type = `id`, @num + 1, 1) AS `row_number`,
    @type := `id` AS `dummy`
FROM (
    SELECT *
    FROM `h`
    WHERE (
        `year` BETWEEN 2000 AND 2009
        AND `id` IN (SELECT `rid` FROM `table2`)
    )
    ORDER BY `id`, `rate` DESC
) AS `temph`
GROUP BY `year`, `id`, `rate`
HAVING `row_number` <= 5
ORDER BY `id`, `rate` DESC;
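You want to find the top n rows per group. This answer provides a generic solution using example data that is different from the OP's.
In MySQL 8 or later you can use the ROW_NUMBER, RANK or DENSE_RANK function, depending on the exact definition of top 5. Below are the numbers generated by these functions based on value sorted in descending order. Notice how ties are handled: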
pkid | catid | value | row_number | rank | dense_rank |
---|---|---|---|---|---|
1 | p01 | 100 | *1 | *1 | *1 |
2 | p01 | 90 | *2 | *2 | *2 |
3 | p01 | 90 | *3 | *2 | *2 |
4 | p01 | 80 | *4 | *4 | *3 |
5 | p01 | 80 | *5 | *4 | *3 |
6 | p01 | 80 | 6 | *4 | *3 |
7 | p01 | 70 | 7 | 7 | *4 |
8 | p01 | 60 | 8 | 8 | *5 |
9 | p01 | 50 | 9 | 9 | 6 |
10 | p01 | 40 | 10 | 10 | 7 |
Once you have chosen the function, use it like so:
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY catid ORDER BY value DESC) AS n
    FROM t
) AS x
WHERE n <= 5
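To see the three functions side by side, here is a minimal self-contained sketch for MySQL 8. It rebuilds the ten sample rows from the table above in a CTE, so no real table is needed; the CTE name sample_t, the window name w, and the aliases rn, rnk and drnk are made up for this illustration. ROW_NUMBER gets pkid as an extra sort key so that ties are numbered deterministically, which is an assumption about how you want ties broken:

```sql
WITH sample_t (pkid, catid, value) AS (
    SELECT 1, 'p01', 100 UNION ALL
    SELECT 2, 'p01', 90  UNION ALL
    SELECT 3, 'p01', 90  UNION ALL
    SELECT 4, 'p01', 80  UNION ALL
    SELECT 5, 'p01', 80  UNION ALL
    SELECT 6, 'p01', 80  UNION ALL
    SELECT 7, 'p01', 70  UNION ALL
    SELECT 8, 'p01', 60  UNION ALL
    SELECT 9, 'p01', 50  UNION ALL
    SELECT 10, 'p01', 40
)
SELECT pkid, catid, value,
       -- unique numbering; pkid breaks ties between equal values
       ROW_NUMBER() OVER (PARTITION BY catid ORDER BY value DESC, pkid) AS rn,
       -- ties share a number and leave gaps after them
       RANK()       OVER w AS rnk,
       -- ties share a number and leave no gaps
       DENSE_RANK() OVER w AS drnk
FROM sample_t
WINDOW w AS (PARTITION BY catid ORDER BY value DESC)
ORDER BY catid, value DESC, pkid;
```

Filtering on rn, rnk or drnk in an outer query then gives the three different readings of "top 5".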
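In MySQL 5.x you can use a poor man's rank over partition to achieve the desired result: outer join the table with itself and, for each row, count the number of rows before it (for example, a "before" row could be one with a higher value).
The following will produce results similar to the RANK function: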
SELECT t.pkid, t.catid, t.value, COUNT(b.value) + 1 AS rank
FROM t
LEFT JOIN t AS b ON b.catid = t.catid AND b.value > t.value
GROUP BY t.pkid, t.catid, t.value
HAVING COUNT(b.value) + 1 <= 5
ORDER BY t.catid, t.value DESC, t.pkid
Make the following change to produce results similar to the DENSE_RANK function:
COUNT(DISTINCT b.value)
Or make the following change to produce results similar to the ROW_NUMBER function:
ON b.catid = t.catid AND (b.value > t.value OR b.value = t.value AND b.pkid < t.pkid)
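Put together, the ROW_NUMBER-like variant could look like the sketch below. It assumes the same table t as above and that pkid is unique and non-null, which is why it counts b.pkid; the alias row_num is made up for this illustration:

```sql
SELECT t.pkid, t.catid, t.value, COUNT(b.pkid) + 1 AS row_num
FROM t
LEFT JOIN t AS b
       ON b.catid = t.catid
      -- a row comes "before" t if its value is higher,
      -- or if the value is equal and its pkid is smaller
      AND (b.value > t.value
           OR (b.value = t.value AND b.pkid < t.pkid))
GROUP BY t.pkid, t.catid, t.value
HAVING COUNT(b.pkid) + 1 <= 5
ORDER BY t.catid, t.value DESC, t.pkid;
```

Since every row has a distinct number of predecessors within its catid, this returns exactly five rows per group when at least five exist.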
Build a virtual column (like RowID in Oracle).
Table:
CREATE TABLE `stack`
(`year` int(11) DEFAULT NULL,
`id` varchar(10) DEFAULT NULL,
`rate` float DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
Data:
insert into stack values(2006,'p01',8);
insert into stack values(2001,'p01',5.9);
insert into stack values(2007,'p01',5.3);
insert into stack values(2009,'p01',4.4);
insert into stack values(2001,'p02',12.5);
insert into stack values(2004,'p02',12.4);
insert into stack values(2005,'p01',2.1);
insert into stack values(2000,'p01',0.8);
insert into stack values(2002,'p02',12.2);
insert into stack values(2002,'p01',3.9);
insert into stack values(2004,'p01',3.5);
insert into stack values(2003,'p02',10.3);
insert into stack values(2000,'p02',8.7);
insert into stack values(2006,'p02',4.6);
insert into stack values(2007,'p02',3.3);
insert into stack values(2003,'p01',7.4);
insert into stack values(2008,'p01',6.8);
The SQL looks like this:
select t3.year,t3.id,t3.rate
from (select t1.*, (select count(*) from stack t2 where t1.rate<=t2.rate and t1.id=t2.id) as rownum from stack t1) t3
where rownum <=3 order by id,rate DESC;
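If you remove the where clause on t3, the query returns every row together with its rownum.
To get the "TOP N records", add rownum <=3 to the where clause (the where clause of t3).
To choose the years, add BETWEEN 2000 AND 2009 to the where clause (the where clause of t3).
Please try the stored procedure below. I have verified it. I get the correct result without using GROUP BY.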
CREATE DEFINER=`ks_root`@`%` PROCEDURE `first_five_record_per_id`()
BEGIN
DECLARE query_string text;
DECLARE datasource1 varchar(24);
DECLARE done INT DEFAULT 0;
DECLARE tenants varchar(50);
DECLARE cur1 CURSOR FOR SELECT rid FROM demo1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
SET @query_string='';
OPEN cur1;
read_loop: LOOP
FETCH cur1 INTO tenants ;
IF done THEN
LEAVE read_loop;
END IF;
SET @datasource1 = tenants;
SET @query_string = concat(@query_string,'(select * from demo where `id` = ''',@datasource1,''' order by rate desc LIMIT 5) UNION ALL ');
END LOOP;
close cur1;
SET @query_string = TRIM(TRAILING 'UNION ALL' FROM TRIM(@query_string));
select @query_string;
PREPARE stmt FROM @query_string;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END
It took some work, but I thought my solution was worth sharing, as it seems both elegant and quite fast.
SELECT h.year, h.id, h.rate
FROM (
SELECT id,
SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC), ',' , 5) AS l
FROM h
WHERE year BETWEEN 2000 AND 2009
GROUP BY id
ORDER BY id
) AS h_temp
LEFT JOIN h ON h.id = h_temp.id
AND SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l
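Note that this example is written for the purpose of the question and can easily be modified for other similar purposes.
I just created a top operation for MySQL. The code is simple.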
drop table if exists h;
create table h(id varchar(5), year int, rate numeric(8,2), primary key(id,year));
insert into h(year, id, rate) values
(2006,'p01',8),
(2003,'p01',7.4),
(2008,'p01',6.8),
(2001,'p01',5.9),
(2007,'p01',5.3),
(2009,'p01',4.4),
(2002,'p01',3.9),
(2004,'p01',3.5),
(2005,'p01',2.1),
(2000,'p01',0.8),
(2001,'p02',12.5),
(2004,'p02',12.4),
(2002,'p02',12.2),
(2003,'p02',10.3),
(2000,'p02',8.7),
(2006,'p02',4.6),
(2007,'p02',3.3);
select id, year, rate
from
(
select id, year, rate, @last, if(@last=id,@top:=@top+1, @top:=0) as ztop, @last:=id update_last
from h
order by id, rate desc, year desc
) t2
where ztop<5