Using LIMIT within GROUP BY to get N results per group?

Asked by: Wells  Asked: 1/25/2010  Last edited by: Salman A, Wells  Updated: 11/10/2022  Views: 540598

Q:

The following query:

SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC

yields:

year    id  rate
2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2009    p01 4.4
2002    p01 3.9
2004    p01 3.5
2005    p01 2.1
2000    p01 0.8
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7
2006    p02 4.6
2007    p02 3.3

What I'd like is only the top 5 results for each id:

2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7

Is there a way to do this using some kind of LIMIT-like modifier that works within the GROUP BY?

Tags: mysql sql group-by greatest-n-per-group ranking

Comments

0 votes Mixcoatl 2/4/2016
SELECT * FROM (SELECT year, id, rate FROM h WHERE year BETWEEN 2000 AND 2009 AND id IN (SELECT RID from table2) GROUP BY ID, year ORDER BY id, rate DESC) LIMIT 5

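Answers:

14 votes bobince 1/25/2010 #1

No, you can't LIMIT subqueries arbitrarily (newer MySQL versions support it to a limited extent, but not for 5 results per group).

This is a groupwise-maximum type of query, which is not trivial in SQL. There are various approaches that can be more efficient in particular cases, but for top-n in general you'll want to look at Bill's answer to a similar earlier question.

As with most solutions to this problem, it can return more than five rows if several rows share the same rate value, so you may still need some post-processing to check for that.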

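2 votes Shlomi Noach 7/7/2012 #2

The following post, "SQL: selecting top N records per group", describes a complicated way of achieving this without subqueries.

It improves on the other solutions offered here by:

  • Doing everything in a single query
  • Being able to use indexes properly
  • Avoiding subqueries, which are notorious for producing bad execution plans in MySQL

It is not pretty, however. A good solution would be achievable if window functions (also known as analytic functions) were available in MySQL, but at the time of this answer they were not. The trick used in that post relies on GROUP_CONCAT, sometimes described as the "poor man's window functions for MySQL".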

10 votes Brian L Cartwright 11/2/2012 #3

This requires a series of subqueries to rank the values, limit them, and then compute the sum while grouping:

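SET @Rnk := 0, @last_id := NULL;
SET @N := 2;
SELECT
  c.id,
  SUM(c.val)
FROM (
  SELECT
    b.id,
    b.val
  FROM (
    SELECT
      -- increment the rank within the same id, restart at 1 on a new id
      @Rnk := IF(@last_id = a.id, @Rnk + 1, 1) AS rnk,
      a.id,
      a.val,
      -- remember the id of the current row for the next comparison
      @last_id := a.id AS last_id
    FROM (
      SELECT id, val
      FROM list
      ORDER BY id, val DESC) AS a) AS b
  -- keep the top @N rows of each id
  WHERE b.rnk <= @N) AS c
GROUP BY c.id;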
14 votes Ricky Moreno 11/29/2012 #4

SELECT year, id, rate
FROM (SELECT
  year, id, rate,
  -- the alias lets the outer query filter on the row number
  row_number() over (partition by id order by rate DESC) AS rn
  FROM h
  WHERE year BETWEEN 2000 AND 2009
  AND id IN (SELECT rid FROM table2)
  GROUP BY id, year) as subquery
WHERE rn <= 5
ORDER BY id, rate DESC

The subquery is almost identical to the original query. The only change is adding

row_number() over (partition by id order by rate DESC)
1 vote Dev-Ria 12/18/2012 #5

For those who, like me, had queries time out: I wrote the procedure below to apply a limit (and anything else) per specific group.

DELIMITER $$
CREATE PROCEDURE count_limit200()
BEGIN
    DECLARE a INT Default 0;
    DECLARE stop_loop INT Default 0;
    DECLARE domain_val VARCHAR(250);
    DECLARE domain_list CURSOR FOR SELECT DISTINCT domain FROM db.one;

    OPEN domain_list;

    SELECT COUNT(DISTINCT(domain)) INTO stop_loop 
    FROM db.one;
    -- BEGIN LOOP
    loop_thru_domains: LOOP
        FETCH domain_list INTO domain_val;
        SET a=a+1;

        INSERT INTO db.two(book,artist,title,title_count,last_updated) 
        SELECT * FROM 
        (
            SELECT book,artist,title,COUNT(ObjectKey) AS titleCount, NOW() 
            FROM db.one 
            WHERE book = domain_val
            GROUP BY artist,title
            ORDER BY book,titleCount DESC
            LIMIT 200
        ) a ON DUPLICATE KEY UPDATE title_count = titleCount, last_updated = NOW();

        IF a = stop_loop THEN
            LEAVE loop_thru_domains;  -- label must match the one on the LOOP statement
        END IF;
    END LOOP loop_thru_domains;

    CLOSE domain_list;
END $$
DELIMITER ;

It loops through the list of domains and inserts at most 200 rows for each one.

9 votes Saharsh Shah 1/5/2013 #6

Try this:

SELECT h.year, h.id, h.rate 
FROM (SELECT h.year, h.id, h.rate, IF(@lastid = (@lastid:=h.id), @index:=@index+1, @index:=0) indx 
      FROM (SELECT h.year, h.id, h.rate 
            FROM h
            WHERE h.year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2)
            GROUP BY id, h.year
            ORDER BY id, rate DESC
            ) h, (SELECT @lastid:='', @index:=0) AS a
    ) h 
WHERE h.indx < 5;  -- indx is zero-based, so 0..4 are the top 5 rows of each id
158 votes fthiella 3/23/2013 #7

You could use the GROUP_CONCAT aggregate function to collect all years into a single column, grouped by id and ordered by rate:

SELECT   id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM     yourtable
GROUP BY id

Result:

-----------------------------------------------------------
|  ID | GROUPED_YEAR                                      |
-----------------------------------------------------------
| p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
| p02 | 2001,2004,2002,2003,2000,2006,2007                |
-----------------------------------------------------------

Then you could use FIND_IN_SET, which returns the position of the first argument inside the second one, e.g.

SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
1

SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
6
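
As a rough illustration of what FIND_IN_SET computes, here is a small Python model of it. The helper name `find_in_set` is hypothetical and only mimics the behaviour described above (1-based position, 0 when the value is absent); it is a sketch, not MySQL's implementation.

```python
def find_in_set(needle: str, csv: str) -> int:
    """Return the 1-based position of needle in a comma-separated list, or 0 if absent."""
    items = csv.split(",") if csv else []
    for position, item in enumerate(items, start=1):
        if item == needle:
            return position
    return 0


years = "2006,2003,2008,2001,2007,2009,2002,2004,2005,2000"
print(find_in_set("2006", years))  # 1
print(find_in_set("2009", years))  # 6
print(find_in_set("1999", years))  # 0
```

Filtering on a position between 1 and 5 therefore keeps exactly the first five entries of the concatenated list.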

Combining GROUP_CONCAT and FIND_IN_SET, and filtering by the position that FIND_IN_SET returns, you can then use this query, which returns only the first 5 years for every id:

SELECT
  yourtable.*
FROM
  yourtable INNER JOIN (
    SELECT
      id,
      GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
    FROM
      yourtable
    GROUP BY id) group_max
  ON yourtable.id = group_max.id
     AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
ORDER BY
  yourtable.id, yourtable.year DESC;

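Please see the fiddle here.

Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate) on the rate column instead of the year column.

The maximum length of the string returned by GROUP_CONCAT is limited (by the group_concat_max_len system variable, 1024 bytes by default), so this works well only if you need to select a few records for every group.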

23 votes Vishal Kumar 10/4/2013 #8

For me, something like

SUBSTRING_INDEX(group_concat(col_name order by desired_col_order_name), ',', N) 

works perfectly. No complicated query.

For example, to get the top 1 of each group:

SELECT 
    *
FROM
    yourtable
WHERE
    id IN (SELECT 
            SUBSTRING_INDEX(GROUP_CONCAT(id
                            ORDER BY rate DESC),
                        ',',
                        1) id
        FROM
            yourtable
        GROUP BY year)
ORDER BY rate DESC;
1 vote MLFR2kx 12/24/2014 #9

Try this:

SET @num := 0, @type := '';
SELECT `year`, `id`, `rate`,
    -- restart the counter at 1 whenever `id` changes
    @num := IF(@type = `id`, @num + 1, 1) AS `row_number`,
    @type := `id` AS `dummy`
FROM (
    SELECT *
    FROM `h`
    WHERE (
        `year` BETWEEN 2000 AND 2009
        AND `id` IN (SELECT `rid` FROM `table2`)
    )
    -- order by rate within each id so the counter ranks rows by rate
    ORDER BY `id`, `rate` DESC
) AS `temph`
GROUP BY `year`, `id`, `rate`
HAVING `row_number` <= 5
ORDER BY `id`, `rate` DESC;
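166 votes Salman A 5/16/2015 #10

You want to find the top n rows per group. This answer provides a generic solution using example data that differs from the OP's.

In MySQL 8 or later you can use the ROW_NUMBER, RANK or DENSE_RANK function, depending on the exact definition of "top 5". Below are the numbers these functions generate when value is sorted in descending order. Notice how ties are handled: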

pkid  catid  value  row_number  rank  dense_rank
1     p01    100    *1          *1    *1
2     p01    90     *2          *2    *2
3     p01    90     *3          *2    *2
4     p01    80     *4          *4    *3
5     p01    80     *5          *4    *3
6     p01    80     6           *4    *3
7     p01    70     7           7     *4
8     p01    60     8           8     *5
9     p01    50     9           9     6
10    p01    40     10          10    7

(* marks the rows that fall within the top 5 under each function.)
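
To make the tie handling concrete, the three numberings can be reproduced in a few lines of Python. This is only an illustrative sketch: `rank_values` is a hypothetical helper, not part of MySQL, and it assumes a single group whose values are sorted in descending order.

```python
def rank_values(values):
    """Return (value, row_number, rank, dense_rank) tuples for values sorted descending."""
    ordered = sorted(values, reverse=True)
    rows = []
    rank = 0
    dense_rank = 0
    previous = None
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number   # RANK jumps to the row position after a tie
            dense_rank += 1     # DENSE_RANK advances by exactly one per distinct value
            previous = value
        rows.append((value, row_number, rank, dense_rank))
    return rows


rows = rank_values([100, 90, 90, 80, 80, 80, 70, 60, 50, 40])
for row in rows:
    print(row)

# How many rows each definition of "top 5" keeps for this sample:
kept_by_row_number = sum(1 for _, rn, _, _ in rows if rn <= 5)
kept_by_rank = sum(1 for _, _, rk, _ in rows if rk <= 5)
kept_by_dense_rank = sum(1 for _, _, _, dr in rows if dr <= 5)
print(kept_by_row_number, kept_by_rank, kept_by_dense_rank)  # 5 6 8
```

With this sample, a ROW_NUMBER filter keeps exactly 5 rows, while RANK and DENSE_RANK keep more because tied values share a number.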

Once you have chosen the function, use it like so:

SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY catid ORDER BY value DESC) AS n
    FROM t
) AS x
WHERE n <= 5

DB<>Fiddle
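
The same pattern can be tried against the data from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL 8; that substitution is an assumption made so the example is self-contained, and it needs an SQLite build with window-function support (3.25 or later). The table layout and the `top_five_per_id` helper are illustrative only.

```python
import sqlite3

# Rows from the question: (year, id, rate)
ROWS = [
    (2006, "p01", 8), (2003, "p01", 7.4), (2008, "p01", 6.8),
    (2001, "p01", 5.9), (2007, "p01", 5.3), (2009, "p01", 4.4),
    (2002, "p01", 3.9), (2004, "p01", 3.5), (2005, "p01", 2.1),
    (2000, "p01", 0.8), (2001, "p02", 12.5), (2004, "p02", 12.4),
    (2002, "p02", 12.2), (2003, "p02", 10.3), (2000, "p02", 8.7),
    (2006, "p02", 4.6), (2007, "p02", 3.3),
]

QUERY = """
SELECT year, id, rate
FROM (
    SELECT year, id, rate,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY rate DESC) AS n
    FROM h
) AS x
WHERE n <= 5
ORDER BY id, rate DESC
"""


def top_five_per_id():
    """Load the sample rows into an in-memory table and return the top 5 rates per id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE h (year INTEGER, id TEXT, rate REAL)")
    conn.executemany("INSERT INTO h (year, id, rate) VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = top_five_per_id()
for row in result:
    print(row)
```

The ten rows returned match the output the question asks for: five rows for p01 followed by five for p02, each group ordered by rate descending.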


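In MySQL 5.x you can use a poor man's rank over partition to achieve the desired result: outer join the table with itself and, for each row, count the number of rows before it (for example, a "before" row could be one with a higher value).

The following will produce results similar to the RANK function: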

SELECT t.pkid, t.catid, t.value, COUNT(b.value) + 1 AS rank
FROM t
LEFT JOIN t AS b ON b.catid = t.catid AND b.value > t.value
GROUP BY t.pkid, t.catid, t.value
HAVING COUNT(b.value) + 1 <= 5
ORDER BY t.catid, t.value DESC, t.pkid

Make the following change to produce results similar to the DENSE_RANK function:

COUNT(DISTINCT b.value)

Or make the following change to produce results similar to the ROW_NUMBER function:

ON b.catid = t.catid AND (b.value > t.value OR b.value = t.value AND b.pkid < t.pkid)

DB<>Fiddle
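
The three self-join variants can be compared side by side on the sample table above. As before, this is a sketch that runs the SQL through Python's sqlite3 module rather than MySQL 5.x (an assumption for the sake of a self-contained example); the `top_five` helper and the `rnk` alias are hypothetical names.

```python
import sqlite3

# Sample rows from this answer: (pkid, catid, value)
SAMPLE = [
    (1, "p01", 100), (2, "p01", 90), (3, "p01", 90), (4, "p01", 80),
    (5, "p01", 80), (6, "p01", 80), (7, "p01", 70), (8, "p01", 60),
    (9, "p01", 50), (10, "p01", 40),
]

# Join conditions: rows "before" the current one within the same category.
RANK_JOIN = "b.catid = t.catid AND b.value > t.value"
ROW_NUMBER_JOIN = (
    "b.catid = t.catid AND "
    "(b.value > t.value OR (b.value = t.value AND b.pkid < t.pkid))"
)


def top_five(count_expr, join_condition):
    """Return (pkid, rnk) for rows whose self-join count puts them in the top 5."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (pkid INTEGER PRIMARY KEY, catid TEXT, value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", SAMPLE)
    query = f"""
        SELECT t.pkid, {count_expr} + 1 AS rnk
        FROM t
        LEFT JOIN t AS b ON {join_condition}
        GROUP BY t.pkid, t.catid, t.value
        HAVING {count_expr} + 1 <= 5
        ORDER BY t.catid, t.value DESC, t.pkid
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


rank_like = top_five("COUNT(b.value)", RANK_JOIN)
dense_rank_like = top_five("COUNT(DISTINCT b.value)", RANK_JOIN)
row_number_like = top_five("COUNT(b.value)", ROW_NUMBER_JOIN)

print(rank_like)        # pkid 1..6 with ranks 1, 2, 2, 4, 4, 4
print(dense_rank_like)  # pkid 1..8 with ranks 1, 2, 2, 3, 3, 3, 4, 5
print(row_number_like)  # pkid 1..5 with ranks 1, 2, 3, 4, 5
```

The tie-breaker on pkid in the ROW_NUMBER variant is what guarantees at most five rows per category.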

6 votes Wang Wen'an 5/9/2016 #11

Build a virtual column (like RowID in Oracle).

Table:

CREATE TABLE `stack` 
(`year` int(11) DEFAULT NULL,
`id` varchar(10) DEFAULT NULL,
`rate` float DEFAULT NULL) 
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

Data:

insert into stack values(2006,'p01',8);
insert into stack values(2001,'p01',5.9);
insert into stack values(2007,'p01',5.3);
insert into stack values(2009,'p01',4.4);
insert into stack values(2001,'p02',12.5);
insert into stack values(2004,'p02',12.4);
insert into stack values(2005,'p01',2.1);
insert into stack values(2000,'p01',0.8);
insert into stack values(2002,'p02',12.2);
insert into stack values(2002,'p01',3.9);
insert into stack values(2004,'p01',3.5);
insert into stack values(2003,'p02',10.3);
insert into stack values(2000,'p02',8.7);
insert into stack values(2006,'p02',4.6);
insert into stack values(2007,'p02',3.3);
insert into stack values(2003,'p01',7.4);
insert into stack values(2008,'p01',6.8);

The SQL looks like this:

select t3.year,t3.id,t3.rate 
from (select t1.*, (select count(*) from stack t2 where t1.rate<=t2.rate and t1.id=t2.id) as rownum from stack t1) t3 
where rownum <=3 order by id,rate DESC;

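If you remove the where clause on t3, the result looks like this:

[image: query result without the where clause]

To get the "top N records", add "rownum <=3" to the where clause (the where clause on t3).

To choose the years, add "BETWEEN 2000 AND 2009" to the where clause (the where clause on t3).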

1 vote Himanshu Patel 7/15/2016 #12

Please try the stored procedure below. I have already verified it: I get the correct result, but without using GROUP BY.

CREATE DEFINER=`ks_root`@`%` PROCEDURE `first_five_record_per_id`()
BEGIN
DECLARE query_string text;
DECLARE datasource1 varchar(24);
DECLARE done INT DEFAULT 0;
DECLARE tenants varchar(50);
DECLARE cur1 CURSOR FOR SELECT rid FROM demo1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    SET @query_string='';

      OPEN cur1;
      read_loop: LOOP

      FETCH cur1 INTO tenants ;

      IF done THEN
        LEAVE read_loop;
      END IF;

      SET @datasource1 = tenants;
      SET @query_string = concat(@query_string,'(select * from demo  where `id` = ''',@datasource1,''' order by rate desc LIMIT 5) UNION ALL ');

       END LOOP; 
      close cur1;

    SET @query_string  = TRIM(TRAILING 'UNION ALL' FROM TRIM(@query_string));  
  select @query_string;
PREPARE stmt FROM @query_string;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

END
3 votes John 10/25/2016 #13

It took some work, but I thought my solution was worth sharing, as it seems elegant as well as quite fast.

SELECT h.year, h.id, h.rate 
  FROM (
    SELECT id, 
      SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC), ',' , 5) AS l
      FROM h
      WHERE year BETWEEN 2000 AND 2009
      GROUP BY id
      ORDER BY id
  ) AS h_temp
    LEFT JOIN h ON h.id = h_temp.id 
      AND SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l

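Note that this example is tailored to the question and can quite easily be modified for other, similar purposes.

0 votes user19714507 11/10/2022 #14

I just created a "top" operation for MySQL. The code is simple.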

drop table if exists h;
create table h(id varchar(5), year int, rate numeric(8,2), primary key(id,year));
insert into h(year, id, rate) values
(2006,'p01',8),
(2003,'p01',7.4),
(2008,'p01',6.8),
(2001,'p01',5.9),
(2007,'p01',5.3),
(2009,'p01',4.4),
(2002,'p01',3.9),
(2004,'p01',3.5),
(2005,'p01',2.1),
(2000,'p01',0.8),
(2001,'p02',12.5),
(2004,'p02',12.4),
(2002,'p02',12.2),
(2003,'p02',10.3),
(2000,'p02',8.7),
(2006,'p02',4.6),
(2007,'p02',3.3);

select id, year, rate
from 
(
    select id, year, rate, @last, if(@last=id,@top:=@top+1, @top:=0) as ztop, @last:=id update_last
    from h
    order by id, rate desc, year desc
) t2
where ztop<5