提问人:Alex 提问时间:4/8/2010 最后编辑:SeReGaAlex 更新时间:11/14/2023 访问量:3648251
在 SQL 表中查找重复值
Finding duplicate values in a SQL table
问:
使用一个字段很容易找到重复项:
SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1
因此,如果我们有一张桌子
ID NAME EMAIL
1 John [email protected]
2 Sam [email protected]
3 Tom [email protected]
4 Bob [email protected]
5 Tom [email protected]
这个查询会给我们约翰、山姆、汤姆、汤姆,因为他们都有相同的.email
但是,我想要的是获得具有相同和 .email
name
也就是说,我想得到“汤姆”,“汤姆”。
我需要这个的原因:我犯了一个错误,并允许插入重复项和值。现在我需要删除/更改重复项,所以我需要先找到它们。name
email
答:
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
只需对两列进行分组即可。
注意:较旧的 ANSI 标准是将所有非聚合列都放在 GROUP BY 中,但这已随着“功能依赖”的想法而改变:
在关系数据库理论中,功能依赖关系是数据库关系中两组属性之间的约束。换句话说,函数依赖关系是描述关系中属性之间关系的约束。
支持不一致:
- 最近的 PostgreSQL 支持它。
- SQL Server(在 SQL Server 2017 中)仍需要 GROUP BY 中的所有非聚合列。
- MySQL是不可预测的,您需要:
sql_mode=only_full_group_by
- GROUP BY lname ORDER BY 显示错误的结果;
- 在没有 ANY() 的情况下,这是最便宜的聚合函数(请参阅已接受答案中的注释)。
- 甲骨文还不够主流(警告:幽默,我不了解甲骨文)。
评论
>1
=1
试试这个:
SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )
试试这个:
declare @YourTable table (id int, name varchar(10), email varchar(50))
INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')
SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
输出:
name email CountOf
---------- ----------- -----------
John John-email 2
sam sam-email 2
(2 row(s) affected)
如果您想要 dups 的 ID,请使用:
SELECT
y.id,y.name,y.email
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
输出:
id name email
----------- ---------- ------------
1 John John-email
2 John John-email
5 sam sam-email
6 sam sam-email
(4 row(s) affected)
要删除重复项,请尝试:
DELETE d
FROM @YourTable d
INNER JOIN (SELECT
y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
) dt2 ON d.id=dt2.id
WHERE dt2.RowRank!=1
SELECT * FROM @YourTable
输出:
id name email
----------- ---------- --------------
1 John John-email
3 fred John-email
4 fred fred-email
5 sam sam-email
(4 row(s) affected)
评论
与其他答案相比,您可以查看包含所有列的整个记录(如果有)。在row_number函数部分,选择所需的唯一/重复列。PARTITION BY
SELECT *
FROM (
SELECT a.*
, Row_Number() OVER (PARTITION BY Name, Age ORDER BY Name) AS r
FROM Customers AS a
) AS b
WHERE r > 1;
当您想选择带有 ALL 字段的所有重复记录时,您可以这样写
CREATE TABLE test (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, c1 integer
, c2 text
, d date DEFAULT now()
, v text
);
INSERT INTO test (c1, c2, v) VALUES
(1, 'a', 'Select'),
(1, 'a', 'ALL'),
(1, 'a', 'multiple'),
(1, 'a', 'records'),
(2, 'b', 'in columns'),
(2, 'b', 'c1 and c2'),
(3, 'c', '.');
SELECT * FROM test ORDER BY 1;
SELECT *
FROM test
WHERE (c1, c2) IN (
SELECT c1, c2
FROM test
GROUP BY 1,2
HAVING count(*) > 1
)
ORDER BY 1;
在 PostgreSQL 中测试。
评论
如果您使用 Oracle,则最好采用这种方式:
create table my_users(id number, name varchar2(100), email varchar2(100));
insert into my_users values (1, 'John', '[email protected]');
insert into my_users values (2, 'Sam', '[email protected]');
insert into my_users values (3, 'Tom', '[email protected]');
insert into my_users values (4, 'Bob', '[email protected]');
insert into my_users values (5, 'Tom', '[email protected]');
commit;
select *
from my_users
where rowid not in (select min(rowid) from my_users group by name, email);
如果您想查看表中是否有任何重复的行,我使用了下面的查询:
create table my_table(id int, name varchar(100), email varchar(100));
insert into my_table values (1, 'shekh', '[email protected]');
insert into my_table values (1, 'shekh', '[email protected]');
insert into my_table values (2, 'Aman', '[email protected]');
insert into my_table values (3, 'Tom', '[email protected]');
insert into my_table values (4, 'Raj', '[email protected]');
Select COUNT(1) As Total_Rows from my_table
Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc
试试这段代码
WITH CTE AS
( SELECT Id, Name, Age, Comments, RN = ROW_NUMBER()OVER(PARTITION BY Name,Age ORDER BY ccn)
FROM ccnmaster )
select * from CTE
select emp.ename, emp.empno, dept.loc
from emp
inner join dept
on dept.deptno=emp.deptno
inner join
(select ename, count(*) from
emp
group by ename, deptno
having count(*) > 1)
t on emp.ename=t.ename order by emp.ename
/
我们如何计算重复的值?? 重复 2 次或大于 2 次。 只是计算它们,而不是分组。
就这么简单
select COUNT(distinct col_01) from Table_01
评论
如果要查找重复数据(按一个或多个条件)并选择实际行。
with MYCTE as (
SELECT DuplicateKey1
,DuplicateKey2 --optional
,count(*) X
FROM MyTable
group by DuplicateKey1, DuplicateKey2
having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/
SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;
我认为这将适用于搜索特定列中的重复值。
评论
SELECT name, email
FROM users
WHERE email in
(SELECT email FROM users
GROUP BY email
HAVING COUNT(*)>1)
聚会有点晚了,但我发现了一个非常酷的解决方法来查找所有重复的 ID:
SELECT email, GROUP_CONCAT(id)
FROM users
GROUP BY email
HAVING COUNT(email) > 1;
评论
GROUP_CONCAT
id
如果要删除重复项,这里有一种比在三重子选择中查找偶数/奇数行更简单的方法:
SELECT id, name, email
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
所以删除:
DELETE FROM users
WHERE id IN (
SELECT id/*, name, email*/
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)
恕我直言,更容易阅读和理解
注意:唯一的问题是您必须执行请求,直到没有删除任何行,因为每次只删除每个重复项中的 1 个
评论
You can't specify target table 'users' for update in FROM clause
这也应该有效,也许可以尝试一下。
Select * from Users a
where EXISTS (Select * from Users b
where ( a.name = b.name
OR a.email = b.email)
and a.ID != b.id)
在您的情况下特别好:如果您搜索具有某种前缀或一般更改的重复项,例如邮件中的新域。然后,您可以在这些列中使用 replace()
这是我想出的简单方法。它使用公用表表达式 (CTE) 和分区窗口(我认为这些功能在 SQL 2008 及更高版本中都有)。
本示例查找具有重复姓名和 dob 的所有学生。要检查重复的字段位于 OVER 子句中。您可以在投影中包括所需的任何其他字段。
with cte (StudentId, Fname, LName, DOB, RowCnt)
as (
SELECT StudentId, FirstName, LastName, DateOfBirth as DOB, SUM(1) OVER (Partition By FirstName, LastName, DateOfBirth) as RowCnt
FROM tblStudent
)
SELECT * from CTE where RowCnt > 1
ORDER BY DOB, LName
SELECT * FROM users u where rowid = (select max(rowid) from users u1 where
u.email=u1.email);
select name, email
, case
when ROW_NUMBER () over (partition by name, email order by name) > 1 then 'Yes'
else 'No'
end "duplicated ?"
from users
评论
select id,name,COUNT(*) from user group by Id,Name having COUNT(*)>1
通过使用 CTE,我们还可以找到这样的重复值
with MyCTE
as
(
select Name,EmailId,ROW_NUMBER() over(PARTITION BY EmailId order by id) as Duplicate from [Employees]
)
select * from MyCTE where Duplicate>1
这将从每组重复项中选择/删除除一条记录之外的所有重复记录。因此,删除将保留所有唯一记录 + 每组重复项中的一条记录。
选择重复项:
SELECT *
FROM <table>
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY <column1>, <column2>
);
删除重复项:
DELETE FROM <table>
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY <column1>, <column2>
);
请注意,如果记录量较大,可能会导致性能问题。
评论
SELECT column_name,COUNT(*) FROM TABLE_NAME GROUP BY column1, HAVING COUNT(*) > 1;
如何在表中获取重复记录
SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1
GROUP BY EmpCode HAVING COUNT(EmpCode) > 1
我们可以在这里使用have,它适用于聚合函数,如下所示
create table #TableB (id_account int, data int, [date] date)
insert into #TableB values (1 ,-50, '10/20/2018'),
(1, 20, '10/09/2018'),
(2 ,-900, '10/01/2018'),
(1 ,20, '09/25/2018'),
(1 ,-100, '08/01/2018')
SELECT id_account , data, COUNT(*)
FROM #TableB
GROUP BY id_account , data
HAVING COUNT(id_account) > 1
drop table #TableB
这里作为两个字段 id_account 和 data 与 Count(*) 一起使用。因此,它将给出两列中具有超过一倍相同值的所有记录。
我们错误地错过了在SQL Server表中添加任何约束,并且记录已使用前端应用程序在所有列中重复插入。然后我们可以使用下面的查询从表中删除重复的查询。
SELECT DISTINCT * INTO #TemNewTable FROM #OriginalTable
TRUNCATE TABLE #OriginalTable
INSERT INTO #OriginalTable SELECT * FROM #TemNewTable
DROP TABLE #TemNewTable
在这里,我们获取了原始表的所有不同记录,并删除了原始表的记录。同样,我们将新表中的所有非重复值插入到原始表中,然后删除新表。
删除名称重复的记录
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS T FROM @YourTable
)
DELETE FROM CTE WHERE T > 1
评论
检查表中的重复记录。
select * from users s
where rowid < any
(select rowid from users k where s.name = k.name and s.email = k.email);
或
select * from users s
where rowid not in
(select max(rowid) from users k where s.name = k.name and s.email = k.email);
删除表中的重复记录。
delete from users s
where rowid < any
(select rowid from users k where s.name = k.name and s.email = k.email);
或
delete from users s
where rowid not in
(select max(rowid) from users k where s.name = k.name and s.email = k.email);
您可以使用 SELECT DISTINCT 关键字来删除重复项。您还可以按名称进行筛选,并在表格中获取具有该名称的每个人。
你可能想试试这个
SELECT NAME, EMAIL, COUNT(*)
FROM USERS
GROUP BY 1,2
HAVING COUNT(*) > 1
SELECT name, email,COUNT(email)
FROM users
WHERE email IN (
SELECT email
FROM users
GROUP BY email
HAVING COUNT(email) > 1)
评论
COUNT
GROUP BY
确切的代码会有所不同,具体取决于您是要查找重复的行,还是只想查找具有相同电子邮件和名称的不同 ID。如果 id 是主键或具有唯一约束,则不存在此区别,但问题未指定此区别。在前一种情况下,您可以使用其他几个答案中给出的代码:
SELECT name, email, COUNT(*)
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1
在后一种情况下,您将使用:
SELECT name, email, COUNT(DISTINCT id)
FROM users
GROUP BY name, email
HAVING COUNT(DISTINCT id) > 1
ORDER BY COUNT(DISTINCT id) DESC
这里最重要的是拥有最快的功能。此外,还应确定重复项的索引。自连接是一个不错的选择,但要获得更快的功能,最好先找到有重复项的行,然后与原始表联接以查找重复行的 id。最后,按除 id 之外的任何列排序,以使重复的行彼此靠近。
SELECT u.*
FROM users AS u
JOIN (SELECT username, email
FROM users
GROUP BY username, email
HAVING COUNT(*)>1) AS w
ON u.username=w.username AND u.email=w.email
ORDER BY u.email;
您也可以使用分析函数尝试此操作的另一种简单方法:
SELECT * from
(SELECT name, email,
COUNT(name) OVER (PARTITION BY name, email) cnt
FROM users)
WHERE cnt >1;
表结构:
ID NAME EMAIL
1 John [email protected]
2 Sam [email protected]
3 Tom [email protected]
4 Bob [email protected]
5 Tom [email protected]
解决方案 1:
SELECT *,
COUNT(*)
FROM users t1
INNER JOIN users t2
WHERE t1.id > t2.id
AND t1.name = t2.name
AND t1.email=t2.email
解决方案 2:
SELECT name,
email,
COUNT(*)
FROM users
GROUP BY name,
email
HAVING COUNT(*) > 1
如果您使用 Microsoft Access,则此方式有效:
CREATE TABLE users (id int, name varchar(10), email varchar(50));
INSERT INTO users VALUES (1, 'John', '[email protected]');
INSERT INTO users VALUES (2, 'Sam', '[email protected]');
INSERT INTO users VALUES (3, 'Tom', '[email protected]');
INSERT INTO users VALUES (4, 'Bob', '[email protected]');
INSERT INTO users VALUES (5, 'Tom', '[email protected]');
SELECT name, email, COUNT(*) AS CountOf
FROM users
GROUP BY name, email
HAVING COUNT(*)>1;
DELETE *
FROM users
WHERE id IN (
SELECT u1.id
FROM users u1, users u2
WHERE u1.name = u2.name AND u1.email = u2.email AND u1.id > u2.id
);
感谢 Tancrede Chazallet 的删除代码。
请尝试
SELECT UserID, COUNT(UserID)
FROM dbo.User
GROUP BY UserID
HAVING COUNT(UserID) > 1
您使用我使用的以下查询:
select *
FROM TABLENAME
WHERE PrimaryCoumnID NOT IN
(
SELECT MAX(PrimaryCoumnID)
FROM TABLENAME
GROUP BY AnyCoumnID
);
我认为这会对你有所帮助
SELECT name, email, COUNT(* )
FROM users
GROUP BY name, email
HAVING COUNT(*)>1
我想列出所有可能的方式,我们可以通过各种方式做到这一点,这可能会传授我们对如何做到这一点的理解,并且寻求者可以选择最适合他/她需求的解决方案之一,因为这是最常见的查询之一SQL开发人员遇到不同的业务用例,或者有时在面试中。
创建示例数据
我将从仅从此问题中设置一些示例数据开始。
Create table NewTable (id int, name varchar(10), email varchar(50))
INSERT NewTable VALUES (1,'John','[email protected]')
INSERT NewTable VALUES (2,'Sam','[email protected]')
INSERT NewTable VALUES (3,'Tom','[email protected]')
INSERT NewTable VALUES (4,'Bob','[email protected]')
INSERT NewTable VALUES (5,'Tom','[email protected]')
1.使用 GROUP BY 子句
SELECT
name,email, COUNT(*) AS Occurence
FROM NewTable
GROUP BY name,email
HAVING COUNT(*)>1
工作原理:
- GROUP BY 子句按 “姓名”和“电子邮件”列。
- 然后,COUNT() 函数返回数字 每个组(姓名、电子邮件)的出现次数。
- 然后,HAVING 子句保留 仅重复组,即具有多个组的组 发生。
2. 使用 CTE:
若要返回每个重复行的整行,请使用公用表表达式 (CTE) 将上述查询的结果与表联接起来:NewTable
WITH cte AS (
SELECT
name,
email,
COUNT(*) occurrences
FROM NewTable
GROUP BY
name,
email
HAVING COUNT(*) > 1
)
SELECT
t1.Id,
t1.name,
t1.email
FROM NewTable t1
INNER JOIN cte ON
cte.name = t1.name AND
cte.email = t1.email
ORDER BY
t1.name,
t1.email;
3.使用 ROW_NUMBER() 功能
WITH cte AS (
SELECT
name,
email,
ROW_NUMBER() OVER (
PARTITION BY name,email
ORDER BY name,email) rownum
FROM
NewTable t1
)
SELECT
*
FROM
cte
WHERE
rownum > 1;
工作原理:
ROW_NUMBER()
按 AND 列中的值将表的行分布到分区中。重复的行在 和 列中具有重复的值,但行号不同NewTable
name
email
name
email
- 外部查询将删除每个组中的第一行。
好吧,现在我相信,您可以很好地了解如何查找重复项并应用逻辑在所有可能的情况下查找重复项。
评论
试试这个:
DECLARE @myTable TABLE
(
id INT,
name VARCHAR(10),
email VARCHAR(50)
);
INSERT @myTable
VALUES
(1, 'John', 'John-email');
INSERT @myTable
VALUES
(2, 'John', 'John-email');
INSERT @myTable
VALUES
(3, 'fred', 'John-email');
INSERT @myTable
VALUES
(4, 'fred', 'fred-email');
INSERT @myTable
VALUES
(5, 'sam', 'sam-email');
INSERT @myTable
VALUES
(6, 'sam', 'sam-email');
WITH cte
AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rowNum,
*
FROM @myTable)
SELECT c1.id,
c1.name,
c1.email
FROM cte AS c1
WHERE 1 <
(
SELECT COUNT(c2.rowNum)
FROM cte AS c2
WHERE c1.name = c2.name
AND c1.email = c2.email
);
评论
name