Create flag column after comparing two tables

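Asked by: sushi  Asked: 5/23/2019  Last edited by: Salman A  Updated: 5/23/2019  Views: 1127

Q:

I have two different report tables, each recording a date/time and the report's author. I want to select the people who have written a report at least once. I also need a calculated field showing which report number they wrote. Report 1 takes priority: if someone has written report 1 at any point, the new report_number column should show 1, otherwise 2 (for report 2).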

'people' table
| person_id | full_name
--------------------------
| 1         | John L Smith
| 2         | Carl M Selt
| 3         | Another Person

'report_1' table
| report_1_id | author_person_id | date_entered | other_columns
---------------------------------------------------------------
| 1           | 1                | 2018-01-12   | foo
| 2           | 1                | 2018-02-18   | foo foo

'report_2' table
| report_2_id | author_person_id | date_entered | other_columns
---------------------------------------------------------------
| 1           | 1                | 2018-03-21   | bar
| 2           | 1                | 2018-03-28   | bar bar
| 3           | 2                | 2018-04-16   | baz
| 4           | 2                | 2018-04-30   | baz baz

Expected result:

| full_name    | report_number
---------------------------
| John L Smith | 1
| Carl M Selt  | 2

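Note that John's report_number is 1 even though he also authored a report 2.

Report 1 and report 2 have different additional columns, even though they look the same above.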
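
To reproduce the sample data above, a setup script along these lines should work. This is only a sketch: the column types are assumptions (the question does not state them), and other_columns stands in for the report-specific columns.

```sql
-- Assumed column types; the question does not specify them.
-- people has no primary key here because the question mentions duplicate entries.
CREATE TABLE people (
    person_id INT NOT NULL,
    full_name VARCHAR(100) NULL
);

CREATE TABLE report_1 (
    report_1_id      INT NOT NULL PRIMARY KEY,
    author_person_id INT NOT NULL,
    date_entered     DATE NOT NULL,
    other_columns    VARCHAR(100) NULL
);

CREATE TABLE report_2 (
    report_2_id      INT NOT NULL PRIMARY KEY,
    author_person_id INT NOT NULL,
    date_entered     DATE NOT NULL,
    other_columns    VARCHAR(100) NULL
);

INSERT INTO people (person_id, full_name) VALUES
    (1, 'John L Smith'),
    (2, 'Carl M Selt'),
    (3, 'Another Person');

INSERT INTO report_1 (report_1_id, author_person_id, date_entered, other_columns) VALUES
    (1, 1, '2018-01-12', 'foo'),
    (2, 1, '2018-02-18', 'foo foo');

INSERT INTO report_2 (report_2_id, author_person_id, date_entered, other_columns) VALUES
    (1, 1, '2018-03-21', 'bar'),
    (2, 1, '2018-03-28', 'bar bar'),
    (3, 2, '2018-04-16', 'baz'),
    (4, 2, '2018-04-30', 'baz baz');
```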

What I've tried:

/* Get people from both reports */
WITH report_1_people AS (
    SELECT P.full_name
    FROM report_1 R1
    INNER JOIN people P ON R1.author_person_id = P.person_id
    WHERE P.full_name IS NOT NULL 
    AND P.full_name <> ''
), report_2_people AS (
    SELECT P2.full_name
    FROM report_2 R2
    INNER JOIN people P2 ON R2.author_person_id = P2.person_id
    WHERE P2.full_name IS NOT NULL 
    AND P2.full_name <> ''
)
SELECT 
    P.full_name,
    CASE WHEN P.full_name IN ( /* Check if in report 1 */
                    SELECT full_name
                    FROM report_1_people)
         THEN 1
         ELSE 2
    END AS report_number
FROM people P
WHERE P.full_name IS NOT NULL AND P.full_name <> ''
/* Eliminate duplicate names */
GROUP BY P.full_name 
/* Filter only who either authored report 1 or report 2 */
HAVING P.full_name IN (SELECT full_name
                       FROM report_1_people)
OR P.full_name IN (SELECT full_name
                   FROM report_2_people)

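Note: there is a GROUP BY on the people table because, for some reason, it contains duplicate entries.

The query ran for so long (24+ hours) that it got disconnected from the database, so I assume I'm doing something wrong. Is there a better way to build this flag/calculated column from two tables? I'm relatively new to SQL, so I wonder whether there is another way of thinking about the logic that I'm overlooking.

sql-server performance t-sql sql-server-2017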

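Comments

0 votes SS_DBA 5/23/2019
In your CTE queries, use person_id instead. Then, in your Select, LEFT JOIN your 2 CTE tables to the people table. Remove the Group By and Having... Remove the Case statement and replace it with ISNULL(CTE1, CTE2) ReportID. See what you get then.
0 votes TT. 5/23/2019
The HAVING clause with the check on P.full_name is odd, since you're already grouping on P.full_name. Move the checks to the WHERE clause. Then the OR check for whether they're in either table can be rewritten as an IN check against a UNION of the two queries.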
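
One possible reading of SS_DBA's suggestion, as a sketch rather than what the commenter necessarily had in mind: the CTEs carry the author's person_id instead of full_name, and both are LEFT JOINed to people. The CTE names r1_authors and r2_authors are hypothetical, and the final WHERE (dropping people who wrote neither report) and the DISTINCT (standing in for the removed GROUP BY) are added assumptions.

```sql
-- r1_authors / r2_authors are hypothetical names for the two CTEs.
WITH r1_authors AS (
    SELECT DISTINCT author_person_id, 1 AS report_number
    FROM report_1
), r2_authors AS (
    SELECT DISTINCT author_person_id, 2 AS report_number
    FROM report_2
)
SELECT DISTINCT
    P.full_name,
    -- Report 1 wins when present; otherwise fall back to report 2.
    ISNULL(A1.report_number, A2.report_number) AS report_number
FROM people P
LEFT JOIN r1_authors A1 ON A1.author_person_id = P.person_id
LEFT JOIN r2_authors A2 ON A2.author_person_id = P.person_id
-- Assumption: people who authored neither report should not be returned.
WHERE A1.author_person_id IS NOT NULL
   OR A2.author_person_id IS NOT NULL;
```

If the same full_name appears under several person_id values with different report numbers, DISTINCT will return one row per (name, number) pair, so a GROUP BY with MIN may still be needed in that case.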

Answers:

1 vote Salman A 5/23/2019 #1

You can use OUTER APPLY:

SELECT person_id, full_name, COALESCE(ca1.report_num, ca2.report_num)
FROM people
OUTER APPLY (SELECT TOP (1) 1 FROM report_1 WHERE author_person_id = people.person_id) AS ca1(report_num)
OUTER APPLY (SELECT TOP (1) 2 FROM report_2 WHERE author_person_id = people.person_id) AS ca2(report_num)

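db<>fiddle demo

Comments

0 votes sushi 5/23/2019
Thanks for the suggestion! This returns a lot of NULL values in the new column, though, because it does an OUTER APPLY (which I believe works like a LEFT JOIN). I hadn't heard of APPLY before, so I'll look into whether it's a feature I should be using more often.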
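
The NULL rows mentioned in the comment come from people who authored neither report. A minimal sketch of one way to drop them, keeping the answer's OUTER APPLY shape and adding a filter (the table alias p and the column alias report_number are additions, not part of the original answer):

```sql
SELECT p.person_id,
       p.full_name,
       COALESCE(ca1.report_num, ca2.report_num) AS report_number
FROM people AS p
OUTER APPLY (SELECT TOP (1) 1 FROM report_1 WHERE author_person_id = p.person_id) AS ca1(report_num)
OUTER APPLY (SELECT TOP (1) 2 FROM report_2 WHERE author_person_id = p.person_id) AS ca2(report_num)
-- Keep only people who authored at least one report.
WHERE COALESCE(ca1.report_num, ca2.report_num) IS NOT NULL;
```
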
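1 vote sticky bit 5/23/2019 #2

The INs on the CTEs are most likely what kills it.

An alternative is to use EXISTS to check whether a person wrote a report. A CASE expression can handle the priority.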

SELECT p.full_name,
       CASE
         WHEN EXISTS (SELECT *
                      FROM report_1 r1
                      WHERE r1.author_person_id = p.person_id) THEN 1
         WHEN EXISTS (SELECT *
                      FROM report_2 r2
                      WHERE r2.author_person_id = p.person_id) THEN 2
       END AS report_number
FROM people p
WHERE EXISTS (SELECT *
              FROM report_1 r1
              WHERE r1.author_person_id = p.person_id)
   OR EXISTS (SELECT *
              FROM report_2 r2
              WHERE r2.author_person_id = p.person_id);

For performance, try putting indexes on report_1 (author_person_id) and report_2 (author_person_id). For people, you might try an index on person_id (which probably already exists) or a composite index on person_id and full_name.
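
As a sketch, the suggested indexes could be created like this. The index names are hypothetical, and the composite index assumes full_name is a type that can be used as an index key column (for example, not VARCHAR(MAX)).

```sql
-- Index names are hypothetical; adjust to local naming conventions.
CREATE INDEX IX_report_1_author_person_id ON report_1 (author_person_id);
CREATE INDEX IX_report_2_author_person_id ON report_2 (author_person_id);

-- Composite index so the query can read person_id and full_name
-- from the index without touching the base table.
CREATE INDEX IX_people_person_id_full_name ON people (person_id, full_name);
```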

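Comments

0 votes sushi 5/23/2019
This is great! I hadn't considered this approach because I assumed correlated subqueries would be slower. It appears to be the fastest of the suggested solutions. I removed the second EXISTS check from the CASE expression, since the two checks in the WHERE clause at the bottom already ensure the person authored either report 1 or report 2.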
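
The simplification described in the comment would presumably look something like the following. This is a reconstruction, not the commenter's actual query: the second WHEN EXISTS becomes a plain ELSE 2, which is safe only because the WHERE clause already guarantees at least one report.

```sql
SELECT p.full_name,
       CASE
         WHEN EXISTS (SELECT *
                      FROM report_1 r1
                      WHERE r1.author_person_id = p.person_id) THEN 1
         -- Every remaining row passed the WHERE clause via report_2.
         ELSE 2
       END AS report_number
FROM people p
WHERE EXISTS (SELECT *
              FROM report_1 r1
              WHERE r1.author_person_id = p.person_id)
   OR EXISTS (SELECT *
              FROM report_2 r2
              WHERE r2.author_person_id = p.person_id);
```
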
0 votes Luis Cazares 5/23/2019 #3

This is just another way to get the result.

SELECT 
    P.full_name,
    MIN( R.Report_Number) AS report_number
FROM people P
OUTER APPLY (SELECT 1 WHERE EXISTS(SELECT * FROM report_1 R1 WHERE R1.author_person_id = P.person_id)
             UNION ALL
             SELECT 2 WHERE EXISTS(SELECT * FROM report_2 R2 WHERE R2.author_person_id = P.person_id)) AS R(Report_Number)
WHERE P.full_name IS NOT NULL AND P.full_name <> ''
/* Eliminate duplicate names */
GROUP BY P.full_name;
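
Note that OUTER APPLY keeps people who wrote neither report, with a NULL report_number. A variant of the query above (a sketch, not part of the original answer) swaps in CROSS APPLY, which drops any person for whom the applied subquery returns no rows:

```sql
SELECT
    P.full_name,
    MIN(R.Report_Number) AS report_number
FROM people P
-- CROSS APPLY returns a person only if at least one branch below yields a row.
CROSS APPLY (SELECT 1 WHERE EXISTS(SELECT * FROM report_1 R1 WHERE R1.author_person_id = P.person_id)
             UNION ALL
             SELECT 2 WHERE EXISTS(SELECT * FROM report_2 R2 WHERE R2.author_person_id = P.person_id)) AS R(Report_Number)
WHERE P.full_name IS NOT NULL AND P.full_name <> ''
/* Eliminate duplicate names */
GROUP BY P.full_name;
```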