将多个表连接在一起而不创建重复项,从而弄乱计数和求和函数?

Join several tables together without creating duplicates messing with count and sum functions?

提问人:user22830709 提问时间:10/31/2023 最后编辑:Dale Kuser22830709 更新时间:10/31/2023 访问量:69

问:

数据库:

Sales.Customers(CustomerID ,CustomerName) 
Sales.Orders(OrderID, CustomerID) 
Sales.OrderLines(OrderID, Quantity, UnitPrice) 
Sales.Invoices(InvoiceID, OrderID, CustomerID) 
Sales.InvoiceLines(InvoiceID, Quantity, UnitPrice)

目标是生成以下输出:

column1: list CustomerID (only orders converted into invoices) 
Column2: list of CustomerName 
Column3: number of orders by customer ID (only orders converted into invoices) 
Column4: number of invoices by customer ID 
Column5: sum (Quantity*UnitPrice) of orders by CustomerID (only orders converted into invoices) 
Column6: sum  (Quantity*UnitPrice) of invoices by Customer ID 
Column7: the difference between column 4 and column 5

我设法加入了前 4 个表并得到了预期的结果

Select
c.CustomerID
, c.CustomerName
, COUNT(DISTINCT(o.OrderID)) AS [number orders]
, COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
, SUM(il.Quantity * il.UnitPrice) AS [TOTAL invoices]

FROM Sales.Invoices AS i
INNER JOIN Sales.InvoiceLines AS il
ON il.InvoiceID = i.InvoiceID
INNER JOIN Sales.Orders AS o
ON o.orderID = i.OrderID    
INNER JOIN Sales.Customers AS c
ON c.CustomerID = i.CustomerID

GROUP BY c.CustomerID, c.CustomerName
ORDER BY c.CustomerID

但是,一旦我加入最后一个列,COUNT 列仍然正确,但两个 SUM 列都没有返回预期数字(预期 SUM 的 1.6 倍)。

返回不正确 SUM 的查询:

Select 
        c.CustomerID
        , c.CustomerName
        , COUNT(DISTINCT(o.OrderID)) AS [number orders]
        , COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
        , SUM(DISTINCT(il.Quantity * il.UnitPrice)) AS [TOTAL invoices]
        , SUM(DISTINCT(ol.Quantity * ol.UnitPrice)) AS [TOTAL orders] 
            
FROM Sales.Invoices AS i
        LEFT JOIN Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID 
        LEFT JOIN Sales.Orders AS o
        ON o.orderID = i.OrderID    
        LEFT JOIN Sales.OrderLines AS ol
        ON ol.OrderID = o.OrderID
        LEFT JOIN Sales. Customers AS c
        ON c.CustomerID = i.CustomerID
    
GROUP BY c.CustomerID, c.CustomerName
ORDER BY c.CustomerID

我尝试了不同类型的连接、不同的顺序、子查询和除法查询......但总是以一些错误或不正确的结果告终。 任何帮助将不胜感激!

sql-server 联接

评论

0赞 Charlieface 10/31/2023
两者之间的关系尚不清楚。一张发票可以有多个订单吗,反之亦然?如果有订单,是否总是有发票,反之亦然?同样,订单和发票行如何关联?
0赞 Dale K 10/31/2023
除了您的查询外,请提供一个最小的可重现示例,其中包含示例数据和所需的结果。
0赞 jarlh 10/31/2023
请注意,这不是一个函数,它是一个集合量词,跳过那些额外的括号,简单地编写以使代码更清晰。(但是,我不认为这是正确的解决方案。DISTINCTCOUNT(DISTINCT o.OrderID)

答:

0赞 Squirrel 10/31/2023 #1

在上一个查询中,您没有与 相关。因此,您实际上是在这两个表之间进行交叉连接。您可以将查询更改为 ,删除所有聚合和分组依据子句,检查结果,您将看到它。InvoiceLinesOrderLinesSELECT *

根据您的要求,使用派生表或 cte 更容易做到这一点。在派生表中对 or 执行必要的计算,然后在外部查询上将其联接到表中OrdersInvoicesCustomers

Select 
        c.CustomerID
        , c.CustomerName
        , i.[number invoices]
        , i.[TOTAL invoices]
        , o.[number orders]           
        , o.[TOTAL orders]                 
FROM    Sales.Customers AS c
        LEFT JOIN
        (
            SELECT i.CustomerID
                   , COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
                   , SUM(il.Quantity * il.UnitPrice) AS [TOTAL invoices]
            FROM   Sales.Invoices AS i
                   INNER JOIN Sales.InvoiceLines AS il
                           ON il.InvoiceID = i.InvoiceID 
            GROUP BY i.CustomerID
        ) i                ON i.CustomerID = c.CustomerID 
        LEFT JOIN
        (
            SELECT  o.CustomerID
                  , COUNT(DISTINCT(o.OrderID)) AS [number orders]           
                  , SUM(ol.Quantity * ol.UnitPrice) AS [TOTAL orders] 
            FROM    Sales.Orders AS o
                    INNER JOIN Sales.OrderLines AS ol
                           ON ol.OrderID = o.OrderID
            GROUP BY o.CustomerID
        ) o                ON o.CustomerID = c.CustomerID

评论

0赞 Charlieface 10/31/2023
SUM(DISTINCT(毫无意义。而且现在发票和订单之间根本没有相关性,这不是 OP 想要的。
0赞 Squirrel 10/31/2023
你是对的。将编辑
0赞 Nick.Mc 10/31/2023 #2

要进行故障排除,请删除所有组依据和总和,选择一个订单(使用 where 子句)并实际查看详细数据。

这都是基本问题解决过程的一部分。你无法解决问题,看摘要,你需要看细节。

首先,你有“重复计算”

一个订单可能有一张发票和三个发票行。当你加入这些时,你会得到三条记录。该订单可能还有两个订单行。当您将这三条记录连接到两个订单行时,您将获得六 (3x2) 条记录。因此,现在当您汇总行或发票时,您有太多的记录,并且这个数字被夸大了。

其次,这绝对是不正确的

SUM(DISTINCT(ol.Quantity * ol.UnitPrice))

如果 7 个完全不相关的订单行具有相同的值。它会扔掉其中的六个。这绝对是不正确的。

你不能仅仅通过连接所有表格来回答你需要的问题。

您需要执行一个单独的子表(或 CTE)查询,按 customerid 汇总订单价值,然后才能加入该查询。

-- this dataset is unique on customer id 
-- so won't double count when joined to any other dataset 
-- that is unique on customerid
with ordersummary as (
select o.customerid, SUM(ol.Quantity * ol.UnitPrice) as ordervalue
from Sales.Orders AS o
INNER JOIN Sales.OrderLines AS ol
ON ol.OrderID = o.OrderID
group by customerid
),
-- this dataset is also unique on customer id 
-- so won't double count when joined to any other dataset 
-- that is unique on customerid
invoicesummary as 

(
Select
c.CustomerID
, c.CustomerName
, COUNT(DISTINCT(o.OrderID)) AS [number orders]
, COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
, SUM(il.Quantity * il.UnitPrice) AS [TOTAL invoices]

FROM Sales.Invoices AS i
INNER JOIN Sales.InvoiceLines AS il
ON il.InvoiceID = i.InvoiceID
INNER JOIN Sales.Orders AS o
ON o.orderID = i.OrderID    
INNER JOIN Sales.Customers AS c
ON c.CustomerID = i.CustomerID
GROUP BY c.CustomerID, c.CustomerName
)
-- now we use our CTE's
-- and join on thoe unique customerid value. No double counting.
SELECT
    invoicesummary.CustomerID
    , invoicesummary.CustomerName
    , invoicesummary.[number orders]
    , invoicesummary.[number invoices]
    , invoicesummary.[TOTAL invoices]
    , ordersummary.ordervalue 
FROM invoicesummary
    INNER JOIN ordersummary 
    ON ordersummary.customerid=invoicesummary.Customerid
ORDER BY invoicesummary.CustomerID
0赞 Charlieface 10/31/2023 #3

您需要分别预汇总订单和发票。你不能同时聚合它们,否则你会得到一个巨大的交叉联接。

由于您只需要包含订单的发票,因此您需要汇总每张发票中的订单,然后汇总所有发票。

向某物投掷 DISTINCT 不是解决方案。首先,您需要考虑为什么会出现重复项。

SELECT
        c.CustomerID
        , c.CustomerName
        , i.NumberOrders
        , i.NumberInvoices
        , i.TotalInvoices
        , i.TotalOrders
FROM Sales.Customers AS c
LEFT JOIN (
    SELECT
      i.CustomerID,
      COUNT(*) AS NumberInvoices,
      SUM(il.Quantity * il.UnitPrice) AS TotalInvoices,
      SUM(o.NumberOfOrders) AS NumberOfOrders,
      SUM(o.TotalOrders) AS TotalOrders
    FROM Sales.Invoices AS i
    JOIN Sales.InvoiceLines AS il ON il.InvoiceID = i.InvoiceID
    JOIN (
        SELECT
          o.InvoiceID,
          COUNT(*) AS NumberOrders,
          SUM(ol.Quantity * ol.UnitPrice) AS TotalOrders
        FROM Sales.Orders AS o
        JOIN Sales.OrderLines AS ol ON ol.OrderID = o.OrderID
        GROUP BY o.InvoiceID
    ) AS o ON o.InvoiceID = i.InvoiceID
    GROUP BY i.CustomerID
) AS i ON c.CustomerID = i.CustomerID
ORDER BY
  c.CustomerID;

您也可以使用 或 进行这种预聚合。子查询中的移动和分组 ID 可以从CROSS APPLYOUTER APPLYONSELECT

SELECT
        c.CustomerID
        , c.CustomerName
        , i.NumberOrders
        , i.NumberInvoices
        , i.TotalInvoices
        , i.TotalOrders
FROM Sales.Customers AS c
OUTER APPLY (
    SELECT
      COUNT(*) AS NumberInvoices,
      SUM(il.Quantity * il.UnitPrice) AS TotalInvoices,
      SUM(o.NumberOfOrders) AS NumberOfOrders,
      SUM(o.TotalOrders) AS TotalOrders
    FROM Sales.Invoices AS i
    JOIN Sales.InvoiceLines AS il ON il.InvoiceID = i.InvoiceID
    CROSS APPLY (
        SELECT
          COUNT(*) AS NumberOrders,
          SUM(ol.Quantity * ol.UnitPrice) AS TotalOrders
        FROM Sales.Orders AS o
        JOIN Sales.OrderLines AS ol ON ol.OrderID = o.OrderID
        WHERE o.InvoiceID = i.InvoiceID
        GROUP BY o.InvoiceID
    ) AS o
    WHERE c.CustomerID = i.CustomerID
    GROUP BY c.CustomerID
) AS i 
ORDER BY
  c.CustomerID;