提问人:user22830709 提问时间:10/31/2023 最后编辑:Dale Kuser22830709 更新时间:10/31/2023 访问量:69
将多个表连接在一起而不创建重复项,从而弄乱计数和求和函数?
Join several tables together without creating duplicates messing with count and sum functions?
问:
数据库:
Sales.Customers(CustomerID ,CustomerName)
Sales.Orders(OrderID, CustomerID)
Sales.OrderLines(OrderID, Quantity, UnitPrice)
Sales.Invoices(InvoiceID, OrderID, CustomerID)
Sales.InvoiceLines(InvoiceID, Quantity, UnitPrice)
目标是生成以下输出:
column1: list CustomerID (only orders converted into invoices)
Column2: list of CustomerName
Column3: number of orders by customer ID (only orders converted into invoices)
Column4: number of invoices by customer ID
Column5: sum (Quantity*UnitPrice) of orders by CustomerID (only orders converted into invoices)
Column6: sum (Quantity*UnitPrice) of invoices by Customer ID
Column7: the difference between column 4 and column 5
我设法加入了前 4 个表并得到了预期的结果
Select
c.CustomerID
, c.CustomerName
, COUNT(DISTINCT(o.OrderID)) AS [number orders]
, COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
, SUM(il.Quantity * il.UnitPrice) AS [TOTAL invoices]
FROM Sales.Invoices AS i
INNER JOIN Sales.InvoiceLines AS il
ON il.InvoiceID = i.InvoiceID
INNER JOIN Sales.Orders AS o
ON o.orderID = i.OrderID
INNER JOIN Sales.Customers AS c
ON c.CustomerID = i.CustomerID
GROUP BY c.CustomerID, c.CustomerName
ORDER BY c.CustomerID
但是,一旦我加入最后一个列,COUNT 列仍然正确,但两个 SUM 列都没有返回预期数字(预期 SUM 的 1.6 倍)。
返回不正确 SUM 的查询:
Select
c.CustomerID
, c.CustomerName
, COUNT(DISTINCT(o.OrderID)) AS [number orders]
, COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
, SUM(DISTINCT(il.Quantity * il.UnitPrice)) AS [TOTAL invoices]
, SUM(DISTINCT(ol.Quantity * ol.UnitPrice)) AS [TOTAL orders]
FROM Sales.Invoices AS i
LEFT JOIN Sales.InvoiceLines AS il
ON il.InvoiceID = i.InvoiceID
LEFT JOIN Sales.Orders AS o
ON o.orderID = i.OrderID
LEFT JOIN Sales.OrderLines AS ol
ON ol.OrderID = o.OrderID
LEFT JOIN Sales. Customers AS c
ON c.CustomerID = i.CustomerID
GROUP BY c.CustomerID, c.CustomerName
ORDER BY c.CustomerID
我尝试了不同类型的连接、不同的顺序、子查询和除法查询......但总是以一些错误或不正确的结果告终。 任何帮助将不胜感激!
答:
在上一个查询中,您没有与 相关。因此,您实际上是在这两个表之间进行交叉连接。您可以将查询更改为 ,删除所有聚合和分组依据子句,检查结果,您将看到它。InvoiceLines
OrderLines
SELECT *
根据您的要求,使用派生表或 cte 更容易做到这一点。在派生表中对 or 执行必要的计算,然后在外部查询上将其联接到表中Orders
Invoices
Customers
Select
c.CustomerID
, c.CustomerName
, i.[number invoices]
, i.[TOTAL invoices]
, o.[number orders]
, o.[TOTAL orders]
FROM Sales.Customers AS c
LEFT JOIN
(
SELECT i.CustomerID
, COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
, SUM(il.Quantity * il.UnitPrice) AS [TOTAL invoices]
FROM Sales.Invoices AS i
INNER JOIN Sales.InvoiceLines AS il
ON il.InvoiceID = i.InvoiceID
GROUP BY i.CustomerID
) i ON i.CustomerID = c.CustomerID
LEFT JOIN
(
SELECT o.CustomerID
, COUNT(DISTINCT(o.OrderID)) AS [number orders]
, SUM(ol.Quantity * ol.UnitPrice) AS [TOTAL orders]
FROM Sales.Orders AS o
INNER JOIN Sales.OrderLines AS ol
ON ol.OrderID = o.OrderID
GROUP BY o.CustomerID
) o ON o.CustomerID = c.CustomerID
评论
SUM(DISTINCT(
毫无意义。而且现在发票和订单之间根本没有相关性,这不是 OP 想要的。
要进行故障排除,请删除所有组依据和总和,选择一个订单(使用 where 子句)并实际查看详细数据。
这都是基本问题解决过程的一部分。你无法解决问题,看摘要,你需要看细节。
首先,你有“重复计算”
一个订单可能有一张发票和三个发票行。当你加入这些时,你会得到三条记录。该订单可能还有两个订单行。当您将这三条记录连接到两个订单行时,您将获得六 (3x2) 条记录。因此,现在当您汇总行或发票时,您有太多的记录,并且这个数字被夸大了。
其次,这绝对是不正确的
SUM(DISTINCT(ol.Quantity * ol.UnitPrice))
如果 7 个完全不相关的订单行具有相同的值。它会扔掉其中的六个。这绝对是不正确的。
你不能仅仅通过连接所有表格来回答你需要的问题。
您需要执行一个单独的子表(或 CTE)查询,按 customerid 汇总订单价值,然后才能加入该查询。
-- this dataset is unique on customer id
-- so won't double count when joined to any other dataset
-- that is unique on customerid
with ordersummary as (
select o.customerid, SUM(ol.Quantity * ol.UnitPrice) as ordervalue
from Sales.Orders AS o
INNER JOIN Sales.OrderLines AS ol
ON ol.OrderID = o.OrderID
group by customerid
),
-- this dataset is also unique on customer id
-- so won't double count when joined to any other dataset
-- that is unique on customerid
invoicesummary as
(
Select
c.CustomerID
, c.CustomerName
, COUNT(DISTINCT(o.OrderID)) AS [number orders]
, COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
, SUM(il.Quantity * il.UnitPrice) AS [TOTAL invoices]
FROM Sales.Invoices AS i
INNER JOIN Sales.InvoiceLines AS il
ON il.InvoiceID = i.InvoiceID
INNER JOIN Sales.Orders AS o
ON o.orderID = i.OrderID
INNER JOIN Sales.Customers AS c
ON c.CustomerID = i.CustomerID
GROUP BY c.CustomerID, c.CustomerName
)
-- now we use our CTE's
-- and join on thoe unique customerid value. No double counting.
SELECT
invoicesummary.CustomerID
, invoicesummary.CustomerName
, invoicesummary.[number orders]
, invoicesummary.[number invoices]
, invoicesummary.[TOTAL invoices]
, ordersummary.ordervalue
FROM invoicesummary
INNER JOIN ordersummary
ON ordersummary.customerid=invoicesummary.Customerid
ORDER BY invoicesummary.CustomerID
您需要分别预汇总订单和发票。你不能同时聚合它们,否则你会得到一个巨大的交叉联接。
由于您只需要包含订单的发票,因此您需要汇总每张发票中的订单,然后汇总所有发票。
向某物投掷 DISTINCT
不是解决方案。首先,您需要考虑为什么会出现重复项。
SELECT
c.CustomerID
, c.CustomerName
, i.NumberOrders
, i.NumberInvoices
, i.TotalInvoices
, i.TotalOrders
FROM Sales.Customers AS c
LEFT JOIN (
SELECT
i.CustomerID,
COUNT(*) AS NumberInvoices,
SUM(il.Quantity * il.UnitPrice) AS TotalInvoices,
SUM(o.NumberOfOrders) AS NumberOfOrders,
SUM(o.TotalOrders) AS TotalOrders
FROM Sales.Invoices AS i
JOIN Sales.InvoiceLines AS il ON il.InvoiceID = i.InvoiceID
JOIN (
SELECT
o.InvoiceID,
COUNT(*) AS NumberOrders,
SUM(ol.Quantity * ol.UnitPrice) AS TotalOrders
FROM Sales.Orders AS o
JOIN Sales.OrderLines AS ol ON ol.OrderID = o.OrderID
GROUP BY o.InvoiceID
) AS o ON o.InvoiceID = i.InvoiceID
GROUP BY i.CustomerID
) AS i ON c.CustomerID = i.CustomerID
ORDER BY
c.CustomerID;
您也可以使用 或 进行这种预聚合。子查询中的移动和分组 ID 可以从CROSS APPLY
OUTER APPLY
ON
SELECT
SELECT
c.CustomerID
, c.CustomerName
, i.NumberOrders
, i.NumberInvoices
, i.TotalInvoices
, i.TotalOrders
FROM Sales.Customers AS c
OUTER APPLY (
SELECT
COUNT(*) AS NumberInvoices,
SUM(il.Quantity * il.UnitPrice) AS TotalInvoices,
SUM(o.NumberOfOrders) AS NumberOfOrders,
SUM(o.TotalOrders) AS TotalOrders
FROM Sales.Invoices AS i
JOIN Sales.InvoiceLines AS il ON il.InvoiceID = i.InvoiceID
CROSS APPLY (
SELECT
COUNT(*) AS NumberOrders,
SUM(ol.Quantity * ol.UnitPrice) AS TotalOrders
FROM Sales.Orders AS o
JOIN Sales.OrderLines AS ol ON ol.OrderID = o.OrderID
WHERE o.InvoiceID = i.InvoiceID
GROUP BY o.InvoiceID
) AS o
WHERE c.CustomerID = i.CustomerID
GROUP BY c.CustomerID
) AS i
ORDER BY
c.CustomerID;
上一个:将行透视为具有重复 ID 的列
评论
DISTINCT
COUNT(DISTINCT o.OrderID)