提问人:Wagner André Yamada Vieira 提问时间:1/9/2023 更新时间:1/9/2023 访问量:25
在接触点表创建 (SQL) 中按问题分组
Group by Problem in Touchpoint Table Creation (SQL)
问:
我正在 Azure Databricks 环境中处理具有以下数据集的 SQL 查询:
CREATE OR REPLACE TABLE touchpoints_table
(
List STRING,
Path_Lenght INT
);
INSERT INTO touchpoints_table VALUES
('BBB, AAA, CCC', 3),
('BBB', 1),
('DDD, AAA', 2),
('DDD, BBB, AAA, EEE, CCC', 5),
('EEE, AAA, EEE, CCC', 4);
SELECT * FROM touchpoints_table
| | List | Path_length |
| 0 | BBB, AAA, CCC | 3 |
| 1 | CCC | 1 |
| 2 | DDD, AAA | 2 |
| 3 | DDD, BBB, AAA, EEE, CCC | 5 |
| 4 | EEE, AAA, EEE, CCC | 4 |
该任务包括生成下表:
| | Content | Unique | Started | Middleway | Finished |
| 0 | AAA | 0 | 0 | 3 | 1 |
| 1 | BBB | 0 | 1 | 1 | 0 |
| 2 | CCC | 1 | 0 | 0 | 3 |
| 3 | DDD | 0 | 2 | 0 | 0 |
| 4 | EEE | 0 | 1 | 2 | 0 |
其中,列包含以下内容:
- 内容:在列表中找到的元素
- 唯一:元素在列表中单独出现的次数
- Started:元素在开头出现的次数
- Finished:元素在末尾出现的次数
- 中间:元素在开始和结束之间出现的次数。
使用以下查询,我几乎得到了结果,但不知何故,分组无法正常工作
WITH tb1 AS(
SELECT
CAST(touch_array AS STRING) AS touch_list,
EXPLODE(touch_array) AS explode_list,
ROW_NUMBER()OVER(PARTITION BY CAST(touch_array AS STRING) ORDER BY (SELECT 1)) touch_count,
COUNT(*)OVER(PARTITION BY touch_array) touch_lenght
FROM (SELECT SPLIT(List, ',') AS touch_array FROM touchpoints_table)
)
SELECT
explode_list AS Content,
SUM(CASE WHEN touch_lenght=1 THEN 1 ELSE 0 END) AS Unique,
SUM(CASE WHEN touch_count=1 AND touch_lenght > 1 THEN 1 ELSE 0 END) AS Started,
SUM(CASE WHEN touch_count>1 AND touch_count < touch_lenght THEN 1 ELSE 0 END) AS Middleway,
SUM(CASE WHEN touch_count>1 AND touch_count = touch_lenght THEN 1 ELSE 0 END) AS Finished
FROM tb1
GROUP BY explode_list
ORDER BY explode_list
| | Content | Unique | Started | Middleway | Finished |
| 0 | AAA | 0 | 0 | 3 | 1 |
| 1 | BBB | 0 | 0 | 1 | 0 |
| 2 | CCC | 0 | 0 | 0 | 3 |
| 3 | EEE | 0 | 0 | 2 | 0 |
| 4 | BBB | 1 | 1 | 0 | 0 |
| 5 | DDD | 0 | 2 | 0 | 0 |
| 6 | EEE | 0 | 1 | 0 | 0 |
你能帮我建议一个解决这个任务的代码吗?
答:
0赞
George Joseph
1/9/2023
#1
这是使用 sql server 执行此操作的一种方法。
with main_data
as (
select list
,ltrim(x.value) as split_val
,x.ordinal
,case when x.ordinal=1 and tt.path_length=1 then
'unique'
when x.ordinal=1 then
'start'
when x.ordinal=(tt.path_length+1)/2 then
'middle'
when x.ordinal=tt.path_length then
'end'
end as pos
from touchpoints_table tt
CROSS APPLY STRING_SPLIT(list,',',1) x
)
select split_val
,count(case when pos='unique' then 1 end) as unique_cnt
,count(case when pos='start' then 1 end) as start_cnt
,count(case when pos='middle' then 1 end) as middle_cnt
,count(case when pos='end' then 1 end) as end_cnt
from main_data
group by split_val
+-----------+------------+-----------+------------+---------+
| split_val | unique_cnt | start_cnt | middle_cnt | end_cnt |
+-----------+------------+-----------+------------+---------+
| AAA | 0 | 0 | 3 | 1 |
| BBB | 0 | 1 | 0 | 0 |
| CCC | 1 | 0 | 0 | 3 |
| DDD | 0 | 2 | 0 | 0 |
| EEE | 0 | 1 | 0 | 0 |
+-----------+------------+-----------+------------+---------+
上一个:保持一列固定并移位其他列
下一个:如何从列表创建接触点表
评论