提问人:Wagner André Yamada Vieira 提问时间:1/9/2023 最后编辑:Wagner André Yamada Vieira 更新时间:1/9/2023 访问量:53
如何从列表创建接触点表
How to Create a Touchpoint Table from a list
问:
我正在 Azure Databricks 环境中处理具有以下数据集的 SQL 查询:
CREATE OR REPLACE TABLE touchpoints_table
(
List STRING,
Path_Lenght INT
);
INSERT INTO touchpoints_table VALUES
('BBB, AAA, CCC', 3),
('BBB', 1),
('DDD, AAA', 2),
('DDD, BBB, AAA, EEE, CCC', 5),
('EEE, AAA, EEE, CCC', 4);
SELECT * FROM touchpoints_table
| | List | Path_length |
| 0 | BBB, AAA, CCC | 3 |
| 1 | CCC | 1 |
| 2 | DDD, AAA | 2 |
| 3 | DDD, BBB, AAA, EEE, CCC | 5 |
| 4 | EEE, AAA, EEE, CCC | 4 |
该任务包括生成下表:
| | Content | Unique | Started | Middleway | Finished |
| 0 | AAA | 0 | 0 | 3 | 1 |
| 1 | BBB | 0 | 1 | 1 | 0 |
| 2 | CCC | 1 | 0 | 0 | 3 |
| 3 | DDD | 0 | 2 | 0 | 0 |
| 4 | EEE | 0 | 1 | 2 | 0 |
其中,列包含以下内容:
- 内容:在列表中找到的元素
- 唯一:元素在列表中单独出现的次数
- Started:元素在开头出现的次数
- Finished:元素在末尾出现的次数
- 中间:元素在开始和结束之间出现的次数。
使用以下查询,我几乎得到了结果,但不知何故,分组无法正常工作
WITH tb1 AS(
SELECT
CAST(touch_array AS STRING) AS touch_list,
EXPLODE(touch_array) AS explode_list,
ROW_NUMBER()OVER(PARTITION BY CAST(touch_array AS STRING) ORDER BY (SELECT 1)) touch_count,
COUNT(*)OVER(PARTITION BY touch_array) touch_lenght
FROM (SELECT SPLIT(List, ',') AS touch_array FROM touchpoints_table)
)
SELECT
explode_list AS Content,
SUM(CASE WHEN touch_lenght=1 THEN 1 ELSE 0 END) AS Unique,
SUM(CASE WHEN touch_count=1 AND touch_lenght > 1 THEN 1 ELSE 0 END) AS Started,
SUM(CASE WHEN touch_count>1 AND touch_count < touch_lenght THEN 1 ELSE 0 END) AS Middleway,
SUM(CASE WHEN touch_count>1 AND touch_count = touch_lenght THEN 1 ELSE 0 END) AS Finished
FROM tb1
GROUP BY explode_list
ORDER BY explode_list
| | Content | Unique | Started | Middleway | Finished |
| 0 | AAA | 0 | 0 | 3 | 1 |
| 1 | BBB | 0 | 0 | 1 | 0 |
| 2 | CCC | 0 | 0 | 0 | 3 |
| 3 | EEE | 0 | 0 | 2 | 0 |
| 4 | BBB | 1 | 1 | 0 | 0 |
| 5 | DDD | 0 | 2 | 0 | 0 |
| 6 | EEE | 0 | 1 | 0 | 0 |
你能帮我建议一个解决这个任务的代码吗?
答:
1赞
ValNik
1/9/2023
#1
SQL Server 的查询示例
with allElements as(
select list ,el,elN,elQty
from touchpoints_table tp
cross apply (select trim(value) as el,row_number()over(order by (select 1)) elN
,count(*)over() elQty
from string_split(tp.list,',')
) t
)
select el
,sum(case when elQty=1 then 1 else 0 end) as 'unique'
,sum(case when elN=1 and elQty>1 then 1 else 0 end) as 'strated'
,sum(case when elN>1 and elN<elQty then 1 else 0 end) as 'middleway'
,sum(case when elN>1 and elN=elQty then 1 else 0 end) as 'finished'
from allElements
group by el
order by el
评论
0赞
Wagner André Yamada Vieira
1/9/2023
我尝试为 Databricks SQL 编写代码,但我的组有问题,你知道我的查询可能有什么问题吗?我更新了问题。
0赞
ValNik
1/10/2023
我想测试这个假设:拆分列表,如“DDD、BBB、AAA、EEE、CCC”和“BBB、AAA、CCC”
0赞
ValNik
1/10/2023
爆炸列表:'DDD',' BBB', 'AAA', 'EEE', 'CCC','BBB' .删除(修剪)空格。dbfiddle.uk/2JJfKZxY
0赞
ValNik
1/10/2023
也许用 trim(BOTH FROM touch_array) 替换 EXPLODE(touch_array) 就足够了。您的touch_array有 1 个元素。
评论