如何从列表创建接触点表

How to Create a Touchpoint Table from a list

提问人:Wagner André Yamada Vieira 提问时间:1/9/2023 最后编辑:Wagner André Yamada Vieira 更新时间:1/9/2023 访问量:53

问:

我正在 Azure Databricks 环境中处理具有以下数据集的 SQL 查询:

CREATE OR REPLACE TABLE touchpoints_table
(
  List  STRING,
  Path_Lenght INT
);

INSERT INTO touchpoints_table VALUES
  ('BBB, AAA, CCC', 3),  
  ('BBB', 1),
  ('DDD, AAA', 2), 
  ('DDD, BBB, AAA, EEE, CCC', 5), 
  ('EEE, AAA, EEE, CCC', 4);
  
SELECT * FROM touchpoints_table
|     |             List           |   Path_length   | 
|  0  | BBB, AAA, CCC              |       3         |  
|  1  | CCC                        |       1         |   
|  2  | DDD, AAA                   |       2         |   
|  3  | DDD, BBB, AAA, EEE, CCC    |       5         |  
|  4  | EEE, AAA, EEE, CCC         |       4         |

该任务包括生成下表:

|     |   Content       |   Unique    |  Started  |  Middleway  |  Finished  |
|  0  |     AAA         |     0       |     0     |      3      |      1     | 
|  1  |     BBB         |     0       |     1     |      1      |      0     |
|  2  |     CCC         |     1       |     0     |      0      |      3     |
|  3  |     DDD         |     0       |     2     |      0      |      0     |
|  4  |     EEE         |     0       |     1     |      2      |      0     |

其中,列包含以下内容:

  • 内容在列表中找到的元素
  • 唯一:元素在列表中单独出现的次数
  • Started:元素在开头出现的次数
  • Finished:元素在末尾出现的次数
  • 中间:元素在开始和结束之间出现的次数。

使用以下查询,我几乎得到了结果,但不知何故,分组无法正常工作

WITH tb1 AS(
  SELECT 
    CAST(touch_array AS STRING) AS touch_list,
    EXPLODE(touch_array) AS explode_list,
    ROW_NUMBER()OVER(PARTITION BY CAST(touch_array AS STRING) ORDER BY (SELECT 1)) touch_count,
    COUNT(*)OVER(PARTITION BY touch_array) touch_lenght
  FROM (SELECT SPLIT(List, ',') AS touch_array FROM touchpoints_table) 
  )
  SELECT
     explode_list AS Content,
     SUM(CASE WHEN touch_lenght=1 THEN 1 ELSE 0 END) AS Unique,
     SUM(CASE WHEN touch_count=1 AND touch_lenght > 1 THEN 1 ELSE 0 END) AS Started,
     SUM(CASE WHEN touch_count>1 AND touch_count < touch_lenght THEN 1 ELSE 0 END) AS Middleway,
     SUM(CASE WHEN touch_count>1 AND touch_count = touch_lenght THEN 1 ELSE 0 END) AS Finished  
  FROM tb1 
  GROUP BY explode_list
  ORDER BY explode_list    
|     |   Content       |   Unique    |  Started  |  Middleway  |  Finished  |
|  0  |     AAA         |     0       |     0     |      3      |      1     | 
|  1  |     BBB         |     0       |     0     |      1      |      0     |
|  2  |     CCC         |     0       |     0     |      0      |      3     |
|  3  |     EEE         |     0       |     0     |      2      |      0     |
|  4  |     BBB         |     1       |     1     |      0      |      0     |
|  5  |     DDD         |     0       |     2     |      0      |      0     |
|  6  |     EEE         |     0       |     1     |      0      |      0     |

你能帮我建议一个解决这个任务的代码吗?

DataFrame 数据分析 Databricks-SQL

评论


答:

1赞 ValNik 1/9/2023 #1

SQL Server 的查询示例

with allElements as(
  select list ,el,elN,elQty
  from touchpoints_table tp 
  cross apply (select trim(value) as el,row_number()over(order by (select 1)) elN 
                ,count(*)over() elQty
                from string_split(tp.list,',')
              ) t
)
select el
   ,sum(case when elQty=1 then 1 else 0 end) as 'unique'
   ,sum(case when elN=1 and elQty>1 then 1 else 0 end) as 'strated'
   ,sum(case when elN>1 and elN<elQty then 1 else 0 end) as 'middleway'
   ,sum(case when elN>1 and elN=elQty then 1 else 0 end) as 'finished'
from allElements
group by el
order by el

演示

评论

0赞 Wagner André Yamada Vieira 1/9/2023
我尝试为 Databricks SQL 编写代码,但我的组有问题,你知道我的查询可能有什么问题吗?我更新了问题。
0赞 ValNik 1/10/2023
我想测试这个假设:拆分列表,如“DDD、BBB、AAA、EEE、CCC”和“BBB、AAA、CCC”
0赞 ValNik 1/10/2023
爆炸列表:'DDD',' BBB', 'AAA', 'EEE', 'CCC','BBB' .删除(修剪)空格。dbfiddle.uk/2JJfKZxY
0赞 ValNik 1/10/2023
也许用 trim(BOTH FROM touch_array) 替换 EXPLODE(touch_array) 就足够了。您的touch_array有 1 个元素。