提问人:Volodymyr 提问时间:7/19/2023 最后编辑:marc_sVolodymyr 更新时间:7/20/2023 访问量:50
MySQL 用户事件、操作和会话序列
MySQL user events actions and sessions sequence
问:
我在 MySQL 上有数据库 - 包含用户操作日志的数据集。我们有下一个数据收集逻辑 - 每个用户都有自己的user_id,我们在用户会话期间记录事件操作的日志。
user_id | session_id | 日期时间 | 事件 |
---|---|---|---|
1 | 机 管 局 | 2023-01-01 13:12:11 | 登录 |
1 | 机 管 局 | 2023-01-01 14:12:10 | 买 |
1 | BB型 | 2023-01-02 11:12:10 | 页 |
2 | 抄送 | 2023-01-01 10:11:01 | 登录 |
2 | GG系列 | 2023-01-03 11:12:11 | 注销 |
2 | GG系列 | 2023-01-03 13:11:03 | 点击 |
2 | GG系列 | 2023-01-03 14:10:07 | 注销 |
主要目标是使用原始数据准备 MySQL 查询,并添加 2 个新的计算列:一个是每个用户会话期间的事件操作序列,另一个是 SQL 所有用户生存期内的会话序列。
预期输出:
user_id | session_id | 日期时间 | 事件 | event_seq | session_seq |
---|---|---|---|---|---|
1 | 机 管 局 | 2023-01-01 13:12:11 | 登录 | 1 | 1 |
1 | 机 管 局 | 2023-01-01 14:12:10 | 买 | 2 | 1 |
1 | BB型 | 2023-01-02 11:12:10 | 页 | 1 | 2 |
2 | 抄送 | 2023-01-01 10:11:01 | 登录 | 1 | 1 |
2 | GG系列 | 2023-01-03 11:12:11 | 注销 | 1 | 2 |
2 | GG系列 | 2023-01-03 13:11:03 | 点击 | 2 | 2 |
2 | GG系列 | 2023-01-03 14:10:07 | 注销 | 3 | 2 |
答:
0赞
Bernd Buffen
7/19/2023
#1
您可以使用如下查询:
SELECT ds.*,
dense_rank() over (PARTITION BY user_id,session_id order BY session_id,`dateTime` ) AS event_seq,
dense_rank() over (PARTITION BY user_id order BY user_id,session_id ) AS session_seq
FROM dataset AS ds
ORDER BY user_id,session_id,`dateTime`;
样本
mysql> SELECT * FROM dataset;
+----+---------+------------+---------------------+--------+
| id | user_id | session_id | dateTime | event |
+----+---------+------------+---------------------+--------+
| 1 | 1 | aa | 2023-01-01 13:12:11 | login |
| 2 | 1 | aa | 2023-01-01 14:12:10 | buy |
| 3 | 1 | bb | 2023-01-02 11:12:10 | page |
| 4 | 2 | cc | 2023-01-01 10:11:01 | login |
| 5 | 2 | gg | 2023-01-03 11:12:11 | logout |
| 6 | 2 | gg | 2023-01-03 13:11:03 | click |
| 7 | 2 | gg | 2023-01-03 14:10:07 | logout |
+----+---------+------------+---------------------+--------+
7 rows in set (0.00 sec)
mysql> SELECT ds.*,
-> dense_rank() over (PARTITION BY user_id,session_id order BY session_id,`dateTime` ) AS event_seq,
-> dense_rank() over (PARTITION BY user_id order BY user_id,session_id ) AS session_seq
-> FROM dataset AS ds
-> ORDER BY user_id,session_id,`dateTime`;
+----+---------+------------+---------------------+--------+-----------+-------------+
| id | user_id | session_id | dateTime | event | event_seq | session_seq |
+----+---------+------------+---------------------+--------+-----------+-------------+
| 1 | 1 | aa | 2023-01-01 13:12:11 | login | 1 | 1 |
| 2 | 1 | aa | 2023-01-01 14:12:10 | buy | 2 | 1 |
| 3 | 1 | bb | 2023-01-02 11:12:10 | page | 1 | 2 |
| 4 | 2 | cc | 2023-01-01 10:11:01 | login | 1 | 1 |
| 5 | 2 | gg | 2023-01-03 11:12:11 | logout | 1 | 2 |
| 6 | 2 | gg | 2023-01-03 13:11:03 | click | 2 | 2 |
| 7 | 2 | gg | 2023-01-03 14:10:07 | logout | 3 | 2 |
+----+---------+------------+---------------------+--------+-----------+-------------+
7 rows in set (0.00 sec)
mysql>
注意:查看您的MySQL版本是否支持WINDOW函数
0赞
Israt
7/19/2023
#2
您可以使用窗口函数来实现此目的。
WITH session_views
AS (SELECT user_id,
session_id,
Row_number()
OVER (
partition BY user_id) session_seq
FROM user_actions
GROUP BY user_id,
session_id)
SELECT user_actions.user_id,
user_actions.session_id,
user_actions.dateTime,
user_actions.event,
Row_number()
OVER (
partition BY user_actions.session_id) event_seq,
session_views.session_seq,
FROM user_actions
LEFT JOIN session_views
ON session_views.session_id = user_actions.session_id and session_views.user_id = user_actions.user_id;
0赞
Albina
7/19/2023
#3
SELECT logs.*,
ROW_NUMBER() OVER (PARTITION BY user_id, session_id) as event_seq,
DENSE_RANK() OVER (PARTITION BY user_id ORDER BY user_id, session_id) AS session_seq
FROM logs;
您可以使用窗口函数和:ROW_NUMBER()
DENSE_RANK()
ROW_NUMBER()
枚举按列分组的每个分区内的行user_id, session_id
DENSE_RANK()
因此,对每个分区内的行进行排名,该分区按列分组。顺序在表达式中确定。user_id
ORDER BY user_id, session_id
文档。
评论