使用开窗函数对数据分层,然后取出记录的开始结束时间
有如下表t_start_end_time,记录了每种分类的开始时间和结束时间,注意开始时间小于上一条记录的结束时间时,要整合成一条记录
type | start_time | end_time |
---|---|---|
A | 2024-01-01 00:12:56 | 2024-01-01 00:14:56 |
A | 2024-01-01 00:13:40 | 2024-01-01 00:15:40 |
A | 2024-01-01 00:15:22 | 2024-01-01 00:17:22 |
A | 2024-01-01 09:20:22 | 2024-01-01 09:22:22 |
B | 2024-01-01 00:13:26 | 2024-01-01 00:16:26 |
B | 2024-01-01 00:16:20 | 2024-01-01 00:19:20 |
要求得到的结果如下:
type | start_time | end_time |
---|---|---|
A | 2024-01-01 00:12:56 | 2024-01-01 00:17:22 |
A | 2024-01-01 09:20:22 | 2024-01-01 09:22:22 |
B | 2024-01-01 00:13:26 | 2024-01-01 00:19:20 |
使用开窗函数lag,判断数据有没有跳变,然后再有跳变的位置打上标记,再针对标记求累加和,就可以对数据分层
select type,level,min(start_time) as start_time,max(end_time) as end_time
from(
-- 如果上条记录的结束时间大于等于当前记录的开始时间则说明没有跳变,标记为0,否则标记为1,然后累加和对其分层
select *
,sum(case when end_time_pre is null or start_time <= end_time_pre then 0 else 1 end) over(partition by type order by start_time) as level
from(
-- 取上一条记录的结束时间
select *
,lag(end_time,1) over(partition by type order by start_time) as end_time_pre
from t_start_end_time
) t
) t
group by type,level
数据表
CREATE TABLE `t_start_end_time` (
`type` char(1),
`start_time` datetime,
`end_time` datetime
);
INSERT INTO test.t_start_end_time (`type`,start_time,end_time) VALUES
('A','2024-01-01 00:12:56','2024-01-01 00:14:56'),
('A','2024-01-01 00:13:40','2024-01-01 00:15:40'),
('A','2024-01-01 00:15:22','2024-01-01 00:17:22'),
('A','2024-01-01 09:20:22','2024-01-01 09:22:22'),
('B','2024-01-01 00:13:26','2024-01-01 00:16:26'),
('B','2024-01-01 00:16:20','2024-01-01 00:19:20');