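Problem: given a user login table, compute how many consecutive days each user has logged in to the platform, counting back from a summary date. A streak only counts if the user logged in on the summary date itself. The table holds one row per user per day, so no per-day deduplication is needed in the calculation.
Table description: user_id: the user's id;
sigin_date: the user's login date, stored as a yyyy-MM-dd string.
Note: there are several ways to solve this. The solution below is the author's approach; other solutions are welcome in the comments.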
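Approach:
The key to the problem is the login date. Compute a consecutive-login flag for every row, keep only the rows whose flag shows they belong to the streak ending on the summary date, then group by user_id and count the remaining rows with count() to get each user's consecutive login days.
Consecutive-login flag = (summary date - user's login date, in days) - row number from a window ordered by login date descending. For rows in the streak that ends on the summary date the flag is always -1: the summary date itself gives 0 - 1, the day before gives 1 - 2, and so on. After the first gap the day difference is at least as large as the row number, so the flag is 0 or greater and the row is filtered out.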
-- 1. Create the table
drop table if exists test_sigindate_cnt;
create table test_sigindate_cnt(
user_id string
,sigin_date string
)
;
-- 2. Insert the test data
insert overwrite table test_sigindate_cnt
select 'uid_1' as user_id,'2021-08-03' as sigin_date
union all
select 'uid_1' as user_id,'2021-08-04' as sigin_date
union all
select 'uid_1' as user_id,'2021-08-01' as sigin_date
union all
select 'uid_1' as user_id,'2021-08-02' as sigin_date
union all
select 'uid_1' as user_id,'2021-08-05' as sigin_date
union all
select 'uid_1' as user_id,'2021-08-06' as sigin_date
union all
select 'uid_2' as user_id,'2021-08-01' as sigin_date
union all
select 'uid_2' as user_id,'2021-08-05' as sigin_date
union all
select 'uid_2' as user_id,'2021-08-02' as sigin_date
union all
select 'uid_2' as user_id,'2021-08-06' as sigin_date
union all
select 'uid_3' as user_id,'2021-08-04' as sigin_date
union all
select 'uid_3' as user_id,'2021-08-06' as sigin_date
union all
select 'uid_4' as user_id,'2021-08-03' as sigin_date
union all
select 'uid_4' as user_id,'2021-08-02' as sigin_date
;
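-- 3. Query: count the days in each user's streak that ends on the summary date (2021-08-06)
select user_id
,count(1) as sigin_cnt
from (
    select
    user_id
    -- days between the summary date and the login date (0 for the summary date itself)
    ,datediff('2021-08-06',sigin_date) as data_diff
    -- 1 for the most recent login, 2 for the one before it, and so on
    ,row_number() over (partition by user_id order by sigin_date desc) as row_num
    from test_sigindate_cnt
) t
-- the difference stays at -1 only while the dates run back from the summary date without a gap
where data_diff - row_num = -1
group by
user_id
;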
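Result for the test data. The summary date column is the constant used in the query, not a selected column, and uid_4 is absent because that user did not log in on 2021-08-06:
| Summary date | User id | Consecutive login days |
| 2021-08-06 | uid_1 | 6 |
| 2021-08-06 | uid_2 | 2 |
| 2021-08-06 | uid_3 | 1 |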
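
To see why these counts come out as they do, it can help to look at the intermediate values for a single user. The sketch below reuses the inner query from above for uid_2 and adds one column, here called sigin_flag (a name chosen for illustration, not part of the original query), that holds data_diff - row_num. It assumes the test table and data defined earlier.

```sql
-- Sketch: inspect the consecutive-login flag row by row for one user.
-- Assumes the test table test_sigindate_cnt and its data as defined above;
-- sigin_flag is an illustrative alias, not part of the original query.
select
    user_id
    ,sigin_date
    ,data_diff
    ,row_num
    ,data_diff - row_num as sigin_flag
from (
    select
        user_id
        ,sigin_date
        ,datediff('2021-08-06',sigin_date) as data_diff
        ,row_number() over (partition by user_id order by sigin_date desc) as row_num
    from test_sigindate_cnt
    where user_id = 'uid_2'
) t
order by sigin_date desc
;
```

With the test data, the rows for 2021-08-06 and 2021-08-05 get a flag of -1 (0 - 1 and 1 - 2), while 2021-08-02 and 2021-08-01 get a flag of 1 (4 - 3 and 5 - 4), so only two rows pass the filter and uid_2 is reported with 2 consecutive days.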
