牛客 sql入门39题笔记

限制输入行数 limit
用法1:limit n (返回第0行到第n行）
用法2:limit m,n （跳过m行展示n行）

剔除某类条件的四种方式

！=
2.<>
3.not in
4.not like
问题：not in 和not exists的区别
1.in 与 exists的区别：外表大用in 内表大用exists 原理：用in：外表使用了索引，直接作hash连接；用exists：内表使用了索引，外表作loop循环再进行匹配；
in 引导的子句只能返回一个字段
2.not in 与 not exists 的区别：
存在两种分组条件时:仍然使用group by 分组用逗号隔开

**聚合函数作为筛选条件时，要使用having语法

SQL查找每个学校用户的平均答题数目(说明：某学校用户平均答题数量计算方式为该学校用户答题总次数除以答过题的不同用户个数)根据示例，你的查询应返回以下结果（结果保留4位小数），注意：结果按照university升序排序！！！

inner join 内连接只取两张表的交集部分

//inner join：inner join只返回两个表中联结字段相等的行;left join：left join返回包括左表中的所有记录和右表中联结字段相等的记录。
//inner join：inner join的数量小于等于左表和右表中的记录数量。 left join：left join的数量以左表中的记录数量相同。

select university,count(question_id)/count(distinct qpd.device_id) as avg_answer_cnt
from question_practice_detail qpd
inner join user_profile up
on qpd.device_id=up.device_id
group by university

平均答题数：答题总数/答题人数
答题总数：取question_id 进行计数
答题人数：distinct 对重复答题的账号进行去重，得到实际答题人数
分组：直接使用group by将答题总数和答题人数按学校分组
四舍五入：round

现在运营想要分别查看学校为山东大学或者性别为男性的用户的device_id、gender、age和gpa数据，请取出相应结果，结果不去重。
分别查看，结果不去重：union all 分别去查满足结果1和结果2的然后合在一起不去重
select device_id,gender,age,gpa
from user_profile
where university='山东大学'
union all
select device_id,gender,age,gpa
from user_profile
where gender='male'

题目：现在运营想要将用户划分为25岁以下和25岁及以上两个年龄段，分别查看这两个年龄段用户数量
！！本题注意：age为null 也记为 25岁以下
当数据中存在Null的处理： xx is null
①if函数
select if(age>=25,"25岁及以上","25岁以下") as age_cut,count() as number
from user_profile
group by age_cut
//好处：只去没有null的那一半数据，避免了对Null的处理
select
(case
when age<25 or age is null then '25岁以下'
when age>=25 then '25岁及以上'
end) age_cut,count() number
from user_profile
group by age_cut

题目：现在运营想要计算出2021年8月每天用户练习题目的数量，请取出相应数据。
跟日期有关的写法：比如用year/month函数的year(date)=2021 and month(date)=8，比如用date_format函数的date_format(date, "%Y-%m")="202108"
取出日期：day(date）

题目：现在运营想要查看用户在某天刷题后第二天还会再来刷题的平均概率。请你取出相应数据。
解法1：select
count(date2) / count(date1) as avg_ret
from
(
select
distinct qpd.device_id,
qpd.date as date1,
uniq_id_date.date as date2
from
question_practice_detail as qpd
left join(
select
distinct device_id,
date
from
question_practice_detail
) as uniq_id_date on qpd.device_id = uniq_id_date.device_id
and date_add(qpd.date, interval 1 day) = uniq_id_date.date
) as id_last_next_date
1.将一张表拆成两张表并分别进行命名
2.date_add函数 interval的用法：DATE_ADD(date,INTERVAL expr type)
date 参数是合法的日期表达式。expr 参数是您希望添加的时间间隔。
type 参数可以是 day year month

解法2：用lead函数将同一用户连续两天的记录拼接起来。先按用户分组partition by device_id，再按日期升序排序order by date，再两两拼接（最后一个默认和null拼接），即lead(date) over (partition by device_id order by date)

DATEDIFF() 函数返回两个日期之间的天数
select
avg(if(datediff(date2, date1) = 1, 1, 0)) as avg_ret
from
(
select
distinct device_id,
date as date1,
lead(date) over (
partition by device_id
order by
date
) as date2
from
(
select
distinct device_id,
date
from
question_practice_detail
) as uniq_id_date
) as id_last_next_date

字符串截取函数：substring_index(str,delim,count)
str:要处理的字符串
delim：分隔符
count：计数
如果count是正数，那么就是从左往右数，第N个分隔符的左边的全部内容，如果是负数，那就是从右边开始数，第N个分隔符右边的所有内容

题目：现在运营想要找到每个学校gpa最低的同学来做调研，请你取出每个学校的最低gpa。
错误解法：
select device_id,university,gpa
from user_profile
where gpa (select min(gpa)
from user_profile
group by university)
order by university
该解法先选出每个学校最低的gpa,再将所有学校的gpa与最低gpa做比较，若有某学校的gpa等于另一学校的最低gpa，则会导致筛选出错
较优解法：
select device_id,university,gpa
from user_profile
where (gpa,university) in (select min(gpa),university
from user_profile
group by university)
order by university
SELECT device_id,university,min(gpa) FROM user_profile GROUP BY university; 因为学校与学生是一对多的关系，如果仅用min求出gpa最低的学生，查询结果中的id与学生不一定是对应的关系，因此此方法错误。
（1）方法一：将表a的device_id,university,gpa和表b的university,min(gpa)连接起来找。
SELECT a.device_id,a.university,a.gpa FROM user_profile a
JOIN (SELECT university,min(gpa) gpa FROM user_profile GROUP BY university) b
on a.university=b.university and a.gpa=b.gpa
ORDER BY university;
（2）方法二：窗口函数
SELECT device_id,university,gpa FROM
(SELECT device_id,university,gpa,
RANK() over (PARTITION BY university ORDER BY gpa) rk FROM user_profile) a
WHERE a.rk=1;
简单来说，窗口函数有以下功能：
1）同时具有分组和排序的功能
2）不减少原表的行数
3）语法如下
<窗口函数> over (partition by <用于分组的列名>
order by <用于排序的列名>)

select *,
rank() over (order by 成绩 desc) as ranking,
dense_rank() over (order by 成绩 desc) as dese_rank,
row_number() over (order by 成绩 desc) as row_num
from 班级表

题目：现在运营想要了解浙江大学的用户在不同难度题目下答题的正确率情况，请取出相应数据，并按照准确率升序输出。
我的：select aa.difficult_level, sum(if(aa.result='right',1,0))/count(aa.device_id) correct_rate
from user_profile up
right join (select qpd.device_id,qpd.question_id,qpd.result,qd.difficult_level
from question_practice_detail qpd
right join question_detail qd
on qpd.question_id=qd.question_id) aa
on aa.device_id=up.device_id
where up.university='浙江大学'
group by aa.difficult_level
order by correct_rate asc

标答：
select difficult_level,
avg(if(qpd.result='right', 1, 0)) as correct_rate
-- sum(if(qpd.result='right', 1, 0)) / count(qpd.question_id) as correct_rate
-- count(if(qpd.result='right', 1, null)) / count(qpd.question_id) as correct_rate
from user_profile as up

inner join question_practice_detail as qpd
on up.device_id = qpd.device_id

inner join question_detail as qd
on qd.question_id = qpd.question_id

where up.university = '浙江大学'
group by qd.difficult_level
order by correct_rate asc;

posted @ 2022-11-24 15:26 桥桥泽阅读(62) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

qlxz

牛客 sql入门39题笔记

公告