department top three salary---leetcode 解题记录
这部分参考:
原题:
题目描述:
The Employee table holds all employees. Every employee has an Id, and there is also a column for the department Id.
+----+-------+--------+--------------+
| Id | Name | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1 | Joe | 70000 | 1 |
| 2 | Henry | 80000 | 2 |
| 3 | Sam | 60000 | 2 |
| 4 | Max | 90000 | 1 |
| 5 | Janet | 69000 | 1 |
| 6 | Randy | 85000 | 1 |
+----+-------+--------+--------------+
The Department table holds all departments of the company.
+----+----------+
| Id | Name |
+----+----------+
| 1 | IT |
| 2 | Sales |
+----+----------+
Write a SQL query to find employees who earn the top three salaries in each of the department. For the above tables, your SQL query should return the following rows.
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Max | 90000 |
| IT | Randy | 85000 |
| IT | Joe | 70000 |
| Sales | Henry | 80000 |
| Sales | Sam | 60000 |
+------------+----------+--------+
题目大意:
雇员表Employee保存了雇员的Id,姓名,薪水,以及部门Id。部门表Department保存了部门的Id和名称。
编写一个SQL查询找出每一个部门薪水排名前3位的雇员信息(排名可以并列),样例如上。
解题思路:
参考StackOverflow的问答(http://stackoverflow.com/questions/12113699/get-top-n-records-for-each-group-of-grouped-results)
LeetCode OJ将此题的难度标记为Hard,可见题目的确有一定的难度。
解题步骤:
首先将雇员表按照雇员的DepartmentId和Salary分别正序和倒序排列;
然后利用MySQL用户定义变量对雇员标记排名rank值,同一部门的rank值从1开始,按照薪水降序递增(注意薪水相同时rank值不变);
最后筛选rank值不大于3的雇员信息,并与Department表做关联即可。
解题过程:
无意之间看到,最开始没注意到hard标记,看完成率还是挺高的,就点开这道题,开始分析,第一时间的想法就是:先根据部门进行分组,然后每组选前三名即可,写了一个类似于:
select top 3 from a group by departmentId order by salary
但是MySQL根本没有top关键字,其次,这语法也不对,换写法;
接下来想到,可以先针对原始记录进行排序,按照部门id 和 薪水两个字段进行排序,然后根据部门分组,最后用limit进行筛选,语句如下:
select * from (select * from Employee e order by e.DepartmentId asc,e.Salary desc)t group by t.DepartmentId order by t.Salary desc limit 3;
事实证明也不行,MySQL不支持组内排序,这时有点疑问,这题难度不小啊,去看了原题的标记,才知道是hard,这时才想起来,leetcode的完成率跟难易程度完全没参考性;
想了半天,也不知其所以然,开始搜索别人的思路,大部分都是写的,针对不同的人进行部门排序,即增加一个临时字段记录rank记录其排名,下一步针对排名进行筛选,根据这个思路就开始写:
select q.name as Department,p.Name as Employee,p.Salary as Salary from(
select DepartmentId,
Name,
Salary,
@rank:=IF(@preDepartmentId <> DepartmentId,1,IF(@preSalary=Salary,@rank,@rank+1) )as rank,
@preDepartmentId := DepartmentId as preDepartmentId,
@preSalary := Salary as preSalary
from (select * from Employee e order by e.DepartmentId asc,e.Salary desc) t,(select @DepartmentId:=null,@preSalary:=null,@rank:=0)r
)p
inner join Department q on p.DepartmentId=q.Id
where p.rank<4;
select DepartmentId,
Name,
Salary,
@rank:=IF(@preDepartmentId <> DepartmentId,1,IF(@preSalary=Salary,@rank,@rank+1) )as rank,
@preDepartmentId := DepartmentId as preDepartmentId,
@preSalary := Salary as preSalary
from (select * from Employee e order by e.DepartmentId asc,e.Salary desc) t,(select @DepartmentId:=null,@preSalary:=null,@rank:=0)r
)p
inner join Department q on p.DepartmentId=q.Id
where p.rank<4;
写这个sql还真是学到不少东西,
第一:关于自定义变量的使用,之前只在存储过程写过,还真没在查询sql中写过;
第二:关于join 不同类型,审题以为Employee 表 肯定会记录 departmentId,上来就写了一个left join ,一般的test case 都能通过,结果提交之后,发现 雇员 部门id为空时此条记录应该过滤,看了错误提示反应过来是join 类型不对;
第三:别人写的答案是:
SELECT d.NAME AS Department, t.NAME AS Employee, Salary FROM
( SELECT DepartmentId,
NAME, Salary,
@rank := IF(@prevDeptId != DepartmentId, 1, IF(@prevSalary = Salary, @rank, @rank + 1) ) AS Rank,
@prevDeptId := DepartmentId AS prevDeptId, @prevSalary := Salary AS prevSalary
FROM Employee e, (SELECT @rank := 0, @prevDeptId := NULL, @prevSalary := NULL) r ORDER BY DepartmentId ASC, Salary DESC ) t
INNER JOIN Department d ON t.DepartmentId = d.ID
WHERE t.rank <= 3
跟我的不同主要在于:ORDER BY DepartmentId ASC, Salary DESC 这一句使用的地方,本人写的直接在内层查询,网友写的是
FROM Employee e, (SELECT @rank := 0, @prevDeptId := NULL, @prevSalary := NULL) r
ORDER BY DepartmentId ASC, Salary DESC
按照字面的逻辑理解就是:先进行rank计算,然后根据部门id 和salary 进行排序,此处最开始不甚明白,使用explain 查看了执行计划,发现filesort 先于全表遍历查询;猜测此处MySQL作了优化,只是优化后看起来比较难理解(请教了别人后给出的原因,原谅我不求甚解,哪位知道原因还请告知,不甚感激)。
PS:leetcode提交答案时应该注意:leetcode对答案大小写敏感。

浙公网安备 33010602011771号