department top three salary---leetcode 解题记录

这部分参考:
原题:
 
题目描述:
The Employee table holds all employees. Every employee has an Id, and there is also a column for the department Id.
 
+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
| 5  | Janet | 69000  | 1            |
| 6  | Randy | 85000  | 1            |
+----+-------+--------+--------------+
The Department table holds all departments of the company.
 
+----+----------+
| Id | Name     |
+----+----------+
| 1  | IT       |
| 2  | Sales    |
+----+----------+
Write a SQL query to find employees who earn the top three salaries in each of the department. For the above tables, your SQL query should return the following rows.
 
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Randy    | 85000  |
| IT         | Joe      | 70000  |
| Sales      | Henry    | 80000  |
| Sales      | Sam      | 60000  |
+------------+----------+--------+
题目大意:
雇员表Employee保存了雇员的Id,姓名,薪水,以及部门Id。部门表Department保存了部门的Id和名称。
 
编写一个SQL查询找出每一个部门薪水排名前3位的雇员信息(排名可以并列),样例如上。
 
解题思路:
参考StackOverflow的问答(http://stackoverflow.com/questions/12113699/get-top-n-records-for-each-group-of-grouped-results)
 
LeetCode OJ将此题的难度标记为Hard,可见题目的确有一定的难度。
 
解题步骤:
首先将雇员表按照雇员的DepartmentId和Salary分别正序和倒序排列;
 
然后利用MySQL用户定义变量对雇员标记排名rank值,同一部门的rank值从1开始,按照薪水降序递增(注意薪水相同时rank值不变);
 
最后筛选rank值不大于3的雇员信息,并与Department表做关联即可。
 
解题过程:
无意之间看到,最开始没注意到hard标记,看完成率还是挺高的,就点开这道题,开始分析,第一时间的想法就是:先根据部门进行分组,然后每组选前三名即可,写了一个类似于:
select top 3 from a group by departmentId order by salary
但是MySQL根本没有top关键字,其次,这语法也不对,换写法;
接下来想到,可以先针对原始记录进行排序,按照部门id 和 薪水两个字段进行排序,然后根据部门分组,最后用limit进行筛选,语句如下:
select * from (select * from Employee e order by e.DepartmentId asc,e.Salary desc)t group by t.DepartmentId order by t.Salary desc limit 3;
事实证明也不行,MySQL不支持组内排序,这时有点疑问,这题难度不小啊,去看了原题的标记,才知道是hard,这时才想起来,leetcode的完成率跟难易程度完全没参考性;
想了半天,也不知其所以然,开始搜索别人的思路,大部分都是写的,针对不同的人进行部门排序,即增加一个临时字段记录rank记录其排名,下一步针对排名进行筛选,根据这个思路就开始写:
select q.name as Department,p.Name as Employee,p.Salary as Salary from(
select DepartmentId,
       Name,
       Salary,
       @rank:=IF(@preDepartmentId <> DepartmentId,1,IF(@preSalary=Salary,@rank,@rank+1) )as rank,
       @preDepartmentId := DepartmentId as preDepartmentId,
       @preSalary := Salary as preSalary
from (select * from Employee  e order by e.DepartmentId asc,e.Salary desc) t,(select @DepartmentId:=null,@preSalary:=null,@rank:=0)r
)p
inner join Department q on p.DepartmentId=q.Id
where p.rank<4;
写这个sql还真是学到不少东西,
第一:关于自定义变量的使用,之前只在存储过程写过,还真没在查询sql中写过;
第二:关于join 不同类型,审题以为Employee 表 肯定会记录 departmentId,上来就写了一个left join ,一般的test case 都能通过,结果提交之后,发现 雇员 部门id为空时此条记录应该过滤,看了错误提示反应过来是join 类型不对;
第三:别人写的答案是:
SELECT d.NAME AS Department, t.NAME AS Employee, Salary FROM
( SELECT DepartmentId,
NAME, Salary,
@rank := IF(@prevDeptId != DepartmentId, 1, IF(@prevSalary = Salary, @rank, @rank + 1) ) AS Rank,
@prevDeptId := DepartmentId AS prevDeptId, @prevSalary := Salary AS prevSalary
FROM Employee e, (SELECT @rank := 0, @prevDeptId := NULL, @prevSalary := NULL) r ORDER BY DepartmentId ASC, Salary DESC ) t
INNER JOIN Department d ON t.DepartmentId = d.ID
WHERE t.rank <= 3
 
跟我的不同主要在于:ORDER BY DepartmentId ASC, Salary DESC 这一句使用的地方,本人写的直接在内层查询,网友写的是
FROM Employee e, (SELECT @rank := 0, @prevDeptId := NULL, @prevSalary := NULL) r
ORDER BY DepartmentId ASC, Salary DESC
按照字面的逻辑理解就是:先进行rank计算,然后根据部门id 和salary 进行排序,此处最开始不甚明白,使用explain 查看了执行计划,发现filesort 先于全表遍历查询;猜测此处MySQL作了优化,只是优化后看起来比较难理解(请教了别人后给出的原因,原谅我不求甚解,哪位知道原因还请告知,不甚感激)。
 
PS:leetcode提交答案时应该注意:leetcode对答案大小写敏感。
posted @ 2015-09-30 10:26  轻言微雨  阅读(122)  评论(0)    收藏  举报