经常遇见这样的情况,根据条件筛选,结果同一个ID有很多条记录,但是只需每个ID对应第一条记录或最后一条记录。
创建表和初始化数据
create table Business
(
BusinessId varchar(4),
CreateDate datetime,
UserId int
)
insert Business values('B001','2014-02-13 12:30:25',1000)
insert Business values('B001','2014-02-14 12:36:25',1002)
insert Business values('B001','2014-02-16 12:45:25',1003)
insert Business values('B002','2014-03-13 12:30:25',1001)
insert Business values('B002','2014-03-14 12:36:25',1004)
insert Business values('B002','2014-03-16 12:45:25',1002)
insert Business values('B002','2014-04-13 12:30:25',1005)
insert Business values('B002','2014-04-14 12:36:25',1007)
insert Business values('B003','2014-05-13 12:30:25',1000)
insert Business values('B003','2014-05-14 12:36:25',1005)
insert Business values('B003','2014-06-16 12:45:25',1003)
insert Business values('B003','2014-06-20 12:30:25',1004)
刚开始想到这样的方法,用group by 分组,使用聚合函数。
select BusinessId,Min(CreateDate) as CreateDate,MIN(UserId) as UserId from Business group by BusinessId
使用这种方法查询,可以得到想要的结果。但是此时表中的CreateDate,UserId是按照递增插入,假使数据中并没有这种规律,这样查询出来的结果就是错误的。因此,这种方法不符合要求。
运用子查询的方法
方法一:
select Distinct BusinessId,CreateDate,UserId from Business a where CreateDate=(select top 1 CreateDate from Business where BusinessId=a.BusinessId) and UserId=(select top 1 UserId from Business where BusinessId=a.BusinessId)
方法二:
select * from ( select BusinessId,CreateDate,UserId,(select top 1 CreateDate from Business where BusinessId=a.BusinessId ) as CreateTime,(select top 1 CreateDate from Business where BusinessId=a.BusinessId ) as Creater from Business a ) b where CreateDate=CreateTime and UserId=Creater
但这些并不是一个最有效的方法,随着查询的字段增多,where后面的条件就会增多,执行效率就很低。
运用分区的方法
select * from(select BusinessId,CreateDate,UserId,ROW_NUMBER() over(partition by BusinessId order by BusinessId) as rId from Business) a where a.rId=1
Row_Number()函数获取记录的行数,patition by BusinessId 就是按BusinessId分区。这样取rId=1条件下的结果就是相当于数据分组后取top 1。
posted on
浙公网安备 33010602011771号