1 --数据库系统异常排查之 DMV 2013-5-23
2
3 -- 一 从数据库连接情况来判断异常:
4 --
5 --1. 首先我们来看一下目前数据库系统所有请求情况
6 --request info
7 select s. session_id , s . status, db_name (r . database_id) as database_name ,
8 s .login_name , s. login_time , s . host_name,
9 c .client_net_address , c. client_tcp_port ,s . program_name,
10 r .cpu_time , r. reads , r . writes, c .num_reads , c. num_writes ,
11 s .client_interface_name ,
12 s .last_request_start_time , s. last_request_end_time ,
13 c .connect_time , c. net_transport , c . net_packet_size,
14 r .start_time , r. status , r . command,
15 r .blocking_session_id , r. wait_type ,
16 r .wait_time , r. last_wait_type , r . wait_resource, r. open_transaction_count,
17 r .percent_complete , r. granted_query_memory
18 from Sys.dm_exec_requests r with( nolock )
19 right outer join Sys.dm_exec_sessions s with ( nolock)
20 on r. session_id = s . session_id
21 right outer join Sys.dm_exec_connections c with ( nolock)
22 on s. session_id = c . session_id
23 where s. session_id >50
24 order by s .session_id
25
26 -- 这个查询将目前数据库中的所有请求都显示出来了,其中比较重要的有 Status 、Login_name 、 Host_Name,
27 --Client_Net_Address 、 Program_name等,但是信息比较多,我们很难查看有什么异常,
28 -- 初步判断连接数是否超过了平时的标准,很这个查询返回的记录数
29 -- (很多时候系统异常是连接数过多造成的,而连接数过多又是因为其他原因影响的)。
30
31 ----------------------------------------------
32 --2. 哪个用户连接数最多:
33
34 --request info by user
35 select login_name, COUNT (0 ) user_count
36 from Sys.dm_exec_requests r with( nolock )
37 right outer join Sys.dm_exec_sessions s with ( nolock)
38 on r. session_id = s . session_id
39 right outer join Sys.dm_exec_connections c with ( nolock)
40 on s. session_id = c . session_id
41 where s. session_id >50
42 group by login_name
43 order by 2 DESC
44
45 -- 从图中我们可以很方便的看出用户连接数情况,
46 -- 如果我们的不同的功能是使用不同的的数据库账号的话,
47 -- 就能初步判断是哪部分功能可能出现了异常
48
49 ----------------------------------------------------
50 --3. 哪台机器发起到数据库的连接数最多
51 --request info by hostname
52 select s. host_name ,c . client_net_address, COUNT (0 ) host_count
53 from Sys.dm_exec_requests r with( nolock )
54 right outer join Sys.dm_exec_sessions s with ( nolock)
55 on r. session_id = s . session_id
56 right outer join Sys.dm_exec_connections c with ( nolock)
57 on s. session_id = c . session_id
58 where s. session_id >50
59 group by host_name, client_net_address
60 order by 3 DESC
61
62 -- 这个查询能够一下就帮我们找出来哪些机器发起了对数据库的链接,它们的链接数量是否有异常;这个其实对调查某些问题非常有用,我有一次就遇
63 --
64 -- 到一个case :
65 --
66 -- 用户反映,过一两个星期,系统就会出现一次异常,出问题时数据库连接数量很高,大量的访问被数据库拒绝,过半个小时左右,系统又自动恢复了,但是
67 --
68 -- 在数据库里面查看,并没有发现有异常的进程和错误的信息,问题一时很棘手,很难定位,系统不稳定领导不满, DBA 顶着压力一时不知道如何是好;后面
69 --
70 -- 转换方向,通过调查问题发生时,为什么会产生这么多连接,这些连接是那些机器发过来的,这些连接发过来正常吗,是数据库不砍业务的重负,还是业务
71 --
72 -- 在某个时间段内会出现暴涨等一系列原因,最终找出是一台 Web 因为开发人员代码写的有问题,内存出现内存泄露,导致大量的连接不能释放,出问题是,
73 --
74 -- 发出的数据库连接数比平时高倍,最终影响到了数据库,问题压根和数据库没关系(从这个事实看出, DBA 真是的炮灰角色,不是自己的问题,也得顶
75 --
76 -- 着压力调查出原因呀);如果在类似问题发生时,我们能通过这个查询及早知道问题是出在某台 Web 机器上,那就不用费尽心力来调查数据库了。
77
78
79 ---------------------------------------------------------------------------------------------
80 --4. 哪些连接在访问哪个库
81 --request info by databases
82 select db_name ( r. database_id ) as database_name, COUNT (0 ) host_count
83 from Sys.dm_exec_requests r with( nolock )
84 right outer join Sys.dm_exec_sessions s with ( nolock)
85 on r. session_id = s . session_id
86 right outer join Sys.dm_exec_connections c with ( nolock)
87 on s. session_id = c . session_id
88 where s. session_id >50
89 group by r .database_id
90 order by 2 DESC
91
92 -- 结果(为NULL 的估计是没办法定位库)
93
94 -------------------------------------------------------------------------------------
95 --5. 进程状态:
96 --request info by status
97 select s. status ,COUNT ( 0) host_count
98 from Sys.dm_exec_requests r with( nolock )
99 right outer join Sys.dm_exec_sessions s with ( nolock)
100 on r. session_id = s . session_id
101 right outer join Sys.dm_exec_connections c with ( nolock)
102 on s. session_id = c . session_id
103 where s. session_id >50
104 group by s .status
105 order by 2 DESC
106
107 -- 结果(running 数比较多,表示数据库压力比较大):
108
109 --------------------------------------------------------------------------------
110 -- 判断阻塞
111 --1. 查看数据库阻塞情况:
112 ----------------------------------------Blocked Info----------------------------------
113 -- 记录当前阻塞信息
114 select t1. resource_type as [lock type] , db_name( resource_database_id ) as [database]
115 , t1. resource_associated_entity_id as [blk object]
116 , t1. request_mode as [lock req] -- lock requested
117 , t1. request_session_id as [waiter sid] -- spid of waiter
118 , t2. wait_duration_ms as [wait time]
119 ,( select text from sys.dm_exec_requests as r with ( nolock) --- get sql for waiter
120 cross apply sys.dm_exec_sql_text (r . sql_handle)
121 where r. session_id = t1 . request_session_id) as waiter_batch
122 ,( select substring (qt . text, r .statement_start_offset / 2,
123 ( case when r. statement_end_offset = - 1 then len( convert (nvarchar ( max), qt. text)) * 2
124 else r. statement_end_offset end - r. statement_start_offset )/2 + 1)
125 from sys.dm_exec_requests as r with (nolock )
126 cross apply sys.dm_exec_sql_text (r . sql_handle) as qt
127 where r. session_id = t1 . request_session_id) as waiter_stmt --- statement executing now
128 , t2. blocking_session_id as [blocker sid] --- spid of blocker
129 ,( select text from sys.sysprocesses as p with ( nolock) --- get sql for blocker
130 cross apply sys.dm_exec_sql_text (p . sql_handle)
131 where p. spid = t2 . blocking_session_id) as blocker_stmt , getdate() time
132 from sys.dm_tran_locks as t1 with (nolock ) , sys.dm_os_waiting_tasks as t2 with (nolock )
133 where t1. lock_owner_address = t2 . resource_address
134
135 --------------------------------------------------------------------
136 --2. 查看阻塞其他进程的进程(阻塞源头):
137 select t2. blocking_session_id ,COUNT ( 0) counts
138 from sys.dm_tran_locks as t1 with (nolock ) , sys.dm_os_waiting_tasks as t2 with (nolock )
139 where t1. lock_owner_address = t2 . resource_address
140 group by blocking_session_id
141 order by 2
142
143
144
145
146 -----------------------------------------------------------------------
147 --3. 被阻塞时间最长的进程:
148 -- 被阻塞时间最长的 session
149 select top 10 t1 .resource_type as [lock type] , db_name( resource_database_id ) as [database]
150 , t1. resource_associated_entity_id as [blk object]
151 , t1. request_mode as [lock req] -- lock requested
152 , t1. request_session_id as [waiter sid] -- spid of waiter
153 , t2. wait_duration_ms as [wait time]
154 ,( select text from sys.dm_exec_requests as r with ( nolock) --- get sql for waiter
155 cross apply sys.dm_exec_sql_text (r . sql_handle)
156 where r. session_id = t1 . request_session_id) as waiter_batch
157 ,( select substring (qt . text, r .statement_start_offset / 2,
158 ( case when r. statement_end_offset = - 1 then len( convert (nvarchar ( max), qt. text)) * 2
159 else r. statement_end_offset end - r. statement_start_offset )/2 + 1)
160 from sys.dm_exec_requests as r with (nolock )
161 cross apply sys.dm_exec_sql_text (r . sql_handle) as qt
162 where r. session_id = t1 . request_session_id) as waiter_stmt --- statement executing now
163 , t2. blocking_session_id as [blocker sid] --- spid of blocker
164 ,( select text from sys.sysprocesses as p with ( nolock) --- get sql for blocker
165 cross apply sys.dm_exec_sql_text (p . sql_handle)
166 where p. spid = t2 . blocking_session_id) as blocker_stmt , getdate() time
167 from sys.dm_tran_locks as t1 with (nolock ) , sys.dm_os_waiting_tasks as t2 with (nolock )
168 where t1. lock_owner_address = t2 . resource_address
169 order by t2 .wait_duration_ms desc