笔记287 数据库系统异常排查之 DMV 2013-5-23

笔记287 数据库系统异常排查之 DMV   2013-5-23

  1 --数据库系统异常排查之 DMV   2013-5-23
  2 
  3 -- 一 从数据库连接情况来判断异常:
  4 --
  5 --1. 首先我们来看一下目前数据库系统所有请求情况
  6 --request info
  7 select s. session_id , s . status, db_name (r . database_id) as database_name ,
  8 s .login_name , s. login_time , s . host_name,
  9 c .client_net_address , c. client_tcp_port ,s . program_name,
 10 r .cpu_time , r. reads , r . writes, c .num_reads , c. num_writes ,
 11 s .client_interface_name ,
 12  s .last_request_start_time , s. last_request_end_time ,
 13 c .connect_time , c. net_transport , c . net_packet_size,
 14 r .start_time , r. status , r . command,
 15 r .blocking_session_id , r. wait_type ,
 16 r .wait_time , r. last_wait_type , r . wait_resource, r. open_transaction_count,
 17 r .percent_complete , r. granted_query_memory
 18 from Sys.dm_exec_requests r with( nolock )
 19 right outer join Sys.dm_exec_sessions s  with ( nolock)
 20 on r. session_id = s . session_id
 21 right outer join Sys.dm_exec_connections c  with ( nolock)
 22 on s. session_id = c . session_id
 23 where s. session_id >50
 24 order by s .session_id
 25 
 26 -- 这个查询将目前数据库中的所有请求都显示出来了,其中比较重要的有 Status 、Login_name 、 Host_Name,
 27 --Client_Net_Address 、 Program_name等,但是信息比较多,我们很难查看有什么异常,
 28 -- 初步判断连接数是否超过了平时的标准,很这个查询返回的记录数
 29 -- (很多时候系统异常是连接数过多造成的,而连接数过多又是因为其他原因影响的)。
 30 
 31 ----------------------------------------------
 32 --2. 哪个用户连接数最多:
 33 
 34 --request info by user
 35 select login_name, COUNT (0 ) user_count
 36 from Sys.dm_exec_requests r with( nolock )
 37 right outer join Sys.dm_exec_sessions s  with ( nolock)
 38 on r. session_id = s . session_id
 39 right outer join Sys.dm_exec_connections c  with ( nolock)
 40 on s. session_id = c . session_id
 41 where s. session_id >50
 42 group by login_name
 43 order by 2 DESC
 44 
 45 -- 从图中我们可以很方便的看出用户连接数情况,
 46 -- 如果我们的不同的功能是使用不同的的数据库账号的话,
 47 -- 就能初步判断是哪部分功能可能出现了异常
 48 
 49 ----------------------------------------------------
 50 --3. 哪台机器发起到数据库的连接数最多
 51 --request info by hostname
 52 select s. host_name ,c . client_net_address, COUNT (0 ) host_count
 53 from Sys.dm_exec_requests r with( nolock )
 54 right outer join Sys.dm_exec_sessions s  with ( nolock)
 55 on r. session_id = s . session_id
 56 right outer join Sys.dm_exec_connections c  with ( nolock)
 57 on s. session_id = c . session_id
 58 where s. session_id >50
 59 group by host_name, client_net_address
 60 order by 3 DESC
 61 
 62 -- 这个查询能够一下就帮我们找出来哪些机器发起了对数据库的链接,它们的链接数量是否有异常;这个其实对调查某些问题非常有用,我有一次就遇
 63 --
 64 -- 到一个case :
 65 --
 66 -- 用户反映,过一两个星期,系统就会出现一次异常,出问题时数据库连接数量很高,大量的访问被数据库拒绝,过半个小时左右,系统又自动恢复了,但是
 67 --
 68 -- 在数据库里面查看,并没有发现有异常的进程和错误的信息,问题一时很棘手,很难定位,系统不稳定领导不满, DBA 顶着压力一时不知道如何是好;后面
 69 --
 70 -- 转换方向,通过调查问题发生时,为什么会产生这么多连接,这些连接是那些机器发过来的,这些连接发过来正常吗,是数据库不砍业务的重负,还是业务
 71 --
 72 -- 在某个时间段内会出现暴涨等一系列原因,最终找出是一台 Web 因为开发人员代码写的有问题,内存出现内存泄露,导致大量的连接不能释放,出问题是,
 73 --
 74 -- 发出的数据库连接数比平时高倍,最终影响到了数据库,问题压根和数据库没关系(从这个事实看出, DBA 真是的炮灰角色,不是自己的问题,也得顶
 75 --
 76 -- 着压力调查出原因呀);如果在类似问题发生时,我们能通过这个查询及早知道问题是出在某台 Web 机器上,那就不用费尽心力来调查数据库了。
 77 
 78 
 79 ---------------------------------------------------------------------------------------------
 80 --4. 哪些连接在访问哪个库
 81 --request info by databases
 82 select db_name ( r. database_id ) as database_name, COUNT (0 ) host_count
 83 from Sys.dm_exec_requests r with( nolock )
 84 right outer join Sys.dm_exec_sessions s  with ( nolock)
 85 on r. session_id = s . session_id
 86 right outer join Sys.dm_exec_connections c  with ( nolock)
 87 on s. session_id = c . session_id
 88 where s. session_id >50
 89 group by r .database_id
 90 order by 2 DESC
 91 
 92 -- 结果(为NULL 的估计是没办法定位库)
 93 
 94 -------------------------------------------------------------------------------------
 95 --5. 进程状态:
 96 --request info by status
 97 select s. status ,COUNT ( 0) host_count
 98 from Sys.dm_exec_requests r with( nolock )
 99 right outer join Sys.dm_exec_sessions s  with ( nolock)
100 on r. session_id = s . session_id
101 right outer join Sys.dm_exec_connections c  with ( nolock)
102 on s. session_id = c . session_id
103 where s. session_id >50
104 group by s .status
105 order by 2 DESC
106 
107 -- 结果(running 数比较多,表示数据库压力比较大):
108 
109 --------------------------------------------------------------------------------
110 -- 判断阻塞
111 --1. 查看数据库阻塞情况:
112 ----------------------------------------Blocked Info----------------------------------
113 -- 记录当前阻塞信息
114 select t1. resource_type as [lock type] , db_name( resource_database_id ) as [database]   
115 , t1. resource_associated_entity_id as [blk object]   
116 , t1. request_mode as [lock req]                          -- lock requested   
117 , t1. request_session_id as [waiter sid]                      -- spid of waiter   
118 , t2. wait_duration_ms as [wait time]         
119 ,( select text from sys.dm_exec_requests as r with ( nolock)                  --- get sql for waiter   
120 cross apply sys.dm_exec_sql_text (r . sql_handle)     
121 where r. session_id = t1 . request_session_id) as waiter_batch   
122 ,( select substring (qt . text, r .statement_start_offset / 2,     
123 ( case when r. statement_end_offset = - 1 then len( convert (nvarchar ( max), qt. text)) * 2    
124 else r. statement_end_offset end - r. statement_start_offset )/2 + 1)     
125 from sys.dm_exec_requests as r with (nolock )     
126 cross apply sys.dm_exec_sql_text (r . sql_handle) as qt   
127 where r. session_id = t1 . request_session_id) as waiter_stmt    --- statement executing now   
128 , t2. blocking_session_id as [blocker sid]                --- spid of blocker   
129 ,( select text from sys.sysprocesses as p with ( nolock)     --- get sql for blocker   
130 cross apply sys.dm_exec_sql_text (p . sql_handle)     
131 where p. spid = t2 . blocking_session_id) as blocker_stmt , getdate() time  
132 from sys.dm_tran_locks as t1 with (nolock ) , sys.dm_os_waiting_tasks as t2 with (nolock )      
133 where t1. lock_owner_address = t2 . resource_address
134 
135 --------------------------------------------------------------------
136 --2. 查看阻塞其他进程的进程(阻塞源头):
137 select   t2. blocking_session_id ,COUNT ( 0) counts
138 from sys.dm_tran_locks as t1 with (nolock ) , sys.dm_os_waiting_tasks as t2 with (nolock )    
139 where t1. lock_owner_address = t2 . resource_address
140 group by blocking_session_id
141 order by 2
142 
143 
144 
145 
146 -----------------------------------------------------------------------
147 --3. 被阻塞时间最长的进程:
148 -- 被阻塞时间最长的 session
149 select top 10  t1 .resource_type as [lock type] , db_name( resource_database_id ) as [database]   
150 , t1. resource_associated_entity_id as [blk object]   
151 , t1. request_mode as [lock req]                          -- lock requested   
152 , t1. request_session_id as [waiter sid]                      -- spid of waiter   
153 , t2. wait_duration_ms as [wait time]         
154 ,( select text from sys.dm_exec_requests as r with ( nolock)                  --- get sql for waiter   
155 cross apply sys.dm_exec_sql_text (r . sql_handle)     
156 where r. session_id = t1 . request_session_id) as waiter_batch   
157 ,( select substring (qt . text, r .statement_start_offset / 2,     
158 ( case when r. statement_end_offset = - 1 then len( convert (nvarchar ( max), qt. text)) * 2    
159 else r. statement_end_offset end - r. statement_start_offset )/2 + 1)     
160 from sys.dm_exec_requests as r with (nolock )     
161 cross apply sys.dm_exec_sql_text (r . sql_handle) as qt   
162 where r. session_id = t1 . request_session_id) as waiter_stmt    --- statement executing now   
163 , t2. blocking_session_id as [blocker sid]                --- spid of blocker   
164 ,( select text from sys.sysprocesses as p with ( nolock)     --- get sql for blocker   
165 cross apply sys.dm_exec_sql_text (p . sql_handle)     
166 where p. spid = t2 . blocking_session_id) as blocker_stmt , getdate() time  
167 from sys.dm_tran_locks as t1 with (nolock ) , sys.dm_os_waiting_tasks as t2 with (nolock )      
168 where t1. lock_owner_address = t2 . resource_address
169 order by t2 .wait_duration_ms desc

 

posted @ 2013-08-04 20:59 桦仔 阅读(...) 评论(...)  编辑 收藏