【译】The Accidental DBA：Troubleshooting Performance

最近重新翻看The Accidental DBA，将Troubleshooting Performance部分稍作整理，方便以后查阅。此篇是Part 2
Part 1：The Accidental DBA：SQL Server Backup
Part 3：The Accidental DBA：Troubleshooting
一、Baselines

网友提供的性能基线的含义：每天使用windows性能计数器定时(周期为一个月，具体需要根据自己的需求)收集服务器硬件信息，然后对硬件信息进行分析统计，计算平均值、最大值、最小值，用来与之后每天硬件信息进行比较，从而快速的估算服务器硬件状态。

之前对基线的理解一直停留在使用Perfmon收集几个计数器，然后拿收集到的数值和网上推荐的数值进行对比，得以判别是否异常。
数据库服务器的基线的作用：
1、帮助你发现在它成为问题之前发生了什么变化
2、允许你主动调整你的数据库
3、允许你使用历史信息解决问题
4、提供用于环境和数据趋势的数据
5、捕获数据，提供给管理层，以及服务器和存储管理员，用于资源和容量规划
基线详细内容可参考：Capturing Baseline Data
二、Tools for On-Going Monitoring
2.1、Performance Monitor and PAL
Performance Monitor(PerfMon)提供了大量关于Windows和SQL Server的配置选项，你可以根据需要对不同的服务器进行调整，或者每次都使用相同的模板。它允许你在指定的时间段内生成一个全面的性能概要，并且你可以实时查看性能。
性能计数器的捕获可参考PerfMon模板；性能监视器文件的分析可借助PAL工具
2.2、SQL Trace and Trace Analysis Tools
客户端(Profiler)获取跟踪可参考：Trace-跟踪高消耗的语句需添加哪些事件；服务器端跟踪可参考：模板-Trace；导出正在运行的跟踪模板可参考：Trace-导出已有的服务器端跟踪
分析跟踪文件的工具有ClearTrace和RML Utilities for SQL Server
对于SQL Server 2012及以上版本，推荐使用Extended Events代替Trace~
2.3、SQLNexus
SQLNexus用于分析SQLDiag和PSSDiag捕获的数据。我们可以自定义SQLDiag和PSSDiag的默认模版：SQLdiag-配置文件-PerfmonCollector、SQLdiag-配置文件-ProfilerCollector、SQLdiag-配置文件-扩展
收集SQLDiag数据过于繁琐，现在很少有人使用SQLNexus工具~
2.4、Essential DMVs for Monitoring
DMVs非常实用，每次需要使用的时候到处查找，得找个时间把这个坑填补一番

sys.dm_os_wait_stats --I want to know what SQL Server is waiting on, when there is a problem and when there isn’t.  
sys.dm_exec_requests --When I want to see what’s executing currently, this is where I start.
sys.dm_os_waiting_tasks --In addition to the overall waits, I want to know what tasks are waiting right now (and the wait_type).
sys.dm_exec_query_stats --execution count and resource usage
sys.dm_exec_query_plan --This DMV has cached plans as well as those for queries that are currently executing.
sys.dm_db_stats_properties --I always take a look at statistics in new systems, and when there’s a performance issue, initially just to check when they were last updated and the sample size.

Glenn had a great set of diagnostic queries to use for monitoring and troubleshooting.
三、Are your indexing strategies working?
对于一个全新的系统，可以按照以下三个步骤分析索引信息
step1、是否存在无效的索引
step2、是否存在臃肿和不健康的索引
step3、是否需要添加新索引
3.1、Getting rid of the dead weight
3.1.1、Fully duplicate indexes
如果不知道索引的内部机制，可能会比你想象的更难以识别重复索引。它并不总是简单的col1列上的Index1和col1上的Index2。
在内部，SQL Server会添加列到索引，大多数命令(比如sp_helpindex)不会显示这些内部添加的列。
可以参考Identifying Duplicate Indexes得到重复索引信息。
注意：你可能会打断使用索引提示的应用程序，因此，请当心！通常在删除索引之前最好先禁用一段时间。
3.1.2、Unused Indexes
从未使用的索引和重复索引一样消耗资源。你可以使用sys.dm_db_index_usage_stats获取索引的使用情况，注意在sqlserver2012某些版本，重建索引会清空sys.dm_db_index_usage_stats中此索引的条目
user_updates列只反映语句的数量，不反映影响的行数。例如，我执行以下语句

UPDATE Table SET ColumnX = VALUE

影响10000行，那么表和包含ColumnX列的索引对应的user_updates都会累加1(update/delete/insert 类似)

--数据表、索引参考Identifying Duplicate Indexes(http://www.cnblogs.com/Uest/p/6679504.html)中的测试数据
USE Test
GO
SELECT * 
INTO Test.dbo.SalesOrderDetail_IndexUsage
FROM AdventureWorks2008R2.Sales.SalesOrderDetail
GO
--CREATE UNIQUE CLUSTERED INDEX SalesOrderDetail_IndexUsage ON dbo.SalesOrderDetail(SalesOrderDetailID)
CREATE INDEX IX_SalesOrderID1 ON dbo.SalesOrderDetail_IndexUsage(SalesOrderID,rowguid,SalesOrderDetailID) INCLUDE(LineTotal)
--CREATE INDEX IX_SalesOrderID2 ON dbo.SalesOrderDetail_IndexUsage(SalesOrderID,rowguid) INCLUDE(LineTotal)
--CREATE INDEX IX_SalesOrderID3 ON dbo.SalesOrderDetail_IndexUsage(SalesOrderID,rowguid) INCLUDE(SalesOrderDetailID,LineTotal)
--CREATE UNIQUE INDEX IX_SalesOrderID4 ON dbo.SalesOrderDetail_IndexUsage(SalesOrderID,rowguid) INCLUDE(SalesOrderDetailID,LineTotal)
GO

--查看索引使用情况
select o.name,i.index_id,i.name,user_seeks,user_scans,user_lookups,user_updates
from sys.dm_db_index_usage_stats ddus
inner join sys.tables o
on ddus.object_id=o.object_id
inner join sys.indexes i
on ddus.index_id=i.index_id
and ddus.object_id=i.object_id
where database_id = db_id()
and o.name='SalesOrderDetail_IndexUsage'
order by i.index_id

--查询返回12行
select * from SalesOrderDetail_IndexUsage WHERE SalesOrderID=43659
--sys.dm_db_index_usage_stats结果
name    index_id    name    user_seeks    user_scans    user_lookups    user_updates
SalesOrderDetail_IndexUsage    0    NULL    0    0    1    0
SalesOrderDetail_IndexUsage    2    IX_SalesOrderID1    1    0    0    0

--更新影响12行
UPDATE SalesOrderDetail_IndexUsage 
SET LineTotal=LineTotal*1
WHERE SalesOrderID=43659
--sys.dm_db_index_usage_stats结果
name    index_id    name    user_seeks    user_scans    user_lookups    user_updates
SalesOrderDetail_IndexUsage    0    NULL    0    0    1    1
SalesOrderDetail_IndexUsage    2    IX_SalesOrderID1    2    0    0    1

--删除影响12行
DELETE from SalesOrderDetail_IndexUsage 
WHERE SalesOrderID=43659
--sys.dm_db_index_usage_stats结果
name    index_id    name    user_seeks    user_scans    user_lookups    user_updates
SalesOrderDetail_IndexUsage    0    NULL    0    0    1    2
SalesOrderDetail_IndexUsage    2    IX_SalesOrderID1    3    0    0    2

--插入12行
SET IDENTITY_INSERT dbo.SalesOrderDetail_IndexUsage on
INSERT INTO SalesOrderDetail_IndexUsage
(SalesOrderID,SalesOrderDetailID,CarrierTrackingNumber,OrderQty,ProductID,SpecialOfferID,UnitPrice,UnitPriceDiscount,LineTotal,rowguid,ModifiedDate)
select * from AdventureWorks2008R2.Sales.SalesOrderDetail
WHERE SalesOrderID=43659
SET IDENTITY_INSERT dbo.SalesOrderDetail_IndexUsage OFF
--sys.dm_db_index_usage_stats结果
name    index_id    name    user_seeks    user_scans    user_lookups    user_updates
SalesOrderDetail_IndexUsage    0    NULL    0    0    1    3
SalesOrderDetail_IndexUsage    2    IX_SalesOrderID1    3    0    0    3

--注意在sqlserver2012某些版本，重建索引会清空sys.dm_db_index_usage_stats中此索引的条目
ALTER INDEX IX_SalesOrderID1 ON SalesOrderDetail_IndexUsage REBUILD

-- Clean up
--DROP TABLE Test.dbo.SalesOrderDetail_IndexUsage

View Code

3.1.3、Similar or semi-redundant indexes
你可能会有一些适合合并的索引
Indexes that have the same key (but possibly different included columns)

Index1: Key = LastName
Index2: Key = LastName, INCLUDE = FirstName

在这种情况下你不"需要"Index1，因为Index1能做的Index2都能做。然而Index2要宽些。因此下面的查询需要更多的I/O

SELECT LastName,COUNT(*) FROM TableName GROUP BY LastName

但是，问题是这个查询有多重要？这个索引使用频率是多少？你可以使用sys.dm_db_index_usage_stats检查索引使用情况。
Indexes that have left-based subsets of other index KEYS

Index1: Key = LastName, FirstName, MiddleInitial
Index2: Key = LastName INCLUDE = SSN
Index3: Key = LastName, FirstName INCLUDE = phone

这种情况下每个索引提供特定使用，然而你会冗余很多数据。如果我们创建一个新索引: LastName,FirstName,MiddleInitial INCLUDE(SSN,phone)
同样，这个新索引比之前的3个索引都要宽，但是这个新索引有更多的用途而且它的总开销更少(只需要维护一个索引，磁盘上只有一个索引，缓存中只有一个索引)。但是，你还是得确定使用窄索引的查询有多重要，以及使用新索引会消耗多少更多的资源。
Index consolidation is a critical step in reducing waste and table bloat but there isn’t a simple answer to every consolidation option.This is another "it depends" case.
3.2、Analyze the health of your existing indexes
在清理重复/未使用/相似索引后，要确保现有的索引是健康的
Make sure your index maintenance routines at indexes on tables AND views
Make sure your index routines use a LIMITED scan if you’re only analyzing avg_fragmentation_in_percent
更多内容可查看Index Maintenance
3.3、Adding more indexes
这是一个棘手的问题。在添加索引方面有很多好/坏的做法。最糟糕的是，大多数人在没有真正全面分析(并正确分析)现有索引的情况下添加索引。
我说正确分析索引的原因是，那些sp_helpindex和SSMS工具会隐藏部分被添加到索引的列。除非你真正了解你的索引，否则你不可能正确的添加新索引，同时合并现有索引。
虽然我强烈建议你把缺失索引作为指南，我希望你记住它们不是完美的：
The missing index DMVs only tune the plan that was executed. If the plan performed a hash join then the index is going to help the hash join. But, it’s unlikely that the join type will change. And, it might be the case that a different index would perform a different join type and the query would be even faster.
缺失索引只是针对每个索引给出最好的索引，但是你得综合考虑，你不可能为每一个查询单独创建一个索引。
缺失索引可能显示已经存在索引，SQL Server的Missing index DMV的 bug可能会使你失去理智
 四、Essential PerfMon counters
可以通过导出/编辑PAL模板，得到性能计数器配置文件。日常收集使用的计数器参考：模板-Perfmon
得到性能监视器文件后，可以先在命令行使用relog命令对其处理，之后再借助PAL工具进行分析~

--relog /?
--列出输入文件中的性能计数器
relog F:\TroubleShooting\Perfmon\SamplePerfmonLog.blg -q -o F:\TroubleShooting\Perfmon\PerfmonCounters.txt -y
--从输入文件中筛选出计数器
relog F:\TroubleShooting\Perfmon\DataCollector01.blg -c "\Memory\Available MBytes" -o F:\TroubleShooting\Perfmon\logfile.blg -y

--截取某段时间内的计数器到新的文件中
relog F:\TroubleShooting\Perfmon\DataCollector01.blg -b 2014/10/16 14:14:00 -e 2014/10/16 14:15:00 -o F:\TroubleShooting\Perfmon\logfile.blg -y
--转到csv文件中
relog F:\TroubleShooting\Perfmon\DataCollector01.blg -o F:\TroubleShooting\Perfmon\Counters.csv -f csv

--以2*原采样间隔重新收集
relog F:\TroubleShooting\Perfmon\DataCollector02.blg -b 2014/10/16 14:30:00 -e 2014/10/16 14:35:00 -o F:\TroubleShooting\Perfmon\logfile.blg -y -t 2

View Code

An important consideration when looking at performance counters, or any monitoring data from SQL Server for that matter, is that no single data point will tell you the root cause of a problem, if one occurs. For performance counters, you need to look across multiple counters for correlating information to pinpoint the root of problems.
总结
很巧原文提及的部分知识点可以从工具分类中找到，继续搬砖 ●-●

posted @ 2017-05-06 18:01 Uest 阅读(375) 评论(4) 收藏举报

刷新页面返回顶部

	【作者】：醒嘞
	【出处】： http://www.cnblogs.com/Uest/
	【声明】：本文内容仅代表个人观点。如需转载请保留此段声明，且在文章页面明显位置给出原文链接！

【译】The Accidental DBA：Troubleshooting Performance

公告