
Windows Server 2012 Becomes Unresponsive and Hangs on the VMware Platform

2016-10-20 17:10  潇湘隐者

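Our production servers all run on VMware ESXi 5.5. For the better part of the past year, servers running Windows Server 2012 have occasionally become completely unresponsive. When the problem occurs, the server shows the following symptoms: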

 

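1: Applications cannot reach the SQL Server database, and a test connection from Microsoft SQL Server Management Studio also returns a connection error.

2: Ping succeeds at some times and fails at others.

3: Remote Desktop cannot connect to the server. Opening the server's console from VMware vSphere Client gets no response either, so there is no way to log in. In effect the machine is down.

4: The problem occurs at random, with no discernible pattern. Sometimes it recurs after a month or so, sometimes only after a much longer interval.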

 

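When this happens, the only remedy is to select the server in VMware vSphere Client, right-click, choose "Power", power the machine off, and then power it back on. We analyzed the logs of both the server and the virtual machine and found no useful error information. A colleague later found that many people on the official VMware forum had run into the same situation, in the thread Windows Server 2012 VM becomes unresponsive / VW Tools "Not Running". There was no official conclusion at the time, but one poster reported that Symantec's antivirus software (Symantec Endpoint Protection anti-virus) was the cause, as shown below: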

 

Question to all that are having the problem.  Do you have SEP (Symantec Endpoint Protection) anti-virus installed on these servers?

I had similar issue and after doing a lot of tracing and log reviewing I noticed that every one of my servers that froze had a SEP definition update and within 3 minutes the server froze and had to be hard power off and back on.  If you have SEP installed check the SEP client log under applications and services logs in event viewer and see if you notice a gap between when the server froze and when you rebooted the server.  This log entry time will correspond with time entries in the system and application logs within 3 minutes or so when you have no entries until you rebooted the server.

My resolution was to uninstall SEP from the servers and I have not had anymore freeze since.  I don't know if something change in SEP but my servers has had SEP on them for years and never encounter this problem until early February and then I was getting 1-2 frozen servers each week until I uninstalled SEP and I have not had another freeze since early March.

If somebody thinks it's something else I'm all ears but SEP was the only commonality (within 3 minutes of a SEP update) my servers had in common.  The one thing I was to point out is that all my unresponsive servers were still pingable but nothing else was responding, no cntl-alt-del, no rdc, nothing.
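The check described in the post above (look for a stretch with no log entries, then see whether a SEP definition update landed within about 3 minutes before the log went silent) can be sketched in Python. This is only an illustration under stated assumptions: it presumes the event timestamps have already been exported from Event Viewer into Python `datetime` values, the function names are hypothetical, the 30-minute silence threshold is an arbitrary choice, and the sample timestamps are invented.

```python
from datetime import datetime, timedelta


def find_log_gaps(timestamps, min_gap=timedelta(minutes=30)):
    """Return (last_entry_before, first_entry_after) pairs for every
    stretch where consecutive log entries are at least min_gap apart."""
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev >= min_gap:
            gaps.append((prev, cur))
    return gaps


def updates_before_gaps(update_times, gaps, window=timedelta(minutes=3)):
    """Return (update_time, gap_start) pairs where a definition update
    happened no more than `window` before the log went silent."""
    hits = []
    for gap_start, _gap_end in gaps:
        for update in update_times:
            if timedelta(0) <= gap_start - update <= window:
                hits.append((update, gap_start))
    return hits


# Invented sample data: entries stop at 10:02 and resume at 12:30 after a reboot.
events = [
    datetime(2016, 3, 1, 10, 0),
    datetime(2016, 3, 1, 10, 1),
    datetime(2016, 3, 1, 10, 2),
    datetime(2016, 3, 1, 12, 30),
    datetime(2016, 3, 1, 12, 31),
]
# Invented time of a SEP definition update.
definition_updates = [datetime(2016, 3, 1, 10, 0)]

gaps = find_log_gaps(events)
for update, gap_start in updates_before_gaps(definition_updates, gaps):
    print(f"definition update at {update}, log silent after {gap_start}")
```

A hit from this kind of check is only a correlation. It suggests where to look, as the poster did, and does not by itself prove that SEP caused the freeze.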

 

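Another poster reported contacting both VMware and Microsoft, neither of which found a cause or a solution. He then noticed that every affected server had SEP Client 12.1.2.x installed. Symantec technical support asked him to update to the latest SEP Client, 12.1.6.x, and the problem did not recur afterwards. His post follows: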

 

16. Re: Windows Server 2012 VM becomes unresponsive / VW Tools "Not Running"

copelsimo1 2016-5-10 6:09 AM (in reply to Robby68)

Hi to all.

In my company we have the same issue: random unresponsive server (2012/2012r2)

We have ESXI 6.0 up.2

We opened different support request (VmWare, Microsoft, etc) but no one tell us why this happened, and no solution.

Then crossing different tables from different console, i noticed that all unresponsive server had same sep version (12.1.2.x).

So i open a technical call to Symantec, and meantime i started to distribute last update of sep client (at time 12.1.6.x). This update require a system reboot, so only 30-40% of systems have been updated in the first step.

Symantec tell me we had old version of SEP, and requested us full Microsoft dump to analize (but this require reboot,too!) as well as update all client version.

No one server with last SEP version (12.1.6.x) got unresponsive.

At the end, Symantec confirm us problem was right in SEP version:

@- Fix ID: 3590578

@ Symptom: System freezes due to a deadlock in File System Auto-Protect driver after updating virus definitions.

@ Solution: Modified File System Auto-Protect driver to avoid this deadlock.

So, UPGRADING SEP TO LAST VERSION, PROBLEM SOLVED.

I hope to have helped.

Simone

Alba(CN)

 

Our system administrators have upgraded the Symantec Endpoint Protection anti-virus client on every server that had this problem. We cannot yet be sure this has really solved it; only time will tell.
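To decide which servers still need the upgrade, the installed SEP client versions can be compared against the version line that the forum post reports as unaffected. The following Python sketch assumes a hand-built inventory mapping host names to version strings. The host names and build numbers are invented, the function names are hypothetical, and the 12.1.6 threshold is taken from the forum post rather than from any official Symantec statement about where the fix first shipped.

```python
# Assumption: clients at 12.1.6.x or later are treated as unaffected,
# based on the forum report quoted above.
MINIMUM_VERSION = (12, 1, 6)


def parse_version(text):
    """Turn a dotted version string such as '12.1.2.1015' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(version_text, minimum=MINIMUM_VERSION):
    """True if the version is older than `minimum`, comparing only as many
    components as `minimum` has (the build number is ignored)."""
    return parse_version(version_text)[:len(minimum)] < minimum


def hosts_to_upgrade(inventory):
    """Return the sorted host names whose SEP client is older than the minimum."""
    return sorted(host for host, version in inventory.items()
                  if needs_upgrade(version))


# Invented inventory for illustration only.
inventory = {
    "sql-prod-01": "12.1.2.1015",
    "sql-prod-02": "12.1.6.6318",
    "app-prod-01": "12.1.4.4013",
}

print(hosts_to_upgrade(inventory))
```

Comparing tuples of integers instead of raw strings avoids the usual pitfall where "12.1.10" sorts before "12.1.6" in a plain string comparison.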

 

------------------------------------------------------------PS 2017-01-06: the following was added------------------------------------------------------------

 

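Since the Symantec Endpoint Protection anti-virus client was upgraded on the servers, roughly two and a half months have passed (2016-10-20 to today, 2017-01-06) without a single hang. It does appear that this was the cause, and we can now call the matter settled.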