性能测试项目小结
最近刚刚结束一个小的性能测试项目,抽出时间来总结性能测试中遇到的问题。此项目是关于保险的系统,主要功能是测试大单导入被保险人和大单核保的功能,之前的功能处理机制是串行,执行效率很慢,核保2万被保人规则的时候需要18小时左右,此次项目主要是通过并行处理线程提高大单导入和核保的效率,但对服务器的硬件资源要求较高。此次项目遇到如下两个突出的性能问题:
1.内存溢出的问题,报错信息如下:
Exception in thread "Thread-12" java.lang.OutOfMemoryError: Java heap space
at java.lang.reflect.Array.newInstance(Array.java:52)
at oracle.jdbc.driver.BufferCache.get(BufferCache.java:226)
at oracle.jdbc.driver.PhysicalConnection.getCharBuffer(PhysicalConnection.java:7672)
at oracle.jdbc.driver.OracleStatement.prepareAccessors(OracleStatement.java:1005)
at oracle.jdbc.driver.T4CTTIdcb.receiveCommon(T4CTTIdcb.java:277)
at oracle.jdbc.driver.T4CTTIdcb.receive(T4CTTIdcb.java:146)
at oracle.jdbc.driver.T4C8Oall.readDCB(T4C8Oall.java:844)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:358)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:884)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1167)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1289)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3593)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3637)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1495)
at weblogic.jdbc.wrapper.PreparedStatement.executeQuery(PreparedStatement.java:135)
at com.sinosoft.utility.ExeSQL.execSQL(ExeSQL.java:1232)
at com.sinosoft.threadGcs.MultiThreadBatchRunner.multiThreadBatchRunner(MultiThreadBatchRunner.java:107)
at com.sinosoft.threadGcs.StartBatchRunner$MyThread.run(StartBatchRunner.java:40)
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: Java heap space
at java.io.ObjectInputStream$BlockDataInputStream.<init>(ObjectInputStream.java:2330)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:276)
at sun.rmi.server.MarshalInputStream.<init>(MarshalInputStream.java:107)
at sun.rmi.transport.ConnectionInputStream.<init>(ConnectionInputStream.java:38)
at sun.rmi.transport.StreamRemoteCall.getInputStream(StreamRemoteCall.java:115)
at sun.rmi.transport.Transport.serviceCall(Transport.java:124)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
在(PubSubmit)中抛出如下错误:java.lang.reflect.InvocationTargetException
一开始定位问题怀疑是JVM的设置参数问题,把Java虚拟机的最大虚拟内存设置到1024M(物理内存的1/4),重新试了一下还是会出现内存溢出,再次排查,怀疑是代码问题导致的,定位是哪个方法分配的类造成了内存泄露,查资料后可以通过JVM中虚拟机生成的堆内存转储,让开发很快知道问题出在哪个方法上,与开发沟通后,发现现有逻辑中当被保险人数量为2万人时,出现大对象处理的情况,考虑对规则执行结果进行分批次提交的优化,降低提交数据导致的大对象情况,果然修改完逻辑,功能上测试通过,性能回归测试了一下也不会出现内存溢出的情况了,
2.测试优化过的代码发现比之前的响应时间提高了不少,但是还没有达到性能指标,通过AWR报告观察到部分sql出现enq: TX - row lock contention异常等待问题,严重的时候占到Top 5 Timed Events的80%多,分析原因逻辑上出现保费计算的线程并发时会更新同一行数据,Oracle的机制是为了保证数据库的一致性,加了一个TX锁,这时另一个或者多个会话必须等待第一会话commit或者rollback,否则会一直等待下去,在报告中segments by row lock waits里面找到具体是哪张表导致的锁等待,加上开发的配合问题很快得到解决,达到了性能测试指标。
Top 5 Timed Events
Event | Waits | Time(s) | Avg Wait(ms) | % Total Call Time | Wait Class |
---|---|---|---|---|---|
enq: TX - row lock contention | 3,040 | 4,536 | 1,492 | 48.7 | Application |
CPU time | 4,219 | 45.3 | |||
log file sync | 96,343 | 78 | 1 | .8 | Commit |
log file parallel write | 85,062 | 52 | 1 | .6 | System I/O |
log file switch (checkpoint incomplete) | 133 | 44 | 330 | .5 | Configuration |
Segments by Row Lock Waits
- % of Capture shows % of row lock waits for each top segment compared
- with total row lock waits for all segments captured by the Snapshot
Owner | Tablespace Name | Object Name | Subobject Name | Obj. Type | Row Lock Waits | % of Capture |
---|---|---|---|---|---|---|
GCS | GCS | LDMAXNO | TABLE | 2,834 | 77.39 | |
GCS | GCS | LCGRPCONT | TABLE | 272 | 7.43 | |
GCS | GCSINDEX | IDX_LCGET_GRPCONTNO | INDEX | 65 | 1.77 | |
GCS | GCSINDEX | LCPOL_INDEX_PREM | INDEX | 52 | 1.42 | |
GCS | GCSINDEX | PK_LCGET | INDEX | 50 | 1.37 |
本人第一次写博客,只是希望记录下方便日后查阅,也希望给遇到此类问题的同学们指路,如有错误言论,欢迎批评指出!