问题描述

上一期的需求上线之后,线上多了一个异常:Connection reset。如下:

[2017-03-22 00:45:00 ERROR] [creativeAuditTaskScheduler_Worker-9] (cn.com.ServiceImpl:169) - getAuditResult exception, call adx api failed. msg:I/O error on GET request for "https://biz/getAuditInfo?dspId=13":Connection reset; nested exception is java.net.SocketException: Connection reset
org.springframework.web.client.ResourceAccessException: I/O error on GET request for "https://biz/getAuditInfo?dspId=13":Connection reset; nested exception is java.net.SocketException: Connection reset
	at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:558)
	at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:511)
	at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:248)
	at cn.com.ServiceImpl.java:136)
	at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:98)
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:262)
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:95)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
	at com.sun.proxy.$Proxy28.getAuditResult(Unknown Source)
	at cn.com.QueryAuditResultTask.queryAuditResult(QueryAuditResultTask.java:117)
	at cn.com.QueryAuditResultTask.execute(QueryAuditResultTask.java:88)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:196)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
	at sun.security.ssl.InputRecord.read(InputRecord.java:480)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
	at org.apache.http.impl.conn.LoggingInputStream.read(LoggingInputStream.java:87)
	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
	at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
	at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:87)
	at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
	at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:52)
	at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:542)
	... 19 more

这里使用Spring RestTemplate调外部接口查询结果。Spring RestTemplate 配置如下:

    <bean id="ky.pollingConnectionManager" class="org.apache.http.impl.conn.PoolingHttpClientConnectionManager">
        <!--整个连接池的并发-->
        <property name="maxTotal" value="1000" />
        <!--每个路由的并发-->
        <property name="defaultMaxPerRoute" value="32" />
    </bean>

    <bean id="ky.httpClientBuilder" class="org.apache.http.impl.client.HttpClientBuilder" factory-method="create">
        <property name="connectionManager" ref="ky.pollingConnectionManager" />
        <!--开启重试-->
        <property name="retryHandler">
        <!--新加的异常处理,只处理ConnectTimeoutException和UnknownHostException异常-->
        <!--上一版本使用的是org.apache.http.impl.client.DefaultHttpRequestRetryHandler-->
        <!--默认处理InterruptedIOException、UnknownHostException、ConnectException、SSLException 4种异常-->
            <bean class="cn.com.autodx.common.http.CustomRequestRetryHandler">
                <constructor-arg value="3"/>
                <constructor-arg value="true"/>
                <constructor-arg value="200"/>
            </bean>
        </property>
        <!--开启keep-Alive-->
        <property name="keepAliveStrategy">
            <bean class="org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy" />
        </property>
        <property name="defaultHeaders">
            <list>
                <bean class="org.apache.http.message.BasicHeader">
                    <constructor-arg value="User-Agent"/>
                    <constructor-arg value="Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36"/>
                </bean>
                <bean class="org.apache.http.message.BasicHeader">
                    <constructor-arg value="Accept-Encoding"/>
                    <constructor-arg value="gzip,deflate"/>
                </bean>
                <bean class="org.apache.http.message.BasicHeader">
                    <constructor-arg value="Accept-Language"/>
                    <constructor-arg value="zh-CN"/>
                </bean>
                <bean class="org.apache.http.message.BasicHeader">
                    <constructor-arg value="Connection"/>
                    <constructor-arg value="keep-alive"/>
                </bean>
            </list>
        </property>
    </bean>

    <bean id="ky.httpClient" factory-bean="ky.httpClientBuilder" factory-method="build" />

    <bean id="ky.clientHttpRequestFactory" class="org.springframework.http.client.HttpComponentsClientHttpRequestFactory">
        <constructor-arg ref="ky.httpClient"/>
        <!--连接超时时间,毫秒-->
        <property name="connectTimeout" value="2000"/>
        <!--读写超时时间,毫秒-->
        <property name="readTimeout" value="5000"/>
    </bean>

    <bean id="restTemplate" class="org.springframework.web.client.RestTemplate">
        <constructor-arg ref="ky.clientHttpRequestFactory"/>
        <property name="errorHandler">
            <bean class="org.springframework.web.client.DefaultResponseErrorHandler"/>
        </property>
        <property name="messageConverters">
            <list>
                <bean class="org.springframework.http.converter.FormHttpMessageConverter"/>
                <bean class="cn.com.autodx.common.jsonView.ViewAwareJsonMessageConverter"/>
                <bean class="org.springframework.http.converter.StringHttpMessageConverter">
                    <property name="supportedMediaTypes">
                        <list>
                            <value>text/html;charset=UTF-8</value>
                            <value>application/json</value>
                        </list>
                    </property>
                </bean>
            </list>
        </property>
    </bean>

本次需求,并没有修改逻辑,为什么会出现这种情况呢?只是网络关系,还是跟代码有关呢。我有几个疑问:

  1. 什么情况下会产生Connection reset?
  2. 长连接中,向server发请求,是先发送数据的,如果连接断开,应该是写数据异常,为什么是读数据异常呢?请求是否发送成功?发送之前有校验连接是否可用吗?
  3. http连接池defaultMaxPerRoute什么意思?每个并发都建立一个长连接吗?
  4. Connection reset之后,如何重新建立连接,继而继续进行业务交互?
  5. RestTemplate中配置了重试,为什么没有重新发起连接?

我们便来解答上面的问题。

问题1,什么情况下会产生Connection reset?

网上搜一下,很多这样的打包附送的答案,如下:

第1个异常是java.net.BindException:Address already in use: JVM_Bind。该异常发生在服务器端进行new ServerSocket(port)(port是一个0,65536的整型值)操作时。异常的原因是以为与port一样的一个端口已经被启动,并进行监听。此时用netstat –an命令,可以看到一个Listending状态的端口。只需要找一个没有被占用的端口就能解决这个问题。

第2个异常是java.net.ConnectException: Connection refused: connect。该异常发生在客户端进行new Socket(ip, port)操作时,该异常发生的原因是或者具有ip地址的机器不能找到(也就是说从当前机器不存在到指定ip路由),或者是该ip存在,但找不到指定的端口进行监听。出现该问题,首先检查客户端的ip和port是否写错了,如果正确则从客户端ping一下服务器看是否能ping通,如果能ping通(服务服务器端把ping禁掉则需要另外的办法),则看在服务器端的监听指定端口的程序是否启动,这个肯定能解决这个问题。

第3个异常是java.net.SocketException: Socket is closed,该异常在客户端和服务器均可能发生。异常的原因是己方主动关闭了连接后(调用了Socket的close方法)再对网络连接进行读写操作。

第4个异常是java.net.SocketException: (Connection reset或者Connect reset by peer:Socket write error)。该异常在客户端和服务器端均有可能发生,引起该异常的原因有两个,第一个就是如果一端的Socket被关闭(或主动关闭或者因为异常退出而引起的关闭),另一端仍发送数据,发送的第一个数据包引发该异常(Connect reset by peer)。另一个是一端退出,但退出时并未关闭该连接,另一端如果在从连接中读数据则抛出该异常(Connection reset)。简单的说就是在连接断开后的读和写操作引起的。

第5个异常是java.net.SocketException: Broken pipe。该异常在客户端和服务器均有可能发生。在第4个异常的第一种情况中(也就是抛出SocketExcepton:Connect reset by peer:Socket write error后),如果再继续写数据则抛出该异常。前两个异常的解决方法是首先确保程序退出前关闭所有的网络连接,其次是要检测对方的关闭连接操作,发现对方关闭连接后自己也要关闭该连接。

这里我们关心的是第四个异常,即server已经关闭了连接,client仍然在从连接中读数据。

细节剖析

正常流程(成功日志)剖析

接下来,先逐步debug,分析调用成功的日志。

org.springframework.http.client.support.HttpAccessor.createRequest(URI url, HttpMethod method)

(org.springframework.web.client.RestTemplate:?) - Created GET request for "https://domainName/creative/getAuditInfo"
(org.springframework.web.client.RestTemplate:?) - Setting request Accept header to [application/json, application/*+json, text/html, application/json]
(org.apache.http.client.protocol.RequestAddCookies:?) - CookieSpec selected: default
(org.apache.http.client.protocol.RequestAuthCache:?) - Auth cache not set in the context

org.apache.http.impl.execchain.MainClientExec.execute

  1. 获取ConnectionRequest connManager.requestConnection(route, userToken);
    调用org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection
(org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection request: [route: {s}->https://domainName:443][total kept alive: 1; route allocated: 1 of 32; total allocated: 1 of 1000]
  1. 根据ConnectionRequest,获取HttpClientConnection [调用 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection]
    获取连接后,校验连接不可用,则关闭连接connection。后面判断connection没有打开,就会重新建立连接
(org.apache.http.wire:?) - http-outgoing-798 << "end of stream" 
 //isStale返回true     
(org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-798: Close connection    
(org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection leased: [id: 799][route: {s}->https://domainName:443][total kept alive: 0; route allocated: 1 of 32; total allocated: 1 of 1000]

3.如果managedConn没有打开, 则建立路由 establishRoute

(org.apache.http.impl.execchain.MainClientExec:?) - Opening connection {s}->https://domainName:443

调用 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect
调用 org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect 遍历地址集,新建socket,并与connection绑定,成功即返回。 以下是建立连接的过程

(org.apache.http.impl.conn.DefaultHttpClientConnectionOperator:?) - Connecting to domainName/0.0.0.0:443

调用 sock = sf.connectSocket(connectTimeout, sock, host, remoteAddress, localAddress, context); //建立socket
conn.bind(sock); //绑定socket

(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Connecting socket to domainName/0.0.0.0:443 with timeout 2000
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Enabled protocols: [TLSv1]
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Enabled cipher suites:[]  //session.getCipherSuite()

调用 org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket sslsock.startHandshake();

(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Starting handshake

调用 org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket.verifyHostname(sslsock, host.getHostName()); session可用,打印信息

(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Secure session established
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) -  negotiated protocol: TLSv1  
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) -  negotiated cipher suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) -  peer principal: CN=*.home.com.cn, OU=IT, O="Beijing Autohome Information Technology Co., Ltd.", L=Beijing, ST=Beijing, C=CN
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) -  issuer principal: CN=Symantec Class 3 Secure Server CA - G4, OU=Symantec Trust Network, O=Symantec Corporation, C=US

调用 org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect

(org.apache.http.impl.conn.DefaultHttpClientConnectionOperator:?) - Connection established localhost:4357<->0.0.0.0:443
  1. 设置socket超时时间 managedConn.setSocketTimeout(timeout);
(org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-799: set socket timeout to 5000
  1. 开始处理请求 根据request头部参数,作相应操作
(org.apache.http.impl.execchain.MainClientExec:?) - Executing request GET /creative/getAuditInfo HTTP/1.1
(org.apache.http.impl.execchain.MainClientExec:?) - Target auth state: UNCHALLENGED
(org.apache.http.impl.execchain.MainClientExec:?) - Proxy auth state: UNCHALLENGED
  1. 处理请求 requestExecutor.execute(request, managedConn, context);
    调用:org.apache.http.protocol.HttpRequestExecutor.execute
    主要功能:
  HttpResponse response = doSendRequest(request, conn, context);
  if (response == null) {
      response = doReceiveResponse(request, conn, context);
  }

6.1 doSendRequest
6.1.1 发送请求头部
调用org.apache.http.impl.DefaultBHttpClientConnection.sendRequestHeader (确保socket有效,然后向缓存写请求头部,写完后打印以下日志)

(org.apache.http.headers:?) - http-outgoing-799 >> GET /creative/getAuditInfo HTTP/1.1
(org.apache.http.headers:?) - http-outgoing-799 >> Accept: application/json, application/*+json, text/html, application/json
(org.apache.http.headers:?) - http-outgoing-799 >> User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36
(org.apache.http.headers:?) - http-outgoing-799 >> Accept-Encoding: gzip,deflate
(org.apache.http.headers:?) - http-outgoing-799 >> Accept-Language: zh-CN
(org.apache.http.headers:?) - http-outgoing-799 >> Connection: keep-alive
(org.apache.http.headers:?) - http-outgoing-799 >> Host: domainName

6.1.2 发送请求头部之后,如果请求带有entity,则继续发送entity,即conn.sendRequestEntity。这里对HTTP 1.0协议做了兼容判断

6.1.3 通过connection将所有缓存数据发送到服务端,并记录日志如下: (headers和wire的日志稍有不同。)

(org.apache.http.wire:?) - http-outgoing-799 >> "GET /creative/getAuditInfo HTTP/1.1[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "Accept: application/json, application/*+json, text/html, application/json[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "Accept-Encoding: gzip,deflate[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "Accept-Language: zh-CN[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "Connection: keep-alive[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "Host: domainName[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "[\r][\n]"

6.2 doReceiveResponse 如果response为空,则接收服务端的响应

 response = conn.receiveResponseHeader();
 if (canResponseHaveBody(request, response)) {
     conn.receiveResponseEntity(response);
 }

6.2.1 接收响应头 receiveResponseHeader
--- 读入数据,头部和数据体,第一次读取内容【这里数据包读了两次】 << 表示接收数据,从输入流读入

(org.apache.http.wire:?) - http-outgoing-799 << "HTTP/1.1 200 OK[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Server: 10.29[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Date: Wed, 22 Mar 2017 01:50:00 GMT[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Content-Type: application/json;charset=UTF-8[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Transfer-Encoding: chunked[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Connection: keep-alive[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Pragma: no-cache[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Cache-Control: no-cache, no-store, max-age=0[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Expires: Thu, 01 Jan 1970 00:00:00 GMT[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Content-Language: zh-CN[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "6f7[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "{"data":{......... data1

--- 接收完之后,打印日志

(org.apache.http.headers:?) - http-outgoing-799 << HTTP/1.1 200 OK
(org.apache.http.headers:?) - http-outgoing-799 << Server: 10.29
(org.apache.http.headers:?) - http-outgoing-799 << Date: Wed, 22 Mar 2017 01:50:00 GMT
(org.apache.http.headers:?) - http-outgoing-799 << Content-Type: application/json;charset=UTF-8
(org.apache.http.headers:?) - http-outgoing-799 << Transfer-Encoding: chunked
(org.apache.http.headers:?) - http-outgoing-799 << Connection: keep-alive
(org.apache.http.headers:?) - http-outgoing-799 << Pragma: no-cache
(org.apache.http.headers:?) - http-outgoing-799 << Cache-Control: no-cache, no-store, max-age=0
(org.apache.http.headers:?) - http-outgoing-799 << Expires: Thu, 01 Jan 1970 00:00:00 GMT
(org.apache.http.headers:?) - http-outgoing-799 << Content-Language: zh-CN

6.2.2 如果响应含有消息体,则接收消息体

  1. 检查reuse策略和keepAlive策略,设置connection属性。 这里永久保留
    (org.apache.http.impl.execchain.MainClientExec:?) - Connection can be kept alive indefinitely

org.springframework.web.client.RestTemplate

  1. 处理请求response = request.execute(); 日志如上
    2.如果响应没有出错,则打印以下日志
(org.springframework.web.client.RestTemplate:?) - GET request for "https://domainName/creative/getAuditInfo?" resulted in 200 (OK)
  1. 读取扩展字段 extractData

3.1使用HttpMessageConverter读取并解析数据
调用org.springframework.http.converter.GenericHttpMessageConverter.read方法,从输入流中读取数据

(org.springframework.web.client.RestTemplate:?) - Reading [java.lang.String] as "application/json;charset=UTF-8" using [cn.com.autodx.common.jsonView.ViewAwareJsonMessageConverter@56a00a64]

3.2如果上一次的数据没有读全,这里会接着从输入流读取数据

(org.apache.http.wire:?) - http-outgoing-799 << "......  data2"
(org.apache.http.wire:?) - http-outgoing-799 << "[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "0[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "[\r][\n]"

3.3 数据读取完毕,就释放连接
调用org.apache.http.impl.conn.PoolingHttpClientConnectionManager.releaseConnection

(org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection [id: 799][route: {s}->https://domainName:443] can be kept alive indefinitely
(org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection released: [id: 799][route: {s}->https://domainName:443][total kept alive: 1; route allocated: 1 of 32; total allocated: 1 of 1000]

RestTemplate调用完毕

(cn.com.autodx.creative.task.service.autohomeAdx.AutoHomeAdxServiceImpl:?) - getAuditResult .....

总结一下流程:

  • 创建连接请求
  • 根据连接请求的参数,从连接池中获取一个连接
  • 如果连接没有打开,则创建一个底层的socket连接。
  • 设置socket超时时间
  • 发送请求头部(如果请求中带有entity,则发送)
  • 接收响应(先接收头部,如果有主体,则接收)
  • 读取扩展数据(使用HttpMessageConverter读取并解析数据,读取完成后,关闭输入流及释放连接池中的连接)
  • 调用完毕,返回数据

本次异常日志剖析

接下来,查看失败日志:
org.springframework.http.client.support.HttpAccessor.createRequest(URI url, HttpMethod method)

 (org.springframework.web.client.RestTemplate:?) - Created GET request for "https://domain/creative/getAuditInfo"
 (org.springframework.web.client.RestTemplate:?) - Setting request Accept header to [application/json, application/*+json, text/html, application/json]
 (org.apache.http.client.protocol.RequestAddCookies:?) - CookieSpec selected: default
 (org.apache.http.client.protocol.RequestAuthCache:?) - Auth cache not set in the context

org.apache.http.impl.execchain.MainClientExec.execute

  1. 获取ConnectionRequest connManager.requestConnection(route, userToken);
    调用org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection
 (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection request: [route: {s}->https://domain:443][total kept alive: 1; route allocated: 1 of 32; total allocated: 1 of 1000]

2.根据ConnectionRequest,获取HttpClientConnection
调用 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection
其间获取entry时,校验connection().isStale()。

    public boolean isStale() {
        if (!isOpen()) {
            return true;
        }
        try {
        	//如果测试结果返回-1说明不可用
            final int bytesRead = fillInputBuffer(1);
            return bytesRead < 0;
        } catch (final SocketTimeoutException ex) {
            //注意这里SocketTimeoutException时,认为是可用的
            return false;
        } catch (final IOException ex) {
            //有I/O异常,不可用
            return true;
        }
    }

这里读超时,返回连接可用。于是后面,就没有关闭连接。也就没有重新建立新连接。具体参见这篇文章

 (org.apache.http.wire:?) - http-outgoing-766 << "[read] I/O error: Read timed out"
 (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection leased: [id: 766][route: {s}->https://domain:443][total kept alive: 0; route allocated: 1 of 32; total allocated: 1 of 1000]

设置socket超时时间 managedConn.setSocketTimeout(timeout);

 (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-766: set socket timeout to 5000

处理请求,并发送请求数据。处理过程同成功日志中的流程。

 (org.apache.http.impl.execchain.MainClientExec:?) - Executing request GET /creative/getAuditInfo HTTP/1.1
 (org.apache.http.impl.execchain.MainClientExec:?) - Target auth state: UNCHALLENGED
 (org.apache.http.impl.execchain.MainClientExec:?) - Proxy auth state: UNCHALLENGED
 (org.apache.http.headers:?) - http-outgoing-766 >> GET /creative/getAuditInfo HTTP/1.1
 (org.apache.http.headers:?) - http-outgoing-766 >> Accept: application/json, application/*+json, text/html, application/json
 (org.apache.http.headers:?) - http-outgoing-766 >> User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36
 (org.apache.http.headers:?) - http-outgoing-766 >> Accept-Encoding: gzip,deflate
 (org.apache.http.headers:?) - http-outgoing-766 >> Accept-Language: zh-CN
 (org.apache.http.headers:?) - http-outgoing-766 >> Connection: keep-alive
 (org.apache.http.headers:?) - http-outgoing-766 >> Host: domain
 (org.apache.http.wire:?) - http-outgoing-766 >> "GET /creative/getAuditInfo HTTP/1.1[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "Accept: application/json, application/*+json, text/html, application/json[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "Accept-Encoding: gzip,deflate[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "Accept-Language: zh-CN[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "Connection: keep-alive[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "Host: domain[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "[\r][\n]"

接收相应数据时,出现I/O异常,关闭连接,并向上抛出异常。

 (org.apache.http.wire:?) - http-outgoing-766 << "[read] I/O error: Connection reset"
 (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-766: Close connection
 (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-766: Shutdown connection
 (org.apache.http.impl.execchain.MainClientExec:?) - Connection discarded
 (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection released: [id: 766][route: {s}->https://domain:443][total kept alive: 0; route allocated: 0 of 32; total allocated: 0 of 1000]
 (cn.com.autodx.creative.task.service.autohomeAdx.AutoHomeAdxServiceImpl:169) - getAuditResult exception, call adx api failed. msg:I/O error on GET request for "https://domain/creative/getAuditInfo":Connection reset; nested exception is java.net.SocketException: Connection reset

对比失败与成功的日志,发现问题在于获取connection的时候,校验connection是否可用的操作上。此时服务器因为不可知的原因断开了连接(服务端不可以向客户端发数据),这里应该是没有按照正常流程进行四次挥手,所以客户端还保持着连接(可以向服务端发数据,但收不到数据)。测试连接时,客户端读超时(必然的),但此时认为连接可用,实际上不可用(不知道这里是不是认为给的1ms探测时间太短了,允许读超时?),然后就没有重新建立连接。将错误操作延迟到读取请求这一步。

现在可以回答前面的问题

  • 长连接中,向server发请求,是先发送数据的,如果连接断开,应该是写数据异常,为什么是读数据异常呢?请求是否发送成功?发送之前有校验连接是否可用吗?
    本次异常发生在发送完请求,读取response的时候,所以是read异常。既然服务端连接断掉,请求应该是没有发送成功。发送之前有检查连接是否可用,然而检查认为连接可用。

  • http连接池defaultMaxPerRoute什么意思?每个并发都建立一个长连接吗?
    为每一个路由建立一个连接池,连接数最大为defaultMaxPerRoute。每一个连接都是一个socket连接。如果配置为长连接,则是长连接。

  • Connection reset之后,如何重新建立连接,继而继续进行业务交互?
    上面的分析,connection reset之后,把有问题的连接关闭掉了,所以,后面不会再使用这个连接,只要重试,一般是可以成功的。

  • RestTemplate中配置了重试,为什么没有重新发起连接?
    在org.apache.http.impl.execchain.RetryExec的execute(调用MainClientExec的execute,但是被RestTemplate的doExecute调用)中,有如下重试处理:

    public CloseableHttpResponse execute(final HttpRoute route,
            final HttpRequestWrapper request, final HttpClientContext context,
            final HttpExecutionAware execAware) throws IOException, HttpException {
        ......

        //不停重试,停下的判断在retryHandler.retryRequest
        for (int execCount = 1;; execCount++) {
            try {
                //执行操作
                return this.requestExecutor.execute(route, request, context, execAware);
            } catch (final IOException ex) {
                ......

                //判断是否符合重试的条件   
                if (retryHandler.retryRequest(ex, execCount, context)) {
                    //重试的话就打印日志
                    if (this.log.isInfoEnabled()) {
                        this.log.info("I/O exception ("+ ex.getClass().getName() +
                                ") caught when processing request to "
                                + route + ": " + ex.getMessage());
                    }
                    //是否可以重试
                    if (!RequestEntityProxy.isRepeatable(request)) {
                        this.log.debug("Cannot retry non-repeatable request");
                        throw new NonRepeatableRequestException("Cannot retry request " +
                                "with a non-repeatable request entity", ex);
                    }
                    request.setHeaders(origheaders);
                } else {
                    //不符合重试条件,就抛出异常
                    if (ex instanceof NoHttpResponseException) {
                        final NoHttpResponseException updatedex = new NoHttpResponseException(
                                route.getTargetHost().toHostString() + " failed to respond");
                        updatedex.setStackTrace(ex.getStackTrace());
                        throw updatedex;
                    } else {
                        throw ex;
                    }
                }
            }
        }
    }

上一个版本中,没有connection reset的Error日志,这一个版本就有了。如果说新版本上线之后网络才有问题,可能性小。所以,以前和现在都会偶尔有网络问题产生。
如果之前版本有网络问题,为什么没有Error日志呢?是否有重试呢?是否通过重试补救了呢?
查看一下以前的info日志,发现的确有重试,也的确补救成功(没有类似的Connection reset报出)。

//3.11
(org.apache.http.impl.execchain.RetryExec:?) - I/O exception (java.net.SocketException) caught when processing request to {s}->https://domainName:443: Connection reset

//3.12
(org.apache.http.impl.execchain.RetryExec:?) - I/O exception (java.net.SocketException) caught when processing request to {s}->https://domainName:443: Connection reset

然后查看新版本上线之后的日志,没有发现重试日志,说明SocketException能被DefaultHttpRequestRetryHandler处理,而不能被CustomRequestRetryHandler处理。在对比两者的异常处理类型,发现DefaultHttpRequestRetryHandler处理的ConnectException extends SocketException,所以能处理。因为,我们只要在CustomRequestRetryHandler中,加入SocketException或者ConnectException,就可以通过重试解决网络问题。

至此,我们搞清楚了这个Connection reset的问题,并且找到了解决方案,开心ing~~

补充与参考

几个重要的类说明:

  • MainClientExec
    执行流程核心类,execute方法。这里有源码分析,还不错。

  • PoolingHttpClientConnectionManager
    用来管理connection,方法如:requestConnection(从连接池中lease一个连接,连接的状态可能是关闭的),connect(真正建立底层socket连接),releaseConnection(释放连接池中的连接)等。这里有一篇分析。

  • 如何检查httpConnection连接是否有效

方法一,org.apache.http.HttpConnection的isStale()方法
    public boolean isStale() {
        //没有打开,即socket为空,则不可用
        if (!isOpen()) {
            return true;
        }
        try {
        //socket链路有了,测试链路是否可用
        //这里的测试方法是查看很短的时间内(这里是1ms),是否可以从输入流中读到数据
        //如果测试结果返回-1说明不可用
            final int bytesRead = fillInputBuffer(1);
            return bytesRead < 0;
        } catch (final SocketTimeoutException ex) {
            //注意这里SocketTimeoutException时,认为是可用的
            return false;
        } catch (final IOException ex) {
            //有I/O异常,不可用
            return true;
        }
    }
方法二、org.apache.http.HttpClientConnection的isResponseAvailable方法

有几种实现方法,但大都被弃用。查看org.apache.http.impl.DefaultBHttpClientConnection的实现。

    @Override
    public boolean isResponseAvailable(final int timeout) throws IOException {
        ensureOpen();
        try {
            return awaitInput(timeout);
        } catch (final SocketTimeoutException ex) {
           //这里与isStale不同,SocketTimeoutException时认为不可用
            return false;
        }
    }

    protected boolean awaitInput(final int timeout) throws IOException {
        if (this.inbuffer.hasBufferedData()) {
            return true;
        }
        //BHttpConnectionBase 中的fillInputBuffer方法,与isStale的测试方法一致
        fillInputBuffer(timeout);
        return this.inbuffer.hasBufferedData();
    }

比较两种方法:isResponseAvailable暂时没有找到使用的地方;isStale在MainClientExec.execute中有直接使用。使用的话需要配置staleConnectionCheckEnabled,即每次请求都要去检测(最高耗时30ms)一次,看起来有一点悲观锁的意思,对性能影响比较大。所以4.4版本开始,默认值为false,且功能被标识为过时。这种方法不推荐。

if (config.isStaleConnectionCheckEnabled()) {
    // validate connection
    if (managedConn.isOpen()) {
        this.log.debug("Stale connection check");
        if (managedConn.isStale()) {
            this.log.debug("Stale connection detected");
            managedConn.close();
        }
    }
}
方法三:官方推荐的org.apache.http.impl.conn.PoolingHttpClientConnectionManager#getValidateAfterInactivity()

对非活动的永久连接,每个validateAfterInactivity毫秒(默认2s)做一次链路检查,尽量确保在使用的时候是可用的。为什么说尽量呢?本文就是一个例子,2s检查没有问题,但在使用之前的2s内网络出了问题,这就没有办法了。

validateAfterInactivity的使用,在org.apache.http.pool.AbstractConnPool中,getPoolEntryBlocking方法。

if (entry.isExpired(System.currentTimeMillis())) {
    entry.close();
} else if (this.validateAfterInactivity > 0) {
    if (entry.getUpdated() + this.validateAfterInactivity <= System.currentTimeMillis()) {
        if (!validate(entry)) {
            entry.close();
        }
    }
}

详细解释在这里

posted on 2017-03-29 22:45  漫夭  阅读(166480)  评论(6编辑  收藏  举报