Skimming the Retrofit2 Source Code with a Question in Mind

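Recently my department lead handed me a codebase to maintain: a Java server application that forwards data reported by a certain IoT device to another platform's server. Not long after going live, the project ran into trouble. At first, device requests were forwarded promptly, but the forwarding delay kept growing, until a piece of device data might sit for twenty or thirty minutes after arrival before it was sent on. I read through the business logic carefully: after receiving device data and doing some light processing, the service does hand it straight to the HTTP client for sending. So I immediately suspected that requests were piling up inside the HTTP client, which here is Retrofit2. After several pages of Google results turned up no similar Retrofit case, my only option was to dig into the source code behind the request path.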

Using Retrofit2

To solve the problem, I first needed to know how Retrofit2 is used:

Define an interface and declare in it the HTTP requests to send

public interface GitHubService {
  @GET("users/{user}/repos")
  Call<List<Repo>> listRepos(@Path("user") String user);
}

Annotations such as @GET describe the declared HTTP API.

Build a Retrofit object and use it to create a proxy object for the interface above

Retrofit retrofit = new Retrofit.Builder()
    .baseUrl("https://api.github.com/")
    .build();

GitHubService service = retrofit.create(GitHubService.class);

The create method mainly uses a JDK dynamic proxy to build the proxy object. That is unrelated to my problem, so I skipped it.
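For readers who have not met JDK dynamic proxies, here is a minimal sketch of the mechanism using only java.lang.reflect.Proxy. The Greeter interface and the ProxyDemo class are made up for illustration and are not Retrofit code; Retrofit's real handler additionally turns the method's annotations into an HTTP request.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;

// Hypothetical stand-in for an API interface such as GitHubService.
interface Greeter {
    String greet(String name);
}

class ProxyDemo {
    // Builds an implementation of the given interface at runtime. Every call on the
    // returned object is routed to the handler, which here only reports what was invoked.
    // A production handler would also special-case Object methods such as hashCode().
    static <T> T create(Class<T> api) {
        InvocationHandler handler = (proxy, method, args) ->
                method.getName() + Arrays.toString(args == null ? new Object[0] : args);
        Object instance = Proxy.newProxyInstance(
                api.getClassLoader(), new Class<?>[] {api}, handler);
        return api.cast(instance);
    }

    public static void main(String[] args) {
        Greeter greeter = create(Greeter.class);
        System.out.println(greeter.greet("octocat")); // prints greet[octocat]
    }
}
```

No class implementing Greeter is ever written by hand; the proxy stands in for it, which is the same trick that lets a Retrofit interface work without an implementation class.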

Call the interface method declared earlier to get a Call object

Call<List<Repo>> call = service.listRepos("octocat");

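Call sends the request and returns the result. It provides an execute method for synchronous, blocking requests and an enqueue method for asynchronous ones. I was using enqueue: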

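// Callback is an interface, so the anonymous class must implement onResponse and onFailure
call.enqueue(new Callback<List<Repo>>() { ... });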

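As the name suggests, the request enters some queue and is then processed asynchronously. Clearly my requests were being "swallowed" here and only "spat out" much later, so this is where I started reading the source.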

A Brief Look at the Source

@Override
public void enqueue(final Callback<T> callback) {
    checkNotNull(callback, "callback == null");

    okhttp3.Call call;
    Throwable failure;

    synchronized (this) {
      if (executed) throw new IllegalStateException("Already executed.");
      executed = true;

      call = rawCall;
      failure = creationFailure;
      if (call == null && failure == null) {
        try {
          call = rawCall = createRawCall();
        } catch (Throwable t) {
          throwIfFatal(t);
          failure = creationFailure = t;
        }
      }
    }

    if (failure != null) {
      callback.onFailure(this, failure);
      return;
    }

    if (canceled) {
      call.cancel();
    }

    call.enqueue(new okhttp3.Callback() {
      @Override public void onResponse(okhttp3.Call call, okhttp3.Response rawResponse) {
        ...
      }

      @Override public void onFailure(okhttp3.Call call, IOException e) {
        ...
      }

      private void callFailure(Throwable e) {
        ...
      }
    });
}

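Setting aside the code that handles result callbacks, we can see that retrofit2's enqueue is really a wrapper around okhttp3's enqueue (the decorator pattern). Internally it obtains an okhttp3.Call through createRawCall and calls its enqueue method to make the asynchronous request. So my attention shifted to the enqueue method of okhttp3.Call. Here is its source: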

@Override public void enqueue(Callback responseCallback) {
    // State flag; not relevant here
    synchronized (this) {
      if (executed) throw new IllegalStateException("Already Executed");
      executed = true;
    }
    // Captures the call stack; not relevant here
    captureCallStackTrace();
    // Notifies the event listener; not relevant here
    eventListener.callStart(this);
    // The key line: this is what actually submits the request
    client.dispatcher().enqueue(new AsyncCall(responseCallback));
}

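Hmm... the layers run deep. The one actually issuing requests is the Dispatcher, whose enqueue method takes an AsyncCall. A quick look at AsyncCall's source shows that it is just a Runnable whose body contains the real network request code. I'll come back to that body later; first, the source of Dispatcher's enqueue: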

void enqueue(AsyncCall call) {
    synchronized (this) {
      readyAsyncCalls.add(call);

      // Mutate the AsyncCall so that it shares the AtomicInteger of an existing running call to
      // the same host.
      if (!call.get().forWebSocket) {
        AsyncCall existingCall = findExistingCallWithHost(call.host());
        if (existingCall != null) call.reuseCallsPerHostFrom(existingCall);
      }
    }
    promoteAndExecute();
}

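This puts the incoming AsyncCall into the readyAsyncCalls deque and, as the comment says, maintains a counter tied to the request's host. What these two things are for becomes clear later; the main logic is evidently in promoteAndExecute: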

private boolean promoteAndExecute() {
    ...
    
    List<AsyncCall> executableCalls = new ArrayList<>();
    ...
    synchronized (this) {
      for (Iterator<AsyncCall> i = readyAsyncCalls.iterator(); i.hasNext(); ) {
        AsyncCall asyncCall = i.next();

        if (runningAsyncCalls.size() >= maxRequests) break; // Max capacity.
        if (asyncCall.callsPerHost().get() >= maxRequestsPerHost) continue; // Host max capacity.

        i.remove();
        asyncCall.callsPerHost().incrementAndGet();
        executableCalls.add(asyncCall);
        runningAsyncCalls.add(asyncCall);
      }
      isRunning = runningCallsCount() > 0;
    }

    for (int i = 0, size = executableCalls.size(); i < size; i++) {
      AsyncCall asyncCall = executableCalls.get(i);
      asyncCall.executeOn(executorService());
    }
    ...
}

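This is where I finally found a way into my problem. The method iterates over every pending request in the readyAsyncCalls queue, but checks two conditions before running each one. First, once the number of elements in runningAsyncCalls reaches maxRequests, iteration stops and no further requests are started. Reading on, requests cleared for immediate execution are added to runningAsyncCalls, so maxRequests (default 64) is the upper limit on how many asynchronous requests OkHttp, and therefore Retrofit, runs at the same time. My requests, however, were actually being held back by the second check, which I'll get to later. For now, assume the code reaches the last few lines and see whether asyncCall.executeOn(executorService()) really executes the request: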

public synchronized ExecutorService executorService() {
    if (executorService == null) {
      executorService = new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
          new SynchronousQueue<>(), Util.threadFactory("OkHttp Dispatcher", false));
    }
    return executorService;
}
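The pool settings above (zero core threads, an effectively unbounded maximum, and a SynchronousQueue) mean the pool itself never queues work: each task is handed to an idle thread or gets a new one. A minimal sketch with JDK classes only illustrates this; DispatcherPoolDemo and threadsUsedFor are hypothetical names, and the default thread factory is used in place of OkHttp's Util.threadFactory.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class DispatcherPoolDemo {
    // Submits `tasks` jobs that all block at the same time to a pool configured like
    // the one above, and returns how many worker threads the pool created for them.
    static int threadsUsedFor(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
                new SynchronousQueue<>());
        CountDownLatch started = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    started.countDown();
                    awaitQuietly(release); // hold the thread, like a slow HTTP call would
                });
            }
            awaitQuietly(started); // every task is now running concurrently
            return pool.getPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(threadsUsedFor(8)); // prints 8: one thread per blocked task
    }
}
```

Since this pool never holds tasks back, any waiting has to happen before a call reaches it, which points back at the checks in promoteAndExecute.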

So executeOn is handed a thread pool, and it is easy to guess that its source uses this pool to run the network request:

void executeOn(ExecutorService executorService) {
      assert (!Thread.holdsLock(client.dispatcher()));
      boolean success = false;
      try {
        executorService.execute(this);
        success = true;
      } catch (RejectedExecutionException e) {
        InterruptedIOException ioException = new InterruptedIOException("executor rejected");
        ioException.initCause(e);
        eventListener.callFailed(RealCall.this, ioException);
        responseCallback.onFailure(RealCall.this, ioException);
      } finally {
        if (!success) {
          client.dispatcher().finished(this); // This call is no longer running!
        }
      }
}
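The catch branch above only matters when the executor refuses the task, for example after it has been shut down. A minimal sketch of that behavior with a plain JDK executor follows; RejectionDemo and tryExecute are hypothetical names, and the boolean return value stands in for the callback notification and the finished(this) call in the real code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

class RejectionDemo {
    // Same shape as executeOn: try to hand the task to the pool and report a
    // failure to the caller instead of letting the rejection propagate.
    static boolean tryExecute(ExecutorService executor, Runnable task) {
        try {
            executor.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // the real code fails the callback and marks the call finished
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        System.out.println(tryExecute(pool, () -> {})); // prints true
        pool.shutdown();
        System.out.println(tryExecute(pool, () -> {})); // prints false: a shut-down pool rejects new tasks
    }
}
```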

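Sure enough, the pool that was passed in as an argument turns around and executes the AsyncCall (which is essentially a Runnable). I no longer care about the body that performs the request, so back to promoteAndExecute to see where my requests got swallowed: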

private boolean promoteAndExecute() {
    ...
    if (asyncCall.callsPerHost().get() >= maxRequestsPerHost) continue; // Host max capacity.
    ...
    asyncCall.callsPerHost().incrementAndGet();
    ...
}

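Here callsPerHost() returns an atomic counter that is incremented just before a new request is started. Looking at AsyncCall's execution body, the counter is decremented once the request completes. The "per host" in its name suggests that it tracks the number of concurrent requests to a single host, and the Dispatcher enqueue source from earlier confirms this: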

void enqueue(AsyncCall call) {
    synchronized (this) {
      ...
      // Mutate the AsyncCall so that it shares the AtomicInteger of an existing running call to
      // the same host.
      ...
      AsyncCall existingCall = findExistingCallWithHost(call.host());
      if (existingCall != null) call.reuseCallsPerHostFrom(existingCall);
    }
    promoteAndExecute();
}

private AsyncCall findExistingCallWithHost(String host) {
    for (AsyncCall existingCall : runningAsyncCalls) {
      if (existingCall.host().equals(host)) return existingCall;
    }
    for (AsyncCall existingCall : readyAsyncCalls) {
      if (existingCall.host().equals(host)) return existingCall;
    }
    return null;
}

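This calls findExistingCallWithHost to find an AsyncCall for the same target host, searching the running queue first and then the ready queue, and calls reuseCallsPerHostFrom to share its counter. Back in promoteAndExecute, once this per-host counter reaches maxRequestsPerHost, a request is not started immediately but stays in readyAsyncCalls waiting to be picked up. The default value of maxRequestsPerHost is only 5, which is clearly too low for my server application. After some debugging with breakpoints, I found that for certain special requests the target host takes 30 seconds to respond. My side therefore quickly ran out of available concurrent slots and requests began to pile up; the backlog kept growing until later requests could wait in line for half an hour before being sent!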
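The bookkeeping described above can be reproduced in a small model built only on JDK collections. This is a simplified sketch, not OkHttp's implementation: MiniDispatcher is a hypothetical class, calls are represented by their host name alone, a plain map replaces the shared AtomicInteger, and nothing is actually executed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Simplified model of the dispatcher's two limits; not OkHttp's actual classes.
class MiniDispatcher {
    private final int maxRequests;
    private final int maxRequestsPerHost;
    private final Deque<String> ready = new ArrayDeque<>();   // hosts of waiting calls
    private final Deque<String> running = new ArrayDeque<>(); // hosts of in-flight calls
    private final Map<String, Integer> callsPerHost = new HashMap<>();

    MiniDispatcher(int maxRequests, int maxRequestsPerHost) {
        this.maxRequests = maxRequests;
        this.maxRequestsPerHost = maxRequestsPerHost;
    }

    void enqueue(String host) {
        ready.add(host);
        promote();
    }

    // Called when an in-flight call to `host` completes; frees a slot and retries the queue.
    void finished(String host) {
        if (!running.remove(host)) return; // nothing in flight for this host
        callsPerHost.merge(host, -1, Integer::sum);
        promote();
    }

    // Mirrors the loop in promoteAndExecute: stop at the global limit, skip hosts at their limit.
    private void promote() {
        for (Iterator<String> i = ready.iterator(); i.hasNext(); ) {
            String host = i.next();
            if (running.size() >= maxRequests) break;
            if (callsPerHost.getOrDefault(host, 0) >= maxRequestsPerHost) continue;
            i.remove();
            callsPerHost.merge(host, 1, Integer::sum);
            running.add(host);
        }
    }

    int runningCount() { return running.size(); }
    int readyCount() { return ready.size(); }

    public static void main(String[] args) {
        MiniDispatcher dispatcher = new MiniDispatcher(64, 5); // the defaults mentioned above
        for (int i = 0; i < 8; i++) dispatcher.enqueue("platform.example");
        // prints "5 running, 3 waiting"
        System.out.println(dispatcher.runningCount() + " running, " + dispatcher.readyCount() + " waiting");
        dispatcher.finished("platform.example");
        // prints "5 running, 2 waiting"
        System.out.println(dispatcher.runningCount() + " running, " + dispatcher.readyCount() + " waiting");
    }
}
```

With a single target host, as in my forwarding service, the global limit of 64 is never reached; the per-host limit of 5 is the one that decides how long a request waits.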

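With the cause identified, the fix was straightforward: set maxRequestsPerHost to a larger but reasonable value and configure sensible request timeouts, so that the per-host concurrency limit is not exhausted so quickly.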

// Simplified client configuration
OkHttpClient okHttpClient = new OkHttpClient.Builder()
                .connectTimeout(10, TimeUnit.SECONDS)
                .readTimeout(15, TimeUnit.SECONDS)
                .writeTimeout(60, TimeUnit.SECONDS)
                .build();
okHttpClient.dispatcher().setMaxRequestsPerHost(64);
Retrofit retrofit = new Retrofit.Builder()
                .baseUrl(serverUrl)
                .client(okHttpClient)
                .build();
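As a rough way to judge what a "reasonable" maxRequestsPerHost is, Little's law says the average number of in-flight requests equals the arrival rate multiplied by the response time. The sketch below applies it; HostLimitEstimate is a hypothetical helper, the arrival rate of one request per second is an assumed figure that does not come from my project, and treating every request as taking the full 30 seconds is a worst case, since only some of my requests were that slow.

```java
class HostLimitEstimate {
    // Highest sustainable request rate to one host, in requests per second,
    // when each request occupies one of the per-host slots for latencySeconds.
    static double maxThroughputPerSecond(int maxRequestsPerHost, double latencySeconds) {
        return maxRequestsPerHost / latencySeconds;
    }

    // Little's law: average in-flight requests = arrival rate * latency.
    // Rounded up, this is the number of per-host slots needed to keep up.
    static int requiredConcurrency(double arrivalsPerSecond, double latencySeconds) {
        return (int) Math.ceil(arrivalsPerSecond * latencySeconds);
    }

    public static void main(String[] args) {
        double latencySeconds = 30.0;   // slow responses seen while debugging
        double arrivalsPerSecond = 1.0; // assumed load, for illustration only

        // With the default of 5 slots the host can absorb 5 / 30 requests per second.
        System.out.println(maxThroughputPerSecond(5, latencySeconds));
        // Slots needed to keep up with the assumed load.
        System.out.println(requiredConcurrency(arrivalsPerSecond, latencySeconds)); // prints 30
    }
}
```

Whenever the arrival rate exceeds the first number, the ready queue grows without bound, which matches the steadily increasing delay I observed.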

Summary

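By tracing the source, I solved the problem quickly. OkHttp, which Retrofit depends on, ships with conservative defaults. It does look like an HTTP client born for Android development: it seems to favor elegant encapsulation and ease of use over high-concurrency performance, and there are probably better-suited tools for server-side programs. Still, Retrofit's code is elegant and makes heavy use of design patterns, and it is worth studying for every Java programmer.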

Posted 2019-07-13 16:14 by 晴空醒
