WebRTC ICE: Connection Process Analysis and Application-Level Optimization

References:

  • RFC 8445

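1. Overview

This post summarizes what I have learned about the WebRTC ICE connection and reconnection logic, and closes with some optimizations for reconnection.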

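2. Key classes

The ICE module has several very important classes. Each has its own responsibility, and they work together:

Class                      File                               Role
P2PTransportChannel        p2p/base/p2p_transport_channel.cc  Entry point and controller
BasicNetworkManager        rtc_base/network.h                 Collects local network interface (IP address) information
BasicPortAllocatorSession  p2p/client/basic_port_allocator.h  Controls a network address allocation session
AllocationSequence         p2p/client/basic_port_allocator.h  Gathers all local candidates
Connection                 p2p/base/connection.h              Implements the actions of one candidate pair (heartbeats, connectivity checks, etc.)
BasicIceController         p2p/base/basic_ice_controller.cc   Implements the ICE connection selection algorithm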

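3. Establishing a connection

step 1, start the connection

The caller starts the whole ICE process by calling P2PTransportChannel::MaybeStartGathering(). This function creates a BasicPortAllocatorSession object, which marks the start of a network address allocation session, and hands it to AddAllocatorSession():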

void P2PTransportChannel::AddAllocatorSession(
    std::unique_ptr<PortAllocatorSession> session) {
  ...
  session->set_generation(static_cast<uint32_t>(allocator_sessions_.size()));
  session->SignalPortReady.connect(this, &P2PTransportChannel::OnPortReady);
  session->SignalPortsPruned.connect(this, &P2PTransportChannel::OnPortsPruned);
  session->SignalCandidatesReady.connect(
      this, &P2PTransportChannel::OnCandidatesReady);
  session->SignalCandidateError.connect(this,
                                        &P2PTransportChannel::OnCandidateError);
  session->SignalCandidatesRemoved.connect(
      this, &P2PTransportChannel::OnCandidatesRemoved);
  session->SignalCandidatesAllocationDone.connect(
      this, &P2PTransportChannel::OnCandidatesAllocationDone);
  session->SignalNetworksChanged.connect(
      this, &P2PTransportChannel::OnNetworksChanged);
  if (!allocator_sessions_.empty()) {
    allocator_session()->PruneAllPorts();
  }
  allocator_sessions_.push_back(std::move(session));
  ...
}

allocator_sessions_.push_back(std::move(session)); stores the newly created BasicPortAllocatorSession object.

step 2, start network address allocation

As soon as the BasicPortAllocatorSession object has been created, address allocation is started through BasicPortAllocatorSession::StartGettingPorts():

void P2PTransportChannel::MaybeStartGathering() {
  ...
  // Start gathering if we never started before, or if an ICE restart occurred.
  if (allocator_sessions_.empty() ||
      IceCredentialsChanged(allocator_sessions_.back()->ice_ufrag(),
                            allocator_sessions_.back()->ice_pwd(),
                            ice_parameters_.ufrag, ice_parameters_.pwd)) {
  ...
    } else {
      AddAllocatorSession(allocator_->CreateSession(
          transport_name(), component(), ice_parameters_.ufrag,
          ice_parameters_.pwd));
      allocator_sessions_.back()->StartGettingPorts();
    }
  }
}

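step 3, collect local network interface information

Once address allocation has started, BasicPortAllocatorSession::DoAllocate() calls BasicPortAllocatorSession::GetNetworks() to obtain all local network interfaces and IP addresses. After collection, interfaces that are not useful are filtered out according to preset rules: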

std::vector<rtc::Network*> BasicPortAllocatorSession::GetNetworks() {
  RTC_DCHECK_RUN_ON(network_thread_);
  std::vector<rtc::Network*> networks;
  rtc::NetworkManager* network_manager = allocator_->network_manager();
  RTC_DCHECK(network_manager != nullptr);
  if (network_manager->enumeration_permission() ==
      rtc::NetworkManager::ENUMERATION_BLOCKED) {
    set_flags(flags() | PORTALLOCATOR_DISABLE_ADAPTER_ENUMERATION);
  }
  ...
  if (flags() & PORTALLOCATOR_DISABLE_ADAPTER_ENUMERATION) {
    network_manager->GetAnyAddressNetworks(&networks);
  } else {
    network_manager->GetNetworks(&networks);
    // If network enumeration fails, use the ANY address as a fallback, so we
    // can at least try gathering candidates using the default route chosen by
    // the OS. Or, if the PORTALLOCATOR_ENABLE_ANY_ADDRESS_PORTS flag is
    // set, we'll use ANY address candidates either way.
    if (networks.empty() || flags() & PORTALLOCATOR_ENABLE_ANY_ADDRESS_PORTS) {
      network_manager->GetAnyAddressNetworks(&networks);
    }
  }
  // Filter out link-local networks if needed.
  if (flags() & PORTALLOCATOR_DISABLE_LINK_LOCAL_NETWORKS) {
    NetworkFilter link_local_filter(
        [](rtc::Network* network) { return IPIsLinkLocal(network->prefix()); },
        "link-local");
    FilterNetworks(&networks, link_local_filter);
  }
  // Do some more filtering, depending on the network ignore mask and "disable
  // costly networks" flag.
  NetworkFilter ignored_filter(
      [this](rtc::Network* network) {
        return allocator_->network_ignore_mask() & network->type();
      },
      "ignored");
  FilterNetworks(&networks, ignored_filter);
  if (flags() & PORTALLOCATOR_DISABLE_COSTLY_NETWORKS) {
    uint16_t lowest_cost = rtc::kNetworkCostMax;
    for (rtc::Network* network : networks) {
      // Don't determine the lowest cost from a link-local network.
      // On iOS, a device connected to the computer will get a link-local
      // network for communicating with the computer, however this network can't
      // be used to connect to a peer outside the network.
      if (rtc::IPIsLinkLocal(network->GetBestIP())) {
        continue;
      }
      lowest_cost = std::min<uint16_t>(lowest_cost, network->GetCost());
    }
    NetworkFilter costly_filter(
        [lowest_cost](rtc::Network* network) {
          return network->GetCost() > lowest_cost + rtc::kNetworkCostLow;
        },
        "costly");
    FilterNetworks(&networks, costly_filter);
  }
  // Lastly, if we have a limit for the number of IPv6 network interfaces (by
  // default, it's 5), remove networks to ensure that limit is satisfied.
  //
  int ipv6_networks = 0;
  for (auto it = networks.begin(); it != networks.end();) {
    if ((*it)->prefix().family() == AF_INET6) {
      if (ipv6_networks >= allocator_->max_ipv6_networks()) {
        it = networks.erase(it);
        continue;
      } else {
        ++ipv6_networks;
      }
    }
    ++it;
  }
  return networks;
}
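To make the "costly" filter at the end of GetNetworks() more concrete, here is a minimal standalone sketch. FakeNetwork, DropCostlyNetworks and the two cost constants are hypothetical stand-ins introduced for illustration (the constant values are assumptions, not taken from the listing above); only the filtering rule follows the code shown.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for rtc::Network; only the fields the filter needs.
struct FakeNetwork {
  std::string name;
  uint16_t cost;
  bool link_local;
};

// Assumed values for illustration only.
constexpr uint16_t kCostMax = 999;
constexpr uint16_t kCostLow = 10;

// Keeps only networks whose cost is within kCostLow of the cheapest
// non-link-local network, mirroring the "costly" filter above.
std::vector<FakeNetwork> DropCostlyNetworks(std::vector<FakeNetwork> networks) {
  uint16_t lowest_cost = kCostMax;
  for (const FakeNetwork& n : networks) {
    if (n.link_local) continue;  // link-local networks do not set the baseline
    lowest_cost = std::min(lowest_cost, n.cost);
  }
  const int threshold = lowest_cost + kCostLow;
  networks.erase(std::remove_if(networks.begin(), networks.end(),
                                [threshold](const FakeNetwork& n) {
                                  return n.cost > threshold;
                                }),
                 networks.end());
  return networks;
}
```

With a cheap WIFI-like network and an expensive cellular-like network in the list, only the cheap one (plus anything within the margin) survives, which is why cellular candidates disappear when PORTALLOCATOR_DISABLE_COSTLY_NETWORKS is set and WIFI is up.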

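The actual work is done by the BasicNetworkManager class. BasicNetworkManager::UpdateNetworksOnce() is called periodically to refresh the local interface information. It in turn calls BasicNetworkManager::CreateNetworks(), which does the platform-specific retrieval of interfaces and IP addresses. Taking mobile platforms as an example: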

bool BasicNetworkManager::CreateNetworks(bool include_ignored,
                                         NetworkList* networks) const {
  struct ifaddrs* interfaces;
  int error = getifaddrs(&interfaces);
  if (error != 0) {
    RTC_LOG_ERR(LERROR) << "getifaddrs failed to gather interface data: "
                        << error;
    return false;
  }

  std::unique_ptr<IfAddrsConverter> ifaddrs_converter(CreateIfAddrsConverter());
  ConvertIfAddrs(interfaces, ifaddrs_converter.get(), include_ignored,
                 networks);

  freeifaddrs(interfaces);
  return true;
}

step 4, create allocation sequences

An allocation sequence is an AllocationSequence object. BasicPortAllocatorSession::DoAllocate() creates one AllocationSequence object for each usable network interface:

// For each network, see if we have a sequence that covers it already.  If not,
// create a new sequence to create the appropriate ports.
void BasicPortAllocatorSession::DoAllocate(bool disable_equivalent) {
  bool done_signal_needed = false;
  std::vector<rtc::Network*> networks = GetNetworks();
  if (networks.empty()) {
    done_signal_needed = true;
  } else {
    RTC_LOG(LS_INFO) << "Allocate ports on " << networks.size() << " networks";
    PortConfiguration* config = configs_.empty() ? nullptr : configs_.back();
    for (uint32_t i = 0; i < networks.size(); ++i) {
      uint32_t sequence_flags = flags();
      ...
      AllocationSequence* sequence =
          new AllocationSequence(this, networks[i], config, sequence_flags);
      sequence->SignalPortAllocationComplete.connect(
          this, &BasicPortAllocatorSession::OnPortAllocationComplete);
      sequence->Init();
      sequence->Start();
      sequences_.push_back(sequence);
      done_signal_needed = true;
    }
  }
  if (done_signal_needed) {
    network_thread_->Post(RTC_FROM_HERE, this, MSG_SEQUENCEOBJECTS_CREATED);
  }
}

step 5, start the allocation sequences

Once started, an allocation sequence goes through a series of phases, as AllocationSequence::OnMessage() shows:

void AllocationSequence::OnMessage(rtc::Message* msg) {
  ...
  switch (phase_) {
    case PHASE_UDP:
      CreateUDPPorts();
      CreateStunPorts();
      break;

    case PHASE_RELAY:
      CreateRelayPorts();
      break;

    case PHASE_TCP:
      CreateTCPPorts();
      state_ = kCompleted;
      break;

    default:
      RTC_NOTREACHED();
  }
  ...
}

The sequence starts from PHASE_UDP; each time a phase completes, it moves on to the next one.
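The phase progression can be sketched as a tiny state machine. PhaseRunner below is a hypothetical, synchronous stand-in for AllocationSequence written for illustration: the real class is driven by messages on the network thread, while this sketch simply advances one phase per Step() call and logs what each phase would create.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Phase names follow the listing above; the numeric order is the run order.
enum Phase { PHASE_UDP = 0, PHASE_RELAY, PHASE_TCP, kNumPhases };

// Hypothetical stand-in for AllocationSequence: each Step() handles the
// current phase and then advances to the next one.
class PhaseRunner {
 public:
  // Returns false once every phase has already run.
  bool Step() {
    if (phase_ >= kNumPhases) return false;
    switch (phase_) {
      case PHASE_UDP:
        log_.push_back("udp+stun");  // CreateUDPPorts + CreateStunPorts
        break;
      case PHASE_RELAY:
        log_.push_back("relay");  // CreateRelayPorts
        break;
      case PHASE_TCP:
        log_.push_back("tcp");  // CreateTCPPorts
        break;
      default:
        assert(false && "unreachable phase");
    }
    ++phase_;
    return true;
  }

  bool completed() const { return phase_ >= kNumPhases; }
  const std::vector<std::string>& log() const { return log_; }

 private:
  int phase_ = PHASE_UDP;
  std::vector<std::string> log_;
};
```

Calling Step() until it returns false leaves the log as udp+stun, relay, tcp, which is the order in which the candidate types are gathered.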

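step 6, create the host port

In the first AllocationSequence phase, AllocationSequence::CreateUDPPorts() is called. It allocates a UDP port on the network interface (IP address) bound to the sequence, that is, a host port. Once allocation completes, we have a local host candidate.

step 7, obtain the server reflexive candidate

AllocationSequence::CreateStunPorts() normally reuses the host IP and port created in the previous step: a STUN request is sent from that host IP and port, and the STUN server replies with the server reflexive address. When the reply arrives, we have a server reflexive candidate.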
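For reference, the STUN request sent in this step starts with the fixed 20-byte header defined in RFC 5389. The sketch below only serializes that header for a Binding Request with no attributes; BuildBindingRequest is a hypothetical helper written for illustration and is not the WebRTC implementation, which builds its messages through its own STUN classes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint16_t kBindingRequest = 0x0001;   // RFC 5389 message type
constexpr uint32_t kMagicCookie = 0x2112A442;  // RFC 5389 fixed value
constexpr std::size_t kTransactionIdLen = 12;

// Serializes a 20-byte STUN header with no attributes (length = 0).
std::vector<uint8_t> BuildBindingRequest(
    const std::array<uint8_t, kTransactionIdLen>& transaction_id) {
  std::vector<uint8_t> out;
  out.reserve(20);
  out.push_back(static_cast<uint8_t>(kBindingRequest >> 8));
  out.push_back(static_cast<uint8_t>(kBindingRequest & 0xFF));
  out.push_back(0);  // message length, high byte
  out.push_back(0);  // message length, low byte
  for (int shift = 24; shift >= 0; shift -= 8) {
    out.push_back(static_cast<uint8_t>((kMagicCookie >> shift) & 0xFF));
  }
  out.insert(out.end(), transaction_id.begin(), transaction_id.end());
  return out;
}
```

The server's success response carries the address it saw the request come from, and that address is what becomes the server reflexive candidate.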

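step 8, finish the remaining phases

Next, relay server address allocation and TCP address collection have to be completed; they are not covered here.

step 9, pair up connections

When the remote candidates are received, they are added to ICE through P2PTransportChannel::AddRemoteCandidate(), after which connections are created through P2PTransportChannel::CreateConnection(). Pairing connections follows certain rules, such as network type and generation value; for details, see the references section of the post "ICE 简介" (Introduction to ICE).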
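One rule that matters when pairs are later compared is priority. RFC 8445 defines a priority for each candidate and a combined priority for each candidate pair; the sketch below writes out those two formulas. The function names are hypothetical, and the type preference values used in the example (126 for host, 100 for server reflexive, 0 for relay) are the values recommended by the RFC, not values read from the WebRTC source.

```cpp
#include <algorithm>
#include <cstdint>

// RFC 8445 candidate priority:
//   2^24 * type preference + 2^8 * local preference + (256 - component ID)
uint32_t CandidatePriority(uint32_t type_preference,
                           uint32_t local_preference,
                           uint32_t component_id) {
  return (type_preference << 24) + (local_preference << 8) +
         (256 - component_id);
}

// RFC 8445 pair priority, where G is the controlling agent's candidate
// priority and D is the controlled agent's:
//   2^32 * min(G, D) + 2 * max(G, D) + (G > D ? 1 : 0)
uint64_t PairPriority(uint32_t controlling, uint32_t controlled) {
  const uint64_t lo = std::min(controlling, controlled);
  const uint64_t hi = std::max(controlling, controlled);
  return (lo << 32) + 2 * hi + (controlling > controlled ? 1 : 0);
}
```

For a host candidate with local preference 65535 on component 1, CandidatePriority(126, 65535, 1) gives 2130706431, the value commonly seen in SDP for host candidates. Because the pair formula is dominated by the smaller of the two priorities, a pair is only as good as its weaker side.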

step 10, start connectivity checks

Once the previous step has produced several connections, P2PTransportChannel::SortConnectionsAndUpdateState() updates the connection states and calls P2PTransportChannel::MaybeStartPinging() to start connectivity checks on the connections, that is, to send STUN binding requests by calling Connection::Ping():

void P2PTransportChannel::MaybeStartPinging() {
  RTC_DCHECK_RUN_ON(network_thread_);
  if (started_pinging_) {
    return;
  }

  if (ice_controller_->HasPingableConnection()) {
    RTC_LOG(LS_INFO) << ToString()
                     << ": Have a pingable connection for the first time; "
                        "starting to ping.";
    invoker_.AsyncInvoke<void>(
        RTC_FROM_HERE, thread(),
        rtc::Bind(&P2PTransportChannel::CheckAndPing, this));
    regathering_controller_->Start();
    started_pinging_ = true;
  }
}

P2PTransportChannel::MaybeStartPinging() is described in detail later.

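step 11, receive the response

When a response arrives, Connection::OnConnectionRequestResponse() is triggered, which eventually triggers P2PTransportChannel::OnConnectionStateChange(). That function then calls P2PTransportChannel::SortConnectionsAndUpdateState() to decide whether this connection should be chosen as the final ICE connection; this logic is described in detail below.
This completes the connection establishment process.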

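4. Connectivity checks and heartbeats

As mentioned in the previous section, P2PTransportChannel::SortConnectionsAndUpdateState() calls P2PTransportChannel::MaybeStartPinging() to start connectivity checks on connections. This section looks at that logic.

4.1 States of a Connection object

The Connection class defines several write states for each connection: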

  enum WriteState {
    STATE_WRITABLE = 0,          // we have received ping responses recently
    STATE_WRITE_UNRELIABLE = 1,  // we have had a few ping failures
    STATE_WRITE_INIT = 2,        // we have yet to receive a ping response
    STATE_WRITE_TIMEOUT = 3,     // we have had a large number of ping failures
  };

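The initial state is STATE_WRITE_INIT.
After the first response from the peer is received, the state becomes STATE_WRITABLE.
From STATE_WRITABLE, if requests go unanswered for a certain time and the number of unanswered requests reaches a threshold, the state becomes STATE_WRITE_UNRELIABLE.
From STATE_WRITE_UNRELIABLE (or STATE_WRITE_INIT), if no response at all is received for a longer period, the state becomes STATE_WRITE_TIMEOUT.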
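The write-state transitions can be sketched as a small pure function. This is a simplified model written for illustration, not the code of Connection::UpdateState(): NextWriteState and PingStats are hypothetical names, and the three thresholds (5 unanswered pings, 5 s, 15 s) are assumed values chosen to make the example concrete.

```cpp
#include <cstdint>

enum WriteState {
  STATE_WRITABLE = 0,
  STATE_WRITE_UNRELIABLE = 1,
  STATE_WRITE_INIT = 2,
  STATE_WRITE_TIMEOUT = 3,
};

// Assumed thresholds, for illustration only.
constexpr int kMaxUnansweredPings = 5;
constexpr int64_t kUnreliableAfterMs = 5000;
constexpr int64_t kTimeoutAfterMs = 15000;

struct PingStats {
  int unanswered_pings;            // pings sent since the last response
  int64_t ms_since_last_response;  // time without any response
};

WriteState NextWriteState(WriteState state, const PingStats& stats,
                          bool got_response) {
  if (got_response) return STATE_WRITABLE;
  // Writable, but both the failure count and the silence period are too high.
  if (state == STATE_WRITABLE &&
      stats.unanswered_pings >= kMaxUnansweredPings &&
      stats.ms_since_last_response > kUnreliableAfterMs) {
    return STATE_WRITE_UNRELIABLE;
  }
  // Never writable, or already unreliable, and silent for much longer.
  if ((state == STATE_WRITE_UNRELIABLE || state == STATE_WRITE_INIT) &&
      stats.ms_since_last_response > kTimeoutAfterMs) {
    return STATE_WRITE_TIMEOUT;
  }
  return state;
}
```

Note that in this model a writable connection never jumps straight to STATE_WRITE_TIMEOUT; it has to pass through STATE_WRITE_UNRELIABLE first, and a single response at any point brings it back to STATE_WRITABLE.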

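4.2 Ping strategy

P2PTransportChannel::MaybeStartPinging() uses ping as the collective term for connectivity checks and heartbeats.

4.2.1 Conditions for starting ping

First, BasicIceController::HasPingableConnection() determines whether any Connection object can be pinged; internally this is done by calling BasicIceController::IsPingable().
That function contains several checks, but essentially it decides whether the Connection is still valid. If a Connection has stayed in STATE_WRITE_TIMEOUT for some time, it is no longer valid.
Ping only starts once a pingable connection exists; until then, the channel keeps waiting.
Ping is posted as an asynchronous task to the channel's thread (thread() in the code above) and runs on its own from there. Once started, it normally does not stop, and no second ping task is started.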

4.2.2 Choosing a Connection to ping

Not only does a P2PTransportChannel object run a single ping task, each ping round also picks only one Connection out of all the Connections.
WebRTC does this because of how NAT devices behave: they limit the minimum interval between new network mappings, so checks have to be paced; see RFC 8445 for details.
The choice is made in P2PTransportChannel::CheckAndPing() by calling BasicIceController::SelectConnectionToPing(), which in turn calls BasicIceController::FindNextPingableConnection() for the actual selection logic.

4.2.3 Ping interval

Besides choosing a connection to ping, BasicIceController::SelectConnectionToPing() also returns the interval until the next selection:

IceControllerInterface::PingResult BasicIceController::SelectConnectionToPing(
    int64_t last_ping_sent_ms) {
  ...
  bool need_more_pings_at_weak_interval =
      absl::c_any_of(connections_, [](const Connection* conn) {
        return conn->active() &&
               conn->num_pings_sent() < MIN_PINGS_AT_WEAK_PING_INTERVAL;
      });
  int ping_interval = (weak() || need_more_pings_at_weak_interval)
                          ? weak_ping_interval()
                          : strong_ping_interval();

  ...
}

In WebRTC, the default ping interval is 480 ms (strong) or 48 ms (weak).
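The interval choice can be reproduced in a few lines. The sketch below uses the 480 ms and 48 ms defaults quoted above; ConnInfo, ChoosePingIntervalMs and the value 3 for the minimum number of pings at the weak interval are assumptions made for this example.

```cpp
#include <algorithm>
#include <vector>

constexpr int kStrongPingIntervalMs = 480;
constexpr int kWeakPingIntervalMs = 48;
constexpr int kMinPingsAtWeakInterval = 3;  // assumed value

// Hypothetical stand-in for the fields of Connection used here.
struct ConnInfo {
  bool active;
  int num_pings_sent;
};

// Uses the fast (weak) interval while the channel is weak, or while any
// active connection has not yet been pinged often enough.
int ChoosePingIntervalMs(bool channel_is_weak,
                         const std::vector<ConnInfo>& conns) {
  const bool need_more_pings =
      std::any_of(conns.begin(), conns.end(), [](const ConnInfo& c) {
        return c.active && c.num_pings_sent < kMinPingsAtWeakInterval;
      });
  return (channel_is_weak || need_more_pings) ? kWeakPingIntervalMs
                                              : kStrongPingIntervalMs;
}
```

The effect is that pinging is fast while the channel has no good connection yet or a new connection has just appeared, and slows down by a factor of ten once things are stable.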

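4.2.4 The nomination parameter

The nomination parameter is very important: it is how the ICE controlling side tells the ICE controlled side which of the many connections should finally be used to send data.
After a connection has been chosen, the next important task is to decide whether subsequent STUN requests should carry the nomination parameter. This is done mainly by calling BasicIceController::GetUseCandidateAttr():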

bool BasicIceController::GetUseCandidateAttr(const Connection* conn,
                                             NominationMode mode,
                                             IceMode remote_ice_mode) const {
  switch (mode) {
    case NominationMode::REGULAR:
      // TODO(honghaiz): Implement regular nomination.
      return false;
    case NominationMode::AGGRESSIVE:
      if (remote_ice_mode == ICEMODE_LITE) {
        return GetUseCandidateAttr(conn, NominationMode::REGULAR,
                                   remote_ice_mode);
      }
      return true;
    case NominationMode::SEMI_AGGRESSIVE: {
      // Nominate if
      // a) Remote is in FULL ICE AND
      //    a.1) |conn| is the selected connection OR
      //    a.2) there is no selected connection OR
      //    a.3) the selected connection is unwritable OR
      //    a.4) |conn| has higher priority than selected_connection.
      // b) Remote is in LITE ICE AND
      //    b.1) |conn| is the selected_connection AND
      //    b.2) |conn| is writable.
      bool selected = conn == selected_connection_;
      if (remote_ice_mode == ICEMODE_LITE) {
        return selected && conn->writable();
      }
      bool better_than_selected =
          !selected_connection_ || !selected_connection_->writable() ||
          CompareConnectionCandidates(selected_connection_, conn) < 0;
      return selected || better_than_selected;
    }
    default:
      RTC_NOTREACHED();
      return false;
  }
}

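The selected_connection_ object is the connection the ICE module has finally chosen as valid; it is initially null.
WebRTC defaults to the SEMI_AGGRESSIVE nomination mode. For a peer that is not ICEMODE_LITE:

  • If selected_connection_ is still null (or is no longer writable), the current connection carries the nomination flag.
  • If selected_connection_ is not null and is writable, BasicIceController::CompareConnectionCandidates() is called for a further comparison, and the flag is set only if the current connection is the selected one or compares better than it.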

4.3 Heartbeats

Every connection keeps sending heartbeats, whether or not it is the selected connection. A connection only stops its heartbeats once it is judged to be IceCandidatePairState::FAILED.
In BasicIceController::FindNextPingableConnection():

const Connection* BasicIceController::FindNextPingableConnection() {
  int64_t now = rtc::TimeMillis();
  if (selected_connection_ && selected_connection_->connected() &&
      selected_connection_->writable() &&
      WritableConnectionPastPingInterval(selected_connection_, now)) {
    return selected_connection_;
  }
  ...
}

selected_connection_ is only chosen for ping when WritableConnectionPastPingInterval(selected_connection_, now) holds. This function requires pings on selected_connection_ to be at least 2500 ms apart.
If selected_connection_ is not chosen, one of the other connections may be picked by the preset rules and sent a heartbeat ping.
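A rough model of this throttling is shown below. It is a simplification under stated assumptions: Conn and PickNextToPing are hypothetical, the 2500 ms value is the one quoted above, and the fallback rule (ping whichever other connection was pinged longest ago) is an assumption made for this sketch; the real fallback in FindNextPingableConnection() applies more rules.

```cpp
#include <cstdint>
#include <vector>

constexpr int64_t kSelectedPingIntervalMs = 2500;

// Hypothetical stand-in for Connection.
struct Conn {
  int id;
  bool writable;
  int64_t last_ping_sent_ms;
};

// Prefers the selected connection when its heartbeat is due; otherwise
// falls back to the other connection that was pinged longest ago.
const Conn* PickNextToPing(const std::vector<Conn>& conns,
                           const Conn* selected, int64_t now_ms) {
  if (selected && selected->writable &&
      now_ms - selected->last_ping_sent_ms >= kSelectedPingIntervalMs) {
    return selected;
  }
  const Conn* oldest = nullptr;
  for (const Conn& c : conns) {
    if (&c == selected) continue;
    if (!oldest || c.last_ping_sent_ms < oldest->last_ping_sent_ms) {
      oldest = &c;
    }
  }
  return oldest;  // nullptr if there is nothing else to ping
}
```

Because the selected connection only claims a ping slot once per interval, the remaining slots go to the other connections, which is what keeps backup paths such as 4G alive while WIFI is selected.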

5. Choosing the final connection

P2PTransportChannel::SortConnectionsAndUpdateState() chooses the final connection mainly by calling BasicIceController::SortAndSwitchConnection().

5.1 Sorting connections

In BasicIceController::SortAndSwitchConnection():

IceControllerInterface::SwitchResult
BasicIceController::SortAndSwitchConnection(IceControllerEvent reason) {
  // Find the best alternative connection by sorting.  It is important to note
  // that amongst equal preference, writable connections, this will choose the
  // one whose estimated latency is lowest.  So it is the only one that we
  // need to consider switching to.
  // TODO(honghaiz): Don't sort;  Just use std::max_element in the right places.
  absl::c_stable_sort(
      connections_, [this](const Connection* a, const Connection* b) {
        int cmp = CompareConnections(a, b, absl::nullopt, nullptr);
        if (cmp != 0) {
          return cmp > 0;
        }
        // Otherwise, sort based on latency estimate.
        return a->rtt() < b->rtt();
      });
  ...
}

All connections are stable-sorted by calling BasicIceController::CompareConnections(). This function takes any two Connections and runs a series of comparisons on them:

int BasicIceController::CompareConnections(
    const Connection* a,
    const Connection* b,
    absl::optional<int64_t> receiving_unchanged_threshold,
    bool* missed_receiving_unchanged_threshold) const {
  RTC_CHECK(a != nullptr);
  RTC_CHECK(b != nullptr);

  // We prefer to switch to a writable and receiving connection over a
  // non-writable or non-receiving connection, even if the latter has
  // been nominated by the controlling side.
  int state_cmp = CompareConnectionStates(a, b, receiving_unchanged_threshold,
                                          missed_receiving_unchanged_threshold);
  if (state_cmp != 0) {
    return state_cmp;
  }

  if (ice_role_func_() == ICEROLE_CONTROLLED) {
    // Compare the connections based on the nomination states and the last data
    // received time if this is on the controlled side.
    if (a->remote_nomination() > b->remote_nomination()) {
      return a_is_better;
    }
    if (a->remote_nomination() < b->remote_nomination()) {
      return b_is_better;
    }

    if (a->last_data_received() > b->last_data_received()) {
      return a_is_better;
    }
    if (a->last_data_received() < b->last_data_received()) {
      return b_is_better;
    }
  }

  // Compare the network cost and priority.
  return CompareConnectionCandidates(a, b);
}
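The layered comparison plus the RTT tie-break can be tried out on plain data. In the sketch below, Conn, CompareConns and SortBestFirst are hypothetical, and the comparison is reduced to three keys (writable, remote nomination as seen on the controlled side, priority); the real CompareConnections() looks at more state than this.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for Connection.
struct Conn {
  int id;
  bool writable;
  int remote_nomination;
  int priority;
  int rtt_ms;
};

// Returns >0 if a is better, <0 if b is better, 0 if they tie.
int CompareConns(const Conn& a, const Conn& b) {
  if (a.writable != b.writable) return a.writable ? 1 : -1;
  if (a.remote_nomination != b.remote_nomination) {
    return a.remote_nomination > b.remote_nomination ? 1 : -1;
  }
  if (a.priority != b.priority) return a.priority > b.priority ? 1 : -1;
  return 0;
}

// Best connection first; ties are broken by the lower RTT estimate.
void SortBestFirst(std::vector<Conn>& conns) {
  std::stable_sort(conns.begin(), conns.end(),
                   [](const Conn& a, const Conn& b) {
                     int cmp = CompareConns(a, b);
                     if (cmp != 0) return cmp > 0;
                     return a.rtt_ms < b.rtt_ms;
                   });
}
```

The point of the layering is that a lower-priority key only matters when all higher-priority keys tie: a nominated connection with a high RTT still sorts ahead of an un-nominated one with a low RTT.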

5.2 Connection selection and switching

The previous section covered how connections are sorted. Once sorting is done, we have a top connection:

IceControllerInterface::SwitchResult
BasicIceController::SortAndSwitchConnection(IceControllerEvent reason) {
  ...
  const Connection* top_connection =
      (!connections_.empty()) ? connections_[0] : nullptr;
  ...
}

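As we know, BasicIceController maintains a selected connection object, representing the connection that has already been chosen.
Next, the selected connection and the top connection have to be compared to decide whether the top connection can replace it as the new selected connection.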

5.2.1 Initial selection of the top connection

When the selected connection is null, the top connection becomes the new selected connection as long as it is valid. This logic is clear and simple.

5.2.2 Switching the selected connection

When the selected connection is not null, BasicIceController::ShouldSwitchConnection() compares the selected connection with the top connection; if the top connection is better, it replaces the current selected connection:

IceControllerInterface::SwitchResult BasicIceController::ShouldSwitchConnection(
    IceControllerEvent reason,
    const Connection* new_connection) {
  if (!ReadyToSend(new_connection) || selected_connection_ == new_connection) {
    return {absl::nullopt, absl::nullopt};
  }

  if (selected_connection_ == nullptr) {
    return HandleInitialSelectDampening(reason, new_connection);
  }

  // Do not switch to a connection that is not receiving if it is not on a
  // preferred network or it has higher cost because it may be just spuriously
  // better.
  int compare_a_b_by_networks = CompareCandidatePairNetworks(
      new_connection, selected_connection_, config_.network_preference);
  if (compare_a_b_by_networks == b_is_better && !new_connection->receiving()) {
    return {absl::nullopt, absl::nullopt};
  }

  bool missed_receiving_unchanged_threshold = false;
  absl::optional<int64_t> receiving_unchanged_threshold(
      rtc::TimeMillis() - config_.receiving_switching_delay_or_default());
  int cmp = CompareConnections(selected_connection_, new_connection,
                               receiving_unchanged_threshold,
                               &missed_receiving_unchanged_threshold);

  absl::optional<IceControllerEvent> recheck_event;
  if (missed_receiving_unchanged_threshold &&
      config_.receiving_switching_delay_or_default()) {
    // If we do not switch to the connection because it missed the receiving
    // threshold, the new connection is in a better receiving state than the
    // currently selected connection. So we need to re-check whether it needs
    // to be switched at a later time.
    recheck_event = reason;
    recheck_event->recheck_delay_ms =
        config_.receiving_switching_delay_or_default();
  }

  if (cmp < 0) {
    return {new_connection, absl::nullopt};
  } else if (cmp > 0) {
    return {absl::nullopt, recheck_event};
  }

  // If everything else is the same, switch only if rtt has improved by
  // a margin.
  if (new_connection->rtt() <= selected_connection_->rtt() - kMinImprovement) {
    return {new_connection, absl::nullopt};
  }

  return {absl::nullopt, recheck_event};
}

As we can see, two connections are compared on many aspects, and the logic is fairly complex.
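The last part of ShouldSwitchConnection(), the RTT margin, is easy to isolate. The sketch below takes the result of the earlier comparison as an input; Decision and ShouldSwitch are hypothetical names, and the 10 ms margin is an assumed value standing in for kMinImprovement.

```cpp
constexpr int kMinImprovementMs = 10;  // assumed value

enum class Decision { kKeep, kSwitch };

// cmp_selected_vs_new follows the convention of the listing above:
// negative means the new connection compared better, positive means the
// selected connection compared better, zero means they tied.
Decision ShouldSwitch(int cmp_selected_vs_new, int selected_rtt_ms,
                      int new_rtt_ms) {
  if (cmp_selected_vs_new < 0) return Decision::kSwitch;
  if (cmp_selected_vs_new > 0) return Decision::kKeep;
  // Everything else equal: switch only for a clear RTT improvement.
  return new_rtt_ms <= selected_rtt_ms - kMinImprovementMs
             ? Decision::kSwitch
             : Decision::kKeep;
}
```

The margin acts as hysteresis: two otherwise equal connections whose RTT estimates hover around each other do not cause the selected connection to flap back and forth.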

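6. ICE network change detection and handling

6.1 Detecting local network conditions

As described earlier, the BasicNetworkManager class is responsible for obtaining local interface information. In fact, BasicNetworkManager::UpdateNetworksContinually() fetches network information periodically: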

void BasicNetworkManager::UpdateNetworksContinually() {
  UpdateNetworksOnce();
  thread_->PostDelayed(RTC_FROM_HERE, kNetworksUpdateIntervalMs, this,
                       kUpdateNetworksMessage);
}

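kNetworksUpdateIntervalMs sets the period to 2000 ms, so WebRTC fetches local network information every 2000 ms.
After each fetch, BasicNetworkManager::UpdateNetworksOnce() compares the result with the previous one and, if anything changed, notifies the upper layers.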

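6.2 Handling a network change

When BasicNetworkManager detects a network change, it notifies BasicPortAllocatorSession::OnNetworksChanged(). This function is very important and can be seen as the entry point of the network change handling logic: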

void BasicPortAllocatorSession::OnNetworksChanged() {
  RTC_DCHECK_RUN_ON(network_thread_);
  std::vector<rtc::Network*> networks = GetNetworks();
  std::vector<rtc::Network*> failed_networks;
  for (AllocationSequence* sequence : sequences_) {
    // Mark the sequence as "network failed" if its network is not in
    // |networks|.
    if (!sequence->network_failed() &&
        !absl::c_linear_search(networks, sequence->network())) {
      sequence->OnNetworkFailed();
      failed_networks.push_back(sequence->network());
    }
  }
  std::vector<PortData*> ports_to_prune = GetUnprunedPorts(failed_networks);
  if (!ports_to_prune.empty()) {
    RTC_LOG(LS_INFO) << "Prune " << ports_to_prune.size()
                     << " ports because their networks were gone";
    PrunePortsAndRemoveCandidates(ports_to_prune);
  }

  if (allocation_started_ && !IsStopped()) {
    if (network_manager_started_) {
      // If the network manager has started, it must be regathering.
      SignalIceRegathering(this, IceRegatheringReason::NETWORK_CHANGE);
    }
    bool disable_equivalent_phases = true;
    DoAllocate(disable_equivalent_phases);
  }
  ...
}

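step 1, use all AllocationSequence objects to determine which networks have failed
Each AllocationSequence object was bound to a Network object when it was created. If that Network object no longer matches any of the current Network objects, it is considered invalid.

step 2, signal removal of the local candidates that belong to the failed networks

step 3, call BasicPortAllocatorSession::DoAllocate() to start local candidate gathering on the newly added Network objects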
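Steps 1 and 3 both come down to comparing two sets of networks. A minimal sketch, with networks reduced to interface names: NetworkDiff and DiffNetworks are hypothetical helpers introduced for illustration, where bound stands for the networks the existing AllocationSequence objects are bound to and current for the latest result of GetNetworks().

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct NetworkDiff {
  std::vector<std::string> failed;  // bound to a sequence, but now gone
  std::vector<std::string> added;   // present now, no sequence yet
};

static bool Contains(const std::vector<std::string>& v,
                     const std::string& s) {
  return std::find(v.begin(), v.end(), s) != v.end();
}

NetworkDiff DiffNetworks(const std::vector<std::string>& bound,
                         const std::vector<std::string>& current) {
  NetworkDiff diff;
  for (const std::string& name : bound) {
    if (!Contains(current, name)) diff.failed.push_back(name);
  }
  for (const std::string& name : current) {
    if (!Contains(bound, name)) diff.added.push_back(name);
  }
  return diff;
}
```

For example, with sequences bound to wlan0 and rmnet0 and a current list of rmnet0 and eth0, wlan0 ends up in failed (its ports get pruned) and eth0 ends up in added (it gets a new allocation sequence).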

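7. Existing ICE reconnection logic

The WebRTC ICE module supports reconnection after a dropped link on its own; this section describes how it handles it.
For convenience, the mobile (cellular) network is referred to as 4G below.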

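7.1 4G and WIFI both present

If both the 4G and WIFI switches are on, ICE detects at least two valid Network objects and starts connectivity checks on each of them.
Generally, the network that completes its ICE connectivity checks first is chosen as the selected connection, but the WIFI network has higher priority (priority value). If WIFI also completes its connectivity checks, BasicIceController::ShouldSwitchConnection() replaces the selected connection with the one on the WIFI network.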

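7.2 Switching between 4G and WIFI

If the user initially has only 4G turned on, then once ICE connectivity checks complete, the network behind the selected connection is 4G.
When the user then turns 4G off and WIFI on, BasicPortAllocatorSession::OnNetworksChanged() calls BasicPortAllocatorSession::DoAllocate() to start address gathering and connectivity checks on the WIFI network. Once the WIFI network is connected, the selected connection switches over to WIFI. The reverse direction works the same way.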

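7.3 Turning WIFI off while 4G and WIFI are both present

If the current selected connection is on WIFI and the user turns WIFI off, the selected connection still exists but is no longer usable.
As discussed earlier, the 4G network has in fact been kept alive by heartbeats all along, just without the nomination parameter. Once the 4G connection is picked for ping in BasicIceController::SelectConnectionToPing(), is given the nomination parameter in BasicIceController::GetUseCandidateAttr(), and receives a response, the selected connection switches over to 4G.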

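7.4 Toggling the 4G/WIFI network

Suppose the user has 4G on and turns it off after the selected connection has become valid. The corresponding AllocationSequence/Connection remain valid and keep trying to ping. When the user restores 4G, the pings get responses again and the network recovers.
If the pings keep failing, the connection is judged IceCandidatePairState::FAILED in Connection::UpdateState(); the whole ICE transport will then most likely be judged failed as well, and the application layer is notified through a callback.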

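8. ICE reconnection optimization

This section covers some points in the ICE reconnection logic that can be optimized.

8.1 Points that can be optimized

8.1.1 Network state detection

As described earlier, the BasicNetworkManager object checks the network state every 2000 ms. With periodic polling like this, a change can go unnoticed for up to one full period, which is not very timely.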

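8.1.2 Slow switching of the selected connection

The existing ICE reconnection logic was described above. In practice, though, when the selected connection is still valid but its Network is no longer valid, switching the selected connection to a new Connection is not very timely. There are several likely causes:

  • BasicIceController::SelectConnectionToPing() does not pick the candidate Connection for ping soon enough
  • BasicIceController::GetUseCandidateAttr() does not set the nomination flag on the new connection soon enough
  • BasicIceController::SortAndSwitchConnection() does not switch to the new selected connection soon enough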

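8.2 Optimization work

8.2.1 Network state detection

We can rely on functions in the upper application layer to monitor the network state in real time and notify the ICE module as soon as a change is detected.

8.2.2 Marking the Network of a Connection object as unavailable

When BasicPortAllocatorSession::OnNetworksChanged() is triggered, the resulting failed networks are passed via callback to the P2PTransportChannel object, which maintains all the Connection objects.
The Network object bound to each Connection is compared against the failed Network objects, and the matching Connection objects are marked: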

// @note(hexin) p2p/client/basic_port_allocator.cc reports whether the network behind each connection is still valid
void P2PTransportChannel::OnNetworksChanged(
    PortAllocatorSession* session,
    std::vector<rtc::Network*>& failed_networks,
    std::vector<rtc::Network*>& available_networks) {
  RTC_DCHECK_RUN_ON(network_thread_);
  RTC_LOG(LS_INFO) << "OnNetworksChanged failed networks size " 
      << failed_networks.size() << ", available networks size "
      << available_networks.size();

  for (Connection* connection : connections()) {
    if (absl::c_linear_search(failed_networks, connection->network())) {
      connection->set_network_available(false);
      RTC_LOG(LS_INFO) << "mark connection network not available";
      continue;
    }

    if (absl::c_linear_search(available_networks, connection->network())) {
      connection->set_network_available(true);
    }
  }
}

The network state is recorded through Connection::set_network_available().

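8.2.3 Speeding up selected connection switching

Each of the functions listed above as affecting the switch gets an extra check on the availability of the Connection object's Network, for example: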

bool BasicIceController::GetUseCandidateAttr(const Connection* conn,
                                             NominationMode mode,
                                             IceMode remote_ice_mode) const {
  switch (mode) {
    ...
    case NominationMode::SEMI_AGGRESSIVE: {
      bool selected = conn == selected_connection_;
      bool better_than_selected =
          !selected_connection_ || !selected_connection_->writable() ||
          !selected_connection_->get_network_available() ||
          CompareConnectionCandidates(selected_connection_, conn) < 0;
      return selected || better_than_selected;
    }
    ...
  }
}

The network state flag is read through Connection::get_network_available().
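The flag itself is just a boolean on the connection. The sketch below shows one way the two accessors and the extended nomination check could fit together; FakeConnection and BetterThanSelected are hypothetical stand-ins written for illustration and are not the actual patch to Connection or BasicIceController.

```cpp
// Hypothetical stand-in for Connection carrying the extra flag.
class FakeConnection {
 public:
  explicit FakeConnection(bool writable) : writable_(writable) {}

  bool writable() const { return writable_; }
  void set_network_available(bool available) {
    network_available_ = available;
  }
  bool get_network_available() const { return network_available_; }

 private:
  bool writable_;
  bool network_available_ = true;  // assume available until told otherwise
};

// Mirrors the extended SEMI_AGGRESSIVE condition: a candidate connection
// should be nominated if there is no usable selected connection, or if it
// compares better than the selected one.
bool BetterThanSelected(const FakeConnection* selected,
                        bool candidate_compares_better) {
  return !selected || !selected->writable() ||
         !selected->get_network_available() || candidate_compares_better;
}
```

The benefit is that the controlling side no longer has to wait for the selected connection to drop out of the writable state through ping timeouts; as soon as its network is reported gone, the next pinged connection is nominated.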

posted @ 2024-04-30 15:45  小夕nike