
Tor's Vegas Mechanism

An analysis of the congestion_control_vegas_process_sendme function

Introduction

This function implements the SENDME-processing logic of the Vegas congestion control algorithm. Its core goal is to gauge congestion on a circuit from the relationship between the congestion window (cwnd) and the bandwidth-delay product (BDP), and to adjust cwnd dynamically so as to optimize data transfer, avoiding congestion while improving bandwidth utilization.

Goals of the function

/**
* Process a SENDME and update the congestion window according to the
* rules specified in TOR_VEGAS of Proposal #324.
*
* Essentially, this algorithm attempts to measure queue lengths on
* the circuit by subtracting the bandwidth-delay-product estimate
* from the current congestion window.
*
* If the congestion window is larger than the bandwidth-delay-product,
* then data is assumed to be queuing. We reduce the congestion window
* in that case.
*
* If the congestion window is smaller than the bandwidth-delay-product,
* then there is spare bandwidth capacity on the circuit. We increase the
* congestion window in that case.
*
* The congestion window is updated only once every congestion window worth of
* packets, even if the signal persists. It is also updated whenever the
 * upstream orconn blocks, or unblocks. This minimizes local client queues.
*/
  • Dynamically adjust the congestion window (cwnd) according to current network conditions.
  • Compare the bandwidth-delay product (BDP) with cwnd to decide whether cwnd should grow or shrink.
  • Apply different adjustment strategies in different phases (slow start vs. steady state); the simplified sketch below summarizes the decision rule.
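
Before walking through the code, here is a minimal sketch of the Vegas decision rule as this article describes it. It is illustrative only, not the Tor implementation; the threshold names mirror the vegas_params_t fields shown later, and everything else is simplified.

#include <stdint.h>

/* Simplified sketch of the Vegas decision rule walked through below.
 * NOT the Tor implementation. */
static uint64_t
vegas_next_cwnd(uint64_t cwnd, uint64_t bdp,
                int in_slow_start, int blocked_chan, int cwnd_full,
                uint64_t alpha, uint64_t beta, uint64_t gamma,
                uint64_t delta, uint64_t cwnd_inc, uint64_t ss_inc)
{
  /* Estimated number of cells queued on the circuit. */
  uint64_t queue_use = (cwnd > bdp) ? cwnd - bdp : 0;

  if (in_slow_start) {
    if (queue_use < gamma && !blocked_chan)
      return cwnd_full ? cwnd + ss_inc : cwnd; /* grow only if window was filled */
    return bdp + gamma;                /* queue built up: clamp (and exit slow start) */
  }

  if (queue_use > delta)
    return bdp + delta - cwnd_inc;     /* heavy queuing: hard clamp */
  if (queue_use > beta || blocked_chan)
    return cwnd - cwnd_inc;            /* mild queuing: back off */
  if (cwnd_full && queue_use < alpha)
    return cwnd + cwnd_inc;            /* spare capacity: grow */
  return cwnd;                         /* in the dead band: hold */
}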

Detailed code walkthrough

1. Entry checks

tor_assert(cc && cc->cc_alg == CC_ALG_VEGAS);
tor_assert(circ);
  • Assert that the congestion control state cc and the circuit circ are non-NULL.
  • Assert that the Vegas congestion control algorithm is in use.

2. Update the event counters

if (cc->next_cc_event)
  cc->next_cc_event--;

if (cc->next_cwnd_event)
  cc->next_cwnd_event--;
  • next_cc_event and next_cwnd_event throttle how often the congestion window is updated.
  • Each SENDME decrements the counters; when a counter reaches zero, the corresponding event may be processed.

3. Update the circuit estimates

if (!congestion_control_update_circuit_estimates(cc, circ)) {
  cc->inflight = cc->inflight - cc->sendme_inc;
  return 0;
}
  • congestion_control_update_circuit_estimates(cc, circ) refreshes the circuit's RTT and BDP estimates.
  • If the update fails, inflight is decremented by sendme_inc and the function returns early.
/**
 * Upon receipt of a SENDME, pop the oldest timestamp off the timestamp
 * list, and use this to update RTT.
 *
 * Returns true if circuit estimates were successfully updated, false
 * otherwise.
 */
bool
congestion_control_update_circuit_estimates(congestion_control_t *cc,
                                            const circuit_t *circ)
{
  uint64_t now_usec = monotime_absolute_usec();

  /* Update RTT first, then BDP. BDP needs fresh RTT */
  uint64_t curr_rtt_usec = congestion_control_update_circuit_rtt(cc, now_usec);
  return congestion_control_update_circuit_bdp(cc, circ, curr_rtt_usec);
}
  • The current RTT is updated first.
  • The fresh RTT is then used to update the BDP estimate.
/**
 * Called when we get a SENDME. Updates circuit RTT by pulling off a
 * timestamp of when we sent the CIRCWINDOW_INCREMENT-th cell from
 * the queue of such timestamps, and comparing that to current time.
 *
 * Also updates min, max, and EWMA of RTT.
 *
 * Returns the current circuit RTT in usecs, or 0 if it could not be
 * measured (due to clock jump, stall, etc).
 */
STATIC uint64_t
congestion_control_update_circuit_rtt(congestion_control_t *cc,
                                      uint64_t now_usec)
{
  uint64_t rtt, ewma_cnt;
  uint64_t sent_at_timestamp;

  tor_assert(cc);

  /* Get the time that we sent the cell that resulted in the other
   * end sending this sendme. Use this to calculate RTT */
  sent_at_timestamp = dequeue_timestamp(cc->sendme_pending_timestamps);

  rtt = now_usec - sent_at_timestamp;

  /* Do not update RTT at all if it looks fishy */
  if (time_delta_stalled_or_jumped(cc, cc->ewma_rtt_usec, rtt)) {
    num_clock_stalls++; /* Accounting */
    return 0;
  }

  ewma_cnt = n_ewma_count(cc);

  cc->ewma_rtt_usec = n_count_ewma(rtt, cc->ewma_rtt_usec, ewma_cnt);

  if (rtt > cc->max_rtt_usec) {
    cc->max_rtt_usec = rtt;
  }

  if (cc->min_rtt_usec == 0) {
    // If we do not have a min_rtt yet, use current ewma
    cc->min_rtt_usec = cc->ewma_rtt_usec;
  } else if (cc->cwnd == cc->cwnd_min && !cc->in_slow_start) {
    // Raise min rtt if cwnd hit cwnd_min. This gets us out of a wedge state
    // if we hit cwnd_min due to an abnormally low rtt.
    uint64_t new_rtt = percent_max_mix(cc->ewma_rtt_usec, cc->min_rtt_usec,
                                       rtt_reset_pct);

    static ratelim_t rtt_notice_limit = RATELIM_INIT(300);
    log_fn_ratelim(&rtt_notice_limit, LOG_NOTICE, LD_CIRC,
      "Resetting circ RTT from %"PRIu64" to %"PRIu64" due to low cwnd",
      cc->min_rtt_usec/1000, new_rtt/1000);

    cc->min_rtt_usec = new_rtt;
    num_rtt_reset++; /* Accounting */
  } else if (cc->ewma_rtt_usec < cc->min_rtt_usec) {
    // Using the EWMA for min instead of current RTT helps average out
    // effects from other conns
    cc->min_rtt_usec = cc->ewma_rtt_usec;
  }

  return rtt;
}
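
The EWMA update above goes through n_count_ewma() with a window count from n_ewma_count(); neither is defined in this excerpt. To the best of my reading it is the standard "N-count" exponential moving average sketched below; treat the exact rounding as an assumption.

#include <stdint.h>

/* Assumed form of the N-EWMA used above (illustrative, not Tor source):
 * a new sample gets weight 2/(N+1), the classic N-sample exponential
 * average. The real implementation may round differently. */
static uint64_t
n_count_ewma_sketch(uint64_t curr, uint64_t prev, uint64_t N)
{
  if (prev == 0)   /* no history yet: seed with the first sample */
    return curr;
  return (2*curr + (N - 1)*prev) / (N + 1);
}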
/**
 * Called when we get a SENDME. Updates the bandwidth-delay-product (BDP)
 * estimates of a circuit. Several methods of computing BDP are used,
 * depending on scenario. While some congestion control algorithms only
 * use one of these methods, we update them all because it's quick and easy.
 *
 * - now_usec is the current monotime in usecs.
 * - curr_rtt_usec is the current circuit RTT in usecs. It may be 0 if no
 *   RTT could be measured.
 *
 * Returns true if we were able to update BDP, false otherwise.
 */
static bool
congestion_control_update_circuit_bdp(congestion_control_t *cc,
                                      const circuit_t *circ,
                                      uint64_t curr_rtt_usec)
{
  int chan_q = 0;
  unsigned int blocked_on_chan = 0;

  tor_assert(cc);

  if (CIRCUIT_IS_ORIGIN(circ)) {
    /* origin circs use n_chan */
    chan_q = circ->n_chan_cells.n;
    blocked_on_chan = circ->circuit_blocked_on_n_chan;
  } else {
    /* Both onion services and exits use or_circuit and p_chan */
    chan_q = CONST_TO_OR_CIRCUIT(circ)->p_chan_cells.n;
    blocked_on_chan = circ->circuit_blocked_on_p_chan;
  }

  /* If we have no EWMA RTT, it is because monotime has been stalled
   * or messed up the entire time so far. Set our BDP estimates directly
   * to current cwnd */
  if (!cc->ewma_rtt_usec) {
    uint64_t cwnd = cc->cwnd;

    tor_assert_nonfatal(cc->cwnd <= cwnd_max);

    /* If the channel is blocked, keep subtracting off the chan_q
     * until we hit the min cwnd. */
    if (blocked_on_chan) {
      /* Cast is fine because we're less than int32 */
      if (chan_q >= (int64_t)cwnd) {
        log_notice(LD_CIRC,
                   "Clock stall with large chanq: %d %"PRIu64, chan_q, cwnd);
        cwnd = cc->cwnd_min;
      } else {
        cwnd = MAX(cwnd - chan_q, cc->cwnd_min);
      }
      cc->blocked_chan = 1;
    } else {
      cc->blocked_chan = 0;
    }

    cc->bdp = cwnd;

    static ratelim_t dec_notice_limit = RATELIM_INIT(300);
    log_fn_ratelim(&dec_notice_limit, LOG_NOTICE, LD_CIRC,
      "Our clock has been stalled for the entire lifetime of a circuit. "
      "Performance may be sub-optimal.");

    return blocked_on_chan;
  }

  /* ... (remainder of the function, where the actual BDP estimators are
   * computed, is omitted from this excerpt) ... */
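
The omitted remainder is where the RTT-based BDP estimators live. Proposal #324 describes a cwnd-based estimator of roughly the form below; this sketch is a paraphrase of that formula, not the Tor code, so treat the helper name and the exact rounding as assumptions.

#include <stdint.h>

/* Sketch of the cwnd-based BDP estimator from Proposal #324: if cwnd
 * worth of data returns in ewma_rtt, the amount that would fit in
 * min_rtt (i.e. with no queuing) is cwnd * min_rtt / ewma_rtt.
 * Hypothetical helper, not the Tor implementation. */
static uint64_t
bdp_from_cwnd_sketch(uint64_t cwnd, uint64_t min_rtt_usec,
                     uint64_t ewma_rtt_usec)
{
  if (ewma_rtt_usec == 0)  /* no usable RTT: fall back to cwnd itself */
    return cwnd;
  return cwnd * min_rtt_usec / ewma_rtt_usec;
}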

4. Compute the queue length

if (vegas_bdp(cc) > cc->cwnd)
  queue_use = 0; // This should not happen anymore..
else
  queue_use = cc->cwnd - vegas_bdp(cc);
  • vegas_bdp(cc) returns the bandwidth-delay product (BDP) estimate, which is the ideal congestion window size.
  • queue_use is the amount by which cwnd exceeds the BDP, i.e. an estimate of how much data is queued on the circuit; for example, with cwnd = 200 cells and BDP = 170 cells, queue_use is 30 cells. If cwnd is below the BDP, nothing is queuing and queue_use is 0.

5. The slow start phase

if (cc->in_slow_start) {
  • Slow start: in this phase, cwnd is increased step by step, gated by how much queuing is observed on the circuit.

a. Decide whether to grow the congestion window

if (queue_use < cc->vegas_params.gamma && !cc->blocked_chan) {
  if (cc->cwnd_full) {
    uint64_t inc = rfc3742_ss_inc(cc);
    cc->cwnd += inc;
  • If queue_use is below gamma and the channel is not blocked, cwnd keeps growing, provided the window was actually being fully used (cwnd_full is set).
  • rfc3742_ss_inc(cc) computes the slow-start increment.
/**
 * Implements RFC3742: Limited Slow Start.
 * https://datatracker.ietf.org/doc/html/rfc3742#section-2
 */
static inline uint64_t
rfc3742_ss_inc(const congestion_control_t *cc)
{
  if (cc->cwnd <= cc->vegas_params.ss_cwnd_cap) {
    /* If less than the cap, round and always grow by at least 1 sendme_inc. */
    return ((uint64_t)cc->cwnd_inc_pct_ss*cc->sendme_inc + 50)/100;
  } else {
    // K = int(cwnd/(0.5 max_ssthresh));
    //  => K = 2*cwnd/max_ssthresh
    // cwnd += int(MSS/K);
    //  => cwnd += MSS*max_ssthresh/(2*cwnd)
    // Return at least 1 for inc.
    return MAX(
      ((uint64_t)cc->sendme_inc*cc->vegas_params.ss_cwnd_cap + cc->cwnd)/
        (2*cc->cwnd),
      1);
  }
}
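
To make the two regimes concrete, here is a worked example with hypothetical parameter values (cwnd_inc_pct_ss = 100, sendme_inc = 31, ss_cwnd_cap = 500; these are illustrative, not necessarily Tor's consensus defaults).

#include <stdio.h>
#include <stdint.h>

/* Worked example of the two RFC3742 regimes, with hypothetical
 * parameters (not necessarily Tor's consensus defaults). */
int main(void)
{
  uint64_t sendme_inc = 31, cwnd_inc_pct_ss = 100, ss_cwnd_cap = 500;

  /* Below the cap (any cwnd <= 500): grow by a full sendme_inc. */
  uint64_t inc_below = (cwnd_inc_pct_ss*sendme_inc + 50)/100;

  /* Above the cap (cwnd = 1000): growth tapers off as cwnd rises. */
  uint64_t cwnd = 1000;
  uint64_t inc_above = (sendme_inc*ss_cwnd_cap + cwnd)/(2*cwnd);

  printf("inc below cap: %llu\n", (unsigned long long)inc_below); /* 31 */
  printf("inc above cap: %llu\n", (unsigned long long)inc_above); /* 8  */
  return 0;
}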

b. Exit slow start when cwnd reaches the maximum

if (cc->cwnd >= cc->vegas_params.ss_cwnd_max) {
  cc->cwnd = cc->vegas_params.ss_cwnd_max;
  congestion_control_vegas_exit_slow_start(circ, cc);
}
  • If cwnd reaches the slow-start maximum (ss_cwnd_max), it is clamped to that maximum and slow start is exited.

c. If the queue has built up, reduce cwnd

else {
  uint64_t old_cwnd = cc->cwnd;
  cc->cwnd = vegas_bdp(cc) + cc->vegas_params.gamma;
}
  • If queue_use reaches gamma (or the channel is blocked), congestion has appeared: cwnd is set back to BDP + gamma to relieve it. Per the vegas_params_t comment shown below, gamma is "the queue use allowed before we exit slow start", so this branch is also where the full source leaves slow start; the unused old_cwnd suggests this excerpt is abridged.

6. The steady-state phase (congestion avoidance)

} else if (cc->next_cc_event == 0) {
  • Steady state: when next_cc_event == 0, a cwnd update is due, and the normal congestion-avoidance logic runs.

a. Queue length above delta: reduce cwnd sharply

if (queue_use > cc->vegas_params.delta) {
  cc->cwnd = vegas_bdp(cc) + cc->vegas_params.delta - CWND_INC(cc);
}
  • If queue_use exceeds delta, congestion is severe: cwnd is cut back to BDP + delta - CWND_INC(cc).
#define CWND_INC(cc)           ((cc)->cwnd_inc)
struct vegas_params_t {
  /** The slow-start cwnd cap for RFC3742 */
  uint32_t ss_cwnd_cap;
  /** The maximum slow-start cwnd */
  uint32_t ss_cwnd_max;
  /** The queue use allowed before we exit slow start */
  uint16_t gamma;
  /** The queue use below which we increment cwnd */
  uint16_t alpha;
  /** The queue use above which we decrement cwnd */
  uint16_t beta;
  /** The queue use at which we cap cwnd in steady state */
  uint16_t delta;
  /** Weighted average (percent) between cwnd estimator and
   * piecewise estimator. */
  uint8_t bdp_mix_pct;
};
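
The bdp_mix_pct field above says the final BDP is a weighted average, in percent, of the cwnd-based estimator and a piecewise estimator. A minimal sketch of such a blend, assuming a straightforward percentage mix (the helper name and rounding are mine, not Tor's):

#include <stdint.h>

/* Hypothetical sketch of blending two BDP estimators by percentage,
 * as the bdp_mix_pct comment above describes. Not the Tor code. */
static uint64_t
vegas_bdp_mix_sketch(uint64_t cwnd_estimate, uint64_t piecewise_estimate,
                     uint8_t bdp_mix_pct)
{
  return (bdp_mix_pct*cwnd_estimate
          + (100 - bdp_mix_pct)*piecewise_estimate) / 100;
}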

b. Queue length above beta, or channel blocked: decrease cwnd

else if (queue_use > cc->vegas_params.beta || cc->blocked_chan) {
  cc->cwnd -= CWND_INC(cc);
}
  • If queue_use exceeds beta, or the channel is blocked (blocked_chan is set), cwnd is decreased by one increment to relieve the congestion.

c. Queue length below alpha and cwnd full: increase cwnd

else if (cc->cwnd_full && queue_use < cc->vegas_params.alpha) {
  cc->cwnd += CWND_INC(cc);
}
  • If the window is being fully used (cwnd_full) and queue_use is below alpha, cwnd is increased by one increment to take up the spare bandwidth.

7. Remaining bookkeeping

a. Enforce the minimum cwnd

cc->cwnd = MAX(cc->cwnd, cc->cwnd_min);
  • cwnd is never allowed to drop below cwnd_min, so the window cannot collapse to an unusably small size.

b. Logging

congestion_control_vegas_log(circ, cc);
  • Log the cwnd change, for debugging and analysis.

c. Re-arm the event counters

if (cc->next_cwnd_event == 0) {
  cc->next_cwnd_event = SENDME_PER_CWND(cc);
}
if (cc->next_cc_event == 0) {
  cc->next_cc_event = CWND_UPDATE_RATE(cc);
}
  • Expired counters are re-armed so the next update happens only after another window's worth of SENDMEs; a sketch of what these macros count follows.
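
The excerpt does not define SENDME_PER_CWND or CWND_UPDATE_RATE, but the function's doc comment says the window is updated "only once every congestion window worth of packets". A sketch consistent with that, assuming one SENDME per sendme_inc cells (the exact rounding is my assumption):

#include <stdint.h>

/* Assumed meaning of the counter re-arm values: the number of SENDMEs
 * in one congestion window, i.e. cwnd/sendme_inc rounded to nearest.
 * Illustrative only; the real macros may differ in detail. */
static uint64_t
sendmes_per_cwnd_sketch(uint64_t cwnd, uint64_t sendme_inc)
{
  return (cwnd + sendme_inc/2) / sendme_inc;
}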

d. Reset the cwnd_full flag

if (cwnd_full_reset(cc))
  cc->cwnd_full = 0;
  • If cwnd_full_reset(cc) says the flag should be reset, cwnd_full is cleared to 0.

e. Update the in-flight count

cc->inflight = cc->inflight - cc->sendme_inc;
  • inflight, the amount of data currently in transit, is decremented by the sendme_inc worth of cells that this SENDME acknowledges.

Summary: how cwnd evolves

1. Slow start

  • Grow cwnd: while queue use is below gamma and the channel is not blocked, cwnd grows.
  • Exit slow start: when cwnd reaches the maximum (ss_cwnd_max), slow start ends.
  • Reduce cwnd: once queue use reaches gamma, cwnd is cut back to BDP + gamma.

2. Steady state

  • Reduce cwnd sharply: when queue use exceeds delta, cwnd is cut to BDP + delta - CWND_INC(cc).
  • Reduce cwnd mildly: when queue use exceeds beta, or the channel is blocked, cwnd shrinks by one increment.
  • Grow cwnd: when queue use is below alpha and the window is full, cwnd grows by one increment.

3. Floor protection

  • cwnd never drops below cwnd_min, preventing a window so small that the circuit becomes unstable.
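
As a closing illustration, the toy loop below drives the simplified steady-state rule from the beginning of this article against a fixed BDP. All numbers are invented, and cwnd_full is assumed true throughout; the point is only to show cwnd settling into the [BDP + alpha, BDP + beta] dead band that Vegas steers toward.

#include <stdio.h>
#include <stdint.h>

/* Toy steady-state Vegas loop with invented numbers: fixed BDP of 170
 * cells, alpha=3, beta=6, delta=9, increment of 2 cells. */
int main(void)
{
  const uint64_t bdp = 170, alpha = 3, beta = 6, delta = 9, inc = 2;
  uint64_t cwnd = 200; /* start above the band, as if exiting slow start */

  for (int i = 0; i < 12; i++) {
    uint64_t queue_use = (cwnd > bdp) ? cwnd - bdp : 0;

    if (queue_use > delta)
      cwnd = bdp + delta - inc;  /* hard clamp: 200 -> 177 on step 0 */
    else if (queue_use > beta)
      cwnd -= inc;               /* mild backoff: 177 -> 175 */
    else if (queue_use < alpha)
      cwnd += inc;               /* grow into spare capacity */
    /* else: within [alpha, beta], hold (settles at 175, queue_use 5) */

    printf("update %2d: cwnd=%3llu queue_use=%llu\n",
           i, (unsigned long long)cwnd, (unsigned long long)queue_use);
  }
  return 0;
}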