Why BFT Systems Can Tolerate Fewer Than 1/3 Faulty Nodes

Reference article:

https://medium.com/codechain/why-n-3f-1-in-the-byzantine-fault-tolerance-system-c3ca6bab8fe9

If two systems are both BFT, the one that can handle more faulty nodes is obviously the superior one. However, no system assumes that one third or more of its nodes are faulty, because in theory the share of faulty nodes a system can handle must stay below ⅓.

Two problems can arise when a node undergoes Byzantine failure. The first is that the node does not send a message at all. In a distributed system of N nodes, f of which suffer Byzantine failure, consensus must therefore be reachable with only N - f messages. Simply put, the quorum is N - f nodes.

The second problem is that a Byzantine node may maliciously send different messages to different nodes. In the extreme case, f of the N - f messages that make up the quorum were sent by Byzantine nodes. The system has to operate normally even then, so the remaining (N - f) - f honest messages must outnumber the f messages sent by Byzantine nodes.

Resolving both problems requires (N - f) - f > f, that is, N > 3f: when f nodes suffer Byzantine failure, the system needs more than 3f nodes to be Byzantine fault tolerant. The smallest such N is 3f + 1. Thus, in a system of 3f + 1 nodes, at most f nodes can be faulty.

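I. The Core Challenge of the Byzantine Problem

In a distributed system, a Byzantine fault means a node may fail in arbitrary ways: it may not respond, it may send wrong information, or it may send contradictory information to different nodes (the "traitor" behavior). The Byzantine Generals Problem, posed by Leslie Lamport, Robert Shostak, and Marshall Pease in 1982, shows that loyal nodes cannot be guaranteed to agree once traitors make up 1/3 or more of the nodes.
Key reason: by splitting votes or forging messages, malicious nodes can stop honest nodes from forming a majority consensus.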

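🔢 II. Mathematical Derivation of `N ≥ 3f + 1`

This formula is the minimum node requirement for a BFT system. It follows from a worst-case analysis of two failure scenarios: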

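Scenario 1: faulty nodes do not respond (minimum quorum size)

The system must keep making progress even when some nodes are silent, so it must satisfy:
minimum number of responses needed to decide = N - f
(that is, ignore the f faulty nodes and wait for responses from the remaining N - f nodes).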

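Scenario 2: faulty nodes send malicious messages (preventing a forged majority)

In the worst case, the N - f nodes that respond may include f malicious nodes that report forged, conflicting results.
For honest nodes to remain the majority among those responses, we need:
number of honest responses > number of malicious responses
That is: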
```
(N - f) - f > f  
→ N - 2f > f  
→ N > 3f
```
The smallest integer solution is therefore N = 3f + 1, which gives the requirement N ≥ 3f + 1.
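
As a quick check of the derivation above, the following minimal Python sketch evaluates the inequality for small values of f. The helper names `min_nodes` and `honest_margin` are hypothetical and introduced only for illustration.

```python
def min_nodes(f: int) -> int:
    """Smallest integer N that satisfies N > 3f."""
    return 3 * f + 1


def honest_margin(n: int, f: int) -> int:
    """Honest replies minus malicious replies in the worst-case quorum.

    The quorum holds N - f replies, up to f of which may be malicious,
    so the margin is ((N - f) - f) - f = N - 3f.
    """
    return (n - f) - f - f


for f in range(1, 4):
    n = min_nodes(f)
    print(f"f={f}: N={n}, quorum={n - f}, margin={honest_margin(n, f)}")
    # With one node fewer the margin drops to zero:
    # honest replies no longer outnumber malicious ones.
    assert honest_margin(n - 1, f) == 0
```

With N = 3f + 1 the margin is exactly 1, which is why 3f + 1 is the smallest workable size.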

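Combined constraint: balancing safety and liveness

Safety: all honest nodes output the same result (requires N > 3f).
Liveness: the system reaches consensus in finite time (requires N - f > f, i.e. honest nodes outnumber malicious nodes).
N = 3f + 1 is the smallest N that satisfies both.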

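💻 III. Applications in Real Systems

1. PBFT (Practical Byzantine Fault Tolerance)

PBFT uses a three-phase protocol (Pre-Prepare, Prepare, Commit); each phase must collect 2f+1 matching messages to reach consensus.
Example: with N=4 (f=1):
- If the primary node is honest, the system proceeds normally;
- If 1 node is malicious (for example, it sends contradictory messages), the remaining 3 honest nodes still form a majority (3 > 1).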
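
The 2f+1 matching rule in the N=4 example can be sketched as a simple tally. This is a simplified illustration and not PBFT itself: real PBFT messages also carry a view number, sequence number, digest, and signature, and the `decide` helper is a hypothetical name.

```python
from collections import Counter


def decide(received, f):
    """Return the value backed by at least 2f+1 matching messages, else None."""
    if not received:
        return None
    quorum = 2 * f + 1
    value, count = Counter(received).most_common(1)[0]
    return value if count >= quorum else None


# N = 4, f = 1: three honest replicas report "v", the faulty one reports "x".
print(decide(["v", "v", "v", "x"], f=1))
# The faulty replica stays silent: three matching messages still reach 2f+1.
print(decide(["v", "v", "v"], f=1))
# Only two matching messages: below the quorum, so no decision is made.
print(decide(["v", "v", "x"], f=1))
```

The first two calls return `"v"` and the third returns `None`, which mirrors the two bullet cases: one faulty node can neither block nor override three honest ones.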

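2. BSC (Binance Smart Chain) and its 21-node model

Total nodes N=21 → maximum number of tolerated faults f = floor((21-1)/3) = 6.
Consensus requires at least 15 votes (2f+1=13; 15 is used in practice for robustness), so that even if 6 nodes misbehave, honest nodes still hold a clear majority (15 > 6).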

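3. NEO's DBFT (Delegated Byzantine Fault Tolerance)

Number of super nodes N=7 → f=2 (at most 2 malicious nodes can be tolerated).
A proposal needs (2*7+1)/3 = 5 nodes to agree. Once it passes it is irreversible, which avoids forks.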
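
The parameters quoted for the three systems above can be recomputed from N alone. The sketch below assumes only the formula f = floor((N-1)/3) used in the text; `max_faulty` is a hypothetical helper name.

```python
def max_faulty(n: int) -> int:
    """Largest f that still satisfies n >= 3f + 1."""
    return (n - 1) // 3


for name, n in [("PBFT example", 4), ("NEO DBFT", 7), ("BSC", 21)]:
    f = max_faulty(n)
    # 2f+1 is the matching-message threshold; N-f is the quorum from the derivation.
    print(f"{name}: N={n}, f={f}, 2f+1={2 * f + 1}, N-f={n - f}")
```

For N=4 and N=7 the two thresholds coincide (3 and 5). For N=21 they differ: 2f+1 is 13 while N-f is 15, which matches the 15-vote figure quoted for BSC.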

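⚠️ IV. Why Can't More Than 1/3 Faulty Nodes Be Tolerated?

If the number of malicious nodes f' exceeds f (that is, f' ≥ f+1), the system can no longer guarantee consensus:

Message level: the malicious nodes control f+1 responses and can forge several "valid" results that honest nodes cannot tell apart from the truth.
Consensus level: in a vote, more than f malicious nodes can at the same time:
- send "agree" to some nodes
- send "reject" to the others
  splitting the network into two mutually exclusive majorities (for example, in a 4-node system, 2 malicious nodes can force a 2:2 tie).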
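
The split described above can be illustrated with a toy voting model. This is a sketch under stated assumptions, not an implementation of any real protocol: honest nodes vote for whichever value they were told, Byzantine nodes echo "X" to one honest group and "Y" to the other, and a node commits once it counts `quorum` matching votes. The function name `honest_commits` is hypothetical.

```python
def honest_commits(group_x: int, group_y: int, byzantine: int, quorum: int) -> set:
    """Values committed by honest nodes under an equivocation attack.

    group_x honest nodes were told "X" and group_y were told "Y".
    Every Byzantine node votes "X" towards group_x and "Y" towards group_y,
    so each honest node sees its own group's votes plus all Byzantine votes.
    """
    commits = set()
    if group_x > 0 and group_x + byzantine >= quorum:
        commits.add("X")
    if group_y > 0 and group_y + byzantine >= quorum:
        commits.add("Y")
    return commits


QUORUM = 3  # 2f+1 for a 4-node system designed for f = 1

# One Byzantine node, three honest nodes: no split yields two commits.
for x in range(4):
    assert len(honest_commits(x, 3 - x, byzantine=1, quorum=QUORUM)) <= 1

# Two Byzantine nodes, two honest nodes: each honest node sees a "quorum"
# for a different value, so both values are committed.
print(honest_commits(1, 1, byzantine=2, quorum=QUORUM))
```

In this model, a 4-node system stays consistent with one traitor, but two traitors are enough to make the two honest nodes commit conflicting values.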

✅ Key conclusion: N ≥ 3f + 1 is the theoretical limit for Byzantine systems. Going beyond it breaks the determinism of distributed consensus.

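💎 V. Extension: BFT vs. Non-Byzantine Fault Tolerance (CFT)

| Property | Byzantine fault tolerance (BFT) | Non-Byzantine fault tolerance (CFT) |
| --- | --- | --- |
| Fault types tolerated | Malicious nodes, forged messages | Node crashes and network outages only |
| Node requirement | N ≥ 3f + 1 | N ≥ 2f + 1 (e.g. Raft) |
| Fault-tolerance ceiling | < 1/3 (about 33%) | < 1/2 (about 50%) |
| Typical use | Public chains / open networks (e.g. BSC) | Private chains / trusted environments (e.g. ZooKeeper) |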
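
To make the two node requirements in the comparison concrete, the following small sketch computes how many faults a cluster of a given size tolerates under each model. It assumes only the bounds N ≥ 2f + 1 and N ≥ 3f + 1 stated above; `tolerated_faults` is a hypothetical helper name.

```python
def tolerated_faults(n: int) -> dict:
    """Maximum f for a cluster of n nodes under each fault model."""
    return {
        "CFT": (n - 1) // 2,  # from N >= 2f + 1
        "BFT": (n - 1) // 3,  # from N >= 3f + 1
    }


for n in (4, 7, 21):
    print(n, tolerated_faults(n))
```

For the same 7 nodes, a crash-tolerant system survives 3 failures while a Byzantine-tolerant one survives 2. The extra nodes are the cost of tolerating malicious behavior.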

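Conclusion

N = 3f + 1 is the mathematical cornerstone of BFT systems. The bound was rigorously proven in Lamport's Byzantine Generals Problem and is put into engineering practice by algorithms such as PBFT and DBFT. In essence:

Under the shadow of betrayal, trust costs triple redundancy: every malicious node tolerated requires at least 3 nodes to provide a "trust guarantee".

Current blockchain systems (such as BSC and NEO) all follow this rule. Going beyond the 1/3 limit would require synchronous-network assumptions or hybrid consensus models (such as Vitalik's 99% fault-tolerance scheme), at the cost of decentralization or latency.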

posted @ 2025-08-12 09:38 若-飞
