A Byzantine fault is any fault presenting different symptoms to different observers. A Byzantine failure is the loss of a system service due to a Byzantine fault in systems that require consensus.
The objective of Byzantine fault tolerance is to be able to defend against failures of system components with or without symptoms that prevent other components of the system from reaching an agreement among themselves, where such an agreement is needed for the correct operation of the system.
Byzantine refers to the Byzantine Generals’ Problem, in which a group of generals, each commanding a portion of the Byzantine army, encircle a city. These generals wish to formulate a plan for attacking the city. In its simplest form, the generals must decide only whether to attack or retreat. Some generals may prefer to attack, while others prefer to retreat. The important thing is that every general agree on a common decision, for a halfhearted attack by a few generals would become a rout, and would be worse than either a coordinated attack or a coordinated retreat.
The problem is complicated by the presence of treacherous generals who may not only cast a vote for a suboptimal strategy, they may do so selectively. For instance, if nine generals are voting, four of whom support attacking while four others are in favor of retreat, the ninth general may send a vote of retreat to those generals in favor of retreat, and a vote of attack to the rest. Those who received a retreat vote from the ninth general will retreat, while the rest will attack (which may not go well for the attackers). The problem is complicated further by the generals being physically separated and having to send their votes via messengers who may fail to deliver votes or may forge false votes.
The typical mapping of this story onto computer systems is that the computers are the generals and their digital communication system links are the messengers.
Byzantine fault tolerance can be achieved if the loyal (non-faulty) generals have a majority agreement on their strategy. There can be a default vote value given to missing messages. For example, missing messages can be given the value . Further, if the agreement is that the votes are in the majority, a pre-assigned default strategy can be used (e.g., retreat).
Byzantine fault tolerance is the dependability of a fault-tolerant computer system, particularly distributed computing systems, where components may fail and there is imperfect information on whether a component has failed. In a “Byzantine failure”, a component such as a server can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers.
It is difficult for the other components to declare it failed and shut it out of the network, because they need to first reach a consensus regarding which component has failed in the first place. The term is derived from the Byzantine Generals’ Problem, where actors must agree on a concerted strategy to avoid catastrophic system failure, but some of the actors are unreliable. Byzantine fault tolerance has been also referred to with the phrases interactive consistency or source congruency, error avalanche, Byzantine agreement problem, Byzantine generals problem, and Byzantine failure.