Devs release bug fix for crashed ETH 2.0 testnet

  • The final Ethereum 2.0 multi-client testnet experienced a bug that led to unsynchronized nodes and almost crashed the network. The cause was the Prysm client.
  • Prysmatic Labs has released an update, known as Alpha.22, to fix the bug.

After its successful launch, the Ethereum “Medalla” testnet encountered its first difficulties. Theoretically, Medalla is supposed to be the last multi-client testnet to be launched before phase 0 of Ethereum 2.0. However, a bug in the synchronization of the nodes is currently raising concerns in the community.

Medalla was launched on 4 August this year and works with five clients: Teku from ConsenSys, Prysm from Prysmatic Labs, Nimbus from Status, Lodestar from ChainSafe Systems and Lighthouse from Sigma Prime. At launch, it already counted 20,753 validators and 664,096 staked ETH. Medalla’s goal is to test the stability of the Ethereum 2.0 beacon chain under real conditions. It is therefore a kind of “dress rehearsal” before the deployment of phase 0, as Ethereum core developer Danny Ryan explained.

Ethereum 2.0 testnet falls to synchronization bug

The bug was reported on August 14 by Terence Tsao, a member of the Prysmatic Labs team. In his report, he stated that Prysmatic Labs’ Prysm client had a synchronization problem with the roughtime clock, which was incorrectly set 4 hours ahead. According to a later report, the single point of failure was Cloudflare. In summary, the following occurred:

The cloudflare roughtime servers all returned wrong information, and Prysm nodes did not properly fallback from this situation. This bug caused all Prysm nodes to exhibit clock skew. Because of this clock skew, validators incorrectly proposed blocks and attestations for future slots.
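To see why a skewed clock produces blocks for future slots, the slot arithmetic can be sketched as follows. This is a minimal illustration, not Prysm's code: the 12-second slot length is the phase 0 specification value, while the genesis timestamp, the function names and the ten-day uptime are placeholders chosen for the example.

```python
SECONDS_PER_SLOT = 12  # slot length in the Ethereum 2.0 phase 0 specification


def slot_at(unix_time: int, genesis_time: int) -> int:
    """Return the slot a node believes is current at the given Unix time."""
    if unix_time < genesis_time:
        raise ValueError("time is before genesis")
    return (unix_time - genesis_time) // SECONDS_PER_SLOT


def is_from_future(block_slot: int, local_slot: int) -> bool:
    """A node with a correct clock sees a block as premature if its slot is ahead."""
    return block_slot > local_slot


# Placeholder values for illustration only (not taken from the incident reports).
GENESIS = 1_596_500_000          # hypothetical genesis timestamp
SKEW = 4 * 60 * 60               # the 4-hour skew described in the report, in seconds
true_now = GENESIS + 10 * 24 * 60 * 60  # ten days after the hypothetical genesis

honest_slot = slot_at(true_now, GENESIS)
skewed_slot = slot_at(true_now + SKEW, GENESIS)
slots_ahead = skewed_slot - honest_slot

print(f"skewed node is {slots_ahead} slots ahead")
print("rejected by honest node:", is_from_future(skewed_slot, honest_slot))
```

Under these assumptions, a node whose clock runs 4 hours fast computes a slot number 1,200 slots ahead of the real one, so any block or attestation it signs refers to a slot that correctly synchronized nodes have not yet reached.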

The skew directly affected validators, which proposed invalid blocks “from the future”. All Prysm clients were affected from 5:30pm to 6:45pm UTC on August 14th. Prysmatic Labs co-founder Preston Van Loon reported that the share of participants successfully validating blocks on the testnet dropped from 75% to nearly 5%. In the aftermath, Prysmatic Labs decided to implement the following measures:

We decided to disable the roughtime clock sync by default and replace it with an opt-in feature flag. This prevents the same type of issue to happen on a global scale and now roughtime results are reported as an FYI rather than an automatic clock adjustment.
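The behavior described in this statement, an opt-in flag with the external time source treated as informational by default, could look roughly like the sketch below. It is only an illustration of the design: `ClockConfig`, `enable_roughtime_adjust` and `effective_time` are hypothetical names, not Prysm's actual flag or API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClockConfig:
    # Hypothetical opt-in flag: external clock adjustment is off unless requested.
    enable_roughtime_adjust: bool = False


def effective_time(
    local_time: float,
    reported_offset: float,
    cfg: ClockConfig,
    log: Callable[[str], None] = print,
) -> float:
    """Return the time the node should use, given an externally reported offset."""
    # Always surface the reported offset so operators can investigate it.
    log(f"external time source reports offset of {reported_offset:+.1f}s (informational)")
    if cfg.enable_roughtime_adjust:
        # Only operators who opted in have their clock adjusted automatically.
        return local_time + reported_offset
    # Default: trust the local system clock and ignore the reported offset.
    return local_time


# A wrong 4-hour offset no longer moves the clock of a default-configured node.
default_time = effective_time(1000.0, 14400.0, ClockConfig())
opted_in_time = effective_time(1000.0, 14400.0, ClockConfig(enable_roughtime_adjust=True))
print(default_time, opted_in_time)
```

With the default configuration, a faulty time server can at worst produce a misleading log line, so a single bad source cannot shift every node's clock at once.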

These initial measures were implemented with a subsequent update to the Prysmatic Labs client. Two days after the bug was reported, the Prysmatic team released alpha.22, an update that “offers many fixes toward synch chain”. The Prysmatic team hopes that alpha.22 will serve as a “Medalla recovery”. In that sense, they asked for help from the community:

We need all the help we can get to get the testnet back on track and updating your nodes is a great way to add more healthy peers to the network. Once there’s good amount of healthy nodes, it should be a matter of time before validators can increase the participation rate.

The rapid reaction of the Prysmatic Labs team prevented a total collapse of the testnet. Nevertheless, criticism has come from the crypto community, and it remains to be seen whether the release of phase 0 of Ethereum 2.0 will be affected by the latest developments.