Ethereum Mainnet Shadow Forking: An Overview

Shadow forking refers to using data from a testnet or the mainnet to test sync assumptions for a network upgrade, so that developers can test features before deploying the actual upgrade to the mainnet.

Ethereum's transition from Proof-of-Work (PoW) to Proof-of-Stake (PoS) consensus moved closer when developers launched the first mainnet shadow fork on April 11th, 2022. The goal was to put existing assumptions about testnets and the mainnet to the test, and it was one of the most important milestones on the road to the Merge upgrade. In this post, we cover every shadow fork on the Goerli testnet and Ethereum Mainnet.

We have covered all mainnet shadow forks up to MSF 13 and all Goerli testnet shadow forks up to GTSF 6.

This article is a live document that we will update with each shadow fork.

What is Shadow Forking?

It refers to using data from a testnet or the mainnet to test sync assumptions for a network upgrade so that developers can test features before deploying the actual upgrade to the mainnet.

A shadow fork is a test limited to a few weeks. Developers fork an existing testnet, take its config, and add Merge-related fields such as Terminal Total Difficulty (TTD) and Merge Fork Block (for peering and forkID changes). They then spin up a new beacon chain for testing purposes. The new fork essentially inherits the state and transactions of the canonical testnet. Inheriting the state of an existing testnet allows developers to stress test sync assumptions, as well as assumptions around block-building times and timeouts. Staying connected to peers on the canonical chain also lets the team import some of their transaction gossip.
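As a rough illustration of the configuration step, the sketch below copies a testnet-style chain config and adds two merge-related fields. The field names `terminalTotalDifficulty` and `mergeForkBlock` follow the naming we believe Geth-style genesis files used at the time; the helper `make_shadow_fork_config` and all sample values are hypothetical and are not the tooling the developers actually used.

```python
import copy
import json


def make_shadow_fork_config(base_config: dict, ttd: int, merge_fork_block: int) -> dict:
    """Return a copy of base_config with merge-related fields added.

    The original config is left untouched, mirroring the fact that the
    canonical network's own configuration does not change.
    """
    shadow = copy.deepcopy(base_config)
    # The chainId is deliberately kept as-is: a shadow fork reuses it.
    shadow["config"]["terminalTotalDifficulty"] = ttd      # assumed field name
    shadow["config"]["mergeForkBlock"] = merge_fork_block  # assumed field name
    return shadow


# Heavily trimmed, Goerli-like config. TTD and block number are placeholders.
goerli_like = {"config": {"chainId": 5}}
shadow = make_shadow_fork_config(goerli_like, ttd=10_000_000, merge_fork_block=6_000_000)

print(json.dumps(shadow, indent=2))
```

Only the nodes started with the modified config treat the TTD as a fork point; every other node on the network keeps running its unmodified config.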

“A shadow fork does not affect the canonical chain in any meaningful way. Since we reuse the chainID and the gossip channels are still connected, transactions submitted to the shadow fork could be included in the main chain as well. Proceed with extreme caution!” said Parithosh Jayanthi.

Put another way, a shadow fork is a new devnet created by forking a live network with a small number of nodes. The shadow fork keeps the same state and history and can therefore replay transactions from the main network.

A small number of nodes are configured to fork off from the network at a certain point. Developers do this by launching nodes that are set to run through The Merge earlier than the rest of the network. This lets developers test how the upgrade would happen under conditions similar to the shadow-forked network, without the vast majority of nodes being aware that anything has happened.

A shadow fork transitions to proof of stake after reaching TTD while still carrying the transactions and other history/state information of the canonical network (a testnet or Ethereum Mainnet). After a successful merge, the shadow fork continues on the proof of stake chain, ignoring any blocks subsequently added using proof of work on the canonical network.
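To make the TTD trigger concrete, here is a minimal sketch of how a node could locate the point at which it stops following proof of work. It assumes a simple linear chain and made-up difficulty values; `find_terminal_block` is a hypothetical helper, not a client API.

```python
from typing import Optional, Sequence


def find_terminal_block(difficulties: Sequence[int], ttd: int) -> Optional[int]:
    """Return the index of the first block whose cumulative (total)
    difficulty reaches the TTD, or None if the chain has not got there yet."""
    total_difficulty = 0
    for index, difficulty in enumerate(difficulties):
        total_difficulty += difficulty
        if total_difficulty >= ttd:
            return index
    return None


# Per-block difficulties of a toy proof-of-work chain (made-up numbers).
pow_chain = [100, 120, 110, 130, 125]

terminal = find_terminal_block(pow_chain, ttd=400)
print(terminal)  # cumulative totals are 100, 220, 330, 460, so this prints 3
```

A shadow-fork node would build its proof-of-stake blocks on top of block 3 and ignore `pow_chain[4:]`, while nodes without a configured TTD simply keep following the proof-of-work chain.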

Timeline

January:

  • 21st January: Goerli Testnet Shadow Fork 1

March:

  • 25th March: Goerli Testnet Shadow Fork 2

April:

  • 4th April: Goerli Testnet Shadow Fork 3
  • 11th April: Mainnet Shadow Fork 1
  • 19th April: Goerli Testnet Shadow Fork 4
  • 23rd April: Mainnet Shadow Fork 2

May:

  • 5th May: Mainnet Shadow Fork 3
  • 12th May: Mainnet Shadow Fork 4
  • 19th May: Mainnet Shadow Fork 5
  • 31st May: Mainnet Shadow Fork 6

June:

  • 22nd June: Mainnet Shadow Fork 7

July:

  • 5th July: Mainnet Shadow Fork 8
  • 14th July: Mainnet Shadow Fork 9
  • 21st July: Goerli Testnet Shadow Fork 5
  • 26th July: Mainnet Shadow Fork 10

August:

  • 4th August: Goerli Testnet Shadow Fork 6
  • 18th August: Mainnet Shadow Fork 11
  • 31st August: Mainnet Shadow Fork 12

September:

  • 9th September: Mainnet Shadow Fork 13

Why did Ethereum Developers do Shadow Forking?

Ethereum is a multi-billion dollar chain, and The Merge is a major upgrade that changes its consensus mechanism, so Ethereum developers explored innovative ways of testing it. Shadow forking helps in several ways:

  • It allows developers to see how nodes react when The Merge happens using only a small number of nodes without disrupting the canonical chain.
  • It gives developers a more realistic environment to test in than launching new testnets because existing testnets already have transactions happening organically on them and a large state size & block history, which put nodes under more stress than new testnets.
  • It can expose bugs in clients and point to optimizations that are needed.
  • It helps developers test the realistic scenario and test Optimistic Sync on Consensus Layer and Snap Sync on the Execution Layer.
  • It helps developers increase their confidence that implementations work as expected.

The Kiln Merge testnet aimed to allow the community to practice running their nodes, deploying contracts, testing infrastructure, etc. However, since it was a fresh testnet with little activity, the team needed a way to stress test their assumptions around syncing and state growth.

The team also needed to check whether their assumptions held on existing testnets and the mainnet. That is why they shadow forked an existing testnet and added Merge-related fields such as Terminal Total Difficulty (TTD) and Merge Fork Block.

Testnets don't have much state, as they run for weeks or months and see little daily use. The mainnet, by contrast, is used heavily, so its state is enormous, i.e., around 50 gigabytes. Developers needed to synchronize all of that state, and shadow forking gave them an excellent way to test the synchronization. Synchronization also takes much longer on the mainnet, which raises the chance of bugs surfacing.

Goerli Testnet Shadow Forking 1

In this section, we will understand how developers did the shadow forking of Goerli Testnet for the 1st time.

First, developers deployed a contract on Goerli with all the validators. They configured a set of nodes to follow the proof of stake chain after a specific block or after a particular total difficulty was reached. Once the TTD, i.e., Terminal Total Difficulty, was hit, those nodes went on and merged, i.e., switched to Proof of Stake. In this way, developers tested The Merge on a real testnet by creating a fork:

  1. Developers deployed a genesis contract on Goerli and deposited enough validators to start a chain.
  2. The shadow fork went through all the stages of The Merge.
  3. Transactions from Goerli were also executed on the shadow fork, and the network was finalizing.

The accompanying diagram has three rows. The top row of Goerli blocks shows a node on the canonical chain, which is not aware of the shadow fork. The middle row of Goerli blocks shows a node on the shadow-forked chain, which has a modified configuration telling it to fork once the TTD is hit. The bottom row shows a Beacon Chain launched for the shadow fork only, which provides consensus to the chain once TTD is hit. After TTD is hit, the nodes on the canonical chain continue producing blocks as usual, as if nothing has happened, while the nodes with the modified configuration fork off and run through The Merge. The next validator then produces the first post-merge block on the Beacon Chain.

This makes shadow forking a good technique: it gives us a way to test whether the fork and the merging mechanism work without disturbing the public testnet. The Goerli Shadow Fork Testing Plan lists various test cases that can be performed. The goal is to repeat the shadow fork process relatively often, allowing the merge transition to be tested in multiple scenarios.

After the 1st shadow fork, developers did five more shadow forks of Goerli to find more bugs and prepare better before The Merge.

Bugs

During Goerli Shadow Fork 1, developers found many pitfalls and issues during deployment, which helped them prepare better for The Merge. This was a huge leap forward for Merge testing.

Goerli Testnet Shadow Forking 2

The Goerli chain was already up and running, and the Beacon Chain underwent genesis at 9 AM UTC on 23rd March. The Merge occurred on 25th March at 5 AM. This was the second time developers shadow forked Goerli. The shadow fork went well, and the chain was finalizing. All nodes appeared to survive the shadow fork and were in consensus.

Bugs

Firstly, Parithosh forgot to add a JWT secret to the node connected to the Beacon Explorer, so everything crashed when it tried to use the Engine API. This happened because the explorer runs on the bootnode, and Parithosh tends to set up bootnodes manually. Parithosh fixed the issue later, but it led to missed attestations and proposals while the node was down. As a result, the participation rate decreased from 99.6% to 85%.

Secondly, Prysm started failing to verify the signatures of blocks 13774, 13775, and 13776, which triggered other issues. The Nimbus-Geth combination itself also started showing a signature verification error. tersec and Parithosh later resolved this issue.

Some more issues with syncing, handling big blocks, and operation timeouts were seen on Goerli, although these don’t happen on the mainnet. Patches were released to clients as a result. There were also some issues with the Geth/Teku combo on nodes with 8GB of RAM: each client required a bit over 4GB, so they kept fighting for RAM.

Goerli Testnet Shadow Forking 3

After the success of Goerli Testnet Shadow Fork 2, developers decided to shadow fork the Goerli testnet for the 3rd time. They meant this shadow fork to be another sync target to test against. The aim was to see whether the mainnet split brought out any issues. This was an important stress test, as it would dictate the optimizations needed in the run-up to The Merge. It was the first time they had attempted this.

Bugs

Developers discovered a bug impacting Geth nodes. However, the bug only affected a subset of Geth nodes and didn’t impact network finalization.

Goerli Testnet Shadow Forking 4

Developers set this shadow fork to occur at 8:30 AM on the 19th April. The corresponding beacon chain launched on 15th April.

Bugs

A total of 20 client combinations participated in this shadow fork. There were no major insights or bugs, as developers were focused mainly on the mainnet shadow forks at that time.

Goerli Testnet Shadow Forking 5

Goerli Shadow Fork 5 merged successfully, according to a quick overview shared by Parithosh Jayanthi.

Bugs

There were no major issues. Around 30% of the network was successfully running mev-boost. However, due to some bugs, Nethermind was not able to process blocks successfully when there was a choice of terminal blocks available. Nethermind nodes that chose the wrong one, i.e., the one that the chain did not eventually follow, ended up in a kind of deadlock for a while, as they didn’t automatically validate the alternative terminal block.
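The "choice of terminal blocks" can be sketched as follows. As we understand the Merge rules, a proof-of-work block is terminal when its total difficulty has reached the TTD while its parent's has not, so two sibling blocks can both qualify. The `PowBlock` type, the block names, and the difficulty values below are hypothetical, and this is not Nethermind's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PowBlock:
    block_hash: str
    parent_hash: Optional[str]
    total_difficulty: int


def terminal_blocks(blocks: Dict[str, PowBlock], ttd: int) -> List[str]:
    """Return the hashes of all blocks that cross the TTD, i.e. whose total
    difficulty is >= ttd while their parent's total difficulty is < ttd."""
    found = []
    for block in blocks.values():
        parent = blocks.get(block.parent_hash) if block.parent_hash else None
        parent_td = parent.total_difficulty if parent else 0
        if block.total_difficulty >= ttd and parent_td < ttd:
            found.append(block.block_hash)
    return sorted(found)


# Toy block tree: B1 and B2 are competing children of A; C extends B1.
tree = {
    "A": PowBlock("A", None, 90),
    "B1": PowBlock("B1", "A", 101),
    "B2": PowBlock("B2", "A", 103),
    "C": PowBlock("C", "B1", 115),
}

print(terminal_blocks(tree, ttd=100))  # ['B1', 'B2']
```

A node that first settles on B1 has to be prepared to validate and switch to B2 if the proof-of-stake chain ends up building on it. C is not a terminal block, because its parent had already crossed the TTD.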

Goerli Testnet Shadow Forking 6

The goal of this shadow fork was to test mev-boost during the transition.

Bugs

There were no client issues, and all client combinations synced successfully. Two nodes ran out of disk space, resulting in a sudden drop in the participation rate. 30% of the network ran mev-boost through the transition, and no major issues were reported other than higher latency on one node in India, potentially due to a defective machine.

Mainnet Shadow Forking 1

CL Beacon chain Genesis happened on 8th April at 2 PM. TTD was expected to hit on 11th April at 4 PM.

Since this was a mainnet shadow fork, devs had to download the mainnet state, which could take some time. After the first mainnet shadow fork went live, blocks were produced and finalized with no issues.

The node split on this shadow fork attempted to mimic the mainnet. A Grafana dashboard tracked the nodes, where green is good and red is bad.

The participation rate did drop, but it stayed well above the minimum required for finality.
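As a simplified illustration of that minimum: the beacon chain needs at least two-thirds of the active stake to attest before checkpoints can be justified and finalized. The sketch below checks a participation rate against that threshold. The helper names are hypothetical, the balances are made up, and the 85% figure is borrowed from Goerli Shadow Fork 2 earlier in this article purely as an example.

```python
from fractions import Fraction

# Supermajority of attesting stake needed for justification and finality.
FINALITY_THRESHOLD = Fraction(2, 3)


def participation_rate(attesting_balance: int, total_active_balance: int) -> Fraction:
    """Fraction of the active stake that attested in an epoch."""
    if total_active_balance <= 0:
        raise ValueError("total_active_balance must be positive")
    return Fraction(attesting_balance, total_active_balance)


def can_finalize(rate: Fraction) -> bool:
    """True if participation meets the two-thirds supermajority."""
    return rate >= FINALITY_THRESHOLD


# 85 of 100 units of stake attesting: comfortably above two-thirds.
print(can_finalize(participation_rate(85, 100)))  # True
# 60 of 100 units attesting: below two-thirds, so no finality.
print(can_finalize(participation_rate(60, 100)))  # False
```

Exact fractions are used instead of floats so that a rate of exactly two-thirds is not misjudged by rounding.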

This was the first attempt at a mainnet shadow fork, and the team expected to learn a lot from the transition.

The following week was to be spent running sync tests against this fork and trying to trigger more edge cases. The team also planned to repeat the exercise the next week for more advanced users.

At the time of writing, the shadow fork had processed 6,345,216 transactions with an average block time of 14.3 seconds. The latest details are available on the Transaction Explorer.

We can find more details like Network Uptime, Block Propagation, Average Network Hashrate etc. on Ethstats.

Many thanks to Parithosh Jayanthi, a developer at Ethereum Foundation, for all this information and images.

Bugs

Marius Van Der Wijden found that Nethermind and Besu stopped at the transition, but a fix was deployed for Nethermind that allowed its nodes to sync up, and all beacon chain clients were then in agreement. One more issue was found that could easily have been missed on the devnets: the default gas limit was 8 million, but miners voted it up to 30 million.

Tim Beiko found that Nethermind had some sync issues around the transition. These weren't new, as they had been discovered during the Goerli fork, but the team didn't want to miss a single test run, so they took part despite not having fully patched the issue yet. According to Marek Moraczyński, most of the Nethermind nodes were working fine on the mainnet shadow fork, but wrong conditions in the beacon headers sync left some of their nodes stuck syncing.

Hyperledger Besu also hit a few problems. The first was a simple config issue, which was quickly solved. The second had to do with fast sync, i.e., when the pivot block moved from a pre-merge to a post-merge block, the post-merge rules weren't automatically applied. That was being fixed too.

Erigon got through the fork smoothly and even picked up a bug in some CL clients, which were making calls to unexpected JSON-RPC endpoints via the Engine API. Finally, a few more edge cases were found during the mainnet fork, none of which were consensus-breaking.

Mainnet Shadow Forking 2

The #TestingTheMerge team shadow forked the Ethereum Mainnet for the second time on April 23, 2022. Terminal total difficulty (TTD) was hit, and the shadow fork of Mainnet ran on Proof of Stake. Except for Erigon, the consensus and execution layer client combinations were synced. According to Danny Ryan, it seemed to have gone well, as Ben Edgington live-tweeted from Amsterdam.

Progress of Shadow Fork 2 can be followed on the EthStats. Green is good!

Mainnet Shadow Fork 2 was also successful. Developers saw really high participation for attestations and sync committees. A few nodes were still syncing the Execution Layer (EL), which gave developers a good test of optimistic sync. In particular, some nodes had the Consensus Layer (CL) in sync prior to the merge but the EL not yet synced.

Here is a quick screenshot of Teku and Geth successfully validating.

Here is a quick screenshot of Teku and Besu successfully validating.

Credits to Adrian Sutton, a Lead Blockchain Protocol Engineer at ConsenSys, for all this available information and images.

Difference between Mainnet Shadow Forking 1 & 2

This was the first shadow fork where every client combination survived the transition and managed to stay in sync afterwards. Another minor difference was that this shadow fork used every client's develop/unstable branch, so developers aren't using merge branches anymore.

Developers have reused the mainnet deposit contract with a new fork ID. This means that every mainnet deposit needs to be processed and listed as invalid on the shadow fork. This huge computation triggered some edge cases in some clients, and the good news was that the network still chugged along. Developers were going to continue monitoring the network over the week to look for issues, perform sync tests, and stress the nodes.
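One plausible reading of why mainnet deposits show up as invalid: in the consensus specs, a deposit's signature is checked against a signing domain derived from the network's genesis fork version. Assuming the shadow fork's beacon chain was started with a different fork version than mainnet (our assumption; the sample version below is hypothetical), the domains differ and signatures made for mainnet no longer verify. The sketch computes the deposit domain as we understand the spec's `compute_domain`; it leaves out the BLS signature check itself.

```python
import hashlib

DOMAIN_DEPOSIT = bytes.fromhex("03000000")  # deposit domain type
ZERO_ROOT = b"\x00" * 32                    # deposits use an all-zero validators root


def compute_deposit_domain(genesis_fork_version: bytes) -> bytes:
    """Return the 32-byte signing domain used when verifying deposits."""
    if len(genesis_fork_version) != 4:
        raise ValueError("fork version must be exactly 4 bytes")
    # Tree hash of the two-field ForkData container: the 4-byte version is
    # right-padded to a 32-byte chunk and hashed together with the root.
    version_chunk = genesis_fork_version.ljust(32, b"\x00")
    fork_data_root = hashlib.sha256(version_chunk + ZERO_ROOT).digest()
    return DOMAIN_DEPOSIT + fork_data_root[:28]


mainnet_domain = compute_deposit_domain(bytes.fromhex("00000000"))
shadow_domain = compute_deposit_domain(bytes.fromhex("10000038"))  # hypothetical version

print(mainnet_domain.hex())
print(shadow_domain.hex())
print(mainnet_domain != shadow_domain)  # True: mainnet signatures would not verify
```

Under this reading, each client still has to walk through every mainnet deposit and run the check before discarding it, which would explain the heavy computation described above.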

Credits to Parithosh Jayanthi, a developer at Ethereum Foundation, for all this available information.

Bugs

Before the merge, both Prysm and Nimbus had some issues handling the huge stream of deposits that were being processed. Prysm was generating blocks very late, causing a lot of missed head votes and low sync committee participation. Nimbus was producing invalid blocks. Neither issue was related to the merge, but both have been resolved. Despite those issues and having so many late blocks, the merge went fine.

Mainnet Shadow Forking 3

Mainnet Shadow Fork 3 was also successful. Participation pre-TTD was 99.8%, but Prysm/Besu validators would sometimes miss blocks. After The Merge, participation dropped slightly to 97.6%. Developers also purposefully made a few nodes fall out of sync leading up to The Merge to see how they would handle resyncing to the chain post-Merge.

Bugs

Prysm/Nethermind validators also had some issues with the transition, which were being investigated and/or fixed. Otherwise the run was almost bug-free: no late blocks were seen, and all clients were in sync.

Mainnet Shadow Forking 4

Mainnet Shadow Fork 4 was also successful. All participating clients went through the transition without a hitch. Only Erigon was missing, due to sync issues that were unrelated to the implementation of the Merge specs: the Erigon team was working on a new sync mode for their client software.

Bugs

A couple of client pairs were creating empty blocks because of a timing issue where CL clients weren't giving EL clients enough time for block production. Tim Beiko explained how blocks get created post-merge: when a validator is chosen to produce the next block, its CL sends a first API call to the EL to let it know about the latest head and have it start building a candidate block on top of that head. The CL must then make a second call to the EL for it to return its candidate block. If the first and second calls are too close to one another, the EL returns an empty block to the CL because it didn't have time to build one from the transactions in its transaction pool. Affected CL teams started working on a fix for this, i.e., basically increasing the delay between the two calls.
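The timing problem can be reproduced with a toy model. `ToyExecutionClient` below is entirely hypothetical: it pretends the EL needs a fixed amount of time to pull each transaction into its candidate block. Its two methods loosely stand in for the Engine API's "prepare payload" and "get payload" calls and are not real client code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyExecutionClient:
    """Toy EL that needs time to pull transactions into a candidate block."""

    tx_pool: List[str]
    ms_per_tx: int = 500  # hypothetical build cost per transaction
    _build_started_ms: Optional[int] = field(default=None, init=False)

    def prepare_payload(self, now_ms: int) -> None:
        """First call: start building a candidate on the latest head."""
        self._build_started_ms = now_ms

    def get_payload(self, now_ms: int) -> List[str]:
        """Second call: return whatever has been built so far."""
        if self._build_started_ms is None:
            raise RuntimeError("prepare_payload must be called first")
        elapsed_ms = now_ms - self._build_started_ms
        included = max(0, elapsed_ms // self.ms_per_tx)
        return self.tx_pool[:included]


def propose_block(el: ToyExecutionClient, lead_time_ms: int) -> List[str]:
    """Simulate a CL that waits lead_time_ms between the two calls."""
    el.prepare_payload(now_ms=0)
    return el.get_payload(now_ms=lead_time_ms)


pool = ["tx1", "tx2", "tx3"]
print(propose_block(ToyExecutionClient(pool), lead_time_ms=0))     # []
print(propose_block(ToyExecutionClient(pool), lead_time_ms=2000))  # ['tx1', 'tx2', 'tx3']
```

With zero lead time the toy EL hands back an empty block, which is the behaviour the affected client pairs showed. Giving it a couple of seconds lets it include the whole pool.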

Mainnet Shadow Forking 5

Mainnet Shadow Fork 5 was also successful. According to Danny Ryan, "it went exceedingly well." It was done with an equal distribution of clients, i.e., an equal client split on EL and CL.

Bugs

At first, it looked like it had not gone well at all, with no execution blocks being produced. But this later turned out to be a block explorer issue, and the actual chain was perfectly fine. There was a 97% participation rate. The missing 3% was due to an unhealthy shutdown on Besu, which happened before TTD.

Mainnet Shadow Forking 6

Mainnet Shadow Fork 6 was also successful!!

Bugs

A couple of execution clients struggled to produce blocks with transactions. The issue of having zero lead time between prepare payload and get payload still exists on the consensus side. Here are some issues which were seen in different client combinations:

  • Prysm-erigon-2 fell out of sync but prysm-erigon-1 is doing fine.
  • Nimbus-erigon-1 fell out of sync but nimbus-erigon-2 is doing fine.
  • LH-Erigon + Teku-Erigon seem to not work at all.
  • Besu and Erigon still propose blocks with 0 transactions.
  • Nimbus-nethermind proposes with 0 transactions.

Mainnet Shadow Forking 7

The main aim was to test fixes deployed on Ropsten and make sure that there were no regressions. Additionally, there was an equal client split for all validators.

Bugs

None of the issues reported were directly Merge-related. Erigon had issues with peers not having the state and dropping out of sync. This issue was related to how shadow forks are set up, and a fix has been deployed. Besu nodes had trouble with database corruption that was concurrency related, i.e., multiple threads updating the database at the same time. According to Justin Florentine, the Besu team believes that it was due to the non-threadsafe in-memory implementation of the Bonsai trie and how it is written to disk via the local RocksDB instance. As a result, consensus clients could no longer make any attestations, and network participation dropped. However, clients were still able to propose empty blocks. A deadlock issue also showed up a little later in a Nethermind–Teku pair, where both were waiting for each other.
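The general class of bug described for Besu, several threads updating a shared in-memory structure at once, is commonly avoided by serializing the updates behind a lock. The sketch below shows that generic pattern with a hypothetical `LockedStore`; it is not Besu's Bonsai code (Besu is written in Java) and not the fix the Besu team shipped.

```python
import threading


class LockedStore:
    """In-memory key/value counter whose updates are guarded by a lock."""

    def __init__(self) -> None:
        self._data = {}
        self._lock = threading.Lock()

    def increment(self, key: str) -> None:
        # The read-modify-write sequence runs as one unit under the lock,
        # so concurrent writers cannot overwrite each other's updates.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + 1

    def get(self, key: str) -> int:
        with self._lock:
            return self._data.get(key, 0)


def run_writers(store: LockedStore, threads: int = 4, updates: int = 10_000) -> int:
    """Start several writer threads and return the final counter value."""

    def worker() -> None:
        for _ in range(updates):
            store.increment("node")

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return store.get("node")


print(run_writers(LockedStore()))  # 40000: no update is lost
```

Without the lock, two threads can read the same old value and both write back the same new one, silently losing an update. In a state database, that kind of lost or interleaved write can show up as corruption.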

Mainnet Shadow Forking 8

The main aim of this shadow fork was to test a mix of Forest/Bonsai nodes for Besu and the use of static peers for Erigon.

Bugs

Due to syncing issues in Erigon, Erigon nodes were ~900k blocks behind the head and were therefore not attesting to or proposing blocks. Besu nodes, on the other hand, proposed blocks with 0 transactions.

Mainnet Shadow Forking 9

Mainnet Shadow Fork 9 was successful.

Bugs

Lighthouse nodes were falling out of sync and not catching up until the next epoch; a fix was later deployed. 4 of 5 Besu nodes hit an invalid block. Erigon nodes did not sync in time.

Mainnet Shadow Forking 10

Mainnet Shadow Fork 10 was also successful and there were no client incompatibilities during the transition.

Bugs

Some Besu nodes ran an older version and needed an update/resync. As a result, the participation rate dropped. Lodestar-Erigon had trouble fetching a block, but this was due to the shadow fork peering setup rather than a real issue.

Mainnet Shadow Forking 11

MSF 11 was successful!

Bugs

According to Tim Beiko, 34/35 nodes made it through The Merge without issues, with the remaining node simply not having synced. Erigon was producing bad blocks, which had happened on Goerli before. There was also an issue with Nethermind's block production when interacting with certain client pairs. Fixes for both are in progress, and devs will test them on future shadow forks once merged.

Mainnet Shadow Forking 12

According to Parithosh, MSF 12 was completed without any bugs, using all the versions recommended in the official Mainnet Merge Announcement. Some client teams released new updates after that, so MSF 13 would be done using the latest releases.

Bugs

Developers found 0 issues. There were no missed proposals, and the attestation rate didn't change between pre- and post-merge. Justin Florentine also described MSF 12 as perfection, i.e., a perfect Merge.

Mainnet Shadow Forking 13

Mainnet Shadow Fork 13 merged on 9th September 2022. In Ethereum's Consensus Layer Call #95, Parithosh said that this would be the last shadow fork before The Merge upgrade.

Bugs

Developers saw that the attestation rate had dropped to ~97%. This was due to stale data on one node that developers forgot to clear, which left the node thinking it was on the wrong shadow fork. No other client incompatibilities or issues were seen.

Also, 20% of the network was running mev-boost. An issue was reported by Ξnrico Del Fante and the bloXroute team: an older version of the builder was running by mistake for the shadow fork. This version wasn't properly sending bids to the relay, which led to the relay accepting signed blinded blocks but not providing the payload. Devs therefore saw missed proposals on the shadow fork. The builder has since been patched and the issue fixed. The bloXroute team has ensured that mainnet runs the correct version, so this should not happen on mainnet.
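The failure mode can be pictured with a toy model of the blinded-block flow: the proposer signs a block header without seeing the transactions, and the relay is expected to reveal the full payload afterwards. Everything below (`ToyRelay`, `propose`, the header and payload strings) is hypothetical and greatly simplified; it is not the mev-boost or relay API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyRelay:
    """Toy relay mapping a block header to the full payload behind it."""

    payloads: Dict[str, str] = field(default_factory=dict)

    def submit_block(self, header: str, payload: str) -> None:
        """A builder delivers a full block for a header it is bidding with."""
        self.payloads[header] = payload

    def reveal_payload(self, signed_header: str) -> Optional[str]:
        """Return the payload for a signed blinded block, if the relay has it."""
        return self.payloads.get(signed_header)


def propose(relay: ToyRelay, header: str) -> str:
    """The proposer signs the header blind and asks the relay for the payload."""
    payload = relay.reveal_payload(header)
    return "proposed" if payload is not None else "missed"


healthy = ToyRelay()
healthy.submit_block("header-1", "full block 1")
print(propose(healthy, "header-1"))  # proposed

broken = ToyRelay()  # the builder never delivered its block properly
print(propose(broken, "header-1"))   # missed
```

Because the proposer has already committed to the header, it cannot fall back to a locally built block for that slot, so a payload that never arrives turns into a missed proposal.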
