Why do Ethereum Clients prefer SSZ over RLP?

Serialization Formats Overview

After The Merge, Ethereum uses two serialization formats for its two underlying layers. Execution clients use RLP, whereas consensus clients use SSZ for storing and transmitting data. Having two differently structured formats adds overhead and complexity when building wallets and Ethereum light clients. One solution, proposed by Etan Kissling, a Nimbus developer, is to use SSZ in the execution layer (EL) as well. In this article, we give an overview of SSZ and RLP.

RLP

Recursive Length Prefix (RLP) standardizes the transfer of data between nodes in a space-efficient format. It is an encoding/decoding algorithm that lets Ethereum serialize data and reconstruct it quickly. Serialization is necessary so that complex data structures can be stored or transmitted in a single, well-defined format.

Encoding

The RLP encoding function takes an item. An item is either a string (i.e., a byte array) or a list of items. For example, each of the following is an item:

  • an empty string
  • a string containing the word "cat"
  • a list containing any number of strings
  • complex data structures like ["cat", ["puppy", "cow"], "horse", [[]], "pig", [""], "sheep"]

Points to Remember:

  1. The prefix 0x denotes a hexadecimal number.
  2. An ASCII table helps in understanding the relationship between decimal values, hexadecimal values, and characters.

RLP Encoding Rules:

  1. Input is an empty value (null, '', false), which is treated as the empty string:
    => RLP Encoding = [0x80]

  2. Input is an empty list ([]):
    => RLP Encoding = [0xc0]

  3. Input p is a single byte where p ∈ [0x00, 0x7f]
    => RLP Encoding = [p]

  4. Input s is a string that is 1 byte in length, where s ∈ [0x80, 0xff].
    => RLP Encoding = [0x81, s]

  5. Input s is a string that is 1–55 bytes long (and not a single byte covered by Rule 3).
    len = length of s
    first byte f = 0x80 + len | where f ∈ [0x81, 0xb7]
    => RLP Encoding = [f, ...s]

  6. Input s is a string > 55 bytes long.
    len = length of s
    b = number of bytes required to represent len (big-endian, no leading zeros)
    first byte f = 0xb7 + b | where f ∈ [0xb8, 0xbf]
    => RLP Encoding = [f, ...len, ...s]

  7. Input l is a list with a payload of 1–55 bytes.
    len = total length of the RLP encodings of the list items
    first byte f = 0xc0 + len | where f ∈ [0xc1, 0xf7]
    concat = concatenation of RLP encodings of list items
    => RLP Encoding = [f, ...concat]

  8. Input l is a list that has a payload > 55 bytes:
    len = total length of the RLP encodings of the list items
    b = number of bytes required to represent len (big-endian, no leading zeros)
    first byte f = 0xf7 + b | where f ∈ [0xf8, 0xff]
    concat = concatenation of RLP encodings of list items
    => RLP Encoding = [f, ...len, ...concat]

Examples:

  1. Let s = [0x64, 0x6f, 0x67]
    s is the hex byte array of the string “dog”.
    Readers who need clarification on where these hex values come from can consult the ASCII table mentioned in Points to Remember. For example, the hex value of lowercase d is 0x64.
    len = 3 (3 elements in the byte array)
    Rule No. 5 will be used here.
    f = 0x80 + 3 = 0x83
    => RLP Encoding = [f, ...s]= [0x83, 0x64, 0x6f, 0x67]

  2. Let p = '' (empty string ε)
    Rule No. 1 will be used here.
    RLP Encoding = [0x80]

Working:

  • Step 1: The algorithm receives the input.
  • Step 2: It checks the input with the rules described above.
  • Step 3: The appropriate encoding process is applied based on the conditions satisfied.
  • Step 4: We get RLP encoded output.
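
The rules and steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `encode_length` and `rlp_encode` are made up for this example, and the sketch assumes the input has already been reduced to byte strings and (nested) lists of byte strings.

```python
def encode_length(length: int, offset: int) -> bytes:
    """Build the RLP prefix for a payload of `length` bytes.

    `offset` is 0x80 for strings and 0xc0 for lists.
    """
    if length <= 55:
        # Short form: the length is folded into the first byte.
        return bytes([offset + length])
    # Long form: the first byte says how many bytes hold the length,
    # and the length itself follows in big-endian order.
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode a byte string or a (nested) list of byte strings."""
    if isinstance(item, (bytes, bytearray)):
        data = bytes(item)
        # A single byte below 0x80 is its own encoding (Rule 3).
        if len(data) == 1 and data[0] <= 0x7F:
            return data
        return encode_length(len(data), 0x80) + data
    if isinstance(item, list):
        # Encode every child first, then prefix the concatenation.
        payload = b"".join(rlp_encode(child) for child in item)
        return encode_length(len(payload), 0xC0) + payload
    raise TypeError("RLP items must be bytes or lists of items")


print(rlp_encode(b"dog").hex())  # 83646f67
```

Both strings and lists share one prefix helper because the short and long forms differ only in the base offset.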

Decoding

The RLP decoding process works as follows:

  • Step 1: Based on the first byte of the input, the decoder determines the data type, the length of the actual data, and the offset at which the data starts.
  • Step 2: Based on the data's type and offset, it decodes the data accordingly.
  • Step 3: It then continues to decode the rest of the input data if possible.

RLP Decoding Rules:

  • Step 1: Look at the first byte. It falls into one of the following ranges:
    a) [0x00 .. 0x7f] : String; the byte is the data itself and is decoded as it is
    b) [0x80 .. 0xb7] : String, and it's a short string
    c) [0xb8 .. 0xbf] : String, and it's a long string
    d) [0xc0 .. 0xf7] : List, and it's a short list
    e) [0xf8 .. 0xff] : List, and it's a long list

  • Step 2: Get the length of the data:
    For a short string or short list: length = first byte - lower bound of its range (0x80 or 0xc0).
    For a long string or long list: first byte - 0xb7 (or 0xf7) gives the number of following bytes that hold the length, in big-endian order.

  • Step 3: Repeat Steps 1 and 2 until the end of the byte array.

Example:

Let's take a string "dog" encoded into RLP as:
"dog" = [0x83, 0x64, 0x6f, 0x67]
Input = [0x83, 0x64, 0x6f, 0x67]
1st byte = 0x83
It falls in [0x80 .. 0xb7], then the data type is a string, and it's a short string.
length = 0x83 - 0x80 = 3
Data is of type string, and its length is 3
With these facts, we can quickly parse until the end of the string, i.e., 0x64, 0x6f, 0x67.
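
The decoding rules can be sketched in Python in the same spirit. The names `rlp_decode` and `_decode_at` are hypothetical, and the sketch deliberately skips the validation a real decoder needs (truncated input, non-canonical lengths), so it should be read as an illustration of the byte ranges only.

```python
def _decode_at(data: bytes, pos: int):
    """Decode one item starting at `pos`; return (item, next position)."""
    first = data[pos]
    if first <= 0x7F:
        # Single byte: it is its own payload.
        return data[pos:pos + 1], pos + 1
    if first <= 0xB7:
        # Short string: length is folded into the first byte.
        start = pos + 1
        length = first - 0x80
        return data[start:start + length], start + length
    if first <= 0xBF:
        # Long string: the next (first - 0xb7) bytes hold the length.
        start = pos + 1 + (first - 0xB7)
        length = int.from_bytes(data[pos + 1:start], "big")
        return data[start:start + length], start + length
    if first <= 0xF7:
        # Short list: payload length is folded into the first byte.
        start = pos + 1
        length = first - 0xC0
    else:
        # Long list: the next (first - 0xf7) bytes hold the payload length.
        start = pos + 1 + (first - 0xF7)
        length = int.from_bytes(data[pos + 1:start], "big")
    end = start + length
    items = []
    cursor = start
    while cursor < end:
        # Decode the children one after another inside the payload.
        child, cursor = _decode_at(data, cursor)
        items.append(child)
    return items, end


def rlp_decode(data: bytes):
    """Decode a complete RLP byte string into bytes or nested lists."""
    item, end = _decode_at(data, 0)
    if end != len(data):
        raise ValueError("unexpected trailing bytes")
    return item


print(rlp_decode(bytes([0x83, 0x64, 0x6F, 0x67])))  # b'dog'
```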

SSZ

Simple Serialize (SSZ) is the serialization method used on the Beacon Chain. It replaces the RLP used on the execution layer everywhere across the consensus layer except the peer discovery protocol.

Why was SSZ selected over RLP?

SSZ is designed to be deterministic and to Merkleize efficiently. It is not self-describing; instead, it relies on a schema that must be known in advance.

When we serialize an object of a certain type and then deserialize the result, we end up with an object identical to the one we started with. This is essential for the communications protocol.

If serializing two objects of the same type gives the same result, the two objects are identical. Conversely, two different objects of the same type always have different serializations. This is essential for the consensus protocol.

The main goal of SSZ is to be able to represent complex internal data structures such as the BeaconState as strings of bytes.

Timeline

Serialization & Deserialization

Serialization is the process of taking structured information and transforming it into a representation that can be stored or transmitted.

Serialization is used for consensus, for peer-to-peer communication, and by users accessing a beacon node API. In addition, data must be serialized before being written to disk.

SSZ's Basic Types:

  1. Unsigned integers: a uintN is an N-bit unsigned integer, where N can be 8, 16, 32, 64, 128 or 256.
  2. Booleans: True or False.

uintN types are encoded as the little-endian representation in N/8 bytes.

On little-endian machines, the least significant byte of a multibyte value is stored first. On big-endian machines, the most significant byte is stored first.

Serialization Examples:

  1. The decimal number 12345, i.e., 0x3039 in hexadecimal:
    • As a uint16 type, it is serialized as 0x3930 (2 bytes).
    • As a uint32 type, it is serialized as 0x39300000 (4 bytes).
  2. Boolean types are always one byte and serialized as 0x01 for true and 0x00 for false.
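
The basic-type examples above can be reproduced with a few lines of Python. This is a small sketch; `serialize_uint` and `serialize_bool` are names chosen for illustration and are not taken from any SSZ library.

```python
def serialize_uint(value: int, bits: int) -> bytes:
    """Serialize an unsigned integer as little-endian bytes of width bits/8."""
    if bits not in (8, 16, 32, 64, 128, 256):
        raise ValueError("unsupported uintN width")
    # int.to_bytes raises OverflowError if the value does not fit.
    return value.to_bytes(bits // 8, "little")


def serialize_bool(value: bool) -> bytes:
    """Serialize a boolean as a single byte."""
    return b"\x01" if value else b"\x00"


print(serialize_uint(12345, 16).hex())  # 3930
print(serialize_uint(12345, 32).hex())  # 39300000
print(serialize_bool(True).hex())       # 01
```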

SSZ's Composite Types:

  1. Vectors: An ordered fixed-length homogeneous collection with exactly N values. In the SSZ spec, a vector is denoted by Vector[type, N]. For example, Vector[uint8, 32] is a 32-element vector of uint8 values.

  2. Lists: An ordered variable-length homogeneous collection with maximum N values. In the SSZ spec, a list is denoted by List[type, N]. For example, List[uint64, 100] contains anywhere between zero and one hundred uint64 types.

  3. Bitvectors: An ordered fixed-length collection of boolean values with N bits. In the SSZ spec, a bitvector is denoted by Bitvector[N].

  4. Bitlists: An ordered variable-length collection of boolean values with a maximum of N bits. In the SSZ spec, a bitlist is denoted by Bitlist[N].

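  5. Containers: An ordered heterogeneous collection of values. In the SSZ spec, a container is defined as a class with named, typed fields.

  6. Unions: A union type containing one of the given subtypes. In the SSZ spec, a union is denoted by Union[type_0, type_1, ...].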

SSZ distinguishes between fixed and variable size types and treats them differently when they are contained within other types.

We recursively define the serialize function, which consumes an object value of the type specified and returns a byte string of type bytes.

Deserializing an object requires a schema.

Importance of Schema:

  • It defines the precise layout of the serialized data so that each specific element can be deserialized from a blob of bytes into some meaningful object with the elements having the right type, value, size, and position.

  • It tells the deserializer which values are actual values and which ones are offsets.
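
To make the role of the schema and of offsets concrete, here is a simplified sketch of how a container with both fixed-size and variable-size fields can be laid out. The schema representation (a list of `(name, kind)` pairs, with `"bytes"` standing in for a variable-length list of uint8) and the function names are assumptions made for this example only; real SSZ libraries model types differently.

```python
UINT_SIZES = {"uint8": 1, "uint16": 2, "uint32": 4, "uint64": 8}
BYTES_PER_OFFSET = 4  # offsets are 4-byte little-endian integers


def serialize_container(schema, values) -> bytes:
    """Fixed-size fields go inline; variable-size fields leave an offset."""
    fixed_len = sum(
        BYTES_PER_OFFSET if kind == "bytes" else UINT_SIZES[kind]
        for _, kind in schema
    )
    fixed_part, variable_part = b"", b""
    for name, kind in schema:
        if kind == "bytes":
            # Offset points to where this field's data starts.
            offset = fixed_len + len(variable_part)
            fixed_part += offset.to_bytes(BYTES_PER_OFFSET, "little")
            variable_part += bytes(values[name])
        else:
            fixed_part += values[name].to_bytes(UINT_SIZES[kind], "little")
    return fixed_part + variable_part


def deserialize_container(schema, data: bytes) -> dict:
    """Use the schema to tell actual values apart from offsets."""
    result, offsets, pos = {}, [], 0
    for name, kind in schema:
        size = BYTES_PER_OFFSET if kind == "bytes" else UINT_SIZES[kind]
        number = int.from_bytes(data[pos:pos + size], "little")
        if kind == "bytes":
            offsets.append((name, number))  # an offset, not a value
        else:
            result[name] = number
        pos += size
    # Each variable-size field runs up to the next offset (or the end).
    ends = [start for _, start in offsets[1:]] + [len(data)]
    for (name, start), end in zip(offsets, ends):
        result[name] = data[start:end]
    return result


schema = [("slot", "uint64"), ("graffiti", "bytes"), ("index", "uint16")]
blob = serialize_container(schema, {"slot": 5, "graffiti": b"hi", "index": 12345})
print(blob.hex())  # 05000000000000000e00000039306869
```

Without the schema, the four bytes `0e000000` could not be told apart from an ordinary uint32 value, which is why SSZ is described as not self-describing.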

Readers can follow the examples on ethereum.org and in ethereum/consensus-specs to learn more.

SSZ Vs RLP

| Criteria | Compact | Expressiveness | Hashing  | Indexing |
|----------|---------|----------------|----------|----------|
| RLP      | Yes     | Flexible       | Possible | No       |
| SSZ      | No      | Yes            | Yes      | Poor     |

These results are based on the findings by Piper Merriam.

Ethereum needs serialization in networking, for transporting objects between clients across a network, and in consensus, for manipulating objects within the protocol's logic.

Compactness tells us how space-efficient the serialized byte representation of the data is.

RLP is space-efficient when the data structure has relatively few elements. SSZ is less compact because of the 4-byte length prefixes it uses for dynamically sized data structures and the length prefixes it uses for containers, which are not strictly needed.

Expressiveness tells us whether the serialization supports the data types we use.

RLP only supports dynamic length byte strings and dynamic sized lists of dynamic length byte strings. All of the additional data types are supported via additional abstraction layers provided by the various RLP libraries. SSZ supports all of the needed data types.

Hashing tells whether data structures can be efficiently hashed and re-hashed after minor modifications.

SSZ allows objects to be re-hashed efficiently after minor modifications. RLP does not offer comparable performance gains.
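
Efficient re-hashing comes from arranging the data as a Merkle tree of 32-byte chunks, so that changing one chunk only requires re-hashing the nodes on its path to the root. The sketch below illustrates that idea with SHA-256. It is a simplification: the function names are invented for this example, the chunk count is assumed to be a power of two, and details of SSZ's actual hash tree root (such as zero-padding and mixing in list lengths) are left out.

```python
import hashlib


def hash_pair(left: bytes, right: bytes) -> bytes:
    """Hash two 32-byte nodes into their parent node."""
    return hashlib.sha256(left + right).digest()


def build_tree(chunks):
    """Return all tree layers, leaves first and the root layer last."""
    count = len(chunks)
    if count == 0 or count & (count - 1):
        raise ValueError("this sketch needs a power-of-two chunk count")
    layers = [list(chunks)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append(
            [hash_pair(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)]
        )
    return layers


def update_leaf(layers, index: int, new_chunk: bytes) -> int:
    """Replace one leaf, re-hash only its path, return the hash count."""
    layers[0][index] = new_chunk
    hashes = 0
    for depth in range(1, len(layers)):
        index //= 2  # position of the parent in the layer above
        below = layers[depth - 1]
        layers[depth][index] = hash_pair(below[2 * index], below[2 * index + 1])
        hashes += 1
    return hashes


chunks = [i.to_bytes(32, "little") for i in range(8)]
layers = build_tree(chunks)
print(update_leaf(layers, 5, (99).to_bytes(32, "little")))  # 3
```

With 8 chunks, a full rebuild takes 7 hashes while the single-leaf update takes 3; the gap grows with the size of the object.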

Indexing is the act of accessing the inner values of a data structure without fully deserializing it.

SSZ also allows fast indexing. RLP does not: reaching an inner value can require scanning the items that precede it, which leads to O(N) complexity.
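
As a small illustration of indexing without full deserialization, consider a serialized collection of uint64 values. Because every element has a fixed width of 8 bytes, the position of element i can be computed directly. The helper name `ssz_uint64_at` is hypothetical, and the sketch only covers this simple fixed-size case.

```python
def ssz_uint64_at(serialized: bytes, index: int) -> int:
    """Read element `index` of a serialized sequence of uint64 values."""
    start = index * 8  # every uint64 occupies exactly 8 bytes
    if index < 0 or start + 8 > len(serialized):
        raise IndexError("index out of range")
    return int.from_bytes(serialized[start:start + 8], "little")


# Three uint64 values serialized back to back, little-endian.
data = b"".join(v.to_bytes(8, "little") for v in [10, 20, 30])
print(ssz_uint64_at(data, 2))  # 30
```

In RLP, by contrast, the items have no fixed width, so a reader has to walk the length prefixes of the preceding items to find where a given item starts.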

Implementations

Here is the list of active SSZ implementations, i.e., those maintained by client teams and other members of the Ethereum community.

| Language   | Project   | Implementation                    |
|------------|-----------|-----------------------------------|
| C++        | Mammon    | potuz/mammon                      |
| Dafny      | Eth2 spec | ConsenSys/eth2.0-dafny            |
| Go         | ZRNT      | protolambda/ztyp                  |
| Go         | Prysm     | ferranbt/fastssz                  |
| Java       | Teku      | PegaSysEng/teku/ssz               |
| Nim        | Nimbus    | status-im/nim-beacon-chain/ssz.nim |
| Python     | Trinity   | ethereum/py-ssz                   |
| Python     | Eth2.py   | protolambda/remerkleable          |
| Rust       | -         | ralexstokes/ssz_rs                |
| Rust       | Lighthouse | consensus/ssz                    |
| Typescript | Lodestar  | ChainSafe/lodestar/ssz            |
| Zig        | -         | gballet/ssz.zig                   |
| C#         | -         | hexafluoride/SszSharp             |

Conclusion

While Ethereum client developers seem to agree on switching execution clients to SSZ, there was also a general consensus to work on the implementation in the Cancun-Deneb upgrade, which is expected on mainnet sometime later in 2023. At present, testing of the Shanghai-Capella upgrade is the priority, and execution clients will continue to use RLP there.

Resources: ethereum.org, eth2book.info, notes.ethereum.org, medium/@markodayansa, medium/@derao & consensus-specs/issues/2138
