Towards a Comprehensive ERC Test Suite

Runtime Verification
Oct 30

Developing ERC tokens is particularly challenging because of the inherent immutability of deployed contracts, the intricacy of their behavior, and the adversarial nature of the environment. Ensuring that a token is bug-free and fulfills its purpose is therefore a long and difficult process. Common practices include manual code review (which is time-consuming and costly) and unit testing (which covers only a small fraction of a token contract's behavior). One of the most versatile techniques for getting almost immediate feedback on contract code is property testing. However, writing a test suite that catches all possible bugs (a so-called complete test suite) requires considerable effort, and the process is as hard and error-prone as writing the token contract itself.
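To make "property testing" concrete, here is a minimal sketch in Python rather than Solidity. It assumes a hypothetical dict-based model of an ERC-20 `transfer`; the names `transfer`, `prop_transfer_moves_exact_amount`, and `check_property` are illustrative and are not part of ERCx. Instead of checking one hand-picked scenario, the property states what must hold for every input, and the harness evaluates it on many randomly generated inputs.

```python
import random


def transfer(balances, sender, recipient, amount):
    """Hypothetical dict-based model of an ERC-20 transfer (not Solidity).

    ERC-20 says transfer SHOULD throw on insufficient balance; this model
    returns False and leaves the ledger untouched instead.
    """
    if amount < 0 or balances.get(sender, 0) < amount:
        return False
    balances[sender] = balances.get(sender, 0) - amount
    balances[recipient] = balances.get(recipient, 0) + amount
    return True


def prop_transfer_moves_exact_amount(sender_balance, recipient_balance, amount):
    """Property: between two distinct accounts, transfer succeeds iff the sender
    has enough funds; on success it moves exactly `amount`, otherwise nothing changes."""
    ledger = {"alice": sender_balance, "bob": recipient_balance}
    ok = transfer(ledger, "alice", "bob", amount)
    if amount <= sender_balance:
        return ok and ledger == {"alice": sender_balance - amount,
                                 "bob": recipient_balance + amount}
    return not ok and ledger == {"alice": sender_balance, "bob": recipient_balance}


def check_property(prop, trials=1000, seed=0):
    """Evaluates `prop` on randomly generated (balance, balance, amount) inputs.

    Returns the first counterexample found, or None if the property held in every trial.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        args = (rng.randint(0, 100), rng.randint(0, 100), rng.randint(0, 150))
        if not prop(*args):
            return args
    return None
```

Calling `check_property(prop_transfer_moves_exact_amount)` should return `None` for this model; a token whose `transfer` violated the property would instead yield a concrete counterexample to investigate.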

At Runtime Verification, we wanted to address this problem, so we embarked on a mission to provide the community with handcrafted, comprehensive, and complete test suites for the most widely used ERC standards. Following the path that most developers moving from web2 to web3 are likely to take (the OpenZeppelin ERC templates), we committed to writing hundreds of tests covering the most common standards: ERC-20, ERC-721, ERC-777, ERC-1155, and ERC-4626.

  • Ensuring quality. We spent hours reading the EIP (Ethereum Improvement Proposal) standard descriptions to cover each function with the required tests. Not only did we review and proofread our test suites, we also tested them (yes, we test our test suites) to ensure that they can indeed find all possible bugs a programmer could introduce. One of the methods we used was mutation testing, where specific parts of the source code are modified to check that the test suite detects the changes. This ensures consistency and covers cases that a human reviewer might miss.
  • Ensuring accessibility. Beyond technical accuracy and quality, we wanted our test suites to be usable by as many people as possible, from token investors (possibly with no Solidity background) to experienced Solidity developers and auditors. Hence, our test suites are available through several interfaces:
    • a website where one can input an address or Solidity code and get detailed evaluation results within a minute;
    • a VSCode plugin that lets users run our test suite as they would unit tests, triggering tests with the click of a button;
    • an open Application Programming Interface (API) that allows developers to interact with our test suite from any programming language and to evaluate tokens in batches.
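The mutation testing mentioned under "Ensuring quality" can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not how ERCx mutates Solidity: the code under test is a hypothetical Python model of `transfer`, and the three operator swaps in `MUTATIONS` are examples chosen for this sketch. Each mutant is produced by rewriting the program's syntax tree with the standard `ast` module; a test suite "kills" a mutant when at least one of its checks fails on it.

```python
import ast

# Hypothetical code under test: a dict-based model of an ERC-20 transfer.
SOURCE = """
def transfer(balances, sender, recipient, amount):
    if balances.get(sender, 0) < amount:
        return False
    balances[sender] = balances[sender] - amount
    balances[recipient] = balances.get(recipient, 0) + amount
    return True
"""

# Illustrative mutations: each swaps the first occurrence of one operator for another.
MUTATIONS = [
    ("'<' -> '<='", ast.Lt, ast.LtE),  # off-by-one in the balance check
    ("'-' -> '+'", ast.Sub, ast.Add),  # sender is credited instead of debited
    ("'+' -> '-'", ast.Add, ast.Sub),  # recipient is debited instead of credited
]


class SwapFirstOperator(ast.NodeTransformer):
    """Replaces the first comparison or arithmetic operator of type `old` with `new`."""

    def __init__(self, old, new):
        self.old, self.new, self.done = old, new, False

    def _swap(self, op):
        if not self.done and isinstance(op, self.old):
            self.done = True
            return self.new()
        return op

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [self._swap(op) for op in node.ops]
        return node

    def visit_BinOp(self, node):
        self.generic_visit(node)
        node.op = self._swap(node.op)
        return node


def load_transfer(tree):
    """Compiles a (possibly mutated) module AST and returns its `transfer` function."""
    namespace = {}
    exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), namespace)
    return namespace["transfer"]


def strong_suite(transfer):
    """Hypothetical unit tests: a full-balance transfer and a rejected overdraft."""
    b = {"alice": 10, "bob": 0}
    if not transfer(b, "alice", "bob", 10) or b != {"alice": 0, "bob": 10}:
        return False
    b = {"alice": 5, "bob": 0}
    return not transfer(b, "alice", "bob", 6) and b == {"alice": 5, "bob": 0}


def mutation_score(suite):
    """Returns the fraction of mutants the suite kills and the labels of survivors."""
    assert suite(load_transfer(ast.parse(SOURCE))), "suite must pass on the original"
    survivors = []
    for label, old, new in MUTATIONS:
        mutant = load_transfer(SwapFirstOperator(old, new).visit(ast.parse(SOURCE)))
        if suite(mutant):
            survivors.append(label)
    return 1 - len(survivors) / len(MUTATIONS), survivors
```

`mutation_score(strong_suite)` kills all three mutants, whereas a suite that only checks the overdraft case would let every mutant survive: each surviving mutant points at behavior the suite never exercises.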

This blog post is an intermediate report to the community, showing where we are and announcing where we are going. So far, we have released test suites for two of arguably the most common standards: ERC-20 (fungible tokens) and ERC-4626 (tokenized vaults). In the following, we present tables comparing the ERCx offering with the tools and test suites available on the market.

Both tables compare the tools along the following dimensions, grouped into categories.
First, the tables report on fuzzing capability: only tools relying on the Forge framework allow fuzzing. This is one of the most important criteria, since it determines how deeply the contract under test is evaluated.
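To illustrate why fuzzing determines how deeply a contract is evaluated, the following Python sketch contrasts a single fixed-input unit test with a fuzzed test that checks the same kind of behavior over many random initial configurations and inputs. The buggy `buggy_transfer` is a hypothetical example of a self-transfer bug, and `unit_test`, `fuzz_test`, and the run count are illustrative choices; this loosely mirrors, but is not, what Forge's fuzzer does for Solidity test functions that take parameters.

```python
import random


def buggy_transfer(balances, sender, recipient, amount):
    """Hypothetical buggy transfer: both balances are read before either is
    written, so a self-transfer (sender == recipient) mints `amount` tokens."""
    from_balance = balances.get(sender, 0)
    to_balance = balances.get(recipient, 0)
    if from_balance < amount:
        return False
    balances[sender] = from_balance - amount
    balances[recipient] = to_balance + amount
    return True


def unit_test(transfer):
    """A single fixed-input check: alice sends 10 of her 100 tokens to bob."""
    balances = {"alice": 100, "bob": 0}
    return transfer(balances, "alice", "bob", 10) and balances == {"alice": 90, "bob": 10}


def fuzz_test(transfer, runs=256, seed=1):
    """Checks that total supply is conserved over `runs` random initial
    configurations and inputs; returns the first failing case, or None."""
    rng = random.Random(seed)
    accounts = ["alice", "bob", "carol"]
    for _ in range(runs):
        initial = {a: rng.randint(0, 1000) for a in accounts}
        sender, recipient = rng.choice(accounts), rng.choice(accounts)
        amount = rng.randint(0, 1200)
        balances = dict(initial)
        transfer(balances, sender, recipient, amount)
        if sum(balances.values()) != sum(initial.values()):
            return initial, sender, recipient, amount
    return None
```

The unit test passes on `buggy_transfer`, while the fuzzed test reports a failing case in which the sender and recipient are the same account: the bug is only reachable from inputs the fixed test never tries.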

Then, the tables report on the forms of supported contracts:

The third category focuses on the interfaces:

  • Console: ability to use a terminal for running the test suite.
  • Website: ability to use a website UI for running the test suite.
  • API: availability of an API for interacting with the test suite.
  • VSC plugin: availability of a plugin to run the tool in VSCode.

The next category is User-friendliness, essentially reflecting how well the technical aspects of the underlying tool are hidden from the user.

  • No installation required indicates whether the user must perform some (command-line) installation before using the tool, or whether it works out of the box.
  • Usability evaluates ease of use; while this metric is subjective, we measure it by the time spent setting up and running the test suite.
  • Feedback on results indicates whether the tool provides any form of feedback that can serve as a hint to help the user fix the reported problems.
  • Test runtime, as the name suggests, indicates how long the user has to wait to receive results.

The last category, Coverage, indicates how deeply the tool investigates the contract to find bugs.

  • Number of tests, as the name suggests, indicates the number of test cases in the test suite. The precise number matters less than it seems, since different tests can exercise very different amounts of behavior, but its order of magnitude still gives a sense of the precision and thoroughness of the test suite.
  • Fuzzing per test indicates how thoroughly a single test is executed, for instance, the number of input values it provides to the tested contract or the number of initial configurations of the tested contract it considers.
  • Functions from standards indicates how many of the functions described in the corresponding EIP standard are exercised by the test suite.
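The "Functions from standards" criterion can be read as a simple coverage ratio. The sketch below is our illustrative interpretation, not the exact metric used in the tables: it lists the functions specified by EIP-20 and computes which of them a suite exercises, given a hypothetical mapping from test names to the functions each test calls.

```python
# Functions specified by EIP-20; name, symbol and decimals are marked OPTIONAL there.
ERC20_FUNCTIONS = frozenset({
    "name", "symbol", "decimals", "totalSupply", "balanceOf",
    "transfer", "transferFrom", "approve", "allowance",
})


def function_coverage(tests):
    """`tests` maps each (hypothetical) test name to the standard functions it calls.

    Returns the fraction of ERC-20 functions exercised by at least one test,
    and the sorted list of functions that no test exercises.
    """
    exercised = set().union(*tests.values()) & ERC20_FUNCTIONS
    missing = sorted(ERC20_FUNCTIONS - exercised)
    return len(exercised) / len(ERC20_FUNCTIONS), missing


# Hypothetical three-test suite used only to illustrate the metric.
suite = {
    "test_transfer_moves_balance": {"transfer", "balanceOf"},
    "test_approve_sets_allowance": {"approve", "allowance"},
    "test_transfer_keeps_supply": {"transfer", "totalSupply"},
}
ratio, missing = function_coverage(suite)
# ratio == 5 / 9, missing == ['decimals', 'name', 'symbol', 'transferFrom']
```

A ratio below 1 flags functions of the standard, such as `transferFrom` here, that the suite never calls and therefore cannot check.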

The first table focuses on ERC-20, the most common standard, for which several offerings are available, such as those from OpenZeppelin and Slither.

The second table focuses on ERC-4626, and we compare it with the test suite from a16z crypto.

Stay tuned as we are preparing the release of two new test suites soon!

Originally published at https://runtimeverification.com.

Runtime Verification

Runtime Verification Inc. is a technology startup providing cutting-edge formal verification tools and services for aerospace, automotive, and the blockchain.