Testing ERC-20 Tokens Part 1: An Arsenal for Bug Detection

Towards a Benchmark for ERC-20 Test Suites

Oct 18, 2023

Abstract

Runtime Verification and Certora partnered to analyze a set of tools for detecting bugs in ERC-20 token contracts. This series of two blog posts presents that analysis. In the first part, we introduce the existing tools and compare them using high-level usability criteria. In the second part, we get more technical and focus on the bug-detection capabilities of these tools, introducing the notion of bug-detection rate. Together with the second part, we will offer the community an extensible benchmark for evaluating and comparing tools.

Introduction

The ERC-20 standard is a set of rules and guidelines for creating smart contracts that govern the behavior of tokens on the Ethereum blockchain. Introduced in 2015, it has become the most widely adopted standard for creating and managing digital assets on the Ethereum platform. ERC-20 tokens are self-executing contracts that represent fungible assets, such as cryptocurrencies or utility tokens. The standard defines a specific set of functions and events that a token contract must implement to ensure compatibility and interoperability with other tokens and decentralized exchanges.

While the ERC-20 standard has contributed to the growth of the Ethereum ecosystem, it has also presented security challenges. The decentralized nature of Ethereum and the complexity of smart contracts make them attractive targets for hackers and malicious actors. Vulnerabilities like reentrancy attacks and integer overflow have resulted in significant financial losses in the past. Testing plays a crucial role when it comes to mitigating these risks. By thoroughly testing smart contracts, developers can identify and fix potential bugs and vulnerabilities, safeguarding the security and reliability of their tokens.

The Ethereum community has come up with testing frameworks like Forge, Hardhat, and Truffle. Using these tools, developers can write tests for smart contracts, either in JavaScript or in Solidity. However, a major challenge is to determine whether a given test suite is good enough. This raises the question of how to measure the quality of a test suite, and there is currently no reliable method or tool to do so in the DeFi community. Moreover, while the community has come up with several test suites, such as the ones from Slither and OpenZeppelin, developers are again left on their own when evaluating the quality of such test suites.

In the rest of this blog post, we compare some of the best-known publicly available tools for testing and evaluating ERC-20 tokens. As testing plays a critical role in ensuring the security and reliability of smart contracts, it is imperative for developers, especially beginners, to have a good understanding of the available testing tools and their features. By exploring and comparing these tools, we aim to provide valuable insights into their functionalities, ease of use, and effectiveness in detecting potential bugs and vulnerabilities in ERC-20 tokens. Whether you are a developer looking to enhance the quality of your smart contracts or an enthusiast interested in the technical aspects of blockchain, this comparison will equip you with the knowledge needed to make informed decisions while testing ERC-20 tokens. Let us dive into the world of ERC-20 testing tools and uncover their strengths and weaknesses to empower you on your journey to building secure and efficient smart contracts.

Existing Tools and their Test Suites for ERC-20

We introduce some background about testing and then present the existing test suites for testing ERC-20 tokens.

Background: Parametric/Fuzz vs. Unit Testing

A test exercises the behavior of a contract by providing it with inputs and observing its outputs. As such, a test examines how the contract responds to various scenarios and checks if it meets certain predefined criteria or requirements (for instance, that some of the values of variables respect some constraint).

A parametric test is a test with one or several parameters (e.g., an amount or an address); the exact behavior of the test is obtained by providing values for these parameters. Parametric tests are often used to validate the overall performance, reliability, and robustness of a system.

While a non-parametric test exercises only the single fixed behavior defined in it, a parametric test, in theory, exercises as many behaviors as there are values that its parameters can take.
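To make the distinction concrete, here is a minimal sketch in Python. `ToyToken` is a hypothetical in-memory stand-in for an ERC-20 contract, not a real contract or a library API, and the two test functions are likewise illustrative names. The first test fixes every input, while the second takes the amount as a parameter and can be run over many values.

```python
class ToyToken:
    """Hypothetical in-memory model of an ERC-20 token (illustration only)."""

    def __init__(self, owner: str, supply: int) -> None:
        self.balances = {owner: supply}

    def balance_of(self, account: str) -> int:
        return self.balances.get(account, 0)

    def transfer(self, sender: str, recipient: str, amount: int) -> bool:
        # Refuse negative amounts and amounts above the sender's balance.
        if amount < 0 or amount > self.balance_of(sender):
            return False
        self.balances[sender] = self.balance_of(sender) - amount
        self.balances[recipient] = self.balance_of(recipient) + amount
        return True


def test_transfer_fixed() -> None:
    """Non-parametric test: one fixed scenario."""
    token = ToyToken("alice", 1000)
    assert token.transfer("alice", "bob", 100)
    assert token.balance_of("alice") == 900
    assert token.balance_of("bob") == 100


def check_transfer(amount: int) -> None:
    """Parametric test: the same property, checked for any amount."""
    token = ToyToken("alice", 1000)
    ok = token.transfer("alice", "bob", amount)
    if 0 <= amount <= 1000:
        assert ok
        assert token.balance_of("bob") == amount
    else:
        assert not ok
        assert token.balance_of("bob") == 0
    # The total supply must be preserved whatever the amount.
    assert token.balance_of("alice") + token.balance_of("bob") == 1000


test_transfer_fixed()
for amount in (0, 1, 999, 1000, 1001):
    check_transfer(amount)
```

The fixed test can only ever catch a bug that shows up for the amount 100, whereas the parametric version also probes the boundaries around the sender's balance.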

Fuzzing or fuzz testing refers to techniques used for selecting or generating random values for the parameters of parametric tests. An example strategy for obtaining values for parameters consists of combining random values with values obtained from the constants identified in the contract source or bytecode.
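The example strategy described above can be sketched as follows. This is an illustration under stated assumptions, not the input generator of any particular fuzzer: the function name `fuzz_amounts`, the choice to add the neighbors of each constant, and the seed parameter are all hypothetical.

```python
import random

UINT256_MAX = 2**256 - 1  # largest value of a Solidity uint256


def fuzz_amounts(contract_constants: list[int], random_runs: int, seed: int = 0) -> list[int]:
    """Return uint256 candidates: boundary values and constants first, then random values."""
    rng = random.Random(seed)  # seeded so that a failing input can be reproduced
    # Boundaries of the type, plus each constant and its immediate neighbors.
    interesting = {0, 1, UINT256_MAX}
    for constant in contract_constants:
        interesting.update({constant - 1, constant, constant + 1})
    # Drop anything that fell outside the uint256 range (e.g. 0 - 1).
    in_range = sorted(v for v in interesting if 0 <= v <= UINT256_MAX)
    return in_range + [rng.randint(0, UINT256_MAX) for _ in range(random_runs)]


# Hypothetical constant found in a contract, e.g. a supply cap.
amounts = fuzz_amounts([1_000_000], random_runs=5)
assert 1_000_000 in amounts and 1_000_001 in amounts
```

Purely random 256-bit values would almost never hit a threshold such as a supply cap, which is why seeding the inputs with constants from the contract can matter.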

A unit test is a more specific type of test that checks individual components or units of a program in isolation. It aims to ensure that each unit of code, such as a function or a method, behaves correctly and produces the expected output when given a specific input. Unit tests are different from system tests, which exercise several units or features of the contract.

Finally, we note that the terminology is often used loosely, and the term unit test may simply refer to a non-parametric test. We follow this convention in the rest of this blog post.

Tools and Test Suites

We compare four different tools and their test suites. Each test suite employs a specific setup and approach to assess compliance and/or robustness. In the next section, we evaluate their effectiveness in testing ERC-20 contracts.

ERCx

ERCx provides a test suite of 116 parametric tests. ERCx relies on the Forge testing framework for test execution. Tests in ERCx are structured into six testing levels corresponding to different levels of demand on the contract. ERCx can test ERC-20 contracts described by their address or source code, allowing for evaluation and verification at different stages in the lifecycle of a contract.

OpenZeppelin

The OpenZeppelin test suite comprises 158 unit tests. The test suite is specifically designed to test ERC-20 contracts developed by OpenZeppelin as it is tailored to using some of the internal functions of the OpenZeppelin ERC-20 contract. The test suite is publicly available on the OpenZeppelin repository and uses the Hardhat testing framework.

Slither

The Slither test suite comprises 23 detectors. Slither’s analysis aims to identify potential vulnerabilities and security issues in the implementations of ERC-20 contracts. More specifically, Slither performs syntactic checks on functions, their signatures, etc. See this page for more details. While Slither uses static analysis as opposed to tests, it is an effective tool for finding bugs and vulnerabilities in smart contracts due to its security-oriented static analyses.

ChatGPT

To explore alternative approaches for test suite design and implementation, we generated a test suite using ChatGPT. The suite comprises 22 unit tests, written for the Hardhat testing framework and generated using the ChatGPT 3.5 language model. While we do not claim that the generated test suite exhausts the possibilities of the tool, it constitutes a point of reference for what can be created automatically with moderate effort, namely a few hours of interaction with the prompt, until no further tests could be generated.

Comparing Test Suites

We now report on the results of comparing the test suites. See the table below.

Evaluation Criteria

We used the following criteria for our comparison:

Fuzzing capability: only parametric tests allow fuzzing.
From source code: ability to run the test suite on the source code of a contract.
From address: ability to run the test suite by only providing the address of an already deployed contract.
Console: ability to use a terminal for running the test suite.
Website: ability to use a website UI for running the test suite.
API: availability of an API for interacting with the test suite.
VSC plugin: availability of a plugin to run the tool in VS Code.
Usability: evaluation of the ease of use; while this is a subjective metric, we measure this based on the time spent setting up and running the test suite.

Results

The table above summarizes our findings. Each column represents a test suite, indicated by the logo of the corresponding tool. The comparison criteria, as presented in the previous section, are shown in the rows.

Some Notes on Tools

It is important to note that the OpenZeppelin test suite we evaluated was specifically created to test ERC-20 contracts developed by OpenZeppelin. As such, its primary focus is on ensuring the correctness and reliability of OpenZeppelin’s ERC-20 implementation. While this test suite provided a comprehensive set of unit tests, its usability may be limited when testing other ERC-20 contracts not developed by OpenZeppelin. Developers using this test suite for their own ERC-20 contracts may encounter challenges in adapting and customizing the tests to their specific contract implementations.

The Slither tool, specifically the slither-check-erc command used for testing ERC-20 contracts, focuses primarily on checking the contract's ABI (Application Binary Interface). Still, since Slither performs efficient static analysis, it is capable of detecting many vulnerabilities. It nonetheless shares the limitations of any static analysis tool: it may produce false positives (i.e., it may report non-existent bugs), and it is not capable of checking advanced behavioral properties.
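To illustrate what an ABI-level check covers and what it leaves out, here is a minimal sketch in Python. It is not Slither's implementation or API: the function `missing_erc20_functions` is a hypothetical name, and the sketch assumes the ABI is given in the usual Solidity JSON form, i.e. a list of entries with `type`, `name`, `inputs`, and `outputs` fields. The required signatures are those listed in the ERC-20 standard.

```python
# Required ERC-20 functions: name -> (input types, output types).
ERC20_FUNCTIONS = {
    "totalSupply": ((), ("uint256",)),
    "balanceOf": (("address",), ("uint256",)),
    "transfer": (("address", "uint256"), ("bool",)),
    "transferFrom": (("address", "address", "uint256"), ("bool",)),
    "approve": (("address", "uint256"), ("bool",)),
    "allowance": (("address", "address"), ("uint256",)),
}


def missing_erc20_functions(abi: list[dict]) -> list[str]:
    """Return required ERC-20 functions that are absent from the ABI or have a different signature."""
    found = set()
    for entry in abi:
        if entry.get("type") != "function":
            continue
        inputs = tuple(arg["type"] for arg in entry.get("inputs", []))
        outputs = tuple(arg["type"] for arg in entry.get("outputs", []))
        if ERC20_FUNCTIONS.get(entry.get("name")) == (inputs, outputs):
            found.add(entry["name"])
    return [name for name in ERC20_FUNCTIONS if name not in found]


abi = [
    {"type": "function", "name": "totalSupply", "inputs": [], "outputs": [{"type": "uint256"}]},
    # transfer declared without the bool return value required by the standard
    {"type": "function", "name": "transfer",
     "inputs": [{"type": "address"}, {"type": "uint256"}], "outputs": []},
]
print(missing_erc20_functions(abi))
# prints ['balanceOf', 'transfer', 'transferFrom', 'approve', 'allowance']
```

A check of this kind only sees signatures: a token whose transfer function has the right signature but updates balances incorrectly would pass it, which is the kind of behavioral property that tests are meant to exercise.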

Final Thoughts: Bringing Advanced Testing to Everyone

In the context of ERC-20 testing, our aim was to present a thorough behavioral analysis of various test suites to users of Web3 with different profiles and objectives in mind. The ERCx website makes testing a token and obtaining a report about it a one-click operation. Our API facilitates integration with different service platforms. Our VS Code plugin brings our tests to developers in a manner that is familiar to them.

Coming up Next: Bug-Detection Capabilities and Community Benchmark

In the next blog post, we will dive into the bug-detection capabilities of the tools introduced in this blog post. In particular, we will show how, in partnership with Certora, we used the Gambit tool developed by Certora to define a bug-detection score, allowing us to objectively compare the tools from a bug-detection perspective and provide a reference benchmark for the community.