Assert with Retry

November 11, 2024 · 3 min read

Besides asserting on internal state or values in memory, tests sometimes assert on values affected by external systems. These assertions can become a source of flakiness because external systems are inherently unpredictable. Examples include:

Verifying that a file has been created
Checking if a specific line of text has been written to a file
Confirming that a local database is updated with specific values
Ensuring that a process is either running or has terminated

These assertions could fail for a variety of reasons. A simple one is that it might take longer than expected for the system to perform I/O-related operations, such as spawning a process or writing to a file. The mismatch between test code execution speed and system performance means an assertion might trigger before the system has had time to respond to the action within a test.

A straightforward but imperfect solution is adding timeouts, that is, pausing for a fixed duration before the assertion. However, choosing how long to pause is a guessing game. Too short, and tests fail prematurely; too long, and overall execution time suffers. In parameterized tests the cost multiplies: a 5-second pause repeated across 100 parameter combinations adds more than 8 minutes to the suite.

An intuitive and more flexible solution is to wrap assertions in a retry mechanism. Instead of a single wait -> check cycle, perform the check periodically at short, frequent intervals until the assertion passes. For instance, instead of waiting 5 seconds and then checking that a file has been created at a certain location, perform a check every second until the file is found. Note that this doesn't mean we should omit timeouts altogether. A catch-all timeout, like 5, 10, or 30 seconds, is still needed to ensure we don’t continue the check indefinitely.

Assert with retry can be implemented simply by creating an assertion function that takes in configurable parameters, such as:

A function to call repeatedly until it returns a truthy value (typically True)
A specified interval at which the function will be called
A timeout for terminating the check
A custom error message to raise if the timeout is reached

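import time


def assert_with_retry(func, interval=1, timeout=10, error_message="Assertion failed"):
    # Monotonic clock: unaffected by system clock adjustments during the test.
    start_time = time.monotonic()
    while True:
        # Pass as soon as the condition holds.
        if func():
            return
        # Give up once the overall timeout has elapsed.
        if time.monotonic() - start_time > timeout:
            raise AssertionError(error_message)
        # Wait before the next attempt.
        time.sleep(interval)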
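
As a rough usage sketch, the file-creation case mentioned earlier might look like the following. The `write_file_later` helper, the `output.txt` file name, and the 0.3-second delay are assumptions made up for illustration; they stand in for whatever slow external system a real test would exercise. The helper is repeated so the snippet runs on its own.

```python
import os
import tempfile
import threading
import time


def assert_with_retry(func, interval=1, timeout=10, error_message="Assertion failed"):
    # Copy of the helper so this snippet is self-contained; it uses
    # time.monotonic() so clock adjustments cannot affect the timeout.
    start_time = time.monotonic()
    while True:
        if func():
            return
        if time.monotonic() - start_time > timeout:
            raise AssertionError(error_message)
        time.sleep(interval)


def write_file_later(path, delay):
    # Hypothetical stand-in for an external system that is slow to do I/O.
    time.sleep(delay)
    with open(path, "w") as f:
        f.write("done\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "output.txt")
    writer = threading.Thread(target=write_file_later, args=(target, 0.3))
    writer.start()

    started = time.monotonic()
    # Poll every 0.1 s instead of sleeping for the full 5 s up front.
    assert_with_retry(
        lambda: os.path.exists(target),
        interval=0.1,
        timeout=5,
        error_message=f"{target} was not created within 5 seconds",
    )
    elapsed = time.monotonic() - started
    writer.join()
```

Here the assertion returns shortly after the file appears rather than after the full 5 seconds, and it would still fail with the custom message if the file never showed up.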

This retry mechanism can also be adapted with back-off strategies, like Fibonacci or exponential back-off, which may improve time/compute-efficiency in certain cases.
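
One possible way to add exponential back-off is sketched below. The function name `assert_with_backoff` and the parameters `initial_interval`, `factor`, and `max_interval` are assumptions introduced for this example, not part of the helper above. The wait grows after every failed attempt, is capped at `max_interval`, and never extends past the overall timeout.

```python
import time


def assert_with_backoff(func, initial_interval=0.1, factor=2, max_interval=5,
                        timeout=10, error_message="Assertion failed"):
    # Hypothetical variant: the wait between attempts grows by `factor`
    # after every failed check, up to `max_interval`.
    start_time = time.monotonic()
    interval = initial_interval
    while True:
        if func():
            return
        remaining = timeout - (time.monotonic() - start_time)
        if remaining <= 0:
            raise AssertionError(error_message)
        # Never sleep past the deadline, so a final check happens at the timeout.
        time.sleep(min(interval, remaining))
        interval = min(interval * factor, max_interval)


# Example: a condition that only becomes true on the fourth attempt.
attempts = []


def ready():
    attempts.append(time.monotonic())
    return len(attempts) >= 4


assert_with_backoff(ready, initial_interval=0.05, timeout=5)
```

With these settings the waits are roughly 0.05, 0.1, and 0.2 seconds, so early attempts are frequent and later ones are progressively cheaper. Whether that trade-off helps depends on how expensive each check is.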

To conclude, next time you’re tempted to add a timeout before an assertion in a test, consider wrapping the assertion with a retry mechanism instead: if the condition is met quickly, you save time; if it isn’t, the test still fails within the same timeout limit. This approach improves test reliability with minimal impact on execution time.