Skip to main content

Go WARNING DATA RACE despite atomic operations

· 6 min read

The sync/atomic Package

The sync/atomic package provides utilities for atomic operations, such as integer addition. It is sometimes used to guard mutable states when accessed by multiple goroutines. However, atomic operations, when used without care, do not prevent data races. It can be detected using the -race flag when running tests. Let's illustrate this with an example.

Example: The Book Struct

package book

type Book struct {
Title string
CurrentPage int32
}

func newBook(title string) *Book {
return &Book{Title: title}
}

// OnPage returns the current page of the book
func (b *Book) OnPage() int32 {
return b.CurrentPage
}

// TurnPage will increment the CurrentPage by 1
func (b *Book) TurnPage() {
b.CurrentPage++
}

// SkipFirstPage will increment the CurrentPage by 2 if the CurrentPage is 0
func (b *Book) SkipFirstPage() {
if b.CurrentPage == 0 {
b.CurrentPage+=2
}
}

The Book struct contains CurrentPage, which tracks the current page. The OnPage method reads the state, while TurnPage and SkipFirstPage modify it.

When accessed concurrently by multiple goroutines, a data race can occur.

Example: Failing Test

import (
"fmt"
"sync"
"testing"
)

func TestRaceConditionBookNoAtomicNoLock(t *testing.T) {
book := newBook("Atomic Habits")

wg := sync.WaitGroup{}
wg.Add(1)

go func() {
defer wg.Done()
book.TurnPage()
}()

fmt.Printf("Current page: %d\n", book.OnPage())
wg.Wait()

if book.OnPage() != 1 {
t.Error("Expected page to be 1, Actual page is ", book.OnPage())
}
}

Running the test with go test -race will produce a race condition warning:

--- FAIL: TestRaceConditionBookNoAtomicNoLock (0.00s)
testing.go:1490: race detected during execution of test
FAIL
exit status 1

Understanding the Data Race Warning

Let's understand the key parts of a data race warning:

  • WARNING: DATA RACE
    • There is a conflicting read-write operation (can also be write-read)
  • Conflicting read/write operations:
    • Write at 0x... by goroutine X
    • Previous read at 0x... by goroutine Y
      • 0x... is the troubled memory location
      • Location of the troubled code is typically revealed here
  • Stack traces of both goroutines
  • Name of the failed test due to data race

The full log:

==================
WARNING: DATA RACE
Write at 0x00c000010670 by goroutine 8:
github.com/tlylt/learn-go/book.(*Book).TurnPage()
/Users/yong/Documents/GitHub/learn-go/book/book.go:37 +0x98
github.com/tlylt/learn-go/book.TestRaceConditionBookNoAtomicNoLock.func1()
/Users/yong/Documents/GitHub/learn-go/book/book_test.go:27 +0x80

Previous read at 0x00c000010670 by goroutine 7:
github.com/tlylt/learn-go/book.(*Book).OnPage()
/Users/yong/Documents/GitHub/learn-go/book/book.go:21 +0x17c
github.com/tlylt/learn-go/book.TestRaceConditionBookNoAtomicNoLock()
/Users/yong/Documents/GitHub/learn-go/book/book_test.go:30 +0x188
testing.tRunner()
/usr/local/go/src/testing/testing.go:1792 +0x180
testing.(*T).Run.gowrap1()
/usr/local/go/src/testing/testing.go:1851 +0x40

Goroutine 8 (running) created at:
github.com/tlylt/learn-go/book.TestRaceConditionBookNoAtomicNoLock()
/Users/yong/Documents/GitHub/learn-go/book/book_test.go:25 +0x16c
testing.tRunner()
/usr/local/go/src/testing/testing.go:1792 +0x180
testing.(*T).Run.gowrap1()
/usr/local/go/src/testing/testing.go:1851 +0x40

Goroutine 7 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:1851 +0x684
testing.runTests.func1()
/usr/local/go/src/testing/testing.go:2279 +0x7c
testing.tRunner()
/usr/local/go/src/testing/testing.go:1792 +0x180
testing.runTests()
/usr/local/go/src/testing/testing.go:2277 +0x77c
testing.(*M).Run()
/usr/local/go/src/testing/testing.go:2142 +0xb68
main.main()
_testmain.go:51 +0x110
==================
--- FAIL: TestRaceConditionBookNoAtomicNoLock (0.00s)
testing.go:1490: race detected during execution of test
FAIL
exit status 1

Let's interpret the test failure.

The example test is failing because the CurrentPage state at 0x00c000010670 is being concurrently operated by the write from TurnPage() and the read from OnPage().

Fixing the Data Race with Atomic Operations

Atomic operations ensure safe concurrent access:

func (b *Book) OnPageAtomic() int32 {
return atomic.LoadInt32(&b.CurrentPage)
}

func (b *Book) TurnPageAtomic() {
atomic.AddInt32(&b.CurrentPage, 1)
}

Issues with Conditional Operations

When operations conditionally modify a mutable state, they can become a source of errors if atomic operations are not used consistently. For example, consider a function designed to skip the first page only if the book hasn't been read. Here is an incorrect implementation:

func (b *Book) SkipFirstPage() {
if b.CurrentPage == 0 {
atomic.AddInt32(&b.CurrentPage, 2)
}
}

The conditional check on b.CurrentPage is not atomic, leading to a race condition.

The test:

func TestRaceConditionBookWithAtomic(t *testing.T) {
book := newBook("Atomic Habits")

wg := sync.WaitGroup{}
wg.Add(1)

go func() {
defer wg.Done()
book.SkipFirstPage()
fmt.Printf("After SkipFirstPage: page: %d\n", book.OnPageAtomic())
}()

book.TurnPageAtomic()
fmt.Printf("After TurnPageAtomic: page: %d\n", book.OnPageAtomic())
fmt.Println(book.OnPageAtomic())

wg.Wait()
fmt.Printf("End of test: page: %d\n", book.OnPageAtomic())
if book.OnPageAtomic() != 1 {
t.Error("Expected page to be 1, Actual page is ", book.OnPageAtomic())
}
}

A race condition warning will be shown. In summary, a conflict occurred:

  • Read at 0x00c00019a040 by goroutine 11: github.com/tlylt/learn-go/book.(*Book).SkipFirstPage()
  • Previous write at 0x00c00019a040 by goroutine 10: sync/atomic.AddInt32()
  • The SkipFirstPage method has a if b.CurrentPage == 0 { that read the mutable stage without an atomic guard.

To fix it, we can also load it atomically.

func (b *Book) SkipFirstPageAtomic() {
if atomic.LoadInt32(&b.CurrentPage) == 0 {
atomic.AddInt32(&b.CurrentPage, 2)
}
}

Note that while the above does fix the data race, it is not necessarily logically sound. This is because the read and update are done as two atomic operations, meaning the state could still be mutated after the load and right before the addition. If so, the result may become unexpected. To make both operations atomic, a mutex/lock could be used to ensure that no other goroutines can access the state while executing the function.

Flakiness Despite Atomic Operations

Even with atomic operations, test behavior can be unpredictable due to scheduling differences. Adding a delay introduces flakiness:

  • time.Sleep(1 * time.Microsecond) is added
func TestRaceConditionBookWithAtomic(t *testing.T) {
book := newBook("Atomic Habits")

wg := sync.WaitGroup{}
wg.Add(1)

go func() {
defer wg.Done()
book.SkipFirstPage()
fmt.Printf("After SkipFirstPage: page: %d\n", book.OnPageAtomic())
}()

time.Sleep(1 * time.Microsecond)
book.TurnPageAtomic()
fmt.Printf("After TurnPageAtomic: page: %d\n", book.OnPageAtomic())
fmt.Println(book.OnPageAtomic())

wg.Wait()
fmt.Printf("End of test: page: %d\n", book.OnPageAtomic())
if book.OnPageAtomic() != 1 {
t.Error("Expected page to be 1, Actual page is ", book.OnPageAtomic())
}
}

With time.sleep before TurnPageAtomic, it is now difficult to determine which operation (SkipFirstPage or TurnPageAtomic) will execute first. Additionally, since SkipFirstPage will only mutate the page number if it is 0, the result can be either 1 or 3. The test may randomly pass or fail because the execution order is not guaranteed.

Using Locks Instead of Atomic Operations

Locks can also prevent data races:

package book

import (
"sync"
)

type Book struct {
Title string
CurrentPage int32
lock sync.RWMutex
}

func (b *Book) OnPageLock() int32 {
b.lock.RLock()
defer b.lock.RUnlock()
return b.CurrentPage
}

func (b *Book) TurnPageLock() {
b.lock.Lock()
defer b.lock.Unlock()
b.CurrentPage++
}

func (b *Book) SkipFirstPageLock() {
b.lock.Lock()
defer b.lock.Unlock()
if b.CurrentPage == 0 {
b.CurrentPage += 2
}
}

However, using a read lock incorrectly can still introduce data races:

func (b *Book) SkipFirstPageLock() {
b.lock.RLock()
defer b.lock.RUnlock()
if b.CurrentPage == 0 {
b.CurrentPage += 2
}
}

Here, a read lock is used, but the function modifies CurrentPage, leading to a data race.

Summary

  1. Use atomic operations carefully, ensuring all reads and writes are guarded.
  2. Even with atomic operations, test flakiness can occur due to goroutine scheduling.
  3. Be cautious with locks; incorrect usage can still introduce race conditions.

Reference

The code discussed is available in my repository: