Go WARNING DATA RACE despite atomic operations
The sync/atomic
Package
The sync/atomic
package provides utilities for atomic operations, such as integer addition. It is sometimes used to guard mutable states when accessed by multiple goroutines. However, atomic operations, when used without care, do not prevent data races. It can be detected using the -race
flag when running tests. Let's illustrate this with an example.
Example: The Book
Struct
package book
type Book struct {
Title string
CurrentPage int32
}
func newBook(title string) *Book {
return &Book{Title: title}
}
// OnPage returns the current page of the book
func (b *Book) OnPage() int32 {
return b.CurrentPage
}
// TurnPage will increment the CurrentPage by 1
func (b *Book) TurnPage() {
b.CurrentPage++
}
// SkipFirstPage will increment the CurrentPage by 2 if the CurrentPage is 0
func (b *Book) SkipFirstPage() {
if b.CurrentPage == 0 {
b.CurrentPage+=2
}
}
The Book
struct contains CurrentPage
, which tracks the current page. The OnPage
method reads the state, while TurnPage
and SkipFirstPage
modify it.
When accessed concurrently by multiple goroutines, a data race can occur.
Example: Failing Test
import (
"fmt"
"sync"
"testing"
)
func TestRaceConditionBookNoAtomicNoLock(t *testing.T) {
book := newBook("Atomic Habits")
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
defer wg.Done()
book.TurnPage()
}()
fmt.Printf("Current page: %d\n", book.OnPage())
wg.Wait()
if book.OnPage() != 1 {
t.Error("Expected page to be 1, Actual page is ", book.OnPage())
}
}
Running the test with go test -race
will produce a race condition warning:
--- FAIL: TestRaceConditionBookNoAtomicNoLock (0.00s)
testing.go:1490: race detected during execution of test
FAIL
exit status 1
Understanding the Data Race Warning
Let's understand the key parts of a data race warning:
WARNING: DATA RACE
- There is a conflicting read-write operation (can also be write-read)
- Conflicting read/write operations:
Write at 0x... by goroutine X
Previous read at 0x... by goroutine Y
0x...
is the troubled memory location- Location of the troubled code is typically revealed here
- Stack traces of both goroutines
- Name of the failed test due to data race
The full log:
==================
WARNING: DATA RACE
Write at 0x00c000010670 by goroutine 8:
github.com/tlylt/learn-go/book.(*Book).TurnPage()
/Users/yong/Documents/GitHub/learn-go/book/book.go:37 +0x98
github.com/tlylt/learn-go/book.TestRaceConditionBookNoAtomicNoLock.func1()
/Users/yong/Documents/GitHub/learn-go/book/book_test.go:27 +0x80
Previous read at 0x00c000010670 by goroutine 7:
github.com/tlylt/learn-go/book.(*Book).OnPage()
/Users/yong/Documents/GitHub/learn-go/book/book.go:21 +0x17c
github.com/tlylt/learn-go/book.TestRaceConditionBookNoAtomicNoLock()
/Users/yong/Documents/GitHub/learn-go/book/book_test.go:30 +0x188
testing.tRunner()
/usr/local/go/src/testing/testing.go:1792 +0x180
testing.(*T).Run.gowrap1()
/usr/local/go/src/testing/testing.go:1851 +0x40
Goroutine 8 (running) created at:
github.com/tlylt/learn-go/book.TestRaceConditionBookNoAtomicNoLock()
/Users/yong/Documents/GitHub/learn-go/book/book_test.go:25 +0x16c
testing.tRunner()
/usr/local/go/src/testing/testing.go:1792 +0x180
testing.(*T).Run.gowrap1()
/usr/local/go/src/testing/testing.go:1851 +0x40
Goroutine 7 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:1851 +0x684
testing.runTests.func1()
/usr/local/go/src/testing/testing.go:2279 +0x7c
testing.tRunner()
/usr/local/go/src/testing/testing.go:1792 +0x180
testing.runTests()
/usr/local/go/src/testing/testing.go:2277 +0x77c
testing.(*M).Run()
/usr/local/go/src/testing/testing.go:2142 +0xb68
main.main()
_testmain.go:51 +0x110
==================
--- FAIL: TestRaceConditionBookNoAtomicNoLock (0.00s)
testing.go:1490: race detected during execution of test
FAIL
exit status 1
Let's interpret the test failure.
The example test is failing because the CurrentPage
state at 0x00c000010670
is being concurrently operated by the write from TurnPage()
and the read from OnPage()
.
Fixing the Data Race with Atomic Operations
Atomic operations ensure safe concurrent access:
func (b *Book) OnPageAtomic() int32 {
return atomic.LoadInt32(&b.CurrentPage)
}
func (b *Book) TurnPageAtomic() {
atomic.AddInt32(&b.CurrentPage, 1)
}
Issues with Conditional Operations
When operations conditionally modify a mutable state, they can become a source of errors if atomic operations are not used consistently. For example, consider a function designed to skip the first page only if the book hasn't been read. Here is an incorrect implementation:
func (b *Book) SkipFirstPage() {
if b.CurrentPage == 0 {
atomic.AddInt32(&b.CurrentPage, 2)
}
}
The conditional check on b.CurrentPage
is not atomic, leading to a race condition.
The test:
func TestRaceConditionBookWithAtomic(t *testing.T) {
book := newBook("Atomic Habits")
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
defer wg.Done()
book.SkipFirstPage()
fmt.Printf("After SkipFirstPage: page: %d\n", book.OnPageAtomic())
}()
book.TurnPageAtomic()
fmt.Printf("After TurnPageAtomic: page: %d\n", book.OnPageAtomic())
fmt.Println(book.OnPageAtomic())
wg.Wait()
fmt.Printf("End of test: page: %d\n", book.OnPageAtomic())
if book.OnPageAtomic() != 1 {
t.Error("Expected page to be 1, Actual page is ", book.OnPageAtomic())
}
}
A race condition warning will be shown. In summary, a conflict occurred:
- Read at
0x00c00019a040
by goroutine 11:github.com/tlylt/learn-go/book.(*Book).SkipFirstPage()
- Previous write at
0x00c00019a040
by goroutine 10:sync/atomic.AddInt32()
- The
SkipFirstPage
method has aif b.CurrentPage == 0 {
that read the mutable stage without an atomic guard.
To fix it, we can also load it atomically.
func (b *Book) SkipFirstPageAtomic() {
if atomic.LoadInt32(&b.CurrentPage) == 0 {
atomic.AddInt32(&b.CurrentPage, 2)
}
}
Note that while the above does fix the data race, it is not necessarily logically sound. This is because the read and update are done as two atomic operations, meaning the state could still be mutated after the load and right before the addition. If so, the result may become unexpected. To make both operations atomic, a mutex/lock could be used to ensure that no other goroutines can access the state while executing the function.
Flakiness Despite Atomic Operations
Even with atomic operations, test behavior can be unpredictable due to scheduling differences. Adding a delay introduces flakiness:
time.Sleep(1 * time.Microsecond)
is added
func TestRaceConditionBookWithAtomic(t *testing.T) {
book := newBook("Atomic Habits")
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
defer wg.Done()
book.SkipFirstPage()
fmt.Printf("After SkipFirstPage: page: %d\n", book.OnPageAtomic())
}()
time.Sleep(1 * time.Microsecond)
book.TurnPageAtomic()
fmt.Printf("After TurnPageAtomic: page: %d\n", book.OnPageAtomic())
fmt.Println(book.OnPageAtomic())
wg.Wait()
fmt.Printf("End of test: page: %d\n", book.OnPageAtomic())
if book.OnPageAtomic() != 1 {
t.Error("Expected page to be 1, Actual page is ", book.OnPageAtomic())
}
}
With time.sleep
before TurnPageAtomic
, it is now difficult to determine which operation (SkipFirstPage
or TurnPageAtomic
) will execute first. Additionally, since SkipFirstPage
will only mutate the page number if it is 0
, the result can be either 1
or 3
. The test may randomly pass or fail because the execution order is not guaranteed.
Using Locks Instead of Atomic Operations
Locks can also prevent data races:
package book
import (
"sync"
)
type Book struct {
Title string
CurrentPage int32
lock sync.RWMutex
}
func (b *Book) OnPageLock() int32 {
b.lock.RLock()
defer b.lock.RUnlock()
return b.CurrentPage
}
func (b *Book) TurnPageLock() {
b.lock.Lock()
defer b.lock.Unlock()
b.CurrentPage++
}
func (b *Book) SkipFirstPageLock() {
b.lock.Lock()
defer b.lock.Unlock()
if b.CurrentPage == 0 {
b.CurrentPage += 2
}
}
However, using a read lock incorrectly can still introduce data races:
func (b *Book) SkipFirstPageLock() {
b.lock.RLock()
defer b.lock.RUnlock()
if b.CurrentPage == 0 {
b.CurrentPage += 2
}
}
Here, a read lock is used, but the function modifies CurrentPage
, leading to a data race.
Summary
- Use atomic operations carefully, ensuring all reads and writes are guarded.
- Even with atomic operations, test flakiness can occur due to goroutine scheduling.
- Be cautious with locks; incorrect usage can still introduce race conditions.
Reference
The code discussed is available in my repository: