Go Data Race Pop Quiz Analyzed
Context
I recently wrote an article on data races. Today, I came across Dave Cheney's post, Wednesday pop quiz: spot the race, and I wanted to apply what I learned to analyze the example given.
The Program with a Data Race
package main

import (
	"fmt"
	"time"
)

type RPC struct {
	result int
	done   chan struct{}
}

func (rpc *RPC) compute() {
	time.Sleep(time.Second) // strenuous computation intensifies
	rpc.result = 42
	close(rpc.done)
}

func (RPC) version() int {
	return 1 // never going to need to change this
}

func main() {
	rpc := &RPC{done: make(chan struct{})}

	go rpc.compute()         // kick off computation in the background
	version := rpc.version() // grab some other information while we're waiting
	<-rpc.done               // wait for computation to finish

	result := rpc.result
	fmt.Printf("RPC computation complete, result: %d, version: %d\n", result, version)
}
Investigation
Since the program is short, we can quickly identify the suspicious parts: the modification of the result field and the invocation of the version method. Let's dive deeper.
Write Operation
The compute method waits for a second before writing to the result field. It then closes the done channel to signal completion.
Read Operation
The statement result := rpc.result is a read operation, fetching the result value from the struct.
The version Method
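The version method is interesting because its body does not read or write any field of the struct. However, it is declared with a value receiver, so a copy of the struct is created every time the method is called. Since rpc is a pointer, the call rpc.version() is shorthand for (*rpc).version(), and making that copy reads every field of the struct, including result. This is crucial.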
Goroutine and Channel
The go rpc.compute() statement launches the compute method in a separate goroutine, while <-rpc.done ensures the main goroutine waits until the done channel is closed.
Quick Answer Revealed
The data race occurs due to a conflict between the write operation (rpc.result = 42 in compute) and the read operation (the struct copy made when calling version). These two operations run in different goroutines with no synchronization between them, so they may happen concurrently.
The solution is simple: give version a pointer receiver by declaring it as func (*RPC) version() int, so that calling it no longer copies the struct.
The reason why result := rpc.result does not cause a conflict is that it occurs only after <-rpc.done, ensuring that the write in compute completes before the read in the main goroutine.
We can confirm which accesses conflict by inspecting the race detector's log.
Running go test data-race/race_test.go -race in my repo produces the following output:
==================
WARNING: DATA RACE
Write at 0x00c000028250 by goroutine 7:
command-line-arguments.(*RPC).compute()
/Users/yong/Documents/GitHub/learn-go/data-race/race_test.go:16 +0x44
...
Previous read at 0x00c000028250 by goroutine 6:
command-line-arguments.TestRace()
/Users/yong/Documents/GitHub/learn-go/data-race/race_test.go:28 +0x120
...
Goroutine 7 (running) created at:
command-line-arguments.TestRace()
/Users/yong/Documents/GitHub/learn-go/data-race/race_test.go:27 +0x114
...
Goroutine 6 (running) created at:
...
main.main()
_testmain.go:45 +0x110
==================
RPC computation complete, result: 42, version: 1
--- FAIL: TestRace (1.00s)
testing.go:1490: race detected during execution of test
As explained in my previous article, we can now interpret this log easily:
- There is a write-read conflict.
- The write occurs in compute(), modifying result.
- The read occurs in race_test.go:28, the line invoking version().
What does GPT say?
I tested GPT by asking it why the data race occurred. It incorrectly pointed to result := rpc.result as the issue and suggested adding a mutex:
- see code in learn-go/data-race/race2_test.go
package race

import (
	"fmt"
	"sync"
	"testing"
	"time"
)

type RPCWithMutex struct {
	result int
	done   chan struct{}
	mu     sync.Mutex
}

func (rpc *RPCWithMutex) compute() {
	time.Sleep(time.Second) // strenuous computation intensifies
	rpc.result = 42
	close(rpc.done)
}

func (RPCWithMutex) version() int {
	return 1 // never going to need to change this
}

func TestRaceRPCWithMutex(t *testing.T) {
	rpc := &RPCWithMutex{done: make(chan struct{})}

	go rpc.compute()         // kick off computation in the background
	version := rpc.version() // grab some other information while we're waiting
	<-rpc.done               // wait for computation to finish

	rpc.mu.Lock()
	result := rpc.result
	rpc.mu.Unlock()
	fmt.Printf("RPC computation complete, result: %d, version: %d\n", result, version)
}
While this approach is incorrect (the same data race between compute and the struct copy in version is still there), an IDE may now issue a warning:
version passes lock by value: github.com/tlylt/learn-go/data-race.RPCWithMutex contains sync.Mutex copylocks default
This warning points out that version creates a copy of the struct, which now includes the mutex. Copying a mutex is a bad practice.
Without Waiting on Done
If we want to trigger a data race between compute and result := rpc.result, we can remove the wait, allowing the read and write operations to potentially conflict:
package race

import (
	"fmt"
	"testing"
	"time"
)

type RPCNoWait struct {
	result int
	done   chan struct{}
}

func (rpc *RPCNoWait) compute() {
	time.Sleep(time.Second) // strenuous computation intensifies
	rpc.result = 42
	close(rpc.done)
}

func (RPCNoWait) version() int {
	return 1 // never going to need to change this
}

func TestRaceRPCNoWait(t *testing.T) {
	rpc := &RPCNoWait{done: make(chan struct{})}

	go rpc.compute() // kick off computation in the background
	// version := rpc.version() // grab some other information while we're waiting
	// <-rpc.done // wait for computation to finish

	result := rpc.result
	fmt.Printf("RPC computation complete, result: %d\n", result)
}
This produces a different log indicating a read-write conflict.
RPC computation complete, result: 0
PASS
==================
WARNING: DATA RACE
Write at 0x00c000118210 by goroutine 7:
command-line-arguments.(*RPCNoWait).compute()
/Users/yong/Documents/GitHub/learn-go/data-race/race3_test.go:16 +0x44
...
Previous read at 0x00c000118210 by goroutine 6:
command-line-arguments.TestRaceRPCNoWait()
/Users/yong/Documents/GitHub/learn-go/data-race/race3_test.go:31 +0x114
...
Goroutine 7 (running) created at:
command-line-arguments.TestRaceRPCNoWait()
/Users/yong/Documents/GitHub/learn-go/data-race/race3_test.go:27 +0x10c
...
Goroutine 6 (finished) created at:
testing.(*T).Run()
...
main.main()
_testmain.go:45 +0x110
==================
Found 1 data race(s)
This confirms that removing <-rpc.done allows a data race to occur between compute and the direct read of rpc.result.