
A Small Performance Fix That Turned Into Test-Case Hell

29 June 2026

Under the hood

go53.eu


go53 issue #45 was tagged "Low priority", and it looked the part: stop grabbing a lock every time we read the live config on the DNS hot path. The fix itself was small and standard. The cleanup it set off across the test suite turned out to be the larger half of the job — which is the part worth writing down.

The problem: a tiny lock, taken a lot

go53 keeps its runtime configuration in a single LiveConfig snapshot. Every reader went through GetLive(), which took a read lock and returned a copy:

func (cm *ConfigManager) GetLive() LiveConfig {
    cm.mu.RLock()
    defer cm.mu.RUnlock()
    return cm.Live
}

Each call was cheap on its own. The catch is that a single DNS query called it several times (DNSSEC checks, EDNS sizing, NSID, rate-limiting), and an RWMutex read lock isn't free under load. Every RLock and RUnlock atomically updates a reader counter inside the mutex, and that counter lives on one shared cache line. With many concurrent queries, every core keeps pulling that line away from the others, and the "read lock" quietly becomes a scalability ceiling.

The fix: read-mostly wants an atomic pointer

Config is written rarely and read constantly, which is the textbook case for a lock-free copy-on-write swap. We moved the snapshot behind an atomic.Pointer[LiveConfig]. Readers do a single atomic load and a value copy: no mutex, no contention. Writers serialize on a dedicated mutex and publish a brand-new snapshot instead of editing the current one, so a snapshot never changes once readers can see it.

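import (
    "sync"
    "sync/atomic"
)

type ConfigManager struct {
    Base    BaseConfig
    writeMu sync.Mutex                 // serializes writers (copy-on-write)
    live    atomic.Pointer[LiveConfig] // lock-free reads
}

// GetLive is one atomic load plus a value copy. Before the first
// snapshot is published, Load returns nil, so return the zero value.
func (cm *ConfigManager) GetLive() LiveConfig {
    if p := cm.live.Load(); p != nil {
        return *p
    }
    return LiveConfig{}
}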

While we were there, we also stopped re-fetching the snapshot mid-request: handleRequest now reads live once and reuses it. Fewer reads, and the reads that remain are nearly free.
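Here is a minimal sketch of the read-once pattern. It assumes a package-level atomic snapshot and uses hypothetical helpers: handleRequest here only computes a few per-query decisions and is not go53's handler. Taking one snapshot also has a side benefit beyond saving loads. Every decision in a request sees the same config, even if a writer publishes a new snapshot halfway through.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// LiveConfig is a stand-in; NSIDEnabled and EDNSBufferSize are hypothetical fields.
type LiveConfig struct {
	DNSSECEnabled  bool
	NSIDEnabled    bool
	EDNSBufferSize uint16
}

var live atomic.Pointer[LiveConfig]

func getLive() LiveConfig {
	if p := live.Load(); p != nil {
		return *p
	}
	return LiveConfig{}
}

// queryPlan collects per-query decisions that would otherwise each
// re-read the config.
type queryPlan struct {
	signDNSSEC bool
	addNSID    bool
	udpSize    uint16
}

// udpSize caps the client's advertised EDNS size at the server's limit.
func udpSize(cfg *LiveConfig, clientSize uint16) uint16 {
	if clientSize < cfg.EDNSBufferSize {
		return clientSize
	}
	return cfg.EDNSBufferSize
}

// handleRequest loads the snapshot once and passes it down, instead of
// letting each helper call getLive() on its own.
func handleRequest(clientSize uint16) queryPlan {
	cfg := getLive() // the only config read for this query
	return queryPlan{
		signDNSSEC: cfg.DNSSECEnabled,
		addNSID:    cfg.NSIDEnabled,
		udpSize:    udpSize(&cfg, clientSize),
	}
}

func main() {
	live.Store(&LiveConfig{DNSSECEnabled: true, EDNSBufferSize: 1232})
	fmt.Printf("%+v\n", handleRequest(4096)) // {signDNSSEC:true addNSID:false udpSize:1232}
}
```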

Did it help? Yes — and it scales the right way

A micro-benchmark of the two patterns with the real LiveConfig struct (20-core machine, zero allocations either way):

Scenario                 RWMutex (old)   atomic.Pointer (new)   Speedup
Serial, no contention    25.2 ns/op      13.3 ns/op             ~1.9×
Parallel, 4 cores        45.0 ns/op      4.4 ns/op              ~10×
Parallel, 20 cores       49.4 ns/op      2.6 ns/op              ~19×
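The post doesn't include the benchmark code. Below is a minimal sketch of how one could reproduce the comparison. It uses testing.Benchmark so it runs with plain go run, and it varies the core count by setting runtime.GOMAXPROCS. The LiveConfig here is a small stand-in rather than go53's real struct, so absolute numbers will differ. In a regular _test.go file, the same loop bodies would go into Benchmark functions run with go test -bench . -cpu 1,4,20.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"testing"
)

// LiveConfig is a small stand-in; the real struct is larger and costs
// more to copy, so absolute numbers will not match the table.
type LiveConfig struct {
	DNSSECEnabled  bool
	EDNSBufferSize uint16
	RateLimitQPS   int
}

var (
	mu        sync.RWMutex
	lockedCfg = LiveConfig{DNSSECEnabled: true, EDNSBufferSize: 1232, RateLimitQPS: 100}
	atomicCfg atomic.Pointer[LiveConfig]
)

func init() {
	c := lockedCfg
	atomicCfg.Store(&c)
}

// readLocked is the old pattern: RLock, copy, RUnlock.
func readLocked() LiveConfig {
	mu.RLock()
	defer mu.RUnlock()
	return lockedCfg
}

// readAtomic is the new pattern: one atomic load, then a value copy.
func readAtomic() LiveConfig { return *atomicCfg.Load() }

// nsPerOp keeps the fraction that BenchmarkResult.NsPerOp truncates away.
func nsPerOp(r testing.BenchmarkResult) float64 {
	if r.N == 0 {
		return 0
	}
	return float64(r.T.Nanoseconds()) / float64(r.N)
}

// serial reads from a single goroutine: no contention at all.
func serial(read func() LiveConfig) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		sum := 0
		for i := 0; i < b.N; i++ {
			sum += read().RateLimitQPS
		}
		_ = sum
	})
}

// parallel runs one reader goroutine per P. With RunParallel, ns/op is
// wall-clock time divided by the total reads across all goroutines.
func parallel(read func() LiveConfig, procs int) testing.BenchmarkResult {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev)
	return testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			sum := 0
			for pb.Next() {
				sum += read().RateLimitQPS
			}
			_ = sum
		})
	})
}

func main() {
	fmt.Printf("serial        RWMutex %6.1f ns/op   atomic %6.1f ns/op\n",
		nsPerOp(serial(readLocked)), nsPerOp(serial(readAtomic)))
	for _, p := range []int{4, runtime.NumCPU()} {
		fmt.Printf("parallel, %2d  RWMutex %6.1f ns/op   atomic %6.1f ns/op\n",
			p, nsPerOp(parallel(readLocked, p)), nsPerOp(parallel(readAtomic, p)))
	}
}
```

By default each testing.Benchmark call runs for about a second, so a full run takes several seconds.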

The shape matters more than any single number. The RWMutex got slower as we added cores (25.2 ns → 49.4 ns per op), which is the cache line bouncing between CPUs. The atomic version got faster per op (13.3 ns → 2.6 ns), because independent reads have nothing to contend over. Negative scaling turned into positive scaling.
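Here is a worked reading of the table. It assumes the parallel rows come from b.RunParallel, where ns/op is wall-clock time divided by total reads across all goroutines. Under that assumption, aggregate read throughput is the reciprocal of ns/op, and the scaling factor is the serial time divided by the parallel time:

```latex
\begin{aligned}
\text{throughput} &= \frac{1}{t_{\text{op}}}, \qquad
\text{scaling}(n) = \frac{t_{\text{serial}}}{t_{n\ \text{cores}}} \\[4pt]
\text{RWMutex:}\quad \text{scaling}(4) &= \tfrac{25.2}{45.0} \approx 0.56, \quad
\text{scaling}(20) = \tfrac{25.2}{49.4} \approx 0.51
\quad (39.7 \to 20.2 \times 10^{6}\ \text{reads/s}) \\
\text{atomic.Pointer:}\quad \text{scaling}(4) &= \tfrac{13.3}{4.4} \approx 3.0, \quad
\text{scaling}(20) = \tfrac{13.3}{2.6} \approx 5.1
\quad (75.2 \to 384.6 \times 10^{6}\ \text{reads/s})
\end{aligned}
```

With twenty cores, the RWMutex delivers roughly half the total read throughput of a single core. The atomic version delivers about five times more reads per second. Perfectly linear scaling would be 20×.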

A note on scale: these are nanoseconds, and a real DNS query spends far longer parsing packets and looking up zones. This won't double your QPS. What it removes is a contention point that only shows up under heavy concurrency — which is exactly when you'd rather not have one.

…and then the test suite had opinions

This is where the "Low priority" label started to look optimistic. The old config exposed a public Live field, and 180 lines across 20 test files in 8 packages had been reaching in and setting it directly:

config.AppConfig.Live.DNSSECEnabled = false   // no longer compiles

The field is now an unexported atomic.Pointer, and writing through it would break the immutable-snapshot guarantee anyway. So every one of those call sites had to move to GetLive(), a new SetLive(), or a clearly-labelled, test-only LiveForTest() helper. Five struct literals that set the Live: field directly had to be rewritten by hand. A small production change dragged twenty test files along with it.
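One shape a test-only override could take is sketched below. overrideLiveForTest is hypothetical: the post names a LiveForTest() helper but doesn't show it, and SetLive's signature is assumed. Instead of poking a field, the test publishes a modified copy and gets back a function that restores the previous snapshot.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// LiveConfig is a stand-in with one field taken from the post.
type LiveConfig struct {
	DNSSECEnabled bool
}

type ConfigManager struct {
	writeMu sync.Mutex
	live    atomic.Pointer[LiveConfig]
}

func (cm *ConfigManager) GetLive() LiveConfig {
	if p := cm.live.Load(); p != nil {
		return *p
	}
	return LiveConfig{}
}

// SetLive publishes a new snapshot (signature assumed).
func (cm *ConfigManager) SetLive(c LiveConfig) {
	cm.writeMu.Lock()
	defer cm.writeMu.Unlock()
	cm.live.Store(&c)
}

// overrideLiveForTest (hypothetical) replaces the old field assignment
//
//	config.AppConfig.Live.DNSSECEnabled = false
//
// with a copy-on-write override plus a function that restores the
// previous snapshot. The get and set are not one atomic step, which is
// acceptable for test setup but not for production writers.
func (cm *ConfigManager) overrideLiveForTest(mutate func(*LiveConfig)) (restore func()) {
	prev := cm.GetLive()
	next := prev
	mutate(&next)
	cm.SetLive(next)
	return func() { cm.SetLive(prev) }
}

func main() {
	var cm ConfigManager
	cm.SetLive(LiveConfig{DNSSECEnabled: true})

	restore := cm.overrideLiveForTest(func(c *LiveConfig) { c.DNSSECEnabled = false })
	fmt.Println(cm.GetLive().DNSSECEnabled) // false
	restore()
	fmt.Println(cm.GetLive().DNSSECEnabled) // true
}
```

Inside a Go test, passing restore to t.Cleanup runs it automatically when the test finishes.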

Running the suite under Go's race detector also surfaced a data race in the DNSSEC tests — background signing goroutines reading config while a test mutated it. We checked the previous commit and confirmed the race was already present, so this change didn't introduce it; it just made it easy to see. It's logged for a separate follow-up.

Takeaways

  • Read-mostly state suits atomic pointers: lock-free reads, immutable snapshots, copy-on-write writes — faster and safer at once.
  • Benchmark the shape, not the spot: the win here isn't a few nanoseconds, it's a curve that bends the right way as cores grow.
  • "Low priority" can be misleading: the implementation took minutes; the blast radius into the test suite took the rest of the afternoon. Encapsulation deferred is encapsulation billed later.

A small fix, a clear result, and a useful reminder about where the real cost of a change lives.