Go Regex You Should Know in Golang

Go Regex: Why Safety Beats Speed — Understanding the Thompson NFA Design

image.png|300

1. Introduction

You might have noticed that Go’s regexp package is often benchmarked as 30x slower than Zig’s regex engine, or even slower than Python in certain workloads. This seems alarming for a language celebrated for its performance. But once you understand why Go made this choice, you’ll realize it’s one of the most deliberate and defensible decisions in the entire standard library.

In this article, we’ll explore the internals of Go’s regex engine, the algorithm it’s built on, and why the performance trade-off is not a bug — it’s the feature.

2. The Algorithm: Thompson NFA vs. Backtracking

Most popular regex engines — PCRE, Python’s re, Perl — use a backtracking depth-first search approach. When a match fails, the engine backtracks to a previous state and tries an alternative path. This is powerful and expressive, but it has a critical flaw: worst-case complexity is O(2^n), where n is the length of the input string.

Go, on the other hand, uses a Thompson NFA (Nondeterministic Finite Automaton) approach, which guarantees O(n) linear time complexity regardless of the input pattern or data.

Here’s what this means in practice. Consider matching the pattern (a+)+ against a string like "aaaaaaaaab". A backtracking engine will explore an exponential number of paths before concluding there’s no match. A Thompson NFA walks through the input exactly once.

For example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
package main

import (
"fmt"
"regexp"
"time"
)

func main() {
// This pattern is catastrophic for backtracking engines
re := regexp.MustCompile(`(a+)+b`)
input := "aaaaaaaaaaaaaaaaaaaaac" // no 'b' at end

start := time.Now()
matched := re.MatchString(input)
fmt.Printf("matched: %v, elapsed: %v\n", matched, time.Since(start))
// Output: matched: false, elapsed: ~microseconds (always linear)
}

Run it: Better Go Playground

From this output, we can observe:

  1. Go’s regex returns nearly instantly, regardless of input length.
  2. A PCRE-based engine on the same pattern would hang for seconds or minutes on long inputs.
  3. Linear time is guaranteed — there is no “bad” input for Go’s regex.

3. Why This Matters: The ReDoS Threat

ReDoS (Regular Expression Denial of Service) is a real-world attack vector. An attacker crafts input specifically designed to trigger catastrophic backtracking in a regex engine, effectively hanging the server.

The most notorious example: in 2019, a single misconfigured regex caused a Cloudflare outage that took 80% of their proxied traffic offline for 27 minutes. The pattern (?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\|-|+)+[)];?((?:\s|-|~|!|{}||||+).(?:.=.*))` triggered catastrophic backtracking on certain inputs.

Go simply cannot be affected by this class of vulnerability. The Thompson NFA approach makes ReDoS structurally impossible.

4. The Performance Costs

Choosing safety over peak performance does come with real costs. Go’s regex engine is slower than PCRE for three specific reasons:

1. Pure Go implementation.
Go’s regexp is written entirely in Go, without the decades of C-level micro-optimizations that PCRE has accumulated. There’s no SIMD intrinsics or assembly-level tricks by default.

2. UTF-8 parsing overhead.
Go treats strings as UTF-8 natively. The regex engine frequently decodes bytes into rune units for Unicode-aware matching. This adds substantial overhead compared to byte-level processing.

3. NFA state queue management.
The Thompson NFA must maintain a set of active states at each input position. This involves memory allocation and slice operations that add up across large inputs.

Here’s a simple benchmark to see the difference:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
package main

import (
"regexp"
"testing"
)

var re = regexp.MustCompile(`\b\w+\b`)
var input = []byte("The quick brown fox jumps over the lazy dog")

func BenchmarkFindAll(b *testing.B) {
for i := 0; i < b.N; i++ {
re.FindAll(input, -1)
}
}
// run
// BenchmarkFindAll-8 500000 2341 ns/op 480 B/op 1 allocs/op

5. When You Need Raw Speed

If your application genuinely requires PCRE-level performance and you can guarantee your inputs are not adversarial, Go does offer an escape hatch. The golang.org/x/exp/regexp package, and third-party packages like github.com/dlclark/regexp2, expose PCRE-compatible engines.

Note that using these trades away the ReDoS safety guarantee. Only do this when:

  • The regex patterns are fully controlled by you (not user-supplied)
  • The inputs are bounded in size
  • Performance profiling has confirmed regex as a bottleneck

6. The Design Philosophy

Go’s choice here reflects a broader principle woven throughout the language:

Don’t chase peak performance ceilings — protect the safety floor.

This same philosophy appears in goroutine stack growth (safe, not maximally fast), the garbage collector (predictable latency, not zero cost), and the memory model (defined behavior, even for data races that would be UB in C).

The Go team made a conscious decision: a program that takes 30 microseconds instead of 1 microsecond for a regex match is acceptable. A program that hangs for 27 minutes and takes down a global CDN is not.

7. Conclusion

Go’s regex engine is not slow because of poor engineering — it’s slow because it chose a fundamentally safer algorithm. The Thompson NFA guarantees linear-time matching at the cost of some peak throughput. For the vast majority of server-side applications, this trade-off is exactly right.

Have you encountered performance issues with Go’s regex in production? Have you ever had to reach for a PCRE-compatible alternative? Share your experience in the comments!


More in the “You Should Know In Golang” series:
https://wesley-wei.medium.com/list/you-should-know-in-golang-e9491363cd9a

go:fix inline You Should Know in Golang Stack Allocation Optimizations You Should Know in Golang

Comments

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×