What a side‑by‑side gRPC benchmark reveals about modern concurrency, scheduling, and the real cost of performance

C++ didn’t lose its edge. It didn’t suddenly become slow, unsafe, or obsolete.
What changed is that languages like Go — and increasingly Rust — got good enough that raw performance differences are often small, while the cost of building, operating, and maintaining equivalent systems in C++ remains high.
That shift matters more than most benchmarks suggest.
We benchmarked the same production gRPC proxy written twice: once in Go, once in C++. The results don’t show a clear performance winner — they show why runtime design, tail latency, and engineering economics now drive the decision more than peak throughput ever did.
The experiment
We recently open sourced Arke — a production gRPC proxy for message brokers written in Go.
The predictable response was:
“Sure, but how much faster would this be in C++?”
So we answered it directly.
We ported Arke to C++, kept the .proto files identical, and ran both implementations through the same benchmark harness. No synthetic loops, no micro‑benchmarks — just real gRPC clients exercising real concurrency and I/O.
What Arke does (briefly)
Arke sits between your application and a message broker (RabbitMQ, etc.) and exposes a uniform gRPC API regardless of backend.
It provides three services:
Producer — Connect, PublishOne, Publish (bidi), Disconnect
Consumer — Connect, Consume (bidi), SourceStats, Disconnect
Healthz — Check (bidi heartbeat)
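With the method names above, the shared .proto contract might look roughly like this. This is a hypothetical sketch: only the service and method names come from the list above; the request/response message types are placeholders, not Arke's actual definitions.

```protobuf
syntax = "proto3";

package arke;

// Message definitions omitted — all names below the service level are
// illustrative placeholders.

service Producer {
  rpc Connect(ConnectRequest) returns (ConnectResponse);
  rpc PublishOne(PublishRequest) returns (PublishResponse);
  rpc Publish(stream PublishRequest) returns (stream PublishResponse); // bidi
  rpc Disconnect(DisconnectRequest) returns (DisconnectResponse);
}

service Consumer {
  rpc Connect(ConnectRequest) returns (ConnectResponse);
  rpc Consume(stream ConsumeRequest) returns (stream Delivery); // bidi
  rpc SourceStats(StatsRequest) returns (StatsResponse);
  rpc Disconnect(DisconnectRequest) returns (DisconnectResponse);
}

service Healthz {
  rpc Check(stream Heartbeat) returns (stream Heartbeat); // bidi heartbeat
}
```

Because both implementations are generated from the same contract, any measured difference comes from the runtime underneath, not the API surface.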
The differences are strictly runtime and ecosystem:
- Go: amqp091-go, goroutines, zerolog
- C++: librabbitmq, std::thread, spdlog
Both use the same generated gRPC interfaces.
Benchmark setup (short version)
- Apple M1 Pro, localhost RabbitMQ
- grpc‑go vs grpc‑cpp
- Warm‑up runs, real clients
- Throughput plus p50 / p95 / p99 / p99.9 latency recorded
The goal wasn’t micro‑performance. It was understanding behavior under load.
Scenario 1 — Connection‑heavy paths
Producer.Connect is called over existing gRPC connections and negotiates AMQP sessions with the broker.
What happened
- At low concurrency, Go and C++ performed similarly.
- At moderate concurrency, results fluctuated.
- At higher concurrency, Go consistently showed better tail latency.
At 10 workers (p99.9 latency):
- Go: ~1.4 ms
- C++: ~3.5 ms
With raw throughput this similar, the interesting signal is tail behavior. Goroutines park cooperatively in user space; std::thread workers block in the kernel. Under burst load, that difference matters.
Scenario 2 — Steady‑state publishing
Producer.PublishOne represents the typical case: established connections publishing messages as fast as possible.
What happened
- C++ had a slight edge at very low concurrency.
- At ~10 workers, both saturated RabbitMQ at ~16K requests/sec.
- Beyond that point, scheduling effects dominated.
Key point:
Once the broker becomes the bottleneck, language choice largely disappears from the equation.
The saturation point is a property of the environment, not the language: here it is set by RabbitMQ, not by either runtime.
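Assuming the 10 workers behave as a closed loop (each keeping exactly one request in flight), Little's law ties the observed throughput to mean latency at saturation — a quick back-of-envelope check that both implementations are hitting the same broker-imposed wall:

```latex
W = \frac{L}{\lambda} = \frac{10\ \text{in-flight requests}}{16{,}000\ \text{req/s}} \approx 0.63\ \text{ms per request}
```

If both languages saturate at the same throughput with the same worker count, their mean latencies at that point are necessarily the same too; only the tails can differ.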
Scenario 3 — Pure gRPC overhead
Healthz.Check removes the broker entirely, isolating the runtime.
What happened
- At very low concurrency, C++ was faster.
- At moderate and high concurrency, Go pulled clearly ahead on tail latency.
At 10 workers (p99.9 latency):
- Go: ~5.8 ms
- C++: ~14.6 ms
This gap exists with no I/O, minimal allocation pressure, and no GC involvement. It comes directly from scheduling and blocking behavior.
How big is the performance difference?
Smaller than it appears in isolation.
Across the benchmarks, throughput numbers were often close, peak rates frequently aligned, and measured differences commonly fell within single‑digit percentages. In many cases, both implementations reached similar limits despite very different runtime designs.
Which leads to the more consequential point.
The cost that does differ significantly
Performance gaps have narrowed, but C++ development and maintenance costs have not. Extracting marginal gains typically requires more complex concurrency, stricter lifetime management, and ongoing vigilance against subtle correctness issues. These costs don’t appear in benchmarks, but they compound over time — and when performance gains are small, they dominate the trade‑off.
What this actually shows
This benchmark isn’t saying “Go is faster than C++.”
It shows something more important:
- C++ excels when absolute control matters
- Go excels when concurrency and tail‑latency behavior dominate
- Rust targets a similar space, trading simplicity for stronger safety guarantees
Modern systems fail less often because of raw speed, and more often because of scheduling pathologies, queue buildup, and unpredictable tails.
That’s where Go — and increasingly Rust — change the equation.
Should you rewrite a C++ system?
Maybe — if the gains justify the cost.
In most systems, the performance difference is small. Throughput converges, saturation points align, and wins are often negligible.
The development and maintenance cost is not.
If C++’s added complexity isn’t buying you meaningful gains in latency, reliability, or capacity, then Go or Rust can deliver near‑equivalent performance at a much lower long‑term cost.
The takeaway
The modern trade‑off isn’t speed versus safety.
It’s marginal performance gains versus ongoing engineering cost.
C++ still defines the floor — but Go and Rust are reshaping the ceiling.
C++ Didn’t Get Slower — Go Got Better was originally published in Level Up Coding on Medium.