Backpressure rarely shows up in textbooks — but it is an integral part of enterprise production systems.

In Part I of the Mastering Backpressure in Go series, we explored why unbounded concurrency quietly breaks otherwise correct Go systems, and why backpressure in enterprise software must be treated as a design requirement, not an optimization.
Part II focuses on a more specific problem:
How do you strictly bound in‑flight work, block producers at that limit, and still guarantee a clean, deterministic drain on shutdown — without sacrificing throughput when load is present?
This turns out to be a distinct concurrency problem, and one that most common Go primitives aren’t designed to handle directly.
The Problem: Bounded Work Under Sustained Pressure
Consider an AMQP stream consumer.
Messages are delivered asynchronously via callbacks. The broker continues streaming data as long as the consumer keeps reading, and acknowledgements are decoupled from delivery. There is no built‑in mechanism to limit how many messages may be in flight at once.
That makes flow control entirely the responsibility of the consumer.
If message delivery outpaces downstream processing, in-flight work grows silently. The consumer may continue reading from the stream and buffering internally, appearing healthy while memory pressure increases and shutdown behavior becomes increasingly difficult to reason about.
What the system needs is structure that provides the following:
- Blocking at a ceiling, limiting in‑flight work
- Immediate reuse of freed capacity as work completes
- Runtime adjustment of that ceiling
- (Optionally) deterministic drain to zero during shutdown
This is not just about limiting concurrency. It’s about controlling pressure.
Without a hard boundary, overload never propagates outward. It accumulates internally instead, where it’s hardest to observe.
The Core Idea: Enforce Capacity and Maintain It
BlockingLatch is designed to do two things at once: enforce a hard ceiling so load never exceeds the system’s capacity and sustain constant pressure at that ceiling whenever demand exists.
That distinction is important.
The goal isn’t to smooth bursts or absorb spikes. The goal is to make overload visible immediately and keep the system operating at its intended limit for as long as work is available.
Once capacity frees up, new work should proceed immediately. No lag. No idle time.
Channels Are the Obvious Choice
Buffered channels are the typical way to limit concurrency in Go. With a fixed capacity, sends block when the channel is full and unblock as work completes. For systems with a static capacity, that can work well.
The difficulty starts when capacity is not static. In some systems, the allowable level of in-flight work changes at runtime: we may want to raise or lower pressure in response to load, resource availability, or downstream health without restarting the process.
Channel capacity is fixed at creation time, and changing it means introducing a second channel, draining the first, and carefully coordinating the handoff while work is still flowing. At that point, the channel alone can’t meet our requirements.
Channels still do exactly what they were designed to do. They just aren’t the best fit when the ceiling itself needs to be adjusted while pressure is present.
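To make the static-capacity approach concrete, here is a minimal sketch of the buffered-channel semaphore pattern described above. The names `run`, `limit`, and `jobs` are illustrative, not from the original:

```go
package main

import (
	"fmt"
	"sync"
)

// run caps in-flight work with a buffered channel acting as a
// semaphore and reports the peak concurrency observed.
// Note: the channel's capacity is fixed at creation time.
func run(limit, jobs int) int {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var mu sync.Mutex
	inFlight, peak := 0, 0

	for i := 0; i < jobs; i++ {
		sem <- struct{}{} // blocks while all limit tokens are held
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			// ... do the actual work here ...

			mu.Lock()
			inFlight--
			mu.Unlock()
			<-sem // frees one slot for a blocked producer
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak in-flight:", run(3, 20))
}
```

The observed peak never exceeds `limit`, but there is no way to change `limit` once the channel exists, which is exactly the constraint discussed above.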
Sync Package to the Rescue
A sync.Cond lets goroutines wait until a shared condition becomes true, without spinning. It’s always paired with a lock that protects the underlying state.
lock := new(sync.Mutex)
cond := sync.NewCond(lock)
Key behavior:
- Wait() atomically releases the lock and suspends execution
- Signal() wakes one waiting goroutine
- Broadcast() wakes all waiting goroutines
Correct usage always checks the condition in a loop:
lock.Lock()
for !conditionIsTrue() {
	cond.Wait()
}
// condition holds
lock.Unlock()
This pattern allows multiple goroutines to sleep on the same invariant, wake selectively, and re‑validate safety conditions after resuming.
That capability maps exactly to our situation.
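A small runnable illustration of that pattern, with waiters sleeping on one invariant and Broadcast waking them all to re-check it. The `gate` type and its method names are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// gate blocks callers until open flips the flag; every waiter
// re-checks the condition in a loop after being woken.
type gate struct {
	mu    sync.Mutex
	cond  *sync.Cond
	ready bool
}

func newGate() *gate {
	g := &gate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

func (g *gate) wait() {
	g.mu.Lock()
	for !g.ready { // always re-validate after waking
		g.cond.Wait() // atomically unlocks, sleeps, relocks on wake
	}
	g.mu.Unlock()
}

func (g *gate) open() {
	g.mu.Lock()
	g.ready = true
	g.mu.Unlock()
	g.cond.Broadcast() // wake every waiter; each re-checks ready
}

func main() {
	g := newGate()
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			g.wait()
			fmt.Println("goroutine", id, "released")
		}(i)
	}
	g.open()
	wg.Wait()
}
```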
BlockingLatch: Making Capacity Explicit
When capacity changes at runtime, it can’t be a static property of a data structure. BlockingLatch is a small abstraction from the Arke codebase that treats capacity as state, not structure.
type BlockingLatch struct {
	count  uint
	max    uint
	lock   *sync.Mutex
	notMax *sync.Cond
}

func (l *BlockingLatch) Increment()
func (l *BlockingLatch) Decrement()
func (l *BlockingLatch) SetMax(max uint)
func (l *BlockingLatch) WaitForEmpty()

We store count and max together behind a single lock, and every producer checks the same invariant before proceeding. When SetMax is called, blocked producers wake and re‑evaluate against the new limit. No buffers are replaced. No channels are swapped.
Capacity is no longer inferred from channels or buffers. Producers block because the system is at its limit, not because a queue happens to be full. In Arke, this keeps backpressure precise even as limits change in response to downstream health or available resources.
All coordination flows through one rule: in‑flight work never exceeds the current ceiling. At the core of this implementation is sync.Cond.
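A minimal sketch of how those signatures might be implemented around sync.Cond. The constructor name and method bodies here are inferred from the invariant described above, not copied from the Arke source, which may differ in detail:

```go
package main

import (
	"fmt"
	"sync"
)

// BlockingLatch enforces one invariant: count never exceeds max.
type BlockingLatch struct {
	count  uint
	max    uint
	lock   *sync.Mutex
	notMax *sync.Cond
}

func NewBlockingLatch(max uint) *BlockingLatch {
	lock := new(sync.Mutex)
	return &BlockingLatch{max: max, lock: lock, notMax: sync.NewCond(lock)}
}

// Increment blocks while the latch is at its ceiling.
func (l *BlockingLatch) Increment() {
	l.lock.Lock()
	for l.count >= l.max {
		l.notMax.Wait() // releases the lock; re-checks on wake
	}
	l.count++
	l.lock.Unlock()
}

// Decrement frees one unit of capacity and wakes waiters.
func (l *BlockingLatch) Decrement() {
	l.lock.Lock()
	if l.count > 0 {
		l.count--
	}
	l.lock.Unlock()
	l.notMax.Broadcast() // producers and WaitForEmpty re-evaluate
}

// SetMax changes the ceiling at runtime: raising it releases
// blocked producers immediately; lowering it lets work drain.
func (l *BlockingLatch) SetMax(max uint) {
	l.lock.Lock()
	l.max = max
	l.lock.Unlock()
	l.notMax.Broadcast()
}

// WaitForEmpty blocks until in-flight work drains to zero.
func (l *BlockingLatch) WaitForEmpty() {
	l.lock.Lock()
	for l.count > 0 {
		l.notMax.Wait()
	}
	l.lock.Unlock()
}

func main() {
	l := NewBlockingLatch(2)
	l.Increment()
	l.Increment() // at the ceiling; a third Increment would block
	l.SetMax(3)
	l.Increment() // proceeds under the raised ceiling
	fmt.Println("in flight:", l.count) // prints "in flight: 3"
}
```

Using Broadcast rather than Signal is deliberate: a single wake-up could go to a goroutine whose condition is still false (for example, a producer while only WaitForEmpty can make progress), so every waiter re-checks its own predicate.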
Real‑World Use: AMQP Stream Prefetch
Arke uses BlockingLatch to enforce broker prefetch exactly:
latch := util.NewBlockingLatch(uint(source.GetPrefetchCount()))
handleMessages := func(ctx stream.ConsumerContext, message *amqp.Message) {
	latch.Increment() // blocks while in-flight work is at the ceiling
	messageChannel <- message
	// stm is the wrapped stream message from the surrounding Arke code;
	// both ack and nack release one unit of latch capacity.
	stm.Ack = func() { latch.Decrement() }
	stm.Nack = func() { latch.Decrement() }
}
This ensures that no more than prefetchCount messages are ever in flight, and that capacity is reused immediately as acknowledgements arrive.
Backpressure propagates to the broker boundary instead of being absorbed internally.
When This Pattern Fits
This pattern fits anywhere the system needs a hard limit on in‑flight work, immediate backpressure instead of buffering, and an explicit, observable notion of capacity.
It works well for worker pools and rate limiting, where concurrency must be capped while still allowing the system to wait deterministically for work to finish. Incrementing before work begins and decrementing on completion enforces the ceiling, while WaitForEmpty provides a clean synchronization point during shutdown.
The same approach applies to outbound HTTP request throttling. Bounding the number of in‑flight requests forces pressure to propagate immediately rather than accumulating in queues, and the limit can be adjusted at runtime based on upstream health or observed latency.
In batch pipelines, BlockingLatch fits naturally between a fast producer and a slower downstream stage. When the downstream stage is saturated, the producer blocks, yielding backpressure without introducing an intermediate buffer that obscures where capacity is consumed.
Finally, the pattern works well for connection or file descriptor pools. Acquiring and releasing capacity ensures the system never exceeds its footprint, while WaitForEmpty guarantees all resources are returned before exit.
In short, this approach fits systems that must remain correct under sustained pressure, where in-flight work must be provably bounded and that bound needs to change at runtime. Capacity is enforced directly, producers block immediately at the limit, and the ceiling can be raised or lowered in response to observed conditions rather than fixed at startup.
Closing Thought
Backpressure isn’t about controlling goroutines — it’s about enforcing boundaries.
Once work crosses a boundary, the system has accepted responsibility for it. Structures like BlockingLatch force that responsibility to surface immediately, instead of being deferred into buffers, cleanup phases, or operational surprises.
sync.Cond is often forgotten, but it’s just waiting for problems that actually demand this level of precision.
The BlockingLatch implementation discussed here is part of the open‑source Arke project:
https://github.com/sassoftware/arke
Mastering Backpressure in Go (Part II) was originally published in Level Up Coding on Medium.