Lock-Free Queue: Building High-Performance Systems Without Locks
Mukesh Choudhary · 3 min read
What is a Lock-Free Queue?
A Lock-Free Queue (LFQueue) is a concurrent data structure that allows multiple threads to push and pop elements without using locks (mutexes).
Instead of blocking threads, it uses atomic operations (like Compare-And-Swap) to ensure correctness.
Goal: high performance + low latency + no blocking
Why Lock-Free?
Traditional queues use mutex:
- Thread waits → lock acquired → work → unlock
Problems:
- Context switching
- Thread blocking
- Unpredictable latency
Lock-free approach:
- No waiting
- Threads retry instead of blocking
- Better for real-time systems
Types of Lock-Free Queues
1. SPSC (Single Producer Single Consumer)
- One producer thread
- One consumer thread
- Simplest and fastest
Used in pipelines (very common in trading)
2. MPSC (Multi Producer Single Consumer)
- Multiple producers
- One consumer
Used when many threads generate data but one processes
3. MPMC (Multi Producer Multi Consumer)
- Multiple producers + consumers
- Most complex
Used in general concurrent systems
How It Works (Concept)
- Uses atomic variables
- Avoids locks
- Uses CAS (Compare-And-Swap)
- Threads retry if conflict occurs
No thread is blocked → system keeps progressing
Simple Example (SPSC Queue — C++)
#pragma once

#include <atomic>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

#include <pthread.h>

/// Branch prediction hints.
#define LIKELY(x) __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/// Check condition and exit if not true.
inline auto ASSERT(bool cond, const std::string &msg) noexcept {
  if (UNLIKELY(!cond)) {
    std::cerr << "ASSERT : " << msg << std::endl;
    exit(EXIT_FAILURE);
  }
}

inline auto FATAL(const std::string &msg) noexcept {
  std::cerr << "FATAL : " << msg << std::endl;
  exit(EXIT_FAILURE);
}

namespace Common {
  template<typename T>
  class LFQueue final {
  public:
    explicit LFQueue(std::size_t num_elems) :
        store_(num_elems, T()) /* pre-allocation of vector storage. */ {
    }

    auto getNextToWriteTo() noexcept {
      return &store_[next_write_index_];
    }

    auto updateWriteIndex() noexcept {
      next_write_index_ = (next_write_index_ + 1) % store_.size();
      num_elements_++;
    }

    auto getNextToRead() const noexcept -> const T * {
      return (size() ? &store_[next_read_index_] : nullptr);
    }

    auto updateReadIndex() noexcept {
      next_read_index_ = (next_read_index_ + 1) % store_.size(); // wrap around at the end of the container.
      ASSERT(num_elements_ != 0, "Read an invalid element in:" + std::to_string(pthread_self()));
      num_elements_--;
    }

    auto size() const noexcept {
      return num_elements_.load();
    }

    /// Deleted default, copy & move constructors and assignment-operators.
    LFQueue() = delete;
    LFQueue(const LFQueue &) = delete;
    LFQueue(LFQueue &&) = delete;
    LFQueue &operator=(const LFQueue &) = delete;
    LFQueue &operator=(LFQueue &&) = delete;

  private:
    /// Underlying container of data accessed in FIFO order.
    std::vector<T> store_;

    /// Atomic trackers for the next index to write new data to and read new data from.
    std::atomic<size_t> next_write_index_ = {0};
    std::atomic<size_t> next_read_index_ = {0};
    std::atomic<size_t> num_elements_ = {0};
  };
}
When to Use Lock-Free Queues
Use when:
- Low latency is critical
- High throughput required
- Threads must not block
- Real-time systems (trading, gaming, networking)
Avoid when:
- Simplicity matters more than performance
- Low contention system
- The team doesn’t understand concurrency deeply yet
Real Use in Trading Systems
In HFT systems:
- Market Data Thread → pushes data into queue
- Strategy Thread → consumes data
- Order Thread → sends orders
Each stage uses SPSC queues between components
Why?
- No blocking
- Ultra-fast communication
- Predictable latency
Advantages
1. No Blocking
Threads never wait
2. Low Latency
Critical for trading systems
3. High Throughput
Handles millions of ops/sec
4. Scalable
Works well with multiple cores
Disadvantages
1. Complex to Implement
Harder than mutex-based design
2. Debugging is Difficult
Race conditions are tricky
3. CPU Usage Can Increase
Retry loops (busy-waiting)
4. Memory Ordering Complexity
Need deep understanding of atomics
Lock-Free vs Mutex Queue
| Feature | Lock-Free Queue | Mutex Queue |
| --- | --- | --- |
| Latency | Very Low | Higher |
| Blocking | No | Yes |
| Complexity | High | Low |
| Predictability | High | Medium |
Best Practices
- Use SPSC wherever possible (fastest)
- Avoid dynamic memory allocation
- Align data for cache (cache line awareness)
- Use correct memory order (acquire/release)
- Benchmark always
Resources to Learn
Books
- Building Low Latency Applications with C++ — Sourav Ghosh
- C++ Concurrency in Action — Anthony Williams
Topics to Explore
- Atomic operations (std::atomic)
- Memory ordering (acquire/release)
- Compare-And-Swap (CAS)
- CPU cache & false sharing
Final Thoughts
Lock-Free Queues are one of the most powerful tools in high-performance systems. They remove bottlenecks caused by locks and allow systems to scale efficiently.
But they come with complexity.
Use them where performance truly matters.
No locks. No waiting. Just speed.