Keeping Clean Data in the CPU and Disk
Ian Mihura · 4 min read
High-performance engineering depends on memory layout. Software developers call it AoS vs SoA, and data engineers call it Row vs Column tables, but it is the same concept.
Array of Structs (AoS) / Row Tables
In an Array of Structs, objects are stored contiguously in memory, and all the fields of each object sit together:
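#include <stdint.h>

typedef struct {
    uint64_t timestamp;  // 8 bytes
    float price;         // 4 bytes
    uint32_t volume;     // 4 bytes
} Order;

Order history_aos[1024];  // 1024 Orders, stored back to back

[Figure: history_aos in memory]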
How this works
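The CPU fetches data from memory in fixed-size chunks called cache lines, typically 64 bytes. Accessing history_aos[0].price pulls the whole cache line, and with it the entire Order, into L1 cache. This is fast for single-record CRUD operations but slow for bulk operations that need only one field.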
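To make the numbers concrete, here is a minimal sketch that inspects the layout of Order. It assumes a 64-byte cache line (CACHE_LINE_BYTES is an illustrative constant, not something queried from the CPU) and a mainstream 64-bit ABI, where the struct occupies 16 bytes with no padding. The helper names orders_per_cache_line and print_order_layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 64-byte cache lines, common on current x86-64 CPUs. */
#define CACHE_LINE_BYTES 64

typedef struct {
    uint64_t timestamp;
    float    price;
    uint32_t volume;
} Order;

/* Documents the layout assumption; compilation fails if the ABI differs. */
static_assert(sizeof(Order) == 16, "sketch assumes a 16-byte Order");

/* Number of whole Order records that fit in one cache line. */
size_t orders_per_cache_line(void) {
    return CACHE_LINE_BYTES / sizeof(Order);
}

/* Print the struct size, each field's byte offset, and records per line. */
void print_order_layout(void) {
    printf("sizeof(Order) = %zu bytes\n", sizeof(Order));
    printf("offsets: timestamp=%zu price=%zu volume=%zu\n",
           offsetof(Order, timestamp),
           offsetof(Order, price),
           offsetof(Order, volume));
    printf("orders per %d-byte line = %zu\n",
           CACHE_LINE_BYTES, orders_per_cache_line());
}
```

Under those assumptions one cache line holds four Orders, so if the array starts on a cache-line boundary, reading history_aos[0].price also brings in history_aos[1] through history_aos[3].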
In a database, this is row-oriented storage, the layout typical of Online Transaction Processing (OLTP) systems. A single write can lock the whole row, and ACID transactions are easier to implement because each record's data is physically together, which simplifies atomic updates (though it does not guarantee atomicity by itself).
This mirrors the design philosophy of Object-Oriented Programming (OOP), where each instance of a class maps to one real-world (or domain) object and keeps its fields together.
But calculating an average for 1 million rows fetches every field into the cache for every record. In our example you waste 66%…
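The bulk case can be sketched as a plain loop over the array. This is an illustration under the same assumption of a 16-byte Order, not a benchmark; average_price_aos and unused_byte_fraction are hypothetical names. The loop reads only price, but every cache line it touches also carries the timestamp and volume of each record. Counted by fields, two of three go unused; counted by bytes with this layout, 12 of every 16 fetched bytes are never read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t timestamp;
    float    price;
    uint32_t volume;
} Order;

/* Average price over an Array of Structs.
   Only `price` is read, yet each record's other fields share its cache line. */
double average_price_aos(const Order *orders, size_t count) {
    assert(count > 0);  /* an empty array has no average */
    double sum = 0.0;   /* accumulate in double to limit float rounding error */
    for (size_t i = 0; i < count; i++) {
        sum += orders[i].price;
    }
    return sum / (double)count;
}

/* Fraction of each record's bytes that the averaging loop never reads. */
double unused_byte_fraction(void) {
    return 1.0 - (double)sizeof(float) / (double)sizeof(Order);
}
```

Calling average_price_aos(history_aos, 1024) walks all 16 KiB of the array to consume 4 KiB of prices.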