FFT in C++
How Shazam Identifies Any Song in Seconds
Gealleh · 18 min read
You hold your phone up to a speaker. A few seconds of music plays. Shazam returns the song title, artist, and album in under ten seconds, having compared what it heard against a database of tens of millions of songs.
This is not machine learning. There is no neural network listening to the music and guessing what it sounds like. Shazam works by converting audio into a mathematical fingerprint and looking up that fingerprint in a hash table. The fingerprint is designed to be robust to noise, compression, and recording quality. Two recordings of the same song, one from a studio and one from a phone held near a bar speaker, produce fingerprints close enough to match.
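To make the lookup idea concrete, here is a minimal sketch of a hash-table index. It is a toy model and not Shazam's actual scheme: the class name `FingerprintIndex`, the use of 32-bit hashes, and the simple vote count are all assumptions made for illustration. It only shows why a noisy query can still match, since hashes lost to noise are skipped and the surviving ones still point at the right song.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy index: maps each fingerprint hash to the songs that contain it.
class FingerprintIndex {
public:
    // Register every hash of a song under that song's ID.
    void add_song(const std::string& song_id,
                  const std::vector<std::uint32_t>& hashes) {
        for (std::uint32_t h : hashes) {
            index_[h].push_back(song_id);
        }
    }

    // Return the song sharing the most hashes with the query,
    // or an empty string if no hash matches at all.
    std::string best_match(const std::vector<std::uint32_t>& query) const {
        std::unordered_map<std::string, int> votes;
        for (std::uint32_t h : query) {
            auto it = index_.find(h);
            if (it == index_.end()) continue;  // hash corrupted by noise: skip it
            for (const std::string& id : it->second) {
                ++votes[id];
            }
        }
        std::string best;
        int best_votes = 0;
        for (const auto& entry : votes) {
            if (entry.second > best_votes) {
                best = entry.first;
                best_votes = entry.second;
            }
        }
        return best;
    }

private:
    std::unordered_map<std::uint32_t, std::vector<std::string>> index_;
};
```

Each lookup is an average constant-time hash probe, so the cost of a query depends on how many hashes the recording produces, not on how many songs are in the database. A production system would need more than a raw vote count to reject accidental collisions.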
The mathematical foundation of the entire system is the Fast Fourier Transform (FFT): an algorithm that converts a signal of n samples from the time domain into the frequency domain in O(n log n) time. Understanding the FFT helps you understand not just Shazam, but also the Fourier-family transforms behind JPEG compression, MP3 encoding, noise cancellation, medical imaging, radar, and sonar.
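As a reference point for what "time domain to frequency domain" means, the following is a sketch of the naive discrete Fourier transform, which computes the same result as an FFT but in O(n²) time. This is not the FFT itself; the function names `naive_dft` and `dominant_bin` are illustrative choices.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive DFT: X[k] = sum over t of x[t] * exp(-2*pi*i*k*t / n).
// Two nested loops over n give O(n^2) work; an FFT yields the same
// spectrum in O(n log n).
std::vector<std::complex<double>> naive_dft(const std::vector<double>& samples) {
    const std::size_t n = samples.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> spectrum(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle =
                -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            sum += samples[t] * std::polar(1.0, angle);  // unit phasor at this angle
        }
        spectrum[k] = sum;
    }
    return spectrum;
}

// Index of the strongest bin in the first half of the spectrum.
// For real-valued input the second half mirrors the first.
std::size_t dominant_bin(const std::vector<std::complex<double>>& spectrum) {
    std::size_t best = 0;
    double best_magnitude = -1.0;
    for (std::size_t k = 0; k < spectrum.size() / 2; ++k) {
        const double magnitude = std::abs(spectrum[k]);
        if (magnitude > best_magnitude) {
            best_magnitude = magnitude;
            best = k;
        }
    }
    return best;
}
```

For example, passing in 64 samples of a sine wave that completes exactly 5 cycles over the window puts the strongest bin at index 5. The cost is the problem: each of the n output bins needs n multiplications, which is the work the FFT avoids.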
This article explains how FFT works from first principles and builds a complete C++ implementation.
The Core Idea: Time Domain vs Frequency Domain
Every sound is a wave: air pressure oscillating over time. When you record audio, you capture thousands of these pressure measurements per second (44,100 per second for CD-quality audio). This is the time domain: a sequence of amplitude…