This AI Compressed 'All Human Cooking' Into 2 Megabytes
A London startup trained an AI on 4.1 million recipes across seven languages—and the whole thing is smaller than a song file.
By Jose Antonio Lanz | Edited by Guillermo Jimenez | May 28, 2026 | 4 min read
In brief
- KAIKAKU.AI published Epicure, a family of three AI ingredient models trained on 4.14 million multilingual recipes.
- The model doesn't store recipes—it stores what was learned from them, letting users navigate cooking knowledge mathematically.
- Three variants—Cooc, Chem, and Core—sit at different points on a recipe-context vs. flavor-chemistry spectrum, each answering a slightly different culinary question from the same 2MB file.
Josef Chen says he compressed all of human cooking into two megabytes. That's a bold claim. It also checks out.
Chen, co-founder and CEO of London food AI startup KAIKAKU.AI, published a paper on arXiv this week with researcher Jakub Radzikowski. It presents Epicure, a set of three AI models trained on 4.14 million recipes pulled from 11 datasets across seven languages. The result: a map of 1,790 ingredients, each described by 300 numbers, that fits under a typical email attachment limit with room to spare.
"4.1M recipes. 7 languages. 1,790 ingredients. 300 dimensions," Chen wrote on X. "All of human cooking compressed into 2 megabytes."
Launching our new paper on arXiv: we trained the largest multilingual food model ever built.
— Josef Chen (@josefchen) May 26, 2026
It's not storing recipes
Before you imagine a two-megabyte USB stick jammed with stir-fry instructions, the model doesn't store a single recipe. Those two megabytes are more a coordinate table than a cookbook.
Think of it as a map. Every ingredient gets a precise location based on how it behaves across millions of real dishes worldwide. The math is blunt: 1,790 ingredients × 300 numbers per ingredient × 4 bytes each = 2,148,000 bytes, which is about 2.15 megabytes in decimal units, or 2.05 if a megabyte is counted as 1,048,576 bytes. Those numbers encode which ingredients appear together, which share flavor compounds, and which belong to the same culinary tradition. Once the model learns all that from the recipes, the recipes can go. The knowledge lives in the coordinates.
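The size estimate is easy to check. The short sketch below uses only the figures quoted in the article (1,790 ingredients, 300 numbers each, 4 bytes per number); the variable names are ours, and the only thing added is the distinction between decimal megabytes and binary mebibytes.

```python
# Figures as quoted in the article; 4 bytes per value corresponds to 32-bit floats.
n_ingredients = 1_790
n_dims = 300
bytes_per_value = 4

total_bytes = n_ingredients * n_dims * bytes_per_value
size_mb = total_bytes / 1_000_000       # decimal megabytes (10^6 bytes)
size_mib = total_bytes / (1024 ** 2)    # binary mebibytes (2^20 bytes)

print(f"{total_bytes:,} bytes = {size_mb:.2f} MB = {size_mib:.2f} MiB")
# prints: 2,148,000 bytes = 2.15 MB = 2.05 MiB
```

Either way of counting lands at roughly two megabytes for the raw coordinate table, before any file-format overhead.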
This is essentially the same trick word2vec pulled on language back in 2013, when Google researchers showed that you could do arithmetic with meaning. Epicure does that for food. Take beef, point it toward America, and you’ll get bread, lettuce, maybe beer. Point it toward Southeast Asia and the model stops thinking about burgers and grills and starts thinking about soy sauce, ginger, and sesame oil.
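To make the "arithmetic with meaning" idea concrete, here is a toy sketch in plain Python. Everything in it is invented for illustration: the three-number vectors, the two cuisine directions, and the `steer` helper are hypothetical and are not taken from Epicure, which uses 300 learned numbers per ingredient. It only shows the general pattern of nudging a vector in a direction and ranking neighbors by cosine similarity.

```python
import math

# Hypothetical 3-number "embeddings", made up for this example.
# Axes, loosely: [meatiness, American pantry, Southeast Asian pantry].
VECTORS = {
    "beef":      [1.0, 0.2, 0.2],
    "bread":     [0.1, 1.0, 0.0],
    "lettuce":   [0.0, 0.9, 0.1],
    "soy sauce": [0.1, 0.0, 1.0],
    "ginger":    [0.0, 0.1, 0.9],
}

# Hypothetical cuisine directions along the toy axes.
DIRECTIONS = {
    "american":        [0.0, 1.0, 0.0],
    "southeast_asian": [0.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def steer(seed, direction, strength=1.5, top_k=2):
    """Add a cuisine direction to the seed vector and return the nearest other ingredients."""
    query = [s + strength * d for s, d in zip(VECTORS[seed], DIRECTIONS[direction])]
    others = [name for name in VECTORS if name != seed]
    return sorted(others, key=lambda n: cosine(query, VECTORS[n]), reverse=True)[:top_k]


print(steer("beef", "american"))         # ['bread', 'lettuce']
print(steer("beef", "southeast_asian"))  # ['soy sauce', 'ginger']
```

With real embeddings the neighbors come from patterns learned across millions of recipes rather than from hand-picked numbers, but the lookup works the same way.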
The steering works through an operator the paper calls SLERP rotation. SLERP is short for spherical linear interpolation, a way of moving along an arc between two directions rather than along a straight line. Take a seed ingredient, say chicken, and rotate it mathematically toward a cuisine direction. At 30 degrees you start seeing Tex-Mex territory. At 60 degrees, chicken and beef converge on the same Mexican pantry: corn tortilla, salsa, monterey jack, poblano pepper. The angle is a dial between "stay near this ingredient" and "land somewhere new."
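For readers who want to see the dial itself, the sketch below rotates one vector toward another by a chosen angle using the standard spherical linear interpolation formula. It is a minimal illustration, not the paper's implementation: the function name `rotate_toward` and the two-dimensional example vectors are our own assumptions, and Epicure's operator may differ in its details.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.hypot(*v)
    return [x / norm for x in v]


def rotate_toward(seed, target, degrees):
    """Rotate the unit seed vector toward the target direction by the given angle (SLERP)."""
    u, v = normalize(seed), normalize(target)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(u, v))))
    omega = math.acos(dot)           # angle between seed and target
    if math.sin(omega) < 1e-9:       # parallel or opposite: no unique arc to follow
        return u
    theta = math.radians(degrees)
    a = math.sin(omega - theta) / math.sin(omega)
    b = math.sin(theta) / math.sin(omega)
    return [a * x + b * y for x, y in zip(u, v)]


# Toy 2-D example: the seed points along x, the "cuisine direction" along y.
seed, cuisine = [1.0, 0.0], [0.0, 1.0]
for angle in (0, 30, 60, 90):
    print(angle, [round(c, 3) for c in rotate_toward(seed, cuisine, angle)])
```

At 0 degrees the result is the seed itself, and at the full angle between the two vectors it lands on the target; everything in between stays on the unit circle, which is what keeps the steered point comparable to the other ingredient vectors.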
Epicure comes in three versions, and picking the right one depends on what you're actually asking. Cooc learns from recipe co-occurrence: what shows up together in real dishes. Chem learns from flavor chemistry: which ingredients share aroma compounds, according to the FlavorDB chemical database. Core blends the two.
Ask Cooc what pairs with chocolate and you may get dessert-pantry companions: cocoa powder, vanilla, almond. Ask Chem and you get flavor-chemistry peers: toffee, fudge, ganache.
Same ingredient, different question. A chef looking for a substitute has different needs than a chef mapping flavor compatibility.
Why this isn't ChatGPT for food
Epicure has no general knowledge, no language generation, and no ability to hallucinate an ingredient it's never seen. It knows 1,790 ingredients. That's the whole world, as far as this model is concerned. What it gives up in breadth it gains in reliability—unlike recipe chatbots that will confidently suggest poison as a cooking ingredient if you push them the wrong way.
The previous state of the art here was FlavorGraph, a 2021 model that combined chemical data with the English-only Recipe1M+ dataset. Epicure brings in a multilingual corpus more than four times larger and cleans the vocabulary for efficiency.
Practical uses aren't hard to picture. A chef asks what the East Asian equivalent of a Mediterranean ingredient looks like. A food product developer asks what minimally processed swap lands in the same flavor zone as an additive. A recipe app needs a coherent substitution when an ingredient is missing from the pantry. That last one is the gap where purpose-built small models quietly outperform the big generalist ones.
The Epicure paper is a research release. The trained models are live on Hugging Face, and an interactive ingredient map is publicly accessible at epicure.kaikaku.ai. The company also released a Model Context Protocol (MCP) server so AI agents can query the models. The full training code has not been released.