Teaching LLMs to Remember: Internalizing Documents Without Fine-Tuning
Never forget a document: Moving from fine-tuning to context internalization
Salvatore Raieli · 10 min read
There are lots of people who mistake their imagination for their memory. — Josh Billings
Large language models (LLMs) are increasingly used to analyze long documents. In the standard setup, the document must be inserted into the prompt every time the model is queried. This approach is transient: the model retains nothing about the document after the interaction ends. It is also computationally expensive, since inference cost grows rapidly with context length, driving up both latency and usage costs.
Supervised fine-tuning offers an alternative by embedding the document’s knowledge directly into the model’s parameters. However, fine-tuning is itself resource-intensive and impractical for frequent or dynamic updates.
Is there a way to internalize the knowledge of a document efficiently? One that is reusable, fast, and does not require re-injecting the full text into the prompt on every call?