Cache-Augmented Generation (CAG) - An efficient framework for knowledge-intensive tasks using preloaded data and KV cache.
## Definition of CAG
Cache-Augmented Generation (CAG) is a framework that optimizes large language models (LLMs) for knowledge-intensive tasks. Unlike Retrieval-Augmented Generation (RAG), which retrieves external documents at query time, CAG preloads the relevant documents and precomputes their key-value (KV) caches before inference. This removes retrieval latency and document selection errors, making CAG well suited to tasks where the knowledge base is manageable in size.
## CAG Mechanism Steps
The CAG mechanism consists of three key steps, illustrated in the sketch after this list:
1. **Knowledge Preloading**: The knowledge documents `D` are preprocessed and encoded once into a KV cache, `C_{KV} = KV-Encode(D)`.
2. **Inference Generation**: For each user query `q`, the model `M` generates a response `r` conditioned on the preloaded cache, `r = M(q | C_{KV})`.
3. **Cache Reset**: The KV cache is efficiently reset (e.g., by truncating the tokens appended during inference) so the preloaded knowledge can be reused across sessions.
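A minimal sketch of these three steps using Hugging Face `transformers` is shown below. The model name, greedy decoding loop, and cache handling are illustrative assumptions, not the reference implementation from the official repository.

```python
# Minimal CAG sketch (illustrative assumptions: model choice, greedy decoding).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # assumed; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# 1. Knowledge preloading: C_KV = KV-Encode(D)
documents = "...concatenated knowledge documents D..."
doc_ids = tokenizer(documents, return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    kv_cache = model(
        input_ids=doc_ids, past_key_values=DynamicCache(), use_cache=True
    ).past_key_values
preload_len = doc_ids.shape[1]  # cache length to restore on reset

# 2. Inference generation: r = M(q | C_KV)
def answer(query: str, max_new_tokens: int = 128) -> str:
    """Greedy decoding conditioned on the preloaded knowledge cache."""
    ids = tokenizer(query, return_tensors="pt").input_ids.to(model.device)
    generated = []
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(input_ids=ids, past_key_values=kv_cache, use_cache=True).logits
            next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            if next_token.item() == tokenizer.eos_token_id:
                break
            generated.append(next_token.item())
            ids = next_token  # the cache already holds everything seen so far
    return tokenizer.decode(generated, skip_special_tokens=True)

# 3. Cache reset: drop the tokens appended during inference
def reset_cache() -> None:
    kv_cache.crop(preload_len)

print(answer("What does the policy manual say about parental leave?"))
reset_cache()  # the cache again holds only the preloaded knowledge
```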
## Advantages of CAG
CAG offers several advantages over RAG:
- **Reduced Latency**: Preloading data eliminates retrieval delays.
- **Lower Error Rates**: Avoids document selection errors common in RAG.
- **Simplified Architecture**: Reduces system complexity and maintenance costs.
- **Performance**: Outperforms RAG in small to medium-sized knowledge bases (e.g., 16-64 documents, 21k-85k tokens).
## Use Cases for CAG
CAG is particularly suited for scenarios with manageable knowledge bases, such as:
- Internal company documents (e.g., policy manuals, technical documentation).
- Frequently Asked Questions (FAQ) systems.
- Customer support logs.
- Domain-specific databases (e.g., medical or legal knowledge bases).
## Limitations of CAG
CAG is constrained by the model's context window: knowledge bases too large to preload cannot be cached efficiently, if at all. These limitations are expected to diminish as LLM context lengths and hardware capabilities improve.
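A quick feasibility check is to count the knowledge-base tokens and compare them against the context window before committing to preloading. The numbers and tokenizer below are assumptions for illustration; adjust them to the target model.

```python
# Rough feasibility check before choosing CAG (assumed budgets; check the model card).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # assumed model
context_limit = 128_000      # assumed context window
dialogue_budget = 4_000      # assumed headroom for queries and generated answers

documents = ["...policy manual...", "...FAQ entries...", "...support logs..."]
knowledge_tokens = sum(len(tokenizer.encode(d)) for d in documents)

if knowledge_tokens + dialogue_budget <= context_limit:
    print(f"{knowledge_tokens} knowledge tokens fit; CAG preloading is feasible.")
else:
    print("Knowledge base exceeds the context window; consider RAG or a hybrid approach.")
```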
## CAG Resources
Resources related to CAG include:
- **GitHub Repository**: [CAG GitHub](https://github.com/hhhuang/CAG).
- **Research Paper**: ["Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks"](https://arxiv.org/pdf/2412.15605).
Updated: 2025-04-01