🗄️ Cache
Caching can save you money and reduce latency by cutting down on the number of API calls you make to the LLM provider when the same completion is requested multiple times.
1) InMemory Cache
Caches LLM responses in local memory; the cache is cleared when the app is restarted.

The InMemory Cache is designed to store responses from Large Language Models (LLMs) in the local memory of an application. This cache improves performance by reducing the need to repeatedly request the same data. However, the cached data is temporary and will be cleared when the application is restarted.
Features
• Local Memory Storage: Stores cached data in the local memory of the application.
• Performance Boost: Reduces latency by retrieving data from memory instead of making repeated requests to the LLM.
• Automatic Clearing: The cache is automatically cleared upon application restart, ensuring that it does not persist beyond the session.
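The behavior above can be sketched as a minimal in-memory cache keyed by the prompt. The `fake_llm` function below is a hypothetical stand-in for a real provider call, used only to show that repeated prompts skip the API:

```python
from typing import Callable, Dict

class InMemoryCache:
    """Caches LLM responses in a plain dict; cleared when the process exits."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get_or_compute(self, prompt: str, llm_call: Callable[[str], str]) -> str:
        # Serve from memory if we've seen this exact prompt before.
        if prompt in self._store:
            return self._store[prompt]
        response = llm_call(prompt)  # only hit the provider on a cache miss
        self._store[prompt] = response
        return response

# Hypothetical stand-in for a real LLM provider call.
calls = 0
def fake_llm(prompt: str) -> str:
    global calls
    calls += 1
    return f"response to: {prompt}"

cache = InMemoryCache()
first = cache.get_or_compute("Hello", fake_llm)
second = cache.get_or_compute("Hello", fake_llm)  # served from memory, no second call
```

Because the dict lives in process memory, the cache is naturally discarded when the application restarts, matching the Automatic Clearing behavior described above.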
2) InMemory Embedding Cache
Caches generated embeddings in memory to avoid recomputing them.

The InMemory Embedding Cache is designed to store generated embeddings in local memory. This cache eliminates the need to recompute embeddings for the same data, thereby enhancing performance and efficiency in applications that require frequent embedding computations.
Features
• Local Memory Storage: Stores embeddings in the local memory of the application.
• Performance Enhancement: Reduces computation time by retrieving precomputed embeddings from memory.
• Automatic Clearing: Cached embeddings are cleared when the application is restarted, ensuring that memory usage is managed effectively.
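The same idea applies to embeddings: key the cache by the input text and compute the vector only once. The `fake_embed` function below is a hypothetical placeholder for a real embedding model call:

```python
from typing import Callable, Dict, List, Tuple

class InMemoryEmbeddingCache:
    """Stores computed embedding vectors in memory, keyed by the input text."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, ...]] = {}

    def embed(self, text: str, embed_fn: Callable[[str], List[float]]) -> Tuple[float, ...]:
        if text not in self._store:
            # Compute the embedding only once per distinct input text.
            self._store[text] = tuple(embed_fn(text))
        return self._store[text]

# Hypothetical stand-in for a real embedding model call.
embed_calls = 0
def fake_embed(text: str) -> List[float]:
    global embed_calls
    embed_calls += 1
    return [float(len(text)), 0.5]

cache = InMemoryEmbeddingCache()
v1 = cache.embed("hello world", fake_embed)
v2 = cache.embed("hello world", fake_embed)  # reused, not recomputed
```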
3) Redis Cache
Caches LLM responses in Redis, useful for sharing the cache across multiple processes or servers.

Redis Cache is an in-memory data structure store used for caching LLM (Large Language Model) responses. It is highly efficient for sharing cache across multiple processes or servers, providing quick access to frequently used data and improving application performance.
Features
• Shared Response Storage: Stores cached LLM responses in Redis, accessible from every connected process or server.
• Fast Retrieval: Serves repeated prompts directly from the cache instead of calling the LLM again.
• Cost Optimization: Avoids repeated LLM calls, reducing API usage and spend.
• Scalable Architecture: Works across distributed systems and multiple application instances.
• Persistence: Depending on Redis configuration, cached responses can survive application restarts.
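A minimal sketch of response caching in Redis, assuming the redis-py client API (`get`, and `set` with an `ex` expiry). So the snippet runs without a Redis server, an in-memory `StubRedis` stands in for a real `redis.Redis` client here:

```python
import hashlib
from typing import Callable

class RedisResponseCache:
    """Caches LLM responses via a Redis-like client shared across processes."""

    def __init__(self, client, ttl_seconds: int = 3600) -> None:
        self.client = client  # any object exposing redis-style get/set
        self.ttl = ttl_seconds

    def _key(self, prompt: str) -> str:
        # Hash the prompt so long prompts make safe, fixed-length keys.
        return "llm:" + hashlib.sha256(prompt.encode()).hexdigest()

    def get_or_compute(self, prompt: str, llm_call: Callable[[str], str]) -> str:
        key = self._key(prompt)
        cached = self.client.get(key)
        if cached is not None:
            # redis-py returns bytes by default; decode if needed.
            return cached.decode() if isinstance(cached, bytes) else cached
        response = llm_call(prompt)
        self.client.set(key, response, ex=self.ttl)  # expire stale entries
        return response

# In-memory stub standing in for redis.Redis, so the sketch runs without a server.
class StubRedis:
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value, ex=None):
        self._data[key] = value

llm_calls = 0
def fake_llm(prompt: str) -> str:
    global llm_calls
    llm_calls += 1
    return "answer: " + prompt

cache = RedisResponseCache(StubRedis())
r1 = cache.get_or_compute("What is Redis?", fake_llm)
r2 = cache.get_or_compute("What is Redis?", fake_llm)  # cache hit, no LLM call
```

Against a real deployment you would pass a genuine client instead of the stub, e.g. `RedisResponseCache(redis.Redis(host="localhost", port=6379))` (hostname and port are illustrative); every process pointed at the same Redis instance then shares one cache.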
4) Redis Embeddings Cache
Caches generated embeddings in Redis, useful for sharing the cache across multiple processes or servers.

Redis Embeddings Cache is used to store embedding vectors in Redis for faster retrieval and reuse. It helps reduce repeated embedding computations by caching previously generated embeddings. This improves performance and reduces API costs when working with vector searches or similarity-based retrieval systems.
Features
• Embedding Storage: Stores embedding vectors in Redis for quick access.
• Fast Retrieval: Enables fast lookup of embeddings during similarity search or retrieval tasks.
• Cost Optimization: Avoids repeated embedding generation, reducing compute and API usage.
• Scalable Architecture: Works across distributed systems and multiple application instances.
• Efficient Vector Reuse: Improves performance in applications using retrieval-augmented generation (RAG) or semantic search.
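Embedding reuse in Redis can be sketched the same way, with vectors serialized as JSON before storage. As above, this assumes a redis-py-style client, and an in-memory `StubRedis` stands in for the real client so the snippet runs without a server:

```python
import hashlib
import json
from typing import Callable, List

class RedisEmbeddingCache:
    """Caches embedding vectors in a Redis-like store, serialized as JSON."""

    def __init__(self, client) -> None:
        self.client = client  # any object exposing redis-style get/set

    def _key(self, text: str) -> str:
        return "emb:" + hashlib.sha256(text.encode()).hexdigest()

    def embed(self, text: str, embed_fn: Callable[[str], List[float]]) -> List[float]:
        key = self._key(text)
        cached = self.client.get(key)
        if cached is not None:
            return json.loads(cached)  # reuse the stored vector
        vector = embed_fn(text)
        self.client.set(key, json.dumps(vector))
        return vector

# In-memory stub standing in for redis.Redis, so the sketch runs without a server.
class StubRedis:
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value

embed_calls = 0
def fake_embed(text: str) -> List[float]:  # hypothetical embedding model call
    global embed_calls
    embed_calls += 1
    return [0.1, 0.2, 0.3]

cache = RedisEmbeddingCache(StubRedis())
v1 = cache.embed("doc one", fake_embed)
v2 = cache.embed("doc one", fake_embed)  # retrieved from Redis, not recomputed
```

Because the vectors live in Redis rather than process memory, every application instance sharing that Redis deployment benefits from embeddings computed by any of the others.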