🗄️Cache

Caching can save you money by reducing the number of API calls you make to the LLM provider when you often request the same completion.

1) InMemory Cache

Caches LLM responses in local memory. The cache is cleared when the app is restarted.

The InMemory Cache is designed to store responses from Large Language Models (LLMs) in the local memory of an application. This cache improves performance by reducing the need to repeatedly request the same data. However, the cached data is temporary and will be cleared when the application is restarted.

Features

· Local Memory Storage: Stores cached data in the local memory of the application.

· Performance Boost: Reduces latency by retrieving data from memory instead of making repeated requests to the LLM.

· Automatic Clearing: The cache is automatically cleared upon application restart, ensuring that it does not persist beyond the session.
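The behavior described above can be illustrated with a minimal sketch. It is not the node's actual implementation: the class name `InMemoryLLMCache`, the key scheme (a SHA-256 hash of model name and prompt), and the `fake_llm` stand-in are all assumptions made for illustration.

```python
import hashlib
import json


class InMemoryLLMCache:
    """Process-local cache: its contents vanish when the process exits."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(model, prompt):
        # Hash model name + prompt so identical requests map to the same entry.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, prompt, call_llm):
        key = self._key(model, prompt)
        if key not in self._store:
            # Cache miss: make one provider call and remember the result.
            self._store[key] = call_llm(model, prompt)
        return self._store[key]


# Hypothetical stand-in for a real provider call; it records each invocation.
calls = []


def fake_llm(model, prompt):
    calls.append(prompt)
    return f"echo: {prompt}"


cache = InMemoryLLMCache()
first = cache.get_or_call("demo-model", "Hello", fake_llm)
second = cache.get_or_call("demo-model", "Hello", fake_llm)  # served from memory
```

The second request never reaches `fake_llm`, which is where the cost saving comes from. Because the dictionary lives inside the process, a restart starts from an empty cache.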

2) InMemory Embedding Cache

Cache generated Embeddings in memory to avoid needing to recompute them.

The InMemory Embedding Cache is designed to store generated embeddings in local memory. This cache eliminates the need to recompute embeddings for the same data, thereby enhancing performance and efficiency in applications that require frequent embedding computations.

Features

· Local Memory Storage: Stores embeddings in the local memory of the application.

· Performance Enhancement: Reduces computation time by retrieving precomputed embeddings from memory.

· Automatic Clearing: Cached embeddings are cleared when the application is restarted, so they do not persist beyond the session.
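As a rough sketch of the idea, an embedding cache can wrap any embedding function and compute each distinct text only once. The names `InMemoryEmbeddingCache` and `toy_embed` are hypothetical, and the toy function below only stands in for a real embedding model.

```python
import hashlib


class InMemoryEmbeddingCache:
    """Wraps an embedding function and remembers vectors per input text."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._vectors = {}

    def embed(self, text):
        # Key on a hash of the text so long documents do not become dict keys.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._vectors:
            self._vectors[key] = self._embed_fn(text)
        return self._vectors[key]

    def embed_many(self, texts):
        return [self.embed(t) for t in texts]


# Hypothetical stand-in for an embedding model; it records what it computed.
computed = []


def toy_embed(text):
    computed.append(text)
    return [float(len(text)), float(text.count(" "))]


emb_cache = InMemoryEmbeddingCache(toy_embed)
vectors = emb_cache.embed_many(["red apple", "green pear", "red apple"])
```

Three texts are requested, but the embedding function runs only twice because the repeated text is served from memory.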

3) Momento Cache

Cache LLM response using Momento, a distributed, serverless cache.

Momento Cache is a distributed, serverless caching solution designed to store responses from Large Language Models (LLMs). It leverages the capabilities of Momento, a high-performance caching service, to enhance application performance and scalability by caching LLM responses.

Features

· Distributed Caching: Uses a distributed architecture to store cached data across multiple servers, ensuring high availability and scalability.

· Serverless: Operates without the need for server management, simplifying deployment and maintenance.

· Performance Enhancement: Reduces latency by caching LLM responses and serving them quickly.

· Automatic Expiration: Configurable expiration policies to automatically clear outdated cache entries.
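The expiration behavior can be pictured with a small time-to-live (TTL) sketch. This is a generic, in-process illustration and does not use the Momento SDK: the `TTLCache` class, its parameters, and the injected clock are assumptions chosen so the example runs without any external service.

```python
import time


class TTLCache:
    """Entries are treated as expired once their time-to-live has elapsed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Outdated entry: drop it and report a miss.
            del self._entries[key]
            return None
        return value


# A controllable clock makes the expiry visible without sleeping.
now = [0.0]
ttl_cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
ttl_cache.set("prompt-hash", "cached answer")
fresh = ttl_cache.get("prompt-hash")  # within the TTL
now[0] = 61.0
stale = ttl_cache.get("prompt-hash")  # past the TTL, treated as a miss
```

A shorter TTL keeps answers fresher at the cost of more LLM calls, while a longer TTL saves more calls but may serve outdated responses.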

4) Redis Cache

Cache LLM response in Redis, useful for sharing cache across multiple processes or servers.

Redis Cache stores LLM (Large Language Model) responses in Redis, an in-memory data structure store. Because Redis runs as a separate service, the cache can be shared across multiple processes or servers, providing quick access to frequently used data and improving application performance.

Features

· In-Memory Storage: Stores data in memory for rapid access.

· Distributed Caching: Can be deployed across multiple servers for high availability and scalability.

· Persistence: Optionally persist data to disk to prevent data loss.

· Data Structures: Supports various data types including strings, hashes, lists, sets, and sorted sets.

· Expiration Policies: Configurable TTL (Time-To-Live) to automatically remove outdated cache entries.
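A minimal sketch of how a shared cache with a TTL might be used is shown below. It assumes a client object exposing `get(key)` and `set(key, value, ex=seconds)`, which matches the shape of the redis-py client. The `llm-cache:` key prefix, the function name `cached_completion`, and the `FakeRedis` stand-in are hypothetical and exist only so the example runs without a Redis server.

```python
import hashlib


def cached_completion(client, prompt, call_llm, ttl_seconds=3600):
    """Return a cached answer if present, otherwise call the LLM and store it."""
    key = "llm-cache:" + hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    hit = client.get(key)
    if hit is not None:
        # Redis clients commonly return bytes unless configured to decode.
        return hit.decode("utf-8") if isinstance(hit, bytes) else hit
    answer = call_llm(prompt)
    # `ex` sets the time-to-live in seconds, after which the entry is removed.
    client.set(key, answer, ex=ttl_seconds)
    return answer


class FakeRedis:
    """In-process stand-in with the same get/set shape (it ignores the TTL)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, ex=None):
        self.data[key] = value.encode("utf-8")
        return True


shared = FakeRedis()
llm_calls = []


def demo_llm(prompt):
    llm_calls.append(prompt)
    return "answer to " + prompt


a = cached_completion(shared, "What is Redis?", demo_llm)
b = cached_completion(shared, "What is Redis?", demo_llm)  # cache hit
```

With a real deployment, a redis-py client such as `redis.Redis(host="localhost", port=6379)` could be passed in place of `FakeRedis`; the host and port here are assumptions. Every process that connects to the same Redis instance then sees the same cached entries.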

5) Redis Embeddings Cache

Cache generated embeddings in Redis to avoid needing to recompute them, useful for sharing cache across multiple processes or servers.

6) Upstash Redis Cache

Cache LLM response in Upstash Redis. Upstash provides serverless data for Redis and Kafka.

Upstash Redis is a serverless Redis service designed for scalable and cost-effective caching solutions. By caching LLM (Large Language Model) responses in Upstash Redis, applications can achieve high performance, scalability, and efficient resource management without the need for server maintenance.

Features

· Serverless Architecture: Automatically scales based on demand, eliminating the need for server management.

· High Performance: Provides low-latency data access by storing data in memory.

· Cost-Effective: Pay-as-you-go pricing model ensures cost efficiency.

· Persistent Storage: Optionally persists data to disk to prevent data loss.

· Global Distribution: Offers multi-region deployment for reduced latency and high availability.
