2. Core Concepts of Semantic Caching
To effectively use Redis LangCache, it’s essential to understand the underlying principles of semantic caching. This chapter will break down these core concepts, providing detailed explanations and practical examples.
2.1 What is Semantic Caching?
Traditional caching works by storing and retrieving data based on exact matches. If you query “What is the capital of France?”, a traditional cache would only return a stored value if the exact string “What is the capital of France?” was previously cached.
Semantic caching takes this a step further. It stores information based on its meaning or semantics, not just its exact wording. So, if “What is the capital of France?” is cached, a semantically similar query like “Capital city of France?” or “Which city is France’s capital?” could still trigger a cache hit. This is achieved through the magic of embeddings and vector similarity.
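To make the contrast concrete, here is a tiny sketch (plain Python, no LangCache involved) of why an exact-match cache misses a paraphrase; the dictionary and the example strings are purely illustrative:
# Illustration only: an exact-match cache keyed on the raw prompt string.
exact_cache = {"What is the capital of France?": "Paris"}

print(exact_cache.get("What is the capital of France?"))  # "Paris" -- the exact string matches
print(exact_cache.get("Capital city of France?"))         # None -- same meaning, different wording

# A semantic cache would treat both prompts as equivalent, because their
# embeddings are close even though the strings differ.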
2.2 Embeddings: The Language of Meaning
At the heart of semantic caching are embeddings. An embedding is a numerical representation (a vector of numbers) of text, images, or other data, where items with similar meanings have similar numerical representations.
Think of it like this: words or phrases that are semantically close to each other will have their embedding vectors located close to each other in a high-dimensional space.
For example, the phrase “What is the capital of France?” and “Capital city of France?” would be transformed into two different vectors, but these vectors would be very close to each other in the embedding space, indicating their similar meaning.
Redis LangCache handles the generation of these embeddings for you using an underlying embedding model (either its own or one you provide, like OpenAI’s).
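Because LangCache generates embeddings server-side, you never call an embedding model directly. To see the idea in isolation, the sketch below uses the open-source sentence-transformers library (an illustrative assumption, not part of LangCache, and an arbitrary model choice) to show that a paraphrase lands much closer in vector space than an unrelated sentence.
# Illustration only -- LangCache computes embeddings for you on its servers.
# Requires: pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # arbitrary open-source embedding model

vec_original   = model.encode("What is the capital of France?")
vec_paraphrase = model.encode("Capital city of France?")
vec_unrelated  = model.encode("Tell me about quantum physics.")

# Each result is a vector of floats (384 dimensions for this model).
print(vec_original.shape)

# Smaller distance = closer in the embedding space = more similar meaning.
print("paraphrase distance:", np.linalg.norm(vec_original - vec_paraphrase))
print("unrelated distance: ", np.linalg.norm(vec_original - vec_unrelated))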
2.3 Vector Similarity: Finding the Closest Match
Once text is converted into embeddings (vectors), Redis LangCache needs a way to determine how “similar” two vectors are. This is done using vector similarity metrics. Common metrics include:
- Cosine Similarity: Measures the cosine of the angle between two vectors. A cosine of 1 means identical direction (most similar), 0 means orthogonal (no similarity), and -1 means opposite direction (least similar).
- Euclidean Distance: Measures the straight-line distance between two vectors in space. Smaller distance means higher similarity.
LangCache uses these metrics to compare the embedding of a new incoming prompt against the embeddings of prompts already stored in the cache. If the similarity score exceeds a predefined similarity threshold, it’s considered a match.
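As a standalone illustration of the math (independent of LangCache, using toy three-dimensional vectors rather than real embeddings), the snippet below computes both metrics with NumPy and shows how a similarity threshold turns a raw score into a hit-or-miss decision:
# Toy vectors only -- real embeddings have hundreds of dimensions.
import numpy as np

def cosine_similarity(a, b):
    # 1.0 = identical direction, 0.0 = orthogonal, -1.0 = opposite direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # Straight-line distance: smaller means more similar
    return float(np.linalg.norm(a - b))

query  = np.array([0.90, 0.10, 0.30])  # embedding of the incoming prompt (made up)
cached = np.array([0.85, 0.15, 0.25])  # embedding of a stored prompt (made up)

score = cosine_similarity(query, cached)
print("cosine similarity: ", round(score, 4))
print("euclidean distance:", round(euclidean_distance(query, cached), 4))

SIMILARITY_THRESHOLD = 0.85  # same idea as LangCache's configurable threshold
print("cache hit" if score >= SIMILARITY_THRESHOLD else "cache miss")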
2.4 Cache Hit and Cache Miss Lifecycle
Understanding how LangCache processes requests is crucial:
- User sends a prompt to your AI app.
- Your app sends the prompt to LangCache.
- LangCache generates an embedding for the new prompt.
- LangCache searches its internal vector database to find stored embeddings that are semantically similar to the new prompt’s embedding, based on the configured similarity threshold.
- If a match is found (Cache Hit): LangCache retrieves the cached LLM response associated with the similar prompt and returns it to your app. Your app then sends this cached response to the user, effectively bypassing the LLM.
- If no match is found (Cache Miss): LangCache returns an empty response to your app. Your app then proceeds to query the actual LLM (e.g., OpenAI, Google Gemini) to generate a fresh response.
- Store New Response (on Cache Miss): After your app receives a fresh response from the LLM, it should send both the original prompt and the new response back to LangCache. LangCache stores this new prompt-response pair (along with its embedding) for future use.
This cycle ensures that frequently asked or similar questions are quickly served from the cache, saving time and money.
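Taken together, this lifecycle is the classic cache-aside pattern. Here is a compact sketch of that flow, assuming the lang_cache client used in the examples later in this chapter and a placeholder call_llm() function standing in for your real LLM call:
# Sketch of the cache-aside flow; call_llm() is a placeholder for your real LLM client.
async def answer(prompt: str) -> str:
    # Steps 1-4: send the prompt to LangCache, which embeds it and searches for similar entries.
    results = await lang_cache.search(prompt=prompt, similarity_threshold=0.85)

    if results:  # Cache Hit: return the stored response and skip the LLM entirely.
        return results[0].response

    # Cache Miss: query the actual LLM for a fresh response.
    response = await call_llm(prompt)

    # Store the new prompt-response pair so future similar prompts hit the cache.
    await lang_cache.set(prompt=prompt, response=response)
    return response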
2.5 Code Examples: Illustrating Concepts
Let’s see how these concepts translate into code. We’ll use the LangCache SDKs for Python and Node.js. Remember to have your .env file set up with LANGCACHE_API_HOST, LANGCACHE_CACHE_ID, and LANGCACHE_API_KEY.
Python Example: Simple Cache Interaction
This example demonstrates a basic cache interaction: storing a prompt and response, and then searching for a similar prompt.
# python-examples/simple_cache.py
import asyncio
import os

from dotenv import load_dotenv
from langcache import LangCache

load_dotenv()

# Retrieve LangCache credentials from environment variables
LANGCACHE_API_HOST = os.getenv("LANGCACHE_API_HOST")
LANGCACHE_CACHE_ID = os.getenv("LANGCACHE_CACHE_ID")
LANGCACHE_API_KEY = os.getenv("LANGCACHE_API_KEY")

# Initialize LangCache client
lang_cache = LangCache(
    server_url=f"https://{LANGCACHE_API_HOST}",
    cache_id=LANGCACHE_CACHE_ID,
    api_key=LANGCACHE_API_KEY,
)

async def main():
    print("--- LangCache Basic Interaction (Python) ---")

    # 1. Store a prompt and response
    prompt1 = "What is the capital of France?"
    response1 = "Paris"
    print(f"Storing: Prompt='{prompt1}', Response='{response1}'")
    store_key = await lang_cache.set(prompt=prompt1, response=response1)
    print(f"Stored with key: {store_key}")

    # Give a moment for indexing (optional, but good practice for immediate search).
    # In a real app, this might be handled asynchronously by the LangCache service.
    await asyncio.sleep(1)

    # 2. Search for a semantically similar prompt (Cache Hit expected)
    search_prompt_similar = "Capital city of France?"
    print(f"\nSearching for a similar prompt: '{search_prompt_similar}'")
    search_results_similar = await lang_cache.search(
        prompt=search_prompt_similar,
        similarity_threshold=0.85,  # Using the default or a reasonable threshold
    )
    if search_results_similar:
        print(f"Cache Hit! Found response: {search_results_similar[0].response}")
        print(f"Similarity score: {search_results_similar[0].score:.4f}")
    else:
        print("Cache Miss for similar prompt.")

    # 3. Search for a completely different prompt (Cache Miss expected)
    search_prompt_different = "Tell me about quantum physics."
    print(f"\nSearching for a different prompt: '{search_prompt_different}'")
    search_results_different = await lang_cache.search(
        prompt=search_prompt_different,
        similarity_threshold=0.85,
    )
    if search_results_different:
        print(f"Cache Hit! Found response: {search_results_different[0].response}")
        print(f"Similarity score: {search_results_different[0].score:.4f}")
    else:
        print("Cache Miss for different prompt. (Expected)")
        # In a real app, you'd call the LLM here and then store the new result.
        response_different = (
            "Quantum physics is the study of matter and energy at the most fundamental level. "
            "It aims to describe the properties and behavior of particles like electrons and photons."
        )
        await lang_cache.set(prompt=search_prompt_different, response=response_different)
        print(f"Stored new response for '{search_prompt_different}'.")

if __name__ == "__main__":
    asyncio.run(main())
To run this Python example:
- Make sure you are in the learn-redis-langcache/python-examples directory.
- Activate your virtual environment: source venv/bin/activate (or .\venv\Scripts\activate on Windows).
- Run the script: python simple_cache.py
Node.js Example: Simple Cache Interaction
This Node.js example mirrors the Python one, demonstrating the same store and search operations.
// nodejs-examples/simple_cache.js
require('dotenv').config({ path: '../.env' }); // Load .env from parent directory
const { LangCache } = require('@redis-ai/langcache');

// Retrieve LangCache credentials from environment variables
const LANGCACHE_API_HOST = process.env.LANGCACHE_API_HOST;
const LANGCACHE_CACHE_ID = process.env.LANGCACHE_CACHE_ID;
const LANGCACHE_API_KEY = process.env.LANGCACHE_API_KEY;

// Initialize LangCache client
const langCache = new LangCache({
  serverURL: `https://${LANGCACHE_API_HOST}`,
  cacheId: LANGCACHE_CACHE_ID,
  apiKey: LANGCACHE_API_KEY,
});

async function main() {
  console.log("--- LangCache Basic Interaction (Node.js) ---");

  // 1. Store a prompt and response
  const prompt1 = "What is the capital of France?";
  const response1 = "Paris";
  console.log(`Storing: Prompt='${prompt1}', Response='${response1}'`);
  const storeResult = await langCache.set({ prompt: prompt1, response: response1 });
  console.log(`Stored with entry ID: ${storeResult.entryId}`);

  // Give a moment for indexing (optional, but good practice for immediate search)
  await new Promise(resolve => setTimeout(resolve, 1000));

  // 2. Search for a semantically similar prompt (Cache Hit expected)
  const searchPromptSimilar = "Capital city of France?";
  console.log(`\nSearching for a similar prompt: '${searchPromptSimilar}'`);
  const searchResultsSimilar = await langCache.search({
    prompt: searchPromptSimilar,
    similarityThreshold: 0.85 // Using the default or a reasonable threshold
  });

  if (searchResultsSimilar && searchResultsSimilar.results.length > 0) {
    const firstResult = searchResultsSimilar.results[0];
    console.log(`Cache Hit! Found response: ${firstResult.response}`);
    console.log(`Similarity score: ${firstResult.score.toFixed(4)}`);
  } else {
    console.log("Cache Miss for similar prompt.");
  }

  // 3. Search for a completely different prompt (Cache Miss expected)
  const searchPromptDifferent = "Tell me about the theory of relativity.";
  console.log(`\nSearching for a different prompt: '${searchPromptDifferent}'`);
  const searchResultsDifferent = await langCache.search({
    prompt: searchPromptDifferent,
    similarityThreshold: 0.85
  });

  if (searchResultsDifferent && searchResultsDifferent.results.length > 0) {
    const firstResult = searchResultsDifferent.results[0];
    console.log(`Cache Hit! Found response: ${firstResult.response}`);
    console.log(`Similarity score: ${firstResult.score.toFixed(4)}`);
  } else {
    console.log("Cache Miss for different prompt. (Expected)");
    // In a real app, you'd call the LLM here and then store the new result.
    const responseDifferent = "The theory of relativity, developed by Albert Einstein, includes two theories: special relativity and general relativity. Special relativity deals with the relationship between space and time for objects moving at constant speeds, while general relativity extends this to include gravity, describing it as a curvature of spacetime.";
    await langCache.set({ prompt: searchPromptDifferent, response: responseDifferent });
    console.log(`Stored new response for '${searchPromptDifferent}'.`);
  }
}

main().catch(console.error);
To run this Node.js example:
- Make sure you are in the learn-redis-langcache/nodejs-examples directory.
- Run the script: node simple_cache.js
Exercise/Mini-Challenge: Simulate Chatbot Interaction
Objective: Extend one of the above examples to simulate a simple chatbot interaction. The chatbot should first try to get an answer from LangCache. If it’s a cache miss, it should simulate calling an LLM (by using a predefined answer or a mock LLM function) and then store the new prompt-response pair in LangCache.
Instructions:
- Choose either the Python or Node.js example.
- Create a function get_llm_response(prompt) (Python) or getLlmResponse(prompt) (Node.js) that returns a hardcoded response for specific keywords or a generic fallback.
- Modify the main logic to:
  - Take a user prompt.
  - Call lang_cache.search() (or langCache.search()).
  - If a cache hit: print the cached response.
  - If a cache miss:
    - Call your get_llm_response() (or getLlmResponse()) function with the user prompt.
    - Print the LLM's response.
    - Store the user prompt and the LLM's response in LangCache using lang_cache.set() (or langCache.set()).
- Run your script multiple times with variations of the same question to observe cache hits.
Example Prompts for Testing:
- “Hello, how are you?”
- “Tell me about the weather today.”
- “What’s the weather like?”
- “How is the climate outdoors?”
- “What is the capital of Japan?”
- “Japan’s capital city?”
This exercise will solidify your understanding of the cache hit/miss lifecycle and how to integrate LangCache into a basic AI application flow.