
LLM Adapters

Configure and use Claude, GPT, and Gemini adapters for embedding, summarization, and relation extraction.

Overview

Cerememory's standard mode runs entirely without an external LLM. The CLI default features (default = []) ship with no provider compiled in, and [llm].provider = "none" is the supported production configuration. External adapters are an optional, experimental extension for callers that want automatic embeddings, summarization, or relation extraction in-engine.

When you do enable an adapter, two trait-based interfaces drive the integration:

  • LLMProvider -- Handles embedding generation, summarization, and relation extraction
  • LLMAdapter -- Serializes memory context into LLM-consumable format

Without a configured provider, Cerememory uses caller-supplied embeddings, returns truncation summaries, and skips relation extraction. Most agentic workflows produce richer summaries in the calling agent itself, then encode them into Cerememory through store / update (with meta_json for the why), so adapters are rarely the right answer.

Building With an Adapter

The CLI default features include no LLM adapter. Opt in at build time:

bash
# Single provider
cargo build -p cerememory-cli --release --features llm-claude
 
# Multiple providers
cargo build -p cerememory-cli --release --features "llm-claude,llm-openai,llm-gemini"

A binary built without the matching feature flag will report the configured provider as unsupported at startup and refuse to serve until you either rebuild or set [llm].provider = "none".

Server Auth Key vs LLM Provider Key

These are different settings and solve different problems:

  • auth.api_keys protects access to the Cerememory server itself
  • llm.api_key lets Cerememory call OpenAI, Anthropic, or Gemini

You only need llm.api_key for features that actually invoke a provider.
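For illustration, here is a config with both settings side by side. The key names come from above; the exact shape of auth.api_keys (a list) is an assumption:

```toml
[auth]
api_keys = ["my-server-secret"]   # clients must present one of these to reach Cerememory

[llm]
provider = "none"                 # no outbound LLM calls, so no llm.api_key needed
```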

Supported Providers

| Provider | Config Value | Crate | Features |
| --- | --- | --- | --- |
| Anthropic Claude | claude or anthropic | adapter-claude | Summarization, relation extraction |
| OpenAI GPT | openai | adapter-openai | Text embedding, summarization, relation extraction, image embedding, audio transcription |
| Google Gemini | gemini or google | adapter-gemini | Text embedding, summarization, relation extraction, image embedding |

Configuration

toml
[llm]
provider = "claude"
api_key = "sk-ant-api03-..."
model = "claude-sonnet-4-20250514"  # optional
# base_url = "https://..."          # optional, for proxy setups (omit the key to use the default)

Environment variables (a double underscore separates the TOML table from the key, so CEREMEMORY_LLM__PROVIDER maps to [llm].provider):

bash
export CEREMEMORY_LLM__PROVIDER=claude
export CEREMEMORY_LLM__API_KEY=sk-ant-api03-...
export CEREMEMORY_LLM__MODEL=claude-sonnet-4-20250514

Provider Capabilities

Each provider advertises its capabilities via the ProviderCapabilities struct:

rust
pub struct ProviderCapabilities {
    pub text_embedding: bool,      // generate text embeddings
    pub image_embedding: bool,     // describe image → embed description
    pub audio_transcription: bool, // transcribe audio to text
}

| Capability | Claude | OpenAI | Gemini | No Provider |
| --- | --- | --- | --- | --- |
| Text embedding | No | Yes | Yes | No |
| Image embedding | No | Yes (vision) | Yes (vision) | No |
| Audio transcription | No | Yes (Whisper) | No | No |
| Summarization | Yes | Yes | Yes | Truncation fallback |
| Relation extraction | Yes | Yes | Yes | Skipped |
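The boolean flags map directly onto the matrix above. As a self-contained sketch (the struct is copied from the docs; the constructor and helper names are illustrative, not the crate's API), a caller can check the flags to decide whether it must supply precomputed embeddings:

```rust
pub struct ProviderCapabilities {
    pub text_embedding: bool,      // generate text embeddings
    pub image_embedding: bool,     // describe image → embed description
    pub audio_transcription: bool, // transcribe audio to text
}

/// Per the matrix above: Claude advertises none of the three modality flags.
fn claude_capabilities() -> ProviderCapabilities {
    ProviderCapabilities {
        text_embedding: false,
        image_embedding: false,
        audio_transcription: false,
    }
}

/// Per the matrix above: OpenAI supports all three (vision + Whisper).
fn openai_capabilities() -> ProviderCapabilities {
    ProviderCapabilities {
        text_embedding: true,
        image_embedding: true,
        audio_transcription: true,
    }
}

/// Without text embedding support, the caller must pass its own vectors.
fn needs_manual_embeddings(caps: &ProviderCapabilities) -> bool {
    !caps.text_embedding
}
```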

Which Features Actually Need An LLM Key?

Requires llm.api_key:

  • Automatic embedding generation
  • Image recall cues and image embedding
  • Audio recall cues and transcription
  • Consolidation summarization
  • Relation extraction

Does not require llm.api_key:

  • Basic text/structured store and recall
  • Search by precomputed embeddings you provide yourself
  • Timeline, graph, inspect, stats, forget, export, import

LLMProvider Trait

The provider trait defines the core LLM operations:

rust
pub trait LLMProvider: Send + Sync {
    /// Generate an embedding vector for text.
    fn embed(&self, text: &str) -> Pin<Box<dyn Future<Output = Result<Vec<f32>>> + Send>>;
 
    /// Summarize multiple texts into a concise summary.
    fn summarize(&self, texts: &[String], max_tokens: usize) -> Pin<Box<dyn Future<Output = Result<String>> + Send>>;
 
    /// Extract semantic relations (subject-predicate-object triples).
    fn extract_relations(&self, text: &str) -> Pin<Box<dyn Future<Output = Result<Vec<ExtractedRelation>>> + Send>>;
 
    /// Generate embedding for an image (via vision → text → embed pipeline).
    fn embed_image(&self, data: &[u8], format: &str) -> Pin<Box<dyn Future<Output = Result<Vec<f32>>> + Send>>;
 
    /// Transcribe audio to text.
    fn transcribe_audio(&self, data: &[u8], format: &str) -> Pin<Box<dyn Future<Output = Result<String>> + Send>>;
 
    /// Advertise supported capabilities.
    fn capabilities(&self) -> ProviderCapabilities;
}

LLMAdapter Trait

The adapter trait handles context serialization for LLM consumption:

rust
pub trait LLMAdapter: Send + Sync {
    /// Serialize memories into LLM-consumable format within a token budget.
    fn serialize_context(&self, memories: &[MemoryRecord], budget_tokens: usize) -> String;
 
    /// Estimate token count for memory content.
    fn estimate_tokens(&self, content: &MemoryContent) -> usize;
 
    /// Model metadata.
    fn model_info(&self) -> ModelInfo;
}

Each adapter serializes memories optimally for its target model, respecting the model's context window and token limits.
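A budget-respecting serializer can be sketched as below. The 4-characters-per-token estimate and the greedy packing order are illustrative assumptions, not the crate's actual implementation, and the MemoryRecord shape is simplified:

```rust
// Simplified stand-in for the crate's MemoryRecord.
struct MemoryRecord {
    content: String,
}

/// Rough token estimate: about 4 characters per token (an assumption).
fn estimate_tokens(content: &str) -> usize {
    (content.chars().count() + 3) / 4
}

/// Greedily concatenate memories until the token budget is spent.
fn serialize_context(memories: &[MemoryRecord], budget_tokens: usize) -> String {
    let mut out = String::new();
    let mut used = 0;
    for m in memories {
        let cost = estimate_tokens(&m.content);
        if used + cost > budget_tokens {
            break; // next record would blow the budget; stop here
        }
        out.push_str(&m.content);
        out.push('\n');
        used += cost;
    }
    out
}
```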

Fallback Behavior

When no LLM provider is configured (provider = "none"):

| Operation | Fallback Behavior |
| --- | --- |
| Embedding generation | Returns empty vector (manual embeddings required) |
| Summarization | Truncates to first 200 characters |
| Relation extraction | Returns empty list (no graph enrichment) |
| Image embedding | Returns ModalityUnsupported error |
| Audio transcription | Returns ModalityUnsupported error |
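The truncation fallback is simple enough to sketch. Joining multiple input texts with a space before truncating is an illustrative assumption; only the 200-character cap comes from the docs:

```rust
/// No-provider summarization fallback: the first 200 characters of the input.
/// Joining the texts with a single space is an assumption for this sketch.
fn fallback_summary(texts: &[String]) -> String {
    texts.join(" ").chars().take(200).collect()
}
```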

Error Handling and Retries

All LLM adapter HTTP requests share an in-tree exponential-backoff helper from cerememory-adapter-common. The backoff crate dependency was removed in 0.2.6 — Cerememory now carries the retry loop itself.

Default RetryPolicy:

| Field | Default | Meaning |
| --- | --- | --- |
| initial_interval | 500 ms | First sleep before attempt 2 |
| multiplier | 2.0 | Each subsequent interval doubles |
| max_interval | 30 s | Cap on a single sleep window |
| max_retries | 3 | Retries after the initial attempt, so up to 4 total tries |

Errors are classified as either RetryError::Transient(_) (retried up to the cap) or RetryError::Permanent(_) (returned immediately). The OpenAI adapter classifies HTTP 429 / 5xx and network failures as transient; the Claude and Gemini adapters apply the same status-based rule. Auth (401), validation (400), and unknown-route errors fail fast as permanent.

The shared HTTP client is built with a 30 s request timeout (DEFAULT_TIMEOUT_SECS).
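The real helper lives in cerememory-adapter-common; the sketch below mirrors the documented policy fields and the transient/permanent split, but the exact type and function signatures are assumptions:

```rust
use std::time::Duration;

// Names mirror the documented RetryPolicy fields; the API shape is assumed.
enum RetryError {
    Transient(String), // retried up to the cap (429 / 5xx / network failures)
    Permanent(String), // returned immediately (401, 400, unknown routes)
}

struct RetryPolicy {
    initial_interval: Duration,
    multiplier: f64,
    max_interval: Duration,
    max_retries: u32,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        Self {
            initial_interval: Duration::from_millis(500),
            multiplier: 2.0,
            max_interval: Duration::from_secs(30),
            max_retries: 3,
        }
    }
}

/// Run `op`, retrying transient errors with exponential backoff.
fn retry<T>(
    policy: &RetryPolicy,
    mut op: impl FnMut() -> Result<T, RetryError>,
) -> Result<T, RetryError> {
    let mut interval = policy.initial_interval;
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(RetryError::Permanent(e)) => return Err(RetryError::Permanent(e)),
            Err(RetryError::Transient(e)) => {
                if attempt >= policy.max_retries {
                    return Err(RetryError::Transient(e)); // retry budget exhausted
                }
                std::thread::sleep(interval);
                // Double the interval, capped at max_interval.
                interval = interval.mul_f64(policy.multiplier).min(policy.max_interval);
                attempt += 1;
            }
        }
    }
}
```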

Next Steps

Configuration

Full configuration options including LLM settings

API Endpoints

HTTP REST endpoint reference