
LLM Adapters

Configure and use Claude, GPT, and Gemini adapters for embedding, summarization, and relation extraction.

Overview

Cerememory's standard mode runs entirely without an external LLM. The CLI default features (default = []) ship with no provider compiled in, and [llm].provider = "none" is the supported production configuration. External adapters are an optional, experimental extension for callers that want automatic embeddings, summarization, or relation extraction in-engine.

When you do enable an adapter, two trait-based interfaces drive the integration:

  • LLMProvider -- Handles embedding generation, summarization, and relation extraction
  • LLMAdapter -- Serializes memory context into LLM-consumable format

Without a configured provider, Cerememory uses caller-supplied embeddings, returns truncation summaries, and skips relation extraction. Most agentic workflows produce richer summaries in the calling agent itself, then encode them into Cerememory through store / update (with meta_json for the why), so adapters are rarely the right answer.

Building With an Adapter

The CLI default features include no LLM adapter. Opt in at build time:

bash
# Single provider
cargo build -p cerememory-cli --release --features llm-claude
 
# Multiple providers
cargo build -p cerememory-cli --release --features "llm-claude,llm-openai,llm-gemini"

A binary built without the matching feature flag will report the configured provider as unsupported at startup and refuse to serve until you either rebuild or set [llm].provider = "none".

Server Auth Key vs LLM Provider Key

These are different settings and solve different problems:

  • auth.api_keys protects access to the Cerememory server itself
  • llm.api_key lets Cerememory call OpenAI, Anthropic, or Gemini

You only need llm.api_key for features that actually invoke a provider.
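For illustration, here is a config with both settings side by side. The key names come from above; the exact shape of auth.api_keys (a list) is an assumption:

```toml
[auth]
api_keys = ["my-server-secret"]   # clients must present one of these to reach Cerememory

[llm]
provider = "none"                 # no outbound LLM calls, so no llm.api_key needed
```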

Supported Providers

| Provider | Config Value | Crate | Features |
| --- | --- | --- | --- |
| Anthropic Claude | claude or anthropic | adapter-claude | Summarization, relation extraction |
| OpenAI GPT | openai | adapter-openai | Text embedding, summarization, relation extraction, image embedding, audio transcription |
| Google Gemini | gemini or google | adapter-gemini | Text embedding, summarization, relation extraction, image embedding |

Configuration

toml
[llm]
provider = "claude"
api_key = "sk-ant-api03-..."
model = "claude-sonnet-4-20250514"  # optional
# base_url = "https://..."          # optional, for proxy setups (omit the key to use the default)

Environment variables (a double underscore separates the TOML table from the key, so CEREMEMORY_LLM__PROVIDER maps to [llm].provider):

bash
export CEREMEMORY_LLM__PROVIDER=claude
export CEREMEMORY_LLM__API_KEY=sk-ant-api03-...
export CEREMEMORY_LLM__MODEL=claude-sonnet-4-20250514

Provider Capabilities

Each provider advertises its capabilities via the ProviderCapabilities struct:

rust
pub struct ProviderCapabilities {
    pub text_embedding: bool,      // generate text embeddings
    pub image_embedding: bool,     // describe image → embed description
    pub audio_transcription: bool, // transcribe audio to text
}

| Capability | Claude | OpenAI | Gemini | No Provider |
| --- | --- | --- | --- | --- |
| Text embedding | No | Yes | Yes | No |
| Image embedding | No | Yes (vision) | Yes (vision) | No |
| Audio transcription | No | Yes (Whisper) | No | No |
| Summarization | Yes | Yes | Yes | Truncation fallback |
| Relation extraction | Yes | Yes | Yes | Skipped |
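The boolean flags map directly onto the matrix above. As a self-contained sketch (the struct is copied from the docs; the constructor and helper names are illustrative, not the crate's API), a caller can check the flags to decide whether it must supply precomputed embeddings:

```rust
pub struct ProviderCapabilities {
    pub text_embedding: bool,      // generate text embeddings
    pub image_embedding: bool,     // describe image → embed description
    pub audio_transcription: bool, // transcribe audio to text
}

/// Per the matrix above: Claude advertises none of the three modality flags.
fn claude_capabilities() -> ProviderCapabilities {
    ProviderCapabilities {
        text_embedding: false,
        image_embedding: false,
        audio_transcription: false,
    }
}

/// Per the matrix above: OpenAI supports all three (vision + Whisper).
fn openai_capabilities() -> ProviderCapabilities {
    ProviderCapabilities {
        text_embedding: true,
        image_embedding: true,
        audio_transcription: true,
    }
}

/// Without text embedding support, the caller must pass its own vectors.
fn needs_manual_embeddings(caps: &ProviderCapabilities) -> bool {
    !caps.text_embedding
}
```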

Which Features Actually Need An LLM Key?

Requires llm.api_key:

  • Automatic embedding generation
  • Image recall cues and image embedding
  • Audio recall cues and transcription
  • Consolidation summarization
  • Relation extraction

Does not require llm.api_key:

  • Basic text/structured store and recall
  • Search by precomputed embeddings you provide yourself
  • Timeline, graph, inspect, stats, forget, export, import

LLMProvider Trait

The provider trait defines the core LLM operations:

rust
pub trait LLMProvider: Send + Sync {
    /// Generate an embedding vector for text.
    fn embed(&self, text: &str) -> Pin<Box<dyn Future<Output = Result<Vec<f32>>> + Send>>;
 
    /// Summarize multiple texts into a concise summary.
    fn summarize(&self, texts: &[String], max_tokens: usize) -> Pin<Box<dyn Future<Output = Result<String>> + Send>>;
 
    /// Extract semantic relations (subject-predicate-object triples).
    fn extract_relations(&self, text: &str) -> Pin<Box<dyn Future<Output = Result<Vec<ExtractedRelation>>> + Send>>;
 
    /// Generate embedding for an image (via vision → text → embed pipeline).
    fn embed_image(&self, data: &[u8], format: &str) -> Pin<Box<dyn Future<Output = Result<Vec<f32>>> + Send>>;
 
    /// Transcribe audio to text.
    fn transcribe_audio(&self, data: &[u8], format: &str) -> Pin<Box<dyn Future<Output = Result<String>> + Send>>;
 
    /// Advertise supported capabilities.
    fn capabilities(&self) -> ProviderCapabilities;
}

LLMAdapter Trait

The adapter trait handles context serialization for LLM consumption:

rust
pub trait LLMAdapter: Send + Sync {
    /// Serialize memories into LLM-consumable format within a token budget.
    fn serialize_context(&self, memories: &[MemoryRecord], budget_tokens: usize) -> String;
 
    /// Estimate token count for memory content.
    fn estimate_tokens(&self, content: &MemoryContent) -> usize;
 
    /// Model metadata.
    fn model_info(&self) -> ModelInfo;
}

Each adapter serializes memories optimally for its target model, respecting the model's context window and token limits.
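A budget-respecting serializer can be sketched as below. The 4-characters-per-token estimate and the greedy packing order are illustrative assumptions, not the crate's actual implementation, and the MemoryRecord shape is simplified:

```rust
// Simplified stand-in for the crate's MemoryRecord.
struct MemoryRecord {
    content: String,
}

/// Rough token estimate: about 4 characters per token (an assumption).
fn estimate_tokens(content: &str) -> usize {
    (content.chars().count() + 3) / 4
}

/// Greedily concatenate memories until the token budget is spent.
fn serialize_context(memories: &[MemoryRecord], budget_tokens: usize) -> String {
    let mut out = String::new();
    let mut used = 0;
    for m in memories {
        let cost = estimate_tokens(&m.content);
        if used + cost > budget_tokens {
            break; // next record would blow the budget; stop here
        }
        out.push_str(&m.content);
        out.push('\n');
        used += cost;
    }
    out
}
```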

Fallback Behavior

When no LLM provider is configured (provider = "none"):

| Operation | Fallback Behavior |
| --- | --- |
| Embedding generation | Returns empty vector (manual embeddings required) |
| Summarization | Truncates to first 200 characters |
| Relation extraction | Returns empty list (no graph enrichment) |
| Image embedding | Returns ModalityUnsupported error |
| Audio transcription | Returns ModalityUnsupported error |
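The truncation fallback is simple enough to sketch. Joining multiple input texts with a space before truncating is an illustrative assumption; only the 200-character cap comes from the docs:

```rust
/// No-provider summarization fallback: the first 200 characters of the input.
/// Joining the texts with a single space is an assumption for this sketch.
fn fallback_summary(texts: &[String]) -> String {
    texts.join(" ").chars().take(200).collect()
}
```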

Error Handling and Retries

All LLM adapter HTTP requests share an in-tree exponential-backoff helper from cerememory-adapter-common. The backoff crate dependency was removed in 0.2.6 — Cerememory now carries the retry loop itself.

Default RetryPolicy:

| Field | Default | Meaning |
| --- | --- | --- |
| initial_interval | 500 ms | First sleep before attempt 2 |
| multiplier | 2.0 | Each subsequent interval doubles |
| max_interval | 30 s | Cap on a single sleep window |
| max_retries | 3 | Retries after the initial attempt, so up to 4 total tries |

Errors are classified as either RetryError::Transient(_) (retried up to the cap) or RetryError::Permanent(_) (returned immediately). The OpenAI adapter classifies HTTP 429 / 5xx and network failures as transient; the Claude and Gemini adapters apply the same status-based rule. Auth (401), validation (400), and unknown-route errors fail fast as permanent.

The shared HTTP client is built with a 30 s request timeout (DEFAULT_TIMEOUT_SECS).
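The real helper lives in cerememory-adapter-common; the sketch below mirrors the documented policy fields and the transient/permanent split, but the exact type and function signatures are assumptions:

```rust
use std::time::Duration;

// Names mirror the documented RetryPolicy fields; the API shape is assumed.
enum RetryError {
    Transient(String), // retried up to the cap (429 / 5xx / network failures)
    Permanent(String), // returned immediately (401, 400, unknown routes)
}

struct RetryPolicy {
    initial_interval: Duration,
    multiplier: f64,
    max_interval: Duration,
    max_retries: u32,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        Self {
            initial_interval: Duration::from_millis(500),
            multiplier: 2.0,
            max_interval: Duration::from_secs(30),
            max_retries: 3,
        }
    }
}

/// Run `op`, retrying transient errors with exponential backoff.
fn retry<T>(
    policy: &RetryPolicy,
    mut op: impl FnMut() -> Result<T, RetryError>,
) -> Result<T, RetryError> {
    let mut interval = policy.initial_interval;
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(RetryError::Permanent(e)) => return Err(RetryError::Permanent(e)),
            Err(RetryError::Transient(e)) => {
                if attempt >= policy.max_retries {
                    return Err(RetryError::Transient(e)); // retry budget exhausted
                }
                std::thread::sleep(interval);
                // Double the interval, capped at max_interval.
                interval = interval.mul_f64(policy.multiplier).min(policy.max_interval);
                attempt += 1;
            }
        }
    }
}
```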

Next Steps

Configuration

Full configuration options including LLM settings

API Endpoints

HTTP REST endpoint reference