LLM Adapters
Configure and use Claude, GPT, and Gemini adapters for embedding, summarization, and relation extraction.
Overview
Cerememory's standard mode runs entirely without an external LLM. The CLI default features (default = []) ship with no provider compiled in, and [llm].provider = "none" is the supported production configuration. External adapters are an optional, experimental extension for callers that want automatic embeddings, summarization, or relation extraction in-engine.
When you do enable an adapter, two trait-based interfaces drive the integration:
- LLMProvider -- Handles embedding generation, summarization, and relation extraction
- LLMAdapter -- Serializes memory context into LLM-consumable format
Without a configured provider, Cerememory uses caller-supplied embeddings, returns truncation summaries, and skips relation extraction. Most agentic workflows produce richer summaries in the calling agent itself, then encode them into Cerememory through store / update (with meta_json for the why), so adapters are rarely the right answer.
Building With an Adapter
The CLI default features include no LLM adapter. Opt in at build time:
```shell
# Single provider
cargo build -p cerememory-cli --release --features llm-claude

# Multiple providers
cargo build -p cerememory-cli --release --features "llm-claude,llm-openai,llm-gemini"
```

A binary built without the matching feature flag will report the configured provider as unsupported at startup and refuse to serve until you either rebuild or set [llm].provider = "none".
Server Auth Key vs LLM Provider Key
These are different settings and solve different problems:
- auth.api_keys protects access to the Cerememory server itself
- llm.api_key lets Cerememory call OpenAI, Anthropic, or Gemini
You only need llm.api_key for features that actually invoke a provider.
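Side by side, a configuration using both might look like the fragment below. The section layout is inferred from the key paths named above (auth.api_keys, llm.api_key); treat it as a sketch, not the authoritative schema.

```toml
[auth]
# Clients must present one of these keys to reach the Cerememory server.
api_keys = ["srv-key-..."]

[llm]
# Only needed for features that actually invoke a provider.
provider = "openai"
api_key = "sk-..."
```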
Supported Providers
| Provider | Config Value | Crate | Features |
|---|---|---|---|
| Anthropic Claude | claude or anthropic | adapter-claude | Summarization, relation extraction |
| OpenAI GPT | openai | adapter-openai | Text embedding, summarization, relation extraction, image embedding, audio transcription |
| Google Gemini | gemini or google | adapter-gemini | Text embedding, summarization, relation extraction, image embedding |
Configuration
```toml
[llm]
provider = "claude"
api_key = "sk-ant-api03-..."
model = "claude-sonnet-4-20250514" # optional
base_url = null # optional, for proxy setups
```

Environment variables:

```shell
export CEREMEMORY_LLM__PROVIDER=claude
export CEREMEMORY_LLM__API_KEY=sk-ant-api03-...
export CEREMEMORY_LLM__MODEL=claude-sonnet-4-20250514
```

Provider Capabilities
Each provider advertises its capabilities via the ProviderCapabilities struct:
```rust
pub struct ProviderCapabilities {
    pub text_embedding: bool,      // generate text embeddings
    pub image_embedding: bool,     // describe image → embed description
    pub audio_transcription: bool, // transcribe audio to text
}
```

| Capability | Claude | OpenAI | Gemini | No Provider |
|---|---|---|---|---|
| Text embedding | No | Yes | Yes | No |
| Image embedding | No | Yes (vision) | Yes (vision) | No |
| Audio transcription | No | Yes (Whisper) | No | No |
| Summarization | Yes | Yes | Yes | Truncation fallback |
| Relation extraction | Yes | Yes | Yes | Skipped |
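As an illustration of how a caller can branch on this matrix, here is a hedged sketch that hard-codes the table into a lookup. In the real engine the values come from each adapter's capabilities() method; the standalone struct below only mirrors the one shown earlier.

```rust
// Sketch: the capability matrix above, encoded as a lookup by config value.
#[derive(Default, Debug, PartialEq)]
pub struct ProviderCapabilities {
    pub text_embedding: bool,
    pub image_embedding: bool,
    pub audio_transcription: bool,
}

/// Map a [llm].provider config value to the capabilities in the table.
pub fn capabilities_for(provider: &str) -> ProviderCapabilities {
    match provider {
        // Claude offers summarization and relation extraction, but no
        // embedding or transcription modalities.
        "claude" | "anthropic" => ProviderCapabilities::default(),
        "openai" => ProviderCapabilities {
            text_embedding: true,
            image_embedding: true,       // via vision
            audio_transcription: true,   // via Whisper
        },
        "gemini" | "google" => ProviderCapabilities {
            text_embedding: true,
            image_embedding: true,       // via vision
            audio_transcription: false,
        },
        // "none" and unknown values: everything falls back or is skipped.
        _ => ProviderCapabilities::default(),
    }
}

fn main() {
    assert!(!capabilities_for("claude").text_embedding);
    assert!(capabilities_for("openai").audio_transcription);
    assert!(!capabilities_for("gemini").audio_transcription);
    println!("capability matrix encoded");
}
```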
Which Features Actually Need An LLM Key?
Requires llm.api_key:
- Automatic embedding generation
- Image recall cues and image embedding
- Audio recall cues and transcription
- Consolidation summarization
- Relation extraction
Does not require llm.api_key:
- Basic text/structured store and recall
- Search by precomputed embeddings you provide yourself
- Timeline, graph, inspect, stats, forget, export, import
LLMProvider Trait
The provider trait defines the core LLM operations:
```rust
pub trait LLMProvider: Send + Sync {
    /// Generate an embedding vector for text.
    fn embed(&self, text: &str) -> Pin<Box<dyn Future<Output = Result<Vec<f32>>> + Send>>;

    /// Summarize multiple texts into a concise summary.
    fn summarize(&self, texts: &[String], max_tokens: usize) -> Pin<Box<dyn Future<Output = Result<String>> + Send>>;

    /// Extract semantic relations (subject-predicate-object triples).
    fn extract_relations(&self, text: &str) -> Pin<Box<dyn Future<Output = Result<Vec<ExtractedRelation>>> + Send>>;

    /// Generate embedding for an image (via vision → text → embed pipeline).
    fn embed_image(&self, data: &[u8], format: &str) -> Pin<Box<dyn Future<Output = Result<Vec<f32>>> + Send>>;

    /// Transcribe audio to text.
    fn transcribe_audio(&self, data: &[u8], format: &str) -> Pin<Box<dyn Future<Output = Result<String>> + Send>>;

    /// Advertise supported capabilities.
    fn capabilities(&self) -> ProviderCapabilities;
}
```

LLMAdapter Trait
The adapter trait handles context serialization for LLM consumption:
```rust
pub trait LLMAdapter: Send + Sync {
    /// Serialize memories into LLM-consumable format within a token budget.
    fn serialize_context(&self, memories: &[MemoryRecord], budget_tokens: usize) -> String;

    /// Estimate token count for memory content.
    fn estimate_tokens(&self, content: &MemoryContent) -> usize;

    /// Model metadata.
    fn model_info(&self) -> ModelInfo;
}
```

Each adapter serializes memories in the format best suited to its target model, respecting the model's context window and token limits.
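To make the budget contract concrete, here is a minimal sketch of the kind of packing serialize_context performs. The MemoryRecord stand-in has only a text field, and the roughly four characters per token heuristic is an assumption for illustration, not the estimator any real adapter uses.

```rust
/// Stand-in for the real MemoryRecord; only text content matters here.
pub struct MemoryRecord {
    pub text: String,
}

/// Rough token estimate: ~4 characters per token (a common English heuristic).
pub fn estimate_tokens(text: &str) -> usize {
    text.chars().count().div_ceil(4)
}

/// Pack memories in order into a prompt until the token budget is spent,
/// mirroring serialize_context's budget contract.
pub fn serialize_context(memories: &[MemoryRecord], budget_tokens: usize) -> String {
    let mut out = String::new();
    let mut used = 0;
    for m in memories {
        let cost = estimate_tokens(&m.text);
        if used + cost > budget_tokens {
            break; // this record would blow the budget; stop packing
        }
        out.push_str(&m.text);
        out.push('\n');
        used += cost;
    }
    out
}

fn main() {
    let mems = vec![
        MemoryRecord { text: "alpha".into() }, // 5 chars → 2 tokens
        MemoryRecord { text: "a much longer memory entry".into() }, // 26 chars → 7 tokens
    ];
    // With a 3-token budget only the first record fits.
    assert_eq!(serialize_context(&mems, 3), "alpha\n");
    println!("packed within budget");
}
```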
Fallback Behavior
When no LLM provider is configured (provider = "none"):
| Operation | Fallback Behavior |
|---|---|
| Embedding generation | Returns empty vector (manual embeddings required) |
| Summarization | Truncates to first 200 characters |
| Relation extraction | Returns empty list (no graph enrichment) |
| Image embedding | Returns ModalityUnsupported error |
| Audio transcription | Returns ModalityUnsupported error |
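The summarization fallback is simple enough to sketch. The exact joining rule in-engine may differ; this version joins the inputs with spaces and truncates on character boundaries, which is one safe way to implement "first 200 characters" for non-ASCII text.

```rust
/// Truncation fallback: with provider = "none", summarization degrades to
/// the first 200 characters of the joined input (char-boundary safe, so it
/// never slices a multi-byte character in half).
pub fn truncation_summary(texts: &[String]) -> String {
    let joined = texts.join(" ");
    joined.chars().take(200).collect()
}

fn main() {
    // 150 + 1 (separator) + 150 = 301 chars, cut down to 200.
    let s = truncation_summary(&["x".repeat(150), "y".repeat(150)]);
    assert_eq!(s.chars().count(), 200);
    println!("fallback summary length: {}", s.chars().count());
}
```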
Error Handling and Retries
All LLM adapter HTTP requests share an in-tree exponential-backoff helper from cerememory-adapter-common. The backoff crate dependency was removed in 0.2.6 — Cerememory now carries the retry loop itself.
Default RetryPolicy:
| Field | Default | Meaning |
|---|---|---|
| initial_interval | 500 ms | First sleep before attempt 2 |
| multiplier | 2.0 | Each subsequent interval doubles |
| max_interval | 30 s | Cap on a single sleep window |
| max_retries | 3 | Retries after the initial attempt, so up to 4 total tries |
Errors are classified as either RetryError::Transient(_) (retried up to the cap) or RetryError::Permanent(_) (returned immediately). The OpenAI adapter classifies HTTP 429 / 5xx and network failures as transient; the Claude and Gemini adapters apply the same status-based rule. Auth (401), validation (400), and unknown-route errors fail fast as permanent.
The shared HTTP client is built with a 30 s request timeout (DEFAULT_TIMEOUT_SECS).
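The retry loop described above can be sketched as follows. This is a simplified synchronous version under stated assumptions: the real helper in cerememory-adapter-common is async and actually sleeps between attempts, whereas this sketch only computes the intervals.

```rust
use std::time::Duration;

/// Error classification: transient errors are retried, permanent ones fail fast.
pub enum RetryError {
    Transient(String),
    Permanent(String),
}

pub struct RetryPolicy {
    pub initial_interval: Duration,
    pub multiplier: f64,
    pub max_interval: Duration,
    pub max_retries: u32,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        // Defaults from the table above.
        Self {
            initial_interval: Duration::from_millis(500),
            multiplier: 2.0,
            max_interval: Duration::from_secs(30),
            max_retries: 3,
        }
    }
}

/// Run `op` up to max_retries + 1 times with exponential backoff.
pub fn retry<T>(
    policy: &RetryPolicy,
    mut op: impl FnMut() -> Result<T, RetryError>,
) -> Result<T, RetryError> {
    let mut interval = policy.initial_interval;
    for attempt in 0..=policy.max_retries {
        match op() {
            Ok(v) => return Ok(v),
            // Auth/validation failures: no point retrying.
            Err(RetryError::Permanent(e)) => return Err(RetryError::Permanent(e)),
            // Retry budget exhausted: surface the last transient error.
            Err(RetryError::Transient(e)) if attempt == policy.max_retries => {
                return Err(RetryError::Transient(e));
            }
            Err(RetryError::Transient(_)) => {
                // In real code: sleep(interval) here before the next attempt.
                interval = interval.mul_f64(policy.multiplier).min(policy.max_interval);
            }
        }
    }
    unreachable!("every path in the final attempt returns")
}

fn main() {
    // A 429 that clears on the third try: three total calls, then success.
    let mut calls = 0;
    let result = retry(&RetryPolicy::default(), || {
        calls += 1;
        if calls < 3 { Err(RetryError::Transient("HTTP 429".into())) } else { Ok(calls) }
    });
    assert_eq!(result.ok(), Some(3));
    println!("succeeded after {calls} calls");
}
```

With the default policy the computed sleep windows are 500 ms, 1 s, and 2 s, all well under the 30 s cap; the cap only matters for policies with more retries or a larger multiplier.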
Next Steps
- Full configuration options including LLM settings
- HTTP REST endpoint reference