OpenLCM is a drop-in Python SDK that gives your AI agents a permanent, searchable, lossless memory — without hitting the context limit. Works with LangGraph, CrewAI, AutoGen, Google ADK and any Python framework.
LLMs have a hard token limit. As conversations grow, agents either fail silently — or replace old turns with a flat, irreversible summary. Details fall out permanently.
lcm_grep, lcm_expand
A standard LLM setup holds the conversation in a fixed token budget. Every message consumes space. The budget isn't just a hard cap: room must be left over for the model to generate a summary when the time comes.
Early on, there's plenty of headroom. Messages accumulate. Tokens tick up.
When context usage crosses a threshold — typically around 80% of the budget — the system fires a summarization call. As many messages as possible are sent to the model in one batch.
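The trigger described above reduces to a ratio check. Here is a minimal sketch, assuming token counts are already known. The function name and the 0.8 default are illustrative, not part of any particular library's API.

```python
def should_summarize(used_tokens: int, budget_tokens: int, threshold: float = 0.8) -> bool:
    """Return True once context usage reaches the given fraction of the budget."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    return used_tokens / budget_tokens >= threshold


print(should_summarize(6_000, 8_000))  # 75% of the budget: False
print(should_summarize(7_000, 8_000))  # 87.5% of the budget: True
```

The remaining fraction of the budget is the headroom left for the summarization call itself.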
This call takes 1–2 minutes. The agent is blocked. And the result is just one flat summary.
Everything is replaced by a single summary. Space is reclaimed — but details are lost.
A flat summary can't hold the full fidelity of a long conversation. The model will confidently misremember specifics, contradict earlier decisions, or ask again for information it was already given. And there's no way to go back.
LCM starts the same way, with a conversation accumulating messages, but instead of replacing them with one flat summary, it compacts them into layered summaries where nothing is ever lost.
Each message is persisted to SQLite with a stable ID. The original is always recoverable.
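To illustrate the idea of an append-only store with stable IDs, here is a small sketch using Python's built-in sqlite3 module. The table layout and the helper names `persist` and `fetch` are assumptions made for this example. OpenLCM's actual schema is not shown in this document.

```python
import sqlite3

# In-memory database for the example; a real store would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " store_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " session_id TEXT NOT NULL,"
    " role TEXT NOT NULL,"
    " content TEXT NOT NULL)"
)


def persist(session_id: str, role: str, content: str) -> int:
    """Append one message and return its stable ID."""
    cur = conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()
    return cur.lastrowid


def fetch(store_id: int) -> str:
    """Return the original text of a message, however old it is."""
    row = conn.execute(
        "SELECT content FROM messages WHERE store_id = ?", (store_id,)
    ).fetchone()
    if row is None:
        raise KeyError(store_id)
    return row[0]


first = persist("demo", "user", "Our deploy window is Friday 14:00 UTC.")
persist("demo", "assistant", "Noted.")
print(fetch(first))
```

Rows are only ever inserted, never updated or deleted, which is what makes the original recoverable after compaction.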
Each user message and assistant reply appends to the context. Tokens accumulate. Early on there's plenty of headroom.
The most recent messages get a FRESH TAIL marker — they're protected from compaction so the model always sees the latest context verbatim.
When usage crosses the threshold, LCM sends the eligible chunk to the model with a structured prompt. A summary node (D0) is created — but the source messages are preserved in the immutable store.
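The compaction step above can be sketched roughly as follows. This is a simplified model, not OpenLCM's implementation: `SummaryNode`, `compact_oldest`, and the `summarize` callback (a stand-in for the model call) are hypothetical names, and the store is a plain dict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SummaryNode:
    node_id: int
    depth: int                    # 0 for a node built directly from messages
    source_ids: Tuple[int, ...]   # IDs of the messages this node summarizes
    text: str


def compact_oldest(
    message_ids: List[int],
    store: Dict[int, str],
    fresh_tail: int,
    summarize: Callable[[List[str]], str],
    node_id: int,
) -> Tuple[Optional[SummaryNode], List[int]]:
    """Summarize everything outside the fresh tail; never touch the store."""
    split = max(len(message_ids) - fresh_tail, 0)
    eligible, tail = message_ids[:split], message_ids[split:]
    if not eligible:
        return None, tail
    text = summarize([store[i] for i in eligible])
    return SummaryNode(node_id, 0, tuple(eligible), text), tail


store = {i: f"turn {i}" for i in range(1, 7)}
node, tail = compact_oldest(
    list(store), store, fresh_tail=2,
    summarize=lambda msgs: f"summary of {len(msgs)} messages", node_id=1,
)
print(node.source_ids, tail)  # (1, 2, 3, 4) [5, 6]
print(len(store))             # 6, the source messages are still there
```

The node keeps only references to its sources, so the active context shrinks while the store stays complete.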
Each new cohort of messages outside the fresh tail automatically triggers another compaction pass. Watch the DAG grow as compactions accumulate.
The SUMMARY DAG below the context window shows each depth-0 node, its source range, and the raw messages it compresses.
The user asks about something from 8 turns ago — well outside the fresh tail, already compacted into sum_01.
The agent calls lcm_expand(node_id=1). The engine retrieves the original 8 verbatim messages from the immutable store. The answer is complete and exact.
Nothing was lost. Nothing was slow. The DAG made it instant.
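As a rough illustration of the expand step in the walkthrough, the sketch below maps a summary node back to its verbatim sources. The `expand` function here is a hypothetical stand-in that works on plain dicts. It is not the signature of lcm_expand.

```python
from typing import Dict, List, Tuple


def expand(node_id: int, nodes: Dict[int, Tuple[int, ...]], store: Dict[int, str]) -> List[str]:
    """Return the verbatim source messages behind a summary node, in order."""
    return [store[store_id] for store_id in nodes[node_id]]


store = {i: f"message {i}" for i in range(1, 13)}  # 12 raw messages
nodes = {1: tuple(range(1, 9))}                    # node 1 summarizes messages 1 to 8
recovered = expand(1, nodes, store)
print(len(recovered))  # 8
print(recovered[0])    # message 1
```

Because the lookup is by ID against the persisted store, the result is the original text rather than a re-summarization.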
Old messages become D0 leaf nodes. Leaf nodes condense into D1 session arcs. Arcs condense into D2 durable history. Active context stays bounded. Everything stays queryable.
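The D0 → D1 → D2 rollup can be pictured with a small in-memory model. This is a sketch under stated assumptions: the `Node` type, the `condense` and `leaf_sources` helpers, and the fixed fan-in of 2 are invented for illustration, and summary text generation is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    node_id: int
    depth: int
    children: Tuple[int, ...] = ()  # node IDs one depth lower
    sources: Tuple[int, ...] = ()   # message IDs (depth 0 only)


def condense(dag: Dict[int, Node], depth: int, fan_in: int) -> Optional[int]:
    """Merge the oldest `fan_in` parentless nodes at `depth` into one node at depth + 1."""
    has_parent = {child for node in dag.values() for child in node.children}
    free = sorted(i for i, n in dag.items() if n.depth == depth and i not in has_parent)
    if len(free) < fan_in:
        return None
    new_id = max(dag) + 1
    dag[new_id] = Node(new_id, depth + 1, children=tuple(free[:fan_in]))
    return new_id


def leaf_sources(dag: Dict[int, Node], node_id: int) -> List[int]:
    """Walk down to depth 0 and collect the original message IDs."""
    node = dag[node_id]
    if node.depth == 0:
        return list(node.sources)
    out: List[int] = []
    for child in node.children:
        out.extend(leaf_sources(dag, child))
    return out


# Four D0 nodes, each covering two messages.
dag = {i: Node(i, 0, sources=(2 * i - 1, 2 * i)) for i in range(1, 5)}
condense(dag, 0, fan_in=2)         # D1 node 5 over D0 nodes 1 and 2
condense(dag, 0, fan_in=2)         # D1 node 6 over D0 nodes 3 and 4
top = condense(dag, 1, fan_in=2)   # D2 node 7 over D1 nodes 5 and 6
print(dag[top].depth, leaf_sources(dag, top))  # 2 [1, 2, 3, 4, 5, 6, 7, 8]
```

Only the top-level nodes need to sit in the active context, while every message ID remains reachable by walking down the DAG.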
Every message written to SQLite with FTS5 indexing. Stable store_id. Nothing discarded.
When pressure crosses the threshold, the oldest messages are summarized into D0 leaf nodes. 3-level escalation: L1 detail → L2 bullets → L3 deterministic truncation.
3-level escalation · circuit breaker
When enough D0 nodes accumulate, they merge into D1 arcs. D1s merge into D2 durable history. Depth is unbounded.
DAG · D0 → D1 → D2 → ...
The agent uses lcm_grep (FTS5), lcm_expand (recover source), or lcm_expand_query (synthesize) to access any past moment.
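Full-text search over stored messages, as in the FTS5-backed retrieval step above, might look like the sketch below. It assumes the local SQLite build includes the FTS5 extension (most CPython distributions do). The table name and the `grep` helper are hypothetical and do not reflect the real lcm_grep signature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table; rowid doubles as the message ID in this sketch.
conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(content)")
conn.executemany(
    "INSERT INTO messages_fts (rowid, content) VALUES (?, ?)",
    [
        (1, "Let's deploy the API on Friday."),
        (2, "Use Postgres 16 for the billing service."),
        (3, "The billing service owner is Dana."),
    ],
)


def grep(query: str, limit: int = 5) -> list:
    """Return (rowid, content) pairs matching an FTS5 query, best match first."""
    cur = conn.execute(
        "SELECT rowid, content FROM messages_fts "
        "WHERE messages_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cur.fetchall()


for rowid, content in grep("billing"):
    print(rowid, content)
```

The returned IDs can then be used to fetch or expand the surrounding messages verbatim.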
Every message persisted with stable store_id. Recoverable even if compacted 10 sessions ago.
Summarization always terminates. L1 → L2 → L3 deterministic truncation. Circuit breaker prevents retry storms.
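One way to see why such an escalation always terminates: the last level makes no model call at all. The sketch below is an illustration under assumptions, not OpenLCM's code. The `CircuitBreaker` class, the character-based budget (a stand-in for tokens), and the two level functions are all invented for the example.

```python
from typing import Callable, List


class CircuitBreaker:
    """Stops further model calls after too many consecutive failures."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.max_failures

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1


def summarize_with_escalation(
    text: str,
    max_chars: int,
    levels: List[Callable[[str], str]],
    breaker: CircuitBreaker,
) -> str:
    """Try each model-backed level in order, then fall back to truncation."""
    for level in levels:
        if breaker.open:
            break  # skip remaining model calls instead of retrying
        try:
            summary = level(text)
        except Exception:
            breaker.record(False)
            continue
        breaker.record(True)
        if len(summary) <= max_chars:
            return summary
    return text[:max_chars]  # deterministic final level, always fits


def l1_detail(text: str) -> str:
    raise TimeoutError("model unavailable")  # simulate a failed call


def l2_bullets(text: str) -> str:
    return "- " + text[:20]


breaker = CircuitBreaker()
print(summarize_with_escalation("x" * 500, 40, [l1_detail, l2_bullets], breaker))
```

If every model-backed level fails or overshoots the budget, the function still returns a bounded result, and an open breaker short-circuits the loop rather than retrying.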
L1 → L2 → L3 · SummaryCircuitBreaker
Short conversations pay zero overhead. Compaction only fires when the configurable threshold is exceeded.
LCM_CONTEXT_THRESHOLD=0.75
Framework-agnostic. One import, zero changes to your agent logic.
Replace MemorySaver with LCMCheckpointer. thread_id maps to session_id.
Plug LCMStorage into LongTermMemory. Auto-compacted as sessions grow.
LCMContext subclasses AutoGen's model context. Fully transparent to agent code.
LCMSessionService replaces the default session service. Events compressed automatically.
Run openlcm viz to open a live browser dashboard. Watch DAG nodes form as compactions fire, token pressure build turn by turn, and drill into any summary node.
http://localhost:7842 · auto-refreshes · zero config
Events: session_bound, compaction_start, node_added, node_condensed, compaction_end, and token_pressure.
OOLONG benchmark (Opus 4.6). 8K to 1M token contexts. LCM advantage begins at 32K+.
| context | openlcm (volt) | claude code | delta |
|---|---|---|---|
| 8K–16K | ~equal | ~equal | n/a |
| 32K | +12.4 | +8.1 | +4.3 |
| 128K | +28.6 | +21.4 | +7.2 |
| 512K | +42.4 | +29.8 | +12.6 |
| 1M | +51.3 | +47.0 | +4.3 |
| avg all ctx | 74.8 | 70.3 | +4.5 |
Zero required deps. Provider agnostic. Drop-in for any Python framework.