framework-agnostic lossless context management

Unbounded
Memory.
Bounded
Context.

OpenLCM is a drop-in Python SDK that gives your AI agents a permanent, searchable, lossless memory — without hitting the context limit. Works with LangGraph, CrewAI, AutoGen, Google ADK and any Python framework.

$ pip install openlcm
~/agent via ◆ py 3.12
python3 agent.py LCM bound · session=research-001 context=200,000 threshold=75% turn 47 · prompt=148,320 tok (74.2%) compaction triggered... lcm_compress(messages=47, chunk=20k) L1 summary · 3,840 → 280 tokens node_added D0 #12 · 280t ← 3,840t src 47 msgs → 8 msgs · 148K → 14K tokens all 47 messages recoverable turn 48 ready
# ── problem ─────────────────────────────────────────────────────────

Every agent hits
the context cliff.

LLMs have a hard token limit. As conversations grow, agents either fail silently — or replace old turns with a flat, irreversible summary. Details fall out permanently.

✗ without openlcm
Lossy Compression
Old turns replaced with flat, irreversible summary
Decisions, constraints, file paths — permanently gone
No way to recover original content
✓ with openlcm
Lossless Retrieval
+Every message persisted verbatim in SQLite
+Hierarchical DAG — D0 leaf → D1 arc → D2 durable
+Agent drills back with lcm_grep, lcm_expand
↓ Scroll through the demo
Traditional
Traditional Context Management

A standard LLM setup holds the conversation in a fixed token budget. Every message consumes space. The budget isn't just a hard cap — there needs to be room left over for the model to generate a summary when the time comes.

Early on, there's plenty of headroom. Messages accumulate. Tokens tick up.

Traditional
The Threshold

When context usage crosses a threshold — typically around 80% of the budget — the system fires a summarization call. As many messages as possible are sent to the model in one batch.

This call takes 1–2 minutes. The agent is blocked. And the result is just one flat summary.

Traditional
One Flat Summary

Everything is replaced by a single summary. Space is reclaimed — but details are lost.

A flat summary can't hold the full fidelity of a long conversation. The model will confidently misremember specifics, contradict earlier decisions, or ask again for information it was already given. And there's no way to go back.

↳ Enter LCM
There's a better way.

LCM starts the same way — a conversation accumulating messages — but instead of truncating, it compacts them into layered summaries where nothing is ever lost.

Each message is persisted to SQLite with a stable ID. The original is always recoverable.

LCM
Messages Arrive

Each user message and assistant reply appends to the context. Tokens accumulate. Early on there's plenty of headroom.

The most recent messages get a FRESH TAIL marker — they're protected from compaction so the model always sees the latest context verbatim.

LCM
Summary Compaction

When usage crosses the threshold, LCM sends the eligible chunk to the model with a structured prompt. A summary node (D0) is created — but the source messages are preserved in the immutable store.

Compaction Prompt · Depth 0
Summarize this conversation segment for future turns.
Preserve decisions, rationale, constraints, active tasks.
Remove repetition and conversational filler.

End with: "Expand for details about: <what was compressed>"
⚡ Fast Forward
The Conversation Keeps Growing

Each new cohort of messages outside the fresh tail automatically triggers another compaction pass. Watch the DAG grow as compactions accumulate.

The SUMMARY DAG below the context window shows each depth-0 node, its source range, and the raw messages it compresses.

sum_01 · Turns 1–4
sum_02 · Turns 5–8
LCM
Recalling from Compacted History

The user asks about something from 8 turns ago — well outside the fresh tail, already compacted into sum_01.

The agent calls lcm_expand(node_id=1). The engine retrieves the original 8 verbatim messages from the immutable store. The answer is complete and exact.

Nothing was lost. Nothing was slow. The DAG made it instant.

Threshold crossed — summarization will fire
Context Window
Traditional
Token Budget 0 / 8,000 (0%)
Waiting for messages…
Summary DAG
0 d0
# ── solution ────────────────────────────────────────────────────────

A hierarchical DAG
that never forgets.

Old messages become D0 leaf nodes. Leaf nodes condense into D1 session arcs. Arcs condense into D2 durable history. Active context stays bounded. Everything stays queryable.

lcm_describe() · session=research-001
DAG structure (47 msgs → 12 nodes → 14.2K active tokens)
─── D2 durable arc ──────────────────
#12full session arc840t / 67K src
├── condenses ──▶
─── D1 session arc ──────────────────
#7morning session380t / 28K src
#9afternoon session420t / 31K src
├── condenses ──▶
─── D0 leaf (recent) ────────────────
#10turns 37–4295t
#11turns 43–4688t
└── fresh tail: turns 47–50 (raw)
active context = system + D2(#12) + tail(turns 47-50)
1

Ingest

Every message written to SQLite with FTS5 indexing. Stable store_id. Nothing discarded.

MessageStore · SQLite · FTS5
2

Compact

When pressure crosses threshold, oldest messages are summarized into D0 leaf nodes. 3-level escalation: L1 detail → L2 bullets → L3 deterministic.

3-level escalation · circuit breaker
3

Condense

When enough D0 nodes accumulate, they merge into D1 arcs. D1s merge into D2 durable history. Depth is unbounded.

DAG · D0 → D1 → D2 → ...
4

Retrieve

Agent uses lcm_grep (FTS5), lcm_expand (recover source), or lcm_expand_query (synthesize) to access any past moment.

lcm_grep · lcm_expand · lcm_expand_query
# ── guarantees ──────────────────────────────────────────────────────

Three properties. Always.

Lossless Retrievability

Every message persisted with stable store_id. Recoverable even if compacted 10 sessions ago.

lcm_expand · lcm_grep · lcm_load_session
Deterministic Convergence

Summarization always terminates. L1 → L2 → L3 deterministic truncation. Circuit breaker prevents retry storms.

L1 → L2 → L3 · SummaryCircuitBreaker
Zero-Cost Continuity

Short conversations pay zero overhead. Compaction only fires when the configurable threshold is exceeded.

LCM_CONTEXT_THRESHOLD=0.75
# ── adapters ────────────────────────────────────────────────────────

Works with your
existing stack.

Framework-agnostic. One import, zero changes to your agent logic.

LG
LangGraph
BaseCheckpointSaver

Replace MemorySaver with LCMCheckpointer. thread_id maps to session_id.

from openlcm.adapters.langgraph import LCMCheckpointer
graph = StateGraph(MyState).compile(
  checkpointer=LCMCheckpointer(engine))
CR
CrewAI
StorageBackend

Plug LCMStorage into LongTermMemory. Auto-compacted as sessions grow.

from openlcm.adapters.crewai import LCMStorage
crew = Crew(memory=True,
  long_term_memory=LongTermMemory(storage=LCMStorage(engine)))
AG
AutoGen
ChatCompletionContext

LCMContext subclasses AutoGen's model context. Fully transparent to agent code.

from openlcm.adapters.autogen import LCMContext
agent = AssistantAgent("assistant",
  model_context=LCMContext(engine))
GK
Google ADK
BaseSessionService

LCMSessionService replaces the default session service. Events compressed automatically.

from openlcm.adapters.google_adk import LCMSessionService
runner = Runner(agent=my_agent,
  session_service=LCMSessionService(engine))
# ── visualization ────────────────────────────────────────────────────

See every token.
In real time.

Run openlcm viz to open a live browser dashboard. Watch DAG nodes form as compactions fire, token pressure build turn by turn, and drill into any summary node.

Live
Connected
Compressions3
DAG Nodes5
Messages47
⬡ Token Pressure
62.0%
Prompt 124K
Threshold 150K
Max 200K
◈ Summary DAG
D1 — Session Arc
#3
840t
D0 — Leaf (Recent)
#4
280t
#5
310t
◎ Session
Sessionresearch-001
Platformlanggraph
Threshold75%
Fresh tail64 msgs
DB size1.2 MB
Sessions
research-001 · 47 msgs
dev-session-02 · 18 msgs
ASSISTANT · Turn 2
…set the redirect URI to http://localhost:3000/callback
≡ Message Feed
assistant
RBAC can be layered on top of your OAuth2 setup…
user
How should I structure the permissions matrix?
assistant
A resource × action matrix works well here…
⏱ Event Timeline
compaction end
12 → 4 msgs · 48K → 4.2K tok
node added
D0 #5 · 310t ← 3,840t src
compaction start
16 msgs · 62K tokens
$
openlcm viz --port 7842
Opens http://localhost:7842 · auto-refreshes · zero config
Token Pressure Gauge
Animated bar tracking live prompt token count against the compaction threshold and context maximum. Turns green → amber → red as pressure builds.
real-time · color-coded · threshold marker
Summary DAG Viewer
Live tree of all DAG nodes grouped by depth. Each card shows token count and compression ratio. Hover any node to preview its summary text.
D0 · D1 · D2 · expandable preview
Message Feed
Scrolling live view of recent messages with role badges, token estimates, and compacted-into-node markers. New messages animate in as they arrive.
user · assistant · tool · system
Compaction Timeline
Chronological event log of every lifecycle event: session_bound, compaction_start, node_added, node_condensed, compaction_end, and token_pressure.
SSE stream · before/after stats
Session Sidebar
Switch between stored sessions, inspect config (threshold, fresh tail, model), and run full-text FTS5 search across the entire immutable message history — including content already compacted into DAG nodes.
session switcher · lcm_grep · FTS5
# ── benchmarks ──────────────────────────────────────────────────────

Proven at scale.

OOLONG benchmark (Opus 4.6). 8K to 1M token contexts. LCM advantage begins at 32K+.

contextopenlcm (volt)claude codedeltavisual
8K–16K~equal~equal
32K+12.4+8.1+4.3
128K+28.6+21.4+7.2
512K+42.4+29.8+12.6
1M+51.3+47.0+4.3
avg all ctx74.870.3+4.5
74.8
avg score (vs 70.3)
+12.6
extra pts at 512K
32K+
LCM advantage starts
1M
max ctx tested
# ── quickstart ──────────────────────────────────────────────────────

Up and running
in 5 minutes.

step 01
install
$pip install openlcm
All adapters + providers included — no extras needed.
step 02
create engine
from openlcm import LCMEngine
from openlcm.backends.anthropic import AnthropicBackend

engine = LCMEngine(
  backend=AnthropicBackend(
    model="claude-haiku-4-5-20251001"),
  db_path="~/.openlcm/myapp.db",
)
engine.bind_session("s-1", context_length=200_000)
step 03
compress each turn
async def agent_turn(messages):
  # LCM fires automatically when needed
  messages = await engine.compress(messages)
  response = await llm.chat(messages)
  messages.append(response)
  engine.update_from_response(response.usage)
  return messages
step 04
use the CLI
# live dashboard at localhost:7842
$ openlcm viz

# search conversation history
$ openlcm grep "security constraints"

# session stats + export
$ openlcm status
$ openlcm export session-1 -o convo.json
◆ openlcm v0.1.0 · open source · MIT

Your agents deserve memory
that lasts.

Zero required deps. Provider agnostic. Drop-in for any Python framework.

$ pip install openlcm