OpenLCM — Unbounded Memory. Bounded Context.

# ── problem ─────────────────────────────────────────────────────────

Every agent hits
the context cliff.

LLMs have a hard token limit. As conversations grow, agents either fail silently — or replace old turns with a flat, irreversible summary. Details fall out permanently.

✗ without openlcm

Lossy Compression

—Old turns replaced with flat, irreversible summary

—Decisions, constraints, file paths — permanently gone

—No way to recover original content

✓ with openlcm

Lossless Retrieval

+Every message persisted verbatim in SQLite

+Hierarchical DAG — D0 leaf → D1 arc → D2 durable

+Agent drills back with lcm_grep, lcm_expand, lcm_semantic_search

+Preferences & constraints persist across sessions — auto-injected before each turn

+Fact graph: tags, bidirectional links, contradiction detection

↓ Scroll through the demo

Traditional

Traditional Context Management

A standard LLM setup holds the conversation in a fixed token budget. Every message consumes space. The budget isn't just a hard cap — there needs to be room left over for the model to generate a summary when the time comes.

Early on, there's plenty of headroom. Messages accumulate. Tokens tick up.

Traditional

The Threshold

When context usage crosses a threshold — typically around 80% of the budget — the system fires a summarization call. As many messages as possible are sent to the model in one batch.

This call takes 1–2 minutes. The agent is blocked. And the result is just one flat summary.

Traditional

One Flat Summary

Everything is replaced by a single summary. Space is reclaimed — but details are lost.

A flat summary can't hold the full fidelity of a long conversation. The model will confidently misremember specifics, contradict earlier decisions, or ask again for information it was already given. And there's no way to go back.

↳ Enter LCM

There's a better way.

LCM starts the same way — a conversation accumulating messages — but instead of truncating, it compacts them into layered summaries where nothing is ever lost.

Each message is persisted to SQLite with a stable ID. The original is always recoverable.

LCM

Messages Arrive

Each user message and assistant reply appends to the context. Tokens accumulate. Early on there's plenty of headroom.

The most recent messages get a FRESH TAIL marker — they're protected from compaction so the model always sees the latest context verbatim.

LCM

Summary Compaction

When usage crosses the threshold, LCM sends the eligible chunk to the model with a structured prompt. A summary node (D0) is created — but the source messages are preserved in the immutable store.

Compaction Prompt · Depth 0

Summarize this conversation segment for future turns.
Preserve decisions, rationale, constraints, active tasks.
Remove repetition and conversational filler.

End with: "Expand for details about: <what was compressed>"

⚡ Fast Forward

The Conversation Keeps Growing

Each new cohort of messages outside the fresh tail automatically triggers another compaction pass. Watch the DAG grow as compactions accumulate.

The SUMMARY DAG below the context window shows each depth-0 node, its source range, and the raw messages it compresses.

sum_01 · Turns 1–4

sum_02 · Turns 5–8

LCM

Recalling from Compacted History

The user asks about something from 8 turns ago — well outside the fresh tail, already compacted into sum_01.

The agent calls lcm_expand(node_id=1). The engine retrieves the original 8 verbatim messages from the immutable store. The answer is complete and exact.

Nothing was lost. Nothing was slow. The DAG made it instant.

Context Window

Traditional

Token Budget 0 / 8,000 (0%)

Waiting for messages…

Summary DAG

0 d0

# ── solution ────────────────────────────────────────────────────────

A hierarchical DAG
that never forgets.

Old messages become D0 leaf nodes. Leaf nodes condense into D1 session arcs. Arcs condense into D2 durable history. Active context stays bounded. Everything stays queryable.

lcm_describe() · session=research-001

DAG structure (47 msgs → 12 nodes → 14.2K active tokens)

─── D2 durable arc ──────────────────

◈#12full session arc840t / 67K src

├── condenses ──▶

─── D1 session arc ──────────────────

◈#7morning session380t / 28K src

◈#9afternoon session420t / 31K src

├── condenses ──▶

─── D0 leaf (recent) ────────────────

◈#10turns 37–4295t

◈#11turns 43–4688t

└── fresh tail: turns 47–50 (raw)

active context = system + D2(#12) + tail(turns 47-50)

Ingest

Every message written to SQLite with FTS5 indexing. Stable store_id. Nothing discarded.

MessageStore · SQLite · FTS5

Compact

When pressure crosses threshold, oldest messages are summarized into D0 leaf nodes. 3-level escalation: L1 detail → L2 bullets → L3 deterministic.

3-level escalation · circuit breaker

Condense

When enough D0 nodes accumulate, they merge into D1 arcs. D1s merge into D2 durable history. Depth is unbounded.

DAG · D0 → D1 → D2 → ...

Retrieve

Agent uses lcm_grep (FTS5), lcm_expand (recover source), or lcm_expand_query (synthesize) to access any past moment.

lcm_grep · lcm_expand · lcm_expand_query

Remember

Preferences, constraints, and decisions persist across sessions in the fact store. Relevant facts are auto-injected before each compression turn — no explicit tool call required. Facts can be tagged, linked, and searched semantically.

lcm_remember · lcm_recall · lcm_link · lcm_semantic_search · auto-inject

# ── guarantees ──────────────────────────────────────────────────────

Three properties. Always.

◈

Lossless Retrievability

Every message persisted with stable store_id. Recoverable even if compacted 10 sessions ago.

lcm_expand · lcm_grep · lcm_load_session

✓

Deterministic Convergence

Summarization always terminates. L1 → L2 → L3 deterministic truncation. Circuit breaker prevents retry storms.

L1 → L2 → L3 · SummaryCircuitBreaker

⚡

Zero-Cost Continuity

Short conversations pay zero overhead. Compaction only fires when the configurable threshold is exceeded.

LCM_CONTEXT_THRESHOLD=0.75

◆

Cross-Session Memory

Preferences, constraints, and decisions survive session boundaries. lcm_remember saves. Facts are tagged, linked, and auto-injected into context — no manual retrieval call needed.

lcm_remember · lcm_recall · lcm_link · auto-inject

⬡

Intelligent Retrieval

Optional semantic search via sqlite-vec finds "JWT 24h expiry" from the query "auth token" — no keyword overlap needed. Falls back to FTS5 when disabled. Zero extra infra.

lcm_semantic_search · sqlite-vec · LCM_EMBEDDING_MODEL

◉

Fact Graph

Facts carry tags and bidirectional links. lcm_link captures causal chains. Contradiction detection surfaces old values when facts change. Lightweight graph — pure SQLite, no Neo4j.

tags · related_keys · lcm_link · contradiction detection

# ── adapters ────────────────────────────────────────────────────────

Works with your
existing stack.

Framework-agnostic. One import, zero changes to your agent logic.

LangGraph

BaseCheckpointSaver

Replace MemorySaver with LCMCheckpointer. thread_id maps to session_id.

from openlcm.adapters.langgraph import LCMCheckpointer
graph = StateGraph(MyState).compile(
  checkpointer=LCMCheckpointer(engine))

CrewAI

StorageBackend

Plug LCMStorage into LongTermMemory. Auto-compacted as sessions grow.

from openlcm.adapters.crewai import LCMStorage
crew = Crew(memory=True,
  long_term_memory=LongTermMemory(storage=LCMStorage(engine)))

AutoGen

ChatCompletionContext

LCMContext subclasses AutoGen's model context. Fully transparent to agent code.

from openlcm.adapters.autogen import LCMContext
agent = AssistantAgent("assistant",
  model_context=LCMContext(engine))

Google ADK

BaseSessionService

LCMSessionService replaces the default session service. Events compressed automatically.

from openlcm.adapters.google_adk import LCMSessionService
runner = Runner(agent=my_agent,
  session_service=LCMSessionService(engine))

# ── visualization ────────────────────────────────────────────────────

See every token.
In real time.

Run openlcm viz to open a live browser dashboard. Watch DAG nodes form as compactions fire, token pressure build turn by turn, and drill into any summary node.

OpenLCM

Live

Connected

Compressions3

DAG Nodes5

Messages47

⬡ Token Pressure

62.0%

Prompt 124K

Threshold 150K

Max 200K

◈ Summary DAG

D1 — Session Arc

840t

D0 — Leaf (Recent)

280t

310t

◎ Session

Sessionresearch-001

Platformlanggraph

Threshold75%

Fresh tail64 msgs

DB size1.2 MB

Sessions

research-001 · 47 msgs

dev-session-02 · 18 msgs

ASSISTANT · Turn 2

…set the redirect URI to http://localhost:3000/callback…

≡ Message Feed

assistant

RBAC can be layered on top of your OAuth2 setup…

user

How should I structure the permissions matrix?

assistant

A resource × action matrix works well here…

⏱ Event Timeline

compaction end

12 → 4 msgs · 48K → 4.2K tok

node added

D0 #5 · 310t ← 3,840t src

compaction start

16 msgs · 62K tokens

openlcm viz --port 7842

Opens http://localhost:7842 · auto-refreshes · zero config

⬡

Token Pressure Gauge

Animated bar tracking live prompt token count against the compaction threshold and context maximum. Turns green → amber → red as pressure builds.

real-time · color-coded · threshold marker

◈

Summary DAG Viewer

Live tree of all DAG nodes grouped by depth. Each card shows token count and compression ratio. Hover any node to preview its summary text.

D0 · D1 · D2 · expandable preview

≡

Message Feed

Scrolling live view of recent messages with role badges, token estimates, and compacted-into-node markers. New messages animate in as they arrive.

user · assistant · tool · system

⏱

Compaction Timeline

Chronological event log of every lifecycle event: session_bound, compaction_start, node_added, node_condensed, compaction_end, and token_pressure.

SSE stream · before/after stats

◎

Session Sidebar

Switch between stored sessions, inspect config (threshold, fresh tail, model), and run full-text FTS5 search across the entire immutable message history — including content already compacted into DAG nodes.

session switcher · lcm_grep · FTS5

# ── codebase intelligence ───────────────────────────────────────────

One scan.
Zero re-discovery.

Parse a repository once — AST → semantic graph persisted in SQLite. Agents navigate every file, class, function, and call relationship through tools instead of reading raw code. The graph survives sessions, machines, and context resets.

✗ without LST — every session, every agent

agent · session-3 (context reset)

# session starts cold — must rediscover

Read("core/engine.py") +4,102 t

Read("payments/service.py") +3,241 t

Read("api/routes.py") +2,891 t

Read("models/user.py") +1,820 t

↑ same files as session 1 and session 2

0 tokens wasted on re-discovery

→

✓ with LST — scan once, query from any session

openlcm · session-3

# session starts with full orientation

lcm_lst_context() 487 tokens

◈ FastAPI fastapi/applications.py:41

◈ APIRouter fastapi/routing.py:418

◈ HTTPException fastapi/exceptions.py:17

◈ last session: fixed include_router bug

tools: lcm_lst_class · lcm_lst_callers ···

487 tokens · complete structural awareness

1×

scan per repo

incremental after that

10×

fewer tokens

on repeat file reads

∞

sessions served

graph lives in SQLite

100+

languages

Python AST + Universal Ctags

⬡

scan repo

openlcm scan repo .
or github.com/user/repo
incremental on re-run

◈

semantic graph

files · classes · functions
call edges · imports
docstrings · signatures

◆

agent queries

13 LST tools
lcm_lst_class · callers
lcm_read_file · context

◉

facts pinned

discoveries linked to symbols
surface automatically
in every future session

lcm_lst_contextcall first

Full repo orientation in ~500 tokens — key classes with docstrings, entry points, most active files, recent session history from the DAG. Zero file reads.

→ replaces the entire "let me explore the codebase" phase

lcm_read_file10× efficient

First read: full content. Every repeat read this session: compact LST structural view — class/function tree with signatures and docstrings, not 3000 raw lines.

→ session-level file deduplication

lcm_lst_class+ facts

Class definition, all methods with full signatures, docstrings, and any facts previously pinned to this class by the agent. No file read needed.

→ pinned discoveries surface automatically

lcm_lst_callers

Who calls this function — traverses the call graph. No grep across 1000 files. Results in milliseconds from a SQLite index.

→ call graph traversal, O(1)

lcm_lst_factscross-session

All agent discoveries pinned to a symbol from previous sessions. Rate limit bugs, hidden constraints, known gotchas — found once, remembered forever.

→ lcm_remember(symbol="Class.method")

lcm_lst_find

FTS5 search across all symbols, qualified names, signatures, and docstrings. Finds "PaymentService" or "rate limit" instantly across the entire repo.

→ FTS5 · instant across any repo size

◉ Symbol-pinned memory — agent discoveries survive every session boundary

session 4 — agent finds a bug

# pin discovery to the exact symbol
lcm_remember(
  key="payments.charge.ratelimit",
  value="Stripe hits 100 req/s — backoff needed",
  symbol="PaymentService.charge",
  category="constraint"
)

session 7 — new agent, new context

# query the class — fact surfaces automatically
lcm_lst_class("PaymentService")

→ class PaymentService (payments/service.py:12)
  def charge(amount, currency) — :45
  def refund(charge_id) — :89
  ...
  pinned_facts: [{
    constraint: "Stripe hits 100 req/s
                 — backoff needed"
  }]

CLI — scan any repo

# local path
$ openlcm scan repo /path/to/myapp

# or GitHub URL — auto-clones + caches
$ openlcm scan repo https://github.com/fastapi/fastapi

  Scanned:   1,120 files  (0 skipped, 0 errors)
  Languages: python=1120
  Symbols:   6,135
  Edges:     19,016
  Branch:    master · 5cdf820c8046

# interactive graph in browser
$ openlcm scan visualize
  → HTML force-directed graph, opens in browser

Python SDK

from openlcm import LCMEngine
from openlcm.code.graph import LSTGraph
from openlcm.code.scanner import RepoScanner

engine = LCMEngine(model="...")
graph  = LSTGraph()          # same .db as engine
scanner = RepoScanner()
scanner.scan(".", graph, repo_id="myapp")
engine.attach_lst(graph, repo_id="myapp")

engine.bind_session(session_id)
# orientation injected into every compress()
ctx = engine.get_lst_context()
# → [OpenLCM Repo Context: myapp]
#    Files: 62 · Symbols: 866 · ...

# ── benchmarks ──────────────────────────────────────────────────────

Proven at scale.

OOLONG benchmark (Opus 4.6). 8K to 1M token contexts. LCM advantage begins at 32K+.

context	openlcm (volt)	claude code	delta
8K–16K	~equal	~equal	—
32K	+12.4	+8.1	+4.3
128K	+28.6	+21.4	+7.2
512K	+42.4	+29.8	+12.6
1M	+51.3	+47.0	+4.3
avg all ctx	74.8	70.3	+4.5

74.8

avg score (vs 70.3)

+12.6

extra pts at 512K

32K+

LCM advantage starts

max ctx tested

# ── quickstart ──────────────────────────────────────────────────────

Up and running
in 5 minutes.

step 01

install

All adapters + providers included — no extras needed.

step 02

create engine

from openlcm import LCMEngine
from openlcm.backends.anthropic import AnthropicBackend

engine = LCMEngine(
  backend=AnthropicBackend(
    model="claude-haiku-4-5-20251001"),
  db_path="~/.openlcm/myapp.db",
)
engine.bind_session("s-1", context_length=200_000)

step 03

compress each turn

async def agent_turn(messages):
  # LCM fires automatically when needed
  messages = await engine.compress(messages)
  response = await llm.chat(messages)
  messages.append(response)
  engine.update_from_response(response.usage)
  return messages

step 04

use the CLI

# live dashboard at localhost:7842
$ openlcm viz

# search conversation history
$ openlcm grep "security constraints"

# session stats + export
$ openlcm status
$ openlcm export session-1 -o convo.json

UnboundedMemory.BoundedContext.

Every agent hitsthe context cliff.

A hierarchical DAGthat never forgets.

Ingest

Compact

Condense

Retrieve

Remember

Three properties. Always.

Works with yourexisting stack.

See every token.In real time.

One scan.Zero re-discovery.

Proven at scale.

Up and runningin 5 minutes.

Your agents deserve memorythat lasts.

Unbounded
Memory.
Bounded
Context.

Every agent hits
the context cliff.

A hierarchical DAG
that never forgets.

Works with your
existing stack.

See every token.
In real time.

One scan.
Zero re-discovery.

Up and running
in 5 minutes.

Your agents deserve memory
that lasts.