OpenLCM — Documentation

getting started

Installation

OpenLCM requires Python 3.10+ and SQLite (through the standard-library sqlite3 module). A single install includes every adapter and provider.

All framework adapters (LangGraph, Google ADK, AutoGen, CrewAI, LlamaIndex, Haystack), all provider SDKs (OpenAI, Anthropic, Gemini), and the live dashboard are included. No extras needed.

Tip: reuse your existing LLM client. Every adapter accepts an llm= kwarg, so you can pass your existing model client instead of configuring a separate one for summarization. No extra API keys needed.


Quick Start

The minimal pattern: create an engine, bind a session, call compress() before each LLM turn.

minimal example (any framework) (python)
import asyncio
from openlcm import LCMEngine

# 1. Create engine — any LiteLLM model string works
engine = LCMEngine(
    model="anthropic/claude-haiku-4-5-20251001",
    db_path="~/.openlcm/myapp.db",
)

# 2. Bind a session and declare context size
engine.bind_session("session-001", context_length=200_000)

# 3. Call compress() before every LLM turn
async def agent_turn(messages: list[dict], user_input: str) -> tuple[str, list[dict]]:
    messages.append({"role": "user", "content": user_input})

    # LCM compresses automatically when threshold is exceeded
    messages = await engine.compress(messages)

    # my_llm is a placeholder for your own async LLM client
    response = await my_llm.chat(messages)
    messages.append({"role": "assistant", "content": response.content})

    # Report token usage so LCM can track pressure
    engine.update_from_response(response.usage)
    return response.content, messages

asyncio.run(agent_turn([], "Hello!"))

LCM internal format: compress() expects and returns a list of dicts of the form {"role": "user"|"assistant"|"system"|"tool", "content": "string"}. Tool calls are serialized as JSON in the content field. Use the framework message converters (see Message Converters) to convert from framework-native types.


Configuration

All parameters can be set in code via LCMConfig, via environment variables, or via a config.yaml file.

LCMConfig: all knobs (python)
from openlcm.core.config import LCMConfig
from openlcm import LCMEngine

config = LCMConfig.from_env()  # starts from defaults + env overrides

# ── Compression trigger ────────────────────────────────────────────────────
# Compress at 75% of the context window (default). Range: 0.30 – 0.95
config.context_threshold  = 0.75

# ── Fresh tail ─────────────────────────────────────────────────────────────
# Protect the last N messages from compression (default: 64).
# Set lower (e.g. 8) for tool-heavy agents.
config.fresh_tail_count   = 64

# ── Leaf chunk size ────────────────────────────────────────────────────────
# Tokens per D0 leaf summary (default: 20,000)
config.leaf_chunk_tokens  = 20_000

# ── DAG arc creation ───────────────────────────────────────────────────────
# D0 nodes before creating a D1 arc (default: 4); lower = arc nodes created sooner
config.condensation_fanin = 4

engine = LCMEngine(model="...", config=config)
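
How the trigger and the fresh tail interact can be illustrated with a small standalone sketch. This is not OpenLCM's implementation: the helper names should_compress and split_for_compression are hypothetical, and the token count is assumed to come from the provider's usage report.

```python
def should_compress(prompt_tokens: int, context_length: int,
                    context_threshold: float = 0.75) -> bool:
    """Return True once the prompt fills the configured fraction of the window."""
    return prompt_tokens >= context_threshold * context_length


def split_for_compression(messages: list[dict], fresh_tail_count: int = 64):
    """Split history into (compressible, fresh_tail); the tail stays verbatim."""
    if fresh_tail_count <= 0:
        return list(messages), []
    return messages[:-fresh_tail_count], messages[-fresh_tail_count:]


# Hypothetical history of 100 messages with a small fresh tail
history = [{"role": "user", "content": f"message {i}"} for i in range(100)]
old, tail = split_for_compression(history, fresh_tail_count=8)
```

With these assumptions, only the messages in old would be candidates for summarization; a lower fresh_tail_count leaves more of the history compressible.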

Environment variables

Variable	Type	Default	Description
LCM_CONTEXT_THRESHOLD	float	0.75	Compression trigger as fraction of context window
LCM_FRESH_TAIL_COUNT	int	64	Messages protected from compression at tail
LCM_LEAF_CHUNK_TOKENS	int	20000	Tokens per D0 leaf summary chunk
LCM_CONDENSATION_FANIN	int	4	D0 nodes required before D1 arc is created
LCM_DB_PATH	str	~/.openlcm/lcm.db	SQLite database path
LCM_AUTO_INJECT_MEMORY	bool	false	Auto-inject relevant facts & history into context before each compression (no LLM call — keyword-based)
LCM_AUTO_INJECT_TOP_K	int	5	Max facts injected per compression call when auto-inject is enabled
LCM_EXTRACTION_TO_FACTS_ENABLED	bool	false	Auto-extract decisions/constraints/preferences from each new D0 summary into the fact store
LCM_AUTO_PIN_PATTERNS	str	""	Comma-separated groups to auto-pin: `constraint`, `error`, `correction`
LCM_EMBEDDING_MODEL	str	""	LiteLLM model for embeddings (e.g. `openai/text-embedding-3-small`). Enables `lcm_semantic_search`.
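
To make the typed overrides concrete, here is a rough sketch of how variables like these could be parsed on top of defaults. The SPEC mapping, the accepted boolean spellings, and the load_overrides helper are assumptions for illustration, not the behaviour of LCMConfig.from_env().

```python
import os


def parse_bool(raw: str) -> bool:
    # Accepted spellings are an assumption for this sketch
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def parse_list(raw: str) -> list[str]:
    return [part.strip() for part in raw.split(",") if part.strip()]


# Hypothetical subset of the table above: name -> (caster, default)
SPEC = {
    "LCM_CONTEXT_THRESHOLD": (float, 0.75),
    "LCM_FRESH_TAIL_COUNT": (int, 64),
    "LCM_AUTO_INJECT_MEMORY": (parse_bool, False),
    "LCM_AUTO_PIN_PATTERNS": (parse_list, []),
}


def load_overrides(environ=os.environ) -> dict:
    """Start from defaults and apply any variable that is set and non-empty."""
    settings = {}
    for name, (cast, default) in SPEC.items():
        raw = environ.get(name)
        settings[name] = default if raw in (None, "") else cast(raw)
    return settings


settings = load_overrides({
    "LCM_FRESH_TAIL_COUNT": "8",
    "LCM_AUTO_PIN_PATTERNS": "constraint, error",
})
```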


Core Concepts

Two-layer architecture

LCM has two independent stores that work together:

Immutable Message Store — every message written verbatim to SQLite with a stable store_id. Never modified, never deleted. FTS5-indexed for full-text search.

Summary DAG — a directed acyclic graph of summary nodes. D0 leaf → D1 arc → D2 durable. Each node points back to the source message range it compresses.

DAG depth levels

Depth	Name	Created when
D0	Leaf node	Context threshold exceeded; oldest messages outside fresh tail are summarized
D1	Session arc	condensation_fanin D0 nodes have accumulated
D2+	Durable history	condensation_fanin D1 nodes have accumulated (unbounded depth)
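
The fan-in rule in the table can be sketched as a small in-memory model. The Node class and add_leaf function are hypothetical, and the joined string stands in for the LLM summarization step; the real engine's storage and summarization are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    depth: int
    summary: str
    children: list["Node"] = field(default_factory=list)


def add_leaf(frontier: dict[int, list[Node]], summary: str, fanin: int = 4) -> None:
    """Add a D0 node, then condense upward while any depth holds `fanin` nodes."""
    frontier.setdefault(0, []).append(Node(0, summary))
    depth = 0
    while len(frontier.get(depth, [])) >= fanin:
        children = frontier[depth][:fanin]
        frontier[depth] = frontier[depth][fanin:]
        # A real system would ask an LLM to summarise the children here
        parent = Node(depth + 1, " | ".join(c.summary for c in children), children)
        frontier.setdefault(depth + 1, []).append(parent)
        depth += 1


# 17 leaves with fan-in 4: sixteen condense into one D2 node, one D0 remains
frontier: dict[int, list[Node]] = {}
for i in range(17):
    add_leaf(frontier, f"leaf-{i}")
```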

Active context formula

what the model sees each turn (text)
active_context = system_prompt
                + highest_dag_node (D2 or D1 if no D2)
                + recent_d0_nodes  (any D0 not yet condensed)
                + fresh_tail       (last N raw messages, verbatim)
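
As a rough illustration of the formula, the sketch below assembles a message list from a system prompt, a few summary nodes, and a fresh tail. The function build_active_context, the node dict shape, and the choice to carry summaries as system messages are all assumptions, not the engine's actual representation.

```python
def build_active_context(system_prompt: str,
                         dag_nodes: list[dict],
                         fresh_tail: list[dict]) -> list[dict]:
    """Assemble system prompt + highest DAG node + uncondensed D0 nodes + fresh tail."""
    context = [{"role": "system", "content": system_prompt}]

    # Highest node: D2 (or deeper) if present, otherwise D1
    higher = [n for n in dag_nodes if n["depth"] >= 1]
    if higher:
        top = max(higher, key=lambda n: n["depth"])
        context.append({"role": "system",
                        "content": f"[D{top['depth']} summary] {top['summary']}"})

    # D0 leaves that have not been condensed yet
    context += [{"role": "system", "content": f"[D0 summary] {n['summary']}"}
                for n in dag_nodes if n["depth"] == 0]

    # Last N raw messages, verbatim
    context += fresh_tail
    return context


nodes = [
    {"depth": 1, "summary": "session arc"},
    {"depth": 2, "summary": "durable history"},
    {"depth": 0, "summary": "recent leaf"},
]
tail = [{"role": "user", "content": "What next?"}]
active = build_active_context("You are helpful.", nodes, tail)
```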

Sessions

One SQLite DB file holds all sessions. bind_session() sets the active session and context window size. Multiple agents can share one DB with different session IDs.

session management (python)
# One DB, multiple sessions
engine = LCMEngine(model="...", db_path="shared.db")

# Each bind_session() call switches the active session
engine.bind_session("user-alice", context_length=128_000)
engine.bind_session("user-bob",   context_length=200_000)  # "user-bob" is now active

# Get live stats for the current session (here: "user-bob")
status = engine.get_status()
# → {"store_messages": 47, "dag_nodes": 5, "compression_count": 3,
#    "last_prompt_tokens": 14200, ...}

framework adapters

LangGraph

Two integration points: LCMCheckpointer (graph persistence) and LangChainMessages (explicit compression inside a node).

Option A — LCMCheckpointer (recommended)

Drop-in replacement for MemorySaver. LCM compresses checkpoint state automatically before each graph run.

langgraph_agent.py (python)
import asyncio
from langgraph.graph import StateGraph
from openlcm import LCMEngine
from openlcm.adapters.langgraph import LCMCheckpointer

engine = LCMEngine(model="anthropic/claude-haiku-4-5-20251001")
engine.bind_session("lg-session", context_length=200_000)

# MyState is your graph's state schema; add your nodes and edges as usual
builder = StateGraph(MyState)

# Replace MemorySaver with LCMCheckpointer; no other changes needed
graph = builder.compile(checkpointer=LCMCheckpointer(engine))

async def main():
    # thread_id maps to session_id automatically
    config = {"configurable": {"thread_id": "lg-session"}}
    result = await graph.ainvoke({"messages": [...]}, config)

asyncio.run(main())

Option B — Manual compression inside a node

Use LangChainMessages to convert messages, check pressure, and compress explicitly. Gives you full control over when compression fires.

langgraph_manual.py: complete working example with tools (python)
import asyncio
from typing import Annotated
from typing_extensions import TypedDict
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.tools import tool
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition
from openlcm import LCMEngine
from openlcm.core.config import LCMConfig
from openlcm.adapters.langchain import LangChainMessages

# ── Tools ─────────────────────────────────────────────────────────────────
@tool
def get_weather(city: str) -> dict:
    """Get current weather for a city."""
    return {"city": city, "temp_c": 22, "condition": "Sunny"}

tools = [get_weather]

# ── LLM + Engine ──────────────────────────────────────────────────────────
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
llm_with_tools = llm.bind_tools(tools)

config = LCMConfig.from_env()
config.context_threshold = 0.60
config.fresh_tail_count  = 8

engine = LCMEngine(summarize_fn=llm, config=config)
engine.bind_session("demo", context_length=6_000)

# ── State ─────────────────────────────────────────────────────────────────
class State(TypedDict):
    messages: Annotated[list, add_messages]

SYSTEM = SystemMessage(content="You are a helpful assistant with weather tools.")

# ── Nodes ─────────────────────────────────────────────────────────────────
async def chatbot(state: State):
    messages = state["messages"]
    if not messages or not isinstance(messages[0], SystemMessage):
        messages = [SYSTEM] + list(messages)

    # Convert to LCM format → compress if needed → convert back
    lcm_msgs = LangChainMessages.to_lcm(messages)
    if engine.should_compress_preflight(lcm_msgs):
        lcm_msgs = await engine.compress(lcm_msgs)
        messages = LangChainMessages.from_lcm(lcm_msgs)

    response = await llm_with_tools.ainvoke(messages)
    return {"messages": [response]}

tool_node = ToolNode(tools)

# ── Graph ─────────────────────────────────────────────────────────────────
builder = StateGraph(State)
builder.add_node("chatbot", chatbot)
builder.add_node("tools",   tool_node)
builder.add_edge(START, "chatbot")
builder.add_conditional_edges("chatbot", tools_condition)
builder.add_edge("tools", "chatbot")
graph = builder.compile()

async def main():
    conversation = []
    while True:
        user_input = input("You: ")
        conversation.append(HumanMessage(content=user_input))
        result = await graph.ainvoke({"messages": conversation})
        conversation = result["messages"]
        print(f"Agent: {conversation[-1].content}")

asyncio.run(main())


Google ADK

Two components work together: LCMSessionService persists every ADK event to SQLite, and lcm_compress_callback compresses context before each Gemini API call.

Setup: set GOOGLE_API_KEY in your environment. Install with pip install openlcm[google-adk].

adk_agent.py: complete working example (python)
import asyncio
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.genai import types
from openlcm import LCMEngine
from openlcm.core.config import LCMConfig
from openlcm.adapters.google_adk import LCMSessionService, lcm_compress_callback

# ── Mock tools (no extra API keys) ────────────────────────────────────────
def get_weather(city: str) -> dict:
    """Get weather for a city. Args: city: City name."""
    return {"city": city, "temp_c": 22, "condition": "Sunny"}

def get_stock_price(ticker: str) -> dict:
    """Get stock price. Args: ticker: Stock symbol e.g. AAPL."""
    return {"ticker": ticker, "price": 195.42, "change_pct": 1.2}

# ── LCM Engine ────────────────────────────────────────────────────────────
config = LCMConfig.from_env()
config.context_threshold = 0.60
config.fresh_tail_count  = 8

engine = LCMEngine(
    model="gemini/gemini-2.0-flash",
    config=config,
    db_path="adk_demo.db",
)
engine.bind_session("adk-session", context_length=500_000)

# ── ADK Agent with LCM hooks ──────────────────────────────────────────────
agent = LlmAgent(
    name="research_assistant",
    model="gemini-2.0-flash",
    instruction="You are a research assistant. Use your tools proactively.",
    tools=[get_weather, get_stock_price],
    before_model_callback=lcm_compress_callback(engine),  # compression hook
)

session_service = LCMSessionService(engine)  # persistence + dashboard
runner = Runner(
    agent=agent,
    app_name="my-app",
    session_service=session_service,
)

# ── Run ───────────────────────────────────────────────────────────────────
async def run_turn(session_id: str, user_input: str) -> str:
    # Manually ingest user message so it appears in the LCM store
    engine._ingest_messages([{"role": "user", "content": user_input}])

    content = types.Content(
        role="user", parts=[types.Part(text=user_input)]
    )
    final_text = ""
    async for event in runner.run_async(
        user_id="user", session_id=session_id, new_message=content
    ):
        # Consume ALL events — never break early (causes GeneratorExit in OTel)
        if event.is_final_response() and not final_text:
            if event.content and event.content.parts:
                final_text = "".join(
                    getattr(p, "text", "") or ""
                    for p in event.content.parts
                    if getattr(p, "text", None)
                )
    return final_text or "(no response)"

async def main():
    session = await runner.session_service.create_session(
        app_name="my-app", user_id="user", session_id="adk-session"
    )
    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        reply = await run_turn(session.id, user_input)
        print(f"Agent: {reply}")

asyncio.run(main())

Important: consume all events. Never use break after is_final_response(). ADK's run_async generator runs inside OpenTelemetry spans; breaking early throws GeneratorExit into those spans and corrupts the session state. Always drain all events to natural completion.
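
The drain-to-completion pattern can be shown without ADK. In this sketch fake_event_stream is a hypothetical stand-in for run_async, with plain dicts instead of ADK events; it only demonstrates that code after the last yield runs when the consumer never breaks out of the loop.

```python
import asyncio

closed_naturally = []


async def fake_event_stream():
    """Hypothetical stand-in for an event generator with work after the final event."""
    yield {"final": False, "text": "calling tool"}
    yield {"final": True, "text": "It is sunny."}
    yield {"final": False, "text": "bookkeeping event"}
    closed_naturally.append(True)  # only reached if the consumer drains the stream


async def drain(stream) -> str:
    final_text = ""
    async for event in stream:  # no break: iterate to natural completion
        if event["final"] and not final_text:
            final_text = event["text"]
    return final_text


reply = asyncio.run(drain(fake_event_stream()))
```

The first final event is remembered and later events are still consumed, which mirrors the `and not final_text` guard in run_turn above.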

How it works

Component	Interface	What it does
LCMSessionService	BaseSessionService	Wraps InMemorySessionService; mirrors every append_event call to SQLite for dashboard visibility
lcm_compress_callback	before_model_callback	Intercepts LlmRequest.contents before each Gemini API call and replaces it with compressed context


AutoGen

LCMContext is a ChatCompletionContext subclass. Pass it as model_context to any AutoGen agent — no other changes needed.

autogen_agent.py (python)
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from openlcm import LCMEngine
from openlcm.adapters.autogen import LCMContext

model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

# Reuse the same client for LCM summarization — no extra API key
engine = LCMEngine(llm=model_client)
engine.bind_session("autogen-session", context_length=128_000)

agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    model_context=LCMContext(engine),   # ← LCM drop-in
)

# Multi-agent: each agent gets its own session
planner  = AssistantAgent("planner",  model_client=model_client,
    model_context=LCMContext(llm=model_client, session_id="planner"))
executor = AssistantAgent("executor", model_client=model_client,
    model_context=LCMContext(llm=model_client, session_id="executor"))

LCMContext methods

Satisfies the full ChatCompletionContext ABC:

Method	Behaviour
add_message(msg)	Persists to SQLite, triggers compression if threshold exceeded
get_messages()	Returns LCM-optimised context as typed AutoGen LLMMessage objects
clear()	Resets in-memory list and deletes session messages from store
message_count()	Returns count of messages currently held
save_state()	Returns serialisable state dict for checkpointing
load_state(state)	Restores context from a saved state dict


CrewAI

LCMStorage plugs into LongTermMemory as a storage backend. All crew memory goes through LCM's immutable store.

crewai_agent.py (python)
from crewai import Agent, Crew, Task
from crewai.memory import LongTermMemory
from openlcm import LCMEngine
from openlcm.adapters.crewai import LCMStorage

engine = LCMEngine(model="openai/gpt-4o-mini")
engine.bind_session("crewai-session", context_length=128_000)

researcher = Agent(
    role="Research Analyst",
    goal="Gather and analyse market data",
    backstory="Expert at finding and synthesising information.",
    verbose=True,
)

crew = Crew(
    agents=[researcher],
    tasks=[Task(
        description="Research AI trends in 2025",
        expected_output="A short bullet-point summary of key trends",  # required by CrewAI
        agent=researcher,
    )],
    memory=True,
    long_term_memory=LongTermMemory(
        storage=LCMStorage(engine)   # ← LCM drop-in
    ),
)

result = crew.kickoff()


OpenAI SDK

OpenAIMessages converts between the OpenAI message format and LCM's internal format. Compatible with Groq, Together, Mistral, Azure, Ollama, vLLM, and any OpenAI-compatible endpoint.

openai_agent.py: with tool calls (python)
import asyncio, json
from openai import AsyncOpenAI
from openlcm import LCMEngine
from openlcm.adapters.openai import OpenAIMessages

client = AsyncOpenAI()
engine = LCMEngine(model="openai/gpt-4o-mini")
engine.bind_session("openai-session", context_length=128_000)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

async def chat(messages: list, user_input: str | None = None) -> tuple[list, str]:
    # Append the user turn; the recursive call after tool results passes None
    if user_input is not None:
        messages.append({"role": "user", "content": user_input})

    # Convert to LCM → compress if needed → convert back to OpenAI format
    lcm = OpenAIMessages.to_lcm(messages)
    if engine.should_compress_preflight(lcm):
        lcm = await engine.compress(lcm)
        messages = OpenAIMessages.from_lcm(lcm)

    response = await client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, tools=tools
    )
    msg = response.choices[0].message
    messages.append(msg.model_dump(exclude_none=True))

    # Report token usage on every call, including tool-call turns
    engine.update_from_response({
        "prompt_tokens":     response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
    })

    # Handle tool calls
    if msg.tool_calls:
        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)
            result = {"city": args["city"], "temp_c": 22}  # mock
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": json.dumps(result),
            })
        # Recurse without a new user message to get the final answer
        return await chat(messages)

    return messages, msg.content

Groq / Together / Ollama / vLLM: use the same OpenAIMessages converter; the message format is identical. Just change the client's base_url and the LCMEngine(model=...) string.


Anthropic SDK

AnthropicMessages handles Anthropic's content block format. from_lcm() returns a (system_str, messages) tuple because Anthropic takes system as a separate parameter.

anthropic_agent.py (python)
import asyncio
from anthropic import AsyncAnthropic
from openlcm import LCMEngine
from openlcm.adapters.anthropic import AnthropicMessages

client = AsyncAnthropic()
engine = LCMEngine(model="anthropic/claude-haiku-4-5-20251001")
engine.bind_session("anthropic-session", context_length=200_000)

SYSTEM = "You are a helpful assistant."

async def chat(messages: list, user_input: str) -> tuple:
    # Record the user turn in the running history before converting
    messages.append({"role": "user", "content": user_input})

    # Convert to LCM internal format (the system prompt is passed separately)
    lcm = AnthropicMessages.to_lcm(messages, system=SYSTEM)

    if engine.should_compress_preflight(lcm):
        lcm = await engine.compress(lcm)

    # from_lcm returns (system_str, anthropic_messages)
    system_out, anthropic_msgs = AnthropicMessages.from_lcm(lcm)

    response = await client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=2048,
        system=system_out or SYSTEM,
        messages=anthropic_msgs,
    )

    reply = response.content[0].text
    messages.append({"role": "assistant", "content": reply})
    engine.update_from_response({
        "prompt_tokens":     response.usage.input_tokens,
        "completion_tokens": response.usage.output_tokens,
    })
    return messages, reply


LlamaIndex

LlamaIndexMessages converts between ChatMessage objects (with MessageRole enum) and LCM's internal format.

llamaindex_agent.py (python)
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.anthropic import Anthropic
from openlcm import LCMEngine
from openlcm.adapters.llamaindex import LlamaIndexMessages

llm    = Anthropic(model="claude-haiku-4-5-20251001")
engine = LCMEngine(llm=llm)
engine.bind_session("llama-session", context_length=200_000)

history: list[ChatMessage] = []

async def chat(user_input: str) -> str:
    history.append(ChatMessage(role=MessageRole.USER, content=user_input))

    # Convert → compress if needed → convert back
    lcm = LlamaIndexMessages.to_lcm(history)
    if engine.should_compress_preflight(lcm):
        lcm      = await engine.compress(lcm)
        history[:] = LlamaIndexMessages.from_lcm(lcm)

    response = await llm.achat(history)
    history.append(ChatMessage(
        role=MessageRole.ASSISTANT,
        content=response.message.content
    ))
    return response.message.content


Haystack

HaystackMessages handles both Haystack ≥2.3 ToolCall dataclass style and legacy additional_kwargs style.

haystack_agent.py (python)
from haystack.dataclasses import ChatMessage
from haystack.components.generators.chat import OpenAIChatGenerator
from openlcm import LCMEngine
from openlcm.adapters.haystack import HaystackMessages

generator = OpenAIChatGenerator(model="gpt-4o-mini")
engine    = LCMEngine(model="openai/gpt-4o-mini")
engine.bind_session("haystack-session", context_length=128_000)

history: list[ChatMessage] = []

async def chat(user_input: str) -> str:
    history.append(ChatMessage.from_user(user_input))

    lcm = HaystackMessages.to_lcm(history)
    if engine.should_compress_preflight(lcm):
        lcm        = await engine.compress(lcm)
        history[:] = HaystackMessages.from_lcm(lcm)

    result  = generator.run(history)
    reply   = result["replies"][0]
    history.append(reply)
    return reply.text

Gemini (raw google-genai)

GeminiMessages converts between types.Content objects (Gemini's native format) and LCM. Also used internally by lcm_compress_callback.

gemini_agent.py (python)
import asyncio
from google import genai
from google.genai import types
from openlcm import LCMEngine
from openlcm.adapters.gemini import GeminiMessages

# google-genai client; reads the API key from the environment
client = genai.Client()
engine = LCMEngine(model="gemini/gemini-2.0-flash")
engine.bind_session("gemini-session", context_length=1_000_000)

history: list = []  # list[types.Content]

async def chat(user_input: str) -> str:
    history.append(types.Content(
        role="user", parts=[types.Part(text=user_input)]
    ))

    # Convert → compress → convert back
    lcm = GeminiMessages.to_lcm(history)
    if engine.should_compress_preflight(lcm):
        lcm = await engine.compress(lcm)
        _, history[:] = GeminiMessages.from_lcm(lcm)  # (system, contents)

    # Async call through the same google-genai SDK that provides types.Content
    response = await client.aio.models.generate_content(
        model="gemini-2.0-flash", contents=history
    )
    history.append(response.candidates[0].content)
    return response.text

reference

Message Converters

Every framework adapter ships a static converter class with to_lcm() and from_lcm() methods you can use independently of the higher-level adapters.

all converters at a glance (python)
from openlcm.adapters.openai     import OpenAIMessages
from openlcm.adapters.anthropic  import AnthropicMessages
from openlcm.adapters.langchain  import LangChainMessages
from openlcm.adapters.llamaindex import LlamaIndexMessages
from openlcm.adapters.haystack   import HaystackMessages
from openlcm.adapters.gemini     import GeminiMessages
from openlcm.adapters.autogen    import AutoGenMessages

# All follow the same two-method interface:
lcm_msgs = OpenAIMessages.to_lcm(openai_messages)     # → list[dict]
oai_msgs = OpenAIMessages.from_lcm(lcm_msgs)          # → list[dict]

# Anthropic and Gemini return a tuple from from_lcm (system is separate):
system, msgs = AnthropicMessages.from_lcm(lcm_msgs)   # → (str, list)
system, msgs = GeminiMessages.from_lcm(lcm_msgs)      # → (str, list[Content])

# Auto-detect converter from message type:
from openlcm.adapters import auto_detect
converter = auto_detect(messages)                       # returns the right class
lcm_msgs  = converter.to_lcm(messages)

LCM internal format

All converters normalise to this format. Tool calls are JSON-serialised into the content string.

internal message schema (python)
# Plain message
{"role": "user" | "assistant" | "system", "content": "string"}

# Assistant with tool calls (content is JSON string)
{"role": "assistant", "content": "{\"text\": \"...\", \"tool_calls\": [{\"id\": \"tc_1\", \"name\": \"get_weather\", \"args\": {\"city\": \"Tokyo\"}, \"type\": \"function\"}]}"}

# Tool result
{"role": "tool", "content": "{\"temp_c\": 22}", "tool_call_id": "tc_1", "name": "get_weather"}
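
To show what "JSON-serialised into the content string" means in practice, here is a small round-trip sketch using only the standard library. The helpers pack_assistant and unpack_assistant are hypothetical names; the shipped converters may handle more cases than this.

```python
import json


def pack_assistant(text: str, tool_calls: list[dict]) -> dict:
    """Build an assistant message; tool calls are folded into the content string."""
    if tool_calls:
        content = json.dumps({"text": text, "tool_calls": tool_calls})
    else:
        content = text
    return {"role": "assistant", "content": content}


def unpack_assistant(message: dict) -> tuple[str, list[dict]]:
    """Recover (text, tool_calls); plain-text content yields an empty call list."""
    try:
        payload = json.loads(message["content"])
    except (json.JSONDecodeError, TypeError):
        return message["content"], []
    if isinstance(payload, dict) and "tool_calls" in payload:
        return payload.get("text", ""), payload["tool_calls"]
    return message["content"], []


call = {"id": "tc_1", "name": "get_weather", "args": {"city": "Tokyo"}, "type": "function"}
msg = pack_assistant("Checking the weather.", [call])
text, calls = unpack_assistant(msg)
```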


Persistent Memory

LCM's fact store gives agents durable memory that survives session boundaries — preferences, constraints, decisions, and project facts that stay true across conversations.

The problem it solves

The DAG compresses and retrieves conversation history. But some things aren't history — they're standing truths: "the user prefers pytest", "don't push to production without a review", "we chose Postgres on 2025-04-01". These facts should be present at the start of every new session, not buried under 40 turns of old conversation.

The fact store is a separate key-value layer in the same SQLite database, queryable independently of the message store or DAG.

Agent tools

Tool	Description
`lcm_remember(key, value, …)`	Store or update a fact. Supports `tags`, `related_keys`, `category`, `scope`. Upserts on `(scope, key)`. Returns `previous_value` if the fact changed.
`lcm_recall(…)`	Retrieve facts — filter by `key`, `query`, `category`, `tag`, or `related_to`. No args returns all.
`lcm_forget(key)`	Delete a fact when it's no longer true.
`lcm_link(key1, key2)`	Bidirectionally link two facts. Each fact's `related_keys` list is updated. Use to capture causal chains.
`lcm_semantic_search(query)`	Cosine-similarity search over DAG nodes and facts. Requires `LCM_EMBEDDING_MODEL`. Falls back gracefully with a hint to use `lcm_grep`.

Storing facts

lcm_remember: the agent calls this (tool call)
# Store a user preference (global scope = survives all future sessions)
lcm_remember(
    key="user.preferred_test_framework",
    value="pytest",
    category="preference",
    tags=["tooling", "testing"],
)

# Record a project constraint with tags
lcm_remember(
    key="constraint.no_production_push",
    value="Never push to production without a peer review. Agreed 2025-06-01.",
    category="constraint",
    tags=["deployment", "safety"],
)

# Record a decision — if key exists, response includes previous_value
lcm_remember(
    key="decision.database",
    value="Chose Postgres over MySQL — better JSON support needed for events table.",
    category="decision",
    tags=["backend", "infra"],
)
# → { "status": "stored", "key": "decision.database", ... }
# → If value changed: also includes "updated": true, "previous_value": "..."
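
The upsert-on-(scope, key) behaviour with previous_value can be approximated in a few lines of sqlite3. The facts table and the remember helper below are assumptions made for illustration; the real schema and response fields may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE facts (
           scope TEXT NOT NULL,
           key   TEXT NOT NULL,
           value TEXT NOT NULL,
           PRIMARY KEY (scope, key)
       )"""
)


def remember(key: str, value: str, scope: str = "global") -> dict:
    """Insert or overwrite the fact at (scope, key); report the old value if it changed."""
    row = conn.execute(
        "SELECT value FROM facts WHERE scope = ? AND key = ?", (scope, key)
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO facts (scope, key, value) VALUES (?, ?, ?)",
        (scope, key, value),
    )
    conn.commit()
    result = {"status": "stored", "key": key}
    if row is not None and row[0] != value:
        result.update(updated=True, previous_value=row[0])
    return result


first = remember("decision.database", "Postgres")
second = remember("decision.database", "MySQL")
local = remember("decision.database", "SQLite", scope="session-001")
```

Because the primary key includes the scope, the session-scoped write does not collide with the global fact.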

Recalling facts

lcm_recall: at session start or before consequential actions (tool call)
# Load all facts at session start (no args = everything)
lcm_recall()

# Load only constraints — e.g. before any tool call that mutates state
lcm_recall(category="constraint")

# Look up a specific fact
lcm_recall(key="user.preferred_test_framework")

# Search by keyword
lcm_recall(query="database")

# Filter by tag (v0.1.2+) — returns all facts with this tag
lcm_recall(tag="deployment")

# Traverse fact graph — returns facts linked to key via related_keys or shared tags
lcm_recall(related_to="decision.database")

Scope

scope="global" (default) makes a fact visible across all sessions. scope="current" scopes it to the current session only. You can also pass any explicit session_id string as a scope for finer isolation.

scoped facts (tool call)
# Global: visible to every future session
lcm_remember(key="user.timezone", value="America/New_York", scope="global")

# Session-local: private to the current conversation
lcm_remember(key="task.active_branch", value="feat/auth-rewrite", scope="current")

# Recall only global facts
lcm_recall(scope="global")

# Recall global + current session facts (no scope filter)
lcm_recall()

Category	Use for
`preference`	User style/tooling choices (test framework, code style, verbosity)
`constraint`	Hard rules the agent must not violate
`decision`	Recorded choices with rationale and date
`fact`	General project or user knowledge (default)

Direct Python API

FactStore accessed from the engine (python)
from openlcm import LCMEngine

engine = LCMEngine(model="anthropic/claude-haiku-4-5-20251001")
engine.bind_session("my-session", context_length=200_000)

# Store a fact directly (no agent round-trip needed)
engine._facts.remember(
    "user.preferred_test_framework",
    "pytest",
    scope="global",
    category="preference",
    source_session_id=engine.current_session_id,
)

# Recall all global facts at session start
facts = engine._facts.recall_query(scope="global", limit=100)
for f in facts:
    print(f["key"], "→", f["value"])

Recommended session start pattern

Call lcm_recall() as the first tool call in every new session. This re-injects preferences, constraints, and decisions that would otherwise be invisible to the model at turn 1.

system prompt hint (text)
At the start of each session, call lcm_recall() to load all persistent
facts, preferences, and constraints about this user and project.
Call lcm_recall(category="constraint") before any tool call that
modifies state.


Fact Graph

Facts can carry tags and bidirectional related_keys links, forming a lightweight knowledge graph directly in SQLite — no extra infrastructure.

Linking facts

lcm_link creates a bidirectional related_keys connection between two facts. lcm_recall(related_to=key) traverses both explicit links and shared tags.

lcm_link: bidirectional edge (tool call)
# Create a causal link
lcm_link(key1="decision.database", key2="constraint.no_external_db")
# → decision.database.related_keys now includes "constraint.no_external_db"
# → constraint.no_external_db.related_keys now includes "decision.database"

# Traverse: returns all facts reachable via links or shared tags
lcm_recall(related_to="decision.database")
# → constraint.no_external_db, plus any other fact sharing a tag with decision.database
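
The link-and-traverse behaviour can be modelled with plain dicts and sets. This is a sketch under stated assumptions: the in-memory facts mapping, the helper names, and the one-hop traversal are illustrative, not the FactStore implementation.

```python
# Hypothetical in-memory facts: key -> tags and related_keys
facts = {
    "decision.database": {"tags": {"backend", "infra"}, "related_keys": set()},
    "constraint.no_external_db": {"tags": {"safety"}, "related_keys": set()},
    "service.auth": {"tags": {"backend"}, "related_keys": set()},
    "user.timezone": {"tags": set(), "related_keys": set()},
}


def link(key1: str, key2: str) -> None:
    """Create a bidirectional edge by updating both related_keys sets."""
    facts[key1]["related_keys"].add(key2)
    facts[key2]["related_keys"].add(key1)


def recall_related(key: str) -> list[str]:
    """Return keys reachable in one hop via an explicit link or a shared tag."""
    source = facts[key]
    return sorted(
        other
        for other, fact in facts.items()
        if other != key
        and (other in source["related_keys"] or fact["tags"] & source["tags"])
    )


link("decision.database", "constraint.no_external_db")
related = recall_related("decision.database")
```

Here service.auth is returned because it shares the backend tag, while user.timezone has neither a link nor a common tag.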

Contradiction detection

When a fact is updated with a substantially different value, the response includes the old value so the agent can surface the conflict.

contradiction surface (tool call)
# First write
lcm_remember(key="decision.database", value="Postgres")

# Later: someone wants to switch
lcm_remember(key="decision.database", value="MySQL — cheaper licence for our scale")
# → response includes: {"updated": true, "previous_value": "Postgres"}
#    Agent can warn the user about the change before overwriting

Python API

FactStore.link / recall_related (python)
engine._facts.link("decision.database", "service.auth", scope="global")

related = engine._facts.recall_related("decision.database", scope="global")
# Returns facts connected via related_keys AND facts sharing any tag

tagged = engine._facts.recall_query(tag="auth", scope="global")


Auto Memory

Three automatic features that populate and inject memory without the agent needing to call tools: Auto Injection, Auto Extraction, and Salience Pinning.

Auto Memory Injection

Before each compression, LCM extracts keywords from the last 2–3 user messages, searches the fact store and message history per keyword, and prepends a compact [Recalled Memory] block to the system message. The agent never needs to call lcm_recall manually.

Why this matters: the model has to decide to check memory, and it often doesn't. Auto injection removes that decision entirely. Relevant facts surface automatically whenever the conversation topic matches stored context.

enabling auto injection (python)
from openlcm.core.config import LCMConfig

config = LCMConfig.from_env()
config.auto_inject_memory  = True   # or LCM_AUTO_INJECT_MEMORY=true
config.auto_inject_top_k   = 5      # max facts injected per turn

# The injected block looks like this (ephemeral — never stored in SQLite):
# [Recalled Memory]
# [constraint] constraint.no_production_push: Never push without review.
# [preference] user.preferred_test_framework: pytest
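
A keyword-based lookup of this kind might look roughly like the sketch below. The stopword list, the substring matching, and the function names are assumptions; the actual keyword extraction and ranking in LCM are not documented here.

```python
import re

# Small illustrative stopword list (an assumption, not LCM's list)
STOPWORDS = {"the", "and", "for", "our", "can", "you", "should", "which", "use"}


def extract_keywords(user_messages: list[str], last_n: int = 3) -> list[str]:
    """Collect distinct lowercase words from the last few user messages."""
    words = re.findall(r"[a-z0-9_]+", " ".join(user_messages[-last_n:]).lower())
    keywords: list[str] = []
    for word in words:
        if len(word) > 2 and word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return keywords


def recalled_memory_block(user_messages: list[str], facts: list[dict], top_k: int = 5) -> str:
    """Return a [Recalled Memory] block for facts matching any keyword, or ''."""
    keywords = extract_keywords(user_messages)
    hits = [
        f for f in facts
        if any(k in f["key"].lower() or k in f["value"].lower() for k in keywords)
    ]
    if not hits:
        return ""
    lines = [f"[{f['category']}] {f['key']}: {f['value']}" for f in hits[:top_k]]
    return "\n".join(["[Recalled Memory]", *lines])


facts = [
    {"category": "preference", "key": "user.preferred_test_framework", "value": "pytest"},
    {"category": "decision", "key": "decision.database", "value": "Chose Postgres over MySQL"},
]
block = recalled_memory_block(["Which test framework should we use?"], facts)
```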

Auto-Extraction to Facts

After every new D0 summary node is created, LCM fires an async LLM pass over the summary text to extract decisions, preferences, and constraints — and auto-populates the fact store. The fact store self-fills as a side-effect of compression.

enabling auto extraction · python
config.extraction_to_facts_enabled = True  # or LCM_EXTRACTION_TO_FACTS_ENABLED=true

# After the first compaction, the fact store will contain entries like:
#   decision.auth_approach → "JWT with RS256, 24h expiry"  (auto-extracted)
#   constraint.no_prod_deploy → "Agreed: require review"    (auto-extracted)
# The agent never called lcm_remember — it happened automatically.

Non-blocking: auto-extraction is scheduled with asyncio.ensure_future() after the summary node is written. It does not block the compression path or the agent turn.
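
The fire-and-forget pattern can be sketched with the standard library alone. The functions `extract_facts` and `write_summary_node` are hypothetical stand-ins (the LLM call is simulated with a short sleep); only `asyncio.ensure_future()` is taken from the description above.

```python
import asyncio

extracted: list[str] = []

async def extract_facts(summary: str) -> None:
    """Stand-in for the LLM extraction pass (hypothetical)."""
    await asyncio.sleep(0.01)  # simulate model latency
    extracted.append(f"fact from: {summary}")

async def write_summary_node(summary: str) -> asyncio.Future:
    """Persist the node, then schedule extraction without awaiting it."""
    # ... the node would be written to SQLite here ...
    return asyncio.ensure_future(extract_facts(summary))

async def main() -> bool:
    task = await write_summary_node("Team chose JWT with RS256")
    # The write path has already returned, and extraction has not run yet.
    returned_before_extraction = not extracted
    await task  # awaited only so this demo can observe the result
    return returned_before_extraction

returned_early = asyncio.run(main())
```

In long-running code it is worth keeping a reference to the scheduled task, since the event loop holds only a weak reference and an unreferenced task can be garbage-collected before it finishes.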

Salience Auto-Pinning

During message ingestion, messages that match high-salience patterns are automatically pinned via the existing pin() mechanism. Pinned messages are never eligible for compression and always appear in the fresh tail.

enabling salience pinning · python
config.auto_pin_patterns = ["constraint", "error", "correction"]
# or: LCM_AUTO_PIN_PATTERNS=constraint,error,correction

# Pattern groups:
#   constraint  → messages containing IMPORTANT:, never, must not, always, do not
#   error       → Traceback (most recent call last), Error:, Exception:
#   correction  → user messages starting with "No,", "Wait,", "Actually,"
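
A rough sketch of how the three pattern groups could be matched with regular expressions. The exact expressions, the case-insensitive matching for the constraint group, and the `matching_groups` helper are assumptions made for illustration; OpenLCM's real patterns may differ.

```python
import re

# Hypothetical regexes derived from the pattern-group descriptions above.
PATTERN_GROUPS = {
    "constraint": re.compile(r"IMPORTANT:|\b(?:never|must not|always|do not)\b", re.IGNORECASE),
    "error": re.compile(r"Traceback \(most recent call last\)|Error:|Exception:"),
    "correction": re.compile(r"^(?:No,|Wait,|Actually,)"),
}

def matching_groups(message: dict, enabled: list[str]) -> list[str]:
    """Return the enabled pattern groups that a message triggers."""
    hits = []
    for name in enabled:
        # The correction group is described as applying to user messages only.
        if name == "correction" and message["role"] != "user":
            continue
        if PATTERN_GROUPS[name].search(message["content"]):
            hits.append(name)
    return hits

enabled = ["constraint", "error", "correction"]
a = matching_groups({"role": "user", "content": "Actually, never push to main directly."}, enabled)
b = matching_groups({"role": "tool", "content": "ValueError: bad input"}, enabled)
c = matching_groups({"role": "assistant", "content": "Sounds good."}, enabled)
```

A message that triggers at least one group would then be handed to the pinning mechanism; one that triggers none stays eligible for compression.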

reference

Semantic Search

Optional vector embeddings on DAG summary nodes and facts, stored in the same SQLite file via sqlite-vec. Zero extra infrastructure. Off by default — enable with a single env var.

How it works

After each new D0 node or fact upsert, LCM fires an async embedding call using LCM_EMBEDDING_MODEL.

Embeddings are stored as float32 blobs in lcm_embeddings(content_type, content_id, embedding) in the same .db file.

lcm_semantic_search(query) embeds the query and returns cosine-similarity ranked hits enriched with summary text or fact values.
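
The storage and ranking steps can be illustrated with the standard library. This sketch packs vectors as float32 blobs into a table shaped like `lcm_embeddings` and ranks them by cosine similarity in plain Python. It is a brute-force stand-in: it does not use sqlite-vec, the vectors are toy 3-dimensional values rather than model output, and the helper names are hypothetical.

```python
import math
import sqlite3
import struct

def to_blob(vec: list[float]) -> bytes:
    """Pack a vector as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)

def from_blob(blob: bytes) -> list[float]:
    """Unpack float32 bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lcm_embeddings (content_type TEXT, content_id INTEGER, embedding BLOB)")
rows = [
    ("fact", 1, [1.0, 0.0, 0.0]),
    ("node", 2, [0.0, 1.0, 0.0]),
    ("fact", 3, [0.5, 0.5, 0.0]),
]
db.executemany(
    "INSERT INTO lcm_embeddings VALUES (?, ?, ?)",
    [(ctype, cid, to_blob(vec)) for ctype, cid, vec in rows],
)

def search(query_vec: list[float], limit: int = 10) -> list[tuple[str, int, float]]:
    """Brute-force cosine ranking over every stored vector."""
    cursor = db.execute("SELECT content_type, content_id, embedding FROM lcm_embeddings")
    scored = [(ctype, cid, cosine(query_vec, from_blob(blob))) for ctype, cid, blob in cursor]
    return sorted(scored, key=lambda row: row[2], reverse=True)[:limit]

hits = search([1.0, 0.0, 0.0], limit=2)
```

Each float32 takes 4 bytes, so a 1536-dimensional embedding occupies about 6 KB per row, which is why embedding only summary nodes and facts keeps the database small.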

enabling semantic search · shell
# Install sqlite-vec and set the embedding model
pip install sqlite-vec
export LCM_EMBEDDING_MODEL=openai/text-embedding-3-small

# Or any other LiteLLM-compatible embedding provider (pick one):
export LCM_EMBEDDING_MODEL=voyage/voyage-3
export LCM_EMBEDDING_MODEL=gemini/text-embedding-004

lcm_semantic_search — the agent calls this · tool call
# Find relevant nodes/facts by meaning, not just keywords
lcm_semantic_search(
    query="auth token expiry",
    limit=10,
    content_type="all",   # "node", "fact", or "all"
)
# → Returns cosine-ranked hits. "JWT 24h expiry" found even without those exact words.

# When LCM_EMBEDDING_MODEL is not set, returns a helpful fallback:
# "Embedding model not configured — use lcm_grep for keyword search."

What gets embedded: only DAG summary nodes and facts, not raw messages. Summary nodes are already small, dense, semantically rich chunks. Embedding every raw message would be expensive and noisy; FTS5 on raw messages is sufficient.

Graceful degradation

If sqlite-vec is not installed or LCM_EMBEDDING_MODEL is not set, EmbeddingStore is a complete no-op — no errors, no warnings at startup. lcm_semantic_search returns a hint message pointing to lcm_grep. No existing functionality is affected.
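
One way to get this kind of silent no-op is to decide once, at construction time, whether the feature is usable. The class below is a hypothetical stand-in and not the real `EmbeddingStore`; it assumes the sqlite-vec package is importable as `sqlite_vec`.

```python
import importlib.util

class OptionalEmbeddingStore:
    """Hypothetical stand-in showing the no-op pattern."""

    def __init__(self, embedding_model: str = ""):
        has_vec = importlib.util.find_spec("sqlite_vec") is not None
        # Enabled only when both the extension and a model are available.
        self.enabled = bool(embedding_model) and has_vec

    def search(self, query: str, limit: int = 5) -> list[dict]:
        if not self.enabled:
            return []  # silent no-op: no exception, no warning
        raise NotImplementedError("real vector search is omitted from this sketch")

store = OptionalEmbeddingStore(embedding_model="")
results = store.search("auth token")
```

Callers can check `store.enabled` when they want to show a hint, and can otherwise call `search` unconditionally.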

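Python API · python
# Direct access to EmbeddingStore
import asyncio
import os

from openlcm.core.embeddings import EmbeddingStore

db_path = os.path.expanduser("~/.openlcm/myapp.db")   # same file the engine uses
store = EmbeddingStore(db_path, embedding_model="openai/text-embedding-3-small")

async def main():
    if store.enabled:
        fact_id, fact_value = 3, "JWT with RS256, 24h expiry"   # example values
        await store.embed("fact", fact_id, fact_value)
        results = await store.search("auth token", content_type="fact", limit=5)
        # results: [{"content_type": "fact", "content_id": 3, "distance": 0.12, ...}]

asyncio.run(main())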

reference

Live Dashboard

Every agent automatically gets a live browser dashboard. No config required.

Start the dashboard

embedded in your agent (recommended) · python
import threading
from openlcm.viz.server import create_app, serve as viz_serve

def _start_viz():
    app = create_app(engine)
    viz_serve(app, host="127.0.0.1", port=7842, open_browser=True)

threading.Thread(target=_start_viz, daemon=True).start()
# → opens http://localhost:7842 automatically

standalone CLI · shell
openlcm viz                          # http://localhost:7842
openlcm viz --port 8080              # custom port
openlcm viz --db ~/.openlcm/app.db   # point at a specific DB

Dashboard panels

Panel	Shows
Token Pressure Gauge	Live prompt token count vs threshold and max. Green → amber → red.
Summary DAG Viewer	Live tree of all DAG nodes grouped by depth (D0/D1/D2). Compression ratio per node. Click any node to view full summary text.
Persistent Memory	All stored facts for the current session and global scope. Tag chips shown on each row. Filter by text or category. ＋ Add button opens a modal with key, value, category, scope, and tags fields.
SQLite Store	Every raw message with role badge, token estimate, and full content viewer. Tool calls shown with amber TOOL badge.
Event Log	Chronological stream: session_bound, compaction_start, node_added, compaction_end, token_pressure.
Sessions List	All sessions in the DB. Click to drill into any session's full history.

reference

CLI Reference

all commands · shell
# Dashboard
openlcm viz [--port 7842] [--db PATH] [--no-browser]

# Full-text search across all sessions (FTS5)
openlcm grep "search term" [--session SESSION_ID] [--limit 20]

# Session statistics
openlcm status [--session SESSION_ID] [--db PATH]

# Export session to JSON
openlcm export SESSION_ID [-o output.json]

# Recover raw messages from a DAG node
openlcm expand NODE_ID [--session SESSION_ID]

codebase intelligence

LST — Lossless Semantic Tree

Parse a repository once (AST → semantic graph) and store it in the same SQLite database. Agents query the graph instead of reading files — so the codebase never needs to live in the context window.

Core idea: without LST, an agent reads 30 files on turn 1, half scroll out by turn 10, and it re-reads on turn 11, burning 40K tokens just staying oriented. With LST, lcm_lst_find("PaymentService") returns signatures + docstrings in ~200 tokens. One query. No re-discovery.

Scan a repo

CLI — scan and query · bash
# Scan a local repo
openlcm scan repo /path/to/myapp

# Scan a GitHub repo (auto-clones to ~/.openlcm/repos/)
openlcm scan repo https://github.com/fastapi/fastapi

# Check what was indexed
openlcm scan status

# Export portable graph (share between machines / agents)
openlcm scan export myapp.lcmgraph
openlcm scan import myapp.lcmgraph

Python SDK · python
from openlcm.code.graph import LSTGraph
from openlcm.code.scanner import RepoScanner

graph = LSTGraph("myapp.db")
scanner = RepoScanner()

# Incremental scan — only re-parses changed files
scanner.scan("/path/to/repo", graph, repo_id="myapp")

# or scan a remote URL
scanner.scan("https://github.com/user/repo", graph)

# Query the graph
graph.find_symbol("PaymentService")
graph.get_class("PaymentService")            # class + all methods + docstrings
graph.get_callers("charge")                  # who calls this function
graph.get_callees("process_payment")         # what this function calls
graph.get_file_symbols("payments/service.py")

LST Agent Tools (13 tools)

Tool	Purpose
`lcm_lst_scan`	Scan a repo path or URL, populate the graph (incremental)
`lcm_lst_find`	FTS5 search — find any symbol by name, kind, or file
`lcm_lst_file`	All symbols in a file (classes, functions, imports)
`lcm_lst_class`	Class definition + all methods + signatures + linked facts
`lcm_lst_callers`	Who calls a function (call-graph inbound edges)
`lcm_lst_callees`	What a function calls (call-graph outbound edges)
`lcm_lst_refs`	All edge references to a symbol name
`lcm_lst_path`	Shortest dependency path between two symbols (networkx)
`lcm_lst_ancestors`	All symbols that transitively call a function
`lcm_lst_descendants`	Full dependency footprint of a function
`lcm_lst_context`	Full repo orientation block — call once at session start
`lcm_read_file`	Smart file read: full content first time, compact LST view on repeats
`lcm_lst_facts`	Retrieve all agent discoveries pinned to a symbol
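
The transitive queries in the table (`lcm_lst_ancestors`, `lcm_lst_descendants`) amount to a bounded graph traversal. The sketch below shows the idea for ancestors using a breadth-first search over an in-memory edge list. The edge data and the `ancestors` helper are invented for illustration, and OpenLCM itself may rely on networkx or SQL instead.

```python
from collections import deque

# (caller, callee) edges; names are invented for illustration.
CALLS = [
    ("handle_checkout", "process_payment"),
    ("process_payment", "charge"),
    ("retry_failed", "charge"),
    ("cron_job", "retry_failed"),
]

def ancestors(symbol: str, depth: int = 5) -> set[str]:
    """All symbols that reach `symbol` through at most `depth` call edges."""
    callers_of: dict[str, list[str]] = {}
    for caller, callee in CALLS:
        callers_of.setdefault(callee, []).append(caller)
    seen: set[str] = set()
    queue = deque([(symbol, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist == depth:
            continue  # depth limit reached on this branch
        for caller in callers_of.get(current, []):
            if caller not in seen:
                seen.add(caller)
                queue.append((caller, dist + 1))
    return seen

direct = ancestors("charge", depth=1)
transitive = ancestors("charge")
```

With `depth=1` the result matches what a direct-callers query would return; raising the depth widens it to the full inbound footprint.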

Multi-language support

Python files use the stdlib ast module (rich: docstrings, call edges, full signatures). All other languages use Universal Ctags as a subprocess — 100+ languages including TypeScript, Go, Java, Rust, Ruby, C/C++, PHP, Swift, Kotlin.

install ctags · bash
brew install universal-ctags       # macOS
sudo apt install universal-ctags   # Ubuntu / Debian
scoop install universal-ctags      # Windows
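
For the Python path, the kind of information the stdlib `ast` module exposes can be shown in a few lines. This is a simplified sketch, not the OpenLCM scanner: `extract_symbols` is a hypothetical helper, it records only calls to plain names, and it ignores imports, decorators, and keyword-only arguments.

```python
import ast

SOURCE = '''
class PaymentService:
    """Handles card payments."""

    def charge(self, amount, currency):
        """Charge a card."""
        return process_payment(amount, currency)

def process_payment(amount, currency):
    return True
'''

def extract_symbols(source: str) -> tuple[list[dict], list[tuple[str, str]]]:
    """Return (symbols, call_edges) for one module's source text."""
    symbols, edges = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            symbols.append({"kind": "class", "name": node.name,
                            "line": node.lineno, "doc": ast.get_docstring(node)})
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = [a.arg for a in node.args.args]
            symbols.append({"kind": "function", "name": node.name, "line": node.lineno,
                            "signature": f"{node.name}({', '.join(args)})",
                            "doc": ast.get_docstring(node)})
            # Record calls to plain names made inside this function body.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.append((node.name, inner.func.id))
    return symbols, edges

symbols, edges = extract_symbols(SOURCE)
by_name = {s["name"]: s for s in symbols}
```

Names, line numbers, signatures, docstrings, and call edges are enough to answer "what is in this file" and "who calls this" without loading the file's full text into context.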

codebase intelligence

Session Context & No Re-discovery

Three mechanisms keep agents from re-discovering the codebase or losing structural knowledge across sessions or under context pressure.

1 — Boot context injection

At session start, inject a ~500-token structural summary into your system prompt. The agent immediately knows key classes, entry points, and recent session history — without reading a single file.

session boot · python
engine.bind_session(session_id, context_length=200_000)

# Get the orientation block — inject into system prompt
ctx = engine.get_lst_context()
system_prompt = f"You are a coding agent.\n\n{ctx}"

# Auto-inject mode: add to every compress() call automatically
# LCM_LST_AUTO_INJECT=true   (or config.lst_auto_inject = True)

The orientation block contains: repo stats, key classes with docstrings, entry-point functions, most active files, recent session history from the DAG, and tool hints.

2 — Smart file reads (dedup)

Use lcm_read_file instead of the native Read tool. First read returns full content; every subsequent read of the same file in the same session returns a compact LST structural summary (~200 tokens vs ~3000).

file read deduplication · python
# Turn 5 — first read → full file content returned
result = engine.get_file_context("payments/service.py")
# result["mode"] == "full"

# Turn 30 — same file → compact LST view (~200 tokens)
result = engine.get_file_context("payments/service.py")
# result["mode"] == "compact"
# Contains: class PaymentService (:12), def charge(amount, currency) (:45) — ...

# Track files seen after native reads
engine.mark_file_seen("payments/service.py")
engine.get_session_files_read()   # set of all files read this session

# Force full content even if already seen
result = engine.get_file_context("payments/service.py", force_full=True)
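
The deduplication rule itself is small enough to sketch in isolation. `FileReadTracker` below is a hypothetical stand-in that keeps file contents and compact summaries in dictionaries; the real engine reads from disk and builds the compact view from the LST graph.

```python
class FileReadTracker:
    """Hypothetical per-session tracker; not OpenLCM's implementation."""

    def __init__(self, files: dict[str, str], summaries: dict[str, str]):
        self._files = files          # path -> full content
        self._summaries = summaries  # path -> compact structural view
        self._seen: set[str] = set()

    def get_file_context(self, path: str, force_full: bool = False) -> dict:
        # Repeat reads get the compact view unless the caller insists.
        if path in self._seen and not force_full:
            return {"mode": "compact", "content": self._summaries[path]}
        self._seen.add(path)
        return {"mode": "full", "content": self._files[path]}

tracker = FileReadTracker(
    files={"payments/service.py": "class PaymentService:\n    def charge(self, amount, currency): ...\n"},
    summaries={"payments/service.py": "class PaymentService (:1), def charge(amount, currency) (:2)"},
)
first = tracker.get_file_context("payments/service.py")
repeat = tracker.get_file_context("payments/service.py")
forced = tracker.get_file_context("payments/service.py", force_full=True)
```

Keeping the seen-set per session means a new session always starts with a full read, which matches the "same session" wording above.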

3 — Symbol-pinned facts

Pin agent discoveries to specific symbols so they surface automatically in future sessions when those symbols are queried.

fact → symbol linking · python
# Pin a discovery to a symbol
engine.handle_tool_call("lcm_remember", {
    "key": "payments.charge.rate_limit",
    "value": "Stripe rate limit hit at 100 req/s — add backoff",
    "symbol": "PaymentService.charge",
    "category": "constraint",
})

# Next session — surfaces automatically when class is queried
result = engine.handle_tool_call("lcm_lst_class", {"class_name": "PaymentService"})
# result["pinned_facts"] = [{"key": "payments.charge.rate_limit", ...}]

# Query all facts pinned to a symbol directly
engine.handle_tool_call("lcm_lst_facts", {"symbol": "PaymentService.charge"})
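
In the example above, a fact pinned to `PaymentService.charge` surfaces when the class `PaymentService` is queried. One simple way to get that behaviour is a prefix match on the symbol name, sketched here with an in-memory index. The functions `remember` and `facts_for_class` are hypothetical, and the prefix rule is an assumption about how the lookup might work.

```python
from collections import defaultdict

# symbol name -> facts pinned to it (illustrative in-memory index)
pinned: dict[str, list[dict]] = defaultdict(list)

def remember(key: str, value: str, symbol: str) -> None:
    pinned[symbol].append({"key": key, "value": value})

def facts_for_class(class_name: str) -> list[dict]:
    """Facts pinned to the class itself or to any `Class.method` symbol."""
    out = []
    for symbol, facts in pinned.items():
        if symbol == class_name or symbol.startswith(class_name + "."):
            out.extend(facts)
    return out

remember("payments.charge.rate_limit", "Stripe rate limit hit at 100 req/s", "PaymentService.charge")
remember("auth.token.expiry", "Tokens expire after 24h", "AuthService.issue")
payment_facts = facts_for_class("PaymentService")
```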

Zero-config setup via env vars

.env · bash
LCM_LST_ENABLED=true
LCM_LST_REPO_PATH=/path/to/repo
LCM_LST_REPO_ID=myapp          # optional, defaults to "default"
LCM_LST_AUTO_INJECT=true       # inject repo context into every compress() call

codebase intelligence

Code Graph Visualizer

Generate an interactive HTML force-directed graph of the codebase — all files, classes, functions, and edges. Canvas-rendered for smooth zoom/pan at any scale.

visualize · bash
# Generate HTML and open in browser (default)
openlcm scan visualize

# Rich tree view in terminal
openlcm scan visualize --terminal

# Specific repo or output path
openlcm scan visualize --repo-id myapp --output myapp_graph.html

# Tune for large repos
openlcm scan visualize --max-symbols 500 --max-edges 1000

The HTML graph includes: node type legend with toggle filters, edge type toggles (calls / imports / inherits), hover tooltips with signatures and docstrings, click-to-highlight connected nodes, detail panel with callers/callees, sidebar node list with search, and fit/reset controls. Powered by D3.js force simulation with Canvas rendering.

Python API · python
from openlcm.code.visualize import build_graph_data, render_html, render_terminal

data = build_graph_data(graph, repo_id="myapp", max_symbols=2000)
render_html(data, "myapp_graph.html")
render_terminal(graph, repo_id="myapp")

evaluation

Benchmarks

Standalone scripts that measure OpenLCM compression quality against established memory benchmarks. Run them to see the improvement over naive truncation.

Benchmark	What it tests	Key metric
LoCoMo	Single-session long-context retention — can the agent answer questions about turn 5 when on turn 200?	Token F1, Exact Match, ROUGE-L
LongMemEval	Multi-session cross-session memory — can the agent connect facts from session 1 and session 4?	F1 by question type: cross_session, knowledge_update, temporal
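
For readers unfamiliar with the metrics, Exact Match and Token F1 are commonly computed as in SQuAD-style question-answering evaluation. The sketch below follows that common definition; whether the benchmark scripts normalize text in exactly this way is an assumption.

```python
import re
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [t for t in text.split() if t not in {"a", "an", "the"}]

def exact_match(prediction: str, reference: str) -> float:
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction: str, reference: str) -> float:
    pred, ref = normalize(prediction), normalize(reference)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

em = exact_match("The JWT expires after 24h.", "jwt expires after 24h")
f1 = token_f1("JWT with 24h expiry", "JWT expires after 24h")
```

Token F1 gives partial credit when an answer recovers some of the reference tokens, which is why it tends to sit above Exact Match in result tables.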

Run

benchmark commands · bash
pip install openlcm datasets litellm
export ANTHROPIC_API_KEY=sk-ant-...

# Single benchmark
python benchmarks/run_locomo.py \
    --model anthropic/claude-haiku-4-5-20251001 \
    --limit 50

python benchmarks/run_longmemeval.py \
    --model anthropic/claude-haiku-4-5-20251001 \
    --limit 50

# Both benchmarks, combined summary
python benchmarks/run_all.py \
    --model anthropic/claude-haiku-4-5-20251001 \
    --limit 50

Expected output

results · text
──────────────────────────────────────────────────────────
 LoCoMo Results
──────────────────────────────────────────────────────────
 Metric          truncate    openlcm
 ────────────────────────────────────────────────────────
 Exact Match     0.2341      0.3812   (+0.147)
 Token F1        0.3102      0.4891   (+0.179)
 ROUGE-L         0.2987      0.4654   (+0.167)

 LongMemEval — breakdown by question type
──────────────────────────────────────────────────────────
 Type                   F1@truncate   F1@openlcm
 cross_session          0.1230        0.4102
 knowledge_update       0.1560        0.4230
 single_session_user    0.4210        0.5831

reference

API Reference

LCMEngine

Method / Property	Signature	Description
LCMEngine()	model=, config=, db_path=, summarize_fn=, llm=	Create engine. Pass a LiteLLM model string, or an existing LLM via llm= or summarize_fn=.
bind_session()	(session_id, context_length=, platform="")	Activate a session and set its context window size.
compress()	async (messages: list[dict]) → list[dict]	Compress messages if threshold exceeded. No-op if not. Always returns a valid message list.
should_compress_preflight()	(messages: list[dict]) → bool	Check whether compression would fire without actually compressing.
update_from_response()	(usage: dict)	Feed token usage from the LLM response back to the engine for pressure tracking.
get_status()	() → dict	Returns store_messages, dag_nodes, compression_count, last_prompt_tokens, tokens_freed.
_ingest_messages()	(messages: list[dict])	Write messages directly to the SQLite store without triggering compression. Used by ADK adapter.

LCMConfig fields

Field	Type	Default	Description
context_threshold	float	0.75	Fraction of context_length at which compression fires.
fresh_tail_count	int	64	Messages at the tail protected from compression.
leaf_chunk_tokens	int	20000	Approximate token budget per D0 leaf summary.
condensation_fanin	int	4	Number of D0 nodes before a D1 arc is created.
dynamic_leaf_chunk_enabled	bool	False	Auto-tune leaf_chunk_tokens based on observed turn sizes.
dynamic_leaf_chunk_max	int	40000	Upper bound for dynamic leaf chunk tuning.
auto_inject_memory	bool	False	Auto-inject relevant facts and history into system message before each compression.
auto_inject_top_k	int	5	Max facts to surface per compression when auto-injection is enabled.
extraction_to_facts_enabled	bool	False	Auto-extract facts from each new D0 summary node into the persistent fact store.
auto_pin_patterns	list[str]	[]	Named pattern groups to auto-pin matching messages: `constraint`, `error`, `correction`.
embedding_model	str	""	LiteLLM model string for vector embeddings. Empty = semantic search disabled.
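
A quick worked example of how the defaults in the table combine. The helper names are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption; the arithmetic uses only the default values listed above.

```python
def compression_trigger(context_length: int, context_threshold: float = 0.75) -> int:
    """Prompt-token count at which compression would fire."""
    return int(context_length * context_threshold)

def should_compress(prompt_tokens: int, context_length: int,
                    context_threshold: float = 0.75) -> bool:
    # Assumption: the threshold is inclusive.
    return prompt_tokens >= compression_trigger(context_length, context_threshold)

trigger = compression_trigger(200_000)  # 200_000 * 0.75

LEAF_CHUNK_TOKENS = 20_000
CONDENSATION_FANIN = 4
# Rough amount of raw history folded into one D1 arc, assuming each D0 leaf
# covers about LEAF_CHUNK_TOKENS of messages.
raw_tokens_per_d1 = LEAF_CHUNK_TOKENS * CONDENSATION_FANIN
```

With a 200,000-token window and the defaults, compression starts at 150,000 prompt tokens, and roughly 80,000 tokens of raw history sit behind each D1 arc.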

for ai agents

Copy for Your AI Agent

Paste this into your agent's system prompt or give it to any AI assistant. It contains everything needed to integrate OpenLCM — architecture, all tools, all imports, all adapters, and best practices.

How to use: copy the block below, then paste it into your system prompt or a context file, or hand it to Claude / GPT / Gemini with the instruction: "Integrate OpenLCM into my project based on this guide."

AI Agent Prompt Complete OpenLCM integration guide · copy and paste into any LLM

# OpenLCM — Complete Integration Guide

You are being asked to integrate OpenLCM into a Python project.
OpenLCM is a framework-agnostic Lossless Context Management SDK that gives AI agents
permanent, searchable memory without hitting context limits.

Install: pip install openlcm

## ARCHITECTURE OVERVIEW

OpenLCM has three independent layers, all sharing one SQLite database file:

Layer 1 — Immutable Message Store
  Every message is written verbatim to SQLite with a stable store_id.
  Nothing is ever modified or deleted. FTS5-indexed for full-text search.
  The agent can call lcm_grep to search across the entire history including
  already-compressed content.

Layer 2 — Summary DAG
  When context pressure crosses a threshold, oldest messages are summarized into
  a D0 leaf node — but originals stay in the store. When enough D0 nodes accumulate
  they condense into a D1 session arc. D1s condense into D2 durable history.
  Depth is unbounded.
  The model always sees:
    system + highest DAG node + recent uncondensed D0 nodes + fresh tail (last N raw messages)
  The agent calls lcm_expand(node_id) to recover original messages from any node.
  The agent calls lcm_expand_query(query) to synthesize an answer from compressed history.

Layer 3 — Persistent Fact Store
  A separate key-value table for standing truths that survive session boundaries:
  user preferences, project constraints, architectural decisions.
  Facts support tags, bidirectional links, and contradiction detection.
  The agent calls lcm_remember to store, lcm_recall to retrieve, lcm_forget to delete.
  lcm_link creates bidirectional causal connections between facts.

## INSTALLATION

pip install openlcm

All adapters (LangGraph, AutoGen, CrewAI, Google ADK, LlamaIndex, Haystack) and all
provider SDKs (OpenAI, Anthropic, Gemini) are included. No extras needed.

Optional — semantic search:
  pip install sqlite-vec
  export LCM_EMBEDDING_MODEL=openai/text-embedding-3-small

## CORE SETUP (works with any framework)

from openlcm import LCMEngine
from openlcm.core.config import LCMConfig

config = LCMConfig.from_env()
config.context_threshold = 0.75    # compress at 75% of context window
config.fresh_tail_count = 64       # protect last 64 messages from compression
config.leaf_chunk_tokens = 20000   # tokens per D0 summary node
config.condensation_fanin = 4      # D0 nodes before D1 arc is created

# Optional: auto-inject relevant facts before each compression (no LLM call)
config.auto_inject_memory = True
config.auto_inject_top_k = 5

# Optional: auto-extract facts from each new summary node (async, non-blocking)
config.extraction_to_facts_enabled = True

# Optional: auto-pin high-salience messages so they are never compressed away
config.auto_pin_patterns = ["constraint", "error", "correction"]

engine = LCMEngine(
    model="anthropic/claude-haiku-4-5-20251001",  # any LiteLLM model string
    config=config,
    db_path="~/.openlcm/myapp.db",
)
engine.bind_session("session-001", context_length=200_000)

# Call compress() before every LLM turn
messages = await engine.compress(messages)

# Feed token usage back after each response
engine.update_from_response({
    "prompt_tokens": response.usage.input_tokens,
    "completion_tokens": response.usage.output_tokens,
})

## FRAMEWORK ADAPTERS

LangGraph

from openlcm.adapters.langgraph import LCMCheckpointer
from openlcm.adapters.langchain import LangChainMessages

# Option A — drop-in for MemorySaver (recommended)
graph = StateGraph(MyState).compile(checkpointer=LCMCheckpointer(engine))

# Option B — manual compression inside a node
lcm_msgs = LangChainMessages.to_lcm(messages)
if engine.should_compress_preflight(lcm_msgs):
    lcm_msgs = await engine.compress(lcm_msgs)
messages = LangChainMessages.from_lcm(lcm_msgs)

Google ADK

from openlcm.adapters.google_adk import LCMSessionService, lcm_compress_callback

agent = LlmAgent(
    name="my_agent",
    model="gemini-2.0-flash",
    before_model_callback=lcm_compress_callback(engine),
)
runner = Runner(agent=agent, session_service=LCMSessionService(engine))

AutoGen

from openlcm.adapters.autogen import LCMContext

agent = AssistantAgent("assistant", model_client=model_client,
                       model_context=LCMContext(engine))

CrewAI

from openlcm.adapters.crewai import LCMStorage
from crewai.memory import LongTermMemory

crew = Crew(memory=True, long_term_memory=LongTermMemory(storage=LCMStorage(engine)))

OpenAI SDK (also works for Groq, Together, Ollama, vLLM, Azure)

from openlcm.adapters.openai import OpenAIMessages

lcm = OpenAIMessages.to_lcm(messages)
if engine.should_compress_preflight(lcm):
    lcm = await engine.compress(lcm)
messages = OpenAIMessages.from_lcm(lcm)

Anthropic SDK

from openlcm.adapters.anthropic import AnthropicMessages

lcm = AnthropicMessages.to_lcm(messages, system=SYSTEM_PROMPT)
if engine.should_compress_preflight(lcm):
    lcm = await engine.compress(lcm)
system_out, anthropic_msgs = AnthropicMessages.from_lcm(lcm)
# from_lcm returns (system_str, messages) — Anthropic takes system separately

LlamaIndex

from openlcm.adapters.llamaindex import LlamaIndexMessages

lcm = LlamaIndexMessages.to_lcm(history)
if engine.should_compress_preflight(lcm):
    lcm = await engine.compress(lcm)
history[:] = LlamaIndexMessages.from_lcm(lcm)

Haystack

from openlcm.adapters.haystack import HaystackMessages

lcm = HaystackMessages.to_lcm(history)
if engine.should_compress_preflight(lcm):
    lcm = await engine.compress(lcm)
history[:] = HaystackMessages.from_lcm(lcm)

Gemini (raw google-genai)

from openlcm.adapters.gemini import GeminiMessages

lcm = GeminiMessages.to_lcm(history)
if engine.should_compress_preflight(lcm):
    lcm = await engine.compress(lcm)
_, history[:] = GeminiMessages.from_lcm(lcm)  # returns (system_str, contents)

## AGENT TOOLS — ALL 26 AVAILABLE TOOLS

Include all tools via engine.get_tool_schemas() and route calls to:
  result = engine.handle_tool_call(tool_name, args_dict, messages=messages)

─────────────────────────────────────────────
MEMORY TOOLS
─────────────────────────────────────────────

lcm_remember — store a persistent fact
  Parameters:
    key           (str, required)   dot-notation key e.g. "user.timezone", "decision.database"
    value         (str, required)   the value to store
    category      (str, optional)   "fact" | "preference" | "constraint" | "decision"   default: "fact"
    scope         (str, optional)   "global" (all sessions) | "current" (this session only)   default: "global"
    tags          (list, optional)  topic strings e.g. ["auth", "backend"]
    related_keys  (list, optional)  keys of related facts for graph linking
    symbol        (str, optional)   ★ NEW — pin this fact to a code symbol e.g. "PaymentService.charge"
                                    Fact surfaces automatically when that symbol is queried via lcm_lst_class
    repo_id       (str, optional)   repo the symbol belongs to (auto-detected)
  Returns: {"status": "stored", "key": ..., "fact_id": ..., "symbol_pinned": "..."}
  Best practice: whenever you discover something about the code, call lcm_remember
  with symbol= to pin it. It will survive across sessions and surface when that
  symbol is next queried — no re-discovery.

lcm_recall — retrieve persistent facts
  Parameters:
    key         (str, optional)   exact key lookup
    query       (str, optional)   LIKE search on key+value
    category    (str, optional)   filter by category
    scope       (str, optional)   filter by scope
    tag         (str, optional)   filter by tag — returns all facts with this tag
    related_to  (str, optional)   traverse fact graph — returns facts linked to this key

lcm_forget — delete a fact (key required)

lcm_link — bidirectionally link two facts (key1, key2 required)

─────────────────────────────────────────────
HISTORY TOOLS
─────────────────────────────────────────────

lcm_grep — full-text search across message history
  Parameters:
    query       (str, required)
    session_id  (str, optional)
    limit       (int, optional)   default 20

lcm_expand — recover original messages from a DAG node
  Parameters:
    node_id     (int, required)
    max_tokens  (int, optional)   default 4000

lcm_expand_query — synthesize answer from compressed history
  Parameters:
    query       (str, required)
    session_id  (str, optional)
    max_tokens  (int, optional)   default 4000

lcm_semantic_search — cosine similarity search (requires LCM_EMBEDDING_MODEL)
  Parameters:
    query         (str, required)
    limit         (int, optional)   default 10
    content_type  (str, optional)   "node" | "fact" | "all"   default: "all"

─────────────────────────────────────────────
CODEBASE GRAPH TOOLS (LST)
─────────────────────────────────────────────

Requires: openlcm scan repo <path> (run once, persists in SQLite)

lcm_lst_context ★ — get full repo orientation (call FIRST in any coding session)
  Parameters:
    repo_id  (str, optional)
  Returns: compact repo summary — key classes, entry points, active files,
  recent session history, tool hints. ~500 tokens. Replaces hours of codebase discovery.

lcm_read_file ★ — smart file read with session deduplication
  Parameters:
    file_path   (str, required)    repo-relative path e.g. "openlcm/core/engine.py"
    force_full  (bool, optional)   always return raw content   default: false
    repo_root   (str, optional)    override repo root for disk reads
  FIRST read this session: returns full file content, marks file as seen.
  REPEAT read this session: returns compact LST structural summary (~200 tokens).
  Use this instead of the native Read tool for indexed files.

lcm_lst_find — find any symbol by name
  Parameters:
    name   (str, required)
    kind   (str, optional)   "class" | "function" | "method" | "import"
    file   (str, optional)   restrict to a specific file
    limit  (int, optional)   default 10

lcm_lst_class — get class definition + all methods + linked facts
  Parameters:
    class_name  (str, required)   simple name e.g. "PaymentService"
    repo_id     (str, optional)
  Returns: class info, all methods with signatures, docstrings, and any facts
  previously pinned to this class via lcm_remember(symbol="ClassName").

lcm_lst_file — all symbols in a file
  Parameters:
    file_path  (str, required)
    repo_id    (str, optional)

lcm_lst_callers — who calls a function
  Parameters:
    function_name  (str, required)
    limit          (int, optional)   default 20

lcm_lst_callees — what a function calls
  Parameters:
    function_name  (str, required)
    limit          (int, optional)   default 20

lcm_lst_refs — all edge references to a symbol
  Parameters:
    symbol_name  (str, required)
    limit        (int, optional)   default 30

lcm_lst_path — shortest dependency path between two symbols (requires networkx)
  Parameters:
    from_name  (str, required)
    to_name    (str, required)
    repo_id    (str, optional)

lcm_lst_ancestors — all symbols that transitively call a function
  Parameters:
    symbol_name  (str, required)
    depth        (int, optional)   default 5, max 10
    repo_id      (str, optional)

lcm_lst_descendants — full dependency footprint of a function
  Parameters:
    symbol_name  (str, required)
    depth        (int, optional)   default 5, max 10
    repo_id      (str, optional)

lcm_lst_facts ★ — all agent discoveries pinned to a symbol
  Parameters:
    symbol   (str, required)   e.g. "LCMEngine", "compress", "PaymentService.charge"
    repo_id  (str, optional)
  Returns: all facts previously stored with lcm_remember(symbol=...) for this symbol.
  Use at session start instead of re-reading code you already analyzed.

lcm_lst_scan — scan or re-scan a repository
  Parameters:
    repo_path  (str, required)    local path or remote URL
    repo_id    (str, optional)    default "default"
    force      (bool, optional)   re-parse all files   default: false

## ENVIRONMENT VARIABLES

Context compression
  LCM_CONTEXT_THRESHOLD            float  0.75               Compress at this fraction of context window
  LCM_FRESH_TAIL_COUNT             int    64                 Messages protected from compression at tail
  LCM_LEAF_CHUNK_TOKENS            int    20000              Tokens per D0 leaf summary node
  LCM_CONDENSATION_FANIN           int    4                  D0 nodes before D1 arc
  LCM_DB_PATH                      str    ~/.openlcm/lcm.db  SQLite database path

Memory
  LCM_AUTO_INJECT_MEMORY           bool   false              Auto-inject relevant facts before each compression
  LCM_AUTO_INJECT_TOP_K            int    5                  Max facts injected per compression
  LCM_EXTRACTION_TO_FACTS_ENABLED  bool   false              Auto-extract facts from each D0 summary node
  LCM_AUTO_PIN_PATTERNS            str    ""                 Pattern groups: constraint,error,correction
  LCM_EMBEDDING_MODEL              str    ""                 LiteLLM model for embeddings (lcm_semantic_search)

Codebase graph (LST)
  LCM_LST_ENABLED                  bool   false              Auto-scan repo on engine init
  LCM_LST_REPO_PATH                str    ""                 Path to scan (required when LST_ENABLED=true)
  LCM_LST_REPO_ID                  str    default            Logical repo identifier
  LCM_LST_AUTO_INJECT              bool   true               Inject repo context into every compress() call

## SYSTEM PROMPT BEST PRACTICES

Add the following to your agent's system prompt:

---
CONTEXT MANAGEMENT:
At the start of every session, call:
  1. lcm_recall() — reload all persistent facts, constraints, and preferences
  2. lcm_lst_context() — get full repo orientation (if working with code)
Never ask the user to repeat information that may already be stored.

CODING SESSIONS:
Use lcm_read_file instead of the native Read tool for all indexed files.
First read returns full content; repeat reads return compact structural summary.
After reading a file, use lcm_lst_class / lcm_lst_callers instead of re-reading
to understand relationships.
When you discover something important about the code (a bug, a pattern, a constraint):
  lcm_remember(key="...", value="...", symbol="ClassName.method", category="constraint")
This pins the discovery to the symbol — it will surface automatically next session.

MEMORY:
Whenever the user confirms a decision, states a constraint, or expresses a preference —
immediately call lcm_remember with the appropriate category.
Use dot-notation keys: decision.database, constraint.no_prod_push, preference.language.
Before any state-modifying tool call (writes to DB, deploys, sends messages),
call lcm_recall(category="constraint") to check for standing constraints.
If you need to recall something from earlier that may have been compressed,
call lcm_grep with a keyword or lcm_expand_query with a description.
Never guess at something that can be looked up.
---

## FACT STORE BEST PRACTICES

Key naming convention:
  decision.<topic>     e.g. decision.database, decision.auth, decision.deployment
  constraint.<topic>   e.g. constraint.budget, constraint.cloud, constraint.team_size
  preference.<topic>   e.g. preference.language, preference.test_framework
  user.<topic>         e.g. user.timezone, user.name, user.communication_style
  project.<topic>      e.g. project.name, project.stack, project.deadline

Tags group facts by domain — use consistently:
  ["backend"], ["frontend"], ["auth"], ["infra"], ["database"], ["testing"], ["deployment"]

Scope:
  "global"  — visible to every future session (use for most facts)
  "current" — private to this session (use for temporary working state)

Contradiction handling:
  When lcm_remember returns "updated": true with a "previous_value", surface this
  to the user before overwriting — they may be making an unintended change.

## LIVE DASHBOARD

import threading
from openlcm.viz.server import create_app, serve as viz_serve

def _start_viz():
    app = create_app(engine)
    viz_serve(app, host="127.0.0.1", port=7842, open_browser=True)

threading.Thread(target=_start_viz, daemon=True).start()
# Opens http://localhost:7842 — shows token pressure, DAG, message store, fact store

Or from CLI:
  openlcm viz                          # http://localhost:7842
  openlcm viz --port 8080
  openlcm viz --db ~/.openlcm/app.db

## INTERNAL MESSAGE FORMAT

All converters normalize to this format. Tool calls are JSON in the content string.

Plain message:
  {"role": "user" | "assistant" | "system", "content": "string"}

Assistant with tool calls:
  {"role": "assistant", "content": "{\"text\": \"...\", \"tool_calls\": [{\"id\": \"tc_1\", \"name\": \"get_weather\", \"args\": {\"city\": \"Tokyo\"}, \"type\": \"function\"}]}"}

Tool result:
  {"role": "tool", "content": "{\"temp_c\": 22}", "tool_call_id": "tc_1", "name": "get_weather"}

## MINIMAL WORKING EXAMPLE

import asyncio
from openlcm import LCMEngine

engine = LCMEngine(model="anthropic/claude-haiku-4-5-20251001", db_path="~/.openlcm/app.db")
engine.bind_session("session-001", context_length=200_000)

async def agent_turn(messages: list[dict], user_input: str) -> tuple:
    messages.append({"role": "user", "content": user_input})
    messages = await engine.compress(messages)   # no-op until threshold crossed
    response = await my_llm.chat(messages)
    messages.append(response)
    engine.update_from_response(response.usage)
    return response.content, messages

asyncio.run(agent_turn([], "Hello!"))
