Python SDK v1.9.2

Python SDK Reference

pip install agentbay

Local-first memory for coding agents. Methods live directly on the client (ab.store, ab.recall, ...) and return plain dicts and lists — no wrapper objects. Every snippet on this page was run against agentbay 1.9.2.

Quickstart

Cloud (recommended)

Sign in once via the CLI. Your key is saved to ~/.agentbay/, any local memories migrate automatically, and AgentBay() picks up the saved login from then on.

pip install agentbay
agentbay login   # opens your browser — sign in, done

from agentbay import AgentBay

ab = AgentBay()  # finds the saved login -> cloud mode

ab.store("JWT auth with 24h refresh tokens", title="Auth pattern", type="PATTERN")
results = ab.recall("authentication")

Local (no signup)

With no API key (no argument, no AGENTBAY_API_KEY, no saved login), the same code runs fully local against SQLite at ~/.agentbay/local.db. Real output shown below.

from agentbay import AgentBay

ab = AgentBay()  # no credentials anywhere -> local brain
# stderr: AgentBay: local brain ready (memories stay on this machine). ...

result = ab.store(
    "Next.js 16 + Prisma + PostgreSQL with pgvector",
    title="Project stack",
    type="ARCHITECTURE",
)
print(result)
# {'id': '08887ce8-35f3-4e96-8298-57c8758e6c41', 'deduplicated': False}

memories = ab.recall("what stack does this project use?", limit=3)
print(memories[0])
# {'id': '08887ce8-35f3-4e96-8298-57c8758e6c41', 'title': 'Project stack',
#  'content': 'Next.js 16 + Prisma + PostgreSQL with pgvector',
#  'type': 'ARCHITECTURE', 'tier': 'semantic', 'tags': [],
#  'confidence': 0.5, 'summary': 'Next.js 16 + Prisma + PostgreSQL with pgvector',
#  'score': 0.0625}

One-time download: the first store/recall that needs vector search fetches a small ONNX embedding model (all-MiniLM-L6-v2, ~22 MB) via fastembed. If the download fails (e.g. offline), the SDK prints a warning and falls back to full-text search — vector search resumes automatically once the model can be fetched.

AgentBay

AgentBay()

ab = AgentBay(api_key=None, base_url="https://www.aiagentsbay.com", project_id=None, timeout=30, telemetry=True, local=False)

Create a client. Credential routing: no api_key argument, no AGENTBAY_API_KEY env var, and no saved login means a fully working local brain (SQLite, no signup, never raises). Credentials found anywhere means cloud mode. local=True forces local even when credentials exist.

Parameters:

api_key?	str	API key (ab_live_...). Falls back to AGENTBAY_API_KEY env var, then the login saved by `agentbay login`.
base_url?	str	API base URL. Default: https://www.aiagentsbay.com
project_id?	str	Default project for cloud memory operations. If omitted, the first memory call auto-creates a "My Brain" project.
timeout?	int	Request timeout in seconds. Default: 30
telemetry?	bool	Set False to disable anonymous usage telemetry (counts only). Equivalent to AGENTBAY_TELEMETRY=0.
local?	bool	Force local mode even when credentials exist.

Example:

# Local mode (zero config)
ab = AgentBay()

# Force local even with credentials present
ab = AgentBay(local=True)

# Cloud mode, explicit key
ab = AgentBay(api_key="ab_live_...", project_id="your-project-id")

# Cloud mode from environment
import os
os.environ["AGENTBAY_API_KEY"] = "ab_live_..."
ab = AgentBay(project_id="your-project-id")
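
The credential routing described above can be sketched as a small pure function. This is only an illustration of the stated order (explicit argument, then AGENTBAY_API_KEY, then the saved login, with local=True overriding everything); resolve_mode and saved_login_key are hypothetical names, not part of the agentbay package, and the SDK's real implementation may differ.

```python
import os
from typing import Mapping, Optional


def resolve_mode(
    api_key: Optional[str] = None,
    local: bool = False,
    env: Optional[Mapping[str, str]] = None,
    saved_login_key: Optional[str] = None,
) -> str:
    """Return "local" or "cloud" following the routing order in the text.

    Hypothetical helper for illustration only; not an agentbay API.
    """
    env = os.environ if env is None else env
    if local:
        # local=True forces local mode even when credentials exist.
        return "local"
    # Explicit argument first, then the env var, then the CLI's saved login.
    key = api_key or env.get("AGENTBAY_API_KEY") or saved_login_key
    return "cloud" if key else "local"


print(resolve_mode(env={}))                                          # local
print(resolve_mode(api_key="ab_live_example", env={}))               # cloud
print(resolve_mode(env={"AGENTBAY_API_KEY": "ab_live_example"}))     # cloud
print(resolve_mode(api_key="ab_live_example", local=True, env={}))   # local
```

Passing env explicitly keeps the sketch testable without touching the real process environment.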

Memory operations

All memory methods are top-level on the client and work identically in local and cloud mode. project_id arguments apply to cloud mode only.

store()

result = ab.store(content, title=None, project_id=None, type='PATTERN', tier='semantic', tags=None, user_id=None)

Store a memory. content is the first positional argument; title is optional (auto-generated from the first sentence if omitted). Local mode deduplicates semantically (cosine > 0.9) or by title + type.

Parameters:

content	str	The knowledge content to store.
title?	str	Short title. Auto-generated if omitted.
type?	str	PATTERN, PITFALL, DECISION, PROCEDURE, ARCHITECTURE, FACT, PREFERENCE, CONTEXT. Default: PATTERN
tier?	str	semantic, episodic, procedural. Default: semantic
tags?	list[str]	Tags for filtering.
user_id?	str	Optional user scoping (stored as a user:<id> tag in cloud mode).

Returns: dict — local mode: {'id': str, 'deduplicated': bool}; cloud mode: the created entry from the API

Example:

result = ab.store(
    "checkRateLimit is async - always await it",
    title="checkRateLimit is async",
    type="PITFALL",
    tags=["api", "rate-limit"],
)
print(result)
# {'id': 'da7da285-b994-4130-88ab-cc5e43d57a91', 'deduplicated': False}
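
To make the local deduplication rule concrete, here is a minimal sketch of a check that treats a new entry as a duplicate when its embedding has cosine similarity above 0.9 with an existing one, or when title and type both match. The dict layout, the is_duplicate helper, exact-match title comparison, and the toy 2-dimensional vectors are all assumptions for illustration; the SDK's internal logic is not shown on this page.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_duplicate(new: Dict, existing: Dict, threshold: float = 0.9) -> bool:
    """Hypothetical check: same title + type, or embeddings closer than threshold."""
    same_title_type = (
        new["title"] == existing["title"] and new["type"] == existing["type"]
    )
    return same_title_type or cosine(new["embedding"], existing["embedding"]) > threshold


stored = {"title": "Project stack", "type": "ARCHITECTURE", "embedding": [1.0, 0.0]}
near = {"title": "Stack overview", "type": "FACT", "embedding": [0.9, 0.1]}
same_title = {"title": "Project stack", "type": "ARCHITECTURE", "embedding": [0.0, 1.0]}
unrelated = {"title": "Auth pattern", "type": "PATTERN", "embedding": [0.0, 1.0]}

print(is_duplicate(near, stored))        # True  (cosine is about 0.99)
print(is_duplicate(same_title, stored))  # True  (title + type match)
print(is_duplicate(unrelated, stored))   # False
```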

recall()

memories = ab.recall(query, project_id=None, limit=5, tier=None, tags=None, user_id=None)

Search memories by semantic similarity. Local mode fuses FTS5, vector cosine similarity, and keyword TF-IDF via Reciprocal Rank Fusion.

Parameters:

query	str	Natural-language search query.
limit?	int	Max results (1-50). Default: 5
tier?	str	Filter by storage tier (cloud mode).
tags?	list[str]	Filter by tags.
user_id?	str	Optional user scoping.

Returns: list[dict] — each dict has id, title, content, type, tier, tags, confidence, summary (plus score in local mode)

Example:

memories = ab.recall("rate limiting", limit=3)
for m in memories:
    print(m["title"], m["confidence"])
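
For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the general technique: each ranker contributes 1 / (k + rank) for every entry it returns, and the contributions are summed. The function name, the example rankings, and the constant k=60 (a common default in the literature) are assumptions; this page does not state which constant or weighting the SDK uses.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Fuse several ranked ID lists into one score per ID, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked items (smaller rank) contribute more.
            scores[doc_id] += 1.0 / (k + rank)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Toy rankings standing in for full-text, vector, and keyword results.
full_text = ["a", "b", "c"]
vector = ["b", "a"]
keyword = ["b", "c"]

fused = reciprocal_rank_fusion([full_text, vector, keyword])
print(list(fused))  # ['b', 'a', 'c']
```

An entry that appears near the top of several rankings ("b" here) outranks one that tops only a single ranking, which is the point of fusing instead of picking one search method.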

verify()

ab.verify(knowledge_id, project_id=None)

Confirm a memory is still accurate. Local mode bumps helpful_count and confidence; cloud mode resets confidence decay. Call it when an entry was helpful.

Parameters:

knowledge_id	str	ID of the entry to verify.

Returns: None

Example:

ab.verify(result["id"])

forget()

ab.forget(knowledge_id, project_id=None)

Archive (soft-delete) a memory entry. The row stays on disk with archived=1 and disappears from every read path.

Parameters:

knowledge_id	str	ID of the entry to forget.

Returns: None

Example:

ab.forget(result["id"])
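
The soft-delete behaviour can be illustrated with plain sqlite3. The table name, columns, and helper functions below are hypothetical and do not reflect the actual schema of ~/.agentbay/local.db; the sketch only shows the pattern of flagging a row with archived=1 and filtering it out on reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories ("
    "id TEXT PRIMARY KEY, title TEXT, archived INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO memories (id, title) VALUES (?, ?)",
    [("m1", "Project stack"), ("m2", "Auth pattern")],
)


def forget(conn: sqlite3.Connection, memory_id: str) -> None:
    """Soft-delete: flag the row instead of removing it."""
    conn.execute("UPDATE memories SET archived = 1 WHERE id = ?", (memory_id,))


def visible_titles(conn: sqlite3.Connection) -> list:
    """Read path: archived rows are filtered out."""
    rows = conn.execute("SELECT title FROM memories WHERE archived = 0 ORDER BY id")
    return [row[0] for row in rows]


forget(conn, "m1")
print(visible_titles(conn))  # ['Auth pattern']

# The archived row is still on disk.
total = conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
print(total)  # 2
```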

health()

stats = ab.health(project_id=None)

Get memory health statistics. Real local-mode output shown below.

Returns: dict

Example:

stats = ab.health()
print(stats)
# {'total_entries': 1, 'by_tier': {'semantic': 1},
#  'by_type': {'ARCHITECTURE': 1}, 'total_tokens': 14,
#  'has_embeddings': 1, 'search_methods': ['fts5', 'keyword', 'vector']}

Mem0-compatible aliases

add()

result = ab.add(data, user_id=None, agent_id=None, metadata=None)

Mem0-style store. Pass a string; AgentBay auto-detects the type and extracts a title.

Returns: dict — {'id': str, 'deduplicated': bool} in local mode

Example:

r = ab.add("We decided to use pgvector over a separate vector DB")
print(r)
# {'id': 'd984890e-092d-4df9-8269-38c24027ecbe', 'deduplicated': False}

search()

results = ab.search(query, user_id=None, limit=5)

Mem0-style alias for recall().

Returns: list[dict]

Example:

hits = ab.search("vector database decision", limit=2)
print([h["title"] for h in hits])
# ['We decided to use pgvector over a separate vector DB', 'Project stack']

chat() — auto-memory LLM wrapper

chat()

response = ab.chat(messages, model=None, provider='auto', project_id=None, user_id=None, auto_recall=True, auto_store=True, recall_limit=3, **kwargs)

Wrap an LLM call with automatic memory: recalls relevant memories and injects them as context, calls the provider, then scans the response for learnings and stores them in the background. Bring your own LLM key (e.g. ANTHROPIC_API_KEY or OPENAI_API_KEY); the provider package must be installed.

Parameters:

messages	list[dict]	Chat messages in OpenAI format: [{"role": "user", "content": "..."}]
model?	str	Model name. Defaults to the provider's default model.
provider?	str	Provider name, or "auto" to detect from available API keys / local LLM servers. Default: auto
auto_recall?	bool	Recall relevant memories before the call. Default: True
auto_store?	bool	Store learnings from the response. Default: True
recall_limit?	int	Max memories to inject (1-10). Default: 3
**kwargs?	Any	Passed to the LLM client (max_tokens, temperature, api_key, ...).

Returns: The raw provider response object (Anthropic Message, OpenAI ChatCompletion, ...) — memory is a side effect

Example:

response = ab.chat(
    [{"role": "user", "content": "fix the auth session expiry bug"}],
    provider="anthropic",   # or "auto" to detect from env keys
)
# response is the raw Anthropic Message object
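
The "recall, then inject as context" step can be pictured roughly as follows. This is a sketch under assumptions: inject_memories is a hypothetical helper, the memories are dicts shaped like recall() results, and the wording of the injected system message is invented here. How chat() actually formats the context is not documented on this page, and providers that take the system prompt as a separate parameter would need a different shape.

```python
from typing import Dict, List


def inject_memories(messages: List[Dict], memories: List[Dict]) -> List[Dict]:
    """Return a new message list with recalled memories prepended as context.

    Hypothetical helper for illustration; not an agentbay API.
    """
    if not memories:
        return list(messages)
    lines = [f"- {m['title']}: {m['content']}" for m in memories]
    context = {
        "role": "system",
        "content": "Relevant memories:\n" + "\n".join(lines),
    }
    # Build a new list so the caller's messages are left untouched.
    return [context] + list(messages)


recalled = [
    {"title": "Auth pattern", "content": "JWT auth with 24h refresh tokens"},
]
user_messages = [{"role": "user", "content": "fix the auth session expiry bug"}]

augmented = inject_memories(user_messages, recalled)
print(augmented[0]["content"])
# Relevant memories:
# - Auth pattern: JWT auth with 24h refresh tokens
```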

Supported LLM Providers

chat() supports 16 providers. Most are OpenAI-compatible and need only the openai package plus the right API key:

anthropic
openai
google (Gemini)
xai (Grok)
mistral
cohere
deepseek
together
fireworks
groq
perplexity
azure (OpenAI)
bedrock (AWS)
ollama (local)
lmstudio (local)
llamacpp (local)

provider="auto" checks API-key env vars (ANTHROPIC_API_KEY, OPENAI_API_KEY, ...) in priority order, then probes for local LLM servers (Ollama, LM Studio, llama.cpp).
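
The env-var half of provider="auto" amounts to a first-match scan over a priority list. The sketch below covers only the two variables named above and assumes they are checked in the order listed; detect_provider and PRIORITY are hypothetical names, the full priority order is not given on this page, and probing for local servers is left out.

```python
from typing import Mapping, Optional

# Assumed order; only the two env vars named in the text are included.
PRIORITY = [
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
]


def detect_provider(env: Mapping[str, str]) -> Optional[str]:
    """Return the first provider whose API-key variable is set and non-empty."""
    for provider, variable in PRIORITY:
        if env.get(variable):
            return provider
    # A real implementation would probe for local LLM servers at this point.
    return None


print(detect_provider({"OPENAI_API_KEY": "sk-example"}))  # openai
print(detect_provider({"ANTHROPIC_API_KEY": "x", "OPENAI_API_KEY": "y"}))  # anthropic
print(detect_provider({}))  # None
```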

LocalMemory

The local engine behind AgentBay() in local mode — pure Python + SQLite (FTS5 + optional vector search), usable directly. It exposes the same store / recall / verify / forget / health methods, plus export(), auto_learn(text) (extract learnings via Ollama, Anthropic/OpenAI API, or heuristics), and upgrade(api_key) to migrate to cloud.

from agentbay import LocalMemory

mem = LocalMemory(quiet=True)   # ~/.agentbay/local.db
print(mem)
# LocalMemory(db='/Users/you/.agentbay/local.db', entries=2, search=[fts5, keyword, vector])

CLI

Installed with the package as agentbay:

agentbay init     # choose cloud (recommended) or local-only setup
agentbay login    # open browser, sign up/in, migrate local memories to cloud
agentbay status   # local memory count + cloud connection state
agentbay sync     # push any unsynced local memories to cloud