OpenClaw: Architecture, Components, and Deployment Notes
On November 24, 2025, an open source project named OpenClaw quietly appeared on GitHub. Four months later it had passed 343,000 stars, putting it among the fastest-growing non-aggregator open source projects (excluding list and collection repos) in recent memory, and ahead of React, Vue, and Tailwind CSS over comparable early windows. The pitch behind that curve is simple: an AI assistant that runs on your own device and belongs to you.
OpenClaw describes itself as "Your own personal AI assistant. Any OS. Any Platform. The lobster way." The interesting part is not the slogan itself but the position behind it. Local-first is not an optional mode layered onto a cloud product; it is the premise the rest of the system inherits. In a market where AI assistants increasingly centralize data and execution, that position lines up directly with developer demand for data sovereignty.
"The lobster way" also shows up in the project's internal language. The workflow orchestration tool is called Lobster. Community members call themselves lobsters. Even the GitHub chant, "EXFOLIATE! EXFOLIATE!", points at the same idea: growth through repeated shedding and rebuilds. OpenClaw treats aggressive refactoring as part of how the architecture evolves.
From an engineering angle, the more interesting story is the stack: TypeScript ESM, a pnpm monorepo, 230 Plugin SDK export paths, and 24 channel integrations inside one codebase. Rust and Go tools replace older JavaScript tooling in several places. The repository also folds in the preview of TypeScript's native Go compiler (tsgo) and handles QEMU cross-compilation carefully in its Docker build. Those choices say more about the project than any feature checklist.
The project is released under the MIT license and backed by sponsors including OpenAI, NVIDIA, Vercel, Blacksmith, and Convex. A cloud AI company funding a local-first open source competitor is not a trivial detail. This note reads the OpenClaw repository (github.com/openclaw/openclaw), version v2026.4.1, from the code outward: repository layout, plugin architecture, channels, runtime, memory, security, and native clients. All code references come from the upstream repository rather than secondary commentary.
Before diving into the code, establish the system shape first. OpenClaw can be read as four layers:
- Gateway (control plane): a WebSocket service (default ws://127.0.0.1:18789) hosting session management, configuration delivery, cron scheduling, webhooks, and health checks; it also serves the Control UI (Lit 3 + Vite) and Canvas hosting (A2UI).
- Agent / Pi runtime: built on @mariozechner/pi-agent-core@0.64.0, runs in RPC mode, supports Tool Streaming and Block Streaming, reaches 25+ model providers, and handles auth rotation and failover.
- Channels + Skills: covers 24 messaging platforms, all interacting with the core through the Plugin SDK's 230 contract paths; the ClawHub marketplace, the before_install security hook, and tools such as Browser, Canvas, Nodes, Cron, and Sessions also live in this layer.
- Memory: composed of the 13 memory-core sub-modules, persisting local Markdown files with vector search backed by sqlite-vec or LanceDB, and carrying user-editable preferences and long-term context.
Gateway is the single control-plane entry point. Every client, including the CLI, Web UI, macOS app, and iOS or Android nodes, connects to Gateway over WebSocket. The Agent runtime sits under Gateway in RPC mode, receives messages from each channel, calls models and tools, and routes the result back to the channel where the request started. Skills and channels talk to the core through the 230 exported Plugin SDK paths, while the Memory layer gives the Agent long-lived context across sessions.
Each layer also has a hard boundary. Gateway defines a typed WebSocket protocol in src/gateway/protocol/schema.ts. The Agent layer exposes capabilities through Pi's RPC surface. The Plugin SDK is the only legal import surface between extensions and the core. The Memory layer is split into 13 smaller modules to avoid monolithic coupling. The rest of the article walks through those layers one by one.
OpenClaw is not starting from scratch. Before its current naming, it went through two stages: MoltBot and ClawdBot. Traces of this history remain in the codebase: the scripts field of package.json still retains the "moltbot:rpc" command, pointing to the exact same implementation as "openclaw:rpc". The documentation domain docs.molt.bot still redirects to docs.openclaw.ai with an HTTP 301.
The project is led by Austrian developer Peter Steinberger (GitHub: @steipete), who has 14,756 commits in the repository - far ahead of the second-ranked contributor's 1,690. Steinberger was previously known for his work in the iOS SDK ecosystem; he has since turned to building AI agent platforms and has carried his style of high-frequency iteration and radical refactoring into OpenClaw's development.
OpenClaw uses calendar versioning rather than semantic versioning. The version format is vYYYY.M.D (e.g. v2026.4.1), encoding the release date directly; multiple releases on the same day take a patch suffix, vYYYY.M.D-N.
Releases flow through three channels, each mapped to an npm dist-tag:
| Channel | npm dist-tag | Tag format | Applicable scenarios |
| --- | --- | --- | --- |
| stable | latest | vYYYY.M.D | Production environment, default installation |
| beta | beta | vYYYY.M.D-beta.N | Pre-release verification, macOS App may be absent |
| dev | dev | main branch head | Development and debugging, released on demand |
To switch channels, use openclaw update --channel stable|beta|dev. A beta release's npm version must carry the -beta.N suffix; publishing a suffix-less version with --tag beta is forbidden, since that would consume the version identifier - a release rule recorded explicitly in the repository's AGENTS.md.
As of this writing, the latest stable version is v2026.3.31 (released 2026-03-31). It contains six breaking changes - a density that reflects OpenClaw's aggressive iteration style:
- Nodes/exec Refactoring: Removed the duplicate nodes.run shell wrapper in CLI and Agent nodes tools, and all node shell executions use the exec host=node path. Node-specific capabilities are reserved for nodes invoke and specialized media/location/notify operations.
- Plugin SDK legacy path deprecation: The old provider compatibility subpath and the old bundled provider setup and channel-runtime compatibility shims are deprecated and a migration warning is issued. The currently documented openclaw/plugin-sdk/* entries plus the local api.ts and runtime-api.ts barrel files are the only path forward.
- Plugin installation security tightening: installation now fails closed - critical findings from the built-in dangerous-code scan, or failure of the scan itself, deny the install by default. Some plugins that previously installed successfully now require the explicit --dangerously-force-unsafe-install flag to proceed.
- Gateway authentication tightening: trusted-proxy mode rejects mixed shared token configurations; local-direct fallback requires the use of configured tokens and no longer implicitly authenticates callers on the same host.
- Node command gating: Node commands remain disabled until node pairing is approved. Simply completing device pairing is no longer sufficient to expose declared node commands.
- Reduced Trust Surface for Node Events: Node-originated runs now execute on a reduced trusted surface. Notification-driven or node-triggered processes that rely on broader host/session tool access may need to be adjusted.
This "every version may have Breaking Changes" strategy echoes the choice of calendar version numbers - since no semantic compatibility promises are provided, each snapshot is clearly identified with the release date. In practice, users should pin a specific version and read the CHANGELOG before upgrading.
The previous notable release v2026.3.28 (released on 2026-03-29) also contains a number of important changes:
- xAI integration: migrated the bundled xAI provider to the Responses API, added native x_search (Grok's web search tool), and added optional x_search configuration steps to openclaw onboard.
- MiniMax image generation: added image generation and editing for the MiniMax image-01 model, with aspect ratio control.
- Qwen authentication migration: removed the deprecated qwen-portal-auth OAuth integration and migrated to Model Studio API Key mode.
- Plugin/hook approval mechanism: the before_tool_call hook gained an asynchronous requireApproval capability; a plugin can pause tool execution and prompt the user for approval through Telegram buttons, Discord interactions, /approve commands, and similar channel affordances.
- Microsoft Teams upgrade: migrated to the official Teams SDK, supporting streaming replies and AI annotations in 1:1 conversations.
- Gateway OpenAI compatibility: added /v1/models and /v1/embeddings endpoints so OpenAI-compatible third-party tools can call the Gateway directly.
OpenClaw is a pnpm workspace monorepo. The core layout of the root directory is as follows:
```
openclaw/
├── src/           # Core source code
│   ├── cli/       # CLI command entry and progress rendering
│   ├── commands/  # Implementation of each subcommand
│   ├── gateway/   # Gateway control plane (including protocol/ subdirectory)
│   ├── channels/  # Core channel implementations
│   ├── routing/   # Message routing
│   ├── plugins/   # Plugin discovery, loading, registration
│   ├── plugin-sdk/# Public plugin contract (the only legal import surface)
│   ├── infra/     # Infrastructure (SQLite, file locks, etc.)
│   └── media/     # Media processing pipeline
├── apps/
│   ├── macos/     # SwiftUI + AppKit menu bar application
│   ├── ios/       # Xcode + SwiftUI
│   └── android/   # Kotlin + Gradle
├── extensions/    # Built-in extensions (bundled plugin workspace tree)
├── packages/      # Shared packages
├── skills/        # Built-in Skills (shipped with the npm package)
├── ui/            # Web Control UI (Lit 3 + Vite)
├── docs/          # Mintlify documentation
├── test/          # E2E tests
└── scripts/       # Build/publish/check scripts (60+)
```
This directory tree reveals OpenClaw's engineering philosophy: keep the core as thin as possible and the boundaries as hard as possible. src/ holds all TypeScript core code, extensions/ holds built-in extensions (the bundled plugin workspace tree), and apps/ holds the three native clients. The import relationships among the three are one-way: extensions/ may only call core capabilities through openclaw/plugin-sdk/*, and apps/ talks to the core through the Gateway WebSocket protocol. Any reverse dependency is intercepted by CI's architecture guard scripts.
extensions/ is the bundled plugin workspace tree - built-in extensions distributed with the npm package. Channel plugins such as Matrix, Zalo, ZaloUser, and Voice Call, plus diagnostic telemetry (diagnostics-otel), live here. Each extension is an independent pnpm workspace package with its own package.json and openclaw.plugin.json manifest. An extension's runtime dependencies must be declared in its own dependencies and must not be added to the root package.json (unless the core uses the same dependency). workspace:* is prohibited in dependencies (npm install cannot resolve the workspace protocol); openclaw itself belongs in devDependencies or peerDependencies, and openclaw/plugin-sdk is resolved through a jiti alias at runtime.
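As a concrete illustration of those constraints, a hypothetical extension package.json might look like the following (the package name, platform SDK, and version ranges are invented for this sketch; only the dependency placement mirrors the rules above):

```json
{
  "name": "@openclaw/channel-example",
  "version": "2026.4.1",
  "type": "module",
  "dependencies": {
    "example-platform-sdk": "^2.0.0"
  },
  "peerDependencies": {
    "openclaw": ">=2026.3.1"
  }
}
```

No workspace:* specifiers appear, and openclaw sits in peerDependencies rather than dependencies, so a plain npm install on a user's machine resolves without the workspace protocol.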
packages/ stores pure shared library packages, does not contain a plugin list, and does not use the plugin loading pipeline. They provide utility functions and type definitions that are reusable across packages.
The skills/ directory stores built-in Skills (Bundled Skills) - they ship with the npm package and are usable immediately after installation. Unlike third-party Skills on ClawHub, built-in Skills require no clawhub install and do not pass through the before_install security check pipeline. Each Skill is described by a SKILL.md file, which is injected into the system prompt when the Agent runs.
docs/ is built with the Mintlify framework and deployed at docs.openclaw.ai. Links within documents use root-relative paths (such as [Config](/configuration)) without the .md extension. The docs are also translated into Chinese: the translation lives under docs/zh-CN/ and is generated automatically by the scripts/docs-i18n script, aided by the glossary docs/.i18n/glossary.zh-CN.json and the translation memory docs/.i18n/zh-CN.tm.jsonl to keep terminology consistent.
The scripts/ directory contains more than 60 standalone script files, which together with the 198 npm scripts entries in package.json form OpenClaw's build automation system. By purpose, the scripts fall into the following categories:
- Build script: tsdown-build.mjs (main build entry), runtime-postbuild.mjs (post-build processing), bundle-a2ui.sh (Canvas A2UI packaging), ui.js (Web UI build)
- Code inspection script: check-extension-plugin-sdk-boundary.mjs (extension import boundary check, three modes), check-plugin-extension-import-boundary.mjs (the core must not be reverse-imported into the extension), check-no-pairing-store-group-auth.mjs (security authentication audit)
- Release script: openclaw-npm-release-check.ts (pre-release verification), plugin-npm-release-plan.ts (plugin release plan), openclaw-npm-postpublish-verify.ts (post-release verification)
- Platform scripts: package-mac-app.sh (macOS packaging), ios-configure-signing.sh (iOS signing), build-release-aab.ts (Android AAB build)
- Test scripts: test-parallel.mjs (parallel test orchestrator), test-live.mjs (real API Key test), 8 e2e/*.sh Docker E2E test scenarios
- Operation and maintenance scripts: committer (atomic commit tool, replacing manual git add/commit), restart-mac.sh (macOS Gateway restart), clawlog.sh (macOS unified log query)
OpenClaw's dependency control is deliberately lean: the root package.json declares only 47 runtime dependencies and 22 development dependencies. Key dependencies are version-locked as follows:
| Dependency | Version | Purpose |
| --- | --- | --- |
| @mariozechner/pi-agent-core | 0.64.0 | Agent runtime core |
| @agentclientprotocol/sdk | 0.17.1 | ACP Protocol SDK |
| @modelcontextprotocol/sdk | 1.29.0 | MCP Protocol SDK |
| matrix-js-sdk | 41.3.0-rc.0 | Matrix Channel |
| playwright-core | 1.58.2 | Browser Control |
| sqlite-vec | 0.1.9 | Vector storage |
| sharp | ^0.34.5 | Image processing |
| hono | 4.12.9 | HTTP Framework |
| express | ^5.2.1 | Compatibility layer |
| zod | ^4.3.6 | Runtime verification |
| ws | ^8.20.0 | WebSocket |
| undici | ^7.24.6 | HTTP client |
The most noteworthy development dependency is @typescript/native-preview@7.0.0-dev.20260331.1 - the preview of TypeScript's official Go rewrite, which OpenClaw has wired into the pnpm tsgo command. vitest@4.1.2 pairs with @vitest/coverage-v8 for coverage, tsdown@0.21.7 replaces webpack/rollup as the bundler, and oxfmt@0.43.0 and oxlint@1.58.0 replace Prettier and ESLint respectively. The selection principle behind this toolchain is clear: swap traditional JavaScript-based tooling for native tools written in Rust or Go, and take the order-of-magnitude performance gain.
All entries in pnpm.patchedDependencies must use exact version numbers (no ^ or ~ prefixes), and dependency patches require explicit approval. The repository also states flatly, "Never update Carbon dependencies" - a hard rule written into AGENTS.md.
The previous chapter gave the top-level directory skeleton of src/. This chapter walks through each subdirectory's internal design, following code structure and dependencies to show how the core source is layered.
src/cli/ is the entry layer of the OpenClaw command-line tool. It contains no business logic and is responsible for exactly two things: parsing command-line arguments and routing them to the concrete implementations in src/commands/, and rendering structured progress feedback in the terminal.
The core of progress feedback is located in src/cli/progress.ts. This module uses two sets of mechanisms simultaneously:
The first set is OSC Progress Sequences (Operating System Command progress sequences) - terminal escape codes that let a percentage progress bar appear in the title bar or tab of Windows Terminal (via ConPTY), iTerm2, and other terminals that support the sequence. progress.ts drives this OS-level progress indicator by writing the \x1b]9;4;1;{percent}\x07 sequence to stdout, so the user can see installation progress in the taskbar even while the terminal window is minimized.
The second set is @clack/prompts, a lightweight interactive terminal UI library. OpenClaw uses it to implement step indicators, multi-select menus, and confirmation prompts in onboard wizards. The spinner and OSC progress of @clack/prompts can work in parallel - the spinner is rendered on the current line of stdout, and the OSC sequence is rendered on the terminal title bar, and the two do not interfere with each other.
```ts
// src/cli/progress.ts - simplified core logic
import { spinner } from '@clack/prompts';

export function emitOscProgress(percent: number): void {
  process.stdout.write(`\x1b]9;4;1;${Math.round(percent)}\x07`);
}

export function clearOscProgress(): void {
  process.stdout.write(`\x1b]9;4;0;\x07`);
}

export async function withProgress<T>(
  label: string,
  task: (update: (pct: number) => void) => Promise<T>
): Promise<T> {
  const s = spinner();
  s.start(label);
  const result = await task((pct) => {
    emitOscProgress(pct);
    s.message(`${label} (${pct}%)`);
  });
  clearOscProgress();
  s.stop(`${label} ✔`);
  return result;
}
```
Each file in src/commands/ corresponds to a top-level CLI subcommand. File naming follows the {command}.ts pattern, such as start.ts, stop.ts, update.ts, onboard.ts, config.ts, plugin.ts.
The most complex of these is onboard.ts, the first-run wizard. Its flow: detect the system environment (Node.js version, platform, package manager) → select a message channel (Telegram/Discord/Slack, etc.) → enter channel credentials (Bot Token, etc.) → select an AI provider (OpenAI/Anthropic/Ollama, etc.) → enter the provider API key → write the configuration file ~/.openclaw/config.yaml → run npm install --omit=dev to install the selected channel's extension dependencies. The whole flow is driven by @clack/prompts, with spinner and progress-bar feedback at each step.
src/gateway/ is the backbone of OpenClaw. It starts a WebSocket service locally (listening to ws://127.0.0.1:18789 by default) and acts as a Single Control Plane between all channels, plugins, native clients and Control UI.
The directory structure is roughly as follows:
```
src/gateway/
├── server.ts          # WebSocket server life cycle
├── router.ts          # Protocol message dispatch
├── session.ts         # Session management
├── presence.ts        # Online status
├── config.ts          # Runtime configuration hot-reload
├── cron.ts            # Scheduled tasks
├── webhooks.ts        # External webhook ingress
├── auth.ts            # Authentication model
├── health.ts          # /healthz, /readyz endpoints
├── openai-compat.ts   # /v1/models, /v1/embeddings compatibility layer
└── protocol/
    ├── schema.ts      # Protocol schema aggregation entry
    └── schema/        # Schema definition files split by domain
        ├── sessions.ts
        ├── nodes.ts
        ├── channels.ts
        └── ...
```
The protocol/ subdirectory is Gateway's type layer. All WebSocket messages are serialized and deserialized via the TypeScript definitions exported by the protocol/schema.ts aggregate. Files within schema/ are organized by domain (sessions, nodes, channels, etc.), each exporting request/response Zod schemas or TypeScript interfaces. The same schemas feed Swift codegen: a build script generates the corresponding Swift structs from these TypeScript types for the Gateway client code in the macOS/iOS native apps.
Session management (session.ts) maintains the memory status of all active sessions, including session ID, associated channel, associated Agent, message queue depth, last active time, etc. Presence (presence.ts) tracks the online status of all connected clients, supporting native applications and web UI to display which channels are online in real time. Cron (cron.ts) provides scheduled task scheduling based on cron expressions, which is used to periodically check the channel connection status and perform cleanup tasks. Webhooks (webhooks.ts) provides endpoint registration for channels that require HTTP callbacks, such as the Telegram webhook mode and the Slack Events API.
src/channels/ is not a single directory - OpenClaw spreads the core channel code across multiple first-level directories under src/. The specific mapping relationship is:
| Channel | Source code location | Underlying dependencies |
| --- | --- | --- |
| Telegram | src/telegram/ | grammY |
| Discord | src/discord/ | discord.js |
| Slack | src/slack/ | @slack/bolt |
| Signal | src/signal/ | signal-cli (Java child process) |
| iMessage | src/imessage/ | BlueBubbles HTTP API / native imsg |
| WhatsApp | src/web/ | Baileys (WhatsApp Web protocol) |
src/channels/ itself exists as an aggregation layer, defining the Unified Messaging Abstraction interface and routing table that all channels must implement. The file structure inside each channel directory is roughly symmetrical: an adapter file is responsible for mapping the events of the platform SDK into a unified inbound message format, and a sender file is responsible for converting the unified outbound format back to platform-specific API calls.
The message routing engine (src/routing/) is the middle layer between the channel system and the Agent runtime. It distributes inbound messages to the correct Agent instance based on routing rules in the configuration file. Routing dimensions include: channel type, account ID, sender peer ID, group ID, and message content matching mode. In a multi-Agent scenario, the routing engine is responsible for isolating messages from different channels/accounts/groups into different Agent sessions.
src/plugins/ is the runtime host of the plugin system, not the plugins themselves. It contains five core modules:
Discovery: Scan installed npm packages in extensions/ workspace and ~/.openclaw/plugins/ for packages that contain an openclaw.plugin.json manifest file.
Manifest Validation: Use Zod Schema to strictly verify the structure of openclaw.plugin.json. Fields such as id, channel.id, and install.npmSpec in the manifest file must conform to the predefined format.
Loader: Execute dynamic import() on the verified plugin, load its entry module and call the agreed life cycle hook.
Registry: Maintains a global plugin registry, recording the type, status, and capability declaration of each loaded plugin. The registry supports runtime hot-plugging - a newly installed plugin can be discovered → validated → loaded → registered without restarting the Gateway.
Contract Enforcement: Ensuring at build time that plugins only import public APIs via openclaw/plugin-sdk/* via ESLint rules. Any plugins that directly reference modules inside src/ will be intercepted in CI.
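As a rough sketch of how the first four modules chain together (the manifest schema here is radically simplified, and the lifecycle hooks and contract checks are omitted; the real loader handles far more cases):

```ts
import { readdir, readFile } from 'node:fs/promises';
import { join } from 'node:path';
import { z } from 'zod';

// Hypothetical, heavily simplified manifest schema; the real one validates many more fields.
const PluginManifestSchema = z.object({
  id: z.string(),
  type: z.enum(['channel', 'provider', 'tool', 'skill', 'media']),
  entrypoint: z.string(),
});
type PluginManifest = z.infer<typeof PluginManifestSchema>;

const registry = new Map<string, { manifest: PluginManifest; module: unknown }>();

export async function discoverAndLoad(pluginsDir: string): Promise<void> {
  for (const dir of await readdir(pluginsDir)) {
    // Discovery: a package is a plugin iff it carries an openclaw.plugin.json manifest.
    const raw = await readFile(join(pluginsDir, dir, 'openclaw.plugin.json'), 'utf8')
      .catch(() => null);
    if (raw === null) continue;

    // Manifest validation: reject structurally invalid manifests before any code runs.
    const manifest = PluginManifestSchema.parse(JSON.parse(raw));

    // Loader: dynamic import of the declared entry module.
    const module = await import(join(pluginsDir, dir, manifest.entrypoint));

    // Registry: record the plugin so it can be listed, invoked, or hot-unplugged later.
    registry.set(manifest.id, { manifest, module });
  }
}
```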
src/plugin-sdk/ is the only public API surface OpenClaw exposes to external extensions. The exports field of package.json declares exactly 230 named export subpaths, each a stable contract. Those 230 subpaths are the complete set of legal import sources for plugin development - no exceptions. A detailed analysis of this catalog follows in the next chapter.
src/infra/ encapsulates the underlying capabilities of interacting with the operating system. Core components include: a local persistence layer based on better-sqlite3 (to store session history, plugin status, user configuration, etc.), and a file locking mechanism based on proper-lockfile - ensuring that no two OpenClaw Gateway instances will operate on the same data directory at the same time on the same machine. The SQLite database file is located by default at ~/.openclaw/data/openclaw.db.
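A minimal sketch of how those two pieces might combine, assuming the documented default paths (the lock options and pragma are illustrative, not taken from the repository):

```ts
import Database from 'better-sqlite3';
import lockfile from 'proper-lockfile';
import { homedir } from 'node:os';
import { join } from 'node:path';

const dataDir = join(homedir(), '.openclaw', 'data');

async function openDataDir() {
  // Acquire an exclusive lock on the data directory so a second Gateway
  // instance on the same machine fails fast instead of corrupting state.
  const release = await lockfile.lock(dataDir, { retries: 0 });

  // Single-file local persistence for sessions, plugin state, and configuration.
  const db = new Database(join(dataDir, 'openclaw.db'));
  db.pragma('journal_mode = WAL'); // better concurrency for readers

  return { db, release };
}
```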
src/media/ implements a unified media processing pipeline. When the channel receives an image, audio, video or file message, the pipeline is responsible for: downloading the original media → format detection → transcoding if necessary (such as Opus → WAV for speech to text) → storing in a local cache → generating a reference URL for use by the Agent. The pipeline is designed to be pluggable, and media plugins can register custom processors to handle specific MIME types.
OpenClaw's plugin system is centered on src/plugin-sdk/, which exposes 230 precisely named subpaths through the exports field of package.json. This is a strictly designed contract system - it defines both what a plugin can do and what it cannot.
The exports field format of package.json is as follows:
```jsonc
{
  "exports": {
    "./plugin-sdk/channel-types": "./src/plugin-sdk/channel-types.ts",
    "./plugin-sdk/channel-inbound": "./src/plugin-sdk/channel-inbound.ts",
    "./plugin-sdk/channel-reply-pipeline": "./src/plugin-sdk/channel-reply-pipeline.ts",
    "./plugin-sdk/channel-send-result": "./src/plugin-sdk/channel-send-result.ts",
    "./plugin-sdk/channel-dm-security": "./src/plugin-sdk/channel-dm-security.ts",
    "./plugin-sdk/provider-types": "./src/plugin-sdk/provider-types.ts",
    "./plugin-sdk/provider-registry": "./src/plugin-sdk/provider-registry.ts",
    "./plugin-sdk/memory-core-types": "./src/plugin-sdk/memory-core-types.ts",
    "./plugin-sdk/memory-core-store": "./src/plugin-sdk/memory-core-store.ts",
    "./plugin-sdk/plugin-manifest": "./src/plugin-sdk/plugin-manifest.ts",
    "./plugin-sdk/plugin-lifecycle": "./src/plugin-sdk/plugin-lifecycle.ts",
    "./plugin-sdk/runtime-config": "./src/plugin-sdk/runtime-config.ts",
    "./plugin-sdk/runtime-events": "./src/plugin-sdk/runtime-events.ts",
    "./plugin-sdk/media-types": "./src/plugin-sdk/media-types.ts",
    "./plugin-sdk/media-processor": "./src/plugin-sdk/media-processor.ts",
    "./plugin-sdk/speech-types": "./src/plugin-sdk/speech-types.ts",
    "./plugin-sdk/speech-engine": "./src/plugin-sdk/speech-engine.ts"
    // ... 230 entries in total
  }
}
```
These 230 sub-paths can be divided into the following categories according to prefix:
| Prefix | Quantity (approx.) | Responsibilities |
| --- | --- | --- |
| channel-* | ~45 | Channel type definition, inbound/outbound messages, DM security policy, group behavior, chunking policy |
| provider-* | ~35 | AI Provider interface, model registration, capability declaration, streaming response protocol |
| memory-core-* | ~20 | Memory system core type, storage interface, vector index |
| plugin-* | ~25 | Plugin manifest format, life cycle hooks, capability declaration |
| runtime-* | ~40 | Runtime configuration, event bus, logs, error types, session context |
| media-* | ~15 | Media type, processor interface, transcoding pipeline |
| speech-* | ~10 | Speech recognition/synthesis engine interface |
| Others (tool-*, skill-*, util-*, etc.) | ~40 | Tool/skill plugin interface, common tool type |
A core architectural constraint of OpenClaw is that all external extensions (packages in extensions/ workspace and third-party npm packages) can only be imported from openclaw/plugin-sdk/*. Direct references to internal modules in src/ are not allowed, relative paths are not allowed to cross package boundaries, and references to paths not declared in exports are not allowed.
This rule is enforced in CI via four custom ESLint rules:
| Lint rule | Function |
| --- | --- |
| lint:extensions:no-plugin-sdk-internal | Prohibit code in extensions/ from importing the internal implementation files of plugin-sdk (non-exports declaration paths) |
| lint:extensions:no-relative-outside-package | Prohibit code in extensions/ from using relative paths to reference files outside the package |
| lint:extensions:no-src-outside-plugin-sdk | Prohibit code in extensions/ from directly referencing any module under src/ that is not plugin-sdk |
| lint:plugins:no-extension-imports | Prohibit src/ core code from back-referencing modules in extensions/ (to prevent reverse dependencies) |
Together, these four rules form a strict Dependency Firewall: the boundary between core code and extension code is one-way, controlled, and auditable.
OpenClaw defines five plugin types, each corresponding to a set of sub-paths in plugin-sdk:
Channel Plugin: Implements a new messaging platform adapter. Complete implementations of channel-inbound and channel-reply-pipeline must be provided. channel.id must be declared in the manifest file.
Provider Plugin: Connect to a new AI model provider. The interfaces defined in provider-types need to be implemented, including model enumeration, Chat Completion stream, Embedding, etc.
Tool Plugin: Adds new callable tools for Agent. Register tool definitions through tool-* subpaths, including JSON Schema parameter descriptions and execution functions.
Skill Plugin: A prepackaged composite capability (such as "search web pages and summarize") that can contain the orchestration logic of multiple tools.
Media Plugin: Register a custom media processor to handle files of specific MIME types.
The metadata of each plugin is declared by openclaw.plugin.json in the package root directory:
```json
{
  "id": "openclaw-channel-matrix",
  "version": "2026.4.1",
  "type": "channel",
  "channel": {
    "id": "matrix",
    "displayName": "Matrix",
    "supportsGroups": true,
    "supportsDM": true
  },
  "install": {
    "npmSpec": "@openclaw/channel-matrix@latest"
  },
  "minCoreVersion": "2026.3.1",
  "entrypoint": "./dist/index.js"
}
```
Key fields: id is a globally unique identifier; channel.id must be provided when type is channel and is used for routing table matching; install.npmSpec specifies the npm package identifier used during installation; minCoreVersion declares the minimum compatible OpenClaw core version.
There are two important Barrel Files inside src/plugin-sdk/: api.ts and runtime-api.ts.
api.ts aggregates all pure type exports - interface definitions, type aliases, enumerations, and so on. It is a compile-time dependency and contains no runtime code. runtime-api.ts aggregates the modules that must exist at runtime - factory functions, registries, event emitters. Separating the two means a plugin that only needs type information (such as pure TypeScript type guards) can depend on api.ts alone, pulling in no runtime code and staying tree-shaking friendly.
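In practice that means a plugin needing only compile-time information can use a type-only import; in the sketch below the symbol names are hypothetical, and the assumption is that the two barrels resolve at openclaw/plugin-sdk/api and openclaw/plugin-sdk/runtime-api:

```ts
// Erased at compile time - pulls no runtime code out of the core.
import type { ChannelAdapter } from 'openclaw/plugin-sdk/api';

// Evaluated at runtime - registers the adapter with the plugin host.
import { registerChannelAdapter } from 'openclaw/plugin-sdk/runtime-api';
```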
Plugin installation is performed through npm install --omit=dev, and only production dependencies are installed. Key constraint: The use of workspace:* protocols as dependencies is prohibited in the plugin's package.json - this is because third-party plugins are not in the monorepo workspace context when installed on the user's machine, and workspace:* will fail to resolve. There are special checking scripts in CI to intercept such violations.
v2026.3.31 is a Breaking Change version. Previously, a set of legacy subpaths prefixed with provider-compat-* were retained in the plugin-sdk for backward compatibility with earlier Provider interfaces. v2026.3.31 officially removed these paths. Third-party Provider plugins that rely on the old interface must be migrated to the new provider-* subpaths. The migration guide is located at docs/migration/v2026.3.31-provider-compat.md.
Gateway is the core runtime process of OpenClaw. It is not an optional component - all channel messages, Agent dispatch, plugin communication, and native client interactions are routed through the Gateway. To understand Gateway is to understand the full runtime of OpenClaw.
Gateway's design philosophy is Single Local Control Plane - there is only one instance of Gateway running on the local machine, which is the communication hub for all components. The startup command openclaw start actually starts the Gateway process. Gateway listens for WebSocket connections on ws://127.0.0.1:18789 (the default port), while providing an HTTP endpoint on the same port.
All components are clients of Gateway: channel adapters (Telegram bot, Discord bot, etc.) internally report inbound messages to Gateway through WebSocket; Agent receives tasks from Gateway during runtime and returns responses; native applications (macOS, iOS, Android) connect to Gateway through WebSocket to obtain real-time status; Control UI (Web interface) is also a WebSocket client.
Gateway's WebSocket protocol is fully typed. The protocol definition is located in src/gateway/protocol/schema.ts, which aggregates and exports all submodules from the src/gateway/protocol/schema/ directory. Each sub-module corresponds to a protocol field:
```ts
// src/gateway/protocol/schema/sessions.ts
import { z } from 'zod';

export const SessionPatchRequest = z.object({
  method: z.literal('sessions.patch'),
  params: z.object({
    sessionId: z.string(),
    patch: z.object({
      thinkingLevel: z.enum(['off', 'minimal', 'low', 'medium', 'high', 'xhigh']).optional(),
      activeAgent: z.string().optional(),
      queueMode: z.enum(['sequential', 'parallel']).optional(),
    }),
  }),
});

export const SessionPatchResponse = z.object({
  result: z.object({
    sessionId: z.string(),
    applied: z.record(z.string(), z.unknown()),
  }),
});

// src/gateway/protocol/schema/nodes.ts
export const NodeListRequest = z.object({
  method: z.literal('node.list'),
  params: z.object({
    filter: z.object({
      type: z.enum(['channel', 'agent', 'plugin', 'tool']).optional(),
      status: z.enum(['online', 'offline', 'error']).optional(),
    }).optional(),
  }),
});

export const NodeDescribeRequest = z.object({
  method: z.literal('node.describe'),
  params: z.object({ nodeId: z.string() }),
});

export const NodeInvokeRequest = z.object({
  method: z.literal('node.invoke'),
  params: z.object({
    nodeId: z.string(),
    action: z.string(),
    payload: z.unknown(),
  }),
});
```
The protocol adopts a JSON-RPC-like request/response pattern. Core methods include:
| Method | Purpose |
| --- | --- |
| sessions.patch | Modify session parameters (thinking level, active agent, queue mode, etc.) |
| sessions.list | List all active sessions and their status |
| node.list | List all registered nodes (channels, agents, plugins, tools) |
| node.describe | Get detailed information and capability statement of the specified node |
| node.invoke | Send operation instructions to the specified node (such as requiring the channel to send messages, requiring the Agent to perform tasks) |
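A minimal client sketch against this protocol (the request envelope's id field and the session-ID format are assumptions of this sketch; the token header follows the gateway-token mode described below):

```ts
import WebSocket from 'ws';

const ws = new WebSocket('ws://127.0.0.1:18789', {
  headers: { Authorization: 'Bearer <gateway-token>' }, // omit for local-direct mode
});

ws.on('open', () => {
  // Raise the thinking level for one session via sessions.patch.
  ws.send(JSON.stringify({
    id: 1, // correlation id - assumed by this sketch, not confirmed by the docs
    method: 'sessions.patch',
    params: {
      sessionId: 'telegram:123456', // hypothetical session-ID format
      patch: { thinkingLevel: 'high' },
    },
  }));
});

ws.on('message', (data) => {
  console.log('gateway reply:', JSON.parse(data.toString()));
});
```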
macOS/iOS native apps need to communicate with Gateway. To ensure consistency between TypeScript protocol definitions and Swift client code, OpenClaw includes a Swift codegen step in the build process. The build script parses the Zod Schema in src/gateway/protocol/schema/ and automatically generates the corresponding Swift Codable struct and enum. The generated code is located in apps/macos/Generated/ and apps/ios/Generated/. This means that protocol changes only need to modify the TypeScript Schema, and the Swift side will automatically synchronize, without the risk of manual synchronization missing.
Gateway supports three authentication modes, arranged in order of priority:
trusted-proxy: Gateway trusts requests from specific proxies (such as Nginx, Cloudflare Tunnel) and identifies them based on the HTTP header injected by the proxy. This is the recommended mode for production environments.
local-direct: When the WebSocket connection comes from 127.0.0.1, skip authentication and authorize directly. This is the default behavior for local development and standalone deployment.
gateway token: A static Token set through the configuration file, carried by the client through the Authorization: Bearer header during the WebSocket handshake. Used for remote access scenarios.
v2026.3.31 introduces an important security change: in trusted-proxy mode, Gateway refuses the connection if multiple clients are detected using the same shared token. Sharing one token among several people previously worked, though discouraged; the new version turns it into a hard error, because a shared token makes different users' sessions indistinguishable and leads to confused message routing.
Gateway exposes a set of OpenAI compatible endpoints at the HTTP layer:
/v1/models: Returns a list of all available models in the current configuration, in a format compatible with the OpenAI List Models API. This allows any OpenAI API-compatible client (such as Cursor, Continue, etc.) to directly use OpenClaw Gateway as a model provider.
/v1/embeddings: Provides text vectorization interface, format compatible with OpenAI Embeddings API. The backend can be routed to the actual configured Embedding Provider (OpenAI, Ollama native model, etc.).
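A quick way to verify this compatibility surface from any script (the response shape follows the OpenAI List Models format; the token header applies only when token auth is configured; requires Node 18+ for global fetch):

```ts
// List the models the local Gateway exposes, OpenAI-style.
const res = await fetch('http://127.0.0.1:18789/v1/models', {
  headers: { Authorization: 'Bearer <gateway-token>' }, // only if token auth is configured
});
const { data } = await res.json();
console.log(data.map((m: { id: string }) => m.id));
```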
The health check endpoint follows Kubernetes conventions:
/healthz: Liveness Probe, returns 200 as long as the Gateway process is running.
/readyz: Readiness Probe, returns 200 only when at least one channel connection is successful and the Agent runtime has been initialized. Used by load balancers to determine whether a node can receive traffic.
Gateway also serves a web management interface, the Control UI. The UI is built with Lit 3 (a Web Components framework) and Vite, with source in the ui/ directory. The build output is embedded into Gateway's static resources at release time and served directly over HTTP (default http://127.0.0.1:18789). The Control UI is itself a WebSocket client, holding a long-lived connection to Gateway for real-time status updates.
The specification document of Bridge Protocol is located at docs/gateway/bridge-protocol.md, which defines the communication convention between native applications and Gateway - including message encoding format, heartbeat mechanism, reconnection strategy, and event subscription model. This document is the core reference for native app developers.
OpenClaw supports 24 messaging channels in v2026.4.1. The core engineering challenge of the channel system is: how to abstract 24 messaging platforms with different characteristics and API styles into a unified set of inbound/outbound messaging models while retaining the unique capabilities of each platform.
| Channel | Underlying implementation | Type |
| --- | --- | --- |
| WhatsApp | Baileys (WhatsApp Web reverse protocol) | Core channel (src/web/) |
| Telegram | grammY | Core Channel (src/telegram/) |
| Slack | @slack/bolt | Core Channel (src/slack/) |
| Discord | discord.js | Core channel (src/discord/) |
| Signal | signal-cli (Java child process) | Core Channel (src/signal/) |
| BlueBubbles (iMessage) | BlueBubbles HTTP API | Core channel (src/imessage/), recommended method |
| iMessage (legacy imsg) | Native AppleScript/osascript | Core channel, marked legacy |
| Google Chat | Google Chat API | Built-in extensions |
| IRC | irc-framework | Built-in extensions |
| Microsoft Teams | Teams SDK (v2026.3.28 upgraded version) | Built-in extensions |
| Matrix | matrix-js-sdk + @matrix-org/crypto-wasm | Built-in extensions (extensions/) |
| Feishu | Feishu Open API | Built-in extensions |
| LINE | @line/bot-sdk | Built-in extensions |
| Mattermost | Mattermost REST API + WebSocket | Built-in extensions |
| Nextcloud Talk | Nextcloud Talk API | Built-in extensions |
| Nostr | nostr-tools | Built-in extensions |
| Synology Chat | Synology Chat Webhook | Built-in extensions |
| Tlon | Tlon API | Built-in extensions |
| Twitch | tmi.js | Built-in extensions |
| Zalo | Zalo Official Account API | Built-in extensions (extensions/) |
| Zalo Personal | Zalo Personal API (ZaloUser) | Built-in extensions (extensions/) |
| Voice Call | VoIP/SIP integration | Built-in extensions (extensions/) |
| WeChat (WeChat) | @tencent-weixin/openclaw-weixin (iLink Bot API) | Official cooperation plugin |
| WebChat | Gateway built-in WebSocket chat | Core channel |
The type contract of the channel system is defined by three core files:
types.plugin.ts: Public types for plugin developers. The interfaces that channel plugins must implement are defined here, including ChannelAdapter (channel adapter), ChannelSender (message sender), and ChannelConfig (channel configuration Schema).
types.core.ts: core internal types, not exported through plugin-sdk. Contains routing table entries, session binding relationships, and internal message envelope (Envelope) format.
types.adapters.ts: Adapter auxiliary type, which defines the mapping interface from each platform's SDK events to a unified inbound format.
Unified messaging abstraction is the core design of the channel system. It is defined by three plugin-sdk subpaths:
channel-inbound: Defines a unified structure for all channel inbound messages. Regardless of whether the message comes from WhatsApp, Telegram or Discord, it is converted to the same InboundMessage type after being processed by the channel adapter. This type includes: channelId, peerId (sender ID), groupId (group ID, null for DM), content (text/media/mixed content), replyTo (reference message ID), timestamp, rawEvent (platform raw event, used for channel-specific logic).
channel-reply-pipeline: Defines the processing pipeline through which Agent responses pass. The pipeline stages include: content formatting (Markdown → platform-specific formatting) → long message chunking (per-channel chunking) → media attachment processing → platform API calls.
channel-send-result: Defines the unified structure of message sending results, including the message ID returned by the platform, sending status (success/failure/partial success), and error information.
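A condensed sketch of the adapter half of that contract for Telegram, with the InboundMessage shape transcribed from the field list above (type names and the exact SDK wiring are simplified for illustration):

```ts
import type { Context } from 'grammy';

// Shape follows the InboundMessage fields described above (simplified).
interface InboundMessage {
  channelId: string;
  peerId: string;
  groupId: string | null;
  content: { text?: string };
  replyTo: string | null;
  timestamp: number;
  rawEvent: unknown;
}

// Adapter step: platform SDK event -> unified inbound format.
function fromTelegram(ctx: Context): InboundMessage {
  const msg = ctx.message!;
  return {
    channelId: 'telegram',
    peerId: String(msg.from!.id),
    groupId: msg.chat.type === 'private' ? null : String(msg.chat.id),
    content: { text: msg.text ?? '' },
    replyTo: msg.reply_to_message ? String(msg.reply_to_message.message_id) : null,
    timestamp: msg.date * 1000, // Telegram sends seconds, not milliseconds
    rawEvent: ctx.update,
  };
}
```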
In group scenarios the Agent does not respond to every message by default - that would make the group noisy. OpenClaw implements mention gating: the Agent only processes a message that @mentions the bot. The behavior can be overridden per channel/group to always mode (respond to all messages).
Reply Tags solve another group problem: when multiple messages come in at the same time, the Agent's reply needs to mark its corresponding original message. This is implemented in Telegram via reply_to_message_id, in Discord via Message Reference, and in Slack via Thread TS. The channel adapter is responsible for mapping the unified replyTo field to the platform-specific reply mechanism.
Long message chunking (per-channel chunking) is another point where platform differences must be absorbed. Telegram's single-message limit is 4096 characters, Discord's is 2000, WhatsApp's about 65536. The chunking stage in channel-reply-pipeline splits overlong Agent responses into multiple messages based on the target channel's constraints, while ensuring that code blocks, Markdown lists, and similar structures are not cut in the middle.
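A sketch of the fence-aware piece of such a chunker (paragraph-boundary splitting only; the real pipeline also protects lists and media, and handles single chunks that exceed the limit, which this sketch leaves intact):

```ts
// Split a reply at line boundaries without breaking inside a ``` fence.
function chunkMessage(text: string, limit: number): string[] {
  const chunks: string[] = [];
  let current = '';
  let inFence = false;

  for (const line of text.split('\n')) {
    const isFenceMarker = line.trimStart().startsWith('```');
    const closesFence = isFenceMarker && inFence;
    if (isFenceMarker) inFence = !inFence;

    const candidate = current ? `${current}\n${line}` : line;
    // Never flush inside a fence or on the line that closes one,
    // so opening and closing markers stay in the same chunk.
    if (!inFence && !closesFence && candidate.length > limit && current) {
      chunks.push(current);
      current = line;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// e.g. chunkMessage(reply, 4096) for Telegram, chunkMessage(reply, 2000) for Discord
```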
The private messaging (DM) scenario has an independent security model, defined by the channel-dm-security subpath. The core is the dmPolicy configuration item, which supports three modes:
pairing: Users must send a pairing code before activating a DM conversation. The pairing code is generated by the openclaw pair command and is used once. This is the safest mode.
allowlist: Only user IDs/mobile phone numbers listed in the allowFrom configuration can initiate DM conversations.
open: Anyone can start a DM conversation directly. It is only recommended for use in a controlled environment (such as intranet deployment).
WhatsApp: based on the Baileys library, using the WhatsApp Web protocol. The first connection requires scanning a QR code to log in; the code is rendered as ASCII art in the terminal and as an image in the Control UI. Session credentials are persisted to the local filesystem, so subsequent restarts reconnect automatically.
Telegram: Supports two operating modes - Long Polling (default) and Webhook mode. In Webhook mode, Gateway registers a public HTTPS endpoint to receive Telegram push, which has lower latency but requires an address reachable by the public network (usually implemented through Cloudflare Tunnel or ngrok). The grammY framework provides a complete type-safe encapsulation of the Bot API.
Discord: Supports two interaction modes: native Slash Commands (/ask, /image, etc.) and plain text commands. discord.js provides a rich event model, and OpenClaw leverages its Message Component capabilities to implement interactive buttons and selection menus.
Microsoft Teams: v2026.3.28 includes a major upgrade to the Teams integration, moving to the new official Teams SDK. Streaming replies are now supported: the Agent's responses stream into a Teams conversation in real time, carrying AI annotation tags that make clear the message came from an AI.
WeChat (WeChat): Implemented through official cooperation channels, using the @tencent-weixin/openclaw-weixin package and accessing the iLink Bot API at the bottom. Currently only private messages are supported, group chats are not supported. The v2.x version requires OpenClaw core version ≥ 2026.3.22.
OpenClaw's Agent runtime is built on top of the Pi Agent. This is not a self-developed Agent framework, but a deep integration of external libraries @mariozechner/pi-agent-core@0.64.0 and @mariozechner/pi-ai@0.64.0. The Pi ecosystem also includes pi-coding-agent (a special agent for code generation) and pi-tui (terminal UI).
When Pi Agent runs in RPC mode, it supports two streaming output protocols:
Tool Streaming: When the Agent calls a tool, the execution process and intermediate results of the tool are returned in a streaming manner. For example, when the Agent calls the search tool, each search result is pushed as a stream event, instead of waiting for all results to be returned before outputting them all at once.
Block Streaming: The agent's text response is streamed out in blocks. A "block" can be a paragraph, a block of code, or a list. Block streaming is more suitable for message channel scenarios than token-by-token streaming - the channel adapter can send each block immediately when it is completed, instead of accumulating the entire response and sending it, and also avoids the frequent API calls caused by token-by-token sending.
OpenClaw's Session model is key to understanding message routing. Each Agent maintains multiple independent sessions (Session), and the sessions are completely isolated:
DM Session: Conversations with each DM user constitute an independent session. A session is uniquely identified by the (agentId, channelId, peerId) triplet.
Group Session: Each group has an independent session, identified by the (agentId, channelId, groupId) triplet. Group conversations are completely isolated from DM conversations - Agents cannot see the private chat history of the same user in the group, and vice versa.
The activation mode of the session controls when the Agent responds: in mention mode, only @mention triggers a response; in always mode, all messages trigger a response. The default is always for DM conversations and mention for group conversations.
Queue Mode controls the processing strategy of concurrent messages: in sequential mode, messages are processed one by one in strict order of reception; in parallel mode, multiple messages can be processed in parallel (suitable for stateless tool calling scenarios).
Reply-back routing ensures that the Agent's response is sent to the correct channel and conversation. When the Agent triggers a cross-channel operation through a tool call (such as asking the Agent to send a message to a Slack channel in a Telegram conversation), the reply-back route is responsible for routing the operation results back to the Telegram conversation that initiated the request.
Three built-in tools enable Agents to have cross-session/cross-Agent coordination capabilities:
```ts
// sessions_list: list all currently active sessions
const sessionsList = {
  name: 'sessions_list',
  description: 'List all active sessions with their channel, peer, and status',
  parameters: {
    filter: {
      type: 'object',
      properties: {
        channelId: { type: 'string' },
        status: { enum: ['active', 'idle', 'archived'] },
      },
    },
  },
};

// sessions_history: read the message history of a given session
const sessionsHistory = {
  name: 'sessions_history',
  description: 'Read message history from a specific session',
  parameters: {
    sessionId: { type: 'string' },
    limit: { type: 'number', default: 50 },
  },
};

// sessions_send: send a message into a given session (agent-to-agent communication)
const sessionsSend = {
  name: 'sessions_send',
  description: 'Send a message to a specific session (enables agent-to-agent coordination)',
  parameters: {
    sessionId: { type: 'string' },
    content: { type: 'string' },
  },
};
```
sessions_send is the key to multi-Agent coordination. Agent A can discover Agent B's session through sessions_list and send instructions or queries via sessions_send; Agent B's response is routed back into Agent A's session context by reply-back routing.
OpenClaw supports running multiple Agents in the same instance, and each Agent has independent configuration and session space. Routing rules are defined in the configuration file and support distributing inbound messages to different Agents according to the three dimensions of channel, account, and peer:
```yaml
# config.yaml multi-Agent routing example
agents:
  - id: general-assistant
    provider: openai
    model: gpt-4o
    routes:
      - channel: telegram
        account: "@mybot"
      - channel: discord
        account: "bot-token-1"
  - id: coding-helper
    provider: anthropic
    model: claude-sonnet-4-20250514
    routes:
      - channel: slack
        account: "workspace-1"
        peers: ["U12345678"]  # only messages from this user are routed to this Agent
```
Each Agent has independent workspace and session storage to achieve complete isolation.
The runtime context for each Agent is provided by the ~/.openclaw/workspace/ directory. Three special files in this directory will be automatically injected into the Agent’s system prompts:
AGENTS.md: Defines the Agent's roles, behavioral guidelines, and constraints. This is the core definition file for the Agent personality.
SOUL.md: More fine-grained personality description - tone, conversation style, knowledge field preferences, etc.
TOOLS.md: Tool usage guide, telling Agent the usage scenarios and best practices of each available tool.
These three files are all in Markdown format and can be edited freely by users. No need to restart after modification - Gateway will check the mtime of the file before processing each session message, and reload it if there is any change.
Session history is persisted to ~/.openclaw/agents/<agentId>/sessions/*.jsonl in JSONL (JSON Lines) format - one file per session, one message per line. JSONL was chosen deliberately: it supports append-only writes (crash safety), line-wise incremental reads (memory efficiency), and direct inspection with standard text tools.
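Those properties are easy to exploit directly; a sketch of line-wise incremental reading of a transcript (the file name is hypothetical, and <agentId> is a placeholder):

```ts
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

// Stream a session transcript one message at a time - no full-file parse,
// which is the point of JSONL for long sessions.
async function* readSession(path: string): AsyncGenerator<unknown> {
  const rl = createInterface({ input: createReadStream(path), crlfDelay: Infinity });
  for await (const line of rl) {
    if (line.trim()) yield JSON.parse(line);
  }
}

const path = `${process.env.HOME}/.openclaw/agents/<agentId>/sessions/example.jsonl`;
for await (const message of readSession(path)) {
  console.log(message);
}
```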
Long-running sessions can accumulate large amounts of history, causing context window overflow and increased latency. OpenClaw provides two coping mechanisms:
Session Pruning: Automatically deletes messages older than a configured time window (default 7 days). Pruning is lazy: it triggers when the session is next activated.
Session Compaction: Triggered manually via the /compact command. The compression process calls an AI model to summarize the long history into a condensed contextual summary, replacing the original message-by-message record. Compressed session files can be reduced in size by more than 80% while retaining key contextual information.
OpenClaw exposes granular control over the "depth of thinking" of AI models. The thinkingLevel parameter supports six levels:
| Level | Behavior |
| --- | --- |
| off | Disable Extended Thinking and generate responses directly |
| minimal | Minimum thinking budget |
| low | Low thinking budget, suitable for simple tasks |
| medium | Medium thinking budget, default |
| high | High thinking budget, suitable for complex reasoning |
| xhigh | Extremely high thinking budget, used in scenarios that require deep reasoning |
Thinking Level can be adjusted dynamically at the session level through the sessions.patch protocol method, or a global default can be set in the configuration file. Providers that support extended thinking (such as Anthropic Claude) will adjust the budget limit of thinking tokens based on the level.
The idle-stream timeout introduced in v2026.3.31 solves a practical operational problem: when a model stream produces no new tokens for a long time (a stuck model server, a dropped network), the Agent would wait indefinitely without releasing the session lock, piling up all subsequent messages for that session. The idle-stream timeout makes this configurable (default 120 seconds): when a stream yields no new data within the window, the Agent actively interrupts it and returns a partial response or an error. The timeout is adjustable per provider in the configuration file - native Ollama models may need a longer one.
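Mechanically this amounts to a watchdog that resets on every chunk; a minimal sketch under that assumption (the real implementation is wired into the Pi runtime's stream handling, not written this way):

```ts
// Wrap a token/block stream so it aborts after `idleMs` without new data.
async function* withIdleTimeout<T>(
  stream: AsyncIterable<T>,
  idleMs = 120_000, // default per the release notes
): AsyncGenerator<T> {
  const it = stream[Symbol.asyncIterator]();
  while (true) {
    let timer: NodeJS.Timeout;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error('idle-stream timeout')), idleMs);
    });
    try {
      // Whichever settles first wins: the next chunk, or the watchdog firing.
      const { value, done } = await Promise.race([it.next(), timeout]);
      if (done) return;
      yield value;
    } finally {
      clearTimeout(timer!); // reset the watchdog on every chunk
    }
  }
}
```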
The personalization capabilities of an AI assistant depend on the depth of its memory system. OpenClaw's memory subsystem, memory-core, is the most finely split module family in the project: 13 sub-modules, all exported through plugin-sdk. The design goal is explicit: all memory data persists as local Markdown files and SQLite databases - directly editable by users, trackable in Git, and fully functional offline.
There are 13 memory-related export paths in plugin-sdk, and each path corresponds to an independent compilation unit:
| Export path | Responsibilities |
| --- | --- |
| memory-core | Root module, defines MemoryStore interface, MemoryEntry type, TTL policy and serialization contract |
| memory-core-engine-runtime | When the engine is running, bind memory operations to the current Agent runtime life cycle |
| memory-core-host-engine-embeddings | Embedding engine host: schedules Embedding model calculation vectors and manages batch embedding queues |
| memory-core-host-engine-foundation | Basic engine host: Provides tokenizer binding, vector dimension negotiation, and distance metric selection |
| memory-core-host-engine-qmd | QMD (Query-Memory-Document) engine: Semantically matching user queries with memory documents |
| memory-core-host-engine-storage | Storage engine host: abstracts the underlying storage backend (SQLite, LanceDB) and provides unified CRUD |
| memory-core-host-multimodal | Multimodal memory: processing the indexing and retrieval of non-text memory items such as pictures and audio |
| memory-core-host-query | Query host: Build semantic search queries, combining keyword filtering and vector similarity |
| memory-core-host-runtime-cli | CLI runtime host: Exposing terminal commands such as openclaw memory search |
| memory-core-host-runtime-core | Core runtime host: memory system initialization, migration and life cycle management |
| memory-core-host-runtime-files | File runtime host: monitor changes in Markdown memory files and trigger re-indexing |
| memory-core-host-secret | Key host: manages encryption keys stored in memory and SecretRef parsing |
| memory-core-host-status | Status host: reports index progress, number of vectors, recent query latency and other operating indicators |
This split method follows OpenClaw's plugin architecture principles: each sub-module can be replaced or disabled independently, and the core system only relies on the interface defined by the memory-core root module and does not directly rely on any specific storage backend.
OpenClaw's memory system stores user preferences and long-term context as local Markdown files, located by default in the ~/.openclaw/memory/ directory. Each memo file is standard Markdown with YAML front-matter metadata:
```markdown
---
type: preference
created: 2026-03-15T08:22:00Z
updated: 2026-04-01T14:30:00Z
tags: [coding-style, language]
---

# Coding Preferences

- Preferred language: TypeScript with strict mode
- Tab width: 2 spaces
- Always use explicit return types
- Prefer functional composition over class inheritance
```
The core advantages of this design lie in three points: first, users can directly modify the memory content using any text editor without entering the OpenClaw interface; second, memory files can be included in Git version control, and team members can share and synchronize preference configurations; third, memory content is completely available offline and does not rely on any cloud services. The memory-core-host-runtime-files module detects changes in Markdown files through file system monitoring (fs.watch) and automatically triggers the re-indexing process - parsing front-matter, extracting text, calculating embedding vectors, and updating vector storage.
Semantic search relies on vector storage. OpenClaw provides two backend options:
sqlite-vec (version 0.1.9) is the default backend. It is a vector search extension for SQLite, declared in package.json as an npm dependency sqlite-vec@0.1.9. sqlite-vec stores vectors as BLOB columns in SQLite tables, supporting exact nearest neighbor (Exact KNN) and quantization-based approximate nearest neighbor (ANN) searches. For individual use cases - typically with memory entries on the order of hundreds to thousands - sqlite-vec's exact KNN is efficient enough, with query latencies in the sub-millisecond range. The advantages of sqlite-vec are fully consistent with OpenClaw's local-first philosophy: single-file database, zero external dependencies, straightforward backup and migration.
memory-lancedb is the second backend, also exported through plugin-sdk. LanceDB is an embedded vector database built on the Lance columnar format with IVF-PQ index support, suited to scenarios where memory entries reach the hundreds of thousands. The memory-core-host-engine-storage module isolates the two backends behind a unified storage abstraction layer, so upper-layer code never needs to know which implementation sits underneath:
```typescript
// memory-core-host-engine-storage abstract interface
export interface VectorStorageBackend {
  insert(entries: MemoryEntry[]): Promise<void>;
  search(query: Float32Array, topK: number, filter?: MemoryFilter): Promise<ScoredEntry[]>;
  delete(ids: string[]): Promise<void>;
  count(): Promise<number>;
  vacuum(): Promise<void>;
}
```
memory-core-host-engine-embeddings manages the complete embedding pipeline. When a memory file is created or modified, the module runs the following process (sketched in code after the list):
- Parse the Markdown file and split the text into chunks, keeping each chunk under 512 tokens
- Call the currently configured embedding model to compute vectors, defaulting to the embedding endpoint configured in the provider plugin
- Write vectors to vector storage together with metadata (source file path, chunk offset, timestamp, tags)
- Maintain an incremental index: recompute embeddings only for changed chunks and keep the original vectors for unmodified ones
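A minimal sketch of this incremental flow follows. The helper names (chunkMarkdown, the embed and upsert callbacks) and the hash-based change check are illustrative assumptions, not OpenClaw's actual internals:

```typescript
// Illustrative incremental re-indexing: re-embed only chunks whose content changed.
import { createHash } from 'node:crypto';

interface Chunk { text: string; offset: number }

declare function chunkMarkdown(md: string, maxTokens: number): Chunk[]; // assumed helper

async function reindexFile(
  path: string,
  markdown: string,
  index: Map<string, string>, // chunk id -> content hash from the previous run
  embed: (texts: string[]) => Promise<Float32Array[]>,
  upsert: (id: string, vector: Float32Array, meta: object) => Promise<void>,
): Promise<void> {
  const changed: { id: string; chunk: Chunk }[] = [];
  for (const chunk of chunkMarkdown(markdown, 512)) {
    const id = `${path}#${chunk.offset}`;
    const hash = createHash('sha256').update(chunk.text).digest('hex');
    if (index.get(id) !== hash) { // only changed chunks are queued for embedding
      changed.push({ id, chunk });
      index.set(id, hash);
    }
  }
  if (changed.length === 0) return; // unmodified chunks keep their original vectors
  const vectors = await embed(changed.map((c) => c.chunk.text));
  await Promise.all(changed.map((c, i) =>
    upsert(c.id, vectors[i], { source: path, offset: c.chunk.offset }),
  ));
}
```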
memory-core-host-engine-qmd (the QMD engine) handles semantic matching at query time. QMD stands for Query-Memory-Document and implements a three-stage retrieval process: compute an embedding vector for the user query, run an approximate nearest neighbor search over the vector store to obtain a candidate set, then re-rank the candidates with BM25 keyword scoring. The memory-core-host-query module constructs query objects, combining semantic similarity thresholds, tag filters, and time ranges into unified query descriptors.
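The three stages can be pictured as follows; the scoring blend, weights, and function names are illustrative assumptions rather than the actual QMD implementation:

```typescript
// Illustrative three-stage QMD retrieval: embed -> ANN candidates -> BM25 re-rank.
declare function embedQuery(q: string): Promise<Float32Array>;
declare function annSearch(
  v: Float32Array, topK: number,
): Promise<{ id: string; text: string; similarity: number }[]>;
declare function bm25(query: string, doc: string): number;

async function qmdSearch(query: string, topK = 8) {
  const vector = await embedQuery(query);               // stage 1: query embedding
  const candidates = await annSearch(vector, topK * 4); // stage 2: wide ANN candidate set
  return candidates                                     // stage 3: keyword re-ranking
    .map((c) => ({ ...c, score: 0.7 * c.similarity + 0.3 * bm25(query, c.text) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```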
The memory system is the cornerstone of OpenClaw's personalization. For each round of dialogue, the Agent runtime retrieves relevant memories through memory-core-engine-runtime and injects them into the system prompt. The process is transparent to the user but directly shapes how personalized the Agent's responses are: it knows the user's preferred programming language, coding style, usual toolchain, and even project context established in past conversations.
OpenClaw's Model Provider system is the core of its multi-model support capabilities. There are more than 25 provider plugins exported through plugin-sdk, covering mainstream commercial APIs, open source inference engines, and cloud platform gateways. Each provider is an independent plugin and follows a unified registration, authentication and model directory protocol.
Each provider plugin consists of four core files:
| File | Responsibilities |
| --- | --- |
| provider-entry.ts | Plugin entry point: registers the provider with the plugin registry and declares supported Feature Flags |
| provider-auth.ts | Authentication logic implementing the API Key or OAuth flow |
| provider-catalog-shared.ts | Model catalog listing all models the provider supports and their capability tags (text/image/code, etc.) |
| provider-model-shared.ts | Shared model configuration defining metadata such as token limits, pricing, and context window size |
Provider plugins are registered via the export path of plugin-sdk. Taking the OpenAI provider as an example, the export path is plugin-sdk/provider-openai, Anthropic is plugin-sdk/provider-anthropic, and so on.
As of v2026.4.1, plugin-sdk exports the following providers:
| Provider | Example models | Authentication |
| --- | --- | --- |
| OpenAI | GPT-4o, o3, o4-mini, Codex | API Key / OAuth |
| Anthropic (Claude) | Claude Sonnet 4, Opus 4 | API Key / OAuth |
| Google (Gemini) | Gemini 2.5 Pro, Flash | API Key |
| DeepSeek | DeepSeek-V3, DeepSeek-R1 | API Key |
| xAI (Grok) | Grok-3, Grok-3-mini | API Key |
| Ollama | Any locally deployed GGUF model | None (local) |
| Mistral | Mistral Large, Codestral | API Key |
| MiniMax | MiniMax-Text-01, image-01 | API Key |
| Moonshot (Dark Side of the Moon) | Kimi | API Key |
| ModelStudio (Tongyi Qianwen) | Qwen-Max, Qwen-Plus | API Key |
| Qianfan (Baidu Wenxin) | ERNIE-4.0, ERNIE-Speed | API Key |
| NVIDIA | Nemotron, Llama 3 NVIDIA | API Key |
| HuggingFace | Inference API hosted models | API Token |
| Together | Llama, Mixtral, and other open models | API Key |
| Venice | Privacy-first inference | API Key |
| vLLM | Self-hosted vLLM instance | Custom |
| SGLang | Self-hosted SGLang instance | Custom |
| BytePlus (Volcano Engine) | Doubao models | API Key |
| Cloudflare AI Gateway | Workers AI Agent | API Token |
| Amazon Bedrock | Claude on Bedrock, Titan | AWS IAM |
| Anthropic Vertex | Claude on Vertex AI | GCP Service Account |
| Chutes | GPU inference marketplace | API Key |
| KiloCode | KiloCode models | API Key |
| Kimi Coding | Kimi code models | API Key |
| OpenCode / OpenCode Go | Open source code reasoning | API Key |
The authentication subsystem consists of four modules: provider-auth-api-key (API Key authentication), provider-auth-login (OAuth login authentication), provider-auth-result (authentication result encapsulation) and provider-auth-runtime (runtime authentication status management).
Most providers support only API Key authentication, but mainstream providers such as OpenAI and Anthropic support both OAuth login and API Keys. In OAuth mode, after the user completes authorization in the browser, OpenClaw obtains the access token and manages the refresh flow automatically. This dual-mode design (Auth Rotation) lets users switch seamlessly to their own API Key once a free quota is exhausted, and back again.
Synthetic Auth is implemented through the resolveSyntheticAuth function. When multiple providers share the same underlying credentials (for example, Anthropic Vertex uses GCP credentials instead of the Anthropic native API Key), synthetic authentication converts the underlying credentials into the format expected by the provider. The implementation is located in the authentication runtime module:
```typescript
// Synthetic authentication resolution in provider-auth-runtime
export async function resolveSyntheticAuth(
  provider: ProviderId,
  secretStore: SecretStore
): Promise<ResolvedAuth> {
  const secretRef = getProviderSecretRef(provider);
  const rawCredential = await secretStore.resolve(secretRef);

  // Convert credential format based on provider type
  switch (provider) {
    case 'anthropic-vertex':
      return synthesizeVertexAuth(rawCredential as GCPServiceAccount);
    case 'amazon-bedrock':
      return synthesizeBedrockAuth(rawCredential as AWSCredentials);
    default:
      return { type: 'api-key', key: rawCredential as string };
  }
}
```
SecretRef is OpenClaw's credential reference mechanism. Credentials are not stored in clear text in the configuration file; instead, a SecretRef points at the operating system's keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service). SecretRef values use the secretref: prefix and are resolved to the actual credential value at runtime by the memory-core-host-secret module.
Model Failover is OpenClaw's core mechanism to deal with API rate limits and service interruptions (see docs.openclaw.ai/concepts/model-failover for details). When the primary model returns a 429 (Rate Limited) or 5xx error, the system automatically routes the request to a preconfigured alternative model. Failover configuration is defined in the user's settings file:
```json
{
  "models": {
    "primary": "anthropic:claude-sonnet-4-20260514",
    "fallback": [
      "openai:gpt-4o",
      "google:gemini-2.5-pro"
    ],
    "failover": {
      "maxRetries": 2,
      "retryDelayMs": 1000,
      "fallbackOnRateLimit": true,
      "fallbackOnServerError": true
    }
  }
}
```
The failover logic is implemented at the routing layer (src/routing/) and is transparent to the upper-layer Agent runtime. The routing layer maintains health status and rate limiting windows for each provider, trying alternative models in fallback list order when the primary model is unavailable.
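A sketch of what such a routing loop does with the configuration above; this is a hypothetical shape for illustration, not the actual src/routing/ code, which also tracks per-provider health windows:

```typescript
// Illustrative failover loop over [primary, ...fallback].
interface FailoverConfig {
  maxRetries: number;
  retryDelayMs: number;
  fallbackOnRateLimit: boolean;
  fallbackOnServerError: boolean;
}

async function callWithFailover(
  models: string[], // e.g. ["anthropic:claude-sonnet-4-20260514", "openai:gpt-4o"]
  cfg: FailoverConfig,
  call: (model: string) => Promise<Response>,
): Promise<Response> {
  let lastError: unknown;
  for (const model of models) {
    for (let attempt = 0; attempt <= cfg.maxRetries; attempt++) {
      try {
        const res = await call(model);
        if (res.status === 429 && cfg.fallbackOnRateLimit) break;  // next model
        if (res.status >= 500 && cfg.fallbackOnServerError) break; // next model
        if (res.ok) return res;
      } catch (err) {
        lastError = err; // network error: retry the same model after a delay
      }
      await new Promise((r) => setTimeout(r, cfg.retryDelayMs));
    }
  }
  throw new Error(`all models failed: ${String(lastError)}`);
}
```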
The v2026.3.28 version introduces three important changes to the provider system:
xAI migration to Responses API: The xAI provider migrated from the traditional Chat Completions API to the Responses API format, while enabling x_search native web search functionality. Grok models can call xAI's search infrastructure directly in the conversation, without the need for an additional layer of tool calls.
MiniMax image generation: The MiniMax provider adds support for the image-01 model, exposing text-to-image generation through MiniMax's image generation API. The feature is registered as a Provider-owned Tool, following the OpenClaw design principle that provider-specific tools and settings belong to the provider plugin, not the core system.
Tongyi Qianwen authentication change: Qwen's portal auth mode has been removed in favor of Model Studio API Key authentication. This is a breaking change; existing portal auth users must manually migrate to API Key mode.
OpenClaw supports GitHub Copilot account login through two export modules, plugin-sdk/github-copilot-login and plugin-sdk/github-copilot-token. Users with Copilot subscriptions can directly use GitHub account authentication to access underlying models (GPT-4o, Claude, etc.) through Copilot's infrastructure without the need to configure each provider's API Key separately. The authentication process reuses GitHub's Device Flow OAuth, and after obtaining the Copilot token, the github-copilot-token module manages the token refresh.
Agent Client Protocol (ACP) is a stateful Agent session protocol defined by OpenClaw. Its core idea is to decouple AI Agent interaction from any specific chat interface, allowing stateful Agent work sessions to be started and managed through any communication channel (Discord, iMessage, terminal, etc.). The project relies on @agentclientprotocol/sdk@0.17.1 for the protocol's core types and client implementation.
ACPX (repository openclaw/acpx, 1,834 stars) is a headless ACP CLI client for OpenClaw. It allows users to create, manage, and interact with ACP sessions from the command line, without the need for a graphical interface. Typical usage scenarios for ACPX include Agent automation in CI/CD pipelines, server-side deployment, and script orchestration.
ACP sessions can be bound to any chat channel. With the /acp spawn codex --bind here command, the user can create an ACP session in the current channel context. Currently supported bindings include:
- Discord: Through Discord Bot channel binding, ACP sessions are mapped to Discord threads
- BlueBubbles: iMessage bridge on macOS, ACP sessions access iMessage via BlueBubbles API
- iMessage: Direct iMessage binding (macOS/iOS only)
ACP's core layering distinguishes three concepts: the Chat Surface is the UI layer the user interacts with, which can be a Discord channel, terminal window, or web interface; an ACP Session is a stateful Agent interaction context that maintains conversation history, workspace state, and tool authorizations; the Runtime Workspace is the file system sandbox where the Agent actually operates. One chat surface can be associated with multiple ACP sessions, and each ACP session binds to exactly one runtime workspace.
OpenClaw integrates Model Context Protocol (MCP) and relies on @modelcontextprotocol/sdk@1.29.0. MCP defines a standard communication protocol between AI models and external tools, and OpenClaw exposes external MCP tool servers to the Agent runtime through the MCP bridge layer.
v2026.3.31 introduces a critical security change for ACPX plugin-tools MCP bridging: MCP tools are off by default (explicit default-off) and must be explicitly enabled in the configuration. This change stems from the security considerations of Trust Boundary Hardening - external MCP tool servers may execute arbitrary code, and enabling it by default will expand the attack surface. Enable configuration example:
```json
{
  "mcp": {
    "servers": {
      "filesystem": {
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"],
        "enabled": true
      }
    },
    "trustPolicy": "prompt-per-tool"
  }
}
```
The trustPolicy supports three levels: prompt-per-tool (user confirmation is required for each tool call), prompt-once (automatic trust after first confirmation), and trust-all (full trust, only recommended for use in controlled environments).
For OpenAI and Codex series models, OpenClaw enables the apply_patch tool by default. This code editing tool is natively supported by OpenAI's Codex models: the model returns structured patch instructions through the API, which OpenClaw's runtime applies to files. Compared with having the model emit complete file contents and diffing afterwards, apply_patch reduces output token consumption and lowers the error rate when editing large files. apply_patch's sandbox permissions are aligned with write permissions: in a non-main session's Docker sandbox, its write scope is subject to the same constraints as normal file writes.
v2026.3.31 migrates the default inference behavior of the three main CLI backends, Claude CLI, Codex CLI, and Gemini CLI, to their respective bundled plugins. Through the Plugin SDK's cli-backend and cli-runtime export paths, a CLI backend can register custom inference flows, tool exposure, and session management policies. The significance of this migration is decoupling: the core no longer hard-codes CLI backend behavior, and third-party plugins can register custom CLI backends through the same interface.
The deep value of ACP is reflected in the Agent-to-Agent (A2A) communication capability. OpenClaw's Session toolset—sessions_list, sessions_history, sessions_send—allows one Agent session to discover, query, and send messages to another Agent session. sessions_send supports optional reply-back mode (ping-pong communication) and announce steps, allowing structured coordination conversations between Agents.
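Conceptually, an A2A exchange looks like the following. The parameter names (sessionId, message, replyBack) are assumptions for illustration; the actual tool schemas live in the Sessions tool documentation:

```typescript
// Hypothetical A2A coordination via the Sessions toolset.
declare function sessions_list(): Promise<{ id: string; label: string }[]>;
declare function sessions_send(args: {
  sessionId: string;
  message: string;
  replyBack?: boolean; // ping-pong mode: wait for the peer session's answer
}): Promise<{ reply?: string }>;

async function delegateTask(task: string) {
  const sessions = await sessions_list(); // discover peer Agent sessions
  const worker = sessions.find((s) => s.label === 'backend-worker');
  if (!worker) throw new Error('no backend-worker session running');
  const { reply } = await sessions_send({
    sessionId: worker.id,
    message: `Please run: ${task}`,
    replyBack: true, // structured ping-pong coordination
  });
  return reply;
}
```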
In multi-Agent deployment scenarios (for example, one Agent is responsible for customer conversations and another Agent is responsible for back-end task execution), A2A communication avoids the complexity of requiring external message queues in traditional architectures. All communication is routed through the Gateway's WebSocket control plane, and agents share the same runtime infrastructure but have isolated session contexts and workspaces.
The new ACP channel binding in v2026.3.31 further extends this capability: /acp spawn codex --bind here can directly bind the current chat surface as a Codex-driven workspace without creating a child thread. This way, users can launch a coding agent directly in a Discord channel, and the agent's output appears directly in the conversation flow.
OpenClaw's media processing pipeline lives in the src/media/ directory and is responsible for preprocessing, understanding, and lifecycle management of all non-text content. Three core modules are exported through plugin-sdk: media-runtime (runtime pipeline scheduling), media-understanding (the media content understanding interface), and media-understanding-runtime (runtime binding of the understanding modules). A separate web-media export carries the media logic specific to web channels.
Image processing depends on sharp@0.34.5, a high-performance image processing library for Node.js built on libvips. OpenClaw uses sharp for the following (see the sketch after this list):
- Resize: Scale the image uploaded by the user to the maximum resolution supported by the model to avoid wasting tokens or exceeding API limits
- Format conversion: uniformly convert BMP, TIFF, WebP and other formats to JPEG or PNG to ensure that all providers can receive it
- Metadata stripping: remove privacy data such as geographical location and device information from EXIF information
- Thumbnail generation: Generate low-resolution previews for UI display
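A representative sharp pipeline covering the first three steps; the dimensions and quality values here are arbitrary examples, not OpenClaw's actual settings:

```typescript
import sharp from 'sharp';

// Resize, convert, and strip metadata in one pipeline. sharp drops EXIF
// (including GPS location) by default unless .withMetadata() is requested.
async function normalizeImage(input: Buffer): Promise<Buffer> {
  return sharp(input)
    .rotate() // honor EXIF orientation before the metadata is stripped
    .resize({ width: 1568, height: 1568, fit: 'inside', withoutEnlargement: true })
    .jpeg({ quality: 85 }) // unify BMP/TIFF/WebP etc. into JPEG
    .toBuffer();
}
```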
File type detection uses file-type@22.0.0, which determines the file type based on the Magic Number instead of the file extension to prevent malicious file camouflage.
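Magic-number detection with file-type is a single call (file-type 22 is ESM-only); the allowlist check around it is an illustrative sketch:

```typescript
import { fileTypeFromBuffer } from 'file-type';

// Reject files whose magic number disagrees with their claimed type.
async function assertMime(buf: Uint8Array, allowed: string[]): Promise<string> {
  const detected = await fileTypeFromBuffer(buf); // e.g. { ext: 'png', mime: 'image/png' }
  if (!detected || !allowed.includes(detected.mime)) {
    throw new Error(`unsupported or disguised file type: ${detected?.mime ?? 'unknown'}`);
  }
  return detected.mime;
}
```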
PDF processing depends on pdfjs-dist@5.6.205 (the npm distribution of Mozilla PDF.js). The processing flow includes text extraction, page rendering to images (for visual understanding of multimodal models), and structured content parsing. For large PDFs, OpenClaw implements a pagination handling strategy—extracting only page ranges relevant to the context of the current session, rather than loading the entire document at once.
Audio and video processing pipelines process multimedia files uploaded by users or audio streams captured via voice input. Audio processing includes format conversion (unification to WAV/MP3), sample rate normalization and silence detection. The Transcription hook converts audio input into text and integrates it into the Agent's conversation flow - voice messages are automatically transcribed and processed as text messages, and the Agent can selectively reply in voice or text.
Video processing adopts a key frame extraction strategy: extract key frames from the video at fixed intervals or scene change detection, and send them as image sequences to the multi-modal model for understanding, avoiding the high computational cost of processing the complete video stream.
Each channel (Channel) can independently configure the maximum size of media files. For example, the configuration of the Discord channel:
```json
{
  "channels": {
    "discord": {
      "mediaMaxMb": 25
    },
    "web": {
      "mediaMaxMb": 100
    },
    "cli": {
      "mediaMaxMb": 500
    }
  }
}
```
Files that exceed the limit are rejected in the pre-processing stage and do not enter the subsequent stages of the pipeline. Temporary files (intermediate products during processing) follow strict life cycle management: each media processing task creates an independent temporary directory, which is cleaned up after the processing is completed regardless of success or failure. The media-runtime module maintains a temporary file registry and performs cleanup-on-exit when the process exits to prevent disk leaks.
The two SDK export paths media-understanding and media-understanding-runtime define the interface and runtime implementation of media content understanding. Media understanding is more than just format conversion—it transforms images, documents, and audio into structured input that models can consume. For images, the understanding pipeline extracts text (OCR) from images and identifies objects and scenes; for PDFs, it generates page summaries and structured paragraph indexes; for audio, it outputs timestamped transcripts.
The output format of multimodal understanding follows the requirements of each model provider. OpenAI's GPT-4o and Anthropic's Claude Sonnet 4 accept base64-encoded images embedded in the message body; Google Gemini supports larger media files that can be referenced after uploading through the File API. The responsibility of media-understanding-runtime is to select the optimal encoding and transmission strategy based on the currently active model providers.
OpenClaw integrates @mozilla/readability@0.6.0 (Mozilla's readability extraction library) and linkedom@0.18.12 (a lightweight DOM implementation) for extracting body text from web content. When the Agent uses a browser tool to access a web page, after the original HTML is parsed by linkedom, the Readability algorithm extracts the core text content and strips away noisy elements such as navigation bars, advertisements, and sidebars. The extracted plain text enters the Agent's context window, significantly reducing token consumption compared to injecting original HTML.
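The extraction step combines the two libraries roughly like this:

```typescript
import { parseHTML } from 'linkedom';
import { Readability } from '@mozilla/readability';

// Parse raw HTML with linkedom's lightweight DOM, then let Readability
// extract the article body, dropping navigation, ads, and sidebars.
function extractArticle(html: string): string | null {
  const { document } = parseHTML(html);
  const article = new Readability(document as unknown as Document).parse();
  return article ? `${article.title}\n\n${article.textContent}` : null;
}
```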
Markdown rendering is handled by markdown-it@14.1.1. Before being sent to each channel, the Markdown formatted reply output by the Agent is formatted according to the capabilities of the target channel: Discord natively supports Markdown, Telegram supports some Markdown subsets, WhatsApp uses WhatsApp-style text formatting, and SMS/iMessage is reduced to plain text.
OpenClaw's speech system covers the complete link from voice wake-up to speech synthesis. plugin-sdk exports three speech modules: speech (public interface), speech-core (core implementation) and speech-runtime (runtime binding). The voice function is divided into four forms according to the platform and interaction mode.
Voice Wake (see docs.openclaw.ai/nodes/voicewake for details) is the wake word feature for macOS and iOS platforms. The device continuously monitors ambient audio and activates the Agent session after detecting the preset wake word. Wake word detection runs locally on the device and does not send an audio stream to the cloud - consistent with OpenClaw's local-first principle.
Message forwarding after waking up is implemented through VoiceWakeForwarder. After the user's voice is converted into text through local speech recognition, VoiceWakeForwarder calls OpenClaw's CLI interface to pass the text to the Agent:
```bash
openclaw-mac agent --message "${text}" --thinking low
```
The implementation of VoiceWakeForwarder requires special handling of shell escaping: the user's voice transcript may contain quotes, dollar signs, backticks, and other characters special to the shell. Splicing it directly into a command line risks injection or parse errors, so the forwarder strictly shell-escapes the text before passing it on. The --thinking low parameter tells the Agent to use a low-latency thinking mode, prioritizing response speed over reasoning depth to match the real-time requirements of voice interaction.
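A minimal POSIX-safe escape looks like this, assuming the forwarder assembles a single command string rather than spawning with an argv array:

```typescript
// Wrap the transcript in single quotes and escape embedded single quotes,
// so $, backticks, and double quotes lose their special meaning.
function shellEscape(text: string): string {
  return `'${text.replace(/'/g, `'\\''`)}'`;
}

// e.g. exec(`openclaw-mac agent --message ${shellEscape(transcript)} --thinking low`)
```

Spawning with an argv array (execFile-style) sidesteps escaping entirely; string escaping only matters when the command line is assembled as one string.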
Talk Mode (see docs.openclaw.ai/nodes/talk for details) is the Android platform's continuous voice conversation mode. Different from Voice Wake's "wake up → single interaction" mode, Talk Mode maintains a continuously open voice channel - the user and Agent can have multiple rounds of voice conversations without re-awakening each round. Talk Mode uses VAD (Voice Activity Detection) to automatically determine the start and end of the user's speech to achieve a natural conversation rhythm.
The macOS platform also provides Push-to-Talk mode, which operates as a system-level overlay. The user activates the microphone input by long pressing the shortcut key, and then ends the recording and sends it after releasing it. This mode is suitable for asking quick questions in a desktop workflow without switching to an OpenClaw window. The overlay uses AppKit's NSPanel implementation and is set up to float above all windows.
Speech synthesis (TTS, Text-to-Speech) adopts a two-layer strategy. The preferred solution is ElevenLabs's API, which provides high-quality, low-latency, multi-language speech synthesis. When ElevenLabs is unavailable (the network is offline or the API Key is not configured), the system automatically falls back to the platform's native TTS: macOS uses AVSpeechSynthesizer, iOS uses AVSpeechSynthesizer (same framework), and Android uses android.speech.tts.TextToSpeech.
In addition, OpenClaw integrates node-edge-tts@1.2.10 as a third TTS backend. Edge TTS calls the online TTS service of the Microsoft Edge browser; it is free and supports many languages and voices, making it a practical middle option when there is no ElevenLabs subscription but a network connection is available.
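The three-layer preference order reduces to a simple chain; the backend shape below is illustrative, and the real selection presumably also weighs network state and configured keys:

```typescript
// Illustrative TTS fallback: ElevenLabs -> Edge TTS -> platform-native voice.
type TtsBackend = {
  name: string;
  available: () => Promise<boolean>;
  speak: (text: string) => Promise<void>;
};

async function speakWithFallback(text: string, backends: TtsBackend[]): Promise<string> {
  for (const backend of backends) {
    if (await backend.available().catch(() => false)) {
      await backend.speak(text);
      return backend.name; // report which layer actually produced audio
    }
  }
  throw new Error('no TTS backend available');
}
```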
The Voice Call plugin is packaged in the extensions/ directory and distributed with OpenClaw as a built-in extension. It implements a complete voice call function - users can have real-time voice conversations with the Agent like making a phone call, and the two-way audio stream is transmitted through WebRTC or the platform's native audio framework.
The quality assurance of voice calls relies on Closed-Loop Testing. The test script is executed through test:voicecall:closedloop npm script, and the process is as follows: automatically generate test text → TTS synthesizes into audio → audio is fed to the voice call pipeline as input → Agent processes and generates a reply → TTS synthesizes reply audio → transcribes the reply audio into text → compares the semantic consistency of the original text and the reply content. This end-to-end closed loop eliminates the uncertainty of manual testing and ensures that every link in the voice pipeline (ASR → Inference → TTS) works properly.
```bash
# Execute the voice-call closed-loop test
pnpm test:voicecall:closedloop

# Test flow:
# 1. Generate test prompt
# 2. TTS synthesizes input audio
# 3. Inject audio into the voice call pipeline
# 4. Wait for Agent response
# 5. Capture TTS output audio
# 6. ASR transcribes the output
# 7. Assert: output text matches the expected semantics
```
The entire voice system embodies OpenClaw's pursuit of multi-terminal consistency: the same Agent can receive voice input in four ways: wake word, continuous voice, button to talk or voice call, and output voice replies in three ways: ElevenLabs, Edge TTS or system native TTS. All combinations behave consistently on various platforms. Speech capabilities are not an add-on feature, but a first-class interaction mode on par with text channels.
OpenClaw's multi-platform strategy is not a simple WebView wrapper. The three native clients carry differentiated responsibilities: the macOS application is the developer's local console and debugging center, the iOS application is a lightweight mobile node (Node), and the Android application exposes the broadest set of device command families. All three communicate with the Gateway over the unified WebSocket protocol for cross-platform node registration, command dispatch, and canvas synchronization.
The source code of the macOS application lives in apps/macos/. It adopts a hybrid SwiftUI + AppKit architecture, with a resident menu bar icon as its entry point. In OpenClaw's internal vocabulary, the macOS application's codename is makeup (short for "mac app").
The core functions of the application cover the following aspects:
Gateway health monitoring: The menu bar icon reflects the status of the Gateway process in real time, including the number of connections, memory usage and heartbeat delay. The panel that pops up by clicking the icon provides a one-click restart entry. The restart of the Gateway must be performed through the OpenClaw Mac application itself or the scripts/restart-mac.sh script, rather than manually in tmux - the latter will bypass the process monitoring chain and lead to inconsistent status.
Voice wake and Push-to-Talk floating layer: Voice Wake continuously monitors the wake word, and the PTT (Push-to-Talk) overlay resides on the desktop in the form of a translucent floating window. The two together form the macOS native entrance to voice interaction.
WebChat embedding and debugging tool: The embedded WebChat view supports real-time conversations with Gateway, while exposing the debugging panel for viewing message flow, tool call logs and token consumption.
SSH tunnel remote control: macOS applications can connect to remotely deployed Gateway instances through SSH tunnels and control cloud services in the local menu bar.
The state management of macOS applications has been fully migrated to the Observation framework introduced in Swift 5.9. The @Observable macro is used to mark observable types and @Bindable is used to implement property-level two-way binding. The legacy ObservableObject / @StateObject / @Published patterns have been explicitly deprecated — any remaining legacy usage should be migrated to the new framework. The advantage of the Observation framework is more fine-grained dependency tracking: SwiftUI only re-renders the view when the property that is actually read changes, rather than the overall notification mode of ObservableObject.
```swift
// Correct: Observation framework
@Observable
final class GatewayMonitor {
    var isConnected = false
    var latencyMs: Int = 0
    var sessionCount: Int = 0
}

// Wrong: deprecated legacy pattern, do not use
// class GatewayMonitor: ObservableObject {
//     @Published var isConnected = false
// }
```
macOS apps require signed builds for system permissions to persist across recompilations. Unsigned development builds will trigger a TCC (Transparency, Consent, and Control) permission reset pop-up window after each rebuild. The packaging script is located in scripts/package-mac-app.sh and is responsible for code signing, notarization (Notarization) and DMG encapsulation.
The system capabilities exposed by macOS Node Mode are controlled through TCC permission mapping:
| Node command | Function | TCC permission |
| --- | --- | --- |
| system.run | Execute a local command, returning stdout/stderr/exit code | needsScreenRecording flag |
| system.notify | Send a user notification | notifications |
| canvas.* | Canvas operation routing | screen-recording |
| camera.* | Camera capture | camera |
The logs of macOS applications are uniformly queried through the scripts/clawlog.sh script. The underlying system uses the Unified Logging system of macOS and supports filtering by subsystem. Common operations:
```bash
# Follow all OpenClaw subsystem logs in real time
./scripts/clawlog.sh --follow

# Filter by category
./scripts/clawlog.sh --category networking --tail 100

# View a specific subsystem
./scripts/clawlog.sh --subsystem ai.openclaw.gateway
```
The iOS application source code lives in apps/ios/ and is a standard Xcode + SwiftUI project. Unlike the macOS application, the iOS app is positioned as a remote node of the Gateway: it pairs automatically with Gateway instances on the LAN via Bonjour device discovery and maintains a persistent Gateway WebSocket connection.
Core capabilities provided by iOS nodes:
Canvas Surface: Renders Agent-driven canvas content on iOS devices, supporting touch interaction.
Voice Wake forwarding: The voice wake detection results on the iOS side are forwarded to the Gateway through WebSocket to achieve touch-free voice activation on the mobile side.
Talk Mode: The voice interaction mode of long pressing and speaking, the audio stream is directly transmitted to the Gateway for recognition and processing.
Camera Snap/Clip: Supports taking snapshots and collecting short video clips for use by the Agent's visual capabilities.
Screen Recording: Perform screen recording through ReplayKit and send the recording content to the Agent as context.
The version number of iOS applications is maintained in two locations: apps/ios/Sources/Info.plist and apps/ios/Tests/Info.plist. The key fields are CFBundleShortVersionString (display version number) and CFBundleVersion (build number). Both files must be updated simultaneously when publishing.
Android apps are located in apps/android/ and are built using Kotlin + Gradle. Compared with iOS applications, Android nodes expose a richer set of device command families (Device Command Families), taking full advantage of the openness of the Android platform.
The app UI is organized into three main tabs:
| Tab | Function |
| --- | --- |
| Connect | Device pairing entry; supports Setup Code and manual input |
| Chat Sessions | Conversation list and chat interface |
| Voice | Voice interaction control panel |
The device command group supported by the Android node is the richest among the three terminals:
| Command family | Capability |
| --- | --- |
| notifications | Read/send system notifications |
| location | GPS positioning and geofencing |
| SMS | Read and send text messages |
| photos | Album access and photo upload |
| contacts | Read and write the address book |
| calendar | Calendar event management |
| motion | Accelerometer, gyroscope, and other sensor data |
| app update | Application self-update management |
In addition, the Android side also supports Canvas rendering, camera capture and screen recording capabilities.
```bash
# Unit tests (Play Debug variant)
./gradlew :app:testPlayDebugUnitTest

# Third-party integration tests
./gradlew :app:testThirdPartyDebugUnitTest

# Kotlin code style check
./gradlew :app:ktlintCheck :benchmark:ktlintCheck

# Release AAB build
bun apps/android/scripts/build-release-aab.ts
```
The version information is defined in versionName (display version) and versionCode (numeric incremental version) in apps/android/app/build.gradle.kts.
The three native applications communicate with the Gateway through the unified Gateway WebSocket protocol. Core commands related to nodes include:
| Command | Direction | Function |
| --- | --- | --- |
| node.list | Gateway → Client | Enumerate all connected nodes and their capability declarations |
| node.describe | Gateway → Node | Query a node's detailed capability description and parameter schema |
| node.invoke | Gateway → Node | Execute a command on the specified node and return the result |
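A node.invoke frame over the Gateway WebSocket might look like the following; the field names here are assumptions for illustration, not the published protocol schema:

```json
{
  "type": "node.invoke",
  "nodeId": "ios-living-room",
  "command": "camera.snap",
  "params": { "lens": "rear" },
  "requestId": "b1f4c2d9"
}
```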
The system permissions of the macOS platform are controlled through the TCC framework, covering screen-recording, notifications, camera, and location. Each permission is bound to a specific node command capability, and the application will request authorization the first time it is called.
Session-level privilege escalation is controlled via the /elevated on|off command. When enabled, the current session gains full bash access; when disabled, it falls back to the restricted execution surface. The setting is per-session and does not affect other concurrent sessions.
v2026.3.31 introduces two breaking changes related to node security. First, node commands are no longer automatically enabled once Device Pairing completes: node commands are exposed to the Agent only after explicit Node Pairing Approval. Device pairing merely establishes the WebSocket connection channel, while node pairing approval confirms the user's intent to expose the device's capabilities.
Second, Node-originated Runs are restricted to a Reduced Trusted Surface. Even if the node itself has full capabilities, execution flows triggered from the node side can use only a predefined safe subset of tools.
Live Canvas is an Agent-driven visualization workspace hosted by OpenClaw Gateway. Unlike traditional static output, Live Canvas is a persistent interactive screen - Agent can push content, reset state, execute scripts, and capture snapshots on it. The cross-end rendering of Canvas is implemented by native applications (macOS, iOS SwiftUI, Android), and the canvas control logic is unified and abstracted through the A2UI (Agent to UI) protocol.
A2UI defines the protocol by which the Agent sends control instructions to the UI layer. The A2UI implementation for the Canvas host lives in the src/canvas-host/a2ui/ directory and is packaged as a standalone bundle (src/canvas-host/a2ui/a2ui.bundle.js) that Gateway loads at runtime and injects into the Canvas host container.
The bundle build product is version-tracked through the hash file src/canvas-host/a2ui/.bundle.hash (automatically generated; do not edit manually). The build command has two equivalent forms:
```bash
# Via pnpm script
pnpm canvas:a2ui:bundle

# Via shell script
scripts/bundle-a2ui.sh
```
The building of the A2UI bundle is the first step in the overall pnpm build pipeline. The complete build pipeline is:
```bash
pnpm build
# Equivalent to:
# 1. pnpm canvas:a2ui:bundle
# 2. tsdown-build.mjs
# 3. runtime-postbuild.mjs
```
The vendor source code of A2UI is maintained in the vendor/a2ui directory, and the shared encapsulation layer of the native side is located in apps/shared/OpenClawKit/Tools/CanvasA2UI.
The A2UI bundle build can fail in cross-compilation environments. The typical scenario is building an amd64 target via QEMU on Apple Silicon, where the build step may crash due to QEMU's incomplete emulation of certain instruction sets. The Dockerfile guards against this: if the bundle build fails, a stub file is created instead (containing the comment /* A2UI bundle unavailable in this build */) and the vendor/a2ui and apps/shared/OpenClawKit/Tools/CanvasA2UI directories are cleaned up, so the overall Docker image build is not interrupted. Images produced by QEMU cross-compilation may therefore lack full Canvas functionality; CI builds run on native architectures and are unaffected.
Canvas' operation model consists of four core primitives:
| Operation | Semantics | Typical use |
| --- | --- | --- |
| canvas.push | Append content (HTML/JS/CSS fragments) to the Canvas | Incrementally build a UI |
| canvas.reset | Clear the Canvas and reinitialize | Switch context or reset state |
| canvas.eval | Execute arbitrary JavaScript in the Canvas context | Dynamic interaction logic, data visualization |
| canvas.snapshot | Capture a visual snapshot of the current Canvas | Record state, generate screenshot feedback |
The security positioning of canvas.eval deserves special explanation. It is classified as an Operator Control Surface: its security is the responsibility of the deployment operator, not the OpenClaw platform itself. The Agent can execute arbitrary JavaScript through canvas.eval, functionally equivalent to a browser-side eval(), which gives great flexibility. This positioning is consistent with OpenClaw's single-user, local-first architecture: on the user's own device, the Agent already holds the user's permissions. In multi-tenant or public deployments, however, the Operator must evaluate the risks of Canvas eval and impose appropriate restrictions.
In multi-end architecture, Canvas operations are exposed through node mode. All canvas.* calls are routed as node.invoke instructions and sent to the corresponding end-side node for execution. This means that the Agent can specify rendering of canvas content on a specific device (such as the user's iPad or Android phone), enabling cross-device visual workflow orchestration.
The three platforms have different implementations of Canvas rendering: macOS uses WebKit views, iOS uses SwiftUI native view layer combined with WebKit rendering, and Android uses Android WebView. But the upper-layer A2UI protocol ensures that the Agent does not need to care about the underlying rendering differences.
Lobster is a workflow orchestration shell in the OpenClaw ecosystem. The separate repository is openclaw/lobster (992 stars). Its positioning slogan is "OpenClaw-native workflow shell" — a workflow execution environment designed natively for OpenClaw.
The core abstraction of Lobster is Typed JSON Pipelines. Unlike the Unix shell's text pipes, the data flowing through a Lobster pipeline is JSON with type constraints. Each pipeline step declares its input and output schema, so Lobster can type-check during pipeline assembly instead of surfacing type mismatches at runtime.
The pipeline architecture is composable: developers can chain OpenClaw's Skills and Tools into multi-step workflows. Each step can be a Skill call, a Tool execution, a piece of custom logic, or a nested sub-pipeline.
The key flow control mechanism is Approval Gates. Approval points can be inserted between any steps in the pipeline; execution pauses there and waits for confirmation from a designated approver (human or another agent) before moving forward. This is critical for automated processes involving sensitive operations: in a deployment pipeline, for example, code compilation can be automated while the push to production requires human approval.
Lobster has strict requirements for the visual consistency of terminal output. Color definitions are concentrated in the src/terminal/palette.ts module, which exports a shared terminal color palette. All terminal output, including onboarding flows, config prompts, and TTY UI output, must reference the palette's color constants; hardcoding color values is strictly prohibited.
```typescript
// src/terminal/palette.ts
import chalk from 'chalk';

export const palette = {
  primary: chalk.hex('#5B8DEF'),
  success: chalk.hex('#6BCB77'),
  warning: chalk.hex('#FFD93D'),
  error: chalk.hex('#FF6B6B'),
  muted: chalk.gray,
  highlight: chalk.bold.white,
  // ...more color definitions
} as const;
```
This design ensures Lobster's visual consistency across different terminal emulators and color schemes, while simplifying theme customization.
Caclawphony (repo openclaw/caclawphony, 34 stars) is an orchestration system built on top of Lobster. Its core capability is decomposing project-level tasks into Isolated Autonomous Execution Runs: each execution unit has an independent context, toolset, and sandbox environment, and multiple units can run in parallel.
Caclawphony is suitable for scenarios where work needs to be divided and conquered, such as large-scale project refactoring and batch code migration. The project manager (human or agent) defines the task decomposition strategy at the top level, and Caclawphony is responsible for converting it into a collection of Lobster pipelines that can be executed in parallel.
Caclawphony complements the Session tool in the OpenClaw main repository: sessions_send provides point-to-point communication between Agents, while Caclawphony provides a task-level orchestration framework - it cares about "which work units require parallel/serial execution" rather than "how Agent A sends messages to Agent B". The combination of the two forms a complete Agent collaboration stack from a single conversation to complex project execution.
Lobster's name is not arbitrary. In OpenClaw's conceptual system, the lobster stands for two engineering ideas: the claws represent tools, with each pipeline step an independently runnable tool call; the molt represents version evolution, a pipeline replacing its internal implementation while keeping its external interface unchanged.
Another key difference between Lobster pipelines and Unix pipes is error-handling semantics. In a Unix pipe, a non-zero exit code upstream is ignored downstream unless set -o pipefail is set. Each Lobster pipeline step must explicitly declare its error handling strategy: fail-fast (any error immediately terminates the entire pipeline), retry (retry with exponential backoff), skip (log the error but continue), or fallback (switch to an alternative step). These explicit semantics make Lobster pipelines considerably more reliable than shell scripts.
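As a type sketch, the per-step declaration could look like this; the schema below is hypothetical, and Lobster's real pipeline format may differ:

```typescript
// Hypothetical shape of a Lobster step's error-handling declaration.
type ErrorPolicy =
  | { kind: 'fail-fast' }                                     // abort the whole pipeline
  | { kind: 'retry'; maxAttempts: number; backoffMs: number } // exponential backoff
  | { kind: 'skip' }                                          // log and continue
  | { kind: 'fallback'; step: string };                       // divert to an alternative step

interface PipelineStep {
  id: string;
  inputSchema: object;  // JSON Schema used for typed-pipeline checking
  outputSchema: object;
  onError: ErrorPolicy;
}
```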
The Lobster pipeline can be combined with OpenClaw's Cron scheduling system to achieve timing automation. Cron is responsible for triggering timing, and Lobster is responsible for executing logic. Typical applications include: executing code quality scanning pipelines every early morning, generating project status report pipelines every week, delaying execution of cleanup pipelines after specific events are triggered, etc. The Cron trigger passes the Lobster pipeline ID as the execution payload, and Gateway's scheduler is responsible for instantiating the pipeline and starting execution at the specified time.
OpenClaw's web management interface—Control UI—is hosted and distributed directly by the Gateway process, eliminating the need for a separate front-end server. The UI source code is located in the ui/ directory and is built using Lit 3 (Google's Web Components library), using Vite as the development server and build tool. The build command is pnpm ui:build, and the output is embedded in the static resource path of Gateway.
The choice of Lit over React/Vue/Svelte reflects OpenClaw's engineering preferences: Lit builds on the Web Components standard, needs no virtual DOM runtime, produces a very small bundle, and pairs naturally with Gateway's native HTTP service. Control UI covers session management, channel status monitoring, configuration editing, Skills management, and Agent interaction. The UI build supports signals-based reactivity, implementing fine-grained UI updates through @lit-labs/signals@0.2.0 and signal-utils@0.21.1.
The UI also has a separate test pipeline (pnpm test:ui) and a dedicated lint rule, lint:ui:no-raw-window-open, which bans raw window.open() in UI code in favor of the framework's safe wrappers.
WebChat (see docs.openclaw.ai/web/webchat for details) is a conversational interface embedded in Control UI that directly uses Gateway's WebSocket connection - no independent WebChat port or additional configuration is required. After installing the Gateway, the user can start a conversation with the Agent by visiting http://localhost:18789 in the browser.
WebChat is also an embedded Web view of macOS App, loaded directly through the WebKit view of macOS. This kind of architecture reuse ensures that the conversation experience on the web and macOS is consistent.
OpenClaw's browser control tool (see docs.openclaw.ai/tools/browser for details) is one of the most complex modules in the core tool system. It uses playwright-core@1.58.2 to control a dedicated Chromium instance through CDP (Chrome DevTools Protocol) - not the user's daily browser, but an independent instance managed by OpenClaw with an independent browser profile (Profile).
The core capabilities of browser control include:
- Page Snapshots: Capture the DOM status and visual rendering of the page for the Agent to analyze the page content
- Structured actions (Actions): click, fill in forms, scroll, navigate - Agent drives the browser through structured instructions instead of injecting free JavaScript
- File upload: Agent can instruct the browser to upload specified files in the file picker
- Multi-Profile Isolation: Different browser profiles can be used for different tasks to maintain the isolation of cookies and login status
Configuration for browser tools is declared via JSON:
```json
{
  "browser": {
    "enabled": true,
    "color": "#FF4500"
  }
}
```
The color parameter controls the title bar color of the browser window - this is a design detail that allows users to quickly distinguish it from their daily browser through color when the Agent-controlled browser window appears on the screen.
When building a Docker image, you can pass --build-arg OPENCLAW_INSTALL_BROWSER=1 to pre-install Chromium and Xvfb (X Virtual Frame Buffer), which increases the image size by about 300MB, but saves the 60-90 seconds of Playwright installation time each time the container is started. This is especially important for CI/CD scenarios.
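For example (the image tag is arbitrary):

```bash
docker build --build-arg OPENCLAW_INSTALL_BROWSER=1 -t openclaw:with-browser .
```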
OpenClaw's First-class Tools are a direct extension of the platform's core capabilities, different from third-party tools that are accessed through Skills or MCP. First-class tools are integrated directly into the Gateway and Agent runtimes, with full security policy and sandbox support:
| Tool | Capability | Documentation |
| --- | --- | --- |
| Browser | Dedicated Chromium control, CDP snapshots, structured actions, file upload | docs.openclaw.ai/tools/browser |
| Canvas | A2UI-driven visual workspace (push/reset/eval/snapshot) | docs.openclaw.ai/platforms/mac/canvas |
| Nodes | Device-side operations: camera snap/clip, screen record, location.get, notifications | docs.openclaw.ai/nodes |
| Cron | Scheduled task management and automatic triggering | docs.openclaw.ai/automation/cron-jobs |
| Sessions | sessions_list / sessions_history / sessions_send (inter-Agent communication) | docs.openclaw.ai/concepts/session-tool |
| Webhooks | Receive external HTTP callbacks and trigger Agent processing | docs.openclaw.ai/automation/webhook |
| Gmail Pub/Sub | Event-driven handling of Gmail message arrival | docs.openclaw.ai/automation/gmail-pubsub |
| Discord/Slack Actions | Platform-native interactions (slash commands, buttons, drop-down menus) | Embedded in channel docs |
In sandbox mode, the availability of tools is strictly restricted. In the Docker sandbox of non-main sessions, tools that are allowed by default include bash, process, read, write, edit, and sessions series; tools that are disallowed by default include browser, canvas, nodes, cron, discord, and gateway. This double-layer control of whitelist + blacklist ensures safe isolation in multi-tenant scenarios.
OpenClaw's security model covers the complete path from message entry to execution environment. This chapter starts with the access control of DM (Direct Message) pairing, moves through the execution boundaries of sandbox isolation, and ends with security infrastructure and credential management, systematically unpacking OpenClaw's security architecture.
OpenClaw's default DM security policy (dmPolicy) is set to "pairing". In this mode, any DM session initiated by an unknown sender will receive a pairing code (Pairing Code), and the user needs to confirm it through the CLI on the server side to establish a trust relationship.
Comparison of three DM strategy modes:
| Mode | Behavior | Security level |
| --- | --- | --- |
| pairing (default) | Unknown senders receive a pairing code; administrator approval is required | High |
| allowlist | Only users on the whitelist can initiate DMs | High |
| open | Accept all DMs (allowFrom: "*" must also be configured) | Low |
Pairing approval is performed via the CLI:
```bash
openclaw pairing approve <code>
```
The allow list for each channel is configured independently via the allowFrom field. For example, channels.telegram.allowFrom and channels.discord.allowFrom control the access lists of Telegram and Discord channels respectively.
Public DM access requires two explicit authorizations: dmPolicy="open" and the "*" wildcard in the allowFrom array. Setting up just one of these does not open up public access—this is an intentional double-gating design to prevent configuration mistakes that could lead to accidental exposure.
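Both gates together look like this in configuration; anything less keeps public DMs closed (the channel is chosen as an example):

```json
{
  "channels": {
    "telegram": {
      "dmPolicy": "open",
      "allowFrom": ["*"]
    }
  }
}
```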
openclaw doctor proactively detects and warns about risky or incorrect DM policy configurations, including (but not limited to) a missing allowFrom wildcard in open mode and an empty whitelist in allowlist mode.
The legacy configuration key name channels.discord.dm.policy has been migrated to channels.discord.dmPolicy. Old formats are still recognized in the current version, but will trigger deprecation warnings.
OpenClaw's sandbox policy is configured through agents.defaults.sandbox.mode. The recommended default value is "non-main", which means that non-main sessions (group sessions, channel sessions, etc.) automatically enter a sandbox isolation environment.
This design rests on OpenClaw's single-user assumption: the operator of the Main Session is the service owner, has full host access, and tools execute directly on the host. Non-main sessions, which may originate from external users, each execute in a separate Docker sandbox container, fully isolated from one another and from the host.
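The corresponding setting, using the key path named above:

```json
{
  "agents": {
    "defaults": {
      "sandbox": {
        "mode": "non-main"
      }
    }
  }
}
```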
The availability of tools in the sandbox is controlled by both whitelist and blacklist:
| Category | Tools |
| --- | --- |
| Sandbox whitelist (allowed) | bash, process, read, write, edit, sessions_list, sessions_history, sessions_send, sessions_spawn |
| Sandbox blacklist (prohibited) | browser, canvas, nodes, cron, discord, gateway |
Tools in the blacklist that are called within a sandbox session will return an explicit permission denial error instead of being silently ignored.
OpenClaw's security policy documents live in the separate repository openclaw/trust (35 stars) and are published at trust.openclaw.ai. That repository contains the full Threat Model documentation. Security vulnerability reports are received at security@openclaw.ai.
OpenClaw Plugin SDK exports the ssrf-runtime module for use by the plugin when making network requests. This module verifies the target address and blocks access to the intranet address (RFC 1918), loopback address, link local address, and cloud metadata endpoint, thereby preventing SSRF (Server-Side Request Forgery) attacks. All plugin network calls should be routed through this module, rather than using the fetch or http modules directly.
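The kind of check ssrf-runtime performs can be sketched as follows; this is illustrative logic, not the module's actual API, and IPv6 ranges are omitted for brevity:

```typescript
// Illustrative SSRF guard: resolve the host, then reject private, loopback,
// link-local, and cloud-metadata destinations before any request is made.
import { lookup } from 'node:dns/promises';
import { isIP } from 'node:net';

const BLOCKED = [
  /^127\./,                     // loopback
  /^10\./, /^192\.168\./,       // RFC 1918
  /^172\.(1[6-9]|2\d|3[01])\./, // RFC 1918 172.16.0.0/12
  /^169\.254\./,                // link-local, incl. 169.254.169.254 metadata endpoint
];

async function assertPublicTarget(url: string): Promise<void> {
  const host = new URL(url).hostname;
  const addr = isIP(host) ? host : (await lookup(host)).address;
  if (BLOCKED.some((re) => re.test(addr))) {
    throw new Error(`blocked SSRF target: ${host} -> ${addr}`);
  }
}
```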
OpenClaw officially declares Prompt Injection as Out of Scope — it is not considered a security vulnerability. This position is based on practical considerations: there is no reliable defense against prompt injection under the current LLM architecture, and including it in the vulnerability scope will only create a false security promise. Accordingly, canvas.eval and browser script execution are classified as Operator control planes, and the security boundary is defined by the deployer.
The security control of the plugin installation process has undergone significant strengthening between v2026.3.28 and v2026.3.31.
The before_install hook in the installation process provides an integration point for security scanners. Any external security scanning tool can be registered as a before_install handler to check the plugin code before it is shipped.
v2026.3.31 breaking change: the built-in dangerous code detector now defaults to a fail-closed policy for critical-level findings. Previously, critical findings only generated warnings that administrators could choose to ignore; under the new policy, a finding marked critical blocks installation outright. Forcing installation of a flagged plugin requires an explicit override parameter:
```bash
openclaw plugin install --dangerously-force-unsafe-install
```
The verbosity of this parameter name is intentional—to make each use deliberate enough to avoid misoperation. Both Skills installations and Plugins installations are subject to the same scanning gating.
v2026.3.31 makes several tightenings to Gateway's authentication mechanism:
trusted-proxy mode rejects mixed shared-token configs. If multiple services are detected sharing the same authentication token, Gateway refuses to start and reports a configuration conflict.
local-direct fallback mode now requires an explicitly configured token. Previously, connections from the same host could be implicitly authenticated (Implicit Same-host Auth), which is risky in multi-tenant deployments. The new version removes this implicit trust; all connections must present a valid token.
Node Pairing Approval becomes a mandatory prerequisite - node commands are not exposed until pairing approval is completed. Node-originated Runs are restricted to a reduced trusted execution surface.
OpenClaw stores credentials under the ~/.openclaw/credentials/ directory. Refreshing a web provider's credentials re-runs the OAuth flow via the openclaw login command.
Key references in provider plugins use SecretRef semantics: the configuration file stores only a reference identifier for the secret, never the plaintext value, and the credential manager resolves the reference to the actual key at runtime. This makes configuration files safe to commit to version control.
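A sketch of what that could look like in a config file, written in the JSON5 form the configuration format allows; the providers/secretRef key names are assumptions, since the text describes the semantics rather than the schema:

```json5
{
  providers: {
    openai: {
      // Only a reference is committed; the credential manager resolves it
      // at runtime from ~/.openclaw/credentials/.
      apiKey: { secretRef: "credentials://openai/api-key" },
    },
  },
}
```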
A basic content-security rule applies throughout: never commit real phone numbers, video files, or production configuration values to the repository.
OpenClaw's build and test infrastructure epitomizes its engineering discipline. This chapter breaks down the build toolchain, the classification of its 198 npm scripts, the architecture of the test infrastructure, and the implementation of its code-quality gates.
OpenClaw's build tooling deliberately avoids the mainstream webpack/rollup/esbuild all-in-one bundle and instead adopts a more focused combination:
| Tool | Version | Responsibility |
| --- | --- | --- |
| tsdown | 0.21.7 | Bundler, driven by scripts/tsdown-build.mjs |
| TypeScript | 6.0.2 | Type checking |
| @typescript/native-preview | 7.0.0-dev.20260331.1 | Preview of the Go-implemented TypeScript compiler (pnpm tsgo) |
| oxfmt | 0.43.0 | Code formatting (replaces Prettier) |
| oxlint + oxlint-tsgolint | 1.58.0 / 0.18.1 | Linting (replaces ESLint) |
| Bun | - | TypeScript executor during development/testing |
| Node 22+ | - | Production runtime (Node + Bun dual-path compatibility) |
| tsx | 4.21.0 | Node-based TypeScript execution |
| jiti | 2.6.1 | Runtime ESM resolution (plugin-sdk alias resolution) |
A few points in this tool selection are worth discussing:
tsdown rather than esbuild directly: tsdown provides a higher-level packaging abstraction on top of esbuild, with configuration more concise than writing an esbuild plugin by hand. The build entry point is scripts/tsdown-build.mjs.

@typescript/native-preview: the official experimental Go rewrite of the TypeScript compiler, invoked through pnpm tsgo. Its type checking is roughly an order of magnitude faster than the standard TypeScript compiler, and OpenClaw uses it as a fast type-checking path in CI.

oxfmt / oxlint: a Rust-based formatting and linting toolchain replacing the traditional Prettier + ESLint combination. Formatting runs via pnpm format (check) and pnpm format:fix (auto-fix); linting runs via pnpm lint.

Bun + Node dual runtime: Bun provides faster startup (bun, bunx) during development and testing, while Node 22+ serves production deployments for compatibility. Both paths must remain working simultaneously.
Full build pipeline:
```bash
pnpm build
# expands to:
# 1. pnpm canvas:a2ui:bundle  → A2UI bundle build
# 2. scripts/tsdown-build.mjs → main packages
# 3. runtime-postbuild.mjs    → runtime post-processing
```
Three build variants serve different scenarios:
| Variant | Command | Purpose |
| --- | --- | --- |
| Full build | pnpm build | A2UI bundle + main build + post-processing |
| Docker build | pnpm build:docker | Skips the A2UI bundle (can fail under QEMU) |
| Strict smoke test | pnpm build:strict-smoke | Quickly verifies basic usability of the built artifact |
OpenClaw's package.json contains 198 npm scripts. The number is not inflated; it reflects the density of automation an engineering system spanning multiple platforms, channels, and plugins requires. Classified by responsibility:
- Build: build, build:docker, build:plugin-sdk:dts, build:strict-smoke cover the core build pipeline, its variants, and type-declaration generation for the Plugin SDK.
- Checks: pnpm check is the meta-check entry point, orchestrating tsgo, lint, format, format:check, and roughly 20 specific rule-check scripts; see the CI architecture analysis in Section 17.6.
- Tests form the largest group: test, test:fast, test:watch, test:coverage, test:e2e, test:live, test:gateway, test:channels, test:extensions, test:contracts, plus the test:docker:* and test:parallels:* series.
- Release: release:check, release:openclaw:npm:check, release:plugins:npm:check run version-number, changelog, and npm-registry consistency checks before a release.
- Platforms: android:*, ios:*, ui:* are shortcuts for building, testing, and linting each platform.
- Docs: docs:check-links (dead-link detection), docs:spellcheck (spelling), docs:check-i18n-glossary (i18n glossary consistency).
- Protocol: protocol:check (protocol-definition consistency), protocol:gen (TypeScript type generation), protocol:gen:swift (Swift type generation).
OpenClaw uses Vitest 4.1.2 as its testing framework, with @vitest/coverage-v8 collecting coverage at the V8 engine level. The coverage threshold is a uniform 70% across lines, branches, functions, and statements.
A key mandatory rule: Vitest may only use the forks pool; the threads, vmThreads, and vmForks pools are explicitly disabled. The restriction stems from the many process-level side effects in OpenClaw's tests (subprocess creation, filesystem operations, network port usage, and so on), for which thread-level isolation is insufficient.
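A minimal vitest.config.ts expressing both constraints as a sketch; the repository's real config will carry more options, but the two settings shown follow directly from the rules above:

```ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Process-level isolation; threads/vmThreads/vmForks are disallowed.
    pool: "forks",
    coverage: {
      provider: "v8", // via @vitest/coverage-v8
      thresholds: { lines: 70, branches: 70, functions: 70, statements: 70 },
    },
  },
});
```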
Parallel test orchestration is driven by the test-parallel.mjs script and provides three execution configurations:
| Configuration | Parallelism | Purpose |
| --- | --- | --- |
| default | 50% of CPU cores | Daily development, balancing speed and system responsiveness |
| serial | 1 | Debugging failing cases with concurrency interference ruled out |
| max | 100% of CPU cores | CI environments, maximizing throughput |
The test system spans multiple levels, each targeting a different verification granularity:

Unit and integration tests: pnpm test (full run), pnpm test:fast (excludes slow cases), pnpm test:watch (watch mode), pnpm test:coverage (with coverage report).

Domain tests: test:channels (channel integrations), test:extensions (extension interfaces), test:gateway (Gateway protocol), test:e2e (end-to-end flows), test:live (real API integration).

Contract tests: test:contracts:channels and test:contracts:plugins enforce the interface contracts of channels and plugins respectively, ensuring that channel adapters and plugins follow their declared protocols and preventing implementation drift.
Docker E2E tests (8+ scenarios) perform end-to-end validation in a fully containerized environment. Covered scenarios include:

| Scenario | Validation scope |
| --- | --- |
| onboard | First-boot onboarding flow |
| plugins | Plugin installation, loading, and execution |
| MCP channels | MCP protocol channel connectivity |
| gateway network | Gateway network topology and routing |
| OpenWebUI | OpenWebUI integration |
| doctor-switch | doctor diagnostics and configuration switching |
| qr-import | QR-code configuration import |
| live models | Real model endpoint integration |
Parallels smoke tests run against macOS, Windows, and Linux virtual-machine clients to verify basic functionality across operating systems.
Performance test suite:

| Script | What it measures |
| --- | --- |
| test:perf:budget | Performance budget checks (runtime/memory caps) |
| test:perf:hotspots | Hotspot function profiling |
| test:perf:imports | Module import-time analysis |
| test:startup:bench | Startup time baseline |
| test:startup:memory | Startup memory usage |
Live tests are enabled by setting the environment variable OPENCLAW_LIVE_TEST=1 and running pnpm test:live. They call external services with real API keys, so they are excluded from regular CI and run periodically in a dedicated live-test environment.
OpenClaw enforces an unusually dense set of code-quality gates, detailed below:

File line limit: the check:loc script enforces a per-file cap of roughly 500-700 lines; files over the limit are flagged for splitting. It is a soft but enforced guideline aimed at preventing unmaintainable giant files.
Strict type discipline: @ts-nocheck is prohibited, any is avoided in favor of unknown, and zod@4.3.6 is preferred for runtime schema validation at external boundaries (configuration files, webhook payloads, CLI output, API responses).
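A small example of boundary validation in that style; the webhook payload shape is invented for illustration:

```ts
import { z } from "zod";

// Hypothetical webhook payload: parse anything crossing an external
// boundary before trusting it, instead of casting to `any`.
const WebhookPayload = z.object({
  channel: z.string(),
  messageId: z.string(),
  text: z.string().max(65_536),
  timestamp: z.coerce.date(),
});

export function parseWebhook(raw: unknown): z.infer<typeof WebhookPayload> {
  // Throws a structured ZodError if the payload drifts from the contract.
  return WebhookPayload.parse(raw);
}
```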
Dynamic import protection: the build system detects when a module is both statically and dynamically imported and emits an INEFFECTIVE_DYNAMIC_IMPORT warning. The mixed pattern defeats tree-shaking: once a module is statically imported it is already in the bundle, so the dynamic import adds no lazy-loading benefit and only obscures the code.
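The anti-pattern the warning targets, sketched with a hypothetical module name:

```ts
// Anti-pattern: "./heavy" is already in the bundle via the static import,
// so the dynamic import() below adds no lazy loading. This is the mixed
// pattern that INEFFECTIVE_DYNAMIC_IMPORT flags.
import { helper } from "./heavy";

export async function lazyPath(): Promise<unknown> {
  const mod = await import("./heavy"); // resolves to the same bundled module
  return mod.helper ?? helper;
}
```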
Duplicate code detection: jscpd@4.0.8 scans the src/, extensions/, test/, and scripts/ directories for duplication; duplicate blocks above the threshold fail CI.
Drift detection: a series of check scripts monitor consistency between definitions and implementations:

| Check | What it detects |
| --- | --- |
| canon:check | Canonical code-style consistency |
| plugin-sdk:api:check | Plugin SDK public API drift |
| config:docs:check | Config schema vs. documentation consistency |
| lint:plugins:plugin-sdk-subpaths-exported | Plugin SDK sub-path export completeness |
In addition, more than eight boundary lint rules target specific extensions, ensuring that no extension module reaches into another extension's internal API.
OpenClaw's CI adopts a two-tier check system that separates local development gating from CI gating:
pnpm check is the local gate that must pass before every commit. Its execution order:

```bash
# Execution order of pnpm check:
# 1. no-conflict-markers   → detect unresolved merge conflict markers
# 2. host-env-policy:swift → verify the Swift host-environment policy
# 3. tsgo                  → Go-implementation TypeScript type checking
# 4. lint                  → oxlint
# 5. format                → oxfmt format verification
```

The pipeline runs serially; any failing step aborts the remaining steps and reports where the failure occurred.
check-additional runs only in CI and adds architectural-policy and boundary-policy guards. These checks are intentionally kept out of the local loop: they are slower and depend on CI-specific context (full git history, cross-branch diff information), and running them locally would noticeably slow development.
Pre-commit hooks are managed by the prek tool; the hook's default behavior is to run the full pnpm check pipeline.
For scenarios that require fast iteration, the environment variable FAST_COMMIT=1 can skip the format and check steps:
```bash
# Skip formatting and checks (only when code quality is assured manually)
FAST_COMMIT=1 git commit -m "wip: experimental changes"
```
Using FAST_COMMIT makes the developer solely responsible for code quality; CI still runs the complete checks, and non-conforming commits are blocked at the CI stage.
The landing bar for merging into the main branch is:

```bash
# Main-branch landing bar:
pnpm check   # types + lint + format
pnpm test    # full test suite
pnpm build   # build verification (when the change touches build-affecting surfaces)
```
The third item, pnpm build, is conditional: it is required only when the change touches build-affecting surfaces, which include (non-exhaustively) tsdown-build.mjs configuration changes, package.json dependency changes, tsconfig.json changes, and added or removed module exports. Purely logical changes (implementation adjustments, bug fixes) do not trigger the build requirement, balancing CI speed against safety.
OpenClaw's deployment story spans everything from a single developer to an enterprise team. The options below are ordered by increasing complexity: npm global install, Docker containerization, Ansible orchestration, Nix declarative configuration, and a Windows system tray. Each corresponds to a different operational philosophy and security model.
For most developers, global npm installation is the fastest way into OpenClaw:

```bash
npm install -g openclaw@latest
openclaw onboard --install-daemon
```
The first command installs the OpenClaw CLI and its dependencies into Node.js's global node_modules directory. The second starts the interactive onboarding wizard; the --install-daemon flag tells the wizard to register the system daemon once onboarding completes.
Daemon registration varies by operating system. On macOS, OpenClaw generates a launchd plist and registers it as a user-level Launch Agent via launchctl load. The key plist entries:

```xml
<key>Label</key>
<string>ai.openclaw.gateway</string>
<key>ProgramArguments</key>
<array>
  <string>/usr/local/bin/node</string>
  <string>/usr/local/lib/node_modules/openclaw/dist/gateway.js</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
```
On Linux, OpenClaw uses a systemd user service: it writes a unit file to ~/.config/systemd/user/openclaw-gateway.service and starts it with systemctl --user enable --now openclaw-gateway. The point of "user level" is that no root permission is required, the service lifecycle is bound to the user session, and least privilege is preserved. Combined with loginctl enable-linger, the service keeps running even when the user is logged out.
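A sketch of what the generated unit plausibly contains; the exact paths and directives are assumptions, mirroring the launchd configuration above:

```ini
# ~/.config/systemd/user/openclaw-gateway.service (illustrative)
[Unit]
Description=OpenClaw Gateway

[Service]
ExecStart=/usr/local/bin/node /usr/local/lib/node_modules/openclaw/dist/gateway.js
Restart=always

[Install]
WantedBy=default.target
```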
Docker deployment is OpenClaw's preferred option for production and isolation scenarios. The Dockerfile uses a four-stage multi-stage build, with each stage designed to keep the final image small.
The first stage has a single responsibility: extract every package.json from the extensions/ directory tree while preserving the directory structure. It is a pure file-copy stage; nothing is installed:

```dockerfile
FROM node:24-bookworm AS ext-deps
WORKDIR /app
COPY extensions/ extensions/
RUN find extensions -name "package.json" -exec sh -c \
    'mkdir -p /out/$(dirname {}) && cp {} /out/{}' \;
```
The point of this separation is to exploit Docker's layer cache: subsequent dependency-installation layers are invalidated only when an extension's package.json actually changes.
The build phase uses Bun as the JavaScript runtime to accelerate dependency installation, while using pnpm as the package manager, and tsdown as the TypeScript compilation tool:
```dockerfile
FROM oven/bun:1 AS build
WORKDIR /app
COPY --from=ext-deps /out/extensions ./extensions
COPY package.json pnpm-lock.yaml pnpm-workspace.yaml ./
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm run build
RUN cd ui && pnpm run build
```
The build product includes two parts: TypeScript compilation output in the dist/ directory (generated through tsdown), and Web Control UI static resources in the ui/dist/ directory (built through Vite).
The third stage is the most important size-optimization step in the pipeline:

```dockerfile
FROM build AS runtime-assets
RUN pnpm prune --prod
RUN find . -name "*.d.ts" -delete \
    && find . -name "*.map" -delete \
    && find . -name "*.ts" ! -name "*.d.ts" -path "*/src/*" -delete
```
pnpm prune --prod first strips all devDependencies; the find passes then delete TypeScript declaration files (.d.ts), source maps (.map), and source files. Only JavaScript runtime code and production-grade dependencies remain.
The final stage is built on a Node 24 base image, with every base image pinned to a SHA256 digest to ensure reproducible builds:

```dockerfile
# Default variant
FROM node:24-bookworm@sha256:abc123... AS runtime
# Slim variant
# FROM node:24-bookworm-slim@sha256:def456... AS runtime

USER node
WORKDIR /home/node/app
COPY --from=runtime-assets --chown=node:node /app ./

HEALTHCHECK --interval=3m --timeout=10s --start-period=30s \
  CMD curl -f http://localhost:18789/healthz || exit 1

EXPOSE 18789 18790
CMD ["node", "dist/gateway.js"]
```
OpenClaw provides two image variants, selected through the build argument OPENCLAW_VARIANT:

| Variant | Base image | Characteristics | Suited for |
| --- | --- | --- | --- |
| default | node:24-bookworm | Full Debian toolchain; supports browser installation | Playwright / browser channels |
| slim | node:24-bookworm-slim | Minimal system libraries; roughly 40% smaller image | Pure CLI/API scenarios |
The remaining build args:

| Build arg | Default | Description |
| --- | --- | --- |
| OPENCLAW_EXTENSIONS | all | Which extensions to build: comma-separated list or all |
| OPENCLAW_INSTALL_BROWSER | false | Whether Chromium (Playwright) is pre-installed in the image |
| OPENCLAW_INSTALL_DOCKER_CLI | false | Whether to install the Docker CLI (for the sandbox feature) |
On the security side, the final image runs as USER node (uid 1000), dropping root entirely. The health check covers two endpoints: /healthz for the liveness probe and /readyz for the readiness probe, with a 3-minute interval, 10-second timeout, and 30-second startup grace period.
The official docker-compose.yml defines two services, reflecting OpenClaw's gateway-client separation architecture:
```yaml
version: "3.9"
services:
  openclaw-gateway:
    image: ghcr.io/openclaw/openclaw:latest
    ports:
      - "18789:18789"
      - "18790:18790"
    volumes:
      - openclaw-data:/home/node/.openclaw
      - /var/run/docker.sock:/var/run/docker.sock
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - NET_RAW
      - NET_ADMIN
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:18789/healthz"]
      interval: 3m
      timeout: 10s
      start_period: 30s

  openclaw-cli:
    image: ghcr.io/openclaw/openclaw:latest
    command: ["node", "dist/cli.js"]
    depends_on:
      openclaw-gateway:
        condition: service_healthy
    environment:
      - OPENCLAW_GATEWAY_URL=http://openclaw-gateway:18789

volumes:
  openclaw-data:
```
Security hardening shows up in three places: no-new-privileges prevents processes from escalating privileges via setuid/setgid; cap_drop removes the NET_RAW and NET_ADMIN capabilities, blocking raw-socket operations and network-configuration tampering; and mounting the Docker socket (/var/run/docker.sock) gives the sandbox integration Docker-in-Docker (DinD)-style capability, letting the Agent execute commands in isolated containers.
For port mapping, 18789 is Gateway's main HTTP port, carrying the REST API, WebSocket connections, and the Web Control UI; 18790 is the bridge port through which external channel plugins connect to Gateway over gRPC or WebSocket.
openclaw/openclaw-ansible (545 stars) provides a set of Ansible playbooks that package OpenClaw's Docker deployment as reproducible infrastructure-as-code. Its core features:

Tailscale VPN integration: the playbook integrates Tailscale by default and binds Gateway to the Tailscale virtual interface rather than a public one. The instance is therefore reachable only inside the tailnet, with no public ports exposed, eliminating unauthorized-access risk at the network level.

UFW firewall configuration: UFW (Uncomplicated Firewall) rules are configured automatically, allowing only SSH (22) and the port Tailscale requires (41641/UDP), and dropping all other inbound traffic.

Docker isolation: OpenClaw runs in its own Docker network with data volumes mapped to a designated host path; mount points, environment variables, and resource limits are customizable through Ansible variables.
openclaw/nix-openclaw (611 stars) provides declarative configuration as a Nix flake. For NixOS users and home-manager developers, this is the deployment method that fits their workflow best:

```nix
{
  inputs.openclaw.url = "github:openclaw/nix-openclaw";

  outputs = { self, nixpkgs, openclaw }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        openclaw.nixosModules.default
        {
          services.openclaw = {
            enable = true;
            gateway.port = 18789;
            extensions = [ "discord" "telegram" "whatsapp" ];
          };
        }
      ];
    };
  };
}
```
The advantage of Nix deployment is full reproducibility: identical flake inputs necessarily produce identical system configurations, eliminating the classic "works on my machine" drift.
openclaw/openclaw-windows-node (405 stars) provides a native experience on Windows. Its core components:

System tray companion: a lightweight .NET application that lives in the Windows system tray, manages the Gateway process lifecycle (start/stop/restart), shows live status, and offers shortcut-menu access to the Web Control UI and logs.

PowerToys Command Palette extension: integrates with the Microsoft PowerToys command palette (Run plugin), so users can send commands to the OpenClaw Agent via the Alt+Space shortcut without switching to a browser or terminal window.
Exposing the service securely is the core question when deploying Gateway on a remote Linux server. OpenClaw offers three modes:
Tailscale Serve (tailnet-internal access): tailscale serve maps the Gateway port onto Tailscale's HTTPS proxy, reachable only from devices inside the tailnet. Gateway's --tailscale serve mode automates certificate configuration and port mapping.

Tailscale Funnel (public HTTPS exposure): funnel mode exposes the service to the public internet through Tailscale's global edge network, with HTTPS certificates obtained automatically. Suitable when external webhook callbacks are required (for example, a Telegram bot).
SSH tunnel: the most traditional but most flexible option, mapping a remote Gateway to the local machine through SSH port forwarding:

```bash
ssh -L 18789:localhost:18789 -L 18790:localhost:18790 user@remote-host
```
Gateway's bind mode is controlled by the --bind parameter: loopback (default) listens only on 127.0.0.1, while lan listens on 0.0.0.0 to accept LAN connections. The Tailscale mode is set via the --tailscale parameter: off (default), serve, or funnel.
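Combining the two flags, a Gateway reachable on the LAN and served over the tailnet would be started roughly like this; both flags and their values come from the description above, paired with the documented openclaw gateway run subcommand:

```bash
openclaw gateway run --bind lan --tailscale serve
```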
openclaw onboard is OpenClaw's interactive initialization command, walking the user from zero to a working setup. The flow proceeds step by step:
Step 1: Gateway configuration. Choose the bind mode (loopback/lan), port, and Tailscale mode. If a running Gateway instance is detected, the wizard asks whether to reuse it.

Step 2: Workspace configuration. Create the default workspace directory (~/.openclaw/workspace), configure LLM provider keys (OpenAI API key, Anthropic API key, and so on), and set the default model.

Step 3: Channel configuration. Enable channels of the user's choice (Discord bot token, Telegram bot token, WhatsApp phone number, and so on) and verify that the credentials are valid.

Step 4: Skills configuration. Recommend popular Skills for installation and point the user at ClawHub to discover more.
With the --install-daemon flag, the system daemon is registered and started automatically once the wizard finishes. If anything goes wrong, openclaw doctor performs a comprehensive health check: it verifies the Node.js version, checks port occupancy, tests LLM connectivity, validates configuration-file syntax, and outputs a diagnostic report with repair suggestions.
ClawHub (openclaw/clawhub, 7,214 stars) is OpenClaw's official Skill registry and distribution platform. It is positioned the way npm is to Node.js or crates.io is to Rust: a centralized package registry, except that what it distributes are AI Agent capability modules.
Installing a Skill takes a single command:

```bash
clawhub install weather-forecast
clawhub install code-review --version 2.1.0
```
ClawHub's Agent integration is its core differentiator: when the Agent encounters a capability the conversation needs but that is not installed, it can automatically search ClawHub and install the Skill after user confirmation. Under the hood, the Agent calls the built-in clawhub_search tool, which queries api.clawhub.com/v1/search and returns matching Skills along with their security ratings.
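A sketch of the tool's round-trip; the article names only the tool and the endpoint, so the query parameter and response fields below are assumptions:

```ts
// Hypothetical shapes for the clawhub_search round-trip described above.
interface ClawhubSearchResult {
  name: string;
  version: string;
  description: string;
  securityRating: string; // the article mentions a security rating per Skill
}

async function clawhubSearch(query: string): Promise<ClawhubSearchResult[]> {
  const res = await fetch(
    `https://api.clawhub.com/v1/search?q=${encodeURIComponent(query)}`,
  );
  if (!res.ok) throw new Error(`clawhub search failed: ${res.status}`);
  return (await res.json()) as ClawhubSearchResult[];
}
```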
On the web, clawhub.com provides a visual marketplace supporting browsing by category, keyword search, install statistics, and community ratings. Each Skill page renders its SKILL.md along with the dependency graph and version history.
openclaw/skills (3,622 stars) is the version archive for all Skills, keeping a complete snapshot of every published version of every Skill. Even if a Skill author deletes a version, deployed instances can still pull it from the archive.
VoltAgent/awesome-openclaw-skills (43,292 stars) is a community-curated list of more than 5,400 verified Skills. Its star count is a useful proxy for ecosystem activity: the stars on an "awesome list" usually track the scale of the ecosystem behind it.
OpenClaw's Skills fall into three tiers by distribution method:

| Tier | Location | Installation | Update strategy |
| --- | --- | --- | --- |
| Bundled Skills | skills/ directory, shipped with the npm package | Included automatically | Updated with each OpenClaw release |
| Managed Skills | ~/.openclaw/managed/skills/ | clawhub install | clawhub update |
| Workspace Skills | ~/.openclaw/workspace/skills/ | Created manually | Managed by the user |
Load priority increases from top to bottom: a Workspace Skill overrides a managed or bundled Skill of the same name, giving maximum customization flexibility.
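The override rule can be read as a simple first-hit lookup across the three tiers; a sketch under the paths from the table, with the bundled-skills path inside the npm package simplified:

```ts
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Highest priority first: workspace > managed > bundled.
const SKILL_ROOTS = [
  join(homedir(), ".openclaw", "workspace", "skills"),
  join(homedir(), ".openclaw", "managed", "skills"),
  "skills", // bundled with the npm package, relative to the install root (simplified)
];

function resolveSkill(name: string): string | undefined {
  for (const root of SKILL_ROOTS) {
    const candidate = join(root, name, "SKILL.md");
    if (existsSync(candidate)) return candidate; // first hit wins
  }
  return undefined;
}
```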
Each Skill is defined by a SKILL.md file, the core artifact of the AgentSkills specification: Markdown with a YAML front matter block:

```markdown
---
name: code-review
version: 2.1.0
description: Automated code review with multi-language support
triggers:
  - pattern: "review {file_path}"
  - pattern: "check code quality"
permissions:
  - filesystem:read
  - git:read
before_install: scripts/check-deps.sh
---

# Code Review Skill

## Instructions

You are a senior code reviewer. When triggered, analyze the provided file
for bugs, style issues, and potential improvements.

## Tools

### review_file

Analyze a single file and return findings.

- `file_path` (string, required): Path to the file to review
- `severity` (string, optional): Minimum severity to report (info|warn|error)
```
The before_install field names a security-hook script executed during the installation phase. It verifies system dependencies (checking the Python version, confirming specific binaries exist) and blocks installation with a non-zero exit code. Its security value is that it gives Skill authors a declarative pre-check, preventing a Skill from being installed into an environment that cannot run it.
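A plausible shape for such a script; the dependencies checked are invented for the example:

```bash
#!/bin/sh
# scripts/check-deps.sh — illustrative before_install hook.
# Any non-zero exit blocks the installation.

# The example Skill declares git:read, so require git on PATH.
command -v git >/dev/null 2>&1 || {
  echo "check-deps: git not found" >&2
  exit 1
}

# Hypothetical requirement for this example: Python >= 3.10.
python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)' || {
  echo "check-deps: Python >= 3.10 required" >&2
  exit 1
}

exit 0
```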
The OpenClaw organization maintains a series of sub-repositories with distinct roles:

| Repository | Stars | Role |
| --- | --- | --- |
| openclaw/acpx | 1,834 | Headless ACP CLI (Agent Control Protocol command-line client) |
| openclaw/lobster | 992 | Lobster workflow shell, interactive task orchestration |
| openclaw/nix-openclaw | 611 | Nix flake declarative deployment |
| openclaw/openclaw-ansible | 545 | Ansible playbook automated deployment |
| openclaw/openclaw-windows-node | 405 | Windows system tray + PowerToys integration |
| openclaw/openclaw.ai | 250 | Official website source |
| openclaw/community | 92 | Community governance documents and Discord moderation policy |
| openclaw/trust | 35 | Security policy, vulnerability disclosure process, audit reports |
| openclaw/caclawphony | 34 | Symphony autonomous-run framework for long-lived unattended Agent tasks |
OpenClaw does not exist in a vacuum. The following projects compete with or complement it along different dimensions:

HKUDS/nanobot (37,216 stars) positions itself as an "ultra-lightweight OpenClaw alternative": it strips out the multi-channel and plugin complexity and focuses on fast response in a single CLI scenario. Its selling points are sub-second cold start and very low memory usage, a fit for resource-constrained edge devices.

chatgpt-on-wechat/CowAgent (42,673 stars) treats the WeChat ecosystem as its home turf, with deep integration of WeChat Work, official accounts, and mini programs. Given WeChat's penetration in the Chinese market, it is a natural entry point for AI Agents, and CowAgent's coverage of that vertical exceeds OpenClaw's.

AstrBot (28,373 stars) is an IM chatbot framework supporting mainstream Chinese platforms such as QQ, Feishu, and DingTalk. Where OpenClaw aims at platform independence, AstrBot digs deep into the domestic ecosystem, with API designs closer to the habits of Chinese developers.
OpenClaw's documentation is built on Mintlify and deployed at docs.openclaw.ai. The Chinese localized version is located at docs.openclaw.ai/zh-CN.
The internationalization (i18n) pipeline is technically interesting: translation is driven by the scripts/docs-i18n script, which reads glossary.zh-CN.json as a glossary to keep proper nouns consistent (for example, "Gateway" and "Skill" are left untranslated). Translation memory is stored in zh-CN.tm.jsonl in JSON Lines format, one source/translation pair per line. The file plays the role of a TMX file in traditional localization tooling: when source text changes, the i18n script first looks for an exact or fuzzy match in the translation memory to avoid retranslating existing content.
The main forum for community communication is Discord (discord.gg/clawd), which is also the main channel for obtaining real-time development progress and direct communication with maintainers.
The following table compares OpenClaw with current mainstream AI Agent frameworks across several dimensions:

| Dimension | OpenClaw | Manus | AutoGen | LangChain | OpenHands |
| --- | --- | --- | --- | --- | --- |
| License | MIT | Closed-source SaaS | MIT (CC-BY-4.0 docs) | MIT | MIT |
| Deployment | Local-first + Docker + Ansible | Pure cloud | Local/cloud | Local/cloud | Docker container |
| Main language | TypeScript | Unpublished | Python | Python | Python |
| Multi-channel | 15+ (Discord/Telegram/WhatsApp/Slack/Web/SMS, ...) | Web entry only | No native channels | No native channels | Web UI |
| Voice support | Native Realtime API | Yes | None | Community extension | None |
| Plugin system | Skills + Plugin SDK + ClawHub | Built-in tools | Tool registration | Tools/chains/agents | Sandbox tools |
| Memory system | SQLite + vectors + knowledge graph | Cloud conversation history | Memory state | Multiple memory types | Conversation history |
| Security model | Three-layer sandbox + permission DSL + audit log | Platform-hosted | No built-in sandbox | No built-in sandbox | Docker sandbox |
| Community size | 342K stars, 20K+ commits | N/A (closed source) | 42K stars | 105K stars | 55K stars |
OpenClaw’s three core differentiating pillars can be extracted from the comparison:
Local-First: all data lives on the user's device by default. The Gateway process runs locally or on a user-controlled server, and LLM API calls go straight from the user's device to the provider without passing through any third-party relay. For enterprises under data-compliance requirements this is decisive: regulations such as GDPR and HIPAA place strict limits on where data may reside and travel, and a local-first architecture satisfies those limits by construction.
Multi-Channel Native: rather than wrapping a single interface in per-platform adapters, channels are treated as first-class citizens at the architecture level. Each channel has its own message formatter, permission model, and user identity mapping.
Fully open under MIT: there is no Enterprise edition and no closed-source components withholding core features; everything is free for every user. As a business model this leans heavily on sponsors and community contributions, but it is extremely effective at building trust.
Why are OpenAI, NVIDIA, and Vercel sponsoring a local-first "challenger"? The seemingly contradictory sponsorships follow a clear strategic logic:
OpenAI: every OpenClaw Agent call consumes OpenAI API tokens. Local-first does not mean no cloud models; quite the opposite, OpenClaw is a super distribution channel for the OpenAI API. Every OpenClaw user is a potential paying API user, and typically a far heavier one than the average ChatGPT user. Sponsoring OpenClaw is an ecosystem lock-in play: once developers are used to an OpenClaw + GPT-4 workflow, the cost of switching to other models rises sharply.
NVIDIA: local inference is one of OpenClaw's long-term directions. As users begin running open-source LLMs locally, demand for GPU compute translates directly into NVIDIA hardware sales; sponsoring OpenClaw cultivates the market for local inference.
Vercel: OpenClaw's Web Control UI, documentation site, and the ClawHub marketplace can all be deployed on Vercel. Sponsoring open-source projects is Vercel's standard move for growing its developer-tool ecosystem, consistent with its sponsorship of Next.js and Turborepo.
Assessed objectively, OpenClaw faces several structural challenges:

16,843 open issues: as of April 2026, the repository had more than 16,000 open issues. That reflects intense user engagement, but it also means heavy triage pressure on the maintainers; if a large share of issues sit unanswered for too long, community trust erodes.

Node.js environment threshold: for non-JavaScript developers, installing and maintaining a Node.js environment is a barrier in itself. Python-ecosystem rivals like AutoGen and LangChain have a natural advantage here, since Python installation and environment management are friendlier to data scientists and researchers.

Single-maintainer risk (bus factor): steipete has contributed 14,756 commits, about 73% of the 20,000+ total, with a huge gap to the second contributor (vincentkoc, 1,690 commits). The project is highly dependent on one person, with a bus factor close to 1; if the core maintainer steps away for any reason, the project's survival is seriously threatened.

Aggressive refactoring cadence: nearly every release contains breaking changes. Frequent plugin-API changes make community Skills expensive to maintain; a Skill can go stale within weeks because of upstream API churn. This tension between rapid iteration and a stable platform is OpenClaw's biggest current architectural risk.
TypeScript native compiler: OpenClaw is tracking @typescript/native-preview 7.0.0-dev, the Go implementation of the TypeScript compiler. It promises over 10x faster compilation, which would materially improve both the development experience and CI/CD pipeline efficiency; an experimental branch in the repository already adapts to the new compiler's features.

Plugin SDK stabilization: the current plugin system carries multiple legacy paths, including import styles that are abandoned but not yet deleted. The goal of stabilization is to fix a long-lived API surface, mark all legacy paths as deprecated, and remove them in subsequent versions.

Official WeChat integration: cooperation with Tencent will bring officially supported WeChat channels in place of today's indirect, third-party-library integrations. For the Chinese market this is crucial: WeChat has more than 1.3 billion monthly active users, and an official channel means a more stable API and lower risk of account suspension.

Enterprise adoption path: the combination of Ansible playbooks, Docker containers, and sandbox isolation already paves the way for enterprise deployment. The next steps are RBAC (role-based access control), audit-log export to SIEM systems, and SSO (single sign-on) integration.
OpenClaw embodies a clear technical philosophy: AI Agents should run on user-controlled infrastructure, and data should not leave the user's trust boundary. The position runs against the industry's "everything in the cloud" current, and that countercurrent is precisely where its distinctive value lies.
From an implementation standpoint, OpenClaw has already built channel abstraction, plugin isolation, a three-layer sandbox, vector-backed memory, and native apps across platforms inside a TypeScript monorepo. The maturity of the codebase and the architecture is far ahead of what its four-month age would suggest. That said, the concentration of work around a single maintainer, the pace of aggressive refactoring, and the growing issue backlog still create real risk.
For developers, OpenClaw is the most complete open source local-first AI Agent platform available. For enterprises, its combination of Docker + Ansible + sandbox provides an auditable, isolated, and reproducible deployment path. For the AI industry, it proves that "local first" is not a compromise, but an architectural paradigm that can compete head-on with cloud SaaS competitors in terms of functional completeness.
| Resource | Link |
| --- | --- |
| GitHub main repository | github.com/openclaw/openclaw |
| Official website | openclaw.ai |
| English documentation | docs.openclaw.ai |
| Chinese documentation | docs.openclaw.ai/zh-CN |
| Discord community | discord.gg/clawd |
| ClawHub marketplace | clawhub.com |
| ClawHub source code | github.com/openclaw/clawhub |
| Star growth curve | star-history.com/#openclaw/openclaw |
| DeepWiki analysis | deepwiki.com/openclaw/openclaw |
| Security trust center | trust.openclaw.ai |
| Security contact email | security@openclaw.ai |
| Repository | Stars | Description |
| --- | --- | --- |
| openclaw/openclaw | 343,696 | Main repository: CLI, Gateway, Agent runtime, Plugin SDK |
| openclaw/clawhub | 7,214 | Official Skill directory platform |
| openclaw/skills | 3,622 | ClawHub archive of all Skill versions |
| openclaw/acpx | 1,834 | Headless ACP CLI: stateful Agent Client Protocol sessions |
| openclaw/lobster | 992 | Lobster workflow shell: typed JSON pipelines + approval gates |
| openclaw/nix-openclaw | 611 | Nix declarative packaging support |
| openclaw/openclaw-ansible | 545 | Ansible automated deployment (Tailscale + UFW + Docker) |
| openclaw/openclaw-windows-node | 405 | Windows system tray + PowerToys command palette extension |
| openclaw/openclaw.ai | 250 | Official website source |
| openclaw/trust | 35 | Security trust policy and threat model |
| openclaw/caclawphony | 34 | Symphony: project tasks → isolated autonomous execution |
| Command | Purpose |
| --- | --- |
| openclaw onboard | Interactive guided setup (Gateway + channels + Skills) |
| openclaw gateway run | Start the Gateway control plane |
| openclaw agent --message "..." | Send a message to the Agent |
| openclaw channels status --probe | Probe the connection status of all channels |
| openclaw channels login | Channel login (e.g. WhatsApp QR scan) |
| openclaw pairing approve | Approve a DM pairing request |
| openclaw doctor | Diagnose configuration issues and security risks |
| openclaw config set | Modify configuration values |
| openclaw update --channel | Switch release channel and update |
| openclaw message send --to | Send a message to a specified target |
| openclaw gateway status | Show Gateway running status |
| openclaw nodes list | List connected device nodes |
| clawhub install | Install a Skill from ClawHub |
The following commands can be sent directly in conversations on WhatsApp, Telegram, Slack, Discord, Teams, WebChat and more:
| Command | Function |
| --- | --- |
| /status | Show current session status (model + token usage + cost) |
| /new or /reset | Reset the session |
| /compact | Compress the session context (generate a summary) |
| /think | Set thinking level: off\|minimal\|low\|medium\|high\|xhigh |
| /verbose on\|off | Toggle verbose output |
| /usage off\|tokens\|full | Show usage statistics after each reply |
| /restart | Restart Gateway (in groups, owner only) |
| /activation mention\|always | Switch group activation mode |
| /elevated on\|off | Toggle elevated bash access |
| /approve | Approve pending tool execution or plugin operations |
| /acp spawn codex --bind here | Create an ACP workspace in the current session |
All technical details in this article derive from the following primary sources; no secondary community commentary was relied on:
- GitHub API (api.github.com/repos/openclaw/openclaw): repository metadata, Stars/Forks/Issues statistics, contributor ranking, and release descriptions
- AGENTS.md (repository root directory, 35,263 bytes): architectural specifications, module boundaries, build guidelines, testing strategies, release guidelines
- package.json (repository root directory): 233 exports entries (including 230 plugin-sdk sub-paths), 47 runtime dependencies, 22 development dependencies, 198 npm scripts
- Dockerfile and docker-compose.yml: complete containerized build and deployment configuration
- README.md (GitHub API base64 decoding): official feature list, channel support, installation guide, security model
- GitHub Releases API (v2026.3.31, v2026.3.28 Release Notes): Breaking changes and new feature details
After openclaw onboard completes, the system generates a minimal configuration file at ~/.openclaw/openclaw.json. When configuring by hand, the minimum viable JSON is:

```json
{
  "agent": {
    "model": "anthropic/claude-sonnet-4-6"
  }
}
```
Specifying a model is all it takes to start the Gateway; the Agent uses that model for all inference. Richer configurations can declare multi-model failover, channel access, security policy, sandbox mode, and Skills:
```json
{
  "agent": {
    "model": "openai/gpt-5.2",
    "fallbackModels": ["anthropic/claude-sonnet-4-6", "google/gemini-2.5-flash"],
    "thinkingLevel": "medium"
  },
  "agents": {
    "defaults": {
      "workspace": "~/.openclaw/workspace",
      "sandbox": { "mode": "non-main" }
    }
  },
  "channels": {
    "telegram": {
      "botToken": "123456:ABCDEF",
      "dmPolicy": "pairing",
      "allowFrom": []
    },
    "discord": {
      "token": "your-discord-bot-token",
      "dmPolicy": "pairing"
    },
    "whatsapp": {
      "allowFrom": ["+1234567890"]
    }
  },
  "browser": { "enabled": true },
  "gateway": {
    "mode": "local",
    "auth": { "mode": "token" }
  }
}
```
See docs.openclaw.ai/gateway/configuration for the complete configuration reference. Configuration files support JSON5 (comments and trailing commas allowed), a design choice made for human readability.
For Chinese users, WeChat access is a high-priority requirement. OpenClaw's WeChat support is implemented through @tencent-weixin/openclaw-weixin, an npm package officially released by Tencent and based on the iLink Bot API. This is a landmark integration, marking Tencent's official participation in the open-source AI Agent ecosystem.
Installation and activation process:
```bash
# Install the WeChat plugin
openclaw plugins install "@tencent-weixin/openclaw-weixin"

# Scan the QR code to log in
openclaw channels login --channel openclaw-weixin
```
WeChat integration currently supports private chat only, not group chat. The v2.x plugin requires OpenClaw >= 2026.3.22, and users must enable the "WeChat ClawBot plugin" inside the WeChat client (Me → Settings → Plugins), a feature Tencent is rolling out gradually.
Because this access path goes through an official npm package rather than reverse engineering, it avoids the account-ban risk faced by projects like chatgpt-on-wechat and offers better stability and compliance. The current limitation to private chat, though, also reflects Tencent's caution in opening up the WeChat ecosystem.