Operations
How to deploy, configure, fund, back up, and monitor a hosted Olbos instance. For the architecture behind these knobs, see Architecture.
What runs where
| Component | Runs on | Cost to operate |
|---|---|---|
| MCP server | the user's machine | zero — it's a client |
| SDK | inside the user's agent | zero — it's a library |
| CLI wizard | the user's machine | zero |
| API + engine watch loop | your infra | small: one container, shared volume |
| Solana RPC | a provider (Helius/Triton) | the main scaling cost |
| Dashboard | a static/Next host | negligible |
| Gas | the engine wallet | cents/day; scales with activity |
The expensive parts of running Olbos are RPC volume, audits, and keeping venue adapters correct as protocols evolve — not compute.
Deploy (Docker Compose)
The repo ships one image, two services (API + watch loop), sharing a /data
volume:
```sh
# 1. engine config + key
cp services/engine/mainnet.json.example deploy/data/engine.json
# edit engine.json: set key paths to /data/…, put the engine keypair in deploy/data/
# fund the engine wallet with a little SOL (gas + Swig/ATA rents)

# 2. up
docker compose -f deploy/docker-compose.yml up -d --build
```

The compose defaults encode the production posture: x402 mainnet rail,
OLBOS_MOCK=0, Kamino + Marginfi enabled, CORS pinned to your origins,
trust-proxy on, alert webhook passthrough. The API is healthchecked via /healthz;
the watch loop depends on the API being healthy.
The dashboard is a Next.js app — deploy it to any static host (e.g.
app.olbos.tech) with NEXT_PUBLIC_OLBOS_API and NEXT_PUBLIC_OLBOS_REQUIRE_AUTH=1.
Configuration (environment)
API
| Var | Default | Purpose |
|---|---|---|
| OLBOS_CONFIG | /data/engine.json | engine config (RPC, keys, event log, deposit mint) |
| OLBOS_RAIL | dev | x402 for real payments |
| X402_NETWORK | solana-devnet | solana for mainnet (auto-selects mainnet USDC) |
| USDC_MINT | per-network | override the settlement mint |
| FACILITATOR_URL | PayAI | the x402 facilitator |
| OLBOS_REQUIRE_AUTH | off | 1 = hosted multi-tenant (wallet sign-in, per-owner scoping) |
| OLBOS_MOCK | on | 0 disables the localnet mock venue |
| OLBOS_KAMINO / OLBOS_MARGINFI | off | 1 enables each real venue |
| OLBOS_*_RPC | config.rpc | per-venue RPC override |
| OLBOS_CORS_ORIGINS | open | comma-separated allowed origins |
| OLBOS_TRUST_PROXY | 0 | set behind a reverse proxy |
| OLBOS_ALERT_WEBHOOK | (none) | ops alert sink |
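
As a rough illustration of how these variables combine, the sketch below compares an environment against the production posture described under Deploy. It is not part of Olbos: the function name `posture_warnings` and the warning texts are hypothetical, and it assumes that "on" is encoded as `1` and that an unset variable takes the default listed in the table.

```python
import os
from typing import List, Mapping


def posture_warnings(env: Mapping[str, str]) -> List[str]:
    """Return one warning per setting that departs from the production posture.

    Unset variables fall back to the defaults listed in the API table.
    Assumption: boolean flags use "1" for on and "0" for off.
    """
    warnings: List[str] = []
    if env.get("OLBOS_RAIL", "dev") != "x402":
        warnings.append("OLBOS_RAIL is not x402, so payments are not real")
    if env.get("X402_NETWORK", "solana-devnet") != "solana":
        warnings.append("X402_NETWORK is not solana (mainnet)")
    if env.get("OLBOS_MOCK", "1") != "0":
        warnings.append("OLBOS_MOCK is on; set it to 0 to disable the mock venue")
    if env.get("OLBOS_KAMINO") != "1" and env.get("OLBOS_MARGINFI") != "1":
        warnings.append("no real venue enabled (OLBOS_KAMINO / OLBOS_MARGINFI)")
    if not env.get("OLBOS_CORS_ORIGINS"):
        warnings.append("OLBOS_CORS_ORIGINS is unset, so CORS is open")
    if env.get("OLBOS_TRUST_PROXY", "0") != "1":
        warnings.append("OLBOS_TRUST_PROXY is off; turn it on behind a reverse proxy")
    if not env.get("OLBOS_ALERT_WEBHOOK"):
        warnings.append("OLBOS_ALERT_WEBHOOK is unset, so alerts go nowhere")
    return warnings


if __name__ == "__main__":
    # Print warnings for the current process environment.
    for line in posture_warnings(os.environ):
        print("WARN:", line)
```

Run before `docker compose up`, a check like this catches a dev default that slipped into a production environment file.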
Boot guards. The API refuses to start if the rail's settlement mint ≠ the engine's deposit mint, or if a mainnet rail is paired with a non-mainnet config. "Payment is the deposit" must never silently be a lie.
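
The two boot guards can be pictured as a pair of checks that run before the server binds a port. This is a minimal sketch of that idea, not the Olbos source: `BootGuardError`, `check_boot_guards`, and the placeholder mint strings are all hypothetical names chosen for illustration.

```python
class BootGuardError(RuntimeError):
    """Raised when the rail and the engine config disagree at startup."""


def check_boot_guards(
    rail_mint: str,
    deposit_mint: str,
    rail_is_mainnet: bool,
    config_is_mainnet: bool,
) -> None:
    """Refuse to start on a mint or network split between rail and engine."""
    # Guard 1: the mint the rail settles in must be the mint the engine deposits.
    if rail_mint != deposit_mint:
        raise BootGuardError(
            f"settlement mint {rail_mint} != engine deposit mint {deposit_mint}"
        )
    # Guard 2: a mainnet rail must not be paired with a non-mainnet config.
    if rail_is_mainnet and not config_is_mainnet:
        raise BootGuardError("mainnet rail paired with a non-mainnet engine config")


if __name__ == "__main__":
    # Placeholder mint names; real values would be base58 mint addresses.
    check_boot_guards("MINT_A", "MINT_A", rail_is_mainnet=True, config_is_mainnet=True)
    try:
        check_boot_guards("MINT_A", "MINT_B", rail_is_mainnet=True, config_is_mainnet=True)
    except BootGuardError as exc:
        print("refusing to start:", exc)
```

Failing at boot rather than at the first payment keeps a misconfiguration from ever accepting funds it cannot deposit.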
Watch loop
| Var | Default | Purpose |
|---|---|---|
| OLBOS_WATCH_INTERVAL | 60000 (real venues) | ms between cycles |
| OLBOS_MOVE_COST_USDC | 0.10 | per-move cost for the break-even gate; tune to real fees |
| OLBOS_DATA_API | (none) | premium data feed (engine-as-buyer) |
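
To see why OLBOS_MOVE_COST_USDC matters, it helps to work through the arithmetic of a break-even gate. The exact rule the engine applies is not documented here, so the sketch below assumes a simple one: a move is worthwhile only if the extra yield earned over some holding horizon exceeds the per-move cost. The function names, the horizon parameter, and the APY inputs are all assumptions for illustration.

```python
def expected_gain_usdc(
    balance_usdc: float, current_apy: float, candidate_apy: float, horizon_days: float
) -> float:
    """Extra yield from moving, using simple (non-compounding) interest."""
    return balance_usdc * (candidate_apy - current_apy) * horizon_days / 365.0


def should_move(
    balance_usdc: float,
    current_apy: float,
    candidate_apy: float,
    horizon_days: float,
    move_cost_usdc: float = 0.10,  # default of OLBOS_MOVE_COST_USDC
) -> bool:
    """Assumed gate: move only if the expected gain beats the move cost."""
    gain = expected_gain_usdc(balance_usdc, current_apy, candidate_apy, horizon_days)
    return gain > move_cost_usdc


def break_even_days(
    balance_usdc: float,
    current_apy: float,
    candidate_apy: float,
    move_cost_usdc: float = 0.10,
) -> float:
    """Days the position must stay put before the move pays for itself."""
    daily_gain = balance_usdc * (candidate_apy - current_apy) / 365.0
    if daily_gain <= 0:
        return float("inf")  # a move to an equal or lower yield never breaks even
    return move_cost_usdc / daily_gain


if __name__ == "__main__":
    print(should_move(1000.0, 0.05, 0.06, horizon_days=7))  # large balance
    print(should_move(100.0, 0.05, 0.06, horizon_days=7))   # small balance
    print(break_even_days(1000.0, 0.05, 0.06))
```

Under these assumptions a 1,000 USDC position gaining one percentage point of APY recovers a 0.10 USDC move in under four days, while a 100 USDC position needs over a month. Setting the cost below real fees would let the engine churn small balances at a loss.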
RPC
Public mainnet RPC endpoints throttle: 429s were observed during testing. A treasury
engine needs a paid RPC provider (Helius/Triton) with retry/failover. Set it in
engine.json, with per-venue overrides (OLBOS_*_RPC) as needed. This is not optional for
production.
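
A minimal sketch of what "retry/failover" can mean in practice: retry each endpoint with exponential backoff, then fall through to the next one. This is not the engine's implementation. `call_with_failover`, `RpcThrottled`, and the injected `send` callable are hypothetical, and the transport is passed in so the policy stays independent of any particular HTTP or Solana client.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class RpcThrottled(Exception):
    """Raised by the transport when an endpoint answers HTTP 429."""


def call_with_failover(
    endpoints: Sequence[str],
    send: Callable[[str], T],
    retries_per_endpoint: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each endpoint in order, backing off between retries on the same one."""
    last_error: Optional[BaseException] = None
    for url in endpoints:
        for attempt in range(retries_per_endpoint):
            try:
                return send(url)
            except (RpcThrottled, OSError) as exc:
                last_error = exc
                # Back off before retrying this endpoint; skip the wait
                # when we are about to fail over to the next one.
                if attempt < retries_per_endpoint - 1:
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all RPC endpoints failed") from last_error


if __name__ == "__main__":
    def flaky_send(url: str) -> str:
        # Stand-in transport: the first endpoint is always throttled.
        if url == "https://primary.example.invalid":
            raise RpcThrottled("429")
        return "ok from " + url

    urls = ["https://primary.example.invalid", "https://backup.example.invalid"]
    print(call_with_failover(urls, flaky_send, base_delay=0.01))
```

Listing the paid provider first and a second provider as the fallback keeps a single provider outage from stalling the watch loop.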
Durability
The event log is node:sqlite on the shared volume — a single-writer, append-only
workload SQLite handles well. Operational requirements:
- Back up the volume (the .db and engine keypair) on a schedule. The event log is the engine's state; losing it loses the audit trail and the projections.
- Enable WAL and ship backups off-box.
- The watch lockfile prevents two planners; don't run two watch containers against one volume.
- Restart is safe: kill state lives in the log, so a restarted engine holds after a kill rather than resuming.
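
The backup and WAL requirements above could be met with a small script run on a schedule from outside the engine. The sketch below uses Python's standard `sqlite3` module and its online backup API, which copies a consistent snapshot while the writer keeps running. The function names and file paths are hypothetical, and the event log's actual filename is not stated in this document.

```python
import sqlite3
from pathlib import Path


def enable_wal(db_path: str) -> str:
    """Switch the database to write-ahead logging and return the active mode."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    finally:
        conn.close()


def backup_event_log(db_path: str, backup_path: str) -> None:
    """Copy a consistent snapshot of the event log to backup_path."""
    if not Path(db_path).is_file():
        # sqlite3.connect would silently create an empty database otherwise.
        raise FileNotFoundError(db_path)
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)  # online backup: safe while the engine is writing
    finally:
        dst.close()
        src.close()


if __name__ == "__main__":
    # Hypothetical paths; substitute the real event log location on the volume.
    source = "/data/events.db"
    if Path(source).is_file():
        print("journal mode:", enable_wal(source))
        backup_event_log(source, "/data/events.backup.db")
```

The snapshot still lives on the same volume after this step. Copying it, together with the engine keypair, to storage on another machine is what makes the backup off-box.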
Monitoring & alerts
- GET /healthz: wire to your uptime monitor (200/503).
- OLBOS_ALERT_WEBHOOK: fires on the events a human must see: owner kills, fail-safe kills, venue health violations, on-chain revokes, healthz failures.
- Rate limits: token buckets (300/min reads, 30/min writes) return 429; tune to your load.
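
For readers tuning the rate limits, here is a minimal token-bucket sketch showing how a per-minute budget turns into 200 or 429 responses. It is an illustration, not the API's code: the class, the `status_for` helper, and the choice of burst capacity equal to the per-minute rate are assumptions, and how the real server keys its buckets (per IP, per owner, or global) is not covered here.

```python
import time
from typing import Callable


class TokenBucket:
    """Refills continuously at rate_per_min; each request spends one token."""

    def __init__(self, rate_per_min: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = float(rate_per_min)  # assumed burst size: one minute's budget
        self.refill_per_sec = rate_per_min / 60.0
        self.tokens = self.capacity
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def status_for(bucket: TokenBucket) -> int:
    """HTTP status a rate-limited endpoint would return for the next request."""
    return 200 if bucket.allow() else 429


if __name__ == "__main__":
    reads = TokenBucket(300)   # documented read limit
    writes = TokenBucket(30)   # documented write limit
    statuses = [status_for(writes) for _ in range(31)]
    print(statuses.count(200), "allowed,", statuses.count(429), "rejected")
```

Because the bucket refills continuously, a client that exhausts the write budget regains one request every two seconds instead of waiting for a fixed window to reset.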
Serving model
Olbos is open source + hosted: the runtime is open (a custody claim is only credible if the requesting code is inspectable), and a hosted instance is operated for those who don't want to self-host. Self-hosters run the same code against their own RPC and keep everything.
Production hardening checklist
Done:
- ✅ rate limits, body-size limit, configurable CORS, trust-proxy
- ✅ /healthz, alert webhook on critical events
- ✅ watch-loop single-instance lock; restart-safe kill state
- ✅ boot guards on mint/network config splits
Before external capital:
- ⬜ external security audit — the top blocker (custody flow, session auth, rail)
- ⬜ one real settled x402 dollar end-to-end on mainnet
- ⬜ paid RPC with retry/failover
- ⬜ engine key → HSM/KMS (compromise is bounded by the Swig role, but still)
- ⬜ active reconciliation (event-log vs on-chain balance watchdog)
- ⬜ adapter drift monitoring (alert when mainnet state breaks an SDK)
- ⬜ facilitator fallback (second facilitator or self-hosted)
- ⬜ withdrawal exit-liquidity handling at high utilization
- ⬜ integration tests + CI (custody, adapters, auth)
- ⬜ legal posture (regulatory review, ToS, risk disclosures)
See Security — known limitations for the why behind each.
Next: Security · Architecture.
