Operate
This page covers running Bubo in production: deploying to a host, scheduling the poller, grading outcomes, governance reporting, and one-shot backfill.
Deploy to a host
Single-host install, three commands:
# 1. Install the package + entry points into an isolated venv.
uv tool install bubo
# 2. Lay down the per-host workspace, Codex profile, Claude settings,
# rendered cron/systemd templates, and SQLite schema.
bubo init
# 3. Verify before scheduling — non-zero exit on any missing piece.
bubo doctor
bubo init writes under $BUBO_ROOT (default ~/.local/share/bubo); pass
--root /opt/bubo to override. Init is idempotent: on an upgrade it leaves
config/env.toml alone (your edits are preserved) and refresh-overwrites the
packaged prompts/skills/plugins. Add --force to overwrite config/env.toml
or ~/.codex/config.toml too.
Remote host (SSH)
For a fleet, wrap the three lines in your config-management tool (Ansible,
Pyinfra, Salt). The old bespoke scripts/deploy-package.sh is gone — use the
uv tool install flow above.
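If you have only a handful of hosts and no config-management tool yet, a plain SSH loop over the same three commands can serve as a stopgap. This is a minimal sketch, not part of Bubo: the deploy_host and deploy_fleet helpers, the SSH override variable, and the host names are all hypothetical, and it assumes uv and the installed entry points are on PATH in non-interactive SSH sessions.

```bash
# Hypothetical fleet loop (illustrative only, not shipped with Bubo).
# The remote steps are the three install commands from "Deploy to a host".
REMOTE_STEPS='uv tool install bubo && bubo init && bubo doctor'

deploy_host() {
  # $1 is the SSH target. Set SSH=echo for a dry run that prints the command.
  "${SSH:-ssh}" "$1" "$REMOTE_STEPS"
}

deploy_fleet() {
  local failed=0 host
  for host in "$@"; do
    # Keep going after a failure so one bad host does not hide the others.
    deploy_host "$host" || {
      echo "deploy failed on $host" >&2
      failed=$((failed + 1))
    }
  done
  # Succeed only if every host passed all three steps.
  [ "$failed" -eq 0 ]
}
```

For example, `SSH=echo deploy_fleet web-01 web-02` prints the command that would run on each host without connecting. Because the steps are chained with `&&`, a failing bubo doctor on any host makes the whole loop exit non-zero.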
Upgrades
uv tool install --reinstall bubo
bubo init # idempotent — refreshes packaged assets
bubo doctor # confirm
Uninstall
uv tool uninstall bubo
# Optionally remove the per-host workspace (SQLite state lives here):
rm -rf "${BUBO_ROOT:-$HOME/.local/share/bubo}"
# Optionally remove the agent configs the installer wrote:
rm -f ~/.codex/config.toml ~/.claude/settings.json
rm -f ~/.codex/skills/code-reviewer # this is a symlink
Schedule the poller
bubo init does NOT install cron entries or systemd units. Scheduling is a separate, deliberate step: operators run Bubo under different regimes (cron, systemd, Kubernetes CronJob, Nomad, …), so the install path stays scheduler-agnostic.
bubo init materializes three ready-to-install templates under
$BUBO_ROOT/deploy/templates/. The {{ROOT}} placeholder is already
substituted with your install path, so the files are ready for sudo install /
systemctl enable with no hand-editing. Pick one of the two paths below.
Both run the same cadence (poll every 15 min, sync outcomes hourly, health
probe every 5 min) and both ship separate flock files per role (poller /
outcome-sync / health) to prevent the cross-role lock collision that broke a
production deploy in 0.5.0 (see CHANGELOG.md, [0.5.1]).
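To illustrate why one lock file per role matters, here is a minimal sketch using flock(1) from util-linux. The run_locked helper and the lock file names are assumptions for illustration; the shipped templates may name and place their lock files differently.

```bash
# Assumed lock directory; the Cron section creates /var/run/bubo.
LOCK_DIR="${LOCK_DIR:-/var/run/bubo}"

run_locked() {
  # $1 is the role name; the remaining arguments are the command to run.
  local role="$1"
  shift
  # -n: exit immediately with status 1 if this role's lock is already held,
  # instead of queueing behind the invocation that is still running.
  flock -n "$LOCK_DIR/$role.lock" "$@"
}

# Each role takes its own lock, so a slow poll cannot block the health probe:
#   run_locked poller       bubo-poller
#   run_locked outcome-sync bubo-poller --sync-outcomes
#   run_locked health       bubo-poller --health
```

With a single shared lock file, a long poll cycle would make the health probe and outcome sync skip their runs; with one file per role, only a second copy of the same role is turned away.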
Cron
# Create the lock dir owned by the runtime user (once).
sudo install -d -o bubo -g bubo -m 0755 /var/run/bubo
# Install the cron drop-in. The exact directory is distro-specific:
# Debian / Ubuntu / RHEL → /etc/cron.d/
# macOS launchd hosts → convert the entries manually (macOS has no systemd)
sudo install -m 0644 \
"$BUBO_ROOT/deploy/templates/bubo.cron" \
/etc/cron.d/bubo
The three lines fire bubo-poller (poll cycle),
bubo-poller --sync-outcomes (outcome grading), and bubo-poller --health
(liveness probe) at staggered cadences. Each invocation runs once and exits —
there's no daemon mode, so tight intervals are safe.
systemd
sudo cp "$BUBO_ROOT/deploy/templates/bubo.service" /etc/systemd/system/
sudo cp "$BUBO_ROOT/deploy/templates/bubo.timer" /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now bubo.timer
The service file uses LoadCredential= to inject secrets from
/etc/bubo/credentials/, keeping tokens out of config/env.toml on disk.
Pair it with TOML env interpolation in the config:
Verify
bubo doctor runs four non-mutating checks:
| Check | What fails it |
|---|---|
| Workspace dirs (config/, var/state, var/work, var/log) | Init never ran, or the install root moved without re-init. |
| config/env.toml present | Operator hasn't seeded the config yet. |
| SQLite DB initialized | Schema-init step skipped or DB file deleted. |
| ~/.codex/config.toml contains [profiles.bubo] | Codex profile missing or hand-edited away. This is the exact failure mode of the v0.5.0 incident. |
Pass --no-agent-config to skip the Codex profile check on hosts that
hand-roll the Codex config. Doctor returns 0 on a full pass and non-zero on any
failure, so it drops cleanly into a cron-driven liveness probe.
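Because the exit code is the whole contract, a cron-driven probe only needs to run the check and surface its output on failure. The sketch below is one way to do that; the probe helper, the script path, and the schedule in the comments are hypothetical, not something bubo init installs.

```bash
# Hypothetical wrapper (illustrative only): run a check command, stay silent
# on success, and print its output to stderr with the exit code on failure.
probe() {
  local output status
  # Capture stdout and stderr so cron only sends mail when something is wrong.
  output="$("$@" 2>&1)"
  status=$?
  if [ "$status" -ne 0 ]; then
    printf 'probe failed (exit %d): %s\n' "$status" "$output" >&2
  fi
  return "$status"
}

# Intended use, e.g. from a script that cron runs every few minutes:
#   probe bubo doctor
#   probe bubo doctor --no-agent-config   # hosts that hand-roll the Codex config
```

Staying quiet on success keeps cron mail empty in the normal case, so any message that does arrive signals a failed check.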
Troubleshooting
Host and infrastructure issues — sandbox failures, AppArmor / user-namespace errors, and "no findings" results that are not a genuine clean review — live on a dedicated page: Troubleshooting.
Outcome sync
--sync-outcomes grades posted findings against current SCM state. It runs
from the cron line bubo init materializes, so once scheduling is set up you
never call it by hand.
It records whether each finding was resolved, left unresolved after merge,
deleted, replied to, marked disputed, marked false-positive, or marked
duplicate — feeding the llm_review.findings{status=…} counter described in
telemetry.md.
Reply classification
A thread can be resolved because the developer fixed the finding or because
they rejected it ("working as intended", "not a blocker"). The resolved
flag alone can't tell them apart, which would overcount the reviewer's
precision. So when a finding's discussion has a developer reply and no explicit
dispute marker, --sync-outcomes asks an LLM to read the bot's finding plus the
reply and decide whether it was accepted or rejected. A rejection sets
disputed (and false_positive when the reply says the finding is factually
wrong).
- Model-agnostic. Classification runs the same agent you set in [agents].reviewer_command (default codex exec --profile bubo; a custom command like claude -p is reused as-is). The review contract lives in the prompt, not the command, so the same command makes the accept/reject call here.
- Classified once. Each finding is classified one time (tracked by the reply_classified column) to bound LLM cost across the hourly sync. A reply that arrives after the first classification isn't re-graded.
- Cold-start safe. At most a few dozen findings are classified per run, so the first sync after upgrading, when the whole backlog of resolved-with-reply findings is unclassified, won't fire hundreds of agent calls at once. The backlog drains over later runs.
- Fail-safe. A transient classifier failure (timeout, non-zero exit) leaves the finding unclassified and retries on a later sync; unparseable output degrades to "unclear". Either way the sync keeps going.
Prefer the explicit path? Tag a reply with [llm-review:disputed] /
[llm-review:false-positive] — an explicit marker short-circuits the LLM call.
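The routing described above can be summarized as a small decision function. This is a sketch of the documented behavior, not Bubo's code: the classify_route name, its arguments, and the strings it prints are invented for illustration, and it assumes an explicit marker is honored before the classified-once check. It does not model the per-run cap.

```bash
# Hypothetical routing helper (illustrative only).
#   $1: text of the developer reply ('' when there is no reply)
#   $2: 1 if the finding was already classified, 0 otherwise
classify_route() {
  local reply="$1" already_classified="$2"
  case "$reply" in
    '')
      echo "none" ;;            # no reply, nothing to classify
    *'[llm-review:false-positive]'*)
      echo "false_positive" ;;  # explicit marker, no LLM call
    *'[llm-review:disputed]'*)
      echo "disputed" ;;        # explicit marker, no LLM call
    *)
      if [ "$already_classified" = "1" ]; then
        echo "skip"             # classified once; later replies are not re-graded
      else
        echo "llm"              # ask the reviewer agent: accepted or rejected?
      fi ;;
  esac
}
```

For instance, `classify_route 'working as intended' 0` prints `llm`, while the same reply tagged with `[llm-review:disputed]` prints `disputed` without involving the agent.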
Governance report
bubo report assembles an auditable governance report from the metrics already
in SQLite: review counts (with a no-findings acknowledgements rollup), a
provenance breakdown (counts by band/source), the accept-vs-dispute
rate, a noise trend (daily false-positives), a bug-catch ROI proxy,
review latency (p50/p95/max/avg seconds), per-category dispute rates,
token/cost rollups, policy-decision stats (from the Phase 2
governance_decisions), and a per-change audit trail. It's the
compliance-facing companion to the provenance + governance-decision capture in
configuration.md, "Governance & provenance".
It is strictly READ-ONLY — it only queries existing state, never mutates it,
so it's safe to run from a monitoring cron alongside bubo doctor. It's also
self-hosted: the data comes from your own SQLite and never leaves your
infrastructure, which is the whole compliance pitch for regulated and enterprise
governance teams.
# Full nested report as JSON (the default).
bubo report
# A week of the per-change audit trail as CSV.
bubo report --since-hours 168 --format csv > audit.csv
Flags
| Flag | Default | What it does |
|---|---|---|
| --format {json,csv} | json | Output format. JSON is the full nested report; CSV is a single tabular section. |
| --section | audit | CSV only: which section to emit. audit (the per-change audit trail), noise_trend, or dispute_classes. Ignored for JSON. |
| --since-hours | 24 | Rolling window, in hours, for the report. |
| --since / --until | unset | ISO dates bounding a fixed audit window. Override --since-hours for a reproducible window. |
| --project | all | Restrict the report to a single project path. |
| --limit | unset | Cap the number of audit-trail rows. |
| --root | $BUBO_ROOT | Workspace root to read state from (same meaning as elsewhere). |
JSON vs CSV
- JSON (default) is the full nested report. Sections: meta, reviews (with a nested acknowledgements rollup), provenance, outcomes, noise_trend, roi, latency, dispute_classes, policy_decisions, and audit. Scalar rollups (review counts, accept/dispute rate, ROI proxy, latency, token/cost) are JSON-only: they have no tabular form.
- CSV is a single tabular section, picked with --section. The default is the per-change audit trail; --section noise_trend emits the daily false-positive trend, and --section dispute_classes the per-category dispute rates (CLI path: raw stats, no would_suppress). Use CSV for spreadsheet/BI import of the row-shaped sections.
MCP tool
The same report is available to MCP-capable chat clients (Codex, Claude Desktop,
Cline) as get_governance_report(since_hours, since, until, project). It
returns the same nested JSON as bubo report --format json, so a governance
analyst can pull the report into a chat session without shell access.
A companion get_dispute_classes(project) tool returns just the per-category
dispute stats (cumulative — they span all recorded outcomes, not a rolling
window, so the tool takes no window argument). Unlike the CLI path, it reads
your real [review].dispute_suppress_threshold / dispute_suppress_min_samples
and adds a truthful would_suppress flag per category — so an analyst can see
which classes the suppression filter would drop if enabled, computed against
your actual thresholds rather than a hardcoded guess.
Policy-decision stats are populated only when Phase 2 governance is enabled
([governance].policy_mode not off). When it's off, the policy_decisions
section reports available: false rather than empty counts, so an auditor can
tell "no decisions recorded" apart from "governance decisions were never turned
on".
Backfill: one-shot, not a cron job
The backfill commands import bot comments that already exist on the SCM into local SQLite. Reach for them when:
- You just deployed against a project where the bot has posted before from another install.
- You reset var/state/reviewer.sqlite (test, rebuild, host migration) and need the per-finding metrics to reflect history, not just go-forward.
They are deliberately off the cron schedule. Each run scans every MR/PR
updated since the cutoff — fine for a one-shot recovery, wasteful every 15
minutes. Run them once, then let --sync-outcomes take over for go-forward
grading.
# GitLab
bubo-poller --backfill-gitlab-bot-comments-since 2026-05-25T00:00:00Z
# GitHub
BUBO_PROVIDER=github bubo-poller --backfill-github-bot-comments-since 2026-05-25T00:00:00Z
# Then grade the imported rows once:
bubo-poller --sync-outcomes
Both backfill commands are idempotent — a comment already in SQLite is upserted, not duplicated — so re-running with a different cutoff is safe.
Migrating from the shell installer
The scripts/install-package.sh / scripts/deploy-package.sh shell installers
(v0.5.x and earlier) have been removed. If you're still running an old
shell-installer deployment, migrate to the uv tool install path:
# On the host that has a working shell-installer deploy:
uv tool install bubo
bubo init # idempotent — keeps env.toml
bubo doctor # confirm
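# Optional cleanup once the new install looks good.
# The default guards against an unset BUBO_ROOT expanding to /bin, /scripts, /src.
root="${BUBO_ROOT:-$HOME/.local/share/bubo}"
rm -rf "$root/bin" \
"$root/scripts" \
"$root/src" # left over from shell-installer copy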
State (var/state/reviewer.sqlite) and operator config (config/env.toml)
survive the migration — bubo init shares the same workspace layout as the old
shell installer.