Production Playbook
Operational playbook for running Agent Observer safely with long-running workloads.
This playbook is for teams running Agent Observer as a daily operational tool.
Operational Goals
- keep workspace scope correct
- reduce unattended failure rates
- keep long-running tasks recoverable
- make incidents diagnosable quickly
Preflight Checklist
Run before enabling recurring jobs or large todo batches:
- Verify default workspace path.
- Verify required folder permissions on macOS.
- Verify runner commands execute manually.
- Verify prompts are bounded and deterministic.
- Run one dry-run for every new automation pattern.
Safe Rollout Pattern
Phase 1: Smoke
- Use small prompts and tiny todo lists (2-3 items).
- Run manually.
- Confirm status transitions and output quality.
Phase 2: Limited
- Increase to one real workflow.
- For schedules: start with a low frequency.
- For the todo runner: use a medium list (~10 items).
Phase 3: Full
- Enable production cadence.
- Keep explicit rollback procedures.
- Monitor daily until stable.
Schedules Runbook
Use for recurring checks and summaries.
Daily checks:
- Review schedules with `error` status.
- Confirm next-run timestamps are reasonable.
- Disable tasks with repeated noisy failures.
- Update prompts to reduce ambiguity.
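The daily checks above could be partly automated. The following is a minimal sketch, assuming a hypothetical `ScheduleRecord` shape: the field names, the failure threshold, and the one-week "reasonable" window are illustrative assumptions, not Agent Observer's actual schema or defaults.

```typescript
// Hypothetical record shape; the real schedule schema may differ.
interface ScheduleRecord {
  id: string;
  status: "ok" | "error" | "disabled";
  consecutiveFailures: number;
  nextRunAt: number | null; // epoch milliseconds, null if nothing is scheduled
}

interface ScheduleReview {
  errored: string[];           // schedules currently in error status
  suspiciousNextRun: string[]; // next run missing, in the past, or too far out
  disableCandidates: string[]; // repeated failures at or above the threshold
}

function reviewSchedules(
  schedules: ScheduleRecord[],
  now: number,
  maxFailures = 3,                        // assumed threshold for "repeated" failures
  maxHorizonMs = 7 * 24 * 60 * 60 * 1000  // assumed "reasonable" window: one week
): ScheduleReview {
  // Disabled schedules are skipped; they are already out of rotation.
  const active = schedules.filter((s) => s.status !== "disabled");
  return {
    errored: active.filter((s) => s.status === "error").map((s) => s.id),
    suspiciousNextRun: active
      .filter(
        (s) =>
          s.nextRunAt === null ||
          s.nextRunAt < now ||
          s.nextRunAt - now > maxHorizonMs
      )
      .map((s) => s.id),
    disableCandidates: active
      .filter((s) => s.consecutiveFailures >= maxFailures)
      .map((s) => s.id),
  };
}
```

The function only reports candidates; disabling a schedule or rewriting its prompt remains a manual decision.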
Todo Runner Runbook
Use for executing large, finite backlogs.
Per job checks:
- Review progress (`completed / total`).
- Review failed/blocked counts.
- Resolve root cause before reset/replay.
- Resume from current item whenever possible.
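The per-job checks above can be sketched as a small summary function. The `TodoItem` shape and status names are assumptions made for illustration, and "resume from current item" is interpreted here as "the first item that is not yet completed".

```typescript
// Hypothetical status values; the real runner may use different names.
type TodoStatus = "pending" | "completed" | "failed" | "blocked";

interface TodoItem {
  id: string;
  status: TodoStatus;
}

interface JobSummary {
  progress: string;    // formatted as "completed / total"
  failed: number;
  blocked: number;
  resumeIndex: number; // index of the first non-completed item, -1 when all are done
}

function summarizeJob(items: TodoItem[]): JobSummary {
  const count = (status: TodoStatus): number =>
    items.filter((item) => item.status === status).length;
  return {
    progress: `${count("completed")} / ${items.length}`,
    failed: count("failed"),
    blocked: count("blocked"),
    resumeIndex: items.findIndex((item) => item.status !== "completed"),
  };
}
```

A non-zero `failed` or `blocked` count is a cue to find the root cause first; `resumeIndex` only says where a resume would start.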
Incident Response
Severity 1: Wrong workspace modifications
- Pause active runner/schedule.
- Snapshot current git state.
- Re-scope workspace paths.
- Re-run with strict path constraints.
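One way to express a "strict path constraint" is a containment check applied to every path a job intends to touch. The helper names below are hypothetical, and the sketch assumes absolute POSIX-style paths (as on macOS). It does not resolve symlinks, so a real guard would also need to compare resolved filesystem paths.

```typescript
// Collapse ".", "..", and duplicate slashes in an absolute POSIX-style path.
function normalizePosix(p: string): string {
  const out: string[] = [];
  for (const segment of p.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") out.pop();
    else out.push(segment);
  }
  return "/" + out.join("/");
}

// True when `target` is the workspace root itself or lies beneath it.
function isInsideWorkspace(workspaceRoot: string, target: string): boolean {
  const root = normalizePosix(workspaceRoot);
  const resolved = normalizePosix(target);
  if (root === "/") return true; // everything is under the filesystem root
  // Appending "/" prevents "/a/proj" from matching "/a/project2".
  return resolved === root || resolved.startsWith(root + "/");
}
```

For example, with a root of `/Users/me/proj`, the path `/Users/me/proj/../other/x` normalizes to `/Users/me/other/x` and is rejected.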
Severity 2: Persistent runner failures
- Reproduce one failing item manually.
- Capture stderr and dependency errors.
- Patch runner or prompt contract.
- Resume from remaining items.
Severity 3: UI/state inconsistency
- Restart app.
- Reload one workspace only.
- Confirm persisted job/schedule state.
- Re-enable automations gradually.
Prompt Engineering Rules For Ops
Use prompts that include:
- scope boundary
- expected deliverable format
- explicit stop conditions
- explicit failure output
Avoid prompts that:
- request broad speculative changes
- span multiple repositories without explicit boundaries
- omit success criteria
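These rules can be treated as a lightweight prompt contract that is checked before a prompt is scheduled. The `OpsPrompt` shape below is a hypothetical structure used only for illustration; Agent Observer prompts are not known to be stored this way.

```typescript
// Hypothetical structured form of an ops prompt.
interface OpsPrompt {
  task: string;
  scopeBoundary?: string;     // e.g. which directory or repository may be touched
  deliverableFormat?: string; // e.g. "markdown summary" or "unified diff"
  stopConditions?: string[];  // when the agent should stop working
  failureOutput?: string;     // what to emit when the task cannot be completed
}

// Return the names of required contract fields that are missing or empty.
function missingPromptFields(prompt: OpsPrompt): string[] {
  const missing: string[] = [];
  if (!prompt.scopeBoundary?.trim()) missing.push("scopeBoundary");
  if (!prompt.deliverableFormat?.trim()) missing.push("deliverableFormat");
  if (!prompt.stopConditions || prompt.stopConditions.length === 0) {
    missing.push("stopConditions");
  }
  if (!prompt.failureOutput?.trim()) missing.push("failureOutput");
  return missing;
}
```

An empty result means the prompt carries all four elements listed above. It says nothing about whether their content is good.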
Weekly Reliability Review
Track:
- schedule success/error ratio
- todo runner completion time by batch size
- top recurring failure causes
- mean time to recovery
Use this review to tighten prompts, runner contracts, and scope defaults.
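Two of the tracked numbers are simple to compute from exported run and incident records. This sketch assumes hypothetical `RunOutcome` and `Incident` shapes, and it reports the schedule ratio as the fraction of successful runs.

```typescript
// Hypothetical shapes for exported records; timestamps are epoch milliseconds.
interface RunOutcome {
  ok: boolean;
}

interface Incident {
  startedAt: number;
  recoveredAt: number;
}

// Fraction of runs that succeeded, or null when there is nothing to measure.
function successRatio(runs: RunOutcome[]): number | null {
  if (runs.length === 0) return null;
  return runs.filter((run) => run.ok).length / runs.length;
}

// Mean time to recovery in milliseconds, or null when there were no incidents.
function meanTimeToRecoveryMs(incidents: Incident[]): number | null {
  if (incidents.length === 0) return null;
  const total = incidents.reduce(
    (sum, incident) => sum + (incident.recoveredAt - incident.startedAt),
    0
  );
  return total / incidents.length;
}
```

Returning `null` for empty input keeps a quiet week distinguishable from a week with a 0% success rate or an instant recovery.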
Install Telemetry Data Quality Runbook
Use this runbook to validate production install-count accuracy.
Daily Checks
- Verify `GET /api/world-state` returns `installSource.kind = production` when production mode is enabled.
- Verify `installCount` is non-decreasing unless retention policy intentionally prunes old records.
- Verify ingest success ratio from `POST /api/install-beacon` is healthy (`202` expected, `429` monitored).
- Verify duplicate ingestion does not increase `uniqueInstallCount`.
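The first three daily checks can be evaluated over values the operator has already collected. The sketch below assumes a hypothetical flattened `WorldStateSample` record, with one entry per poll of `GET /api/world-state`, and separately tallied response counts for `POST /api/install-beacon`. It does not assume anything about the real response layout beyond the field names mentioned above.

```typescript
// Hypothetical flattened sample, one per poll of the world-state endpoint.
interface WorldStateSample {
  installSourceKind: string; // value observed for installSource.kind
  installCount: number;
}

// Indexes of samples whose installCount dropped relative to the previous sample.
function findInstallCountRegressions(samples: WorldStateSample[]): number[] {
  const regressions: number[] = [];
  for (let i = 1; i < samples.length; i++) {
    if (samples[i].installCount < samples[i - 1].installCount) {
      regressions.push(i);
    }
  }
  return regressions;
}

// True when every sample reports the production install source.
function allProduction(samples: WorldStateSample[]): boolean {
  return samples.every((s) => s.installSourceKind === "production");
}

// Share of beacon posts answered with 202, or null when nothing was posted.
function ingestAcceptRatio(
  accepted202: number,
  rateLimited429: number,
  otherResponses: number
): number | null {
  const total = accepted202 + rateLimited429 + otherResponses;
  return total === 0 ? null : accepted202 / total;
}
```

A reported regression is not automatically an incident, since an intentional retention prune also lowers `installCount`. What counts as a "healthy" accept ratio is left to the team to define.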
Drift / Incident Checks
- Post the same beacon payload twice and verify the second response has `duplicate: true`.
- Post a malformed payload and verify schema rejection (`400`).
- Burst test one installation hash and verify rate-limit behavior (`429` with `Retry-After`).
- Confirm aggregate store path and write permissions for `AGENT_OBSERVER_INSTALL_BEACON_STORE_FILE`.
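After running the three HTTP probes by hand or from a script, the captured responses can be judged in one place. The `ProbeResponse` and `DriftProbeResults` shapes are assumptions for this sketch. Only the expected values (`duplicate: true`, `400`, and `429` with `Retry-After`) come from the checks above.

```typescript
// Hypothetical capture of one HTTP response from a probe.
interface ProbeResponse {
  status: number;
  duplicate?: boolean;        // parsed from the response body, if present
  retryAfter?: string | null; // Retry-After header value, if present
}

interface DriftProbeResults {
  secondDuplicatePost: ProbeResponse; // second post of an identical payload
  malformedPost: ProbeResponse;       // post of a schema-invalid payload
  burstFinalPost: ProbeResponse;      // last response of the burst test
}

// Return a human-readable list of failed expectations; empty means all passed.
function evaluateDriftProbes(results: DriftProbeResults): string[] {
  const failures: string[] = [];
  if (results.secondDuplicatePost.duplicate !== true) {
    failures.push("second identical post was not flagged duplicate: true");
  }
  if (results.malformedPost.status !== 400) {
    failures.push(`malformed payload returned ${results.malformedPost.status}, expected 400`);
  }
  if (results.burstFinalPost.status !== 429) {
    failures.push(`burst returned ${results.burstFinalPost.status}, expected 429`);
  } else if (!results.burstFinalPost.retryAfter) {
    failures.push("429 response is missing Retry-After");
  }
  return failures;
}
```

Keeping the evaluation separate from the HTTP calls means the same function works on responses captured from any client.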
Release Gate
Before shipping telemetry changes:
- Run `tests/smoke/install-beacon.spec.ts`.
- Run `tests/smoke/install-beacon-backend.spec.ts`.
- Confirm `reference/install-telemetry` is updated.
- Confirm `reference/world-state-api` is updated.
- Confirm `reference/troubleshooting` is updated.
Suggested Team Ownership
- Doc owner: updates runbooks after feature changes
- Ops owner: reviews recurring failures and remediation
- Release owner: validates automation behavior before releases