Production Playbook

Operational playbook for running Agent Observer safely with long-running workloads.

This playbook is for teams running Agent Observer as a daily operational tool.

Operational Goals

keep workspace scope correct
reduce unattended failure rates
keep long-running tasks recoverable
make incidents diagnosable quickly

Preflight Checklist

Run these checks before enabling recurring jobs or large todo batches:

Verify default workspace path.
Verify required folder permissions in macOS.
Verify runner commands execute manually.
Verify prompts are bounded and deterministic.
Run one dry run for every new automation pattern.

Safe Rollout Pattern

Phase 1: Smoke

Use small prompts and tiny todo lists (2-3 items).
Run manually.
Confirm status transitions and output quality.

Phase 2: Limited

Increase to one real workflow.
For schedules: start with a low frequency.
For the todo runner: use a medium-sized list (about 10 items).

Phase 3: Full

Enable production cadence.
Keep explicit rollback procedures.
Monitor daily until stable.

Schedules Runbook

Use schedules for recurring checks and summaries.

Daily checks:

Review schedules with error status.
Confirm next-run timestamps are reasonable.
Disable tasks with repeated noisy failures.
Update prompts to reduce ambiguity.
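
Part of the daily review can be scripted. The following is a minimal sketch, assuming schedule state can be exported as plain objects. The `ScheduleRecord` shape, its field names, and the failure threshold are hypothetical and are not Agent Observer's actual data model. "Reasonable" next-run timestamps are interpreted here as "not in the past for an enabled schedule", which is also an assumption.

```typescript
// Hypothetical record shape; Agent Observer's real schedule state may differ.
interface ScheduleRecord {
  id: string;
  status: "ok" | "error" | "disabled";
  nextRunAt: number; // epoch milliseconds
  consecutiveFailures: number;
}

interface ScheduleFinding {
  id: string;
  reason: "error-status" | "stale-next-run" | "repeated-failures";
}

// Flags enabled schedules that need attention during the daily review.
function reviewSchedules(
  schedules: ScheduleRecord[],
  now: number,
  maxConsecutiveFailures = 3, // assumed threshold for "repeated noisy failures"
): ScheduleFinding[] {
  const findings: ScheduleFinding[] = [];
  for (const s of schedules) {
    if (s.status === "disabled") continue;
    if (s.status === "error") {
      findings.push({ id: s.id, reason: "error-status" });
    }
    // A next-run timestamp in the past suggests the schedule is not advancing.
    if (s.nextRunAt < now) {
      findings.push({ id: s.id, reason: "stale-next-run" });
    }
    if (s.consecutiveFailures >= maxConsecutiveFailures) {
      findings.push({ id: s.id, reason: "repeated-failures" });
    }
  }
  return findings;
}
```

A schedule flagged with `repeated-failures` is a candidate for disabling until its prompt has been tightened.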

Todo Runner Runbook

Use the todo runner to execute large, finite backlogs.

Per-job checks:

Review progress (completed / total).
Review failed/blocked counts.
Resolve the root cause before resetting or replaying the job.
Resume from current item whenever possible.
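
The per-job numbers can be derived from a list of item states. This is a sketch under the assumption that each todo item carries one of four states; the state names and the `summarizeJob` helper are hypothetical, not part of the Todo Runner itself.

```typescript
// Hypothetical item states; the real Todo Runner state names may differ.
type ItemStatus = "pending" | "completed" | "failed" | "blocked";

interface JobSummary {
  completed: number;
  total: number;
  failed: number;
  blocked: number;
  resumeIndex: number; // index of the first item that is not completed, or -1 if all are done
}

// Summarizes progress and finds where a resumed run should start.
function summarizeJob(items: ItemStatus[]): JobSummary {
  const count = (status: ItemStatus): number =>
    items.filter((s) => s === status).length;
  return {
    completed: count("completed"),
    total: items.length,
    failed: count("failed"),
    blocked: count("blocked"),
    resumeIndex: items.findIndex((s) => s !== "completed"),
  };
}
```

Resuming at `resumeIndex` instead of index 0 avoids re-running items that already completed.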

Incident Response

Severity 1: Wrong workspace modifications

Pause active runner/schedule.
Snapshot current git state.
Re-scope workspace paths.
Re-run with strict path constraints.
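
One way to express a strict path constraint is a containment check that runs before any write. The sketch below is illustrative only: `normalizePosix` and `isInsideWorkspace` are hypothetical helpers, they assume absolute POSIX paths (as on macOS), and they do not resolve symlinks, so a real guard would need to handle those separately.

```typescript
// Resolves "." and ".." segments in an absolute POSIX path without touching the filesystem.
function normalizePosix(path: string): string {
  const out: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") out.pop();
    else out.push(segment);
  }
  return "/" + out.join("/");
}

// Returns true only when `target` is the workspace root or lies inside it.
function isInsideWorkspace(workspaceRoot: string, target: string): boolean {
  // Reject relative paths: their meaning depends on the current directory.
  if (!workspaceRoot.startsWith("/") || !target.startsWith("/")) return false;
  const root = normalizePosix(workspaceRoot);
  const resolved = normalizePosix(target);
  if (root === "/") return true;
  // Compare with a trailing slash so "/a/proj-backup" is not treated as inside "/a/proj".
  return resolved === root || resolved.startsWith(root + "/");
}
```

Checking the normalized path rather than the raw string closes the common `..` escape, which is the kind of mistake that leads to wrong-workspace modifications.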

Severity 2: Persistent runner failures

Reproduce one failing item manually.
Capture stderr and dependency errors.
Patch runner or prompt contract.
Resume from remaining items.

Severity 3: UI/state inconsistency

Restart app.
Reload one workspace only.
Confirm persisted job/schedule state.
Re-enable automations gradually.

Prompt Engineering Rules For Ops

Use prompts that include:

scope boundary
expected deliverable format
explicit stop conditions
explicit failure output

Avoid prompts that:

request broad speculative changes
span multiple repositories without explicit boundaries
omit success criteria
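
The four required elements can be enforced with a small template helper so that no operational prompt ships without them. This is a sketch; `OpsPromptContract` and `buildOpsPrompt` are hypothetical names chosen for illustration, and the output layout is one possible format, not one that Agent Observer prescribes.

```typescript
// Hypothetical contract type; field names are illustrative, not an Agent Observer API.
interface OpsPromptContract {
  task: string;
  scopeBoundary: string; // the only paths or repositories the agent may touch
  deliverableFormat: string; // what the output must look like
  stopConditions: string[]; // when the agent must stop
  failureOutput: string; // what to emit when the task cannot be completed
}

// Renders a prompt that always states scope, deliverable, stop conditions, and failure output.
function buildOpsPrompt(c: OpsPromptContract): string {
  if (c.stopConditions.length === 0) {
    throw new Error("at least one explicit stop condition is required");
  }
  return [
    `Task: ${c.task}`,
    `Scope: ${c.scopeBoundary}`,
    `Deliverable: ${c.deliverableFormat}`,
    "Stop when:",
    ...c.stopConditions.map((s) => `- ${s}`),
    `On failure, output exactly: ${c.failureOutput}`,
  ].join("\n");
}
```

Making the fields mandatory in the type means a prompt that omits success or failure criteria fails at build time instead of during an unattended run.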

Weekly Reliability Review

Track:

schedule success/error ratio
todo runner completion time by batch size
top recurring failure causes
mean time to recovery

Use this review to tighten prompts, runner contracts, and scope defaults.
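
Two of the tracked metrics reduce to simple arithmetic. The sketch below assumes run outcomes and incident windows are collected by hand or exported from logs; the `Incident` shape and both function names are hypothetical.

```typescript
// Hypothetical incident window collected during the week.
interface Incident {
  startedAt: number; // epoch ms when the failure began
  recoveredAt: number; // epoch ms when normal operation resumed
}

// Fraction of schedule runs that succeeded; returns 1 when there were no runs (an assumed convention).
function successRatio(successes: number, errors: number): number {
  const total = successes + errors;
  return total === 0 ? 1 : successes / total;
}

// Mean time to recovery in milliseconds; returns 0 when there were no incidents.
function meanTimeToRecoveryMs(incidents: Incident[]): number {
  if (incidents.length === 0) return 0;
  const totalMs = incidents.reduce(
    (sum, i) => sum + (i.recoveredAt - i.startedAt),
    0,
  );
  return totalMs / incidents.length;
}
```

Comparing these values week over week shows whether prompt and scope changes are actually reducing failures.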

Install Telemetry Data Quality Runbook

Use this runbook to validate production install-count accuracy.

Daily Checks

Verify GET /api/world-state returns installSource.kind = production when production mode is enabled.
Verify installCount is non-decreasing unless retention policy intentionally prunes old records.
Verify that POST /api/install-beacon returns 202 for accepted beacons and that the ingest success ratio is healthy; monitor the rate of 429 responses.
Verify duplicate ingestion does not increase uniqueInstallCount.
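
The first two daily checks can be applied to consecutive snapshots of the world-state response. This sketch assumes `installSource.kind` and `installCount` appear in the JSON body as shown; the exact response layout, the `WorldStateSnapshot` type, and `checkWorldState` are assumptions made for illustration.

```typescript
// Assumed subset of the GET /api/world-state response; only the fields named in the checklist.
interface WorldStateSnapshot {
  installSource: { kind: string };
  installCount: number;
}

// Compares today's snapshot with the previous one and returns a list of problems (empty if none).
function checkWorldState(
  previous: WorldStateSnapshot | null,
  current: WorldStateSnapshot,
  retentionPruned = false, // set to true when a retention prune is known to have run
): string[] {
  const problems: string[] = [];
  if (current.installSource.kind !== "production") {
    problems.push(
      `installSource.kind is "${current.installSource.kind}", expected "production"`,
    );
  }
  if (
    previous !== null &&
    current.installCount < previous.installCount &&
    !retentionPruned
  ) {
    problems.push(
      `installCount decreased from ${previous.installCount} to ${current.installCount}`,
    );
  }
  return problems;
}
```

Storing one snapshot per day is enough to run this comparison.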

Drift / Incident Checks

Post the same beacon payload twice and verify the second response has duplicate: true.
Post a malformed payload and verify schema rejection (400).
Burst test one installation hash and verify rate-limit behavior (429 with Retry-After).
Confirm aggregate store path and write permissions for AGENT_OBSERVER_INSTALL_BEACON_STORE_FILE.
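
The three HTTP checks above can be evaluated from responses captured with any HTTP client. The sketch below only evaluates already-collected responses and makes no network calls. `ObservedResponse` and `evaluateDriftChecks` are hypothetical, and the assumption that `duplicate` is a top-level boolean in the response body comes from the checklist wording, not from an API reference.

```typescript
// Minimal view of a captured response from POST /api/install-beacon.
interface ObservedResponse {
  status: number;
  duplicate?: boolean; // parsed from the JSON body, if present
  retryAfter?: string | null; // value of the Retry-After header, if present
}

interface DriftObservations {
  secondIdenticalPost: ObservedResponse; // response to the repeated payload
  malformedPost: ObservedResponse; // response to the malformed payload
  lastBurstPost: ObservedResponse; // final response of the burst test
}

// Returns a list of failed expectations (empty when all drift checks pass).
function evaluateDriftChecks(o: DriftObservations): string[] {
  const failures: string[] = [];
  if (o.secondIdenticalPost.duplicate !== true) {
    failures.push("second identical post was not marked duplicate: true");
  }
  if (o.malformedPost.status !== 400) {
    failures.push(`malformed payload returned ${o.malformedPost.status}, expected 400`);
  }
  if (o.lastBurstPost.status !== 429) {
    failures.push(`burst returned ${o.lastBurstPost.status}, expected 429`);
  } else if (!o.lastBurstPost.retryAfter) {
    failures.push("429 response is missing Retry-After");
  }
  return failures;
}
```

Keeping the evaluation separate from the HTTP calls makes the same checks usable against a staging or production endpoint.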

Release Gate

Before shipping telemetry changes:

Run tests/smoke/install-beacon.spec.ts.
Run tests/smoke/install-beacon-backend.spec.ts.
Confirm reference/install-telemetry is updated.
Confirm reference/world-state-api is updated.
Confirm reference/troubleshooting is updated.

Suggested Team Ownership

Doc owner: updates runbooks after feature changes
Ops owner: reviews recurring failures and remediation
Release owner: validates automation behavior before releases
