Observability

ORCA Framework observability is a practical, file-based tracing layer for understanding what an agent run actually did. It supports debugging, review, QA, and trajectory evaluation without requiring a full telemetry platform.

Purpose

Use traces to answer questions such as:

What issue, project, or work item was this run acting on?
Which agent role performed the work?
What steps were taken and in what order?
Which tools, commands, and artifacts were involved?
Where did the run fail, retry, or stop?
What evidence exists for later review or evaluation?
What receipt summarizes the run?
Which replay or restore artifacts exist for debugging?

Trace Model

Each meaningful run should capture:

Run identity: run ID, session ID if available, start time, end time, elapsed time
Work identity: Linear issue, project, or opt-out record
Agent identity: command, skill, role, harness
Context read: artifacts, issue comments, docs, external sources
Actions taken: major steps, tools used, commands executed, files read or written
Decisions made: important branches, approvals requested, scope changes, assumptions
Receipt link: link to the receipt, a compact summary of the run outcome
Lineage link: upstream and downstream artifact relationships when tracked
Goal state: objective, contract link, lifecycle transition, verifier result, and steering note when goal mode is active
Reliability signals: retries, failures, blockers, warnings, stop reason
Optional metadata: token, cost, cached-token, or cache-read data when the harness exposes it

When workflow accounting is enabled, traces should link to the per-run metrics artifact described in docs/workflow-accounting.md. When Orca Monitor status export is enabled, traces and receipts should be summarized into the local snapshot described in docs/orca-monitor-status.md.

Use templates/run-trace.md as the default artifact shape and templates/contracts/trace-contract.md for the required fields. For portable machine-readable structure, use schema/versions/v1/run-trace.schema.json.
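
As a rough illustration of the trace model above, the fields could be held in a single record like the one below. This is a minimal sketch, not the shape defined by templates/run-trace.md or schema/versions/v1/run-trace.schema.json: the class name `RunTrace` and every field name are assumptions chosen for this example.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RunTrace:
    """Hypothetical in-memory shape for one run trace (field names are assumptions)."""

    # Run identity
    run_id: str
    started_at: datetime
    ended_at: Optional[datetime] = None
    session_id: Optional[str] = None
    # Work identity: a Linear issue, a project, or an opt-out record
    work_item: str = ""
    # Agent identity
    command: str = ""
    skill: str = ""
    role: str = ""
    harness: str = ""
    # Context, actions, and decisions
    context_read: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    # Links to related artifacts
    receipt_link: Optional[str] = None
    lineage_links: list[str] = field(default_factory=list)
    # Goal state, only when goal mode is active
    goal_state: Optional[dict[str, str]] = None
    # Reliability signals
    retries: int = 0
    failures: list[str] = field(default_factory=list)
    stop_reason: Optional[str] = None
    # Optional metadata, only when the harness exposes it
    token_usage: Optional[dict[str, int]] = None

    def elapsed_seconds(self) -> Optional[float]:
        """Elapsed time is derived from start and end, so it cannot drift."""
        if self.ended_at is None:
            return None
        return (self.ended_at - self.started_at).total_seconds()

    def to_json(self) -> str:
        """Serialize to JSON with ISO 8601 timestamps."""
        data = asdict(self)
        data["started_at"] = self.started_at.isoformat()
        data["ended_at"] = self.ended_at.isoformat() if self.ended_at else None
        data["elapsed_seconds"] = self.elapsed_seconds()
        return json.dumps(data, indent=2)


# Example with made-up identifiers.
example = RunTrace(
    run_id="run-001",
    started_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    ended_at=datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc),
    work_item="EXAMPLE-1",
    role="reviewer",
    stop_reason="completed",
)
```

Deriving elapsed time instead of storing it keeps the record internally consistent. Optional fields default to empty so a harness that exposes no token or cost data can still produce a valid record.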

What To Trace

Trace decisions and workflow boundaries, not every keystroke.

Good trace entries include:

starting a spec or plan pass
reading a work item, spec, or QA brief
invoking a risky command or external tool
calling a governed tool or MCP server
deciding to stop, retry, escalate, or request approval
writing or updating a durable artifact
changing course because evidence contradicted an assumption
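
One way to keep a trace focused on boundaries and decisions is to classify events and record only the kinds listed above. The sketch below assumes a hypothetical `EventKind` enum and `record` helper; neither is part of ORCA Framework.

```python
from enum import Enum


class EventKind(Enum):
    """Hypothetical event kinds, mirroring the list of good trace entries."""

    PASS_STARTED = "pass_started"              # starting a spec or plan pass
    ARTIFACT_READ = "artifact_read"            # reading a work item, spec, or QA brief
    RISKY_COMMAND = "risky_command"            # risky command or external tool
    GOVERNED_TOOL_CALL = "governed_tool_call"  # governed tool or MCP server
    CONTROL_DECISION = "control_decision"      # stop, retry, escalate, request approval
    ARTIFACT_WRITTEN = "artifact_written"      # durable artifact written or updated
    COURSE_CHANGE = "course_change"            # evidence contradicted an assumption
    SHELL_NOISE = "shell_noise"                # low-value output, not traced


# Everything except low-value noise is worth a trace entry.
TRACEABLE = frozenset(EventKind) - {EventKind.SHELL_NOISE}


def record(entries: list[dict], kind: EventKind, summary: str) -> bool:
    """Append an entry only for traceable kinds; return whether it was kept."""
    if kind not in TRACEABLE:
        return False
    entries.append({"step": len(entries) + 1, "kind": kind.value, "summary": summary})
    return True


entries: list[dict] = []
record(entries, EventKind.ARTIFACT_READ, "Read the QA brief")
record(entries, EventKind.SHELL_NOISE, "ls output")  # dropped
record(entries, EventKind.CONTROL_DECISION, "Stopped to request approval")
```

Step numbers come from the list length, so the order of entries answers "what steps were taken and in what order" without a separate counter.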

What Not To Trace

Do not trace:

secrets, tokens, credentials, or raw environment values
full copies of external pages when a short citation is enough
repetitive low-value shell noise
private user data that is not needed for debugging or evaluation
sensitive security details that should stay in a restricted report
raw secrets or credential values passed through tool parameters
large logs or transcript dumps when a targeted excerpt or linked artifact is enough
Orca Monitor status snapshots that claim billing, quota, account, or hosted usage truth
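
The exclusions above can be partly enforced in code by scrubbing tool parameters before they are written to a trace. The following is a minimal sketch under stated assumptions: the key-name pattern, the 500-character excerpt limit, and the `redact_params` helper are all hypothetical. Matching on key names is a heuristic. It does not detect secrets embedded inside values, and it will also redact harmless keys such as a token count.

```python
import re

# Assumed pattern for parameter names that likely carry secrets.
SENSITIVE_KEY = re.compile(
    r"(secret|token|password|passwd|credential|api[_-]?key|authorization)",
    re.IGNORECASE,
)
# Assumed cap for long values; a linked artifact should hold the full text.
MAX_EXCERPT_CHARS = 500


def redact_params(params: dict) -> dict:
    """Return a copy of tool parameters that is safer to write into a trace."""
    safe = {}
    for key, value in params.items():
        if SENSITIVE_KEY.search(str(key)):
            safe[key] = "[REDACTED]"
        elif isinstance(value, dict):
            # Recurse so nested headers or option maps are scrubbed too.
            safe[key] = redact_params(value)
        elif isinstance(value, str) and len(value) > MAX_EXCERPT_CHARS:
            # Keep a targeted excerpt instead of a full log or transcript dump.
            safe[key] = value[:MAX_EXCERPT_CHARS] + " [truncated]"
        else:
            safe[key] = value
    return safe


cleaned = redact_params(
    {
        "api_key": "not-a-real-key",
        "path": "docs/run-memory.md",
        "headers": {"Authorization": "Bearer not-a-real-token"},
    }
)
```

The function returns a new dictionary instead of mutating its input, so the tool call itself still receives the original parameters.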

Traces Versus Run Memory

Run memory and traces are related but different:

Run memory records durable project facts, decisions, and context worth carrying forward.
Traces record what happened during a specific run.

Run memory answers "what should future runs remember?" Traces answer "what did this run actually do?"

See docs/run-memory.md for the durable-memory side of the model. See docs/shared-state.md for the active coordination side of the model.

How Traces Help

Debugging: find where a run made the wrong assumption or skipped a gate.
Review: show what evidence supported a change or recommendation.
Evals: score the trajectory, not just the final answer.
Handoffs: let the next agent understand what already happened.
Inspection: give humans enough evidence to approve or resume without rereading the entire history.
Replay and restore: let maintainers compare newer behavior or recover from known-good workflow states.
Token efficiency: show where retries, context drift, or oversized artifacts are wasting spend.

Storage Model

Start simple:

keep traces as Markdown artifacts linked from the work item
allow schema-backed companions when validation or transformation matters
keep the Orca Monitor status export as a tiny derived local JSON file, not as the trace itself
use one trace per meaningful run or phase
prefer inspectable files over opaque binary logs
keep summaries in the work item comment and detailed traces in linked artifacts when needed
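
The storage approach above can be sketched as one inspectable Markdown file per run. The layout below is illustrative only and does not reproduce templates/run-trace.md; the function names, the heading structure, and the `<run_id>.md` file naming are assumptions.

```python
from pathlib import Path


def render_trace_markdown(
    run_id: str, work_item: str, steps: list[str], stop_reason: str
) -> str:
    """Render a small, human-readable trace (hypothetical layout)."""
    lines = [
        f"# Run trace: {run_id}",
        "",
        f"- Work item: {work_item}",
        f"- Stop reason: {stop_reason}",
        "",
        "## Steps",
        "",
    ]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines) + "\n"


def write_trace(trace_dir: Path, run_id: str, markdown: str) -> Path:
    """Write one trace file per run and return its path for linking."""
    trace_dir.mkdir(parents=True, exist_ok=True)
    path = trace_dir / f"{run_id}.md"
    path.write_text(markdown, encoding="utf-8")
    return path


markdown = render_trace_markdown(
    run_id="run-001",
    work_item="EXAMPLE-1",
    steps=["Read the spec", "Wrote the plan artifact"],
    stop_reason="completed",
)
```

Returning the path from `write_trace` makes it straightforward to post a short summary in the work item comment with a link to the detailed file.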

Lightweight Rules

Trace only meaningful work.
Keep entries concise and evidence-oriented.
Record the stop reason clearly.
Link the trace from the Linear issue or opt-out record when the run matters to project state.