Skip to content

Observability

ORCA Framework observability is a practical, file-based tracing layer for understanding what an agent run actually did. It is meant to support debugging, review, QA, and trajectory evaluation without pretending ORCA Framework needs a full telemetry platform.

Purpose

Use traces to answer questions such as:

  • What issue, project, or work item was this run acting on?
  • Which agent role performed the work?
  • What steps were taken and in what order?
  • Which tools, commands, and artifacts were involved?
  • Where did the run fail, retry, or stop?
  • What evidence exists for later review or evaluation?
  • What receipt summarizes the run?
  • Which replay or restore artifacts exist for debugging?

Trace Model

Each meaningful run should capture:

  • Run identity: run ID, session ID if available, start time, end time, elapsed time
  • Work identity: Linear issue, project, or opt-out record
  • Agent identity: command, skill, role, harness
  • Context read: artifacts, issue comments, docs, external sources
  • Actions taken: major steps, tools used, commands executed, files read or written
  • Decisions made: important branches, approvals requested, scope changes, assumptions
  • Receipt link: compact summary of the run outcome
  • Lineage link: upstream and downstream artifact relationships when tracked
  • Goal state: objective, contract link, lifecycle transition, verifier result, and steering note when goal mode is active
  • Reliability signals: retries, failures, blockers, warnings, stop reason
  • Optional metadata: token, cost, cached-token, or cache-read data when the harness exposes it

When workflow accounting is enabled, traces should link to the per-run metrics artifact described in docs/workflow-accounting.md. When Orca Monitor status export is enabled, traces and receipts should be summarized into the local snapshot described in docs/orca-monitor-status.md.

Use templates/run-trace.md as the default artifact shape and templates/contracts/trace-contract.md for the required fields. For portable machine-readable structure, use schema/versions/v1/run-trace.schema.json.

What To Trace

Trace decisions and workflow boundaries, not every keystroke.

Good trace entries include:

  • starting a spec or plan pass
  • reading a work item, spec, or QA brief
  • invoking a risky command or external tool
  • calling a governed tool or MCP server
  • deciding to stop, retry, escalate, or request approval
  • writing or updating a durable artifact
  • changing course because evidence contradicted an assumption

What Not To Trace

Do not trace:

  • secrets, tokens, credentials, or raw environment values
  • full copies of external pages when a short citation is enough
  • repetitive low-value shell noise
  • private user data that is not needed for debugging or evaluation
  • sensitive security details that should stay in a restricted report
  • raw secrets or credential values passed through tool parameters
  • large logs or transcript dumps when a targeted excerpt or linked artifact is enough
  • Orca Monitor status snapshots that claim billing, quota, account, or hosted usage truth

Traces Versus Run Memory

Run memory and traces are related but different:

  • Run memory records durable project facts, decisions, and context worth carrying forward.
  • Traces record what happened during a specific run.

Run memory answers "what should future runs remember?". Traces answer "what did this run actually do?".

See docs/run-memory.md for the durable-memory side of the model. See docs/shared-state.md for the active coordination side of the model.

How Traces Help

  • Debugging: find where a run made the wrong assumption or skipped a gate.
  • Review: show what evidence supported a change or recommendation.
  • Evals: score the trajectory, not just the final answer.
  • Handoffs: let the next agent understand what already happened.
  • Inspection: give humans enough evidence to approve or resume without rereading the entire history.
  • Replay and restore: let maintainers compare newer behavior or recover from known-good workflow states.
  • Token efficiency: show where retries, context drift, or oversized artifacts are wasting spend.

Storage Model

Start simple:

  • keep traces as Markdown artifacts linked from the work item
  • allow schema-backed companions when validation or transformation matters
  • keep the Orca Monitor status export as a tiny derived local JSON file, not as the trace itself
  • use one trace per meaningful run or phase
  • prefer inspectable files over opaque binary logs
  • keep summaries in the work item comment and detailed traces in linked artifacts when needed

Lightweight Rules

  • Trace only meaningful work.
  • Keep entries concise and evidence-oriented.
  • Record stop reason clearly.
  • Link the trace from the Linear issue or opt-out record when the run matters to project state.