AI systems and agent infrastructure

The harness is the product, not the prompt.

We build the system that survives production. State, recovery, observability, orchestration. Full stack, database to deployment.

Figure: Bonfire agent harness (compact). Labels: orchestrator, checkpoint store, state, sub-agent, recovery loop, resume. Status: idem: a1f3… · budget ok · approval req

Every business is moving onto AI, the way they once moved off paper. Anyone can ship a prompt and a demo. The hard part is the architecture that keeps an agent correct, cheap, and recoverable in the real world. That is harness design — where systems are won or lost.

  1. Paper
  2. Prompt + demo
  3. Harness

What separates a system from a feature

Reliability is a design decision, not a patch.

A comparison of two agent designs. Left, the prompt-first chain: sub-agent runs, then sub-agent dies, then orchestrator retries from the top, then duplicate charge and corrupted data. This is the default for agents built prompt-first. Right, the harness chain: checkpoint state, then sub-agent runs, then sub-agent dies, then recovering agent resumes, which loops back to its checkpoint. Failure is recoverable by construction. An execution trace records the incident: at 00.00, checkpoint written; at 00.01, tool call started; at 00.04, side effect landed; at 00.05, worker failed; at 00.06, recovery read checkpoint; at 00.06, duplicate avoided.

Prompt-first

Sub-agent runs

side effect lands — API call, row written

Sub-agent dies

mid-task, after the effect

Orchestrator retries from the top

knowing none of it

Duplicate charge · corrupted data

the agent still looks fine

The default for agents built prompt-first.

How the design holds

Checkpoint state

before any stateful operation

Sub-agent runs

idempotency token per call

skip duplicate

Sub-agent dies

failure still happens — by design

retry

Recovering agent resumes

reads checkpoint, does not repeat

resume

Failure is recoverable by construction.
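
The harness chain above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bonfire's implementation: `Ledger`, `run_step`, and the in-memory `checkpoints` dict are hypothetical names, and a real checkpoint store would be durable (a database or similar) rather than a dict.

```python
import hashlib


class Ledger:
    """Hypothetical downstream service that deduplicates on an idempotency key."""

    def __init__(self):
        self.charges = {}

    def charge(self, idem_key, amount):
        # A repeated key returns the original result instead of charging again.
        if idem_key not in self.charges:
            self.charges[idem_key] = amount
        return self.charges[idem_key]


def run_step(checkpoints, ledger, agent_id, step, amount, crash_after_effect=False):
    """Checkpoint first, then perform the side effect, then mark the step done."""
    key = hashlib.sha256(f"{agent_id}:{step}:{amount}".encode()).hexdigest()
    # Written before the stateful call; on recovery the stored record is reused.
    record = checkpoints.setdefault(step, {"idem_key": key, "status": "pending"})
    if record["status"] == "done":
        return record["result"]
    result = ledger.charge(record["idem_key"], amount)  # side effect lands here
    if crash_after_effect:
        raise RuntimeError("worker failed")  # dies mid-task, after the effect
    record.update(status="done", result=result)
    return result


checkpoints, ledger = {}, Ledger()
try:
    run_step(checkpoints, ledger, "agent-1", "charge-card", 500, crash_after_effect=True)
except RuntimeError:
    pass  # the effect landed, but the step was never marked done

# Recovery: reads the checkpoint, retries with the same key, and the ledger skips the duplicate.
result = run_step(checkpoints, ledger, "agent-1", "charge-card", 500)
```

The worker still fails in this sketch. What changes is that the retry carries the same key as the first attempt, so the downstream call is safe to repeat.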

Execution trace

  1. 00.00 checkpoint written
  2. 00.01 tool call started
  3. 00.04 side effect landed
  4. 00.05 worker failed
  5. 00.06 recovery read checkpoint
  6. 00.06 duplicate avoided

  • Idempotency tokens
  • Circuit breakers
  • Cost guards
  • budget ok
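
Two of the guards listed above, cost guards and circuit breakers, might look like the following. This is a hedged sketch: the class names, the cents-based budget, and the consecutive-failure threshold are all assumptions made for illustration, and the breaker omits the timed half-open state that production breakers usually have.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push spend past the budget."""


class CostGuard:
    """Hypothetical guard that refuses spend beyond a fixed budget (integer cents)."""

    def __init__(self, budget_cents: int) -> None:
        self.budget_cents = budget_cents
        self.spent_cents = 0

    def charge(self, cost_cents: int) -> None:
        # Check before spending, so a rejected call costs nothing.
        if self.spent_cents + cost_cents > self.budget_cents:
            raise BudgetExceeded(f"{self.spent_cents + cost_cents} > {self.budget_cents}")
        self.spent_cents += cost_cents


class CircuitBreaker:
    """Hypothetical breaker: opens after max_failures consecutive failures."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.max_failures

    def call(self, fn, *args):
        if self.open:
            # Fail fast without touching the struggling dependency.
            raise RuntimeError("circuit open: call rejected")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success resets the count
        return result
```

Both guards reject work before it starts, which is what keeps a looping agent from running up cost or hammering a failing tool.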

idem_key = sha256(agent_id + step + payload)
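
The formula above could be made concrete as follows. This is a sketch under assumptions, not the harness's actual code: the function signature and the choice of canonical JSON are illustrative. It serializes the fields instead of concatenating them, because plain concatenation can collide ("a1" followed by "2" reads the same as "a" followed by "12").

```python
import hashlib
import json


def idem_key(agent_id: str, step: int, payload: dict) -> str:
    """Derive a stable idempotency key from the call's identity (hypothetical helper)."""
    # Sorted keys and fixed separators make equal payloads serialize identically,
    # and the JSON structure keeps field boundaries unambiguous.
    canonical = json.dumps(
        {"agent_id": agent_id, "step": step, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same logical call yields the same key regardless of dict ordering.
same = idem_key("agent-1", 3, {"amount": 500, "currency": "usd"}) == idem_key(
    "agent-1", 3, {"currency": "usd", "amount": 500}
)
```

Because the key is derived rather than random, a retry can recompute it without any shared state, which is what lets the downstream service recognize and skip the duplicate.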

MRHF-Codec vs. Descript Audio Codec (DAC)

A neural audio codec we built and measured on high-frequency reconstruction at 6 kHz and above, against the Descript Audio Codec (DAC) baseline.

Metric               MRHF-Codec     Baseline (DAC)   Delta
HF-SI-SDR @ 6 kHz+   +11.64 dB      −30.44 dB        +42 dB
Bitrate              11.8% lower    baseline         −11.8%
Metrics won          7 / 10                          7 / 10

MRHF-Codec — 11.7M-param generator, trained on 275,527 files (187.4 hrs), 240+ tests across 9 modules. Source: research project benchmark.

What we build

Eight things we do, backed by shipped work.

From systems design to a research-grade ML edge.

Evidence, not adjectives

Shipped systems and open-source infrastructure.

Design judgment shows up in what survived production.

Production
A multi-tool AI-agent SaaS, web and desktop. 12+ API integrations, hardened connectors.
Ecosystem
goalkeeper on the Claude Code marketplace, plus reaper-mcp and vst-bench. Servers others run.
Day one
HIPAA-adjacent. Audit logging, on-device ML, designed in.
Research
Neural codecs and diffusion models, benchmarked.

How we work together

Three ways to bring us in.

01

Advisory

Discovery, impact-versus-effort scoring, an AI Scope Document. Senior judgment before you commit a budget.

02

Fractional CTO

Technical leadership for AI initiatives, no full-time hire.

03

Build

We design, build, and ship the system. Full stack, production-quality bar.

Whether it is your first move onto AI or keeping a production system honest, that is the work we do.

Tell us where you are with AI. We will tell you plainly if we can help.