AI systems and agent infrastructure

The harness is the product, not the prompt.

We build the system that survives production. State, recovery, observability, orchestration. Full stack, database to deployment.

Start a conversation
See the work

Every business is moving onto AI, the way they once moved off paper. Anyone can ship a prompt and a demo. The hard part is the architecture that keeps an agent correct, cheap, and recoverable in the real world. That is harness design — where systems are won or lost.

Paper
Prompt + demo
Harness

What separates a system from a feature

Reliability is a design decision, not a patch.

Prompt-first

Sub-agent runs

side effect lands — API call, row written

Sub-agent dies

mid-task, after the effect

Orchestrator retries from the top

knowing none of it

Duplicate charge · corrupted data

the agent still looks fine

The default for agents built prompt-first.

How the design holds

Checkpoint state

before any stateful operation

Sub-agent runs

idempotency token per call

skip duplicate

Sub-agent dies

failure still happens — by design

retry

Recovering agent resumes

reads checkpoint, does not repeat

resume

Failure is recoverable by construction.
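The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the production design: `ExternalSystem`, `run_task`, the `crash_after` switch, and the in-memory `checkpoint` dict are all hypothetical names invented for this example, and the per-call token here is simply an `(agent_id, step)` tuple.

```python
class ExternalSystem:
    """Hypothetical stand-in for a downstream API that deduplicates by token."""

    def __init__(self):
        self.applied = {}

    def charge(self, token, amount):
        # Skip duplicate: a token that was already seen never applies twice.
        if token in self.applied:
            return self.applied[token]
        receipt = {"amount": amount, "seq": len(self.applied) + 1}
        self.applied[token] = receipt
        return receipt


def run_task(agent_id, steps, checkpoint, external, crash_after=None):
    """Run steps in order, starting from the step recorded in the checkpoint."""
    for i in range(checkpoint.get("next_step", 0), len(steps)):
        # Checkpoint state before the stateful operation.
        checkpoint["next_step"] = i
        # One idempotency token per call (assumed shape: agent id plus step index).
        external.charge((agent_id, i), steps[i]["amount"])
        if crash_after == i:
            # Simulate the worker dying after the side effect has landed.
            raise RuntimeError("worker failed mid-task")
    checkpoint["next_step"] = len(steps)


steps = [{"amount": 10}, {"amount": 25}]
checkpoint, external = {}, ExternalSystem()
try:
    run_task("agent-1", steps, checkpoint, external, crash_after=1)
except RuntimeError:
    # The recovering agent reads the checkpoint and resumes at step 1.
    run_task("agent-1", steps, checkpoint, external)

print(len(external.applied), checkpoint["next_step"])
```

The retry re-sends step 1, but the external system recognises the token and returns the original receipt, so only two charges exist at the end. A real system would persist the checkpoint durably rather than hold it in a dict.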

Execution trace

00.00  checkpoint written
00.01  tool call started
00.04  side effect landed
00.05  worker failed
00.06  recovery read checkpoint
00.06  duplicate avoided

Idempotency tokens
Circuit breakers
Cost guards
budget ok

idem_key = sha256(agent_id + step + payload)
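One way to read the formula above in Python, as a sketch under stated assumptions: the function name `make_idem_key` and the JSON encoding are choices made for this example, not something the formula specifies. Plain string concatenation can be ambiguous (agent "a" at step 12 and agent "a1" at step 2 concatenate to the same prefix), so this version serialises the three fields as canonical JSON before hashing.

```python
import hashlib
import json


def make_idem_key(agent_id: str, step: int, payload: dict) -> str:
    """Derive a stable key from agent id, step, and payload (illustrative only)."""
    # Sorted keys and fixed separators make the encoding independent of dict order,
    # and the JSON structure keeps the boundaries between fields explicit.
    canonical = json.dumps(
        {"agent_id": agent_id, "step": step, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = make_idem_key("agent-1", 3, {"amount": 10, "to": "acct-9"})
retry = make_idem_key("agent-1", 3, {"to": "acct-9", "amount": 10})
later = make_idem_key("agent-1", 4, {"amount": 10, "to": "acct-9"})
```

Here `first` and `retry` are equal, because the same call is being repeated, while `later` differs because the step changed. The receiving side can then treat a repeated key as "already done".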

MRHF-Codec vs. the Descript Audio Codec (DAC)

A neural audio codec we built and measured. High-frequency reconstruction at 6 kHz and above, against the Descript Audio Codec (DAC) baseline.

Metric	MRHF-Codec	Baseline (DAC)	Delta
HF-SI-SDR @ 6 kHz+	+11.64 dB	−30.44 dB	+42 dB
Bitrate	11.8% lower	baseline	−11.8%
Metrics won	7 / 10	—	7 / 10

MRHF-Codec — 11.7M-param generator, trained on 275,527 files (187.4 hrs), 240+ tests across 9 modules. Source: research project benchmark.

What we build

Eight things we do, backed by shipped work.

From systems design to a research-grade ML edge.

Evidence, not adjectives

Shipped systems and open-source infrastructure.

Design judgment shows up in what survived production.

Production: A multi-tool AI-agent SaaS, web and desktop. 12+ API integrations, hardened connectors.
Ecosystem: goalkeeper on the Claude Code marketplace, plus reaper-mcp and vst-bench. Servers others run.
Day one: HIPAA-adjacent. Audit logging, on-device ML, designed in.
Research: Neural codecs and diffusion models, benchmarked.

See the full record

How we work together

Three ways to bring us in.

01

Advisory

Discovery, impact-versus-effort scoring, an AI Scope Document. Senior judgment before you commit a budget.

02

Fractional CTO

Technical leadership for AI initiatives, no full-time hire.

03

Build

We design, build, and ship the system. Full stack, production-quality bar.

Talk to us

Your first move onto AI, or keeping a production system honest: that is the work we do.

Tell us where you are with AI. We will tell you plainly if we can help.

hello@bonfiresystems.com