Outcome Engineering

o16g

An ongoing exploration, discovery, and invention of what comes next for software engineering and product development in a world of agentic AI

Read the manifesto →

Contracts Become the Hard Edge of Agent Governance

The biggest product surface this week isn’t an agent feature — it’s the contract language that decides what your agent is allowed to do once it leaves your repo. The U.S. government keeps turning procurement into policy, and that bleeds directly into how teams ship, log, and constrain models in production.

Start with the rule change: GSA’s draft guidance tightens civilian AI contract rules, requiring vendors to permit “any lawful” government use of their models, a shift that pushes vendors toward broad downstream permissions. If you sell agentic systems to agencies, “acceptable use” stops being a marketing page and becomes a binding interface. This is Contracts-as-Controls in its purest form: you either build tenant-level policy hooks and audit trails that survive “lawful use” ambiguity, or you discover too late that your safeguards are optional.
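To make “tenant-level policy hooks and audit trails” concrete, here is a minimal sketch of the pattern. Everything in it — `TenantPolicy`, `AuditEvent`, `enforce`, the action names — is a hypothetical illustration, not any vendor’s actual API: the point is that permission checks and logging live in code, not in a terms-of-service page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical sketch: a per-tenant policy hook evaluated before any
# agent action executes, plus an append-only audit trail. All names
# here are illustrative assumptions, not a real product's API.

@dataclass
class TenantPolicy:
    tenant_id: str
    allowed_actions: set[str]   # actions this tenant's contract permits
    log_required: bool = True   # the contract may mandate audit logging

@dataclass
class AuditEvent:
    tenant_id: str
    action: str
    allowed: bool
    timestamp: str

audit_trail: list[AuditEvent] = []

def enforce(policy: TenantPolicy, action: str) -> bool:
    """Gate an agent action against the tenant's contractual policy."""
    allowed = action in policy.allowed_actions
    if policy.log_required:
        # Denied actions are logged too: provenance must cover refusals.
        audit_trail.append(AuditEvent(
            tenant_id=policy.tenant_id,
            action=action,
            allowed=allowed,
            timestamp=datetime.now(timezone.utc).isoformat(),
        ))
    return allowed

agency = TenantPolicy("gsa-pilot", {"summarize", "classify"})
print(enforce(agency, "summarize"))       # True: permitted, and logged
print(enforce(agency, "delete_records"))  # False: denied, but still logged
```

The design choice worth noting is that the audit trail records denials as well as approvals; reconstructing decision provenance later requires both.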

The Anthropic/DoD shockwave keeps propagating into the commercial stack. Microsoft’s plan to keep Anthropic’s tools embedded in client products, after the DoD designation was deemed not to apply to non-defense projects, is a reminder that provider governance is now a systems dependency: availability and risk classification can differ by customer segment, not model capability. Meanwhile, the governance question isn’t abstract when the Pentagon won’t even answer whether AI shaped a lethal decision: “Pentagon Refuses to Say If AI Was Used to Select Elementary School as Bombing Target” lands as a brutal “Ground Truth” failure — if you can’t reconstruct decision provenance, you can’t audit outcomes.

On the practice side, security is the clearest place where capability and risk advance together. The same model class that accelerates defense also accelerates offense: Mozilla’s report that Claude Opus 4.6 found 100+ Firefox bugs in two weeks, 14 of them high-severity, shows why model-assisted vulnerability discovery is real leverage, while “Claude Used to Hack Mexican Government” demonstrates the misuse path end-to-end. In other words: your “Immune System” can be an agent, but so can the attacker’s.

Teams respond by formalizing controllability and verification as engineering discipline, not vibes. Apple’s “GenCtrl: A Formal Controllability Toolkit for Generative Models” points toward measurable control sets rather than policy prose. And the field keeps relearning the same lesson: agents need guardrails at the execution boundary, not just at the prompt. The incident report “Claude Code wiped our production database with a Terraform command” is an “Agentic Coordination” and “Gate” warning: tool authority without checkpoints turns autonomy into a deletion primitive.
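An execution-boundary guardrail of the kind argued for above can be sketched in a few lines. This is an illustrative toy, not how any cited incident was actually mitigated: the destructive-command patterns and the `approve` callback are assumptions, and a real gate would run in the tool sandbox rather than on string matching alone.

```python
import re

# Illustrative sketch: a checkpoint at the tool-execution boundary that
# blocks destructive commands until a human approves. Patterns and the
# approve() callback are assumptions for the sake of the example.

DESTRUCTIVE = [
    re.compile(r"\bterraform\s+(destroy|apply)\b"),
    re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE),
    re.compile(r"\brm\s+-rf\b"),
]

def run_tool(command: str, approve=lambda cmd: False) -> str:
    """Execute an agent-issued shell command, gating destructive ones.

    approve() defaults to denying everything: autonomy is opt-in per
    command, so tool authority never silently includes deletion.
    """
    if any(p.search(command) for p in DESTRUCTIVE):
        if not approve(command):
            return f"BLOCKED: {command!r} requires human checkpoint"
    return f"RAN: {command}"  # real execution would happen here

print(run_tool("terraform plan"))                   # read-only, runs
print(run_tool("terraform destroy -auto-approve"))  # blocked by default
```

The key property is that the gate sits where the command executes, so a prompt-level jailbreak cannot route around it.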

Through-line: watch for governance to ship as runtime primitives — policy hooks, provenance, and controllability proofs — because contracts and incidents are now the fastest way your architecture gets redesigned.

All daily briefs →

[Chart: Share of trailing 7-day coverage per frontier lab (Anthropic, OpenAI, Google, Meta, DeepSeek, Mistral, xAI), Feb 11 – Mar 7]

[Chart: Per-article sentiment with 7-day net approval (Building, Governing, Overall), Feb 11 – Mar 7]

[Chart: Trailing 7-day balance of creation vs oversight principles (Building vs Governing), Feb 11 – Mar 7]

[Chart: Stories per principle, last 7 days]