AI governance is a logging problem before it's a policy problem

I have sat in the AI governance meeting. The one with the steering committee, the responsible AI charter on the screen, the heat map of risk categories color-coded like a weather forecast. Everyone nods. The policy gets approved. Then it goes in a SharePoint folder and the agents keep running.

A few weeks later I ask the only question that matters: pull up a production trace from last Tuesday. Show me one agent run end to end. The input it received, every tool it called, the arguments it passed, the intermediate reasoning, the output it returned, and which version of which prompt produced it.

The room goes quiet. They have a dashboard of token spend. They have an error rate. They do not have the trace.

That gap is the whole problem. Governance was treated as a document exercise when it is, first and underneath everything, an instrumentation exercise.

Policy is an assertion. Observability is the proof.

Every line in an AI policy is a claim about behavior. “Agents will not take irreversible actions without human review.” “Models will not use protected attributes in credit decisions.” “PII will not leave the boundary.” These are good claims. They are also unverifiable until you can replay what the system actually did.

A claim you cannot check is not governance. It is a press release.

This is why I tell teams to stop arguing about policy language in week one. The language is cheap and you will rewrite it anyway. The expensive, load-bearing work is making every agent decision observable: full traces, structured and queryable, capturing input, tool calls, tool outputs, the decision path, and the final response, all stamped with the model version and prompt version that produced them. Once you have that, the policy writes itself, because now you can see what your system does and decide what it should not.
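To make "full trace" concrete, here is a minimal sketch of what one run could look like as a record. The class names, field names, and version labels are illustrative assumptions, not a standard schema; the point is only that every element listed above has a home in one structured object.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class ToolCall:
    name: str        # which tool the agent invoked
    arguments: dict  # the arguments it passed
    output: str      # what the tool returned


@dataclass
class AgentTrace:
    input: str
    model_version: str
    prompt_version: str
    tool_calls: list = field(default_factory=list)
    decision_path: list = field(default_factory=list)  # intermediate reasoning steps
    output: str = ""
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # One JSON object per run keeps the trace structured and queryable.
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical run: every value below is made up for illustration.
trace = AgentTrace(
    input="Refund order 1042",
    model_version="model-2025-01",
    prompt_version="refund-prompt-v7",
)
trace.decision_path.append("Order is inside the refund window; look it up.")
trace.tool_calls.append(
    ToolCall(name="lookup_order", arguments={"order_id": 1042}, output="status=delivered")
)
trace.output = "Refund approved."

record = json.loads(trace.to_json())
```

If a run cannot be reduced to something like `record`, the question from the meeting room has no answer.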

Do it in the other order and you get theater. A beautiful framework governing a black box.

The regulators are not asking you to be ethical. They are asking you to log.

Read the EU AI Act and notice what it leads with. Not values. Not principles. Logs.

Help Net Security’s breakdown of the logging mandate is blunt about it. Article 12 requires high-risk systems to “technically allow for the automatic recording of events (logs) over the lifetime of the system.” Articles 19 and 26 set a six-month retention floor. The penalty for getting it wrong can reach 15 million euros or 3 percent of worldwide annual turnover. The author’s sharpest point lands on integrity: the Act may not use the words “tamper proof,” but if your logs can be silently altered and you cannot prove otherwise, their evidentiary value is zero.
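One common way to make a log tamper evident is a hash chain, where each entry commits to the one before it. The sketch below is an assumption about how you might do it, not something the Act prescribes; the function names and payload fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Hash the previous entry's hash together with this entry's canonical JSON.
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"payload": payload, "prev_hash": prev_hash, "hash": entry_hash(prev_hash, payload)}
    )


def verify_chain(log: list) -> bool:
    # Recompute every hash from the start; any edited entry breaks the chain.
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"run_id": "r1", "tool": "lookup_order"})
append_entry(log, {"run_id": "r1", "tool": "issue_refund"})
intact = verify_chain(log)

log[0]["payload"]["tool"] = "noop"  # simulate a silent edit
tamper_detected = not verify_chain(log)
```

A chain like this only catches edits to individual entries. Someone with write access could rebuild the whole chain, so in practice the latest hash would also need to be anchored somewhere the log's operators cannot rewrite.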

Sit with that. The legal weight of your entire governance posture collapses to the quality of your logging. Not your charter. Not your committee. Your traces.

That is regulators telling you, in statute, that governance is a logging problem before it is a policy problem. They skipped straight to the instrumentation because they understand that everything downstream depends on it.

The Cost of Doing Nothing is invisible until it is catastrophic

Here is what makes this dangerous. An uninstrumented agent does not fail loudly. It works. It ships value. The dashboards stay green. The Cost of Doing Nothing hides inside that green.

Because every untraced production run is a liability you have already taken on and cannot see. The agent that quietly started calling a tool it should not. The prompt regression that shifted refusal behavior three weeks ago. The one decision that, when a customer disputes it or a regulator subpoenas it, you will need to reconstruct and simply cannot. You do not discover the cost when the agent misbehaves. You discover it when someone asks you to explain the misbehavior and you have nothing.

The Cost of Doing Nothing on observability is not a future fine. It is the compounding stock of unexplainable behavior you are accruing right now, every hour your agents run blind. You are underwriting risk you have no instrument to measure.

Build the audit you will eventually face

From the Forward Deployed seat, the sequence is not negotiable. Instrument first. Make every agent decision a queryable trace, tamper-evident, retained past the regulatory floor, with model and prompt versions attached. Then write your policy against what the traces actually reveal, not against what you imagine the system does. Then govern, with evidence in hand.
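Once traces exist, a policy line stops being prose and becomes a query. As a rough sketch, take the claim "agents will not take irreversible actions without human review" and assume each trace is a dict with a `run_id` and a list of `tool_calls`, and that approvals are recorded as a `human_approved` flag. The tool names and that flag are hypothetical.

```python
# Hypothetical set of tools your policy treats as irreversible.
IRREVERSIBLE_TOOLS = {"issue_refund", "delete_account"}


def violations(traces: list) -> list:
    """Return (run_id, tool) pairs where an irreversible tool ran
    with no recorded human approval."""
    found = []
    for trace in traces:
        for call in trace["tool_calls"]:
            approved = call.get("human_approved", False)
            if call["name"] in IRREVERSIBLE_TOOLS and not approved:
                found.append((trace["run_id"], call["name"]))
    return found


# Made-up traces for illustration.
traces = [
    {
        "run_id": "r1",
        "tool_calls": [
            {"name": "lookup_order"},
            {"name": "issue_refund", "human_approved": True},
        ],
    },
    {"run_id": "r2", "tool_calls": [{"name": "delete_account"}]},
]

flagged = violations(traces)  # [("r2", "delete_account")]
```

The check is trivial. What makes it possible is that the tool calls were recorded in the first place.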

The teams that win the next two years are not the ones with the longest responsible AI documents. They are the ones who can answer “why did the agent do that?” in under a minute, on any run, going back six months. Governance is just the discipline you build on top of that answer.
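Answering in under a minute mostly means the traces sit in something you can query by run and by time. A minimal sketch using an in-memory SQLite table follows; the table layout, the sample rows, and the 183-day approximation of six months are all assumptions to adapt to your own store and legal guidance.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=183)  # assumption: roughly six months

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traces ("
    "run_id TEXT PRIMARY KEY, started_at TEXT, "
    "model_version TEXT, prompt_version TEXT, output TEXT)"
)
conn.execute("CREATE INDEX idx_traces_started ON traces (started_at)")

# Fixed "now" so the example is reproducible; rows are made up.
now = datetime(2025, 6, 30, tzinfo=timezone.utc)
rows = [
    ("r-old", (now - timedelta(days=200)).isoformat(), "model-a", "prompt-v1", "Denied."),
    ("r-recent", (now - timedelta(days=20)).isoformat(), "model-b", "prompt-v7", "Approved."),
]
conn.executemany("INSERT INTO traces VALUES (?, ?, ?, ?, ?)", rows)


def explain(run_id: str):
    # Which model and prompt produced this run, and what did it return?
    return conn.execute(
        "SELECT model_version, prompt_version, output FROM traces WHERE run_id = ?",
        (run_id,),
    ).fetchone()


def past_retention(as_of: datetime) -> list:
    # Runs older than the retention floor: the only ones eligible for deletion.
    # ISO 8601 strings in the same UTC offset sort chronologically as text.
    cutoff = (as_of - RETENTION).isoformat()
    cursor = conn.execute(
        "SELECT run_id FROM traces WHERE started_at < ? ORDER BY started_at", (cutoff,)
    )
    return [row[0] for row in cursor]


answer = explain("r-recent")
expired = past_retention(now)
```

Here `answer` carries the model version, prompt version, and output for the run, and `expired` lists only the run older than the floor. Anything newer has to stay replayable.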

You cannot govern what you cannot see. So before you write one more policy, go look. If you cannot replay yesterday, you are not governing your agents. You are hoping.