AI Governance: Lessons from the First Wave

Eighteen months of enterprise AI governance in practice. What worked, what didn't, and what we'd do differently - from the organisations that went first.
1 September 2025·9 min read
Dr Tania Wolfgramm
Chief Research Officer
The first wave of enterprise AI governance is 18 months old. We now have enough real-world evidence to assess what works, what doesn't, and what the second wave should look like. The short version: governance frameworks that treat AI like a compliance checkbox have already failed. Governance embedded in engineering practice is succeeding.

What You Need to Know

  • Risk classification works. Categorising AI applications by risk level (low/medium/high/critical) and applying proportional governance is the single most effective governance practice. Organisations that apply the same governance to a content summariser and a credit decision engine waste resources and slow delivery.
  • Human-in-the-loop is necessary but poorly implemented. Most enterprises have a "human reviews AI output" requirement. Few have defined what that review involves, how to do it consistently, or how to measure its effectiveness.
  • Checkbox compliance has already failed. Governance frameworks modelled on traditional compliance (annual audits, policy attestation, training certificates) don't work for AI. AI systems change continuously; governance must be continuous too.
  • The governance-engineering gap is the biggest practical problem. Legal writes policies. Engineering builds systems. Neither speaks the other's language. Bridging this gap is worth more than any framework document.
71%
of enterprises with AI governance frameworks report a significant gap between documented policy and actual practice
Source: Deloitte, AI Governance in Practice Survey, 2025

What Worked

Risk Classification

The most effective governance practice we've seen: classify every AI application by risk level and apply proportionate controls.
Risk Level | Examples | Governance Requirements
Low | Internal content summarisation, search, code assistance | Basic monitoring, usage policy compliance
Medium | Customer-facing content generation, internal analytics, workflow automation | Output monitoring, periodic review, data governance
High | Financial decisions, claims processing, hiring support | Human review for decisions, audit trails, bias monitoring, regular model evaluation
Critical | Autonomous decisions affecting individuals, safety-critical systems | Mandatory human approval, full traceability, independent audit, regulatory alignment
Why it works: Risk classification prevents governance from becoming a bottleneck for low-risk applications while ensuring high-stakes AI gets appropriate oversight. Without it, organisations either under-govern everything (risk) or over-govern everything (paralysis).
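The classification-to-controls mapping above can be sketched as a simple lookup. This is an illustrative sketch, not a prescribed implementation; the control names mirror the table but are hypothetical labels.

```python
from enum import Enum


class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Illustrative mapping of risk level to required controls,
# following the table above (control names are hypothetical).
REQUIRED_CONTROLS = {
    RiskLevel.LOW: {"basic_monitoring", "usage_policy"},
    RiskLevel.MEDIUM: {"output_monitoring", "periodic_review", "data_governance"},
    RiskLevel.HIGH: {"human_review", "audit_trail", "bias_monitoring", "model_evaluation"},
    RiskLevel.CRITICAL: {"human_approval", "full_traceability",
                         "independent_audit", "regulatory_alignment"},
}


def controls_for(level: RiskLevel) -> set[str]:
    """Return the governance controls required at a given risk level."""
    return REQUIRED_CONTROLS[level]


def missing_controls(level: RiskLevel, implemented: set[str]) -> set[str]:
    """Controls required at this risk level but not yet implemented."""
    return controls_for(level) - implemented
```

The point of encoding the table rather than documenting it: `missing_controls` can run as an automated check against every registered application, so proportionality is enforced rather than merely stated.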

Human-in-the-Loop for High-Stakes Decisions

Every organisation we work with has some form of human-in-the-loop requirement for AI that affects customers, finances, or safety. The principle is right. The ones that succeed implement it as a technical pattern, not just a policy.
What good human-in-the-loop looks like:
  • The AI system presents its output with confidence scores and supporting evidence
  • The human reviewer has clear criteria for what to approve, modify, or escalate
  • The review decision is logged with the reviewer's identity, the AI output, and any modifications
  • Review quality is periodically assessed. Are reviewers actually reviewing, or rubber-stamping?
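The four properties above can be captured in a small review-record pattern. A minimal sketch, assuming an append-only audit store; the names (`ReviewRecord`, `record_review`) are illustrative, not a standard API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ReviewRecord:
    reviewer_id: str                     # who reviewed
    ai_output: str                       # what the AI produced
    decision: str                        # "approve" | "modify" | "escalate"
    final_output: Optional[str] = None   # set when decision == "modify"
    confidence: float = 0.0              # model confidence shown to the reviewer
    reviewed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


AUDIT_LOG: list[ReviewRecord] = []       # append-only store in a real system


def record_review(reviewer_id: str, ai_output: str, decision: str,
                  confidence: float, final_output: Optional[str] = None) -> ReviewRecord:
    """Log a review decision; reject malformed decisions rather than losing them."""
    if decision not in {"approve", "modify", "escalate"}:
        raise ValueError(f"unknown decision: {decision}")
    if decision == "modify" and final_output is None:
        raise ValueError("a 'modify' decision must include the modified output")
    rec = ReviewRecord(reviewer_id, ai_output, decision, final_output, confidence)
    AUDIT_LOG.append(rec)
    return rec


def approval_rate(log: list[ReviewRecord]) -> float:
    """High blanket approval rates are one rubber-stamping signal worth reviewing."""
    return sum(r.decision == "approve" for r in log) / len(log)
```

Because every decision lands in the same structured log, the "are reviewers actually reviewing?" question becomes a query (for example, `approval_rate` per reviewer) rather than a periodic guess.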

Monitoring and Alerting

Governance that operates in real-time (monitoring AI outputs, detecting anomalies, alerting on threshold violations) is dramatically more effective than periodic review. The organisations that built AI monitoring into their observability stack (alongside application performance monitoring, error tracking, and security monitoring) report faster incident detection and higher confidence in their AI systems.
The Monitoring Test
Ask your team: "If our claims processing AI started producing consistently biased outputs at 2am on a Saturday, how long would it take us to detect it?" If the answer is "Monday morning when someone notices," your governance is incomplete.
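A rolling-window threshold check is one minimal way to pass the monitoring test above. This sketch assumes you can reduce the concern (say, an approval-rate disparity between groups) to a numeric metric per decision; the class name and thresholds are hypothetical.

```python
from collections import deque


class DriftMonitor:
    """Flag when the rolling mean of a metric breaches a threshold.

    Feed it one observation per AI decision (e.g. a per-decision bias
    score); wire the True return value to your alerting stack so a 2am
    drift pages someone instead of waiting for Monday.
    """

    def __init__(self, window: int, threshold: float):
        self.values: deque[float] = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Record one observation; return True if the rolling mean breaches the threshold."""
        self.values.append(value)
        if len(self.values) < self.values.maxlen:
            return False  # not enough data to judge yet
        mean = sum(self.values) / len(self.values)
        return mean > self.threshold
```

The design choice worth noting: the monitor lives in the serving path alongside application metrics, which is what puts AI governance "in the observability stack" rather than in a quarterly report.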

What Didn't Work

Checkbox Compliance

The pattern: AI governance modelled on traditional compliance, built from an annual policy review, a training module, a signed attestation, and a filed report. Check the box, move on.
Why it fails: AI systems aren't static. Models are updated. Data distributions shift. Usage patterns change. A governance review conducted in January is outdated by March. Checkbox compliance creates a false sense of security while AI systems drift beyond the boundaries the governance was designed to enforce.
The lesson: Governance for AI must be continuous, automated where possible, and embedded in the systems themselves, not a periodic human process layered on top.

Policy Without Technical Enforcement

The pattern: A governance document states "All AI-generated customer communications must be reviewed before sending." The AI system has no mechanism to enforce this. A developer or user can bypass the review step. There's no automated check that the review occurred.
Why it fails: Policies that rely on human compliance without technical enforcement are aspirational, not operational. Under time pressure, review steps get skipped. Without detection, the skip goes unnoticed until an incident.
The lesson: If a governance requirement matters enough to document, it matters enough to enforce technically. Review gates should be in the code, not just the policy. Bypass attempts should be logged and alerted.
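A hard gate with bypass logging can be sketched in a few lines. This is an illustrative pattern, assuming a review store and a send path; the function and variable names (`send_customer_message`, `REVIEWED`) are hypothetical.

```python
class ReviewBypassError(RuntimeError):
    """Raised when a send is attempted without a recorded review."""


REVIEWED: set[str] = set()        # ids of messages with a recorded review
BYPASS_ATTEMPTS: list[str] = []   # feed this to alerting, not just a log file


def mark_reviewed(message_id: str) -> None:
    """Record that a human review of this message was completed."""
    REVIEWED.add(message_id)


def send_customer_message(message_id: str, body: str) -> None:
    """Hard gate: refuse to send unless a review record exists, and log the attempt."""
    if message_id not in REVIEWED:
        BYPASS_ATTEMPTS.append(message_id)  # alert on this in production
        raise ReviewBypassError(f"message {message_id} has no recorded review")
    # ... actual send goes here ...
```

The gate does two things the policy document cannot: it makes the skipped review impossible rather than forbidden, and it turns every bypass attempt into a detectable event.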

Centralised Governance Without Engineering Input

The pattern: The governance framework was designed by legal, risk, and compliance. It reads like a regulatory document. Engineering was consulted late or not at all. The framework includes requirements that are technically impractical or impossible to implement.
Why it fails: Governance that can't be implemented is worse than no governance. It creates the illusion of control. Requirements like "the AI must be able to explain every decision" are meaningless if the underlying model is a black box. Requirements like "all training data must be reviewed" are impractical for models trained on billions of data points.
The lesson: AI governance must be co-designed by governance experts and engineers. Engineers understand what's technically feasible. Governance experts understand what's necessary. Neither group alone produces workable frameworks.
43%
of AI governance requirements in enterprise frameworks are technically impractical to implement as written
Source: MIT Sloan Management Review, AI Governance Implementation Study, 2025

Practical Lessons for the Second Wave

Based on 18 months of evidence, here's what second-wave AI governance should look like:
1. Start with risk classification, not full frameworks. A simple four-level risk classification that's actually implemented beats a detailed framework that sits in a document.
2. Embed governance in the engineering workflow. Governance checks should run in CI/CD pipelines, not in quarterly reviews. Audit trails should be generated automatically, not maintained manually.
3. Make human-in-the-loop a technical pattern. Define the interface, the decision criteria, the logging requirements, and the quality metrics. Then build them into the system.
4. Monitor continuously, review periodically. Automated monitoring catches drift in real-time. Periodic reviews assess whether the governance framework itself needs updating. Both are necessary; neither alone is sufficient.
5. Bridge the governance-engineering gap. The single highest-value governance investment is putting governance experts and engineers in the same room (or the same team) so that frameworks are designed to be implementable and implementations reflect governance intent.
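Point 2 above can be made concrete with a pipeline gate. A sketch of a CI check that fails the build when an application's declared controls fall short of its risk level; the application registry, risk labels, and control names are all hypothetical stand-ins for config that would live in the repo.

```python
import sys

# Hypothetical registry of AI applications and their declared controls;
# in practice this would be parsed from config files in the repository.
APPLICATIONS = {
    "content-summariser": {"risk": "low",
                           "controls": {"basic_monitoring", "usage_policy"}},
    "claims-triage": {"risk": "high",
                      "controls": {"audit_trail"}},  # incomplete on purpose
}

REQUIRED = {
    "low": {"basic_monitoring", "usage_policy"},
    "medium": {"output_monitoring", "periodic_review", "data_governance"},
    "high": {"human_review", "audit_trail", "bias_monitoring", "model_evaluation"},
    "critical": {"human_approval", "full_traceability",
                 "independent_audit", "regulatory_alignment"},
}


def check_governance() -> list[str]:
    """Return one failure message per missing control; empty means the gate passes."""
    failures = []
    for name, app in APPLICATIONS.items():
        missing = REQUIRED[app["risk"]] - app["controls"]
        for control in sorted(missing):
            failures.append(f"{name}: missing required control '{control}'")
    return failures


if __name__ == "__main__":
    problems = check_governance()
    for p in problems:
        print(p)
    sys.exit(1 if problems else 0)  # non-zero exit fails the CI job
```

Run as a pipeline step, this check makes the risk classification self-enforcing: a high-risk application cannot ship without declaring (and, with further checks, demonstrating) its required controls.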
The first wave of AI governance was written by people who understand risk but not AI systems. The organisations that invest in governance engineering, not just governance policy, will manage AI risk effectively - the rest will manage paperwork.
We already have a governance framework. Should we start over?
No. Audit what you have against the patterns above. Most first-wave frameworks have solid risk classification and policy foundations. They just lack technical enforcement and continuous monitoring. Extend what works; replace what doesn't. Starting over wastes the organisational alignment that your existing framework achieved.
How do we staff AI governance?
You need a bridge role: someone who understands both governance requirements and technical implementation. This might be a senior engineer with governance training, a governance professional with technical curiosity, or a dedicated AI governance lead. The role matters more than the title. In smaller organisations, this can be a part-time responsibility rather than a dedicated position.
What frameworks or standards should we align with?
ISO 42001 (AI Management Systems) is the emerging standard. The NIST AI Risk Management Framework provides practical guidance. The EU AI Act, while not directly applicable in NZ/AU, is influencing global practice. Align with ISO 42001 as your foundation and adapt to local regulatory requirements.