
Responsible AI in Practice, Not Just Policy

Moving from AI ethics documents to operational practices. What responsible AI actually looks like when you're shipping enterprise systems.
10 April 2025 · 7 min read
Dr Tania Wolfgramm
Chief Research Officer
Isaac Rolfe
Managing Director
Every enterprise we work with has an AI ethics policy. Almost none of them can show how that policy translates into daily operational practice. The gap between principle and practice is where responsible AI fails, and it's where the real work begins.

What You Need to Know

  • AI ethics policies are necessary but insufficient. A policy document doesn't prevent harm. Operational practices, tooling, and review processes prevent harm.
  • Responsible AI is an engineering discipline, not just a governance function. It requires technical implementation: bias testing, output monitoring, audit trails, and feedback loops.
  • The organisations doing this well embed responsible AI into delivery workflows, not as a separate review gate that happens after the build is done.
  • Responsible AI and effective AI are the same thing. Systems that are transparent, auditable, and fair are also systems that users trust and adopt. The business case and the ethical case align.
78% of enterprises have AI ethics policies, but only 24% report systematic implementation in delivery workflows
Source: Deloitte, State of AI in the Enterprise, 2024

The Practice Gap

Here's what the gap looks like in practice:
The policy says: "AI systems will be fair and unbiased." The practice says: Nobody has tested the system for bias, because there's no testing framework, no budget for testing, and no clear definition of what "fair" means in this specific context.
The policy says: "AI decisions will be explainable." The practice says: The model is a black box, the data lineage isn't tracked, and when a customer asks why the AI made a particular decision, nobody can answer.
The policy says: "Human oversight will be maintained." The practice says: A human reviews AI outputs for the first week, gets overwhelmed by volume, starts rubber-stamping, and within a month the "human in the loop" is effectively automated out.
These aren't hypothetical. They're patterns we see in every enterprise AI assessment we conduct.

From Policy to Practice

1. Translate Principles into Specific, Testable Requirements

"Fair and unbiased" is a principle. "The model's accuracy must not vary by more than 5% across demographic groups in our test dataset" is a testable requirement.
Every principle in your AI ethics policy needs to be translated into specific requirements for each AI capability you build. This translation requires domain expertise (what does "fair" mean in this context?) and technical expertise (how do we measure it?).
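To make "testable" concrete, here is a minimal sketch of that 5% requirement as an automated check. The tuple format, threshold, and function names are illustrative assumptions, not a standard; the right metric (accuracy parity, equalised odds, and so on) depends on your context.

```python
from collections import defaultdict

MAX_ACCURACY_GAP = 0.05  # "accuracy must not vary by more than 5% across groups"

def accuracy_by_group(records):
    """records: iterable of (group, prediction, label) tuples."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, prediction, label in records:
        total[group] += 1
        correct[group] += int(prediction == label)
    return {g: correct[g] / total[g] for g in total}

def check_accuracy_parity(records):
    """Fail loudly if per-group accuracy diverges beyond the agreed threshold."""
    scores = accuracy_by_group(records)
    gap = max(scores.values()) - min(scores.values())
    assert gap <= MAX_ACCURACY_GAP, (
        f"accuracy gap {gap:.1%} exceeds {MAX_ACCURACY_GAP:.0%}: {scores}"
    )
```

Wired into CI, a check like this fails the build the same way a broken unit test does, which is the point: the requirement is now enforced, not aspirational.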

2. Build Testing into the Delivery Pipeline

Responsible AI testing shouldn't be a separate gate at the end of development. It should be built into the delivery pipeline, just like security testing and performance testing.
For each AI capability, define:
  • Bias tests: Run against demographic subgroups in your test data. Automated, repeatable, tracked over time.
  • Accuracy monitoring: Not just at launch, but continuously. Model performance degrades. Data distributions shift. Without monitoring, you won't know until users complain. (A monitoring sketch follows this list.)
  • Output auditing: A sample of AI outputs reviewed by domain experts on a regular cadence. Monthly at minimum for high-stakes applications.
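For the monitoring bullet, a minimal sketch, assuming you can eventually join each prediction to a ground-truth outcome. The window size and alert threshold are placeholders to be tuned per application, not recommendations.

```python
from collections import deque

class AccuracyMonitor:
    """Rolling accuracy over labelled production outcomes."""

    def __init__(self, window=500, alert_below=0.90):
        self.outcomes = deque(maxlen=window)  # True/False per resolved case
        self.alert_below = alert_below

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def check(self):
        """Return an alert string once the window is full and accuracy has slipped."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return None  # not enough data yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        if accuracy < self.alert_below:
            return f"ALERT: rolling accuracy {accuracy:.1%} below {self.alert_below:.0%}"
        return None
```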

3. Design for Transparency

Transparency isn't a report. It's a design requirement.
Source attribution. Every AI-generated recommendation should cite the data it used. Not "based on your data" but "based on Policy Document v3.2, Section 4.1, last updated 2024-11-15."
Confidence communication. The interface should signal when the AI is uncertain. Users need different information to assess a high-confidence result vs a low-confidence one.
Decision logs. For consequential decisions, log the input, the model's reasoning (to the extent extractable), the output, and any human override. This log is your audit trail.
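Taken together, these three requirements imply a log schema. A minimal sketch, assuming a JSON Lines store; every field name here is illustrative, not prescribed.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DecisionRecord:
    """One audit-trail entry per consequential AI decision."""
    decision_id: str
    timestamp: str              # ISO 8601, e.g. "2025-04-10T02:15:00Z"
    input_summary: str          # what the model was asked
    sources: list               # e.g. ["Policy Document v3.2, Section 4.1"]
    output: str                 # what the model produced
    confidence: float           # model-reported or calibrated confidence
    reasoning: str = ""         # extracted rationale, where available
    human_reviewed: bool = False
    human_override: str = ""    # reviewer's decision, if they disagreed
    challenge_path: str = ""    # how the affected person can contest the decision

def log_decision(record, path="decision_log.jsonl"):
    """Append to a JSON Lines file; production systems would use an append-only store."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

The append-only property matters: an audit trail you can silently edit is not an audit trail.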

4. Operationalise Human Oversight

"Human in the loop" is meaningless without operational design. The key questions:
  • What does the human review? All outputs? A sample? Only flagged cases? The answer depends on the stakes and the volume.
  • How long does review take? If each review takes 5 minutes and the system processes 500 cases a day, you need a review team, not a single reviewer.
  • What happens when the human disagrees? Is there a clear override process? Does the override feed back into the system to improve future performance?
  • How do you prevent review fatigue? When accuracy is high, reviewers start trusting automatically. This is natural but dangerous. Rotate reviewers. Inject known errors to keep attention sharp. Measure review quality, not just review completion. (A routing sketch follows this list.)
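A sketch of the routing and canary-injection logic, assuming each case carries a confidence score. Every threshold below is a placeholder to be set by stakes and volume.

```python
import random

# Illustrative thresholds; real values depend on stakes and volume.
FULL_REVIEW_BELOW = 0.75   # every output under this confidence gets reviewed
SPOT_CHECK_RATE = 0.05     # fraction of confident outputs still sampled

def needs_human_review(confidence, rng=random):
    """Full review for uncertain outputs; random spot-checks for the rest."""
    return confidence < FULL_REVIEW_BELOW or rng.random() < SPOT_CHECK_RATE

def build_review_queue(cases, canaries, canary_rate=0.02, rng=random):
    """Mix known-error canary cases into the queue so reviewer attention stays measurable."""
    queue = [c for c in cases if needs_human_review(c["confidence"], rng)]
    n_canaries = min(len(canaries), max(1, int(len(queue) * canary_rate)))
    queue.extend(rng.sample(canaries, n_canaries))
    rng.shuffle(queue)  # reviewers can't tell canaries from real cases
    return queue
```

The canary miss rate is the review-quality metric: if reviewers stop catching seeded errors, the human in the loop has been automated out, whatever the completion stats say.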
The Audit Test
Pick a random AI decision from last week. Can you show a regulator: (1) what data the AI used, (2) why it reached that conclusion, (3) whether a human reviewed it, and (4) how the affected person can challenge it? If you can't do all four, your responsible AI practice has gaps.
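Against a decision log like the sketch in section 3, the audit test itself is a short script. This version checks only that the four fields were captured at all, not that their contents would satisfy a regulator.

```python
import json
import random

AUDIT_FIELDS = ("sources", "reasoning", "human_reviewed", "challenge_path")

def audit_spot_check(path="decision_log.jsonl"):
    """Pick a random logged decision; report which of the four audit fields are absent."""
    with open(path) as f:
        records = [json.loads(line) for line in f]
    record = random.choice(records)
    missing = [field for field in AUDIT_FIELDS if field not in record]
    return record["decision_id"], missing  # an empty list means the fields exist
```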

5. Build Feedback Loops

The best responsible AI practice treats every error as a learning opportunity. When an AI system produces a wrong or harmful output (a capture-and-triage sketch follows this list):
  • Capture it. Users need an easy way to flag problems. A single click, not a support ticket.
  • Classify it. Was it a data quality issue, a model limitation, an edge case, or a systemic bias?
  • Fix it. Update the data, adjust the prompt, add a guardrail, or retrain the model.
  • Verify the fix. Confirm the fix addresses the root cause without introducing new problems.
  • Share the learning. Document what went wrong and what was done. This builds institutional knowledge about AI risk.
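A minimal capture-and-triage sketch; the categories mirror the classification step above, and all names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class IssueType(Enum):
    """Triage categories from the list above."""
    DATA_QUALITY = "data_quality"
    MODEL_LIMITATION = "model_limitation"
    EDGE_CASE = "edge_case"
    SYSTEMIC_BIAS = "systemic_bias"

@dataclass
class FeedbackItem:
    decision_id: str                        # links back to the decision log
    reporter: str
    description: str = ""
    issue_type: Optional[IssueType] = None  # set at triage, not at capture
    fix_applied: str = ""
    fix_verified: bool = False

def flag_output(decision_id, reporter, description=""):
    """The one-click capture step: create the item now, classify and fix later."""
    return FeedbackItem(decision_id=decision_id, reporter=reporter, description=description)
```

Note that capture asks almost nothing of the reporter; triage fills in the rest. That separation is what keeps flagging a single click rather than a support ticket.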

The Business Case

Responsible AI isn't a cost centre. It's a competitive advantage.
Regulatory readiness. The EU AI Act is now in force, with obligations phasing in. Australia is developing AI regulation. New Zealand's Algorithm Charter sets expectations. Organisations with operational responsible AI practices are years ahead of those scrambling to comply.
User trust. Enterprise users adopt AI tools they trust. Trust comes from transparency, accuracy, and the ability to challenge AI decisions. Responsible AI practices build all three.
Client confidence. Increasingly, enterprise buyers ask about AI governance in procurement. "Show us your responsible AI framework" is becoming as common as "show us your security certifications."
I've reviewed responsible AI frameworks from organisations across the NZ/AU market. The policy documents look remarkably similar; the daily practice is what prevents harm.
Dr Tania Wolfgramm
Chief Research Officer