
The AI Foundation at Work

What RIVER Group's AI foundation looks like in production. Sovereign, compound, enterprise-grade. The thesis, delivered.
24 March 2026 · 9 min read
Isaac Rolfe
Managing Director
Mak Khan
Chief AI Officer
John Li
Chief Technology Officer
We've spent two years talking about compound AI value, shared foundations, and platform-based delivery. We've written the theory. We've published the frameworks. Now the foundation is live, in production, serving enterprise clients. Here's what it actually looks like.

From Theory to Production

The AI foundation started as an architectural thesis: build shared infrastructure once, deploy capabilities on it repeatedly, and watch the marginal cost of each new capability decrease while the marginal value increases. The compound advantage, we called it.
The thesis was correct. But the journey from thesis to production taught us things that the thesis didn't predict. Infrastructure that looked elegant on a whiteboard needed rearchitecting under production load. Governance frameworks that seemed comprehensive had gaps that real-world use cases exposed. Monitoring that seemed sufficient proved inadequate when multiple capabilities shared the same foundation.
What's live now is the result of building, breaking, rebuilding, and refining. It's battle-tested, not theoretical.

The Architecture

The Intelligence Layer

The intelligence layer handles model management, orchestration, and the routing logic that directs queries to the right model for the task.
What's in production:
  • Multi-model orchestration across four model providers. Queries are routed based on task complexity, cost sensitivity, latency requirements, and compliance constraints.
  • Automatic failover. If a model provider experiences an outage or degradation, queries reroute to alternatives within seconds. No client-facing impact.
  • Cost optimisation. Simple tasks go to efficient models. Complex tasks go to capable models. The routing saves 30-40% on inference costs compared to routing everything to the most capable model.
  • Prompt management. Versioned, tested, and deployed like code. Every prompt change goes through evaluation before production.
What we learned building it:
  • Model routing is more art than science. Performance benchmarks don't always predict production behaviour. We run continuous evaluation against real-world queries, not just test sets.
  • Failover needs to be tested regularly. A failover path that hasn't been exercised will fail when you need it.
  • Cost optimisation has quality trade-offs. The cheapest model that passes evaluation isn't always the best choice. We optimise for quality-per-dollar, not minimum dollar.
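The routing and failover behaviour described above can be sketched in a few lines. This is a minimal illustration, not the platform's actual router: the model names, tiers, and costs are hypothetical, and the "health" flag stands in for real outage detection.

```python
from dataclasses import dataclass

# Hypothetical model catalogue: names, tiers, and costs are illustrative.
@dataclass
class Model:
    name: str
    tier: int            # 1 = efficient, 3 = most capable
    cost_per_1k: float   # cost per 1k tokens
    healthy: bool = True

CATALOGUE = [
    Model("fast-small", tier=1, cost_per_1k=0.0002),
    Model("balanced-mid", tier=2, cost_per_1k=0.003),
    Model("frontier-large", tier=3, cost_per_1k=0.03),
]

def route(task_complexity: int, catalogue=CATALOGUE) -> Model:
    """Pick the cheapest healthy model whose tier meets the task's
    complexity; more capable models act as automatic failover."""
    candidates = [m for m in catalogue
                  if m.healthy and m.tier >= task_complexity]
    if not candidates:
        raise RuntimeError("no healthy model can serve this task")
    return min(candidates, key=lambda m: m.cost_per_1k)
```

Marking `fast-small` unhealthy makes simple queries fall through to `balanced-mid` with no change to calling code, which is the essence of the failover path. The `min` over cost within the qualifying tier is a crude stand-in for the quality-per-dollar optimisation described above.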

The Knowledge Layer

The knowledge layer manages enterprise knowledge: documents, data, relationships, and context.
What's in production:
  • Hybrid retrieval combining vector search, keyword search, and knowledge graph queries. The retrieval strategy adapts based on the query type and the knowledge domain.
  • Document processing pipeline that handles PDFs, Word documents, emails, spreadsheets, images, and structured data. Ingestion, chunking, embedding, and indexing are automated with quality checks at each stage.
  • Entity resolution across document types. When multiple documents refer to the same entity in different ways, the knowledge layer resolves them to a single entity with merged context.
  • Context management for multi-step workflows. When an agent processes a complex task across multiple steps, the context layer maintains coherence and relevance.
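One common way to combine ranked results from vector, keyword, and graph retrievers is reciprocal rank fusion. The sketch below assumes each retriever returns an ordered list of document IDs; the document names and the choice of fusion method are illustrative, not the platform's actual retrieval pipeline.

```python
from collections import defaultdict

def rrf_merge(rankings, k=60):
    """Merge ranked result lists with reciprocal rank fusion: each
    document scores 1/(k + rank) per list, summed across lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from three retrievers for the same query.
vector_hits  = ["doc_a", "doc_b", "doc_c"]   # semantic similarity
keyword_hits = ["doc_b", "doc_d", "doc_a"]   # lexical match
graph_hits   = ["doc_b", "doc_a"]            # structural relationship

merged = rrf_merge([vector_hits, keyword_hits, graph_hits])
```

A document that ranks well across several retrievers (`doc_b` here) beats one that tops only a single list, which is why hybrid retrieval outperforms any single strategy on mixed query types.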
What we learned building it:
  • Chunking strategy matters more than embedding model choice. The way you break documents into pieces has a larger impact on retrieval quality than which embedding model you use.
  • Knowledge graphs and vector search are complementary, not competitive. Vector search finds semantically similar content. Knowledge graphs find structurally related content. You need both.
  • Document processing is never finished. New document types, new formats, new edge cases. The pipeline needs continuous attention.
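To make the chunking point concrete, here is the simplest possible chunker: fixed-size windows with overlap so that content near a boundary appears in two chunks. The sizes are arbitrary and the split is character-level; a production pipeline would split on sentence or section boundaries, which is precisely the kind of strategy choice that dominates retrieval quality.

```python
def chunk(text: str, size: int = 200, overlap: int = 40):
    """Fixed-size chunking with overlap. Overlapping windows keep
    boundary-spanning content retrievable from at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Varying `size` and `overlap` against a fixed embedding model, then measuring retrieval quality, is a cheap experiment that often yields a larger improvement than swapping the embedding model itself.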

The Governance Layer

The governance layer enforces policies, maintains audit trails, and monitors for compliance.
What's in production:
  • Policy-as-code enforcement. Guardrails are defined in configuration, enforced by the platform, and logged for audit. They can't be bypassed by individual capabilities.
  • Complete audit trails. Every query, every retrieval, every model call, every output. Traceable from input to output and back. Essential for regulated industries.
  • Content safety filters. Output validation that catches hallucinations, inappropriate content, data leakage, and off-topic responses before they reach the user.
  • Access control. Role-based and capability-based access controls that determine which users can access which capabilities with which data.
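Policy-as-code can be sketched as a declarative policy plus an enforcement function that every capability must pass through. The field names and checks below are illustrative, not the platform's actual schema; the point is that the policy lives in configuration and the enforcement result is returned in a form that can be logged for audit.

```python
# Hypothetical policy definition; field names are illustrative.
POLICY = {
    "max_output_words": 1024,
    "blocked_patterns": ["ssn:", "credit_card:"],
    "allowed_roles": {"analyst", "admin"},
}

def enforce(output: str, role: str, policy=POLICY):
    """Apply policy checks to a model output before it reaches the
    user. Returns (allowed, reasons); reasons feed the audit trail."""
    reasons = []
    if role not in policy["allowed_roles"]:
        reasons.append(f"role '{role}' not permitted")
    if len(output.split()) > policy["max_output_words"]:
        reasons.append("output exceeds length budget")
    lowered = output.lower()
    for pattern in policy["blocked_patterns"]:
        if pattern in lowered:
            reasons.append(f"blocked pattern '{pattern}' detected")
    return (not reasons, reasons)
```

Because enforcement sits in the platform rather than in each capability, a capability cannot opt out, and a policy change takes effect everywhere at once.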
What we learned building it:
  • Governance is not overhead. It's the thing that makes enterprise clients say yes. Every enterprise client we work with asks about governance in the first meeting. The foundation's governance layer is often the deciding factor.
  • Audit trails need to be queryable, not just stored. When a compliance officer asks "show me every decision the AI made about this customer," the answer needs to be fast.
  • Guardrails need regular testing. Like security controls, guardrails that aren't tested regularly degrade in effectiveness.
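The "queryable, not just stored" point amounts to indexing audit events on the fields compliance officers actually ask about. The sketch below uses SQLite for brevity; the table and column names are illustrative, not the platform's actual audit schema.

```python
import sqlite3

# Minimal audit store: an index on customer_id makes the compliance
# question "every decision about this customer" a fast lookup.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE audit (
    ts TEXT, customer_id TEXT, capability TEXT,
    model TEXT, decision TEXT)""")
db.execute("CREATE INDEX idx_customer ON audit(customer_id, ts)")

def log_event(ts, customer_id, capability, model, decision):
    db.execute("INSERT INTO audit VALUES (?, ?, ?, ?, ?)",
               (ts, customer_id, capability, model, decision))

def decisions_for(customer_id):
    """Every decision the AI made about one customer, in order."""
    rows = db.execute(
        "SELECT ts, capability, decision FROM audit "
        "WHERE customer_id = ? ORDER BY ts", (customer_id,))
    return rows.fetchall()

log_event("2026-03-01T09:00", "cust-42", "credit-review", "balanced-mid", "approve")
log_event("2026-03-02T11:30", "cust-42", "kyc-check", "fast-small", "flag")
log_event("2026-03-02T12:00", "cust-99", "kyc-check", "fast-small", "approve")
```

Storing the same events as append-only log files would satisfy "complete", but answering the compliance officer would then require a scan; the index is what makes the answer fast.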

The Operations Layer

The operations layer monitors everything above and keeps it running.
What's in production:
  • Real-time quality monitoring per capability. Accuracy, relevance, completeness, and confidence tracked continuously with alerting on degradation.
  • Cost tracking per query, per capability, per client. Granular enough for billing, strategic enough for optimisation.
  • Performance monitoring. Latency, throughput, error rates, and availability. Standard DevOps practices applied to AI-specific metrics.
  • Drift detection. Automated detection of distribution shifts in inputs, outputs, and model behaviour. Alerts when the system's behaviour changes beyond expected ranges.
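A deliberately simple drift check: compare a recent window of quality scores against a baseline window and alert when the recent mean drifts beyond a few baseline standard deviations. The threshold and the z-score approach are illustrative; production drift detection typically uses distribution tests (PSI, Kolmogorov-Smirnov) rather than a single mean.

```python
from statistics import mean, stdev

def drift_alert(baseline, recent, threshold=3.0):
    """True when the recent window's mean score sits more than
    `threshold` baseline standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return mean(recent) != mu
    z = abs(mean(recent) - mu) / sigma
    return z > threshold
```

Run against, say, per-query relevance scores, this catches the quiet failure mode the lessons below describe: no crash, no latency spike, just answers that have gradually become worse.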
What we learned building it:
  • AI monitoring is not software monitoring. Traditional APM tools catch crashes and latency spikes. They don't catch quality degradation, which is the more common and more dangerous failure mode.
  • Evaluation is ongoing, not pre-deployment. The evaluation pipeline that tests the system before deployment continues running after deployment, catching degradation in real time.
  • Operations costs are real. The monitoring, evaluation, and maintenance infrastructure is a meaningful portion of the total platform cost. Budget for it from the start.
99.7%
platform availability across all production deployments in the last 90 days
Source: RIVER Group platform monitoring, Q1 2026

The Compound Effect, Measured

The thesis predicted compound returns. Here's what we've measured:
Deployment speed. The first capability on the platform took 10 weeks to deploy. The most recent took 2.5 weeks. Same quality standards. Same governance requirements. The platform does more of the work each time.
Cost efficiency. The per-capability cost of deployment has decreased by approximately 65% from the first to the most recent. Infrastructure is shared. Patterns are reusable. Integration is standardised.
Quality improvement. Each deployment improves the platform. Better retrieval strategies discovered in one engagement are available to all. Governance patterns proven in one domain apply to others. The platform gets better with use.
Operational efficiency. Monitoring and maintenance effort per capability decreases as the platform matures. Shared infrastructure means shared operations. Five capabilities don't require five times the operational effort.

Sovereignty

The foundation is designed for data sovereignty from the architecture level. This isn't a compliance checkbox. It's a structural property:
  • Data residency. Client data stays in NZ (or wherever the client's governance requires). No data crosses jurisdictional boundaries without explicit authorisation.
  • Model hosting. For clients with strict sovereignty requirements, models run on infrastructure within NZ. For others, we use cloud inference with data residency controls.
  • Indigenous data governance. The platform supports indigenous data governance frameworks. Data belonging to Māori and Pacific communities can be governed according to community-defined principles, not just regulatory requirements.
Sovereignty matters for NZ enterprise. Government clients require it. Health clients require it. Financial services clients prefer it. The foundation provides it as a standard capability, not an expensive add-on.

What This Means

The AI foundation is live. It works. It compounds. It's governed. It's sovereign. And it's available to NZ enterprises that want platform-grade AI without building a platform from scratch.
This is what RIVER Group was built to deliver. Not AI projects. Not AI consultancy. A compound AI platform that makes every capability faster, cheaper, and better than the last.
The thesis is proven. The platform is in production. The compound effect is measured and real.

Two years ago, the compound advantage was an idea. One year ago, it was an architecture. Today, it's an operating platform serving enterprise clients across health, financial services, and government. Each deployment makes it stronger. That's not marketing. That's the maths of compound infrastructure.