2025 was the year NZ enterprise AI got real. Not the hype cycle kind of real. The messy, operational, learning-from-mistakes kind of real. We shipped more AI capabilities in production this year than the previous two combined, and we learned more than we expected. Here is what the year actually taught us.
What You Need to Know
- The gap between AI demo and AI production is wider than most organisations expect. Getting AI to work in a demo takes weeks. Getting it to work reliably in production takes months. The gap is not technical. It is operational: monitoring, evaluation, edge cases, integration, and change management.
- The AI talent problem is real but solvable. NZ does not have enough AI specialists, but it has plenty of smart technologists who can add AI capability with the right support and training.
- AI adoption is a people problem, not a technology problem. The organisations that succeeded invested as much in change management as in technology. The ones that struggled invested almost entirely in technology.
- Compound AI architectures work. The foundation-first approach, building shared infrastructure before building capabilities, delivers on its promise. Second and third capabilities are genuinely faster and cheaper than the first.
The Patterns
Pattern 1: Foundation First Pays Off
The biggest lesson of 2025 is that the compound AI thesis works in practice, not just in theory. Organisations that invested in a shared AI foundation (orchestration, retrieval, evaluation, monitoring) before building capabilities had a measurably better experience than those that built capabilities independently.
The first capability took longer. The investment in foundation alongside capability one added roughly 30% to the timeline. But capability two was 40% faster. Capability three was 60% faster. By capability four, new capabilities were being built in weeks rather than months.
[Chart: Compound Effect: Foundation vs Independent Build]
The organisations that skipped the foundation and built each capability independently had the opposite experience. Each capability was a standalone project. There was no shared learning, no shared infrastructure, and no compounding. By the third capability, they were spending more per capability, not less.
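The compounding above is easy to check with rough arithmetic. This sketch assumes a hypothetical 12-week baseline per capability (an illustrative figure, not from our data) and applies the percentages from the text: roughly 30% added for the foundation, then 40% and 60% reductions.

```python
# Rough arithmetic for the compound-vs-independent comparison above.
# BASELINE_WEEKS is an illustrative assumption; the percentage shifts
# (+30%, -40%, -60%) are the ones described in the text.

BASELINE_WEEKS = 12

# Foundation-first: capability one carries the foundation overhead,
# then each subsequent capability gets faster.
foundation_first = [
    BASELINE_WEEKS * 1.30,  # capability 1: foundation built alongside
    BASELINE_WEEKS * 0.60,  # capability 2: 40% faster
    BASELINE_WEEKS * 0.40,  # capability 3: 60% faster
]

# Independent builds: no shared infrastructure, so no compounding.
# We model a flat cost per capability; in practice it often crept upward.
independent = [BASELINE_WEEKS] * 3

print(f"foundation-first total: {sum(foundation_first):.1f} weeks")
print(f"independent total:      {sum(independent):.1f} weeks")
```

By capability three the foundation-first track is already well ahead, despite the slower start, which matches what we saw in practice.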
Pattern 2: The 80% Trap
AI systems that are 80% accurate feel impressive in a demo and fail in production. The gap between 80% and 95% accuracy is where almost all the real work lives, and it is work that does not look like AI work. It looks like data cleaning, edge case handling, retrieval optimisation, and prompt engineering.
We fell into this trap ourselves on an early engagement. The demo was impressive. The client was enthusiastic. We shipped to production and immediately discovered that the 20% of cases the AI got wrong were disproportionately the high-value, complex cases that mattered most. The system was accurate on easy cases and unreliable on hard ones.
The lesson: evaluate AI on the cases that matter, not on the average case.
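In practice that means reporting accuracy per segment rather than one headline number. A minimal sketch, with made-up illustrative cases, of how a stratified report surfaces the problem an average hides:

```python
# Sketch: accuracy stratified by case value instead of one average.
# The `cases` data here is illustrative; in practice it comes from
# your labelled evaluation set.
from collections import defaultdict

cases = [
    {"value": "high", "correct": False},
    {"value": "high", "correct": False},
    {"value": "high", "correct": True},
    {"value": "low",  "correct": True},
    {"value": "low",  "correct": True},
    {"value": "low",  "correct": True},
]

def stratified_accuracy(cases):
    """Accuracy per segment, so high-value failures stay visible."""
    buckets = defaultdict(lambda: [0, 0])  # segment -> [correct, total]
    for case in cases:
        bucket = buckets[case["value"]]
        bucket[0] += case["correct"]
        bucket[1] += 1
    return {seg: correct / total for seg, (correct, total) in buckets.items()}

overall = sum(c["correct"] for c in cases) / len(cases)
print(f"overall accuracy: {overall:.0%}")  # the flattering average
print(stratified_accuracy(cases))          # the high-value segment tells the real story
```

The overall number looks acceptable while the high-value segment is failing two cases in three, which is exactly the shape of the trap we hit.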
Pattern 3: Change Management Is Half the Work
The technical work of building AI capabilities is roughly half the total effort. The other half is change management: training users, redesigning workflows, managing resistance, communicating honestly about capabilities and limitations, and supporting the transition.
Organisations that allocated budget and time for change management alongside technical delivery had dramatically higher adoption rates. Organisations that built the technology and expected adoption to follow naturally were disappointed.
"The best AI system we built this year had the lowest model sophistication of anything we shipped. The most sophisticated one struggled because nobody prepared the users."
Isaac Rolfe
Managing Director
Pattern 4: Evaluation Is Not Optional
Every organisation that shipped AI to production without a proper evaluation framework regretted it. Every one. Without evaluation, you do not know if the AI is working. You do not know if a model update broke something. You do not know if data drift is degrading performance. You find out from user complaints, weeks or months after the problem started.
The organisations that invested in evaluation from day one had a fundamentally different experience. They caught issues early. They improved continuously. They had confidence in their systems that evaluation-free organisations did not.
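The core of such a framework is small: a fixed golden set, a score, and a gate that blocks changes when the score regresses. A minimal sketch, where `run_system` is a hypothetical stand-in for whatever your AI pipeline exposes:

```python
# Minimal regression-style evaluation harness: run a fixed golden set
# on every change and fail if the score drops below the last accepted
# baseline. `run_system` and the golden examples are placeholders.

def run_system(prompt: str) -> str:
    # Stand-in for a call to your model or pipeline.
    return prompt.strip().lower()

GOLDEN_SET = [
    ("  Refund Policy  ", "refund policy"),
    ("OPENING HOURS", "opening hours"),
]

def evaluate(golden_set) -> float:
    """Fraction of golden cases the system answers correctly."""
    passed = sum(run_system(q) == expected for q, expected in golden_set)
    return passed / len(golden_set)

BASELINE = 0.95  # last accepted score; keep this in version control

score = evaluate(GOLDEN_SET)
print(f"eval score: {score:.2f}")
if score < BASELINE:
    raise SystemExit("regression: score fell below baseline, blocking deploy")
```

Running this in CI turns "we found out from user complaints weeks later" into "the deploy failed this morning," which is the whole point.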
The Surprises
[Chart: Enterprise AI Task Distribution by Model Size (2025). Source: RIVER Group, enterprise engagement data, 2025]
Surprise 1: Small Models Won More Often Than Expected
We started the year assuming most enterprise tasks would need the largest, most capable models. We finished the year knowing that 60-70% of enterprise AI tasks can be handled by smaller, cheaper models with minimal quality tradeoff. Model routing, using the right-sized model for each task, was the single highest-impact optimisation we applied across engagements.
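A router does not need to be sophisticated to pay off. A deliberately crude sketch, where the model names and the keyword heuristic are illustrative assumptions (a small classifier usually replaces the heuristic in practice):

```python
# Sketch of model routing: send each task to the cheapest model that
# can plausibly handle it. Model names and the keyword heuristic are
# illustrative only.

SMALL_MODEL = "small-model"   # cheap, fast; fine for most routine tasks
LARGE_MODEL = "large-model"   # expensive; reserve for genuinely hard tasks

HARD_SIGNALS = ("analyse", "multi-step", "legal", "reconcile")

def route(task: str) -> str:
    """Crude router: a keyword heuristic stands in for a real classifier."""
    if any(signal in task.lower() for signal in HARD_SIGNALS):
        return LARGE_MODEL
    return SMALL_MODEL

print(route("Summarise this meeting note"))        # routine -> small model
print(route("Analyse contract clauses for risk"))  # hard -> large model
```

Even a rule this blunt captures much of the saving if 60-70% of traffic is routine; the refinement is in measuring where the heuristic misroutes and tightening it from there.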
Surprise 2: Integration Was Harder Than AI
The AI part of enterprise AI, the model calls, the prompts, the retrieval, was consistently the easiest part of each project. The hard part was integration: connecting AI to existing systems, handling data from legacy formats, navigating authentication across enterprise platforms, and fitting AI workflows into existing business processes.
The Model Context Protocol helped with some of this. But the fundamental challenge is that enterprise systems are complex, inconsistent, and poorly documented. That is not an AI problem. It is an enterprise problem that AI amplifies.
Surprise 3: The Pacific Opportunity Is Real
Our work exploring AI for Pacific contexts went from theoretical to practical this year. The opportunities in health, climate adaptation, and community services are genuine, and the communities we engaged with have a clarity of vision about what they want from AI that many enterprise clients lack.
The Pacific approach, community-led, values-driven, and designed for collective benefit, is not just culturally appropriate. It is a better design methodology. Some of our best design thinking this year came from Pacific engagements.
What We Would Do Differently
- Start with evaluation. On every engagement, we wish we had invested in evaluation infrastructure earlier. It always feels like overhead at the start and always proves essential within months.
- Invest more in integration architecture. We underestimated the integration effort consistently. Next year, our scoping will allocate significantly more time and budget to system integration.
- Prioritise change management from day one. On some engagements, change management started after deployment. It should start at the beginning, running in parallel with technical delivery.
- Be more aggressive with model routing. We were conservative about using smaller models, defaulting to larger models "just in case." The data shows this caution was unnecessary for most tasks.
Looking Forward
NZ enterprise AI in 2025 moved from experimental to operational. Not everywhere, and not perfectly, but meaningfully. The organisations that committed to production AI this year have capabilities that will compound through 2026 and beyond.
The gap between AI-adopting and AI-waiting organisations is widening. Not because the technology is dramatically better, but because the operational knowledge, the organisational learning, the data foundations, and the cultural readiness that AI-adopting organisations have built are not easily replicated by latecomers.
The lesson of 2025 is not that AI works. We knew that. The lesson is that making AI work in the real world, with real data, real users, and real constraints, is harder and more rewarding than any of us expected. The organisations that leaned into that difficulty are the ones positioned for what comes next.
