Choosing an AI implementation partner is one of the highest-stakes decisions an enterprise makes in its AI journey. Get it right and you accelerate capability. Get it wrong and you waste six months, burn budget, and - worst of all - lose organisational confidence in AI. This framework helps you get it right.
What You Need to Know
- AI partner selection is fundamentally different from traditional technology vendor selection. The evaluation criteria that work for SaaS, consulting, or systems integration don't adequately cover AI-specific capabilities.
- This framework was developed through our experience running enterprise discovery sprints across multiple NZ sectors. It reflects what actually predicts partnership success, not what looks good in an RFP.
- The 8 criteria are weighted. Not all matter equally. Technical depth and delivery evidence matter most. Brand recognition and firm size matter least.
- Use this framework to evaluate, not to eliminate. The goal is to understand a partner's strengths and gaps, then decide whether those gaps matter for your specific context.
The Framework
Criterion 1: Technical Depth (Weight: Critical)
Can they build it? Not "can they manage the project" or "can they advise on strategy" - can they actually build working AI systems?
How to evaluate:
- Ask to see architecture diagrams from previous engagements (anonymised if needed). Can they explain their technical choices and trade-offs?
- Discuss specific technical challenges. How do they handle retrieval quality? What's their approach to hallucination mitigation? How do they manage model updates? (A sketch of one concrete retrieval-quality check follows this list.)
- Ask about their infrastructure. Do they build data pipelines? Do they understand vector databases? Can they implement observability?
- Request a technical assessment of your specific use case. Not a proposal - a technical opinion on feasibility, approach, and challenges.
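To make "handling retrieval quality" concrete, here is a minimal sketch of one common spot-check: recall@k over a small labelled query set. It is illustrative only; the `retrieve` function and the labelled pairs are hypothetical placeholders, not any partner's actual tooling. A technically deep partner should be able to describe checks like this, and their limitations, without prompting.

```python
# Illustrative sketch only: a recall@k spot-check for retrieval quality.
# `retrieve` is a hypothetical stand-in for whatever search the partner builds.
from typing import Callable

def recall_at_k(
    labelled_queries: list[tuple[str, set[str]]],  # (query, IDs of relevant docs)
    retrieve: Callable[[str, int], list[str]],     # returns top-k document IDs
    k: int = 5,
) -> float:
    """Fraction of known-relevant documents that appear in the top-k results."""
    hits = total = 0
    for query, relevant in labelled_queries:
        retrieved = set(retrieve(query, k))
        hits += len(relevant & retrieved)
        total += len(relevant)
    return hits / total if total else 0.0
```

The specific metric matters less than whether the partner can explain what they measure, on what data, and how often.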
Red flags: Inability to discuss technical architecture beyond marketing terminology. Over-reliance on a single framework or platform. Technical team that's entirely junior.
68% of enterprise AI project delays are attributed to partner technical capability gaps. (Source: Gartner, AI Implementation Partner Assessment, Q3 2024)
Criterion 2: Delivery Evidence (Weight: Critical)
Have they done this before? Not "have they worked in AI" - have they delivered working AI systems to production in enterprise environments?
How to evaluate:
- Request case studies with specific outcomes (not testimonials - outcomes). What was deployed? What value did it deliver? Is it still running?
- Ask about failure. What went wrong and how did they handle it? Partners who claim zero failures are either lying or haven't done enough work.
- Speak to references. Not the references they choose - references you identify through your network. Ask about what didn't go smoothly.
- Ask about their production track record. How many of their AI projects are running in production vs stopped at pilot?
Red flags: No production deployments. Case studies that focus on the pilot but don't mention production. References that describe the partner as "strategic" but not "hands-on."
Criterion 3: Domain Understanding (Weight: High)
Do they understand your industry? AI implementation isn't just technical work. It requires understanding the business context, regulatory environment, and operational reality of your sector.
How to evaluate:
- Have they worked in your sector before? If so, what did they learn?
- Can they discuss sector-specific challenges without prompting? Do they understand your regulatory environment, your data landscape, your competitive dynamics?
- Do they ask good questions about your business? A partner who jumps straight to technical solutions without understanding the business problem is dangerous.
Red flags: Generic proposals that could apply to any sector. Inability to discuss your regulatory environment. No questions about your business before proposing solutions.
Criterion 4: Methodology and Governance (Weight: High)
How do they work? AI projects need a methodology that handles uncertainty, iteration, and the unique governance requirements of AI systems.
How to evaluate:
- What's their delivery methodology? It should be iterative, not waterfall. It should include validation gates, not just milestone deliveries.
- How do they approach AI governance? Do they build governance into the delivery, or bolt it on at the end?
- What's their testing approach? How do they validate AI output quality? How do they handle edge cases? (See the validation-gate sketch after this list.)
- How do they manage scope when (not if) the project evolves?
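As an illustration of what a validation gate might look like in practice, the sketch below runs a small golden set of questions through a generation function and blocks release if the pass rate drops. The `generate_answer` function, the example questions, and the 95% threshold are all assumptions; the point is that a partner should have something of this shape, alongside human review, rather than "the model works."

```python
# Hypothetical sketch of a release "validation gate": a golden-set regression
# check. The questions, expected phrases, and threshold are placeholders.

GOLDEN_SET = [
    {"question": "What is our standard refund window?", "must_contain": ["30 days"]},
    {"question": "Who approves credit limit increases?", "must_contain": ["credit team"]},
]

def run_validation_gate(generate_answer, min_pass_rate: float = 0.95) -> bool:
    """Return True only if enough golden-set answers contain the expected phrases."""
    passed = 0
    for case in GOLDEN_SET:
        answer = generate_answer(case["question"]).lower()
        if all(phrase.lower() in answer for phrase in case["must_contain"]):
            passed += 1
    pass_rate = passed / len(GOLDEN_SET)
    print(f"Validation gate: {passed}/{len(GOLDEN_SET)} passed ({pass_rate:.0%})")
    return pass_rate >= min_pass_rate
```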
Red flags: Rigid waterfall methodology. No mention of governance until asked. Testing approach limited to "the model works."
Criterion 5: Team Composition (Weight: High)
Who will actually do the work? Not the partner principals who attend the pitch. The people who'll be in your codebase and your data.
How to evaluate:
- Meet the delivery team. Not just the project lead - the engineers, data scientists, and designers who'll do the work.
- Assess the seniority mix. A team of all juniors won't navigate enterprise complexity. A team of all seniors won't be cost-effective.
- Ask about team stability. Will these people be on the project for its duration, or will they rotate off?
- Ask about their bench. If a key team member leaves, what's the plan?
Red flags: "We'll assign the team after contract signing." Delivery team with no enterprise experience. High reliance on subcontractors.
Criterion 6: Partnership Model (Weight: Medium)
How do they view the relationship? A good AI partner builds your capability, not just your system. The best partners work themselves out of a job.
How to evaluate:
- What's their knowledge transfer approach? How will your team learn from the engagement?
- Do they build for handover? Will your team be able to maintain and extend the system after the engagement?
- How do they handle intellectual property? Is the code yours? The models? The prompts?
- What's their ongoing support model? What happens after the initial engagement?
Red flags: Proprietary platforms that create lock-in. No knowledge transfer plan. Ongoing licensing for work built on your data.
Criterion 7: Cultural Fit (Weight: Medium)
This is subjective but matters. AI projects require close collaboration between the partner and your team. If the working relationship is difficult, the project suffers.
How to evaluate:
- How do they communicate? Transparently, with bad news delivered early? Or optimistically, with problems hidden until they're critical?
- Do they challenge your assumptions? A good partner tells you when your approach is wrong. A poor one tells you what you want to hear.
- How do they handle disagreement? Constructively, with evidence? Or defensively?
- Do they understand NZ business culture? For NZ engagements, cultural alignment matters more than in larger markets.
Red flags: Reluctance to share bad news. Agreement with everything you say. Cultural friction in early interactions.
Criterion 8: Commercial Alignment (Weight: Medium)
Are the commercial incentives aligned? The best technical partner with misaligned commercial incentives will still produce poor outcomes.
How to evaluate:
- What's the pricing model? Time-and-materials provides flexibility but less cost certainty. Fixed-price provides certainty but incentivises scope limitation. Outcome-based aligns incentives but requires clear metrics.
- How do they handle change? AI projects inevitably evolve. The commercial model should accommodate this without adversarial negotiations.
- What are the exit terms? If the partnership isn't working, can you exit without catastrophic cost?
- Are there ongoing commercial interests? Licensing, hosting, support agreements - are these reasonable and transparent?
Red flags: Aggressive upselling during the evaluation. Pricing that doesn't scale predictably. Exit terms that create lock-in.
The Evaluation Process
Step 1: Long-List (Criteria 1-2 only)
Filter your initial list on technical depth and delivery evidence. These are non-negotiable. A partner without these capabilities is not a viable AI implementation partner regardless of their other strengths.
Step 2: Short-List (Criteria 3-5)
Evaluate remaining candidates on domain understanding, methodology, and team composition. These are differentiators among technically capable partners.
Step 3: Final Selection (Criteria 6-8)
Among finalists, evaluate partnership model, cultural fit, and commercial alignment. These determine whether a technically strong partner will be a good working partner.
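If you want to make the staged comparison explicit, a simple weighted scoring sheet is enough. The sketch below is one possible encoding, not part of the framework itself: the 3/2/1 weights for Critical/High/Medium and the sample scores are assumptions to replace with your own.

```python
# Illustrative only: turning the weighted criteria into a comparable score.
# Weights (3 = Critical, 2 = High, 1 = Medium) and sample scores are assumptions.

WEIGHTS = {
    "technical_depth": 3, "delivery_evidence": 3,                       # Critical
    "domain_understanding": 2, "methodology_governance": 2, "team": 2,  # High
    "partnership_model": 1, "cultural_fit": 1, "commercial": 1,         # Medium
}

def weighted_score(scores: dict[str, int]) -> float:
    """Scores are 1-5 per criterion; returns a weighted average out of 5."""
    return sum(WEIGHTS[c] * scores.get(c, 0) for c in WEIGHTS) / sum(WEIGHTS.values())

candidate_a = {
    "technical_depth": 4, "delivery_evidence": 5, "domain_understanding": 3,
    "methodology_governance": 4, "team": 4, "partnership_model": 3,
    "cultural_fit": 4, "commercial": 3,
}
print(f"Candidate A: {weighted_score(candidate_a):.2f} / 5")
```

A spreadsheet does the same job; the value is in agreeing the weights before you meet the candidates, not in the tooling.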
Step 4: Proof-of-Capability
Before committing to a large engagement, run a small, paid proof-of-capability. Give the partner a defined problem, real (anonymised) data, and 2-4 weeks. Evaluate the output and the experience of working together.
This step costs money. It saves far more.
Actionable Takeaways
- Weight technical depth and evidence highest. Everything else can be managed. Technical capability can't be faked.
- Meet the delivery team, not just the sales team. The people in the pitch aren't the people doing the work. Insist on meeting the actual team.
- Run a paid proof-of-capability. Don't commit to a six-figure engagement based on a proposal. Test the partnership first.
- Check for lock-in. Proprietary platforms, opaque licensing, restrictive IP terms - these create dependency that outlasts the engagement.
- Trust your instincts on cultural fit. If something feels off in the evaluation, it'll be worse during delivery.
