
How to Evaluate an AI Vendor: A Procurement Checklist for Enterprise

A practical 30-point checklist for evaluating AI vendors, built from real enterprise procurement processes across NZ and Australia.
20 May 2024 · 9 min read
Isaac Rolfe
Managing Director
Dr Tania Wolfgramm
Chief Research Officer
Procuring enterprise AI is nothing like buying off-the-shelf software. The evaluation criteria that work for SaaS (feature checklists, pricing tiers, user reviews) fall apart when you're evaluating AI capabilities. Here's the framework we've developed across dozens of enterprise evaluations in New Zealand and Australia.

What You Need to Know

  • 41% of enterprise AI procurements result in vendor replacement within 24 months. Traditional software procurement criteria don't work for AI
  • Evaluate across five dimensions: technical capability, IP/data ownership, knowledge transfer, NZ/AU regulatory alignment, and commercial model
  • The most important contract clause isn't what they'll build - it's what you own when the relationship ends
  • Always run a paid proof of value (not a free POC) with 1-2 finalists before signing
  • Score vendors against our 30-point checklist. Below 50 means look elsewhere
41% of enterprise AI procurements result in vendor replacement within 24 months (Source: Gartner, AI Vendor Management Survey 2024).
78% of NZ enterprises cite data sovereignty as a top AI procurement concern (Source: NZTech, Digital Nation Survey 2024).

Why Traditional Procurement Fails for AI

Traditional software procurement evaluates a finished product. You can demo it, test it, compare features. AI procurement is different because you're evaluating a capability that will be built or configured specifically for your context.
The vendor who demos well isn't necessarily the vendor who delivers well. The features that look impressive in a sales presentation may not transfer to your data, your workflows, or your regulatory environment.

The Five Evaluation Dimensions

1. Technical Capability

This is where most evaluations start, and often where they get stuck. Technical capability matters, but it's only one of five dimensions.
What to assess:
  • Can the vendor work with your existing data infrastructure?
  • Do they support the model architectures relevant to your use case?
  • What's their approach to model monitoring and drift detection?
  • How do they handle data privacy and on-premise deployment?
Red Flag: Any vendor who leads with "we use GPT-4" as a differentiator is selling commodity access, not capability. The model is the least differentiated part of an AI solution.

2. IP and Data Ownership

This is the dimension most enterprises evaluate too late, after the contract is signed.
Critical questions:
  • Who owns the fine-tuned models at contract end?
  • Where is training data stored, and who can access it?
  • Can you export models and pipelines in standard formats?
  • What happens to your data if the vendor is acquired or ceases operations?
"The most important clause in an AI vendor contract isn't about what they'll build. It's about what you keep when the relationship ends."
Dr Tania Wolfgramm, Chief Research Officer

3. Knowledge Transfer

An AI vendor should make themselves progressively less necessary, not more necessary.
What to assess:
  • Does the engagement include structured knowledge transfer?
  • Will your team be able to maintain and extend the solution independently?
  • Is there documentation, training, and ongoing capability building?
  • What's the plan for reducing vendor dependency over 12-24 months?

4. NZ/AU Regulatory Alignment

For New Zealand and Australian enterprises, regulatory context matters. The Privacy Act 2020, the emerging AI governance frameworks, sector-specific regulations: your vendor needs to understand this market.
What to assess:
  • Does the vendor understand NZ Privacy Act 2020 requirements for AI?
  • Can they demonstrate compliance with relevant sector regulations?
  • How do they handle data sovereignty, especially for government and health?
  • Do they have experience with the NZ/AU regulatory environment?

5. Commercial Model Alignment

How the vendor charges directly shapes what they incentivise.
What to assess:
  • Is the pricing model aligned with your success? (Value-based vs hourly)
  • Are there exit costs or lock-in mechanisms?
  • What's the total cost of ownership over 3 years, including internal resources? (see the worked sketch after this list)
  • How does pricing scale with usage?
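
To make the three-year TCO question concrete, here is a minimal Python sketch. All figures are hypothetical placeholders, not market rates, and three_year_tco is our own illustrative helper, not a standard formula. The point it demonstrates: vendor fees, internal resourcing, and exit costs all belong in the same total, and the shape of the fee curve matters as much as its size.

    # Hypothetical three-year TCO comparison. All figures are
    # illustrative placeholders, not market rates.

    def three_year_tco(annual_vendor_fees: list[float],
                       annual_internal_cost: float,
                       exit_cost: float) -> float:
        """Sum vendor fees, internal resourcing, and exit costs over 3 years."""
        assert len(annual_vendor_fees) == 3, "expects one fee figure per year"
        return sum(annual_vendor_fees) + 3 * annual_internal_cost + exit_cost

    # Hourly-billed vendor: flat fees every year, punitive exit cost (lock-in).
    hourly = three_year_tco([400_000, 400_000, 400_000],
                            annual_internal_cost=150_000,
                            exit_cost=200_000)

    # Value-based vendor: fees decline as internal capability grows, clean exit.
    value_based = three_year_tco([500_000, 300_000, 150_000],
                                 annual_internal_cost=150_000,
                                 exit_cost=0)

    print(f"Hourly: ${hourly:,.0f} vs value-based: ${value_based:,.0f}")
    # Hourly: $1,850,000 vs value-based: $1,400,000

Note how the value-based vendor costs more in year one but less over the full term, because declining fees and a clean exit are priced in. That is the comparison a single-year quote hides.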

The 30-Point Checklist

Use this checklist during your evaluation process. Score each item 0-3 (0 = not addressed, 3 = excellent).

Technical (Points 1-8)

  1. Works with existing data infrastructure (APIs, databases, formats)
  2. Supports required model architectures and deployment patterns
  3. Provides model monitoring and drift detection
  4. Offers on-premise or sovereign cloud deployment options
  5. Demonstrates performance on data similar to yours (not just benchmarks)
  6. Has a clear MLOps and CI/CD approach
  7. Supports model versioning and rollback
  8. Can integrate with existing security and identity infrastructure

Ownership (Points 9-14)

  9. Client retains ownership of all fine-tuned models
  10. Training data remains under client control
  11. Models exportable in standard formats (ONNX, etc.)
  12. Clear IP assignment clauses in contract
  13. Data deletion guarantees at contract end
  14. No vendor lock-in through proprietary formats

Knowledge Transfer (Points 15-20)

  15. Structured knowledge transfer programme included
  16. Documentation provided for all custom components
  17. Client team trained to maintain and extend solution
  18. Reducing vendor dependency is an explicit goal
  19. Internal capability assessment at engagement start and end
  20. Ongoing support model that decreases over time

Regulatory (Points 21-25)

  21. Understands NZ Privacy Act 2020 AI implications
  22. Experience with relevant sector regulations
  23. Data sovereignty options for NZ/AU
  24. Bias testing and fairness assessment methodology
  25. Audit trail for AI decision-making (explainability)

Commercial (Points 26-30)

  26. Pricing aligned with value delivery, not hours
  27. Clear exit terms with no punitive costs
  28. Total cost of ownership transparent and documented
  29. Scaling costs predictable and reasonable
  30. Performance guarantees with measurable SLAs
Scoring Guide
  • 90 (a perfect score): Exceptional. Rare in the current market.
  • 70-89: Strong candidate. Negotiate on gaps.
  • 50-69: Significant gaps. Proceed with caution.
  • Below 50: Look elsewhere.
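
If you want to tally scores programmatically, here is a minimal Python sketch of the checklist arithmetic. The dimension names and the score_vendor helper are our own illustration; the item counts and band thresholds come straight from the checklist and scoring guide above.

    # Minimal sketch of the 30-point scoring. Dimension names and the
    # score_vendor helper are illustrative; item counts and band
    # thresholds come from the checklist and scoring guide.

    CHECKLIST_ITEMS = {
        "technical": 8,           # points 1-8
        "ownership": 6,           # points 9-14
        "knowledge_transfer": 6,  # points 15-20
        "regulatory": 5,          # points 21-25
        "commercial": 5,          # points 26-30
    }

    def score_vendor(scores: dict[str, list[int]]) -> tuple[int, str]:
        """Total the per-item scores (each 0-3) and map the result to a band."""
        for dim, items in scores.items():
            assert len(items) == CHECKLIST_ITEMS[dim], f"wrong item count for {dim}"
            assert all(0 <= s <= 3 for s in items), f"scores must be 0-3 in {dim}"
        total = sum(sum(items) for items in scores.values())  # maximum 90
        if total == 90:
            band = "Exceptional. Rare in the current market."
        elif total >= 70:
            band = "Strong candidate. Negotiate on gaps."
        elif total >= 50:
            band = "Significant gaps. Proceed with caution."
        else:
            band = "Look elsewhere."
        return total, band

    # Example scoring for a hypothetical vendor:
    total, band = score_vendor({
        "technical": [3, 2, 2, 3, 2, 1, 2, 3],
        "ownership": [3, 3, 2, 3, 2, 3],
        "knowledge_transfer": [2, 2, 2, 1, 2, 2],
        "regulatory": [3, 2, 3, 1, 2],
        "commercial": [2, 3, 2, 2, 1],
    })
    print(total, band)  # 66 Significant gaps. Proceed with caution.

Score every shortlisted vendor against the same 30 items so the totals are comparable, and keep the per-dimension breakdowns: a vendor who scores 66 by failing ownership items is a very different risk from one who loses the same points on commercial terms.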

Running the Evaluation

Phase 1: Long List (2 weeks)

Desktop research. Apply the checklist at a high level based on public information, case studies, and initial conversations. Reduce to 3-5 vendors.

Phase 2: Deep Dive (4 weeks)

Structured evaluation against all 30 points. Request demos on your data (not theirs). Check references from similar-sized organisations in your sector.

Phase 3: Proof of Value (4-6 weeks)

Short paid engagement with 1-2 finalists. Real data, real use case, real constraints. This is where vendor capability shows up or doesn't.

What Good Looks Like

The best AI vendor relationships we've seen share these characteristics:
  • Declining dependency curve - the vendor's involvement decreases over time as internal capability grows
  • Shared IP model - the client owns the differentiated IP, the vendor retains their reusable frameworks
  • Honest scoping - the vendor pushes back on scope that won't deliver value, rather than accepting everything
  • NZ/AU context - the vendor understands the market, the regulations, and the talent pool
How long should an AI vendor evaluation take?
Plan for 10-12 weeks end-to-end: 2 weeks for long-listing, 4 weeks for deep evaluation, and 4-6 weeks for proof of value. Rushing this process is the most common procurement mistake. It leads to vendor regret within 12 months.
Should we require a proof of concept before signing?
Yes, always. But make it a paid proof of value, not a free proof of concept. Free POCs incentivise the vendor to demo well rather than solve well. A paid engagement with clear success criteria gives you a real signal.
What's the biggest red flag in AI vendor evaluation?
A vendor who can't clearly articulate what you'll own at the end of the engagement. If the conversation about IP ownership is vague or evasive, the contract terms will be worse.