An AI foundation is the shared infrastructure layer that every AI capability in your organisation builds on. Without it, each project starts from scratch. With it, each project is faster and cheaper than the last. This guide covers the architecture patterns, technology choices, and build sequence for your first foundation.
What an AI Foundation Actually Is
Strip away the marketing language and an AI foundation has four concrete layers:
- Data layer. How data moves from source systems to AI workloads
- Intelligence layer. Model hosting, orchestration, and the logic that makes AI useful
- Integration layer. How AI capabilities connect to business workflows
- Governance layer. Monitoring, access control, audit trails, and guardrails
Each layer is built once and shared across every AI project. That's where the compound advantage comes from.
60% of enterprise AI project effort goes into data pipelines - the same pipelines, rebuilt for each project. (Source: Gartner, Data Engineering for AI Survey 2024)
Architecture Patterns
Pattern 1: Hub and Spoke
The most common pattern for enterprises starting their AI journey.
Hub: A central AI platform team owns the shared infrastructure: data pipelines, model hosting, governance tools, and API gateway.
Spokes: Business unit teams build AI capabilities on top of the shared platform. They own the business logic and domain models; the hub provides the plumbing.
When to use: Organisations with 3+ business units that need AI capabilities. The hub prevents each unit from building isolated infrastructure.
Trade-off: Requires a dedicated platform team (2-4 people initially). If the hub team becomes a bottleneck, spokes slow down.
Pattern 2: Composable Services
For organisations with mature engineering teams and a microservices culture.
Each AI capability is a self-contained service with a defined API. Services share common libraries and conventions but are independently deployable.
When to use: Engineering-led organisations that already have strong DevOps practices. The composable approach requires more engineering discipline but scales better.
Trade-off: Higher initial complexity. Requires strong API design skills and shared conventions.
Pattern 3: Managed Platform
For organisations that want AI capabilities without building AI infrastructure.
A managed platform (internal or from a trusted partner) provides the data layer, intelligence layer, and integration layer as a service. Business teams configure and extend rather than build.
When to use: Organisations where AI infrastructure isn't a core competency and engineering resources are limited.
Trade-off: Less flexibility. Dependency on the platform provider. Must ensure IP ownership is contractually clear.
Start Simple
Most enterprises should start with Hub and Spoke. It's the simplest pattern that delivers compound value. You can evolve to Composable Services as your maturity grows.
Technology Choices
Data Layer
The data layer is the most critical, and most underinvested, part of the foundation.
Key components:
- Ingestion: How data gets from source systems into the AI pipeline. Use event-driven ingestion (webhooks, message queues) over batch where possible.
- Storage: A structured data warehouse for analytics workloads, plus object storage for unstructured data (documents, images, audio).
- Transformation: Clean, normalise, and enrich data before it reaches models. dbt, Apache Spark, or simple Python pipelines depending on scale.
- Vector store: For retrieval-augmented generation (RAG) workloads, a vector database stores embeddings for semantic search. PostgreSQL with pgvector handles most enterprise workloads.
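To make the vector-store bullet concrete, here is a minimal in-memory sketch of what a semantic search does: store (id, embedding) rows and rank them by cosine distance, which is what pgvector's `<=>` operator computes server-side. The document IDs and three-dimensional embeddings are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from an embedding model.

```python
import math

def cosine_distance(a, b):
    # pgvector's <=> operator returns 1 - cosine similarity; 0 means identical direction
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def search(store, query_embedding, k=3):
    # store: list of (doc_id, embedding) pairs, like rows in a pgvector table
    ranked = sorted(store, key=lambda row: cosine_distance(row[1], query_embedding))
    return [doc_id for doc_id, _ in ranked[:k]]

store = [
    ("refund-policy", [0.9, 0.1, 0.0]),
    ("shipping-faq",  [0.1, 0.9, 0.0]),
    ("returns-guide", [0.8, 0.2, 0.1]),
]
print(search(store, [1.0, 0.0, 0.0], k=2))  # nearest documents first
```

In production the sort happens inside PostgreSQL with an index, so the application only issues a query; the ranking logic is the same.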
"The most common mistake in AI infrastructure is underinvesting in the data layer. In practice, data quality determines 80% of AI outcomes."
John Li, Chief Technology Officer
Intelligence Layer
Key components:
- LLM access: API-based access to frontier models (OpenAI, Anthropic, Google). Don't self-host unless you have specific regulatory or latency requirements.
- Orchestration: A framework for chaining AI operations: prompt management, tool use, retrieval, and response generation. LangChain, Semantic Kernel, or custom orchestration depending on complexity.
- Fine-tuned models: For domain-specific tasks, models fine-tuned on your data outperform prompted general-purpose models. Start with few-shot prompting; graduate to fine-tuning when you have signal.
- Embedding pipeline: Automated processing of documents and data into vector embeddings for RAG.
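The orchestration pattern described above - retrieval, prompt management, response generation chained together - can be sketched in a few functions. This is a hedged illustration, not a framework recommendation: the keyword-overlap `retrieve` stands in for the vector search the RAG pipeline would actually use, and `llm` is any callable wrapping your model API client.

```python
def retrieve(query, documents, k=2):
    # Naive keyword overlap stands in for vector search in this sketch
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=score, reverse=True)[:k]

def build_prompt(query, context_docs):
    # Prompt management: one place where the template lives and evolves
    context = "\n---\n".join(context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

def answer(query, documents, llm):
    # llm is any callable taking a prompt string; swap in your API client
    docs = retrieve(query, documents)
    return llm(build_prompt(query, docs))

# Usage with a stub model, purely for illustration:
reply = answer("refund policy", ["refund policy doc", "shipping faq"], lambda p: "stub answer")
```

LangChain and Semantic Kernel provide exactly these seams (retriever, prompt template, model client) with more machinery; the value of owning the orchestration layer is that every use case plugs into the same chain.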
Integration Layer
Key components:
- API gateway: A single entry point for all AI capabilities. Handles authentication, rate limiting, logging, and routing.
- Webhooks and events: AI capabilities publish events when they complete work. Downstream systems subscribe to relevant events.
- UI components: Reusable frontend components for AI interactions: chat interfaces, inline suggestions, review workflows.
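The webhooks-and-events bullet above can be sketched as a tiny publish/subscribe core. This is a minimal in-process illustration (a real integration layer would publish to a message broker); the event type name and envelope fields are assumptions, though carrying an id and timestamp on every event is the standard way to let subscribers deduplicate and order.

```python
import uuid
from datetime import datetime, timezone

subscribers = {}  # event_type -> list of handler callables

def subscribe(event_type, handler):
    subscribers.setdefault(event_type, []).append(handler)

def publish(event_type, payload):
    # Every event carries an id and timestamp so downstream systems can dedupe and order
    event = {
        "id": str(uuid.uuid4()),
        "type": event_type,
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    for handler in subscribers.get(event_type, []):
        handler(event)
    return event

# An AI capability announces completed work; any system can listen:
subscribe("document.analysed", lambda e: print("notify CRM:", e["payload"]))
publish("document.analysed", {"document_id": "doc-1", "summary": "..."})
```

Because capabilities only publish and never call downstream systems directly, adding a new consumer never requires touching the AI service.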
Governance Layer
Key components:
- Access control: Who can use which AI capabilities, with what data.
- Audit trail: Every AI decision is logged: input, output, model version, confidence score.
- Monitoring: Model performance metrics, drift detection, cost tracking.
- Guardrails: Input/output validation, content filtering, and safety checks.
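An audit trail of the kind described above is cheap to build in from day one. The sketch below shows one possible entry shape, logging exactly the fields listed (input, output, model version, confidence); hashing the prompt is an assumption worth considering, since it lets the trail prove what was asked without retaining raw, possibly sensitive input.

```python
import hashlib
from datetime import datetime, timezone

audit_log = []  # in production: an append-only table or log stream

def record_decision(user, model_version, prompt, response, confidence):
    # Hash the prompt so the trail can verify inputs without storing them verbatim
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "model_version": model_version,
        "input_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output": response,
        "confidence": confidence,
    }
    audit_log.append(entry)
    return entry
```

Every capability calls `record_decision` on every model invocation; the governance layer then has one place to answer "what did the AI do, when, and for whom."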
Build Sequence
Build in this order. Each phase delivers value while setting up the next.
Phase 1: Data + First Use Case (Weeks 1-6)
Build the data pipeline for your highest-value use case. This gives you a working data layer and a tangible deliverable.
Deliverables:
- Data ingestion from 1-2 source systems
- Storage infrastructure (database + object storage)
- One working AI capability (e.g., document analysis, customer insight)
- Basic monitoring and logging
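The Phase 1 ingestion deliverable, done event-driven as recommended earlier, amounts to a webhook receiver that validates at the edge and enqueues records for the pipeline. A minimal sketch, with the required field names (`source`, `id`, `data`) chosen purely for illustration:

```python
import json
import queue

ingest_queue = queue.Queue()  # stands in for a managed message queue

def handle_webhook(raw_body):
    # Validate at the edge so malformed records never reach the pipeline
    record = json.loads(raw_body)
    for field in ("source", "id", "data"):
        if field not in record:
            raise ValueError(f"missing field: {field}")
    ingest_queue.put(record)
    return {"status": "accepted", "id": record["id"]}
```

A transformation worker consumes `ingest_queue` downstream; because the validation contract lives at the boundary, adding a second source system in Phase 2 reuses it unchanged.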
Phase 2: Intelligence Layer + Second Use Case (Weeks 7-12)
Formalise the intelligence layer. Build the orchestration framework, add a second use case that shares infrastructure with the first.
Deliverables:
- LLM access with API gateway
- Orchestration framework
- Vector store and RAG pipeline
- Second AI capability, built 40-50% faster than the first
Phase 3: Governance + Scale (Weeks 13-18)
Add governance, expand access, and start enabling other teams.
Deliverables:
- Access control and audit trail
- Monitoring dashboard
- Documentation and onboarding for new teams
- Third AI capability, built by a different team using the foundation
3× typical speed improvement from first to third AI project when built on a shared foundation. (Source: RIVER Group, enterprise engagement data 2024)
Common Mistakes
Over-Engineering the First Version
The first foundation doesn't need to handle every edge case. Build for your first 2-3 use cases. Evolve from there.
Avoid This
Don't spend 6 months building the "perfect" AI platform before delivering any business value. The foundation should emerge from real use cases, not precede them.
Ignoring the Data Layer
Spending 80% of budget on models and 20% on data is backwards. Reverse it. Good data with a simple model beats bad data with a sophisticated model every time.
Building Your Own LLM Infrastructure
Unless you have specific regulatory requirements that prevent API access, don't self-host LLMs. The cost, complexity, and operational overhead of running GPU infrastructure are not a differentiator. They're a distraction.
No Governance Until "Later"
Governance added after deployment is governance that's fighting existing patterns. Build it in from Phase 1, even if it's lightweight. Access control and audit logging cost almost nothing to add early and are painful to retrofit.
- How much does an AI foundation cost to build?
- For a typical NZ/AU enterprise, Phase 1 costs $40-80K and delivers a working AI capability plus shared infrastructure. The full three-phase build runs $100-250K depending on complexity. This is significantly less than building 3-4 isolated AI projects, which typically cost $80-150K each with no shared infrastructure.
- Do we need a dedicated AI platform team?
- For Phase 1, no. The team building the first use case builds the foundation alongside it. By Phase 3, you need 2-4 people maintaining the shared infrastructure. This team can be internal, external, or a blend, but someone needs to own the platform.
- Can we build an AI foundation on Azure/AWS/Google Cloud?
- Yes, and you should. Cloud providers offer the building blocks (managed databases, object storage, API gateways, ML services). Your foundation is the architecture and logic layer on top, not the infrastructure itself. Use managed services wherever possible and focus your engineering effort on what's unique to your organisation.
- How does this relate to existing data warehouses?
- Your existing data warehouse is part of the data layer, not a replacement for it. AI workloads need additional data flows (real-time ingestion, embedding pipelines, vector storage) that complement traditional analytics infrastructure. The AI foundation extends your data architecture; it doesn't replace it.