In 2023, we published The Enterprise AI Stack Explained, a guide to the technology layers underpinning enterprise AI. Two and a half years later, the stack has transformed. Single-model RAG architectures have given way to multi-model orchestration. Agentic workflows have moved from research papers to production. Knowledge graphs are standard. And governance is no longer a process - it's code. Here's the updated reference architecture.
What You Need to Know
- The enterprise AI stack has five layers in 2026: Infrastructure, Data, Intelligence, Orchestration, and Governance. Each is more mature and more interconnected than in 2023.
- Multi-model orchestration is the new default. Enterprises are using 3-7 models in production, routing queries to the right model based on task, cost, and compliance requirements. Single-model deployments are a legacy pattern.
- Agentic AI is in early production. AI systems that take actions, not just answer questions, are live in claims processing, compliance monitoring, and customer service. The architectural implications are significant.
- Knowledge graphs have become standard alongside vector search. The combination of vector search (semantic retrieval) and knowledge graphs (structured relationships) delivers significantly better accuracy for enterprise queries.
- Governance-as-code is the most important architectural shift. Governance is no longer a review process. It's embedded in the infrastructure through policy engines, automated compliance checks, and continuous monitoring.
5.2
average number of AI models in production per enterprise in 2026
Source: Gartner, AI Platform Engineering Survey, 2025
The Five-Layer Architecture
Layer 1: Infrastructure
The compute, storage, and networking foundation that everything runs on.
What's changed since 2023:
- GPU access is commoditised. Cloud GPU availability has stabilised. Reserved capacity and spot pricing make compute budgets predictable. On-premises GPU deployment is realistic for enterprises with sovereignty requirements.
- Inference is now separate from training. Most enterprises don't train models. They run inference on foundation models. This changes the infrastructure profile: less GPU-intensive, more latency-sensitive, and more cost-conscious.
- Edge inference is emerging. For low-latency or sovereignty-sensitive use cases, enterprises are running smaller models on-premises or at the edge. The infrastructure layer now spans cloud and local deployment.
Key decisions:
- Cloud provider selection (Azure, AWS, GCP, or multi-cloud)
- Sovereignty requirements (data residency, compute location)
- Cost management strategy (reserved vs on-demand, model routing by cost)
Layer 2: Data
The data platform that feeds AI systems: storage, pipelines, embedding, and retrieval.
What's changed since 2023:
- Hybrid retrieval is standard. The best enterprise architectures combine vector search for semantic retrieval, keyword search for precision, and knowledge graphs for structured relationships. Single-retrieval approaches are insufficient for production enterprise use.
- Embedding pipelines are mature. Chunking strategies, embedding model selection, and incremental re-embedding are well-understood problems with established patterns. The art is in tuning: which chunks, which model, which parameters for your specific content.
- Data lineage is required. For governance and audit, every piece of data that influences an AI output must be traceable, from source document to embedding to retrieval to generation. This is architecture, not afterthought.
The data layer components:
| Component | Purpose | Options |
|---|---|---|
| Document ingestion | Parse, chunk, embed documents | Custom pipeline, Unstructured.io, LlamaIndex |
| Vector store | Semantic retrieval | pgvector, Pinecone, Weaviate, Qdrant |
| Knowledge graph | Structured relationships | Neo4j, Amazon Neptune, PostgreSQL |
| Search index | Keyword/hybrid search | Elasticsearch, Typesense, Meilisearch |
| Data pipeline | ETL/ELT for AI data | Custom, Airflow, Dagster |
| Feature store | Structured features for models | Feast, Tecton, custom |
Key decisions:
- Knowledge graph scope (start narrow, expand by domain)
- Embedding model selection (quality vs cost vs speed trade-off)
- Data pipeline cadence (real-time vs batch, by data source)
Layer 3: Intelligence
The AI models that understand, reason, and generate.
What's changed since 2023:
- Multi-model is the norm. Enterprises route queries to different models based on task complexity, cost, latency, and compliance. A customer FAQ uses a small, fast model. A contract review uses a large, capable model. Sensitive data stays on a local model.
- Model capabilities have converged. The gap between frontier models (Claude, GPT-4) and the tier below has narrowed. For many enterprise use cases, the second-tier model at 30% of the cost delivers equivalent results.
- Fine-tuning is selective. Most enterprises use foundation models with retrieval augmentation rather than fine-tuning. Fine-tuning is reserved for domain-specific tasks where retrieval alone isn't sufficient: classification, extraction, and domain-specific language.
The multi-model architecture:
+------------------+
| Model Router |
+---------+--------+
|
+----------------+----------------+
v v v
+-----------+ +-----------+ +-----------+
| Frontier | | Mid-tier | | Local |
| (Complex) | | (General) | |(Sovereign)|
+-----------+ +-----------+ +-----------+
Model routing criteria:
| Factor | Frontier Model | Mid-Tier Model | Local Model |
|---|---|---|---|
| Task complexity | High (multi-step reasoning) | Medium (standard Q&A) | Low (classification, extraction) |
| Cost per query | High | Medium | Low (infrastructure only) |
| Latency | Higher | Lower | Lowest |
| Data sovereignty | Cloud-based | Cloud-based | On-premises |
| Compliance | Provider terms apply | Provider terms apply | Full control |
Key decisions:
- Primary model provider (with fallback strategy)
- Routing logic (rule-based vs learned)
- Sovereignty model selection and deployment
Layer 4: Orchestration
The coordination layer that connects data, models, and actions into coherent workflows.
What's changed since 2023:
This is the layer that barely existed in 2023 and is now the most architecturally significant. Orchestration handles:
- RAG pipeline management. Query understanding, retrieval strategy selection, context assembly, prompt construction, response generation, and post-processing, all coordinated as a pipeline with error handling and fallback behaviour.
- Agentic workflows. Agentic AI systems that plan, execute, and verify multi-step tasks. A claims processing agent that reads a claim, extracts data, checks policy, calculates liability, and drafts a response, with human approval at key decision points.
- Tool orchestration. AI systems that use external tools (databases, APIs, calculators, search engines) need an orchestration layer that manages tool selection, parameter handling, error recovery, and security.
- Conversation management. Multi-turn interactions with context management, memory, and state tracking across sessions.
Agentic architecture patterns:
The production-ready pattern for agentic AI in 2026 is constrained autonomy:
- Plan. The agent analyses the task and proposes a plan
- Verify. The plan is checked against policy rules and constraints
- Execute. The agent executes approved steps, one at a time
- Checkpoint. High-risk actions require human approval before proceeding
- Record. Every action is logged for audit and improvement
This is explicitly not "autonomous AI." It's AI with bounded authority, explicit constraints, and human oversight at decision points. Enterprises that skip the constraint layer in pursuit of full autonomy are building risk, not capability.
Key decisions:
- Orchestration framework (LangGraph, custom, platform-native)
- Human-in-the-loop boundaries (which decisions require approval)
- Error handling strategy (retry, fallback, escalate)
- State management approach (stateless vs persistent agents)
Layer 5: Governance
The policy, compliance, and monitoring layer that ensures AI operates within bounds.
What's changed since 2023:
Governance has evolved from a process (reviews, approvals, sign-offs) to infrastructure (policy engines, automated checks, continuous monitoring). This is the single most important architectural shift in the enterprise AI stack.
Governance-as-code components:
| Component | Purpose | Implementation |
|---|---|---|
| Policy engine | Define and enforce AI usage policies | OPA, Cedar, custom rules engine |
| Access control | Model-level and data-level permissions | RBAC/ABAC integrated with IdP |
| Input filtering | Block injection attacks, enforce content policies | Pipeline middleware |
| Output filtering | Remove PII, flag hallucination, enforce format | Post-processing pipeline |
| Audit logging | Record every inference for compliance and review | Structured logging, immutable store |
| Monitoring | Detect drift, anomalies, and degradation | Dashboards, alerting, automated checks |
| Bias detection | Continuous testing for demographic bias | Statistical monitoring, red-team testing |
The governance-as-code principle: Every governance rule should be expressed as code that runs automatically, not as a document that requires manual review. Data classification checks, access control verification, output filtering, and audit logging all happen in the pipeline, not in a committee meeting.
This doesn't eliminate human governance. It automates the routine checks so human reviewers focus on novel risks, policy evolution, and strategic oversight.
Key decisions:
- Policy engine selection (custom vs off-the-shelf)
- Monitoring granularity (per-query vs sampled)
- Audit retention and access policies
- Incident detection and response automation
Start With Layer 2 and Layer 5
If you're building your enterprise AI stack from scratch, invest first in the Data layer (get retrieval right) and the Governance layer (get controls right). The Intelligence and Orchestration layers are easier to swap and evolve. Data and Governance are foundational. Getting them wrong is expensive to fix.
The Reference Architecture Diagram
┌─────────────────────────────────────────────────────────┐
│ GOVERNANCE (Layer 5) │
│ Policy Engine │ Access Control │ Monitoring │ Audit Log │
├─────────────────────────────────────────────────────────┤
│ ORCHESTRATION (Layer 4) │
│ RAG Pipeline │ Agentic Workflows │ Tool Use │ Memory │
├─────────────────────────────────────────────────────────┤
│ INTELLIGENCE (Layer 3) │
│ Frontier Models │ Mid-Tier Models │ Local Models │
│ Model Router │
├─────────────────────────────────────────────────────────┤
│ DATA (Layer 2) │
│ Vector Store │ Knowledge Graph │ Search Index │ Pipeline │
├─────────────────────────────────────────────────────────┤
│ INFRASTRUCTURE (Layer 1) │
│ Cloud Compute │ Storage │ Network │ Edge / On-Premises │
└─────────────────────────────────────────────────────────┘
Governance spans all layers. It's drawn at the top but its controls are embedded throughout the stack.
What Changed From 2023 to 2026
| Dimension | 2023 | 2026 |
|---|---|---|
| Models | Single model (usually GPT-4) | Multi-model with routing |
| Retrieval | Basic RAG with vector search | Hybrid retrieval + knowledge graphs |
| Orchestration | Simple prompt chains | Agentic workflows with tool use |
| Governance | Manual review process | Governance-as-code, automated |
| Infrastructure | Cloud-only | Cloud + edge + on-premises |
| Integration | API-based, simple | Deep system integration, bidirectional |
| Maturity | Experimental | Production-grade |
Average AI Models in Production Per Enterprise
Source: Gartner, AI Platform Engineering Survey, 2025
The stack hasn't just grown. It's fundamentally restructured. The 2023 stack was built around a single model answering questions. The 2026 stack is built around an orchestration layer coordinating multiple models, data sources, and actions within a governance framework. The model is no longer the centre. The orchestration and governance layers are.
Building Your Stack
For Organisations Starting Fresh
- Layer 1: Choose a cloud provider. Don't over-invest in infrastructure before you know your workload profile.
- Layer 2: Build a data pipeline with hybrid retrieval (vector + keyword). Add knowledge graph when accuracy demands it.
- Layer 3: Start with one frontier model. Add routing when you have cost or sovereignty reasons for multiple models.
- Layer 4: Start with simple RAG. Evolve to agentic workflows as your use cases demand actions, not just answers.
- Layer 5: Build governance-as-code from day one. Audit logging, access control, and input/output filtering are non-negotiable.
For Organisations Evolving From 2023-Era Stacks
- Add knowledge graphs alongside your existing vector search. Start with one domain.
- Implement model routing to optimise cost and compliance. Most queries don't need your most expensive model.
- Upgrade governance from process to code. Automate the checks your team currently does manually.
- Pilot agentic workflows in one bounded domain. Constrained autonomy with human checkpoints.
- Add data lineage if you don't have it. Every AI output should be traceable to its source data.
- Do we need all five layers from day one?
- You need all five layers represented, but not all fully built. Layer 1 can be a single cloud account. Layer 2 can be a vector store and a simple pipeline. Layer 3 can be one model. Layer 4 can be a basic RAG pipeline. Layer 5 must include audit logging and access control from day one. Governance is the one layer you can't add later without major rework.
- How do we choose between AI platform vendors?
- Evaluate on three criteria: data sovereignty (where your data goes), portability (can you move away), and integration depth (how well it connects to your existing systems). Avoid platforms that lock you into a single model provider. Multi-model flexibility is essential in a fast-moving market.
- Is agentic AI ready for enterprise production?
- For bounded, well-defined workflows with human oversight, yes. Claims processing, compliance monitoring, and document review are live in production today. For unbounded, autonomous decision-making, no. The constrained autonomy pattern (plan, verify, execute, checkpoint, record) is the only production-safe approach. Any vendor telling you agentic AI can operate autonomously in enterprise is selling you risk.

