The Enterprise AI Stack in 2026: The Reference Architecture

In 2023, we published The Enterprise AI Stack Explained, a guide to the technology layers underpinning enterprise AI. Two and a half years later, the stack has transformed. Single-model RAG architectures have given way to multi-model orchestration. Agentic workflows have moved from research papers to production. Knowledge graphs are standard. And governance is no longer a process - it's code. Here's the updated reference architecture.

What You Need to Know

The enterprise AI stack has five layers in 2026: Infrastructure, Data, Intelligence, Orchestration, and Governance. Each is more mature and more interconnected than in 2023.
Multi-model orchestration is the new default. Enterprises are using 3-7 models in production, routing queries to the right model based on task, cost, and compliance requirements. Single-model deployments are a legacy pattern.
Agentic AI is in early production. AI systems that take actions, not just answer questions, are live in claims processing, compliance monitoring, and customer service. The architectural implications are significant.
Knowledge graphs have become standard alongside vector search. The combination of vector search (semantic retrieval) and knowledge graphs (structured relationships) delivers significantly better accuracy for enterprise queries.
Governance-as-code is the most important architectural shift. Governance is no longer a review process. It's embedded in the infrastructure through policy engines, automated compliance checks, and continuous monitoring.

5.2

average number of AI models in production per enterprise in 2026

Source: Gartner, AI Platform Engineering Survey, 2025

The Five-Layer Architecture

Layer 1: Infrastructure

The compute, storage, and networking foundation that everything runs on.

What's changed since 2023:

GPU access is commoditised. Cloud GPU availability has stabilised. Reserved capacity and spot pricing make compute budgets predictable. On-premises GPU deployment is realistic for enterprises with sovereignty requirements.
Inference is now separate from training. Most enterprises don't train models. They run inference on foundation models. This changes the infrastructure profile: less GPU-intensive, more latency-sensitive, and more cost-conscious.
Edge inference is emerging. For low-latency or sovereignty-sensitive use cases, enterprises are running smaller models on-premises or at the edge. The infrastructure layer now spans cloud and local deployment.

Key decisions:

Cloud provider selection (Azure, AWS, GCP, or multi-cloud)
Sovereignty requirements (data residency, compute location)
Cost management strategy (reserved vs on-demand, model routing by cost)

Layer 2: Data

The data platform that feeds AI systems: storage, pipelines, embedding, and retrieval.

What's changed since 2023:

Hybrid retrieval is standard. The best enterprise architectures combine vector search for semantic retrieval, keyword search for precision, and knowledge graphs for structured relationships. Single-retrieval approaches are insufficient for production enterprise use.
Embedding pipelines are mature. Chunking strategies, embedding model selection, and incremental re-embedding are well-understood problems with established patterns. The art is in tuning: which chunks, which model, which parameters for your specific content.
Data lineage is required. For governance and audit, every piece of data that influences an AI output must be traceable, from source document to embedding to retrieval to generation. This is architecture, not afterthought.

The data layer components:

Component	Purpose	Options
Document ingestion	Parse, chunk, embed documents	Custom pipeline, Unstructured.io, LlamaIndex
Vector store	Semantic retrieval	pgvector, Pinecone, Weaviate, Qdrant
Knowledge graph	Structured relationships	Neo4j, Amazon Neptune, PostgreSQL
Search index	Keyword/hybrid search	Elasticsearch, Typesense, Meilisearch
Data pipeline	ETL/ELT for AI data	Custom, Airflow, Dagster
Feature store	Structured features for models	Feast, Tecton, custom

Key decisions:

Knowledge graph scope (start narrow, expand by domain)
Embedding model selection (quality vs cost vs speed trade-off)
Data pipeline cadence (real-time vs batch, by data source)

Layer 3: Intelligence

The AI models that understand, reason, and generate.

What's changed since 2023:

Multi-model is the norm. Enterprises route queries to different models based on task complexity, cost, latency, and compliance. A customer FAQ uses a small, fast model. A contract review uses a large, capable model. Sensitive data stays on a local model.
Model capabilities have converged. The gap between frontier models (Claude, GPT-4) and the tier below has narrowed. For many enterprise use cases, the second-tier model at 30% of the cost delivers equivalent results.
Fine-tuning is selective. Most enterprises use foundation models with retrieval augmentation rather than fine-tuning. Fine-tuning is reserved for domain-specific tasks where retrieval alone isn't sufficient: classification, extraction, and domain-specific language.

The multi-model architecture:

                +------------------+
                |   Model Router   |
                +---------+--------+
                          |
         +----------------+----------------+
         v                v                v
   +-----------+    +-----------+    +-----------+
   |  Frontier |    |  Mid-tier |    |   Local   |
   | (Complex) |    | (General) |    |(Sovereign)|
   +-----------+    +-----------+    +-----------+

Model routing criteria:

Factor	Frontier Model	Mid-Tier Model	Local Model
Task complexity	High (multi-step reasoning)	Medium (standard Q&A)	Low (classification, extraction)
Cost per query	High	Medium	Low (infrastructure only)
Latency	Higher	Lower	Lowest
Data sovereignty	Cloud-based	Cloud-based	On-premises
Compliance	Provider terms apply	Provider terms apply	Full control

Key decisions:

Primary model provider (with fallback strategy)
Routing logic (rule-based vs learned)
Sovereignty model selection and deployment

Layer 4: Orchestration

The coordination layer that connects data, models, and actions into coherent workflows.

What's changed since 2023:

This is the layer that barely existed in 2023 and is now the most architecturally significant. Orchestration handles:

RAG pipeline management. Query understanding, retrieval strategy selection, context assembly, prompt construction, response generation, and post-processing, all coordinated as a pipeline with error handling and fallback behaviour.
Agentic workflows. Agentic AI systems that plan, execute, and verify multi-step tasks. A claims processing agent that reads a claim, extracts data, checks policy, calculates liability, and drafts a response, with human approval at key decision points.
Tool orchestration. AI systems that use external tools (databases, APIs, calculators, search engines) need an orchestration layer that manages tool selection, parameter handling, error recovery, and security.
Conversation management. Multi-turn interactions with context management, memory, and state tracking across sessions.

Agentic architecture patterns:

The production-ready pattern for agentic AI in 2026 is constrained autonomy:

Plan. The agent analyses the task and proposes a plan
Verify. The plan is checked against policy rules and constraints
Execute. The agent executes approved steps, one at a time
Checkpoint. High-risk actions require human approval before proceeding
Record. Every action is logged for audit and improvement

This is explicitly not "autonomous AI." It's AI with bounded authority, explicit constraints, and human oversight at decision points. Enterprises that skip the constraint layer in pursuit of full autonomy are building risk, not capability.

Key decisions:

Orchestration framework (LangGraph, custom, platform-native)
Human-in-the-loop boundaries (which decisions require approval)
Error handling strategy (retry, fallback, escalate)
State management approach (stateless vs persistent agents)

Layer 5: Governance

The policy, compliance, and monitoring layer that ensures AI operates within bounds.

What's changed since 2023:

Governance has evolved from a process (reviews, approvals, sign-offs) to infrastructure (policy engines, automated checks, continuous monitoring). This is the single most important architectural shift in the enterprise AI stack.

Governance-as-code components:

Component	Purpose	Implementation
Policy engine	Define and enforce AI usage policies	OPA, Cedar, custom rules engine
Access control	Model-level and data-level permissions	RBAC/ABAC integrated with IdP
Input filtering	Block injection attacks, enforce content policies	Pipeline middleware
Output filtering	Remove PII, flag hallucination, enforce format	Post-processing pipeline
Audit logging	Record every inference for compliance and review	Structured logging, immutable store
Monitoring	Detect drift, anomalies, and degradation	Dashboards, alerting, automated checks
Bias detection	Continuous testing for demographic bias	Statistical monitoring, red-team testing

The governance-as-code principle: Every governance rule should be expressed as code that runs automatically, not as a document that requires manual review. Data classification checks, access control verification, output filtering, and audit logging all happen in the pipeline, not in a committee meeting.

This doesn't eliminate human governance. It automates the routine checks so human reviewers focus on novel risks, policy evolution, and strategic oversight.

Key decisions:

Policy engine selection (custom vs off-the-shelf)
Monitoring granularity (per-query vs sampled)
Audit retention and access policies
Incident detection and response automation

Start With Layer 2 and Layer 5

If you're building your enterprise AI stack from scratch, invest first in the Data layer (get retrieval right) and the Governance layer (get controls right). The Intelligence and Orchestration layers are easier to swap and evolve. Data and Governance are foundational. Getting them wrong is expensive to fix.

The Reference Architecture Diagram

┌─────────────────────────────────────────────────────────┐
│                  GOVERNANCE (Layer 5)                     │
│  Policy Engine │ Access Control │ Monitoring │ Audit Log  │
├─────────────────────────────────────────────────────────┤
│                 ORCHESTRATION (Layer 4)                   │
│  RAG Pipeline │ Agentic Workflows │ Tool Use │ Memory     │
├─────────────────────────────────────────────────────────┤
│                 INTELLIGENCE (Layer 3)                    │
│  Frontier Models │ Mid-Tier Models │ Local Models         │
│                   Model Router                            │
├─────────────────────────────────────────────────────────┤
│                     DATA (Layer 2)                        │
│  Vector Store │ Knowledge Graph │ Search Index │ Pipeline  │
├─────────────────────────────────────────────────────────┤
│                INFRASTRUCTURE (Layer 1)                   │
│  Cloud Compute │ Storage │ Network │ Edge / On-Premises   │
└─────────────────────────────────────────────────────────┘

Governance spans all layers. It's drawn at the top but its controls are embedded throughout the stack.

What Changed From 2023 to 2026

Dimension	2023	2026
Models	Single model (usually GPT-4)	Multi-model with routing
Retrieval	Basic RAG with vector search	Hybrid retrieval + knowledge graphs
Orchestration	Simple prompt chains	Agentic workflows with tool use
Governance	Manual review process	Governance-as-code, automated
Infrastructure	Cloud-only	Cloud + edge + on-premises
Integration	API-based, simple	Deep system integration, bidirectional
Maturity	Experimental	Production-grade

Average AI Models in Production Per Enterprise

Source: Gartner, AI Platform Engineering Survey, 2025

The stack hasn't just grown. It's fundamentally restructured. The 2023 stack was built around a single model answering questions. The 2026 stack is built around an orchestration layer coordinating multiple models, data sources, and actions within a governance framework. The model is no longer the centre. The orchestration and governance layers are.

Building Your Stack

For Organisations Starting Fresh

Layer 1: Choose a cloud provider. Don't over-invest in infrastructure before you know your workload profile.
Layer 2: Build a data pipeline with hybrid retrieval (vector + keyword). Add knowledge graph when accuracy demands it.
Layer 3: Start with one frontier model. Add routing when you have cost or sovereignty reasons for multiple models.
Layer 4: Start with simple RAG. Evolve to agentic workflows as your use cases demand actions, not just answers.
Layer 5: Build governance-as-code from day one. Audit logging, access control, and input/output filtering are non-negotiable.

For Organisations Evolving From 2023-Era Stacks

Add knowledge graphs alongside your existing vector search. Start with one domain.
Implement model routing to optimise cost and compliance. Most queries don't need your most expensive model.
Upgrade governance from process to code. Automate the checks your team currently does manually.
Pilot agentic workflows in one bounded domain. Constrained autonomy with human checkpoints.
Add data lineage if you don't have it. Every AI output should be traceable to its source data.

Do we need all five layers from day one?: You need all five layers represented, but not all fully built. Layer 1 can be a single cloud account. Layer 2 can be a vector store and a simple pipeline. Layer 3 can be one model. Layer 4 can be a basic RAG pipeline. Layer 5 must include audit logging and access control from day one. Governance is the one layer you can't add later without major rework.
How do we choose between AI platform vendors?: Evaluate on three criteria: data sovereignty (where your data goes), portability (can you move away), and integration depth (how well it connects to your existing systems). Avoid platforms that lock you into a single model provider. Multi-model flexibility is essential in a fast-moving market.
Is agentic AI ready for enterprise production?: For bounded, well-defined workflows with human oversight, yes. Claims processing, compliance monitoring, and document review are live in production today. For unbounded, autonomous decision-making, no. The constrained autonomy pattern (plan, verify, execute, checkpoint, record) is the only production-safe approach. Any vendor telling you agentic AI can operate autonomously in enterprise is selling you risk.