
The Enterprise AI Stack in 2026: The Reference Architecture

The enterprise AI technology stack has matured dramatically. Multi-model orchestration, agentic workflows, knowledge graphs, and governance-as-code - here's the reference architecture for 2026.
5 February 2026·15 min read
Mak Khan
Chief AI Officer
John Li
Chief Technology Officer
In 2023, we published The Enterprise AI Stack Explained, a guide to the technology layers underpinning enterprise AI. Two and a half years later, the stack has transformed. Single-model RAG architectures have given way to multi-model orchestration. Agentic workflows have moved from research papers to production. Knowledge graphs are standard. And governance is no longer a process - it's code. Here's the updated reference architecture.

What You Need to Know

  • The enterprise AI stack has five layers in 2026: Infrastructure, Data, Intelligence, Orchestration, and Governance. Each is more mature and more interconnected than in 2023.
  • Multi-model orchestration is the new default. Enterprises are using 3-7 models in production, routing queries to the right model based on task, cost, and compliance requirements. Single-model deployments are a legacy pattern.
  • Agentic AI is in early production. AI systems that take actions, not just answer questions, are live in claims processing, compliance monitoring, and customer service. The architectural implications are significant.
  • Knowledge graphs have become standard alongside vector search. The combination of vector search (semantic retrieval) and knowledge graphs (structured relationships) delivers significantly better accuracy for enterprise queries.
  • Governance-as-code is the most important architectural shift. Governance is no longer a review process. It's embedded in the infrastructure through policy engines, automated compliance checks, and continuous monitoring.
5.2
average number of AI models in production per enterprise in 2026
Source: Gartner, AI Platform Engineering Survey, 2025

The Five-Layer Architecture

Layer 1: Infrastructure

The compute, storage, and networking foundation that everything runs on.
What's changed since 2023:
  • GPU access is commoditised. Cloud GPU availability has stabilised. Reserved capacity and spot pricing make compute budgets predictable. On-premises GPU deployment is realistic for enterprises with sovereignty requirements.
  • Inference is now separate from training. Most enterprises don't train models. They run inference on foundation models. This changes the infrastructure profile: less GPU-intensive, more latency-sensitive, and more cost-conscious.
  • Edge inference is emerging. For low-latency or sovereignty-sensitive use cases, enterprises are running smaller models on-premises or at the edge. The infrastructure layer now spans cloud and local deployment.
Key decisions:
  • Cloud provider selection (Azure, AWS, GCP, or multi-cloud)
  • Sovereignty requirements (data residency, compute location)
  • Cost management strategy (reserved vs on-demand, model routing by cost)

Layer 2: Data

The data platform that feeds AI systems: storage, pipelines, embedding, and retrieval.
What's changed since 2023:
  • Hybrid retrieval is standard. The best enterprise architectures combine vector search for semantic retrieval, keyword search for precision, and knowledge graphs for structured relationships. Single-retrieval approaches are insufficient for production enterprise use.
  • Embedding pipelines are mature. Chunking strategies, embedding model selection, and incremental re-embedding are well-understood problems with established patterns. The art is in tuning: which chunks, which model, which parameters for your specific content.
  • Data lineage is required. For governance and audit, every piece of data that influences an AI output must be traceable, from source document to embedding to retrieval to generation. This is architecture, not afterthought.
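To make the hybrid-retrieval idea concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge rankings from a vector store and a keyword index into a single result list. The document IDs and the two ranked lists are hypothetical placeholders; a real pipeline would obtain them from the stores listed in the component table.

```python
# Sketch of hybrid retrieval: fuse ranked doc-ID lists from a vector
# search and a keyword search with reciprocal rank fusion (RRF).

def rrf_fuse(ranked_lists, k=60):
    """Combine several ranked lists of doc IDs into one fused ranking."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1 / (k + rank + 1); k damps the head.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc-7", "doc-2", "doc-9"]   # hypothetical semantic results
keyword_hits = ["doc-2", "doc-5", "doc-7"]  # hypothetical keyword results
fused = rrf_fuse([vector_hits, keyword_hits])
```

Documents that appear high in both rankings (here `doc-2`) rise to the top, which is the behaviour that makes hybrid retrieval more accurate than either method alone.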
The data layer components:
| Component | Purpose | Options |
|---|---|---|
| Document ingestion | Parse, chunk, embed documents | Custom pipeline, Unstructured.io, LlamaIndex |
| Vector store | Semantic retrieval | pgvector, Pinecone, Weaviate, Qdrant |
| Knowledge graph | Structured relationships | Neo4j, Amazon Neptune, PostgreSQL |
| Search index | Keyword/hybrid search | Elasticsearch, Typesense, Meilisearch |
| Data pipeline | ETL/ELT for AI data | Custom, Airflow, Dagster |
| Feature store | Structured features for models | Feast, Tecton, custom |
Key decisions:
  • Knowledge graph scope (start narrow, expand by domain)
  • Embedding model selection (quality vs cost vs speed trade-off)
  • Data pipeline cadence (real-time vs batch, by data source)
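The chunking half of the embedding pipeline can be as simple as a sliding window. The sketch below shows a fixed-size chunker with overlap, a common starting point for the tuning described above; the size and overlap values are illustrative assumptions, not recommendations.

```python
# Minimal sliding-window chunker: fixed-size character windows with
# overlap, so context spanning a chunk boundary appears in both chunks.

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Production pipelines usually chunk on structural boundaries (headings, paragraphs, sentences) rather than raw characters, but the overlap principle is the same.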

Layer 3: Intelligence

The AI models that understand, reason, and generate.
What's changed since 2023:
  • Multi-model is the norm. Enterprises route queries to different models based on task complexity, cost, latency, and compliance. A customer FAQ uses a small, fast model. A contract review uses a large, capable model. Sensitive data stays on a local model.
  • Model capabilities have converged. The gap between frontier models (Claude, GPT-4) and the tier below has narrowed. For many enterprise use cases, the second-tier model at 30% of the cost delivers equivalent results.
  • Fine-tuning is selective. Most enterprises use foundation models with retrieval augmentation rather than fine-tuning. Fine-tuning is reserved for domain-specific tasks where retrieval alone isn't sufficient: classification, extraction, and domain-specific language.
The multi-model architecture:
                +------------------+
                |   Model Router   |
                +---------+--------+
                          |
         +----------------+----------------+
         v                v                v
   +-----------+    +-----------+    +-----------+
   |  Frontier |    |  Mid-tier |    |   Local   |
   | (Complex) |    | (General) |    |(Sovereign)|
   +-----------+    +-----------+    +-----------+
Model routing criteria:
| Factor | Frontier Model | Mid-Tier Model | Local Model |
|---|---|---|---|
| Task complexity | High (multi-step reasoning) | Medium (standard Q&A) | Low (classification, extraction) |
| Cost per query | High | Medium | Low (infrastructure only) |
| Latency | Higher | Lower | Lowest |
| Data sovereignty | Cloud-based | Cloud-based | On-premises |
| Compliance | Provider terms apply | Provider terms apply | Full control |
Key decisions:
  • Primary model provider (with fallback strategy)
  • Routing logic (rule-based vs learned)
  • Sovereignty model selection and deployment
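A rule-based router can start as a few lines of code. The sketch below maps the criteria from the routing table onto model tiers; the tier names and the complexity labels are illustrative assumptions, not vendor endpoints.

```python
# Rule-based model router sketch, following the routing-criteria table:
# sovereignty first, then task complexity, with the cheap tier as default.

def route(task_complexity: str, sensitive: bool) -> str:
    """Pick a model tier for a query."""
    if sensitive:
        return "local-sovereign"   # sensitive data must stay on-premises
    if task_complexity == "high":
        return "frontier"          # multi-step reasoning
    if task_complexity == "medium":
        return "mid-tier"          # standard Q&A
    return "local-sovereign"       # classification/extraction: cheapest tier
```

Learned routers replace the hand-written rules with a classifier trained on past queries, but most enterprises start with explicit rules because they are auditable.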

Layer 4: Orchestration

The coordination layer that connects data, models, and actions into coherent workflows.
What's changed since 2023:
This is the layer that barely existed in 2023 and is now the most architecturally significant. Orchestration handles:
  • RAG pipeline management. Query understanding, retrieval strategy selection, context assembly, prompt construction, response generation, and post-processing, all coordinated as a pipeline with error handling and fallback behaviour.
  • Agentic workflows. Agentic AI systems that plan, execute, and verify multi-step tasks. A claims processing agent that reads a claim, extracts data, checks policy, calculates liability, and drafts a response, with human approval at key decision points.
  • Tool orchestration. AI systems that use external tools (databases, APIs, calculators, search engines) need an orchestration layer that manages tool selection, parameter handling, error recovery, and security.
  • Conversation management. Multi-turn interactions with context management, memory, and state tracking across sessions.
Agentic architecture patterns:
The production-ready pattern for agentic AI in 2026 is constrained autonomy:
  1. Plan. The agent analyses the task and proposes a plan
  2. Verify. The plan is checked against policy rules and constraints
  3. Execute. The agent executes approved steps, one at a time
  4. Checkpoint. High-risk actions require human approval before proceeding
  5. Record. Every action is logged for audit and improvement
This is explicitly not "autonomous AI." It's AI with bounded authority, explicit constraints, and human oversight at decision points. Enterprises that skip the constraint layer in pursuit of full autonomy are building risk, not capability.
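The five steps of constrained autonomy can be sketched as a simple loop. Everything here is a hypothetical stand-in: the plan is a list of step names, policy is a set of allowed actions, and `approve` represents the human checkpoint.

```python
# Constrained-autonomy sketch: verify the plan against policy, execute
# one step at a time, checkpoint high-risk actions, record everything.

def run_agent(plan, allowed, high_risk, approve, execute, audit_log):
    # Verify: the whole plan is checked against policy before anything runs.
    for step in plan:
        if step not in allowed:
            raise ValueError(f"step {step!r} violates policy")
    # Execute approved steps one at a time, with human checkpoints.
    for step in plan:
        if step in high_risk and not approve(step):
            audit_log.append(("blocked", step))           # approval withheld
            break
        audit_log.append(("done", step, execute(step)))   # record every action
    return audit_log
```

For the claims example above, `plan` might be `["extract_data", "check_policy", "draft_response"]` with `draft_response` marked high-risk so a human signs off before anything is sent.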
Key decisions:
  • Orchestration framework (LangGraph, custom, platform-native)
  • Human-in-the-loop boundaries (which decisions require approval)
  • Error handling strategy (retry, fallback, escalate)
  • State management approach (stateless vs persistent agents)

Layer 5: Governance

The policy, compliance, and monitoring layer that ensures AI operates within bounds.
What's changed since 2023:
Governance has evolved from a process (reviews, approvals, sign-offs) to infrastructure (policy engines, automated checks, continuous monitoring). This is the single most important architectural shift in the enterprise AI stack.
Governance-as-code components:
| Component | Purpose | Implementation |
|---|---|---|
| Policy engine | Define and enforce AI usage policies | OPA, Cedar, custom rules engine |
| Access control | Model-level and data-level permissions | RBAC/ABAC integrated with IdP |
| Input filtering | Block injection attacks, enforce content policies | Pipeline middleware |
| Output filtering | Remove PII, flag hallucination, enforce format | Post-processing pipeline |
| Audit logging | Record every inference for compliance and review | Structured logging, immutable store |
| Monitoring | Detect drift, anomalies, and degradation | Dashboards, alerting, automated checks |
| Bias detection | Continuous testing for demographic bias | Statistical monitoring, red-team testing |
The governance-as-code principle: Every governance rule should be expressed as code that runs automatically, not as a document that requires manual review. Data classification checks, access control verification, output filtering, and audit logging all happen in the pipeline, not in a committee meeting.
This doesn't eliminate human governance. It automates the routine checks so human reviewers focus on novel risks, policy evolution, and strategic oversight.
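In miniature, governance-as-code looks like checks running inline in the inference pipeline. The sketch below wraps a model call with an input filter, an output filter, and an audit record; the injection pattern, the PII regex, and `call_model` are illustrative assumptions, not a complete control set.

```python
# Governance-as-code sketch: policy checks execute in the pipeline,
# not in a review meeting. Patterns here are deliberately simplistic.
import re

PII_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]  # e.g. US SSN-style IDs

def governed_inference(prompt, call_model, audit_log):
    # Input filter: block an obvious prompt-injection phrase.
    if "ignore previous instructions" in prompt.lower():
        audit_log.append(("rejected", prompt))
        return None
    output = call_model(prompt)
    # Output filter: redact PII before the response leaves the pipeline.
    for pat in PII_PATTERNS:
        output = pat.sub("[REDACTED]", output)
    audit_log.append(("served", prompt, output))  # audit record per inference
    return output
```

A production policy engine (OPA, Cedar) replaces the hard-coded checks with declarative rules, but the architectural point is the same: the check runs on every request, automatically.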
Key decisions:
  • Policy engine selection (custom vs off-the-shelf)
  • Monitoring granularity (per-query vs sampled)
  • Audit retention and access policies
  • Incident detection and response automation
Start With Layer 2 and Layer 5
If you're building your enterprise AI stack from scratch, invest first in the Data layer (get retrieval right) and the Governance layer (get controls right). The Intelligence and Orchestration layers are easier to swap and evolve. Data and Governance are foundational. Getting them wrong is expensive to fix.

The Reference Architecture Diagram

┌────────────────────────────────────────────────────────────┐
│                    GOVERNANCE (Layer 5)                    │
│  Policy Engine │ Access Control │ Monitoring │ Audit Log   │
├────────────────────────────────────────────────────────────┤
│                  ORCHESTRATION (Layer 4)                   │
│  RAG Pipeline │ Agentic Workflows │ Tool Use │ Memory      │
├────────────────────────────────────────────────────────────┤
│                   INTELLIGENCE (Layer 3)                   │
│  Frontier Models │ Mid-Tier Models │ Local Models          │
│                        Model Router                        │
├────────────────────────────────────────────────────────────┤
│                       DATA (Layer 2)                       │
│  Vector Store │ Knowledge Graph │ Search Index │ Pipeline  │
├────────────────────────────────────────────────────────────┤
│                  INFRASTRUCTURE (Layer 1)                  │
│  Cloud Compute │ Storage │ Network │ Edge / On-Premises    │
└────────────────────────────────────────────────────────────┘
Governance spans all layers. It's drawn at the top but its controls are embedded throughout the stack.

What Changed From 2023 to 2026

| Dimension | 2023 | 2026 |
|---|---|---|
| Models | Single model (usually GPT-4) | Multi-model with routing |
| Retrieval | Basic RAG with vector search | Hybrid retrieval + knowledge graphs |
| Orchestration | Simple prompt chains | Agentic workflows with tool use |
| Governance | Manual review process | Governance-as-code, automated |
| Infrastructure | Cloud-only | Cloud + edge + on-premises |
| Integration | API-based, simple | Deep system integration, bidirectional |
| Maturity | Experimental | Production-grade |
The stack hasn't just grown. It's fundamentally restructured. The 2023 stack was built around a single model answering questions. The 2026 stack is built around an orchestration layer coordinating multiple models, data sources, and actions within a governance framework. The model is no longer the centre. The orchestration and governance layers are.

Building Your Stack

For Organisations Starting Fresh

  1. Layer 1: Choose a cloud provider. Don't over-invest in infrastructure before you know your workload profile.
  2. Layer 2: Build a data pipeline with hybrid retrieval (vector + keyword). Add knowledge graph when accuracy demands it.
  3. Layer 3: Start with one frontier model. Add routing when you have cost or sovereignty reasons for multiple models.
  4. Layer 4: Start with simple RAG. Evolve to agentic workflows as your use cases demand actions, not just answers.
  5. Layer 5: Build governance-as-code from day one. Audit logging, access control, and input/output filtering are non-negotiable.

For Organisations Evolving From 2023-Era Stacks

  1. Add knowledge graphs alongside your existing vector search. Start with one domain.
  2. Implement model routing to optimise cost and compliance. Most queries don't need your most expensive model.
  3. Upgrade governance from process to code. Automate the checks your team currently does manually.
  4. Pilot agentic workflows in one bounded domain. Constrained autonomy with human checkpoints.
  5. Add data lineage if you don't have it. Every AI output should be traceable to its source data.
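A lineage record can start as a small structure attached to every response. The sketch below is one possible shape; the field names are illustrative assumptions, and the fingerprint gives a stable hash suitable for an immutable audit store.

```python
# Sketch of a per-response lineage record: which sources, which
# embedding model, and which generation model produced each output.
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib

@dataclass
class LineageRecord:
    query: str
    source_doc_ids: list
    embedding_model: str
    generation_model: str
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def fingerprint(self) -> str:
        # Stable hash over the traceable fields (source order-insensitive).
        payload = (f"{self.query}|{sorted(self.source_doc_ids)}|"
                   f"{self.embedding_model}|{self.generation_model}")
        return hashlib.sha256(payload.encode()).hexdigest()
```

Writing one of these per inference, alongside the audit log, is what makes "every output traceable to its source data" a property of the system rather than a promise.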
Do we need all five layers from day one?
You need all five layers represented, but not all fully built. Layer 1 can be a single cloud account. Layer 2 can be a vector store and a simple pipeline. Layer 3 can be one model. Layer 4 can be a basic RAG pipeline. Layer 5 must include audit logging and access control from day one. Governance is the one layer you can't add later without major rework.
How do we choose between AI platform vendors?
Evaluate on three criteria: data sovereignty (where your data goes), portability (can you move away), and integration depth (how well it connects to your existing systems). Avoid platforms that lock you into a single model provider. Multi-model flexibility is essential in a fast-moving market.
Is agentic AI ready for enterprise production?
For bounded, well-defined workflows with human oversight, yes. Claims processing, compliance monitoring, and document review are live in production today. For unbounded, autonomous decision-making, no. The constrained autonomy pattern (plan, verify, execute, checkpoint, record) is the only production-safe approach. Any vendor telling you agentic AI can operate autonomously in enterprise is selling you risk.