The AI Stack Decisions That Matter

Five architecture decisions that determine whether your enterprise AI scales or stalls. A technical decision framework for teams building production AI.
22 April 2025·8 min read
John Li
Chief Technology Officer
Most enterprise AI architecture discussions focus on model selection. Which LLM? Which embedding model? Which vector database? These are the decisions teams agonise over, and they are the least consequential. The five decisions that actually determine whether your AI scales or stalls happen at the system level, and most teams make them by default rather than by design.

Decision 1: Orchestration Pattern

The question: How do AI components coordinate with each other and with existing systems?
The options:
  • Direct integration. Each AI capability is built independently and talks directly to the model provider. Simple to start, impossible to manage at scale.
  • Central orchestrator. A single orchestration layer manages all AI interactions, context, and routing. More complex upfront, dramatically simpler at scale.
  • Event-driven. AI components subscribe to events and act independently. Flexible but harder to debug and monitor.
The right choice for most enterprises: Central orchestrator. The upfront investment pays for itself by the third or fourth capability you build. Direct integration leads to a mess of duplicated logic, inconsistent error handling, and no way to manage costs or quality centrally.
The orchestration layer handles model routing, context management, token budgets, caching, fallback logic, and monitoring. Without it, each team builds their own version of each of these, and none of them do it well.
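As a rough illustration of those responsibilities, here is a minimal orchestrator sketch. Everything in it (the `Orchestrator` class, the provider callables, the route table) is hypothetical, not a real framework; it exists only to show routing, token budgets, caching, and fallback living in one place instead of being re-implemented by every team.

```python
from dataclasses import dataclass

@dataclass
class AIRequest:
    capability: str        # e.g. "summarise", "classify"
    prompt: str
    max_tokens: int = 1024

class Orchestrator:
    """Single entry point for all AI calls: routing, budgets, caching, fallback."""

    def __init__(self, providers, routes, token_budget):
        self.providers = providers        # provider name -> callable(prompt, max_tokens) -> str
        self.routes = routes              # capability -> provider names, in fallback order
        self.remaining_budget = token_budget
        self.cache = {}

    def run(self, request: AIRequest) -> str:
        key = (request.capability, request.prompt)
        if key in self.cache:                             # caching: identical calls are free
            return self.cache[key]
        if request.max_tokens > self.remaining_budget:    # token budget enforced centrally
            raise RuntimeError("token budget exhausted")
        for name in self.routes[request.capability]:      # routing, then fallback in order
            try:
                result = self.providers[name](request.prompt, request.max_tokens)
                self.remaining_budget -= request.max_tokens
                self.cache[key] = result
                return result
            except Exception:
                continue                                  # provider failed: try the next one
        raise RuntimeError("all providers failed")
```

In a real system each of these concerns (budgets, caching, monitoring) would be its own subsystem; the point is that they live behind one interface, so no capability team rebuilds them.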
I have never regretted building an orchestration layer. I have frequently regretted not building one.
John Li, Chief Technology Officer

Decision 2: Context Architecture

The question: How does the AI access the information it needs to produce useful outputs?
This is the RAG decision, but it goes deeper than most teams realise. Context architecture includes:
  • Retrieval strategy. How do you find the right information? Vector search, keyword search, hybrid, graph-based? The answer depends on your data types and query patterns.
  • Context window management. How do you fit the right information into a limited context window? Chunk size, overlap, ranking, and compression all matter.
  • Knowledge freshness. How current does the information need to be? Real-time indexing is expensive. Batch indexing has lag. Most enterprises need a mix.
  • Multi-source orchestration. Enterprise knowledge lives across dozens of systems. The context architecture needs to query, rank, and combine information from multiple sources.
The mistake most teams make: Treating RAG as a simple retrieval problem. Build a vector store, embed your documents, search and retrieve. This works for demos. It breaks in production when documents have conflicting information, when freshness matters, and when the user's question requires synthesising across sources.
The right approach: Design the context architecture for your actual information landscape. Map your knowledge sources. Understand the freshness requirements. Design for the hard cases (conflicting information, multi-source synthesis) from the start, not as patches later.
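To make the retrieval strategy and freshness points concrete, here is a toy hybrid scorer. It is a sketch under simplifying assumptions (tiny hand-rolled vectors, term-overlap keyword matching, an exponential freshness decay); the function names and the `half_life_days` parameter are illustrative, not from any particular library.

```python
import math
import time

def hybrid_score(query_terms, query_vec, doc, alpha=0.5, half_life_days=30):
    # Keyword component: fraction of query terms present in the document text.
    kw = sum(t in doc["text"].lower() for t in query_terms) / len(query_terms)
    # Vector component: cosine similarity against a precomputed embedding.
    dot = sum(a * b for a, b in zip(query_vec, doc["vec"]))
    norm = math.sqrt(sum(a * a for a in query_vec)) * math.sqrt(sum(b * b for b in doc["vec"]))
    vec = dot / norm if norm else 0.0
    # Freshness decay: a document loses half its weight every `half_life_days`.
    age_days = (time.time() - doc["indexed_at"]) / 86400
    fresh = 0.5 ** (age_days / half_life_days)
    return (alpha * kw + (1 - alpha) * vec) * fresh

def retrieve(query_terms, query_vec, docs, k=3):
    # Rank all candidate documents by the blended score and keep the top k.
    return sorted(docs, key=lambda d: hybrid_score(query_terms, query_vec, d),
                  reverse=True)[:k]
```

A production system would use a real search engine and vector index, but the shape is the same: multiple signals (keyword, semantic, freshness) combined explicitly, with the weights as tunable decisions rather than defaults you inherited from a demo.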

Decision 3: Evaluation Infrastructure

The question: How do you know the AI is working, continuously, across all capabilities?
This is the decision most teams skip or defer. It is also the decision that determines whether you can confidently scale, update models, or change prompts without breaking things.
Evaluation infrastructure includes:
  • Automated testing. A suite of test cases that run against every model update, prompt change, or data refresh. Not unit tests. Evaluation tests that measure output quality.
  • Regression detection. Automated comparison of current performance against baselines. When a model update degrades performance on a specific task, you need to know before users notice.
  • Human-in-the-loop evaluation. For tasks where automated metrics are insufficient, structured human evaluation with calibrated reviewers.
  • Production monitoring. Real-time tracking of output quality in production, not just technical metrics (latency, errors) but quality metrics (user acceptance, correction rates).
Why this matters now: Every model provider updates their models regularly. Each update can change behaviour in subtle ways. Without evaluation infrastructure, you discover regressions from user complaints. With it, you discover them from automated alerts.
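The regression-detection loop above can be sketched in a few lines. This is a hypothetical harness, not a real evaluation framework: each test case pairs an input with a grading function, the suite produces a score, and any drop beyond a tolerance against the stored baseline raises an alert before users notice.

```python
def run_eval(model_fn, cases):
    """Score a model over a suite of evaluation cases.

    cases: list of (input, grading_fn) pairs, where grading_fn maps the
    model output to a quality score in [0, 1]. Returns the mean score.
    """
    return sum(grade(model_fn(inp)) for inp, grade in cases) / len(cases)

def detect_regression(current_score, baseline_score, tolerance=0.05):
    """Flag a regression when quality drops more than `tolerance` below baseline."""
    return baseline_score - current_score > tolerance
```

Run `run_eval` against every model update or prompt change, compare to the recorded baseline with `detect_regression`, and gate the rollout on the result; the grading functions are where the hard work lives (exact match for structured tasks, rubric or model-graded scoring for open-ended ones).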

Decision 4: Security and Governance Boundary

The question: Where does data go, who can access what, and how do you enforce it?
Enterprise AI introduces new data flows that existing security models may not cover. Customer data going to model providers. Internal documents being embedded and stored in new systems. AI outputs containing synthesised information from multiple access-controlled sources.
The critical decisions:
  • Data residency. Where is data processed? On-premise, in your cloud, at the model provider? For regulated industries, this is a compliance requirement, not a preference.
  • Access control propagation. If a user cannot access a document, can the AI use that document to answer their question? Access controls must propagate through the AI pipeline, not just the document system.
  • Output governance. What guardrails prevent the AI from generating inappropriate, incorrect, or harmful outputs? Where are these enforced (prompt level, output filter, review workflow)?
  • Audit trail. Can you trace any AI output back to its inputs, context, model, and prompt? For regulated industries, this is a requirement. For everyone else, it is a best practice.
The mistake: Treating AI security as an extension of existing application security. AI introduces new attack surfaces (prompt injection, data leakage through context, model manipulation) that require specific mitigations.
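Access control propagation, in particular, reduces to a simple rule: filter retrieved documents against the requesting user's permissions before they reach the model's context. A minimal sketch, assuming a hypothetical ACL mapping of document IDs to permitted users:

```python
def permitted(user, doc_id, acl):
    # acl: document id -> set of users allowed to read that document.
    return user in acl.get(doc_id, set())

def build_context(user, candidates, acl):
    """Drop any retrieved document the requesting user cannot read,
    BEFORE it enters the model's context window. The filter runs in the
    AI pipeline, not just in the document system."""
    return [doc for doc in candidates if permitted(user, doc["id"], acl)]
```

The ordering is the whole point: if the filter runs after generation, the model has already seen (and may have synthesised from) documents the user was never entitled to read.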

Decision 5: Integration Pattern

The question: How do AI capabilities connect to existing enterprise systems?
This is where most AI projects stall. The AI works in isolation. It does not work when it needs to read from the CRM, write to the ERP, trigger a workflow, or update a record.
The options:
  • API-based. AI capabilities expose and consume REST/GraphQL APIs. Standard, well-understood, but requires API availability for every system.
  • Tool calling. The AI model calls functions that interact with enterprise systems. Powerful for agentic patterns but requires careful safety design.
  • Event-driven. AI capabilities subscribe to and publish events. Good for loose coupling but complex to debug.
  • MCP (Model Context Protocol). An emerging standard for AI-to-system communication. Promising for standardisation but early in maturity.
The right choice: Most enterprises need a combination. API-based for structured integrations. Tool calling for agentic capabilities. Event-driven for asynchronous workflows. The key is a consistent integration framework that makes adding new system connections predictable, not a custom integration for every system.
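The "careful safety design" that tool calling demands can be made concrete with an allow-list and argument validation. This is an illustrative sketch (the `@tool` decorator, the registry, and `lookup_account` are all invented for the example): the model may only invoke registered tools, and only with exactly the arguments each tool's schema declares.

```python
import json

TOOLS = {}

def tool(name, schema):
    """Register a function the model is allowed to call.
    `schema` is the exact set of argument names the tool accepts."""
    def register(fn):
        TOOLS[name] = (fn, schema)
        return fn
    return register

@tool("lookup_account", schema={"account_id"})
def lookup_account(account_id):
    # Stand-in for a real CRM lookup.
    return {"account_id": account_id, "status": "active"}

def dispatch(call_json):
    """Execute a model-emitted tool call, but only for registered tools
    and only with the arguments the schema allows."""
    call = json.loads(call_json)
    if call["name"] not in TOOLS:
        raise PermissionError(f"tool not allowed: {call['name']}")
    fn, schema = TOOLS[call["name"]]
    args = call.get("arguments", {})
    if set(args) != schema:
        raise ValueError("unexpected or missing arguments")
    return fn(**args)
```

The same gate is where you would also enforce per-tool rate limits, user-level permissions, and audit logging, so every agentic action passes through one inspectable choke point.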

The Compounding Effect

These five decisions interact. A central orchestrator makes model routing and evaluation easier. Good context architecture reduces the need for expensive models. Evaluation infrastructure gives you confidence to optimise costs. Security boundaries constrain integration patterns.
Get them right and each AI capability you build makes the next one faster and cheaper. Get them wrong and each capability is a standalone project that shares nothing with the last.
The architecture is not the exciting part of enterprise AI. It is the part that determines whether the exciting parts work.