The Knowledge Architecture Problem

Enterprise knowledge is scattered across 50 systems. AI can help - but only if you solve the architecture problem first.
28 December 2023·7 min read
Mak Khan
Chief AI Officer
Isaac Rolfe
Managing Director
The average enterprise department uses 47 SaaS applications. Knowledge is fragmented across email, SharePoint, Slack, CRMs, ERPs, knowledge bases, shared drives, and - most inaccessibly - people's heads. AI can help unify this knowledge. But it can't do it without architecture.

The Problem

Mak: Every enterprise AI project we've worked on this year has hit the same wall: the knowledge architecture problem. The AI model is ready. The use case is clear. And then we try to connect it to the organisation's knowledge, and we discover that the knowledge doesn't exist in any unified, accessible form.
It's not that the knowledge doesn't exist. It does - in abundance. The problem is that it exists in 50 different places, in 50 different formats, with 50 different access controls, and no unifying layer.
A customer service team needs to answer a complex question. The answer requires information from the CRM (customer history), the knowledge base (product documentation), the policy system (current policies), email (recent correspondence), and the team lead's memory (that exception they approved last month).
No AI system can synthesise an answer from these sources without an architecture that connects them.

Why This Matters Now

Knowledge fragmentation isn't new. Enterprises have been dealing with siloed information for decades. What's new is that AI makes the cost of fragmentation visible and quantifiable.
Before AI, knowledge fragmentation meant people spent time searching, asking colleagues, and piecing together information manually. Annoying, but invisible in most metrics.
With AI, fragmentation means the AI can't access the knowledge it needs to be useful. The cost is suddenly visible: "The AI can't answer this question because the information is in three systems that aren't connected." That's a measurable gap between what AI could do and what it actually does.
47 - average number of SaaS applications per enterprise department (Source: Productiv, State of SaaS Spend Report, 2023)

The Architecture

Mak: The knowledge architecture for enterprise AI has four layers:

Layer 1: Ingestion

Getting knowledge out of source systems and into a format AI can process.
This is integration work. Document processors for PDFs and Word documents. API connectors for SaaS applications. Email parsers. Web scrapers for internal sites. OCR for scanned documents. Each source system requires a specific connector.
The engineering for each individual connector isn't complex, but the number of connectors adds up. A typical enterprise might need 15-20 connectors to cover its primary knowledge sources.
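One way to keep that connector count manageable is a shared interface that every source system implements, so the rest of the pipeline never cares where a document came from. A minimal sketch (the `Connector` interface and the in-memory stand-in are illustrative, not any particular product's API):

```python
# Sketch of a shared connector interface. Real connectors would wrap a
# SaaS API, a file share, or an email parser behind the same contract.
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Document:
    source: str                 # which system the content came from
    doc_id: str                 # stable identifier within that system
    text: str                   # extracted plain-text content
    metadata: dict = field(default_factory=dict)


class Connector(ABC):
    """Each source system gets one implementation of this interface."""

    @abstractmethod
    def fetch(self) -> list[Document]:
        """Pull documents from the source system."""


class InMemoryConnector(Connector):
    """Toy connector standing in for a real API or file-system connector."""

    def __init__(self, source: str, records: dict[str, str]):
        self.source = source
        self.records = records

    def fetch(self) -> list[Document]:
        return [
            Document(source=self.source, doc_id=k, text=v)
            for k, v in self.records.items()
        ]


connectors: list[Connector] = [
    InMemoryConnector("wiki", {"w1": "Refund policy overview."}),
    InMemoryConnector("crm", {"c9": "Customer escalation history."}),
]
docs = [d for c in connectors for d in c.fetch()]
print(len(docs))  # → 2
```

With this shape, adding a 16th source is one new class, not a change to the processing layer.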

Layer 2: Processing

Transforming raw content into AI-ready representations.
This is where chunking, embedding, and indexing happen. Documents get broken into meaningful segments. Each segment gets converted into vector embeddings that capture semantic meaning. Metadata gets extracted and attached: source, date, author, classification, access controls.
The key decisions at this layer:
  • Chunking strategy. Too large and retrieval is imprecise. Too small and context is lost. The right chunk size depends on the content type and the use case.
  • Embedding model. Which model produces the most useful representations for your content? This matters more than most teams realise.
  • Metadata schema. What metadata do you need to support filtering, access control, and source attribution?
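To make the chunking trade-off concrete, here is a deliberately simple fixed-window chunker with overlap. This is one strategy among many (production systems often split on semantic boundaries such as headings or paragraphs instead); the sizes are placeholders:

```python
# Fixed-size chunking with overlap: overlap preserves context that would
# otherwise be cut in half at a chunk boundary.
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    chunks, start = [], 0
    step = size - overlap          # each window starts `step` chars later
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks
```

Making `size` too large blurs retrieval (each chunk mixes topics); making it too small strips the context the model needs, which is exactly the tension described above.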

Layer 3: Retrieval

Finding the right knowledge for a given query.
This is the vector database and search layer. When a user (or an AI agent) asks a question, the system converts the question into an embedding, searches the vector database for similar content, and returns the most relevant chunks.
Simple in concept. Nuanced in practice. Retrieval quality is the single biggest determinant of RAG system performance. Getting this right requires:
  • Hybrid search (combining vector similarity with keyword matching)
  • Re-ranking (using a second model to re-order results by relevance)
  • Filtering (using metadata to restrict results to relevant sources)
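The hybrid-search idea in the list above can be sketched in a few lines: blend a vector-similarity score with a keyword-overlap score. Real systems would use a vector database and BM25 rather than these toy scorers, and the `alpha` weight is an assumption you would tune:

```python
# Hybrid retrieval scoring sketch: cosine similarity over embeddings
# combined with simple keyword overlap. Illustrative only.
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query: str, text: str) -> float:
    # Fraction of query terms that appear in the document text.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_score(q_vec, d_vec, query, text, alpha=0.7):
    # alpha weights semantic similarity against exact keyword matching
    return alpha * cosine(q_vec, d_vec) + (1 - alpha) * keyword_score(query, text)
```

Keyword matching catches exact terms (product codes, policy names) that embeddings can miss, which is why the combination usually beats either alone.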

Layer 4: Synthesis

Combining retrieved knowledge with AI to produce useful outputs.
This is where the LLM takes the retrieved content, the user's question, and any system instructions, and generates an answer. The quality of synthesis depends on the quality of retrieval (what information does the model have to work with?) and the quality of prompting (what instructions guide the model's response?).
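A sketch of the synthesis step's prompt assembly, with the model call itself elided (any LLM client would slot in where the returned string is sent). The instruction wording and the bracketed-source convention are assumptions, not a prescribed template:

```python
# Assemble retrieved chunks plus the user's question into one prompt.
# Each chunk is tagged with its source so the answer can cite it.
def build_prompt(question: str, chunks: list[dict]) -> str:
    context = "\n\n".join(
        f"[{c['source']}] {c['text']}" for c in chunks
    )
    return (
        "Answer using only the context below. "
        "Cite the bracketed source for each claim.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Constraining the model to the retrieved context, and requiring source attribution, is what turns retrieval quality directly into answer quality.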
Isaac: What Mak is describing is essentially a knowledge operating system. A layer that sits between your existing systems and your AI applications, making organisational knowledge accessible, structured, and usable.
This is what we've been calling an "AI foundation" - the infrastructure that every AI application in your organisation builds on. Get this right, and every subsequent AI initiative is easier. Get it wrong (or skip it), and every AI initiative has to solve the knowledge problem from scratch.

The Practical Path

Start Small

Don't try to connect every system at once. Pick one knowledge domain - one department's documentation, one product's policy library, one team's knowledge base. Build the architecture for that domain. Prove it works. Expand.

Invest in Connectors

The ingestion layer is unglamorous but critical. Build robust connectors for your highest-value knowledge sources first. Make them reliable, make them incremental (so they can pick up changes), and make them monitored (so you know when they fail).
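The "incremental" property above usually means tracking a per-source watermark: the timestamp of the newest record already ingested, so each run fetches only what changed. A minimal sketch (field names like `updated_at` are illustrative):

```python
# Incremental sync via per-source watermarks. Records modified at exactly
# the watermark time are skipped by the strict comparison; a real system
# would handle that edge case and persist watermarks durably.
from datetime import datetime, timezone


class IncrementalSync:
    def __init__(self):
        self.watermarks: dict[str, datetime] = {}

    def changed_since(self, source: str, records: list[dict]) -> list[dict]:
        mark = self.watermarks.get(
            source, datetime.min.replace(tzinfo=timezone.utc)
        )
        fresh = [r for r in records if r["updated_at"] > mark]
        if fresh:
            self.watermarks[source] = max(r["updated_at"] for r in fresh)
        return fresh
```

The monitoring half of the advice is then cheap to add: alert when a connector returns nothing for longer than its source normally goes quiet.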

Iterate on Retrieval

Retrieval quality is where you'll spend most of your optimisation effort. Don't expect the first version to be perfect. Build measurement into the system (are users getting relevant results?) and iterate continuously.
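One concrete way to build that measurement in is recall@k over a small labelled set: a handful of real queries, each mapped to the document a good answer should come from. A sketch (the labelled set is something you assemble by hand):

```python
# recall@k: fraction of labelled queries whose expected document appears
# in the top-k retrieval results.
def recall_at_k(
    results: dict[str, list[str]],   # query -> ranked doc ids returned
    labels: dict[str, str],          # query -> doc id that should appear
    k: int = 5,
) -> float:
    hits = sum(
        1 for q, expected in labels.items() if expected in results[q][:k]
    )
    return hits / len(labels)
```

Even twenty labelled queries is enough to tell you whether a chunking or re-ranking change helped or hurt, which beats iterating on vibes.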

Don't Forget Governance

Knowledge architecture inherits all the governance requirements of the underlying data. Access controls need to propagate from source systems to the knowledge layer. Sensitive data needs to be handled appropriately. Audit trails need to track what knowledge was used to generate what outputs.
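Propagating access controls typically means stamping each chunk's metadata with the groups allowed to see it (copied from the source system at ingestion), then filtering retrieval results against the requesting user's groups before synthesis. A sketch, with the metadata field name assumed:

```python
# Filter retrieved chunks to those the requesting user may see.
# `allowed_groups` is carried in chunk metadata from the source system.
def filter_by_access(chunks: list[dict], user_groups: set[str]) -> list[dict]:
    return [
        c for c in chunks
        if user_groups & set(c["metadata"]["allowed_groups"])
    ]
```

Filtering before synthesis matters: if restricted content ever reaches the prompt, the model may leak it in the answer regardless of who asked.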
Mak: The unsexy truth about enterprise AI is that most of the engineering effort goes into data plumbing. Getting knowledge from where it is to where AI can use it. It's not the part that demos well. But it's the part that determines whether AI delivers real value or just impressive prototypes.