
Solving the Integration Problem (Or At Least Making It Manageable)

After 8+ years of integration work, this is our current approach: event-driven foundations, API gateway patterns, and why the integration layer should be part of your AI foundation from the start.
25 November 2025·13 min read
John Li
Chief Technology Officer
Isaac Rolfe
Managing Director
We've been building enterprise integrations since 2017. We've done point-to-point. We've done middleware. We've built custom API layers and inherited other people's. After eight years and dozens of enterprise environments, we don't have a silver bullet. But we do have patterns that compound, and that's worth more.

What You Need to Know

  • Event-driven architecture is the most durable integration pattern we've found. It doesn't eliminate complexity, but it contains it
  • Your integration layer should be part of your AI foundation, not bolted on afterwards
  • API gateways need to handle both traditional and AI traffic. The patterns differ, and a single gateway design can serve both
  • Integration debt is real, measurable, and worth auditing before you start any AI initiative
  • There's no single fix. The goal is a system that gets easier to extend over time, not one that's perfect today

Why We're Writing This Now

RIVER has been doing integration work for most of its existence. Early on, it was connecting CRMs to billing systems and building data pipelines between legacy platforms. Bread-and-butter enterprise engineering.
Over the past two years, the integration problem has changed shape. AI capabilities need richer data from more sources. They produce probabilistic output that requires different handling. They introduce latency patterns that break synchronous architectures.
We wrote about the complexity multiplier back in 2023 when we were still figuring out what AI did to integration patterns. We wrote about the integration tax earlier this year. This post is about what we're actually doing about it.

Start With an Audit

Before you redesign anything, understand what you have. Most organisations don't have a clear picture of their integration landscape. Systems were connected over years by different teams, different vendors, and different technology generations. Nobody drew the map.
An integration audit doesn't need to be exhaustive. Focus on three things:
1. Inventory your connections. Every system-to-system data flow. Who sends what to whom, how often, and through what mechanism. You'll find connections nobody remembers building. You'll find redundant paths doing the same thing differently. You'll find critical business processes running through a cron job someone wrote in 2019.
2. Classify by fragility. Not all integrations are equal. A nightly CSV export between two internal systems is different from a real-time webhook processing customer payments. Rank each connection by: how often it breaks, how quickly you detect the break, and what happens downstream when it fails.
3. Map the data flows AI will need. If you're planning AI capabilities (and in 2025, you probably are), identify which existing integrations will need to feed AI workloads. These are your priority targets for improvement. Don't upgrade integrations the AI won't touch.
A client thought they had about twelve integrations. The actual count was thirty-one - eleven built by a contractor who left in 2021, three of them broken, and nobody had noticed.
John Li
Chief Technology Officer

Event-Driven Architecture as the Foundation

We've tried all three major integration patterns across dozens of clients: point-to-point, middleware hubs, and event-driven. Each has a place. But if you're building something that needs to scale with AI capabilities, event-driven architecture is the foundation we keep coming back to.
The core idea: systems publish events when things happen. Other systems subscribe to the events they care about. Nobody knows or cares who else is listening.
A customer updates their address. The CRM publishes an "address-changed" event. The billing system picks it up and updates the invoice address. The logistics system picks it up and updates the delivery routes. The AI risk model picks it up and recalculates the customer's geographic risk profile. Each consumer handles the event independently.
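The decoupling described above can be sketched with a minimal in-memory event bus. This is an illustration of the pattern, not a real broker — the event name and consumer behaviours follow the address-change example, and the `EventBus` class is a stand-in for Kafka, EventBridge, or similar.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-memory broker: publishers don't know who is listening."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
handled = []

# Each consumer registers independently; the CRM never references them.
bus.subscribe("address-changed", lambda e: handled.append(("billing", e["customer_id"])))
bus.subscribe("address-changed", lambda e: handled.append(("logistics", e["customer_id"])))
bus.subscribe("address-changed", lambda e: handled.append(("risk-model", e["customer_id"])))

# The CRM publishes one event; three systems react, none aware of the others.
bus.publish("address-changed", {"customer_id": "C-1042", "city": "Sydney"})
```

Adding a fourth consumer — say, a new AI capability — is one more `subscribe` call. The publisher's code doesn't change.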

Why This Works for AI

AI capabilities are consumers. They need data from many sources, and the set of sources changes as capabilities evolve. With event-driven architecture, adding a new AI consumer means subscribing to existing events. You don't need to modify the source systems. You don't need to build new point-to-point connections.
It also handles the latency problem naturally. Events are asynchronous by default. An AI capability that takes 15 seconds to process a document doesn't block anything. It consumes the event, processes it, and publishes its own result event when it's done.
And when the AI is wrong and a human corrects it, that correction is just another event. The feedback loop is built into the architecture, not bolted on.

What It Requires

Event-driven architecture isn't free. You need:
  • A message broker. Apache Kafka for high-throughput enterprise workloads. AWS EventBridge or Azure Event Grid if you're cloud-native. RabbitMQ for simpler deployments. The choice matters less than having one.
  • An event schema. A clear, versioned definition of what each event contains. This is the contract between publishers and consumers. Get this wrong and you've just replaced point-to-point chaos with event chaos.
  • Eventual consistency tolerance. Systems won't be in sync at every instant. There's a window between when an event is published and when all consumers have processed it. Your organisation needs to accept this. For most business processes, a few seconds of inconsistency is fine. For some, it's not. Know which is which.
73% of enterprises adopting AI are prioritising event-driven architecture for their integration layer.
Source: Confluent, Data Streaming Report 2025

The API Gateway Layer

Events handle system-to-system communication well. But you also need a gateway layer for synchronous interactions: user-facing applications calling backend services, external partners connecting to your APIs, and AI capabilities that need real-time responses.
A well-designed API gateway does four things for you:
Authentication and authorisation in one place. Every request passes through the gateway. Identity verification, permission checks, and rate limiting happen once, consistently, regardless of which backend service handles the request.
Traffic routing for both traditional and AI workloads. Traditional API calls route to deterministic services with predictable latency. AI calls route to inference services with variable latency and need different timeout configurations, retry strategies, and circuit-breaker thresholds. One gateway, two routing profiles.
Response transformation. Backend services return data in whatever format makes sense internally. The gateway transforms it into the format consumers expect. This includes adding confidence metadata to AI responses, normalising error formats across services, and handling versioning.
Observability. Every request logged, every latency measured, every error captured. When something breaks at 2am, the gateway logs tell you where and why.
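The "one gateway, two routing profiles" idea can be sketched as configuration. The specific numbers below are placeholders, not recommendations — the point is that AI routes get longer timeouts, fewer retries, and more tolerant circuit breakers than deterministic services.

```python
# Hypothetical gateway routing config: one gateway, two profiles.
# Values are illustrative only.
ROUTING_PROFILES = {
    "traditional": {"timeout_s": 2, "retries": 3, "circuit_breaker_errors": 5},
    "ai": {"timeout_s": 30, "retries": 1, "circuit_breaker_errors": 10},
}

def profile_for(path: str) -> dict:
    """Route /ai/* to the variable-latency profile; everything else gets the default."""
    return ROUTING_PROFILES["ai" if path.startswith("/ai/") else "traditional"]
```

Most real gateways (Kong, Envoy, API Management products) express this as declarative route config rather than code, but the split is the same: two latency regimes, one entry point.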

AI-Specific Gateway Patterns

We've added three patterns to our gateway design specifically for AI traffic:
Confidence headers. Every AI response includes a confidence score in a standardised header. Consuming applications can route on confidence without parsing the response body. High confidence goes straight through. Low confidence gets flagged.
Model version tracking. The gateway records which model version served each request. When a client reports that "the AI is behaving differently," you can correlate the change with a model update, a prompt change, or a threshold adjustment.
Graceful degradation. When the AI service is slow or unavailable, the gateway returns a structured fallback response rather than an error. The consuming application knows it didn't get an AI-enhanced result and can proceed with its default logic. The user experience degrades gracefully instead of breaking.
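Two of these patterns — confidence metadata and graceful degradation — fit naturally in one wrapper. This is a sketch, not our gateway's implementation: `call_ai_service` stands in for a real inference call, and the field names are assumptions.

```python
# Sketch of the degradation pattern: if the AI call fails or times out,
# the gateway returns a structured fallback the consumer can detect,
# rather than surfacing an error.
FALLBACK = {"ai_enhanced": False, "result": None}

def with_graceful_degradation(call_ai_service, request):
    try:
        result = call_ai_service(request)
        return {
            "ai_enhanced": True,
            "result": result["answer"],
            # Confidence travels in a standardised field so consumers
            # can route on it without parsing the response body.
            "x-ai-confidence": result["confidence"],
        }
    except Exception:
        return dict(FALLBACK)

# A healthy service returns an enhanced response with confidence attached...
ok = with_graceful_degradation(lambda r: {"answer": "approve", "confidence": 0.92}, {})

# ...a failing one degrades to the fallback, and the caller proceeds
# with its default logic instead of breaking.
def failing(request):
    raise TimeoutError("inference service unavailable")

degraded = with_graceful_degradation(failing, {})
```

The consuming application checks `ai_enhanced` (or the confidence field) and decides: pass high-confidence results straight through, flag low-confidence ones for review, and fall back to default logic when the AI wasn't available at all.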

Integration as Part of the AI Foundation

The biggest mistake we see is treating integration as a separate concern from AI. Organisations build their AI capabilities in one workstream and their integration infrastructure in another. The two teams meet in the middle and discover their assumptions don't align.
Our approach: the integration layer is one of the four layers of the AI foundation. It gets designed alongside the data layer, the intelligence layer, and the governance layer. Not after.
This means:
  • Event schemas are designed with AI consumers in mind from day one. Events carry the unstructured data and context that AI capabilities need, not just the structured fields that traditional integrations expect.
  • The API gateway is configured for AI traffic patterns from the start. Timeouts, retry policies, and circuit breakers are set for variable-latency AI calls, not just millisecond-latency database queries.
  • Feedback loops are part of the initial architecture. When an AI decision gets corrected by a human, that correction flows back through the event system. It's not a separate "phase two" project.
We used to treat integration as the last step - and every time, it ended up being 60% of the project and 80% of the ongoing maintenance. Now the integration architecture comes first, and that single change has cut our delivery timelines by roughly a third.
Isaac Rolfe
Managing Director

Prioritising Your Integration Debt

You can't fix everything at once. And honestly, some integrations don't need fixing. That nightly CSV export between two systems that nobody's expanding? Leave it alone. It works. Your effort is better spent elsewhere.
We prioritise integration improvements using three criteria:
1. AI adjacency. Will this integration feed or consume AI capabilities in the next 12 months? If yes, it's high priority. The cost of retrofitting AI support onto a fragile integration is always higher than building it right the first time.
2. Fragility score. How often does this integration break, how long does it take to detect, and what's the blast radius? High-fragility, high-impact connections get upgraded regardless of AI plans.
3. Extension frequency. How often do new consumers need data from this source? If you're adding a new connection to the same system every quarter, that system needs an event-based interface. The point-to-point approach is already failing you.
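One way to make the three criteria operational is a rough scoring function. The weights below are assumptions for illustration — the post doesn't prescribe a formula — but even a crude score forces the ranking conversation.

```python
# Illustrative prioritisation score for the three criteria.
# Weights are assumptions, not a formula from the post.
def priority_score(ai_adjacent: bool, fragility: int, extensions_per_year: int) -> int:
    """fragility: 0-5 from your audit; higher total scores get upgraded first."""
    score = 0
    if ai_adjacent:
        score += 10                       # feeds or consumes AI in the next 12 months
    score += fragility * 2                # breaks often, detected late, wide blast radius
    score += min(extensions_per_year, 4)  # frequent new consumers, capped
    return score

# An AI-adjacent, moderately fragile, frequently extended integration
# outranks a stable export nobody is expanding.
scores = {
    "crm-to-billing": priority_score(ai_adjacent=True, fragility=3, extensions_per_year=4),
    "nightly-csv-export": priority_score(ai_adjacent=False, fragility=1, extensions_per_year=0),
}
```

Rank, take the top five, fix them, then rescore — which is exactly the reassessment loop described below.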
Focus on the top five. Get those right. Then reassess. The landscape changes as you improve the foundation, and priorities shift accordingly.

Honest Limitations

We don't have a silver bullet. Event-driven architecture introduces its own complexity. Schema evolution is hard. Debugging distributed events is harder than debugging synchronous API calls. Eventual consistency confuses people who are used to transactions.
API gateways can become bottlenecks if not properly scaled. They add latency to every request. They're another thing to monitor, maintain, and upgrade.
And the tooling is still maturing. Enterprise event platforms work well for traditional data. Making them work well for the unstructured, high-volume data that AI capabilities need is still an evolving practice.
What we can say: these patterns compound. Each integration you build on the event-driven foundation is easier than the last. Each AI capability you add to a well-designed gateway is faster to deliver. The first few months feel slower. By month six, you're moving faster than the point-to-point approach ever allowed.
That's the goal. Not solving the integration problem for good. Making it manageable, and getting better at it over time.