The glamorous part of enterprise AI is the model. The unglamorous part is the data connector that feeds it. Hassan and I have spent more time debugging data connectors than debugging models, because connectors break more often, in more ways, and with less informative error messages. A model is deterministic for a given input. A data connector contends with network timeouts, API changes, authentication expiry, rate limits, malformed responses, and the infinite creativity of enterprise data formats.
What You Need to Know
- Data connectors are the most common point of failure in enterprise AI systems, not the model
- The three most frequent connector failures: authentication expiry, API schema changes, and rate limit exhaustion
- Every connector needs four things: health checks, retry logic, circuit breaking, and data validation
- Build connectors as isolated services. When a CRM connector breaks, your document processing pipeline should keep running.
The Failure Taxonomy
Authentication Failures
OAuth tokens expire. API keys get rotated. Service accounts get disabled during security reviews. The connector worked yesterday. It doesn't work today. The error message says "401 Unauthorized" and nothing about why.
Prevention: Token refresh should be automatic, with alerts when refresh fails. API key rotation should be a documented process with connector updates included. Service account permissions should be audited before security reviews, not discovered after them.
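A minimal sketch of proactive token refresh with an alert hook. The `fetch_token` and `alert` callables are placeholders for your OAuth client and alerting system, not a real API; the five-minute refresh margin is an illustrative default.

```python
import time

class TokenManager:
    """Refreshes a token before it expires; alerts when refresh fails.

    `fetch_token` returns (token, expires_at_epoch) and `alert` takes an
    error message -- both are assumed hooks, wire in your own.
    """

    def __init__(self, fetch_token, alert, refresh_margin_s=300):
        self._fetch_token = fetch_token
        self._alert = alert
        self._margin = refresh_margin_s  # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Refresh proactively instead of waiting for a 401 in production.
        if time.time() >= self._expires_at - self._margin:
            try:
                self._token, self._expires_at = self._fetch_token()
            except Exception as exc:
                self._alert(f"token refresh failed: {exc}")
                raise
        return self._token
```

Callers simply use `manager.get()` for every request; the 401-and-retry path then becomes a rare fallback rather than the normal refresh mechanism.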
Schema Changes
The CRM adds a field. The document management system changes an API response format. The ERP vendor releases an update that renames a column. Your connector, which expected a specific schema, receives something different and either fails or silently drops data.
"Never trust the schema. Validate every response against what you expect. The 15 minutes you spend building validation saves hours of debugging when the schema inevitably changes." (Hassan Nawaz, Senior Developer)
Prevention: Schema validation on every response. Not just "did we get a 200?" but "does the response contain the fields we need, in the types we expect, within the ranges we accept?"
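One way to sketch that three-part check in plain Python. The field names and the non-negative-amount rule are illustrative, not any real CRM's schema.

```python
# Illustrative expected schema: field name -> accepted type(s).
EXPECTED = {
    "id": str,
    "amount": (int, float),
    "stage": str,
}

def validate_response_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes.

    Checks presence and type for every expected field, plus a sample
    range check (a negative deal amount is treated as out of range).
    """
    problems = []
    for field, expected_type in EXPECTED.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type for {field}: {type(record[field]).__name__}"
            )
    if isinstance(record.get("amount"), (int, float)) and record["amount"] < 0:
        problems.append("amount out of range")
    return problems
```

Returning a list of problems, rather than raising on the first one, makes the log entry for a failing record far more useful when the vendor changes several fields at once.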
Rate Limit Exhaustion
Enterprise APIs have rate limits. AI pipelines that process hundreds or thousands of items can exhaust those limits quickly. The connector starts getting 429 responses, and if it doesn't handle them, the pipeline either fails or retries in a way that makes the rate limiting worse.
Prevention: Rate limiting on the connector side (respect the API's limits before hitting them). Queue-based processing that spreads load over time. Exponential backoff on 429 responses. Monitoring of API quota usage.
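The first of those preventions, respecting the limit client-side, is commonly implemented as a token bucket. A minimal sketch, with illustrative numbers rather than any vendor's actual limits:

```python
import time

class TokenBucket:
    """Client-side rate limiter: stay under the API's limit proactively.

    `rate` is requests per second and `capacity` is the allowed burst;
    both are assumptions to be tuned against the real API's quota.
    """

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def acquire(self) -> float:
        """Take one token if available; else return seconds to wait.

        Returns 0.0 when the request may proceed immediately. A caller
        that gets a positive value sleeps, then calls acquire() again.
        """
        now = time.monotonic()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 0.0
        return (1 - self.tokens) / self.rate
```

Pair this with exponential backoff on any 429s that still slip through; the bucket handles the steady state, the backoff handles the surprises.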
The Four Essentials
Health Checks
Every connector should have a health check endpoint that verifies:
- Can it authenticate with the source system?
- Can it execute a lightweight query successfully?
- Is the response schema as expected?
- Is the response time within acceptable bounds?
Run health checks on a schedule (every 5-15 minutes). Alert when a health check fails twice consecutively.
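A sketch of a composite health check that runs those four probes and times each one. The check callables themselves are stand-ins; in a real connector they would authenticate, run the lightweight query, and validate its schema against the source system.

```python
import time

def run_health_check(checks: dict) -> dict:
    """Run named zero-argument probes; a raised exception counts as failure.

    `checks` maps a name ("auth", "query", "schema", "latency") to a
    callable returning True/False. Returns per-check results plus an
    overall "healthy" flag, with elapsed seconds recorded per probe.
    """
    results = {}
    for name, check in checks.items():
        start = time.monotonic()
        try:
            ok = bool(check())
        except Exception:
            ok = False
        results[name] = {
            "ok": ok,
            "elapsed_s": round(time.monotonic() - start, 3),
        }
    results["healthy"] = all(
        r["ok"] for r in results.values() if isinstance(r, dict)
    )
    return results
```

The scheduler and the "two consecutive failures" alert rule live outside this function, in whatever cron or monitoring system you already run.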
Retry Logic
Not all failures are permanent. Timeouts, transient network issues, and temporary rate limits resolve on their own. Retry logic should:
- Retry on transient errors (timeouts, 429s, 503s)
- Use exponential backoff (1s, 2s, 4s, 8s)
- Set a maximum retry count (typically 3-5)
- Not retry on permanent errors (401s, 404s, malformed requests)
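Those four rules fit in a small helper. The `(status, body)` envelope is an assumption for the sketch, not a real HTTP client's interface, and the injectable `sleep` exists so the backoff can be tested without waiting.

```python
import time

TRANSIENT = {429, 503}   # retry these (plus timeouts, raised separately)
PERMANENT = {401, 404}   # never retry these

class PermanentError(Exception):
    pass

def with_retries(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on transient errors with exponential backoff.

    `call` is assumed to return (status_code, body). Delays follow
    base_delay * 2**attempt: 1s, 2s, 4s, 8s with the defaults.
    """
    for attempt in range(max_attempts):
        status, body = call()
        if status < 400:
            return body
        if status not in TRANSIENT:
            # 401s, 404s, malformed requests: retrying cannot help.
            raise PermanentError(f"status {status}: not retrying")
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    raise TimeoutError(f"gave up after {max_attempts} attempts")
```

Treating anything not on the transient list as permanent is a deliberate default: it is better to surface an unexpected status code loudly than to hammer the API with retries that will never succeed.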
Circuit Breaking
If a source system is consistently failing, the circuit breaker prevents the connector from hammering it with requests. After N consecutive failures, the circuit opens and requests are routed to a fallback (cached data, graceful degradation, human notification).
The circuit closes again after a configurable cool-down period, and the connector attempts to resume normal operation.
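A minimal sketch of that open/close cycle, with an injectable clock for testability. Real implementations usually add a half-open state that lets a single trial request through before fully closing; this version closes outright after the cool-down.

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; closes after a cool-down.

    `threshold` and `cooldown_s` defaults are illustrative. While open,
    allow() returns False and the caller should use its fallback
    (cached data, graceful degradation, human notification).
    """

    def __init__(self, threshold=5, cooldown_s=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cool-down elapsed: close and resume normal operation.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, success: bool):
        if success:
            self.failures = 0  # any success resets the failure streak
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
```

The connector wraps each request: check `allow()`, make the call, then `record()` the outcome.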
Data Validation
Every piece of data that enters the AI pipeline should be validated:
- Type checking (is the date a date? is the number a number?)
- Completeness checking (are all required fields present?)
- Range checking (is the value within expected bounds?)
- Format checking (does the text meet minimum quality thresholds?)
Invalid data should be logged, flagged, and routed to an exception queue. It should not be silently fed to the model.
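The routing step can be sketched as a small gate in front of the pipeline. The `validate` callable (returning a list of problems, empty meaning valid) and the two queues are placeholders for whatever validation and queueing the connector actually uses.

```python
import logging

logger = logging.getLogger("connector.validation")

def route_records(records, validate, pipeline_queue, exception_queue):
    """Send valid records onward; log and divert invalid ones.

    `validate` is assumed to return a list of problem strings (empty
    list = valid); both queues just need an `append` method, so lists
    stand in here for a real queue client.
    """
    for record in records:
        problems = validate(record)
        if problems:
            # Logged and flagged -- never silently fed to the model.
            logger.warning("invalid record %r: %s", record, problems)
            exception_queue.append({"record": record, "problems": problems})
        else:
            pipeline_queue.append(record)
```

The exception queue gives a human somewhere to look; a record that vanishes silently is a debugging session waiting to happen.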
Isolation
Build each connector as an isolated service. The CRM connector should be independent of the document connector, which should be independent of the email connector. When one breaks, the others continue.
Isolation also enables independent scaling, independent deployment, and independent monitoring. If the CRM connector needs to handle a spike in sync volume, you scale that connector without affecting anything else.
Data connectors are boring infrastructure that makes interesting AI possible. The time invested in building them properly, with health checks, retry logic, circuit breaking, and validation, pays back every time a source system changes, fails, or behaves unexpectedly. And in enterprise environments, that is not a matter of if but when.

