
Enterprise Data Migration Done Right

Data migration is the unglamorous heart of most enterprise projects. A practical framework for getting it right without losing your mind or your data.
1 November 2020·7 min read
John Li
Chief Technology Officer
Nobody gets excited about data migration. It's the part of enterprise projects that gets scoped last, staffed reluctantly, and underestimated consistently. And it's the part that, when it goes wrong, takes the entire project down with it. I've worked on migrations that were expected to take two weeks and took three months. Not because the technology was hard. Because the data was worse than anyone admitted.

What You Need to Know

  • Data migration is underestimated in nearly every enterprise project we've worked on
  • The biggest risk isn't technical. It's data quality in the source system.
  • A phased approach (assess, cleanse, transform, validate, migrate, verify) reduces risk significantly
  • Plan for the migration to take twice as long as estimated, and budget for three times the estimate.

Why Migrations Fail

Nobody Knows What's in the Source System

The most common failure mode isn't a technical error. It's discovering, mid-migration, that the source data is nothing like what was documented. Fields that should contain dates contain free text. Required fields are empty. Duplicate records number in the thousands. Encoding is inconsistent. The data dictionary, if one exists, describes the system as it was designed, not as it's actually used.
88% of data migration projects exceed their initial budget or timeline. (Source: Bloor Research, The State of Data Migration, 2019)
I've seen a client's "customer ID" field contain customer IDs, internal notes, phone numbers, and in one memorable case, a recipe for banana bread. The system allowed free text entry. Humans used it accordingly. The migration plan assumed clean data. Reality disagreed.

The Testing Gap

Most migration plans include testing. Few include enough testing. A migration test that validates 100 records out of 500,000 catches structural issues but misses the edge cases that will surface in production. And edge cases in enterprise data aren't rare. They're everywhere.
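To put a number on the 100-in-500,000 example, suppose (purely as an assumption) that 500 records, 0.1% of the table, share a defect. The chance that a random sample catches at least one of them follows the hypergeometric distribution. A minimal sketch using only the Python standard library:

```python
from math import comb


def detection_probability(population: int, defective: int, sample: int) -> float:
    """Probability that a random sample drawn without replacement contains
    at least one defective record (hypergeometric distribution)."""
    # Ways to draw a sample with no defective records, over all ways to draw the sample.
    p_miss = comb(population - defective, sample) / comb(population, sample)
    return 1 - p_miss


if __name__ == "__main__":
    # Assumed scenario: 500 problem records hidden in a 500,000-row table.
    for sample_size in (100, 1_000, 10_000):
        p = detection_probability(500_000, 500, sample_size)
        print(f"sample of {sample_size:>6}: {p:.1%} chance of seeing the defect")
```

Under that assumption a 100-record sample has roughly a 1-in-10 chance of containing even one bad record, which is why the validation phase runs against the full data set rather than a subset.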
The migration that works perfectly in the test environment, with a clean data subset, will fail in production when it encounters the records that no test thought to include. The customer with a name that contains special characters. The transaction dated January 1st, 1900, because someone needed to enter something. The record that references a deleted record in another table.
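One way to stop these cases slipping through is to pull them deliberately into the test set. The sketch below scans transaction records for the three cases just described. The field names (`name`, `txn_date`, `customer_id`), the set of characters treated as special, and the 1900-01-01 sentinel are all assumptions to replace with whatever the assessment phase actually finds.

```python
from datetime import date

# Assumed placeholder dates; the assessment phase should supply the real list.
SENTINEL_DATES = {date(1900, 1, 1)}
# Characters that commonly break naive parsing or matching (illustrative set).
SPECIAL_CHARS = set("'-,&/\"")


def edge_case_reasons(txn: dict, customer_ids: set) -> list[str]:
    """Return the reasons a transaction belongs in the edge-case test suite.
    Field names are hypothetical."""
    reasons = []
    name = txn.get("name") or ""
    if not name.isascii() or SPECIAL_CHARS & set(name):
        reasons.append("special characters in name")
    if txn.get("txn_date") in SENTINEL_DATES:
        reasons.append("placeholder date")
    if txn.get("customer_id") not in customer_ids:
        reasons.append("dangling customer reference")
    return reasons


def build_edge_case_suite(txns: list[dict], customer_ids: set) -> dict:
    """Map transaction id to its reasons, for every record hitting at least one case."""
    suite = {}
    for txn in txns:
        reasons = edge_case_reasons(txn, customer_ids)
        if reasons:
            suite[txn["id"]] = reasons
    return suite
```

Run over the full source extract, this produces a list of records to add to the test data so that every migration rehearsal exercises them.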

A Practical Framework

Phase 1: Assessment (Don't Skip This)

Before touching any migration tooling, profile the source data. Every table. Every field. What percentage is populated? What are the actual data types (not the schema types)? Where are the duplicates? Where are the inconsistencies?
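A first-pass profiler can be very small. This sketch assumes rows arrive as Python dictionaries of raw string values (for example from `csv.DictReader`). It reports how full each column is, what the values actually look like, and which values repeat. The type patterns and the two date formats are assumptions to adjust for the source system.

```python
import re
from collections import Counter
from datetime import datetime

INTEGER = re.compile(r"[+-]?\d+")
DECIMAL = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed formats; extend as profiling reveals more


def infer_type(value: str) -> str:
    """Classify a raw value by what it contains, not by the schema type."""
    v = value.strip()
    if INTEGER.fullmatch(v):
        return "integer"
    if DECIMAL.fullmatch(v):
        return "decimal"
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(v, fmt)
            return "date"
        except ValueError:
            pass
    return "text"


def profile_column(values: list) -> dict:
    """Populated percentage, observed types, distinct count and repeated values."""
    populated = [str(v).strip() for v in values if v is not None and str(v).strip()]
    counts = Counter(populated)
    return {
        "populated_pct": 100 * len(populated) / len(values) if values else 0.0,
        "types": Counter(infer_type(v) for v in populated),
        "distinct": len(counts),
        "duplicated_values": {v: n for v, n in counts.items() if n > 1},
    }


def profile_table(rows: list[dict]) -> dict:
    """Profile every column that appears in any row."""
    columns = sorted({col for row in rows for col in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}
```

A `customer_id` column that profiles as mostly integer with a tail of free text is exactly the kind of finding this phase exists to surface.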
This phase typically reveals that the migration is more complex than anyone estimated. That's the point. It's better to know now than during the migration window.
The assessment phase is where projects save themselves. Take the time.

Phase 2: Cleanse

Fix the source data before migrating it. This is contentious. Some organisations want to migrate everything and clean it up later. That approach moves the mess to a new system where it's equally hard to clean up, and now you've contaminated your new platform.
Clean before you move. Deduplicate records. Standardise formats. Fill critical gaps. Flag records that can't be automatically cleaned for manual review. This phase is labour-intensive and unglamorous. It's also the single biggest factor in migration success.
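The cleansing steps can be scripted too. This sketch assumes customer records matched on email address: it standardises email and phone formats, merges duplicates, and routes records with no email to manual review. The matching key, the `updated` field, and the survivorship rule (newest record wins, gaps filled from older duplicates) are assumptions; real projects usually agree these rules with the data owners.

```python
import re
from collections import defaultdict


def normalise_email(raw):
    """Trim and lower-case; treat blanks as missing."""
    return raw.strip().lower() if raw and raw.strip() else None


def normalise_phone(raw):
    """Keep digits only, so '(09) 123-4567' and '09 1234567' compare equal."""
    digits = re.sub(r"\D", "", raw or "")
    return digits or None


def cleanse(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (cleaned records, records needing manual review).
    Assumes email is the duplicate-matching key (hypothetical choice)."""
    groups = defaultdict(list)
    needs_review = []
    for rec in records:
        clean = {**rec,
                 "email": normalise_email(rec.get("email")),
                 "phone": normalise_phone(rec.get("phone"))}
        if clean["email"] is None:
            # Critical gap that can't be filled automatically: route to a person.
            needs_review.append(clean)
        else:
            groups[clean["email"]].append(clean)

    cleaned = []
    for dupes in groups.values():
        # Assumed survivorship rule: newest record wins; older duplicates
        # only fill fields the newest one left empty.
        dupes.sort(key=lambda r: r["updated"], reverse=True)
        merged = dict(dupes[0])
        for older in dupes[1:]:
            for field, value in older.items():
                if merged.get(field) in (None, "") and value not in (None, ""):
                    merged[field] = value
        cleaned.append(merged)
    return cleaned, needs_review
```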

Phase 3: Transform

Map source data to target schema. This is where most migration planning starts, which is why so many migrations fail. Without the assessment and cleansing, the mapping is built on assumptions about data quality that don't hold.
Build the transformations as code, not manual processes. Every transformation should be repeatable, testable, and version-controlled. You'll run the migration multiple times before the final cut. Manual steps introduce variation between runs.
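As an illustration, a transformation can be written as a pure function from a source row to a target row, so the same input always produces the same output and each mapping can be unit tested. The source column names (`CUST_NO`, `NAME`, `BAL_CENTS`, `CTRY`) and target fields are hypothetical. The fingerprint helper hashes a run's output so two rehearsals can be compared.

```python
import hashlib
import json
from decimal import Decimal


def transform_customer(src: dict) -> dict:
    """Map one (hypothetical) legacy customer row to the target schema.
    No I/O, clocks or randomness, so output depends only on input."""
    return {
        "customer_id": int(src["CUST_NO"]),
        "full_name": " ".join(src["NAME"].split()),  # collapse stray whitespace
        # Legacy system stores cents; target stores a two-decimal string.
        "balance": str((Decimal(src["BAL_CENTS"]) / 100).quantize(Decimal("0.01"))),
        "country": (src.get("CTRY") or "").strip().upper() or None,
    }


def run_fingerprint(rows: list[dict]) -> str:
    """Hash of a run's output, sorted by key so row order doesn't matter
    (assumes customer_id is unique)."""
    digest = hashlib.sha256()
    for row in sorted(rows, key=lambda r: r["customer_id"]):
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()
```

If two rehearsals over the same extract produce different fingerprints, some step in the pipeline is not repeatable, which is worth finding long before migration night.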

Phase 4: Validate

Run the migration against a full copy of source data. Not a subset. Full copy. Compare record counts, check referential integrity, validate business rules. Every anomaly found here is a bug you won't have in production.
Build automated validation checks that run after every migration attempt. Record counts by type. Financial totals. Referential integrity between tables. Spot checks of specific records that represent known edge cases.
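A sketch of those four checks follows. It assumes source and target records have been extracted into lists of dictionaries with hypothetical `id`, `type`, `amount` and optional `parent_id` fields, and that the known edge cases are listed as the field values each record should have after migration.

```python
from collections import Counter
from decimal import Decimal


def validate(source: list[dict], target: list[dict], expected: dict) -> list[str]:
    """Run post-migration checks; an empty list means the run passed.

    Field names (id, type, amount, parent_id) are hypothetical. `expected`
    maps a known edge-case record id to the field values it should have."""
    failures = []

    # 1. Record counts by type.
    src_counts = Counter(r["type"] for r in source)
    tgt_counts = Counter(r["type"] for r in target)
    if src_counts != tgt_counts:
        failures.append(f"count mismatch: source {dict(src_counts)}, target {dict(tgt_counts)}")

    # 2. Financial totals must reconcile exactly (Decimal avoids float rounding).
    src_total = sum((Decimal(r["amount"]) for r in source), Decimal("0"))
    tgt_total = sum((Decimal(r["amount"]) for r in target), Decimal("0"))
    if src_total != tgt_total:
        failures.append(f"amount total mismatch: source {src_total}, target {tgt_total}")

    # 3. Referential integrity: every parent_id must point at a migrated record.
    target_ids = {r["id"] for r in target}
    orphans = [r["id"] for r in target
               if r.get("parent_id") is not None and r["parent_id"] not in target_ids]
    if orphans:
        failures.append(f"orphaned records: {orphans}")

    # 4. Spot checks of known edge cases.
    by_id = {r["id"]: r for r in target}
    for rid, fields in expected.items():
        record = by_id.get(rid)
        if record is None or any(record.get(f) != v for f, v in fields.items()):
            failures.append(f"spot check failed for record {rid}")

    return failures
```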

Phase 5: Migrate

The actual migration should be the least eventful part of the process. If the previous phases are done well, migration night is execution of a well-tested plan. If migration night is exciting, something went wrong earlier.
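One way to keep migration night uneventful is to script the rehearsed runbook, so it runs the same steps in the same order every time and stops at the first failure instead of inviting improvisation. In this sketch each step is any callable returning `True` on success; the step names in the example are hypothetical.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("cutover")


def run_cutover(steps: list[tuple[str, Callable[[], bool]]]) -> bool:
    """Run rehearsed steps in order and halt at the first failure, so the team
    follows the rollback plan instead of improvising."""
    for name, step in steps:
        log.info("start: %s", name)
        if not step():
            log.error("failed: %s; halting cutover, follow the rollback plan", name)
            return False
        log.info("done: %s", name)
    log.info("cutover complete")
    return True


if __name__ == "__main__":
    # Hypothetical steps; real ones would call the rehearsed migration scripts.
    run_cutover([
        ("freeze writes to source system", lambda: True),
        ("extract, transform and load", lambda: True),
        ("automated validation checks", lambda: True),
    ])
```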

Phase 6: Verify

Post-migration verification by the business. Not by IT. The business users who work with this data daily are the ones who'll spot the record that migrated incorrectly. Give them time and a structured process for reviewing the migrated data before the old system is decommissioned.
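A structured review can start from a reproducible, stratified sample: a fixed number of records per record type, so rare types are not drowned out by high-volume ones. The `type` and `id` fields, the default sample size, and the seed value are assumptions.

```python
import random
from collections import defaultdict


def review_sample(records: list[dict], per_type: int = 25, seed: int = 2020) -> dict:
    """Pick up to `per_type` record ids of each type for business review.
    Fields and seed are assumptions; a fixed seed (with the same input order)
    reproduces the same sample, so IT and reviewers work from one list."""
    rng = random.Random(seed)
    ids_by_type = defaultdict(list)
    for rec in records:
        ids_by_type[rec["type"]].append(rec["id"])
    return {
        rtype: sorted(rng.sample(ids, min(per_type, len(ids))))
        for rtype, ids in sorted(ids_by_type.items())
    }
```

Each sampled id could then be shown to a reviewer alongside its source record, with a pass/fail and comment column to capture what they find.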

Lessons From Recent Migrations

Always keep the source system available after migration. For at least three months. Users will find discrepancies. You'll need to check the source to determine whether the error was in the source or the migration.
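With the source still available, triaging a reported discrepancy can be fairly mechanical: replay the transformation for that record and compare the result with what landed in the target. The sketch assumes records are indexed by id and that `transform` is the same repeatable transformation code used for the migration run; the finding labels are illustrative.

```python
from typing import Callable


def triage(reported_ids, source: dict, target: dict,
           transform: Callable[[dict], dict], field: str) -> dict:
    """Classify each reported record by where its discrepancy originated.
    `source` and `target` map record id to record (assumed layout)."""
    findings = {}
    for rid in reported_ids:
        if rid not in target:
            findings[rid] = "missing from target: load or filter problem"
            continue
        if rid not in source:
            findings[rid] = "not in source: investigate how it was created"
            continue
        expected = transform(source[rid]).get(field)
        actual = target[rid].get(field)
        if expected != actual:
            findings[rid] = f"migration error: expected {expected!r}, found {actual!r}"
        else:
            findings[rid] = "faithful to source: value came from the source data or a documented mapping rule"
    return findings
```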
Document every decision. "We decided to map status 'Pending Review' to 'Open' because the new system doesn't have a pending state" is a decision that someone will question in six months. Document it when you make it.
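Decisions are easiest to keep when they live next to the code that applies them. This sketch records each value mapping with its rationale and refuses to map a value nobody has decided on. The `MappingDecision` structure and the `approved_by` value are illustrative; only the 'Pending Review' rule comes from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingDecision:
    source_value: str
    target_value: str
    rationale: str
    approved_by: str


# Version-controlled decision log (entries are illustrative).
STATUS_DECISIONS = [
    MappingDecision(
        source_value="Pending Review",
        target_value="Open",
        rationale="The new system doesn't have a pending state",
        approved_by="Operations data owner (hypothetical)",
    ),
]
STATUS_MAP = {d.source_value: d for d in STATUS_DECISIONS}


def map_status(value: str) -> str:
    """Apply a documented decision, or fail loudly so one gets made."""
    decision = STATUS_MAP.get(value)
    if decision is None:
        raise ValueError(f"No documented mapping decision for status {value!r}")
    return decision.target_value
```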
Budget for data quality work. If the project budget doesn't include a line item for data cleansing, the migration will either go over budget or go badly. Usually both.
Communicate early and often. Users need to know that their data is moving, when it's moving, and what might look different afterwards. Surprises erode trust in the new system before it's even launched.
Data migration isn't exciting. It's essential. Treat it with the seriousness it deserves and your enterprise project has a fighting chance. Treat it as an afterthought and you'll learn the hard way why 88% of migrations exceed their estimates.