I've been migrating data from old systems to new ones for the better part of a decade. Every project starts with the same optimism. The new platform is modern, well-designed, properly structured. The migration will be straightforward. And every project hits the same wall: the old system contains a decade of decisions, workarounds, and accumulated human behaviour that no schema diagram prepared anyone for.
What You Need to Know
- Data migration is consistently the most underestimated phase of enterprise platform projects
- Choose your migration strategy deliberately: big-bang for simplicity, incremental for risk management, parallel running for critical systems
- Never go live without a tested rollback plan. Not a theoretical one. One you've actually executed.
- The migration window isn't just technical. It's political. Stakeholder communication matters as much as the ETL pipeline.
Nobody Wants to Talk About This
When a project kicks off, the energy goes to the new system. The shiny interface. The modern architecture. The features that will fix everything. Migration gets a line item in the project plan and a vague time estimate.
Then someone opens the source database.
I worked on a migration for a logistics company in 2019. Their system had been running since 2006. Thirteen years of data. The schema had been modified so many times that the documentation described maybe 60% of the actual structure. The rest was tribal knowledge held by two database administrators, one of whom had left the company three years prior.
The migration was estimated at four weeks. It took fourteen.
83% of data migration projects experience significant unexpected delays (Source: Bloor Research, The State of Data Migration, 2017)
The Three Migration Patterns
Every migration falls into one of three approaches. Picking the wrong one for your context is a common source of pain.
Extract-Transform-Load (ETL)
The traditional approach. Pull data from the source, transform it to match the target schema, load it into the new system.
ETL works well when the transformation logic is complex and you need to validate data between extraction and loading. It's the right choice when the source and target schemas differ significantly, when you need to merge data from multiple sources, or when data cleansing is a major component.
The downside: you need a staging environment for the transformation layer, and the transform step can become a bottleneck. I've seen ETL pipelines where the transform logic grew to thousands of lines because every edge case in the source data needed handling.
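A minimal sketch of that transform step, with per-record validation so one bad row doesn't abort the batch. The field names and status codes here are illustrative, not from any real system:

```python
# Minimal ETL transform step: validate and reshape each source record
# before it reaches the load phase. Field names are illustrative.

def transform(record: dict) -> dict:
    """Map a source-schema record onto the target schema, rejecting bad rows."""
    if not record.get("customer_id"):
        raise ValueError(f"missing customer_id: {record!r}")
    return {
        "id": int(record["customer_id"]),
        # Source stores names in two fields; target wants one.
        "full_name": f"{record.get('first_name', '').strip()} "
                     f"{record.get('last_name', '').strip()}".strip(),
        # Normalise the status codes that accumulated over the years.
        "status": {"A": "active", "I": "inactive", "D": "deleted"}.get(
            record.get("status", ""), "unknown"
        ),
    }

def run_etl(extracted_rows):
    """Transform every row, collecting failures instead of aborting the batch."""
    loaded, rejected = [], []
    for row in extracted_rows:
        try:
            loaded.append(transform(row))
        except (ValueError, TypeError) as exc:
            rejected.append((row, str(exc)))
    return loaded, rejected
```

Collecting rejects rather than failing fast is deliberate: on a dirty source, the reject file is where you discover the edge cases that grow the transform logic.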
Extract-Load-Transform (ELT)
Load the raw data first, transform it in the target environment. This approach has gained traction with modern cloud data platforms that have cheap storage and powerful compute.
ELT works when your target platform handles transformation well (modern data warehouses, cloud platforms with built-in transformation tooling). It's faster to get data into the system, and you can iterate on transformation logic without re-extracting.
The risk: you're putting raw, untransformed data into your target system, even temporarily. For some organisations, compliance requirements make this problematic. And if the transformation fails, you've got dirty data sitting in production infrastructure.
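The shape of ELT, sketched with SQLite standing in for a cloud warehouse (table and column names are made up for the example): raw rows land in a staging table untouched, and the transformation is SQL that runs inside the target and can be re-run without re-extracting.

```python
# ELT sketch: load raw rows as-is into a staging table, then let the
# target platform's SQL engine do the transformation. SQLite stands in
# for a cloud warehouse; names are illustrative.
import sqlite3

def elt(raw_rows):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE staging_customers (customer_id TEXT, status TEXT)")
    # Load step: no validation yet, the raw data goes straight in.
    db.executemany("INSERT INTO staging_customers VALUES (?, ?)", raw_rows)
    # Transform step: runs inside the target, and can be iterated on
    # cheaply without touching the source system again.
    db.execute("""
        CREATE TABLE customers AS
        SELECT CAST(customer_id AS INTEGER) AS id,
               CASE status WHEN 'A' THEN 'active' ELSE 'inactive' END AS status
        FROM staging_customers
        WHERE customer_id IS NOT NULL
    """)
    return db

db = elt([("1", "A"), ("2", "X"), (None, "A")])
rows = db.execute("SELECT id, status FROM customers ORDER BY id").fetchall()
```

Note that `staging_customers` is exactly the raw, untransformed data the compliance concern above is about: it exists inside production infrastructure until the transform runs and the staging table is dropped.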
Trickle Migration
Move data incrementally over an extended period. Both systems run simultaneously. New records go to the new system while historical data migrates in the background.
This is the lowest-risk approach for critical systems where downtime isn't acceptable. It's also the most complex to implement. You need bidirectional sync logic, conflict resolution rules, and clear criteria for when the old system gets retired.
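One small piece of that sync logic, to make the complexity concrete: a conflict-resolution rule for records edited in both systems during the overlap. This sketch uses last-write-wins on an `updated_at` timestamp; real rules are usually per-field and per-domain, and the record shape here is invented for illustration.

```python
# Conflict resolution for a trickle migration: when both systems changed
# the same record during parallel running, pick a survivor. This is
# last-write-wins; real rules are usually richer. Fields are illustrative.
from datetime import datetime

def resolve(old_rec: dict, new_rec: dict) -> dict:
    """Pick the surviving record when both systems changed the same row."""
    old_ts = datetime.fromisoformat(old_rec["updated_at"])
    new_ts = datetime.fromisoformat(new_rec["updated_at"])
    # Tie goes to the new system, the designated system of record.
    return new_rec if new_ts >= old_ts else old_rec

winner = resolve(
    {"id": 1, "phone": "021 111", "updated_at": "2024-03-01T10:00:00"},
    {"id": 1, "phone": "021 222", "updated_at": "2024-03-01T09:00:00"},
)
```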
I've used trickle migration for a healthcare provider where system downtime literally affected patient care. The migration ran over eight weeks. Both systems were live throughout. It was the most complex migration I've worked on, but the organisation couldn't accept any other approach.
Big-Bang vs Incremental
Beyond the data movement pattern, there's the question of timing. Do you migrate everything at once (big-bang) or in phases (incremental)?
Big-bang is simpler to plan and eliminates the need for systems to coexist. You pick a weekend, migrate everything, verify, go live on Monday. When it works, it's clean.
When it doesn't work, you're in trouble. A failed big-bang migration on a Sunday night with a Monday morning go-live deadline is one of the most stressful situations in enterprise IT. I've been there twice. Both times we rolled back. Both times the project timeline slipped by months.
Incremental migration is harder to set up but dramatically safer. Migrate one department, one data domain, or one geographical region at a time. Each phase is smaller. Each phase teaches you something. If phase three fails, phases one and two are already live and working.
"I've stopped recommending big-bang migrations for anything with more than 100,000 records or more than five integrated systems. Sleep better."
John Li, Chief Technology Officer
The Rollback Non-Negotiable
Every migration plan needs a rollback strategy. Not "we'll figure it out if something goes wrong." A documented, tested, rehearsed procedure for returning to the previous state.
I worked with a government agency that migrated their case management system without a rollback plan. The migration completed successfully. Data was in the new system. But a transformation bug had silently corrupted date fields on about 8% of records. They discovered it two weeks later.
Without a rollback plan, they had two options: manually fix 12,000 records, or re-migrate from a backup that was now two weeks stale (and would lose two weeks of new data entered into the new system). They chose option one. It took six weeks of manual work.
If they'd maintained the old system in read-only mode for the first month after migration, the fix would have been straightforward: re-migrate the affected records from the preserved source.
The rule: maintain the source system in a recoverable state for at least 30 days post-migration. For critical systems, 90 days.
War Stories
The Encoding Problem
Migrating a customer database for a New Zealand organisation. Everything looked clean in testing. Go-live went smoothly. Two days later, customer service started getting calls. Names with macrons were corrupted. Every instance of a macronised vowel had been replaced with garbled characters.
The source system stored data in Latin-1 encoding. The target used UTF-8. The migration script didn't specify encoding conversion. For the 95% of records that only contained ASCII characters, this was invisible. For the 5% that contained te reo Māori names, addresses, and notes, it was destructive.
We caught it quickly because our validation included specific checks for non-ASCII characters. That check exists because we learned this lesson the hard way on a previous project.
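A minimal reproduction of the failure mode, plus the validation check. The direction in this sketch (UTF-8 bytes misread as Latin-1) is illustrative; the mechanism is the same either way: bytes decoded with the wrong codec produce garbled text, and a spot check on non-ASCII values catches it before customers do.

```python
# Failure mode: text encoded one way, decoded another. "Māori" below is
# UTF-8 bytes; reading them with the wrong codec yields garbled
# characters. The direction is illustrative; the mechanism is general.
raw = "Māori".encode("utf-8")
garbled = raw.decode("latin-1")   # wrong codec: mojibake
correct = raw.decode("utf-8")     # explicit, correct conversion

def non_ascii_spot_check(source_values, migrated_values):
    """Validation pass: compare every value containing non-ASCII characters."""
    bad = []
    for src, dst in zip(source_values, migrated_values):
        if any(ord(c) > 127 for c in src) and src != dst:
            bad.append((src, dst))
    return bad
```

The check is cheap because most records are pure ASCII; it only compares the minority of values where encoding bugs can actually show.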
The Phantom Records
A financial services migration where record counts didn't match after migration. Source system: 847,293 records. Target system: 851,107 records. We had more data after migration than before.
The source system had soft deletes. Records marked as deleted weren't visible in the application but were still in the database. The migration extracted everything from the database directly, including 3,814 "deleted" records that the application had hidden for years.
This is why migration testing can't just be record counts. You need to understand the application's data access patterns, not just the database schema.
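A reconciliation sketch of the same idea: count what the application can see, not just what the table holds. The soft-delete column name (`deleted_at`) and the numbers are invented for the example.

```python
# Reconciliation sketch: raw table count vs application-visible count.
# The soft-delete column name is illustrative.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER, deleted_at TEXT)")
db.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(1, None), (2, None), (3, "2018-06-01"), (4, "2020-02-14")],
)

raw_count = db.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
# The application's view: soft-deleted rows are invisible to users, so
# decide explicitly whether they migrate, and count them separately.
visible_count = db.execute(
    "SELECT COUNT(*) FROM accounts WHERE deleted_at IS NULL"
).fetchone()[0]
```

If `raw_count` and `visible_count` differ, that gap is a decision to make before migration, not a surprise to explain after it.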
The Weekend That Wasn't
A retail client scheduled a big-bang migration for a long weekend. The plan allocated 48 hours for migration and verification. Everything was tested. The rehearsal migration completed in 18 hours.
Production migration hit a table with 40 million transaction records that performed differently at scale than in testing (the test environment had 2 million records). The migration was still running at hour 36. Verification hadn't started. Go-live was in 12 hours.
We rolled back. The business operated on the old system for another six weeks while we optimised the migration pipeline for the actual data volume. The lesson: test with production-scale data, not a representative sample.
Practical Advice
If you're facing a legacy migration, here is what I'd tell you.
Start the assessment early. Months before the migration date. Profile the source data. Document every table, every field, every relationship. Find the tribal knowledge and write it down. The assessment will reveal complexity you didn't budget for. Better to know now.
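Profiling doesn't need heavy tooling to start. A first pass, sketched here over rows as dicts (column names invented), just asks each column two questions: how many nulls, how many distinct values. A "boolean" column with five distinct values, or a mandatory field that is 30% null, is exactly the complexity the assessment exists to surface.

```python
# First-pass data profiling: null count and distinct count per column.
# Rows are dicts keyed by column name; names are illustrative.

def profile(rows):
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        report[col] = {
            "nulls": sum(v is None for v in values),
            "distinct": len({v for v in values if v is not None}),
        }
    return report

report = profile([
    {"id": 1, "region": "AKL"},
    {"id": 2, "region": None},
    {"id": 3, "region": "AKL"},
])
```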
Automate everything you can. Manual migration steps are error-prone and can't be rehearsed reliably. Every step that a human performs is a step that can go wrong differently in production than it did in rehearsal.
Rehearse the full migration at least twice. Not a subset. The full dataset, in an environment that mirrors production as closely as possible. Time it. Log every error. Fix what breaks. Rehearse again.
Plan your communication. Stakeholders need to know when systems will be unavailable, what the fallback is, and who to contact if something looks wrong. The technical migration might succeed perfectly while the organisational transition fails because nobody told the Auckland office they needed to log in differently on Monday.
Budget for the unexpected. Whatever your migration estimate is, add 50%. If you finish early, celebrate. If you don't, you'll be glad you planned for it.
Nobody gets excited about data migration. That's fine. The goal isn't excitement. The goal is moving a decade of business-critical data from one system to another without losing anything, breaking anything, or making anyone's Monday worse than it needs to be.
