"We need to fine-tune a model on our data" is one of the most common (and most often unnecessary) requests in enterprise AI. Here's what fine-tuning actually is, and why RAG is usually the better answer.
The Definition
Fine-tuning is the process of taking a pre-trained language model and training it further on a specific dataset to improve its performance on particular tasks. You're not building a model from scratch. You're teaching an existing model your organisation's language, patterns, and domain expertise.
Think of it as hiring a brilliant generalist and then training them specifically in your industry. The generalist capability remains; the domain-specific performance improves significantly.
How It Works
- Start with a base model - GPT-4, Claude, Llama, etc.
- Prepare training data - hundreds to thousands of examples of inputs and desired outputs from your domain
- Train - the model's parameters are adjusted to perform better on your specific task
- Evaluate - test the fine-tuned model against held-out examples
- Deploy - use the fine-tuned model for your specific application
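Step 2 above (preparing training data) is where most of the practical work lives. As a minimal sketch, here's how domain examples might be written out in the chat-message JSONL layout used by several hosted fine-tuning APIs. The claims-triage examples and the system prompt are hypothetical, purely for illustration:

```python
import json

# Hypothetical domain examples: raw claim text -> desired structured output.
examples = [
    {"prompt": "Claim: rear-end collision, minor bumper damage.",
     "completion": "Category: auto; Severity: low; Action: fast-track."},
    {"prompt": "Claim: kitchen fire, structural damage to two rooms.",
     "completion": "Category: property; Severity: high; Action: adjuster visit."},
]

def to_chat_jsonl(examples, path):
    """Write examples as one JSON object per line, each holding a
    system/user/assistant message triple - the common chat-format
    layout for hosted fine-tuning services."""
    with open(path, "w") as f:
        for ex in examples:
            record = {"messages": [
                {"role": "system", "content": "You are a claims triage assistant."},
                {"role": "user", "content": ex["prompt"]},
                {"role": "assistant", "content": ex["completion"]},
            ]}
            f.write(json.dumps(record) + "\n")

to_chat_jsonl(examples, "train.jsonl")
```

Whatever provider you use, check its documentation for the exact file format and minimum example counts; the structure above is representative, not universal.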
The result is a model that's still generally capable but performs significantly better on your specific task: claims analysis, contract review, medical coding, whatever you trained it on.
Fine-Tuning vs RAG: The Key Decision
| | Fine-Tuning | RAG |
|---|---|---|
| What changes | The model itself | The context provided to the model |
| Cost | $10K-500K+ | $1K-50K |
| Data needs | Hundreds-thousands of examples | Any document collection |
| Update speed | Retrain required (days-weeks) | Update index (minutes-hours) |
| Data sovereignty | Data used in training | Data stays in your systems |
| Best for | Consistent behaviour patterns | Knowledge-grounded responses |
Use RAG when: you need the model to answer questions based on your documents, policies, or knowledge base. This covers 80%+ of enterprise use cases.
Use fine-tuning when: you need the model to consistently behave in a specific way (a particular tone, a specific output format, a domain-specific reasoning pattern) across thousands of interactions where providing examples every time isn't practical.
Use both when: you need domain-specific behaviour AND knowledge-grounded responses. Fine-tuned model + RAG pipeline is the most powerful (and most expensive) approach.
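The key distinction in the table above is *what changes*: RAG leaves the model untouched and changes only the context it sees. A toy sketch of that retrieval-then-prompt step, using naive word-overlap scoring in place of a real embedding index (the documents and query are made up for illustration):

```python
def retrieve(query, documents, k=2):
    """Score each document by word overlap with the query and return
    the top-k. Production RAG systems use embeddings and a vector
    index; this toy scorer only illustrates the retrieval step."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    """Assemble the context-stuffed prompt sent to an unchanged base
    model - no training, no parameter updates."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 14 days of approval.",
    "Our office is closed on public holidays.",
    "Refund requests require the original receipt.",
]
print(build_prompt("How long do refunds take", docs))
```

Updating the system means updating `docs` (in practice, re-indexing documents), which is why the table shows RAG refreshes in minutes to hours rather than retraining in days to weeks.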
When Enterprise Should Consider Fine-Tuning
- High-volume, pattern-consistent tasks - processing thousands of similar documents daily with the same output structure
- Domain-specific language - your industry uses terminology that general models consistently misinterpret
- Consistent tone/style requirements - customer communications that must match your brand voice across thousands of interactions
- Cost optimisation - a smaller fine-tuned model can be cheaper at high volume than a larger general model with extensive prompting
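The cost-optimisation point in the last bullet can be made concrete with back-of-envelope arithmetic. All prices and token counts below are hypothetical placeholders; substitute your provider's real rates:

```python
# HYPOTHETICAL rates and token counts -- replace with your provider's
# actual pricing. The point: at high volume, a small fine-tuned model
# with a short prompt can undercut a large general model that needs
# lengthy few-shot prompting on every request.

requests_per_day = 50_000

# Large general model: ~2,000 prompt tokens of instructions and
# examples per request, plus ~300 output tokens.
large_price_per_1k = 0.01            # hypothetical $/1K tokens
large_tokens = 2_000 + 300

# Small fine-tuned model: behaviour is baked in, so the prompt shrinks.
small_price_per_1k = 0.002           # hypothetical $/1K tokens
small_tokens = 200 + 300

large_daily = requests_per_day * large_tokens / 1000 * large_price_per_1k
small_daily = requests_per_day * small_tokens / 1000 * small_price_per_1k

print(f"General model:  ${large_daily:,.0f}/day")
print(f"Fine-tuned:     ${small_daily:,.0f}/day")
```

With these illustrative numbers the gap is dominated by prompt length as much as by price per token, which is exactly the lever fine-tuning pulls at high volume.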
When Enterprise Should Avoid Fine-Tuning
- Your first AI capability - start with RAG, prove the value, then fine-tune if needed
- Rapidly changing knowledge - fine-tuning bakes in the knowledge at training time; RAG accesses current knowledge
- Small data volume - fine-tuning needs hundreds of quality examples; without them, it can make the model worse
- Exploration phase - until you know exactly what you need, the flexibility of RAG + prompting is more valuable
The 90/10 Rule
In our experience, 90% of enterprise AI capabilities work well with RAG plus good prompt engineering. Fine-tuning is the other 10%: powerful when needed, but not the starting point. Build your AI foundation on RAG first, and fine-tune specific capabilities once you have the data and the need.
Common Questions
- How much data do we need for fine-tuning?
- Minimum: 200-500 high-quality input-output examples. For best results: 1,000-5,000+. Quality matters more than quantity. 300 carefully curated examples outperform 3,000 noisy ones. If you don't have this data yet, RAG is your better starting point.
- Can we fine-tune models like GPT-4?
- As of early 2024, GPT-3.5 supports fine-tuning via API. GPT-4 fine-tuning is in limited access. Open-source models (Llama 2, Mistral) can be fine-tuned freely and self-hosted for full data control. The fine-tuning space is evolving quickly.
