
What Is Fine-Tuning? (And When You Don't Need It)

Fine-tuning trains AI models on your data. It's powerful, expensive, and usually not the right first step. Here's when it actually matters, and when RAG or prompt engineering will get you further.
18 March 2024·5 min read
Mak Khan
Chief AI Officer
"We need to fine-tune a model on our data" is one of the most common (and most often unnecessary) requests in enterprise AI. Here's what fine-tuning actually is, and why RAG is usually the better answer.

The Definition

Fine-tuning is the process of taking a pre-trained language model and training it further on a curated dataset to improve its performance on a particular task or domain. You're not building a model from scratch. You're teaching an existing model your organisation's language, patterns, and domain expertise.
Think of it as hiring a brilliant generalist and then training them specifically in your industry. The generalist capability remains; the domain-specific performance improves significantly.

How It Works

  1. Start with a base model - GPT-4, Claude, Llama, etc.
  2. Prepare training data - hundreds to thousands of examples of inputs and desired outputs from your domain
  3. Train - the model's parameters are adjusted to perform better on your specific task
  4. Evaluate - test the fine-tuned model against held-out examples
  5. Deploy - use the fine-tuned model for your specific application
The result is a model that's still generally capable but performs significantly better on your specific task: claims analysis, contract review, medical coding, whatever you trained it on.
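Step 2, preparing the training data, is where most of the effort goes. A minimal sketch of what that data often looks like, assuming the common chat-style JSONL format (one example per line); the claims-triage examples here are invented for illustration:

```python
import json

# Hypothetical domain examples: claims-triage inputs and desired outputs.
examples = [
    {"prompt": "Claim: rear-end collision, airbag deployed, no injuries reported.",
     "completion": "Category: AUTO-COLLISION | Severity: MODERATE | Route: standard-adjuster"},
    {"prompt": "Claim: water damage from burst pipe in commercial property basement.",
     "completion": "Category: PROPERTY-WATER | Severity: HIGH | Route: commercial-specialist"},
]

# Chat-style fine-tuning data is typically one JSON record per line (JSONL).
with open("training_data.jsonl", "w") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "system", "content": "You are a claims triage assistant."},
                {"role": "user", "content": ex["prompt"]},
                {"role": "assistant", "content": ex["completion"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

In a real project you would have hundreds to thousands of these records, drawn from historical cases that your experts have reviewed, not two hand-written ones.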

Fine-Tuning vs RAG: The Key Decision

| | Fine-Tuning | RAG |
|---|---|---|
| What changes | The model itself | The context provided to the model |
| Cost | $10K-500K+ | $1K-50K |
| Data needs | Hundreds-thousands of examples | Any document collection |
| Update speed | Retrain required (days-weeks) | Update index (minutes-hours) |
| Data sovereignty | Data used in training | Data stays in your systems |
| Best for | Consistent behaviour patterns | Knowledge-grounded responses |
Use RAG when: you need the model to answer questions based on your documents, policies, or knowledge base. This covers 80%+ of enterprise use cases.
Use fine-tuning when: you need the model to consistently behave in a specific way (a particular tone, a specific output format, a domain-specific reasoning pattern) across thousands of interactions where providing examples every time isn't practical.
Use both when: you need domain-specific behaviour AND knowledge-grounded responses. Fine-tuned model + RAG pipeline is the most powerful (and most expensive) approach.
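To make the contrast concrete, here is a toy sketch of the RAG side of the table: the model itself is untouched, and only the context in the prompt changes. The documents and the word-overlap scoring are illustrative stand-ins; production systems use embedding-based retrieval over a vector index:

```python
def score(query: str, doc: str) -> int:
    # Toy relevance score: count of shared words (real systems use embeddings).
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Return the k most relevant documents for this query.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Ground the model's answer in retrieved context instead of retraining it.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Travel expenses above 500 GBP require director approval.",
    "Remote work requests are reviewed quarterly by HR.",
    "Laptops are refreshed every three years.",
]
print(build_prompt("Who approves travel expenses above 500 GBP?", docs))
```

Updating the system's knowledge here means editing `docs` (or re-indexing your document store), which is why the update speed row in the table favours RAG so heavily.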

When Enterprise Should Consider Fine-Tuning

  • High-volume, pattern-consistent tasks - processing thousands of similar documents daily with the same output structure
  • Domain-specific language - your industry uses terminology that general models consistently misinterpret
  • Consistent tone/style requirements - customer communications that must match your brand voice across thousands of interactions
  • Cost optimisation - a smaller fine-tuned model can be cheaper at high volume than a larger general model with extensive prompting
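The cost-optimisation point can be sanity-checked with back-of-envelope arithmetic: a fine-tuned model bakes in the instructions and examples that would otherwise be repeated in every prompt. All prices, token counts, and volumes below are illustrative assumptions, not vendor quotes:

```python
def monthly_cost(requests: int, prompt_tokens: int, output_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    # Total monthly spend given per-1K-token input and output prices.
    return requests * (prompt_tokens / 1000 * price_in_per_1k
                       + output_tokens / 1000 * price_out_per_1k)

REQUESTS = 500_000  # hypothetical high-volume monthly workload

# Large general model: long prompt carrying instructions plus few-shot examples.
general = monthly_cost(REQUESTS, prompt_tokens=2500, output_tokens=300,
                       price_in_per_1k=0.01, price_out_per_1k=0.03)

# Smaller fine-tuned model: behaviour is baked in, so the prompt shrinks.
fine_tuned = monthly_cost(REQUESTS, prompt_tokens=300, output_tokens=300,
                          price_in_per_1k=0.003, price_out_per_1k=0.006)

print(f"General model:    ${general:,.0f}/month")
print(f"Fine-tuned model: ${fine_tuned:,.0f}/month")
```

Under these made-up numbers the fine-tuned option is roughly an order of magnitude cheaper per month; whether the saving repays the up-front training cost depends entirely on your volume and prices, so run the calculation with your own figures.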

When Enterprise Should Avoid Fine-Tuning

  • Your first AI capability - start with RAG, prove the value, then fine-tune if needed
  • Rapidly changing knowledge - fine-tuning bakes in the knowledge at training time; RAG accesses current knowledge
  • Small data volume - fine-tuning needs hundreds of quality examples; without them, it can make the model worse
  • Exploration phase - until you know exactly what you need, the flexibility of RAG + prompting is more valuable

The 90/10 Rule

In our experience, 90% of enterprise AI capabilities work well with RAG plus good prompt engineering. Fine-tuning is the 10%: powerful when needed, but not the starting point. Build your AI foundation on RAG first, and fine-tune specific capabilities once you have the data and the need.
How much data do we need for fine-tuning?
Minimum: 200-500 high-quality input-output examples. For best results: 1,000-5,000+. Quality matters more than quantity. 300 carefully curated examples outperform 3,000 noisy ones. If you don't have this data yet, RAG is your better starting point.
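Before committing to a fine-tune, it's worth running basic sanity checks against these thresholds. A rough sketch, assuming a JSONL file of prompt/completion records (the file name, field names, and 200-example floor below mirror the guidance above and are illustrative):

```python
import json

def dataset_report(path: str) -> dict:
    # Summarise a JSONL training set: size, duplicate prompts, minimum-count check.
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    prompts = [r["prompt"] for r in records]
    return {
        "count": len(records),
        "duplicates": len(prompts) - len(set(prompts)),  # noisy data warning sign
        "meets_minimum": len(records) >= 200,  # floor from the guidance above
    }

# Demo with a throwaway two-record file containing one duplicate prompt.
with open("dataset.jsonl", "w") as f:
    f.write('{"prompt": "a", "completion": "x"}\n')
    f.write('{"prompt": "a", "completion": "y"}\n')
print(dataset_report("dataset.jsonl"))
```

A report like this won't measure quality directly, but a high duplicate count or a total below the minimum is usually enough to send a project back to RAG while the dataset is built up.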
Can we fine-tune models like GPT-4?
As of early 2024, GPT-3.5 supports fine-tuning via API. GPT-4 fine-tuning is in limited access. Open-source models (Llama 2, Mistral) can be fine-tuned freely and self-hosted for full data control. The fine-tuning space is evolving quickly.