"We need to fine-tune a model on our data" is one of the most common (and most often unnecessary) requests in enterprise AI. Here's what fine-tuning actually is, and why RAG is usually the better answer.
The Definition
Fine-tuning is the process of taking a pre-trained language model and training it further on a specific dataset to improve its performance on particular tasks. You're not building a model from scratch. You're teaching an existing model your organisation's language, patterns, and domain expertise.
Think of it as hiring a brilliant generalist and then training them specifically in your industry. The generalist capability remains; the domain-specific performance improves significantly.
How It Works
- Start with a base model - GPT-4, Claude, Llama, etc.
- Prepare training data - hundreds to thousands of examples of inputs and desired outputs from your domain
- Train - the model's parameters are adjusted to perform better on your specific task
- Evaluate - test the fine-tuned model against held-out examples
- Deploy - use the fine-tuned model for your specific application
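Step 2 above (preparing training data) is where most of the practical work lives. As a minimal sketch, here's how domain examples might be written out in the chat-message JSONL layout used by several hosted fine-tuning APIs. The claims-triage examples and the system prompt are hypothetical, purely for illustration:

```python
import json

# Hypothetical domain examples: raw claim text -> desired structured output.
examples = [
    {"prompt": "Claim: rear-end collision, minor bumper damage.",
     "completion": "Category: auto; Severity: low; Action: fast-track."},
    {"prompt": "Claim: kitchen fire, structural damage to two rooms.",
     "completion": "Category: property; Severity: high; Action: adjuster visit."},
]

def to_chat_jsonl(examples, path):
    """Write examples as one JSON object per line, each holding a
    system/user/assistant message triple - the common chat-format
    layout for hosted fine-tuning services."""
    with open(path, "w") as f:
        for ex in examples:
            record = {"messages": [
                {"role": "system", "content": "You are a claims triage assistant."},
                {"role": "user", "content": ex["prompt"]},
                {"role": "assistant", "content": ex["completion"]},
            ]}
            f.write(json.dumps(record) + "\n")

to_chat_jsonl(examples, "train.jsonl")
```

Whatever provider you use, check its documentation for the exact file format and minimum example counts; the structure above is representative, not universal.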
The result is a model that's still generally capable but performs significantly better on your specific task: claims analysis, contract review, medical coding, whatever you trained it on.
Fine-Tuning vs RAG: The Key Decision
| | Fine-Tuning | RAG |
|---|---|---|
| What changes | The model itself | The context provided to the model |
| Cost | $10K-500K+ | $1K-50K |
| Data needs | Hundreds-thousands of examples | Any document collection |
| Update speed | Retrain required (days-weeks) | Update index (minutes-hours) |
| Data sovereignty | Data used in training | Data stays in your systems |
| Best for | Consistent behaviour patterns | Knowledge-grounded responses |
Use RAG when: you need the model to answer questions based on your documents, policies, or knowledge base. This covers 80%+ of enterprise use cases.
Use fine-tuning when: you need the model to consistently behave in a specific way (a particular tone, a specific output format, a domain-specific reasoning pattern) across thousands of interactions where providing examples every time isn't practical.
Use both when: you need domain-specific behaviour AND knowledge-grounded responses. Fine-tuned model + RAG pipeline is the most powerful (and most expensive) approach.
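The key distinction in the table above is *what changes*: RAG leaves the model untouched and changes only the context it sees. A toy sketch of that retrieval-then-prompt step, using naive word-overlap scoring in place of a real embedding index (the documents and query are made up for illustration):

```python
def retrieve(query, documents, k=2):
    """Score each document by word overlap with the query and return
    the top-k. Production RAG systems use embeddings and a vector
    index; this toy scorer only illustrates the retrieval step."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    """Assemble the context-stuffed prompt sent to an unchanged base
    model - no training, no parameter updates."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 14 days of approval.",
    "Our office is closed on public holidays.",
    "Refund requests require the original receipt.",
]
print(build_prompt("How long do refunds take", docs))
```

Updating the system means updating `docs` (in practice, re-indexing documents), which is why the table shows RAG refreshes in minutes to hours rather than retraining in days to weeks.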
When Enterprise Should Consider Fine-Tuning
- High-volume, pattern-consistent tasks - processing thousands of similar documents daily with the same output structure
- Domain-specific language - your industry uses terminology that general models consistently misinterpret
- Consistent tone/style requirements - customer communications that must match your brand voice across thousands of interactions
- Cost optimisation - a smaller fine-tuned model can be cheaper at high volume than a larger general model with extensive prompting
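The cost-optimisation point in the last bullet can be made concrete with back-of-envelope arithmetic. All prices and token counts below are hypothetical placeholders; substitute your provider's real rates:

```python
# HYPOTHETICAL rates and token counts -- replace with your provider's
# actual pricing. The point: at high volume, a small fine-tuned model
# with a short prompt can undercut a large general model that needs
# lengthy few-shot prompting on every request.

requests_per_day = 50_000

# Large general model: ~2,000 prompt tokens of instructions and
# examples per request, plus ~300 output tokens.
large_price_per_1k = 0.01            # hypothetical $/1K tokens
large_tokens = 2_000 + 300

# Small fine-tuned model: behaviour is baked in, so the prompt shrinks.
small_price_per_1k = 0.002           # hypothetical $/1K tokens
small_tokens = 200 + 300

large_daily = requests_per_day * large_tokens / 1000 * large_price_per_1k
small_daily = requests_per_day * small_tokens / 1000 * small_price_per_1k

print(f"General model:  ${large_daily:,.0f}/day")
print(f"Fine-tuned:     ${small_daily:,.0f}/day")
```

With these illustrative numbers the gap is dominated by prompt length as much as by price per token, which is exactly the lever fine-tuning pulls at high volume.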
When Enterprise Should Avoid Fine-Tuning
- Your first AI capability - start with RAG, prove the value, then fine-tune if needed
- Rapidly changing knowledge - fine-tuning bakes in the knowledge at training time; RAG accesses current knowledge
- Small data volume - fine-tuning needs hundreds of quality examples; without them, it can make the model worse
- Exploration phase - until you know exactly what you need, the flexibility of RAG + prompting is more valuable
The 90/10 Rule
In our experience, 90% of enterprise AI capabilities work well with RAG plus good prompt engineering. Fine-tuning is the other 10%: powerful when needed, but not the starting point. Build your AI foundation on RAG first, and fine-tune specific capabilities once you have the data and the need.
Common Questions
- How much data do we need for fine-tuning?
- Minimum: 200-500 high-quality input-output examples. For best results: 1,000-5,000+. Quality matters more than quantity. 300 carefully curated examples outperform 3,000 noisy ones. If you don't have this data yet, RAG is your better starting point.
- Can we fine-tune models like GPT-4?
- As of early 2024, GPT-3.5 supports fine-tuning via API. GPT-4 fine-tuning is in limited access. Open-source models (Llama 2, Mistral) can be fine-tuned freely and self-hosted for full data control. The fine-tuning space is evolving quickly.
