Fine-Tuning Cheatsheet
Comprehensive quick reference for fine-tuning LLMs with state-of-the-art techniques, parameters, and best practices for 2025.
Data Requirements
Dataset Size Guidelines
- • Minimum: 100-1000 examples
- • Optimal: 1000-10000 examples
- • Quality > Quantity: 50 high-quality examples can outperform 1000s
- • NLFT breakthrough: 219% improvement with 50 examples
Data Quality Checklist
- • Diverse, representative samples
- • Consistent formatting
- • Proper deduplication
- • Domain-specific relevance
- • Error-free labeling
Data Preprocessing
- • Tokenization normalization
- • Quality filtering pipeline
- • Prompt structure consistency
- • Train/validation split (80/20)
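A minimal sketch of the 80/20 split above, assuming the Hugging Face datasets library and a placeholder JSON file name:
from datasets import load_dataset

# Load a local JSON dataset (file name is a placeholder)
dataset = load_dataset("json", data_files="your_dataset.json", split="train")
splits = dataset.train_test_split(test_size=0.2, seed=42)  # 80% train / 20% validation
train_ds, val_ds = splits["train"], splits["test"]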
Key Parameters
Core Training Parameters
- • Learning Rate: 1e-5 to 5e-4
- • Batch Size: 1-16 (memory dependent)
- • Epochs: 1-5 (avoid overfitting)
- • Warmup Steps: 10% of total steps
- • Weight Decay: 0.01-0.1
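The ranges above map onto Hugging Face TrainingArguments roughly as follows; this is a sketch with mid-range placeholder values, not tuned settings:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    learning_rate=2e-4,                 # within the 1e-5 to 5e-4 range (upper end suits LoRA)
    per_device_train_batch_size=4,      # 1-16 depending on available memory
    num_train_epochs=3,                 # 1-5; fewer epochs reduce overfitting risk
    warmup_ratio=0.1,                   # ~10% of total steps used as warmup
    weight_decay=0.01,                  # 0.01-0.1
    lr_scheduler_type="cosine",         # cosine annealing, as recommended under Best Practices
)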
LoRA Parameters
- • LoRA Rank (r): 8-64
- • LoRA Alpha: 16-32
- • LoRA Dropout: 0.1
- • Target Modules: q_proj, v_proj, k_proj, o_proj
- • Use RSLoRA: True (better stability)
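These settings expressed as a PEFT LoraConfig, as a sketch; use_rslora requires a recent PEFT release and enables rank-stabilized scaling (alpha / sqrt(r)):
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                               # rank, typically 8-64
    lora_alpha=32,                      # scaling factor, typically 16-32
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_rslora=True,                    # rank-stabilized LoRA for better stability
    task_type="CAUSAL_LM",
)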
QLoRA Quantization
- • 4-bit NormalFloat (NF4): Information-theoretically optimal for normally distributed weights
- • Double Quantization: True
- • Compute Type: bfloat16
- • Paged Optimizers: AdamW 8-bit
Memory Optimization
Memory Efficiency Techniques
- • Gradient Checkpointing: Enable
- • Mixed Precision (FP16): Enable
- • QLoRA 4-bit: Up to 75% memory reduction
- • Gradient Accumulation: 4-8 steps
- • DeepSpeed ZeRO: Stage 2 or 3
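A sketch of the switches above expressed as TrainingArguments; the values are illustrative, and the paged 8-bit optimizer corresponds to the QLoRA list earlier:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    gradient_checkpointing=True,        # trade extra compute for activation memory
    fp16=True,                          # mixed precision (use bf16=True on Ampere or newer GPUs)
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,      # effective batch size = 2 x 8 = 16
    optim="paged_adamw_8bit",           # paged 8-bit AdamW for QLoRA-style setups
)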
Hardware Requirements
- • 7B Model (QLoRA): 12-16GB VRAM
- • 13B Model (QLoRA): 24GB VRAM
- • 70B Model (QLoRA): 48GB VRAM
- • Full Fine-tuning 7B: 60-80GB VRAM
Cost Optimization
- • LoRA: ~$13 vs. ~$322 for full fine-tuning
- • Train only 0.19-1.16% of parameters
- • Use model merging for combining capabilities (see the adapter-merge sketch after this list)
- • Spot instances for non-critical training
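One minimal form of merging, sketched below with placeholder paths, is folding a trained LoRA adapter back into the base weights with PEFT so the merged model can be served without adapter overhead; combining several specialized models (e.g. with mergekit) is a separate workflow.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "./lora-adapter")   # adapter path is a placeholder
merged = model.merge_and_unload()                           # returns a plain transformers model
merged.save_pretrained("./merged-model")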
Best Practices
Training Strategy
- • Start with a strong base model (Llama 2/3, Mistral)
- • Use validation sets to monitor overfitting
- • Implement early stopping (patience: 2-3; see the sketch after this list)
- • Save checkpoints regularly
- • Monitor loss curves and metrics continuously
- • Use learning rate scheduling (cosine annealing)
- • Preprocess data consistently
- • Test on held-out data
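A sketch of the early-stopping and cosine-scheduling items above using the Trainer API; model, train_ds, and val_ds are assumed from the earlier sketches:
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="./checkpoints",
    evaluation_strategy="epoch",        # evaluate once per epoch (eval_strategy in newer releases)
    save_strategy="epoch",
    load_best_model_at_end=True,        # roll back to the best checkpoint
    metric_for_best_model="eval_loss",
    lr_scheduler_type="cosine",
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)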
Production Ready
- • Version control datasets and models
- • Document hyperparameters and results
- • Use experiment tracking (MLflow, W&B; sketch after this list)
- • Implement proper evaluation metrics
- • Test edge cases and failure modes
- • Set up monitoring for model drift
- • Plan rollback strategies
- • Secure sensitive training data
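A minimal experiment-tracking sketch with MLflow; the run name, parameters, and metric values are placeholders for whatever your training loop produces:
import mlflow

with mlflow.start_run(run_name="llama2-lora-v1"):
    mlflow.log_params({"learning_rate": 2e-4, "lora_r": 16, "epochs": 3})
    mlflow.log_metric("eval_loss", 1.23)
    mlflow.log_artifact("axolotl.yml")   # keep the exact config alongside the run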
Common Pitfalls
Training Issues
- • Setting the learning rate too high (causes instability)
- • Training for too many epochs (overfitting)
- • Insufficient or biased training data
- • Ignoring data preprocessing quality
- • Not monitoring for overfitting
- • Inconsistent evaluation metrics
- • Forgetting to set random seeds (quick fix sketched after this list)
- • Not testing edge cases
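For the seeding pitfall above, transformers provides a one-line fix that seeds Python, NumPy, and PyTorch together:
from transformers import set_seed

set_seed(42)   # make training runs reproducible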
Production Failures
- • Catastrophic forgetting (losing base capabilities)
- • Model drift without monitoring
- • Security vulnerabilities (data poisoning)
- • Inadequate failure recovery plans
- • Poor documentation of changes
- • Missing evaluation on diverse test sets
- • Ignoring bias amplification
- • No performance degradation tracking
Framework Quick Commands
Hugging Face PEFT + LoRA
from peft import LoraConfig, get_peft_model

# base_model: a causal LM already loaded, e.g. with AutoModelForCausalLM
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(base_model, lora_config)
QLoRA with BitsAndBytes
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
# model_id: the Hub ID or local path of the base model
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config
)
Axolotl Configuration
# axolotl.yml
base_model: meta-llama/Llama-2-7b-hf
model_type: LlamaForCausalLM
adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.1
load_in_4bit: true
strict: false
datasets:
  - path: your_dataset.json
    type: alpaca
Unsloth (2-5x Faster)
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-2-7b-bnb-4bit",
    max_seq_length=2048,
    dtype=None,          # auto-detects bfloat16/float16 for the GPU
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj"]
)
Quick Decision Tree
When to Fine-Tune
- • Domain-specific knowledge needed
- • Consistent tone/style required
- • High-quality labeled data available
- • Complex reasoning improvements needed
- • Cost-effective vs API calls
Consider Alternatives
- • RAG for knowledge integration
- • Prompt engineering for behavior
- • Few-shot learning for examples
- • Tool use for external capabilities
- • Ensemble methods for robustness
Avoid Fine-Tuning If
- • Data quality is poor
- • Dataset is too small (<100 examples)
- • Base model already performs well
- • Requirements change frequently
- • Limited computational resources
Monitoring & Evaluation
Key Metrics to Track
- • Training Loss: Should decrease consistently
- • Validation Loss: Watch for overfitting gap
- • Perplexity: Lower is better for language tasks (the exponential of the eval loss; see the sketch after this list)
- • BLEU/ROUGE: For generation quality
- • Accuracy: For classification tasks
- • F1 Score: For balanced evaluation
- • Inference Speed: Latency requirements
- • Memory Usage: Resource constraints
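Perplexity is the exponential of the mean cross-entropy loss, so it can be read straight off the Trainer's evaluation output; a sketch assuming the trainer from the earlier sketches:
import math

eval_metrics = trainer.evaluate()                 # returns a dict including "eval_loss"
perplexity = math.exp(eval_metrics["eval_loss"])
print(f"perplexity: {perplexity:.2f}")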
Production Monitoring
- • Model Drift: Performance degradation over time
- • Data Drift: Input distribution changes
- • Bias Detection: Fairness across groups
- • Error Analysis: Failure pattern tracking
- • Resource Usage: GPU/CPU/memory utilization
- • Response Quality: Human evaluation scores
- • User Feedback: Satisfaction metrics
- • Security Alerts: Adversarial inputs
Key Research Sources
• arXiv:2408.13296: "The Ultimate Guide to Fine-Tuning LLMs"
• arXiv:2305.14314: "QLoRA: Efficient Finetuning of Quantized LLMs"
• Hugging Face PEFT Documentation: Parameter-efficient methods
• LoRA Paper: Low-Rank Adaptation of Large Language Models
• DeepSpeed Documentation: Memory optimization techniques
• MLflow Tutorials: Experiment tracking and model management
• Axolotl Framework: Production fine-tuning workflows
• Unsloth Research: 2-5x faster training optimizations