
Fine-Tuning Cheatsheet

Comprehensive quick reference for fine-tuning LLMs with state-of-the-art techniques, parameters, and best practices for 2025.

Data Requirements

Dataset Size Guidelines

  • Minimum: 100-1000 examples
  • Optimal: 1000-10000 examples
  • Quality > Quantity: 50 high-quality examples can outperform thousands
  • NLFT (Natural Language Fine-Tuning): a reported 219% improvement from only 50 examples

Data Quality Checklist

  • Diverse, representative samples
  • Consistent formatting
  • Proper deduplication
  • Domain-specific relevance
  • Error-free labeling

Data Preprocessing

  • Tokenization normalization
  • Quality filtering pipeline
  • Prompt structure consistency
  • Train/validation split (80/20; see the sketch below)
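
A minimal preprocessing sketch, assuming a JSON file loadable with the Hugging Face datasets library and a "text" column (the file name and column are illustrative):

from datasets import load_dataset

dataset = load_dataset("json", data_files="your_dataset.json")["train"]

# Exact-match deduplication on the text field.
seen = set()
dataset = dataset.filter(
    lambda ex: not (ex["text"] in seen or seen.add(ex["text"]))
)

# Simple quality filter: drop very short examples.
dataset = dataset.filter(lambda ex: len(ex["text"].split()) >= 10)

# 80/20 train/validation split with a fixed seed for reproducibility.
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, val_ds = splits["train"], splits["test"]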

Key Parameters

Core Training Parameters

  • Learning Rate: 1e-5 to 5e-4
  • Batch Size: 1-16 (memory dependent)
  • Epochs: 1-5 (avoid overfitting)
  • Warmup Steps: 10% of total steps
  • Weight Decay: 0.01-0.1
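
A minimal sketch of these parameters with transformers' TrainingArguments (the concrete values are illustrative starting points within the ranges above):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    learning_rate=2e-4,             # within 1e-5 to 5e-4
    per_device_train_batch_size=4,  # 1-16, memory dependent
    num_train_epochs=3,             # 1-5; more risks overfitting
    warmup_ratio=0.1,               # ~10% of total steps
    weight_decay=0.01,              # 0.01-0.1
)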

LoRA Parameters

  • LoRA Rank (r): 8-64
  • LoRA Alpha: 16-32
  • LoRA Dropout: 0.1
  • Target Modules: q_proj, v_proj, k_proj, o_proj
  • Use RSLoRA: True (better stability)
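
The same settings as a PEFT LoraConfig sketch (use_rslora is available in recent PEFT releases; the values are illustrative picks from the ranges above):

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                  # rank, typically 8-64
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_rslora=True,       # rank-stabilized LoRA scaling
    task_type="CAUSAL_LM",
)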

QLoRA Quantization

  • 4-bit NormalFloat (NF4): Optimal for normal weights
  • Double Quantization: True
  • Compute Type: bfloat16
  • Paged Optimizers: AdamW 8-bit
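
The NF4 loading config itself appears under Framework Quick Commands below; the paged 8-bit optimizer is selected through TrainingArguments (a sketch; requires bitsandbytes):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    optim="paged_adamw_8bit",  # paged AdamW with 8-bit optimizer states
    bf16=True,                 # bfloat16 compute
)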

Memory Optimization

Memory Efficiency Techniques

  • Gradient Checkpointing: Enable
  • Mixed Precision (FP16): Enable
  • QLoRA 4-bit: Up to 75% memory reduction
  • Gradient Accumulation: 4-8 steps
  • DeepSpeed ZeRO: Stage 2 or 3
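
A sketch of these switches in TrainingArguments (the DeepSpeed config filename is a placeholder):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    gradient_checkpointing=True,       # trade recompute time for activation memory
    fp16=True,                         # mixed precision (use bf16=True on Ampere+)
    gradient_accumulation_steps=4,     # effective batch = per-device batch x 4
    deepspeed="ds_zero2_config.json",  # placeholder path to a ZeRO-2/3 config
)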

Hardware Requirements

  • 7B Model (QLoRA): 12-16GB VRAM
  • 13B Model (QLoRA): 24GB VRAM
  • 70B Model (QLoRA): 48GB VRAM
  • Full Fine-tuning 7B: 60-80GB VRAM

Cost Optimization

  • LoRA: ~$13 vs ~$322 for full fine-tuning
  • Train only 0.19-1.16% of parameters
  • Use model merging to combine capabilities (see the sketch below)
  • Spot instances for non-critical training
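
A sketch of combining two trained LoRA adapters with PEFT's weighted-adapter API (adapter paths, names, and weights are placeholders; base_model is a loaded transformers causal LM):

from peft import PeftModel

model = PeftModel.from_pretrained(base_model, "path/to/adapter-a", adapter_name="task_a")
model.load_adapter("path/to/adapter-b", adapter_name="task_b")

# Weighted linear combination of the two adapters into a new one.
model.add_weighted_adapter(
    adapters=["task_a", "task_b"],
    weights=[0.7, 0.3],
    adapter_name="merged",
    combination_type="linear",
)
model.set_adapter("merged")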

Best Practices

Training Strategy

  • Start with a strong base model (Llama 2/3, Mistral)
  • Use validation sets to monitor overfitting
  • Implement early stopping (patience: 2-3; see the sketch below)
  • Save checkpoints regularly
  • Monitor loss curves and metrics continuously
  • Use learning rate scheduling (cosine annealing)
  • Preprocess data consistently
  • Test on held-out data
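
Early stopping and cosine scheduling with the Trainer API (a sketch; model and datasets are assumed to be defined as in the earlier examples):

from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="./checkpoints",
    lr_scheduler_type="cosine",        # cosine annealing
    eval_strategy="steps",             # "evaluation_strategy" in older transformers
    eval_steps=100,
    save_steps=100,
    load_best_model_at_end=True,       # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()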

Production Ready

  • Version control datasets and models
  • Document hyperparameters and results
  • Use experiment tracking (MLflow, W&B; see the sketch below)
  • Implement proper evaluation metrics
  • Test edge cases and failure modes
  • Set up monitoring for model drift
  • Plan rollback strategies
  • Secure sensitive training data
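
Hooking experiment tracking into TrainingArguments (a sketch; the run name is illustrative):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    report_to=["wandb"],            # or ["mlflow"], ["tensorboard"]
    run_name="llama2-7b-qlora-v1",  # illustrative run name
)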

Common Pitfalls

Training Issues

  • Using too high a learning rate (causes instability)
  • Training for too many epochs (overfitting)
  • Insufficient or biased training data
  • Ignoring data preprocessing quality
  • Not monitoring for overfitting
  • Inconsistent evaluation metrics
  • Forgetting to set random seeds (see below)
  • Not testing edge cases
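
Seeding is a one-liner with the transformers helper, which seeds Python, NumPy, and PyTorch together:

from transformers import set_seed

set_seed(42)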

Production Failures

  • Catastrophic forgetting (losing base capabilities)
  • Model drift without monitoring
  • Security vulnerabilities (data poisoning)
  • Inadequate failure recovery plans
  • Poor documentation of changes
  • Missing evaluation on diverse test sets
  • Ignoring bias amplification
  • No performance degradation tracking

Framework Quick Commands

Hugging Face PEFT + LoRA

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

# base_model is a causal LM already loaded with transformers
model = get_peft_model(base_model, lora_config)
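
After wrapping, PEFT can report how small the trainable fraction actually is (the printed numbers are illustrative, roughly matching Llama-2-7B with r=16 on q_proj and v_proj):

model.print_trainable_parameters()
# e.g. trainable params: 8,388,608 || all params: 6,746,804,224 || trainable%: 0.1243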

QLoRA with BitsAndBytes

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

# model_id: Hub id of the base model, e.g. "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config
)
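
To train LoRA adapters on top of this 4-bit model (the QLoRA recipe), PEFT provides a preparation helper (a sketch; lora_config is the one from the PEFT example above):

from peft import get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)  # enables gradients where needed
model = get_peft_model(model, lora_config)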

Axolotl Configuration

# axolotl.yml
base_model: meta-llama/Llama-2-7b-hf
model_type: LlamaForCausalLM

adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.1

load_in_4bit: true
strict: false

datasets:
  - path: your_dataset.json
    type: alpaca

Unsloth (2-5x Faster)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-2-7b-bnb-4bit",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj"]
)
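
Unsloth's examples typically pair this with trl's SFTTrainer; a sketch under those assumptions (argument names vary across trl versions, and the dataset is assumed to have a "text" column):

from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_ds,
    dataset_text_field="text",  # column holding the formatted prompts
    max_seq_length=2048,
    args=TrainingArguments(output_dir="./checkpoints", num_train_epochs=1),
)
trainer.train()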

Quick Decision Tree

When to Fine-Tune

  • Domain-specific knowledge needed
  • Consistent tone/style required
  • High-quality labeled data available
  • Complex reasoning improvements needed
  • Cost-effective vs API calls

Consider Alternatives

  • RAG for knowledge integration
  • Prompt engineering for behavior
  • Few-shot learning for examples
  • Tool use for external capabilities
  • Ensemble methods for robustness

Avoid Fine-Tuning If

  • Data quality is poor
  • Dataset is too small (<100 examples)
  • Base model already performs well
  • Requirements change frequently
  • Limited computational resources

Monitoring & Evaluation

Key Metrics to Track

  • Training Loss: Should decrease consistently
  • Validation Loss: Watch for overfitting gap
  • Perplexity: Lower is better for language tasks (derived below)
  • BLEU/ROUGE: For generation quality
  • Accuracy: For classification tasks
  • F1 Score: For balanced evaluation
  • Inference Speed: Latency requirements
  • Memory Usage: Resource constraints
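
Perplexity can be derived directly from the cross-entropy validation loss reported by Trainer:

import math

eval_metrics = trainer.evaluate()
perplexity = math.exp(eval_metrics["eval_loss"])
print(f"perplexity: {perplexity:.2f}")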

Production Monitoring

  • Model Drift: Performance degradation over time
  • Data Drift: Input distribution changes (see the sketch below)
  • Bias Detection: Fairness across groups
  • Error Analysis: Failure pattern tracking
  • Resource Usage: GPU/CPU/memory utilization
  • Response Quality: Human evaluation scores
  • User Feedback: Satisfaction metrics
  • Security Alerts: Adversarial inputs
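
A minimal data-drift sketch, assuming you log a numeric feature of incoming requests (prompt length here, with hypothetical sample values) and compare it against a reference window with a two-sample KS test:

from scipy.stats import ks_2samp

# Hypothetical prompt-length samples: training-time reference vs. live traffic.
reference_lengths = [120, 95, 240, 180, 60, 150, 210, 90]
production_lengths = [310, 280, 450, 390, 260, 500, 420, 330]

statistic, p_value = ks_2samp(reference_lengths, production_lengths)
if p_value < 0.01:
    print("Possible input drift: prompt lengths differ from the reference window.")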

Key Research Sources

  • arXiv 2408.13296: "The Ultimate Guide to Fine-Tuning LLMs"
  • arXiv 2305.14314: "QLoRA: Efficient Finetuning of Quantized LLMs"
  • arXiv 2106.09685: "LoRA: Low-Rank Adaptation of Large Language Models"
  • Hugging Face PEFT documentation: parameter-efficient methods
  • DeepSpeed documentation: memory optimization techniques
  • MLflow tutorials: experiment tracking and model management
  • Axolotl framework: production fine-tuning workflows
  • Unsloth research: 2-5x faster training optimizations
