Fine-Tuning Guide
Choosing the Right Base Model
A comprehensive guide to selecting the right foundation model for your fine-tuning project, based on performance, licensing, hardware requirements, and use-case specifics.
Model Selection Decision Framework
Start Here: Define Requirements
- • Use Case: Chat, code, analysis, multimodal
- • Languages: English-only vs multilingual
- • Context Length: Short vs long documents
- • Latency: Real-time vs batch processing
- • Budget: Hardware and inference costs
Licensing Considerations
- • Commercial Use: Apache 2.0 > MIT > Custom
- • Enterprise: Check derivative work clauses
- • Attribution: Required for most licenses
- • Liability: No warranty in open source
- • Patents: Apache 2.0 provides protection
Hardware Constraints
- • 7B Models: 14-16GB VRAM (consumer)
- • 13B Models: 26-30GB VRAM (prosumer)
- • 30B+ Models: 60GB+ VRAM (enterprise)
- • 70B+ Models: Multiple GPUs required
- • Quantization: 50-75% memory reduction
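The VRAM figures above follow a simple rule of thumb: bytes per parameter times parameter count, plus some runtime overhead. A minimal sketch, assuming ~2 bytes/param at FP16, 1 at INT8, 0.5 at INT4, and an illustrative 10% overhead for activations and KV cache (real overhead varies with context length and batch size):

```python
# Rough VRAM estimate for loading a model at different precisions.
# The 1.1x overhead factor is an assumption; real usage depends on
# context length, batch size, and serving framework.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gb(params_billions: float, precision: str = "fp16",
                     overhead: float = 1.1) -> float:
    """Return an approximate VRAM requirement in GB for inference."""
    bytes_needed = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return round(bytes_needed * overhead / 1e9, 1)

for size in (7, 13, 70):
    print(size, {p: estimate_vram_gb(size, p) for p in BYTES_PER_PARAM})
```

This reproduces the tiers above: a 7B model lands at roughly 15 GB in FP16 (consumer), ~28 GB for 13B (prosumer), and INT4 quantization cuts memory by ~75% relative to FP16.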
🚀 2025 Breakthrough Models (Just Released)
DeepSeek V3.1
685B • MIT • Hybrid thinking mode beats GPT-5
OpenAI GPT-OSS
120B/20B • Apache 2.0 • OpenAI's first open models
IBM Granite 3.0
8B • Apache 2.0 • Enterprise-ready, 116 languages
Gemma 3 270M
270M • Edge AI • 0.75% battery usage
Qwen-Image-Edit
20B • Apache 2.0 • Advanced image editing with text rendering
OpenVLA
7B • MIT • Vision-language-action for robotics
Cisco Foundation-sec
8B • Apache 2.0 • First open cybersecurity LLM
YOLO v11
Variable • AGPL-3.0 • Latest object detection, 22% fewer params
Top Recommendations by Use Case
Chat & Conversation
- • Ultra-Budget: TinyLlama 1.1B, Gemma 3 270M
- • Budget: SmolLM3 3B, CroissantLLM 1.3B
- • Balanced: IBM Granite 3.0 8B, OpenAI GPT-OSS 20B
- • Premium: DeepSeek V3.1 (685B), Qwen 2.5-Max
Code Generation
- • Enterprise: IBM Granite 3.0 (116 languages)
- • Specialized: StarCoder 15B, DeepSeek Coder V2
- • Latest: OpenAI GPT-OSS 120B, DeepSeek V3.1
- • Edge: MobileLLM-R1 (math/coding on mobile)
Analysis & Reasoning
- • State-of-Art: DeepSeek V3.1 (hybrid thinking)
- • Compact: MobileLLM-R1 (2-5x performance boost)
- • Enterprise: Qwen 2.5-Max, IBM Granite 3.0
- • Agentic: ChatGLM-4.5 (task decomposition)
Enterprise Use
- • Latest Flagship: DeepSeek V3.1, Qwen 2.5-Max
- • Enterprise-Ready: IBM Granite 3.0 series
- • OpenAI Open: GPT-OSS 120B/20B (Apache 2.0)
- • Cost-Effective: ChatGLM-4.5 (cheaper than DeepSeek)
Multilingual
- • 46 Languages: BLOOM 176B (BigScience)
- • Chinese/English: Yi 1.5 34B, Baichuan 4
- • French/English: CroissantLLM (truly bilingual)
- • Japanese: Rakuten AI 2.0 (business-optimized)
Edge & Mobile
- • Ultra-Efficient: Gemma 3 270M (0.75% battery)
- • Reasoning: MobileLLM-R1 950M (2-5x boost)
- • Compact: TinyLlama 1.1B, CroissantLLM 1.3B
- • Quantized: GGUF format (Q2-Q8 levels)
Computer Vision
- • Object Detection: YOLO v11, YOLOv10, Grounding DINO
- • Segmentation: SAM 2 (44 FPS), TinySAM
- • Vision-Language: LLaVA 1.6, Florence-2, MiniGPT-4
- • Document AI: Granite-Docling-258M, PaddleOCR 3.0, TrOCR
Search & Retrieval
- • Image Embedding: CLIP-ViT-L/14, OpenCLIP, SigLIP 2
- • Text Retrieval: ColBERT-v2, E5-Large-v2, BGE-M3
- • Reranking: BGE Reranker v2-M3, Jina Reranker v2
- • Neural Search: OpenVision, all-MiniLM-L6-v2
Audio & Speech
- • Speech Recognition: Wav2Vec2, SpeechT5
- • Speaker Tasks: WavLM (verification, diarization)
- • Synthesis: SpeechT5 (unified speech-text)
- • Self-supervised: Wav2Vec2 (representation learning)
Domain Specialists
- • Finance: FinGPT 7B, BloombergGPT 50B
- • Medical: BioGPT, Palmyra-Med 70B, OpenBioLLM 70B
- • Legal: LawLLM 7B (US legal system)
- • Cybersecurity: Cisco Foundation-sec-8B, Trend Cybertron
Time Series & Forecasting
- • Foundation: Chronos-T5 (250x faster), TimesFM 200M
- • Best Performance: Moirai 2.0 (#1 GIFT-Eval)
- • Business: Prophet (seasonality), NeuralProphet
- • Zero-shot: TimesFM (100B time-points trained)
Tabular & Structured Data
- • Deep Learning: TabNet (attention-based)
- • Gradient Boosting: XGBoost, LightGBM
- • Competitions: XGBoost (proven winner)
- • Efficiency: LightGBM (fast training)
Specialized Applications
- • Creative: Qwen-Image-Edit, InstantID, MusicGen
- • Robotics: OpenVLA 7B, SmolVLA 450M
- • Scientific: UMA (Meta), ChemBERTa-2, BioGPT
- • Security: Cisco Foundation-sec, Trend Cybertron
Detailed Model Comparison
Model | Size | License | VRAM (FP16) | Strengths | Best For |
---|---|---|---|---|---|
Llama 3.3 70B | 70B | Custom (restrictive) | 140GB | Proven, multilingual, community | General purpose, enterprise |
Mistral Small 3.1 | 22B | Apache 2.0 | 44GB | Fast, commercial-friendly | Commercial deployment |
Qwen 2.5 72B | 72B | Apache 2.0 | 144GB | Data analysis, structured output | Enterprise data tasks |
Gemma 3 27B | 27B | Custom (restrictive) | 54GB | Efficient, Google ecosystem | Research, prototyping |
Phi-4 14B | 14B | MIT | 28GB | Strong reasoning, compact | Resource-constrained |
DeepSeek R1 | 671B | MIT | 1342GB+ | Advanced reasoning, coding | Research, complex tasks |
SmolLM3 3B | 3B | Apache 2.0 | 6GB | Multilingual, long context (64k) | Edge devices, mobile |
VibeVoice 1.5B | 1.5B | MIT (repo since disabled) | 4GB | Text-to-speech, 90min audio | Voice synthesis (research) |
Qwen2.5-VL 7B | 7B | Apache 2.0 | 14GB | Vision, OCR, video understanding | Multimodal applications |
ModernBERT | 139M/395M | Apache 2.0 | 1-2GB | Embeddings, 8k context | Text embeddings, RAG |
Nomic-Embed v2 | 100M | Apache 2.0 | 500MB | MoE embeddings, 100 languages | Multilingual embeddings |
FLUX.1 [dev] | 12B | Custom (non-commercial) | 24GB | Text-to-image, best quality | Image generation (research) |
FLUX.1 [schnell] | 12B | Apache 2.0 | 24GB | Fast text-to-image generation | Commercial image generation |
Stable Diffusion 3 | 2B/8B | Custom (restrictive) | 4-16GB | Text-to-image, established | Legacy image generation |
Whisper Large v3 | 1.55B | MIT | 3GB | Speech recognition, 99 languages | Speech-to-text applications |
Distil-Whisper v3 | 756M | MIT | 1.5GB | 6x faster, 49% smaller than Whisper | Real-time transcription |
OpenAI GPT-OSS 120B | 120B | Apache 2.0 | 240GB | OpenAI's first open-weight model, o4-mini level | General purpose, reasoning |
OpenAI GPT-OSS 20B | 20B | Apache 2.0 | 40GB | Compact version, o3-mini level performance | Edge deployment, reasoning |
Qwen3 235B-A22B | 235B | Apache 2.0 | 470GB | MoE, 119 languages, beats DeepSeek R1 | Multilingual, enterprise |
Qwen3 32B | 32B | Apache 2.0 | 64GB | Dense model, excellent multilingual | Production deployment |
OLMo 2 32B | 32B | Apache 2.0 | 64GB | Fully open, beats GPT-3.5 Turbo | Research, transparency |
NVIDIA Nemotron Nano 9B | 9B | Apache 2.0 | 18GB | Mamba-Transformer hybrid, 6x faster | Real-time reasoning |
Command R+ 104B | 104B | CC-BY-NC-4.0 | 208GB | RAG optimized, tool use, 10 languages | Enterprise RAG, agents |
MiniCPM-o 2.6 | 8B | Apache 2.0 | 16GB | Multimodal, beats GPT-4o on vision | Mobile multimodal |
OpenBioLLM 70B | 70B | Apache 2.0 | 140GB | Medical domain, beats Med-PaLM-2 | Healthcare, biomedical |
StarCoder 15B | 15B | OpenRAIL | 30GB | Code generation, 80+ languages | Code completion, development |
MusicGen | 3.3B | CC-BY-NC-4.0 | 7GB | Music generation from text prompts | Audio/music creation |
OpenSora 2.0 | Transformer | Apache 2.0 | Variable | Video generation, commercial quality | Video production |
DeepSeek V3.1 | 685B | MIT | 1370GB | Hybrid thinking mode, beats GPT-5 | Advanced reasoning, research |
Qwen 2.5-Max | ~70B | Apache 2.0 | 140GB | Alibaba's latest, beats DeepSeek V3 | Enterprise, multimodal |
IBM Granite 3.0 8B | 8B | Apache 2.0 | 16GB | Enterprise model, 116 programming languages | Enterprise workflows, tools |
Yi 1.5 34B | 34B | Apache 2.0 | 68GB | Bilingual (Chinese/English), reasoning | 01.AI flagship, bilingual |
Baichuan 4 | 13B | Apache 2.0 | 26GB | Chinese domain specialist (law, finance) | Chinese business applications |
ChatGLM-4.5 | ~13B | Apache 2.0 | 26GB | Agentic AI, cheaper than DeepSeek | Agent workflows, Chinese |
CroissantLLM | 1.3B | MIT | 3GB | Truly bilingual French-English | French language applications |
BLOOM | 176B | BigScience OpenRAIL-M | 352GB | 46 languages, 13 programming languages | Multilingual research |
Rakuten AI 2.0 | MoE | Apache 2.0 | Variable | Japanese-optimized, MoE architecture | Japanese business applications |
FinGPT | 7B | MIT | 14GB | Financial domain, sentiment analysis | Financial analysis, trading |
BloombergGPT | 50B | Research only | 100GB | Finance-specific training data | Financial NLP, research |
Palmyra-Med 70B | 70B | Commercial license | 140GB | Medical domain, beats Med-PaLM-2 | Healthcare applications |
LawLLM | 7B | Apache 2.0 | 14GB | US legal system specialist | Legal research, compliance |
Gemma 3 270M | 270M | Gemma License | 600MB | Ultra-efficient edge AI, 0.75% battery | Mobile, edge devices |
TinyLlama | 1.1B | Apache 2.0 | 2.2GB | Compact LLaMA architecture | Resource-constrained devices |
MobileLLM-R1 | 950M | Apache 2.0 | 2GB | Edge reasoning, 2-5x performance boost | Mobile reasoning, math |
Cisco Foundation-sec-8B | 8B | Apache 2.0 | 16GB | Security-focused, threat detection | Cybersecurity, SOC operations |
Trend Cybertron | 8B | Open Source | 16GB | Autonomous cybersecurity agents | Security automation, defense |
Qwen-Image-Edit | 20B | Apache 2.0 | 40GB | Precise image editing, text rendering | Image editing, visual design |
InstantID | Diffusion | Apache 2.0 | 8GB | Identity-preserving generation | Avatar creation, face swapping |
ControlNet | Various | Apache 2.0 | Variable | Controlled image generation | Guided image synthesis |
OpenVLA | 7B | MIT | 14GB | Vision-language-action for robots | Robotic manipulation |
SmolVLA | 450M | Apache 2.0 | 1GB | Compact robotics model | Lightweight robotics |
UMA (Meta) | Variable | Open Source | Variable | Universal atomic simulation, 10000x faster DFT | Materials science, chemistry |
ChemBERTa-2 | 110M | MIT | 500MB | Chemical foundation model, SMILES | Drug discovery, chemistry |
BioGPT | 355M | MIT | 1GB | Biomedical text generation, 78.2% PubMedQA | Biomedical research, literature |
IBM SMILES-TED | Transformer | Apache 2.0 | Variable | 91M SMILES samples, chemical synthesis | Materials discovery, green chemistry |
YOLO v11 | Varies (n,s,m,l,x) | AGPL-3.0 | Variable | Latest object detection, 22% fewer params | Real-time object detection |
YOLOv10 | Varies (n,s,m,l,x) | AGPL-3.0 | Variable | End-to-end detection, no NMS needed | Efficient object detection |
SAM 2 | Transformer | Apache 2.0 | Variable | Segment anything in images/videos, 44 FPS | Image/video segmentation |
Florence-2 | 230M/770M | MIT | 1-2GB | Lightweight VLM, captioning, detection | Vision-language tasks |
Grounding DINO | Transformer | Apache 2.0 | Variable | Open-set detection, 52.5 AP COCO zero-shot | Zero-shot object detection |
LLaVA 1.6 | 7B/13B/34B | Apache 2.0 | 14-68GB | Large language and vision assistant | Multimodal conversations |
MiniGPT-4 | 7B/13B | BSD 3-Clause | 14-26GB | Aligned vision encoder with LLM | Image understanding, creativity |
BLIP-2 | 2.7B/7.8B | BSD 3-Clause | 6-16GB | Q-Former bridging vision and language | Vision-language pre-training |
PaLI-3 | 5B | Apache 2.0 | 10GB | Multilingual vision-language, 100+ languages | Multilingual VL tasks |
PaddleOCR 3.0 | Various | Apache 2.0 | Variable | PP-OCRv5, 13-point accuracy gain | OCR, document parsing |
TrOCR | Transformer | MIT | Variable | End-to-end text recognition | Handwritten text OCR |
Donut | 200M | MIT | 1GB | OCR-free document understanding | Document AI, form parsing |
LayoutLMv3 | 134M | MIT | 500MB | Document understanding, 83.37 ANLS DocVQA | Document layout analysis |
Granite-Docling-258M | 258M | Apache 2.0 | 1GB | End-to-end document conversion, 30x faster | Enterprise document processing |
CLIP (OpenAI) | ViT-L/14 | MIT | Variable | Vision-language contrastive learning | Image embeddings, zero-shot |
OpenCLIP | ViT-G/14 | Apache 2.0 | Variable | Open source CLIP implementation | Large-scale image embeddings |
SigLIP 2 | Various | Apache 2.0 | Variable | Multilingual vision-language, sigmoid loss | Improved semantic understanding |
OpenVision | Various | Apache 2.0 | Variable | 2-3x faster training than CLIP | Efficient vision encoding |
BGE Reranker v2-M3 | 600M | Apache 2.0 | 1.2GB | Multilingual reranking, SOTA performance | RAG, search reranking |
Jina Reranker v2 | Base | Apache 2.0 | Variable | 6x faster, multilingual, function-calling | Agentic RAG, code search |
ColBERT | BERT-based | MIT | Variable | Efficient neural search with late interaction | Information retrieval |
E5-Large-v2 | 335M | MIT | 1.3GB | Microsoft's text embedding model | Text similarity, retrieval |
Chronos | Various | Apache 2.0 | Variable | Time series foundation model, 250x faster | Time series forecasting |
TimesFM | 200M | Apache 2.0 | 800MB | Google's time series model, 100B time-points | Zero-shot forecasting |
Moirai 2.0 | Transformer | Apache 2.0 | Variable | #1 on GIFT-Eval benchmark, decoder-only | Universal forecasting |
Prophet | Statistical | MIT | Light | Meta's forecasting tool with seasonality | Business forecasting |
NeuralProphet | Neural | MIT | Variable | 55-92% accuracy improvement over Prophet | Interpretable forecasting |
Wav2Vec2 | Large | MIT | Variable | Self-supervised speech representation | Speech recognition, ASR |
WavLM | 316M | MIT | 1.2GB | Speaker verification, diarization | Speaker tasks, speech processing |
SpeechT5 | Transformer | MIT | Variable | Unified speech-text pre-training | Speech synthesis, recognition |
TabNet | Various | Apache 2.0 | Variable | Attention-based tabular learning | Structured data, tabular ML |
XGBoost | Tree-based | Apache 2.0 | Light | Extreme gradient boosting | Tabular data, competitions |
LightGBM | Tree-based | MIT | Light | Fast gradient boosting framework | Efficient tabular learning |
Hardware Requirements
Consumer Hardware (12-24GB)
- • RTX 4090: 24GB - up to 13B models
- • RTX 4080: 16GB - up to 7B models
- • Ultra-Light: Gemma 3 270M, TinyLlama 1.1B
- • Recommended: CroissantLLM 1.3B, IBM Granite 3.0 8B
- • Edge Reasoning: MobileLLM-R1 950M
- • Search/Embedding: CLIP, all-MiniLM-L6-v2
- • Audio: Wav2Vec2, SpeechT5
- • Tabular: XGBoost, LightGBM, TabNet
- • Quantization: GGUF Q4/Q8, QLoRA 4-bit
- • Mobile: 48 tokens/sec on Snapdragon X Elite
Professional (48-80GB)
- • A100 80GB: Single GPU up to 30B
- • H100 80GB: Faster training, larger batches
- • Recommended: OpenAI GPT-OSS 20B, Qwen 2.5-Max
- • Specialists: BioGPT, Cisco Foundation-sec, ChemBERTa
- • Regional: Yi 1.5 34B, Baichuan 4, ChatGLM-4.5
- • Time Series: TimesFM 200M, Chronos-T5, Moirai 2.0
- • Retrieval: BGE Reranker v2-M3, ColBERT-v2, E5-Large-v2
- • Techniques: DeepSpeed ZeRO Stage 2
- • Fine-tuning: Full parameter or large LoRA
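The reason "large LoRA" is still far cheaper than full fine-tuning: LoRA trains two low-rank matrices (d×r and r×d) per adapted weight matrix instead of the full d×d weight. A back-of-envelope calculation, using illustrative numbers for a 7B-class architecture (hidden size, layer count, and adapted matrices per layer are assumptions, not exact specs of any model above):

```python
# Trainable-parameter count when LoRA adapts square projection matrices
# (e.g. q/k/v/o) in every layer. Each adapted d x d matrix contributes
# 2 * d * r trainable parameters.

def lora_params(hidden: int, rank: int, layers: int,
                mats_per_layer: int = 4) -> int:
    """Trainable parameters for LoRA with rank `rank`."""
    return layers * mats_per_layer * 2 * hidden * rank

full = 7_000_000_000                       # full fine-tuning: every weight
lora = lora_params(hidden=4096, rank=16, layers=32)
print(f"LoRA trainable params: {lora:,} ({100 * lora / full:.2f}% of full)")
```

At rank 16 this comes to roughly 17M trainable parameters, under 0.3% of a 7B model, which is why optimizer state and gradients fit comfortably alongside the frozen base weights.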
Enterprise (Multi-GPU)
- • 2-8x H100: 70B+ models
- • Multi-node: 400B+ models like DeepSeek R1
- • Latest Flagship: DeepSeek V3.1 685B (MIT license)
- • Enterprise: OpenAI GPT-OSS 120B, BLOOM 176B
- • Advanced: Qwen-Image-Edit 20B, OpenVLA 7B
- • Scientific: UMA (Meta), BloombergGPT 50B
- • Vision: SAM 2, YOLO v11, Florence-2, OpenCLIP
- • Audio/Speech: WavLM 316M, large Wav2Vec2 models
- • Techniques: DeepSpeed ZeRO Stage 3, FSDP
- • Infrastructure: InfiniBand, NVLink
Licensing & Legal Considerations
Permissive Licenses (Recommended)
- • Apache 2.0: Mistral, Qwen, EleutherAI models
- • MIT: Phi-4, some research models
- • Benefits: Commercial use, modification, distribution
- • Requirements: Attribution, license inclusion
- • Patent Protection: Apache 2.0 provides coverage
Custom Licenses (Caution)
- • Meta Llama: Custom license with restrictions
- • Gemma: Terms of Use with commercial limits
- • Restrictions: Revenue thresholds, use case limits
- • Derivative Works: Complex fine-tuning implications
- • Legal Review: Required for commercial use
Enterprise Considerations
- • Legal Compliance: OSI-approved preferred
- • Liability: No warranty in any open source
- • IP Rights: Unclear derivative work ownership
- • Commercial Support: Available for some models
- • Risk Assessment: Balance capability vs legal risk
Performance Insights
Key Performance Factors
- • Inference Speed: Llama 3 > Mistral > Qwen > Gemma
- • Reasoning: DeepSeek R1 > Phi-4 > Llama 3.3
- • Multilingual: Qwen 2.5 ≈ Llama 3.3 > others
- • Code Quality: DeepSeek Coder > Qwen Coder > Phi-4
- • Fine-tuning Speed: Smaller models train 2-5x faster
Cost Considerations
- • Training Cost: Grows roughly quadratically with model size (compute ∝ parameters × training tokens, and compute-optimal token counts grow with parameter count)
- • Inference Cost: DeepSeek models 90% cheaper than others
- • Hardware: 70B models require $10K+ in GPUs
- • Cloud Training: $13 (LoRA) vs $322 (full fine-tuning)
- • Long-term: Consider inference volume costs
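Cloud fine-tuning cost comparisons like the LoRA-vs-full figure above reduce to GPUs × hours × hourly rate. A sketch with illustrative numbers (the $2.20/GPU-hour rate and the run durations are assumptions; check your provider's pricing and your own training-time estimates):

```python
# Back-of-envelope cloud fine-tuning cost: GPUs x hours x hourly rate.

def training_cost(num_gpus: int, hours: float,
                  usd_per_gpu_hour: float) -> float:
    """Total rental cost in USD for a training run."""
    return round(num_gpus * hours * usd_per_gpu_hour, 2)

# e.g. a short single-GPU QLoRA run vs a multi-GPU full fine-tune:
print(training_cost(1, 6, 2.20))    # single rented A100, LoRA
print(training_cost(8, 18, 2.20))   # eight GPUs, full fine-tune
```

The order-of-magnitude gap (tens vs hundreds of dollars) matches the LoRA-vs-full comparison above: full fine-tuning needs more GPUs for more hours, while LoRA's small trainable footprint keeps both low.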
Quick Decision Guide
Start Here (Budget < $5K)
- • General: Phi-4 (14B) - MIT license
- • Commercial: Mistral Small 3.1 - Apache 2.0
- • Hardware: RTX 4090 or cloud instances
- • Technique: QLoRA 4-bit fine-tuning
Scale Up (Budget $5K-50K)
- • Performance: Llama 3.3 70B or Qwen 2.5 72B
- • Commercial: Check licensing carefully
- • Hardware: 2-4x A100/H100 GPUs
- • Technique: DeepSpeed ZeRO + LoRA
Enterprise (Budget $50K+)
- • Performance: DeepSeek R1 for reasoning
- • Reliable: Llama 3.3 for production
- • Infrastructure: Multi-node clusters
- • Support: Consider commercial partnerships
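The three budget tiers above can be expressed as a small lookup, useful if you want to apply the guide's logic programmatically. The thresholds and picks simply mirror the tiers as written:

```python
# The Quick Decision Guide as a tiny function: budget in -> tier out.

def pick_model(budget_usd: float) -> dict:
    """Map a hardware/training budget to the guide's recommended tier."""
    if budget_usd < 5_000:
        return {"model": "Phi-4 14B", "technique": "QLoRA 4-bit",
                "hardware": "RTX 4090 or cloud instances"}
    if budget_usd < 50_000:
        return {"model": "Llama 3.3 70B or Qwen 2.5 72B",
                "technique": "DeepSpeed ZeRO + LoRA",
                "hardware": "2-4x A100/H100 GPUs"}
    return {"model": "DeepSeek R1 (reasoning) / Llama 3.3 (production)",
            "technique": "DeepSpeed ZeRO Stage 3 / FSDP",
            "hardware": "multi-node clusters"}

print(pick_model(3_000)["model"])
```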