Fine-Tuning Guide
Choosing the Right Base Model
A comprehensive guide to selecting the right foundation model for your fine-tuning project, based on performance, licensing, hardware requirements, and use-case specifics.
Model Selection Decision Framework
Start Here: Define Requirements
- • Use Case: Chat, code, analysis, multimodal
- • Languages: English-only vs multilingual
- • Context Length: Short vs long documents
- • Latency: Real-time vs batch processing
- • Budget: Hardware and inference costs
Licensing Considerations
- • Commercial Use: Apache 2.0 > MIT > Custom
- • Enterprise: Check derivative work clauses
- • Attribution: Required for most licenses
- • Liability: No warranty in open source
- • Patents: Apache 2.0 provides protection
Hardware Constraints
- • 7B Models: 14-16GB VRAM (consumer)
- • 13B Models: 26-30GB VRAM (prosumer)
- • 30B+ Models: 60GB+ VRAM (enterprise)
- • 70B+ Models: Multiple GPUs required
- • Quantization: 50-75% memory reduction
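The VRAM figures above follow a simple rule of thumb: bytes per parameter times parameter count, plus some runtime overhead. A minimal sketch, assuming ~2 bytes/param at FP16, 1 at INT8, 0.5 at INT4, and an illustrative 10% overhead for activations and KV cache (real overhead varies with context length and batch size):

```python
# Rough VRAM estimate for loading a model at different precisions.
# The 1.1x overhead factor is an assumption; real usage depends on
# context length, batch size, and serving framework.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gb(params_billions: float, precision: str = "fp16",
                     overhead: float = 1.1) -> float:
    """Return an approximate VRAM requirement in GB for inference."""
    bytes_needed = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return round(bytes_needed * overhead / 1e9, 1)

for size in (7, 13, 70):
    print(size, {p: estimate_vram_gb(size, p) for p in BYTES_PER_PARAM})
```

This reproduces the tiers above: a 7B model lands at roughly 15 GB in FP16 (consumer), ~28 GB for 13B (prosumer), and INT4 quantization cuts memory by ~75% relative to FP16.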
🚀 2025 Breakthrough Models (Just Released)
DeepSeek V3.1
685B • MIT • Hybrid thinking mode beats GPT-5
OpenAI GPT-OSS
120B/20B • Apache 2.0 • OpenAI's first open models
IBM Granite 3.0
8B • Apache 2.0 • Enterprise-ready, 116 languages
Gemma 3 270M
270M • Edge AI • 0.75% battery usage
Qwen-Image-Edit
20B • Apache 2.0 • Advanced image editing with text rendering
OpenVLA
7B • MIT • Vision-language-action for robotics
Cisco Foundation-sec
8B • Apache 2.0 • First open cybersecurity LLM
YOLO v11
Variable • AGPL-3.0 • Latest object detection, 22% fewer params
Top Recommendations by Use Case
Chat & Conversation
- • Ultra-Budget: TinyLlama 1.1B, Gemma 3 270M
- • Budget: SmolLM3 3B, CroissantLLM 1.3B
- • Balanced: IBM Granite 3.0 8B, OpenAI GPT-OSS 20B
- • Premium: DeepSeek V3.1 (685B), Qwen 2.5-Max
Code Generation
- • Enterprise: IBM Granite 3.0 (116 languages)
- • Specialized: StarCoder 15B, DeepSeek Coder V2
- • Latest: OpenAI GPT-OSS 120B, DeepSeek V3.1
- • Edge: MobileLLM-R1 (math/coding on mobile)
Analysis & Reasoning
- • State-of-Art: DeepSeek V3.1 (hybrid thinking)
- • Compact: MobileLLM-R1 (2-5x performance boost)
- • Enterprise: Qwen 2.5-Max, IBM Granite 3.0
- • Agentic: ChatGLM-4.5 (task decomposition)
Enterprise Use
- • Latest Flagship: DeepSeek V3.1, Qwen 2.5-Max
- • Enterprise-Ready: IBM Granite 3.0 series
- • OpenAI Open: GPT-OSS 120B/20B (Apache 2.0)
- • Cost-Effective: ChatGLM-4.5 (cheaper than DeepSeek)
Multilingual
- • 46 Languages: BLOOM 176B (BigScience)
- • Chinese/English: Yi 1.5 34B, Baichuan 4
- • French/English: CroissantLLM (truly bilingual)
- • Japanese: Rakuten AI 2.0 (business-optimized)
Edge & Mobile
- • Ultra-Efficient: Gemma 3 270M (0.75% battery)
- • Reasoning: MobileLLM-R1 950M (2-5x boost)
- • Compact: TinyLlama 1.1B, CroissantLLM 1.3B
- • Quantized: GGUF format (Q2-Q8 levels)
Computer Vision
- • Object Detection: YOLO v11, YOLOv10, Grounding DINO
- • Segmentation: SAM 2 (44 FPS), TinySAM
- • Vision-Language: LLaVA 1.6, Florence-2, MiniGPT-4
- • Document AI: Granite-Docling-258M, PaddleOCR 3.0, TrOCR
Search & Retrieval
- • Image Embedding: CLIP-ViT-L/14, OpenCLIP, SigLIP 2
- • Text Retrieval: ColBERT-v2, E5-Large-v2, BGE-M3
- • Reranking: BGE Reranker v2-M3, Jina Reranker v2
- • Neural Search: OpenVision, all-MiniLM-L6-v2
Audio & Speech
- • Speech Recognition: Wav2Vec2, SpeechT5
- • Speaker Tasks: WavLM (verification, diarization)
- • Synthesis: SpeechT5 (unified speech-text)
- • Self-supervised: Wav2Vec2 (representation learning)
Domain Specialists
- • Finance: FinGPT 7B, BloombergGPT 50B
- • Medical: BioGPT, Palmyra-Med 70B, OpenBioLLM 70B
- • Legal: LawLLM 7B (US legal system)
- • Cybersecurity: Cisco Foundation-sec-8B, Trend Cybertron
Time Series & Forecasting
- • Foundation: Chronos-T5 (250x faster), TimesFM 200M
- • Best Performance: Moirai 2.0 (#1 GIFT-Eval)
- • Business: Prophet (seasonality), NeuralProphet
- • Zero-shot: TimesFM (100B time-points trained)
Tabular & Structured Data
- • Deep Learning: TabNet (attention-based)
- • Gradient Boosting: XGBoost, LightGBM
- • Competitions: XGBoost (proven winner)
- • Efficiency: LightGBM (fast training)
Specialized Applications
- • Creative: Qwen-Image-Edit, InstantID, MusicGen
- • Robotics: OpenVLA 7B, SmolVLA 450M
- • Scientific: UMA (Meta), ChemBERTa-2, BioGPT
- • Security: Cisco Foundation-sec, Trend Cybertron
Detailed Model Comparison
Model | Size | License | VRAM (FP16) | Strengths | Best For |
---|---|---|---|---|---|
Llama 3.3 70B | 70B | Custom (restrictive) | 140GB | Proven, multilingual, community | General purpose, enterprise |
Mistral Small 3.1 | 22B | Apache 2.0 | 44GB | Fast, commercial-friendly | Commercial deployment |
Qwen 2.5 72B | 72B | Apache 2.0 | 144GB | Data analysis, structured output | Enterprise data tasks |
Gemma 3 27B | 27B | Custom (restrictive) | 54GB | Efficient, Google ecosystem | Research, prototyping |
Phi-4 14B | 14B | MIT | 28GB | Strong reasoning, compact | Resource-constrained |
DeepSeek R1 | 671B | MIT | 1342GB+ | Advanced reasoning, coding | Research, complex tasks |
SmolLM3 3B | 3B | Apache 2.0 | 6GB | Multilingual, long context (64k) | Edge devices, mobile |
VibeVoice 1.5B | 1.5B | MIT (repo since disabled) | 4GB | Text-to-speech, 90min audio | Voice synthesis (research) |
Qwen2.5-VL 7B | 7B | Apache 2.0 | 14GB | Vision, OCR, video understanding | Multimodal applications |
ModernBERT | 139M/395M | Apache 2.0 | 1-2GB | Embeddings, 8k context | Text embeddings, RAG |
Nomic-Embed v2 | 100M | Apache 2.0 | 500MB | MoE embeddings, 100 languages | Multilingual embeddings |
FLUX.1 [dev] | 12B | Custom (non-commercial) | 24GB | Text-to-image, best quality | Image generation (research) |
FLUX.1 [schnell] | 12B | Apache 2.0 | 24GB | Fast text-to-image generation | Commercial image generation |
Stable Diffusion 3 | 2B/8B | Custom (restrictive) | 4-16GB | Text-to-image, established | Legacy image generation |
Whisper Large v3 | 1.55B | MIT | 3GB | Speech recognition, 99 languages | Speech-to-text applications |
Distil-Whisper v3 | 756M | MIT | 1.5GB | 6x faster, 49% smaller than Whisper | Real-time transcription |
OpenAI GPT-OSS 120B | 120B | Apache 2.0 | 240GB | OpenAI's first open-weight model, o4-mini level | General purpose, reasoning |
OpenAI GPT-OSS 20B | 20B | Apache 2.0 | 40GB | Compact version, o3-mini level performance | Edge deployment, reasoning |
Qwen3 235B-A22B | 235B | Apache 2.0 | 470GB | MoE, 119 languages, beats DeepSeek R1 | Multilingual, enterprise |
Qwen3 32B | 32B | Apache 2.0 | 64GB | Dense model, excellent multilingual | Production deployment |
OLMo 2 32B | 32B | Apache 2.0 | 64GB | Fully open, beats GPT-3.5 Turbo | Research, transparency |
NVIDIA Nemotron Nano 9B | 9B | Apache 2.0 | 18GB | Mamba-Transformer hybrid, 6x faster | Real-time reasoning |
Command R+ 104B | 104B | CC-BY-NC-4.0 | 208GB | RAG optimized, tool use, 10 languages | Enterprise RAG, agents |
MiniCPM-o 2.6 | 8B | Apache 2.0 | 16GB | Multimodal, beats GPT-4o on vision | Mobile multimodal |
OpenBioLLM 70B | 70B | Apache 2.0 | 140GB | Medical domain, beats Med-PaLM-2 | Healthcare, biomedical |
StarCoder 15B | 15B | OpenRAIL | 30GB | Code generation, 80+ languages | Code completion, development |
MusicGen | 3.3B | CC-BY-NC-4.0 | 7GB | Music generation from text prompts | Audio/music creation |
OpenSora 2.0 | Transformer | Apache 2.0 | Variable | Video generation, commercial quality | Video production |
DeepSeek V3.1 | 685B | MIT | 1370GB | Hybrid thinking mode, beats GPT-5 | Advanced reasoning, research |
Qwen 2.5-Max | ~70B | Apache 2.0 | 140GB | Alibaba's latest, beats DeepSeek V3 | Enterprise, multimodal |
IBM Granite 3.0 8B | 8B | Apache 2.0 | 16GB | Enterprise model, 116 programming languages | Enterprise workflows, tools |
Yi 1.5 34B | 34B | Apache 2.0 | 68GB | Bilingual (Chinese/English), reasoning | 01.AI flagship, bilingual |
Baichuan 4 | 13B | Apache 2.0 | 26GB | Chinese domain specialist (law, finance) | Chinese business applications |
ChatGLM-4.5 | ~13B | Apache 2.0 | 26GB | Agentic AI, cheaper than DeepSeek | Agent workflows, Chinese |
CroissantLLM | 1.3B | MIT | 3GB | Truly bilingual French-English | French language applications |
BLOOM | 176B | BigScience OpenRAIL-M | 352GB | 46 languages, 13 programming languages | Multilingual research |
Rakuten AI 2.0 | MoE | Apache 2.0 | Variable | Japanese-optimized, MoE architecture | Japanese business applications |
FinGPT | 7B | MIT | 14GB | Financial domain, sentiment analysis | Financial analysis, trading |
BloombergGPT | 50B | Research only | 100GB | Finance-specific training data | Financial NLP, research |
Palmyra-Med 70B | 70B | Commercial license | 140GB | Medical domain, beats Med-PaLM-2 | Healthcare applications |
LawLLM | 7B | Apache 2.0 | 14GB | US legal system specialist | Legal research, compliance |
Gemma 3 270M | 270M | Gemma License | 600MB | Ultra-efficient edge AI, 0.75% battery | Mobile, edge devices |
TinyLlama | 1.1B | Apache 2.0 | 2.2GB | Compact LLaMA architecture | Resource-constrained devices |
MobileLLM-R1 | 950M | Apache 2.0 | 2GB | Edge reasoning, 2-5x performance boost | Mobile reasoning, math |
Cisco Foundation-sec-8B | 8B | Apache 2.0 | 16GB | Security-focused, threat detection | Cybersecurity, SOC operations |
Trend Cybertron | 8B | Open Source | 16GB | Autonomous cybersecurity agents | Security automation, defense |
Qwen-Image-Edit | 20B | Apache 2.0 | 40GB | Precise image editing, text rendering | Image editing, visual design |
InstantID | Diffusion | Apache 2.0 | 8GB | Identity-preserving generation | Avatar creation, face swapping |
ControlNet | Various | Apache 2.0 | Variable | Controlled image generation | Guided image synthesis |
OpenVLA | 7B | MIT | 14GB | Vision-language-action for robots | Robotic manipulation |
SmolVLA | 450M | Apache 2.0 | 1GB | Compact robotics model | Lightweight robotics |
UMA (Meta) | Variable | Open Source | Variable | Universal atomic simulation, 10000x faster DFT | Materials science, chemistry |
ChemBERTa-2 | 110M | MIT | 500MB | Chemical foundation model, SMILES | Drug discovery, chemistry |
BioGPT | 355M | MIT | 1GB | Biomedical text generation, 78.2% PubMedQA | Biomedical research, literature |
IBM SMILES-TED | Transformer | Apache 2.0 | Variable | 91M SMILES samples, chemical synthesis | Materials discovery, green chemistry |
YOLO v11 | Varies (n,s,m,l,x) | AGPL-3.0 | Variable | Latest object detection, 22% fewer params | Real-time object detection |
YOLOv10 | Varies (n,s,m,l,x) | AGPL-3.0 | Variable | End-to-end detection, no NMS needed | Efficient object detection |
SAM 2 | Transformer | Apache 2.0 | Variable | Segment anything in images/videos, 44 FPS | Image/video segmentation |
Florence-2 | 230M/770M | MIT | 1-2GB | Lightweight VLM, captioning, detection | Vision-language tasks |
Grounding DINO | Transformer | Apache 2.0 | Variable | Open-set detection, 52.5 AP COCO zero-shot | Zero-shot object detection |
LLaVA 1.6 | 7B/13B/34B | Apache 2.0 | 14-68GB | Large language and vision assistant | Multimodal conversations |
MiniGPT-4 | 7B/13B | BSD 3-Clause | 14-26GB | Aligned vision encoder with LLM | Image understanding, creativity |
BLIP-2 | 2.7B/7.8B | BSD 3-Clause | 6-16GB | Q-Former bridging vision and language | Vision-language pre-training |
PaLI-3 | 5B | Apache 2.0 | 10GB | Multilingual vision-language, 100+ languages | Multilingual VL tasks |
PaddleOCR 3.0 | Various | Apache 2.0 | Variable | PP-OCRv5, 13-point accuracy gain | OCR, document parsing |
TrOCR | Transformer | MIT | Variable | End-to-end text recognition | Handwritten text OCR |
Donut | 200M | MIT | 1GB | OCR-free document understanding | Document AI, form parsing |
LayoutLMv3 | 134M | MIT | 500MB | Document understanding, 83.37 ANLS DocVQA | Document layout analysis |
Granite-Docling-258M | 258M | Apache 2.0 | 1GB | End-to-end document conversion, 30x faster | Enterprise document processing |
CLIP (OpenAI) | ViT-L/14 | MIT | Variable | Vision-language contrastive learning | Image embeddings, zero-shot |
OpenCLIP | ViT-G/14 | Apache 2.0 | Variable | Open source CLIP implementation | Large-scale image embeddings |
SigLIP 2 | Various | Apache 2.0 | Variable | Multilingual vision-language, sigmoid loss | Improved semantic understanding |
OpenVision | Various | Apache 2.0 | Variable | 2-3x faster training than CLIP | Efficient vision encoding |
BGE Reranker v2-M3 | 600M | Apache 2.0 | 1.2GB | Multilingual reranking, SOTA performance | RAG, search reranking |
Jina Reranker v2 | Base | Apache 2.0 | Variable | 6x faster, multilingual, function-calling | Agentic RAG, code search |
ColBERT | BERT-based | MIT | Variable | Efficient neural search with late interaction | Information retrieval |
E5-Large-v2 | 335M | MIT | 1.3GB | Microsoft's text embedding model | Text similarity, retrieval |
Chronos | Various | Apache 2.0 | Variable | Time series foundation model, 250x faster | Time series forecasting |
TimesFM | 200M | Apache 2.0 | 800MB | Google's time series model, 100B time-points | Zero-shot forecasting |
Moirai 2.0 | Transformer | Apache 2.0 | Variable | #1 on GIFT-Eval benchmark, decoder-only | Universal forecasting |
Prophet | Statistical | MIT | Light | Meta's forecasting tool with seasonality | Business forecasting |
NeuralProphet | Neural | MIT | Variable | 55-92% accuracy improvement over Prophet | Interpretable forecasting |
Wav2Vec2 | Large | MIT | Variable | Self-supervised speech representation | Speech recognition, ASR |
WavLM | 316M | MIT | 1.2GB | Speaker verification, diarization | Speaker tasks, speech processing |
SpeechT5 | Transformer | MIT | Variable | Unified speech-text pre-training | Speech synthesis, recognition |
TabNet | Various | Apache 2.0 | Variable | Attention-based tabular learning | Structured data, tabular ML |
XGBoost | Tree-based | Apache 2.0 | Light | Extreme gradient boosting | Tabular data, competitions |
LightGBM | Tree-based | MIT | Light | Fast gradient boosting framework | Efficient tabular learning |
Hardware Requirements
Consumer Hardware (12-24GB)
- • RTX 4090: 24GB - up to 13B models
- • RTX 4080: 16GB - up to 7B models
- • Ultra-Light: Gemma 3 270M, TinyLlama 1.1B
- • Recommended: CroissantLLM 1.3B, IBM Granite 3.0 8B
- • Edge Reasoning: MobileLLM-R1 950M
- • Search/Embedding: CLIP, all-MiniLM-L6-v2
- • Audio: Wav2Vec2, SpeechT5
- • Tabular: XGBoost, LightGBM, TabNet
- • Quantization: GGUF Q4/Q8, QLoRA 4-bit
- • Mobile: 48 tokens/sec on Snapdragon X Elite
Professional (48-80GB)
- • A100 80GB: Single GPU up to 30B
- • H100 80GB: Faster training, larger batches
- • Recommended: OpenAI GPT-OSS 20B, Qwen 2.5-Max
- • Specialists: BioGPT, Cisco Foundation-sec, ChemBERTa
- • Regional: Yi 1.5 34B, Baichuan 4, ChatGLM-4.5
- • Time Series: TimesFM 200M, Chronos-T5, Moirai 2.0
- • Retrieval: BGE Reranker v2-M3, ColBERT-v2, E5-Large-v2
- • Techniques: DeepSpeed ZeRO Stage 2
- • Fine-tuning: Full parameter or large LoRA
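The reason "large LoRA" is still far cheaper than full fine-tuning: LoRA trains two low-rank matrices (d×r and r×d) per adapted weight matrix instead of the full d×d weight. A back-of-envelope calculation, using illustrative numbers for a 7B-class architecture (hidden size, layer count, and adapted matrices per layer are assumptions, not exact specs of any model above):

```python
# Trainable-parameter count when LoRA adapts square projection matrices
# (e.g. q/k/v/o) in every layer. Each adapted d x d matrix contributes
# 2 * d * r trainable parameters.

def lora_params(hidden: int, rank: int, layers: int,
                mats_per_layer: int = 4) -> int:
    """Trainable parameters for LoRA with rank `rank`."""
    return layers * mats_per_layer * 2 * hidden * rank

full = 7_000_000_000                       # full fine-tuning: every weight
lora = lora_params(hidden=4096, rank=16, layers=32)
print(f"LoRA trainable params: {lora:,} ({100 * lora / full:.2f}% of full)")
```

At rank 16 this comes to roughly 17M trainable parameters, under 0.3% of a 7B model, which is why optimizer state and gradients fit comfortably alongside the frozen base weights.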
Enterprise (Multi-GPU)
- • 2-8x H100: 70B+ models
- • Multi-node: 400B+ models like DeepSeek R1
- • Latest Flagship: DeepSeek V3.1 685B (MIT license)
- • Enterprise: OpenAI GPT-OSS 120B, BLOOM 176B
- • Advanced: Qwen-Image-Edit 20B, OpenVLA 7B
- • Scientific: UMA (Meta), BloombergGPT 50B
- • Vision: SAM 2, YOLO v11, Florence-2, OpenCLIP
- • Audio/Speech: WavLM 316M, large Wav2Vec2 models
- • Techniques: DeepSpeed ZeRO Stage 3, FSDP
- • Infrastructure: InfiniBand, NVLink
Licensing & Legal Considerations
Permissive Licenses (Recommended)
- • Apache 2.0: Mistral, Qwen, EleutherAI models
- • MIT: Phi-4, some research models
- • Benefits: Commercial use, modification, distribution
- • Requirements: Attribution, license inclusion
- • Patent Protection: Apache 2.0 provides coverage
Custom Licenses (Caution)
- • Meta Llama: Custom license with restrictions
- • Gemma: Terms of Use with commercial limits
- • Restrictions: Revenue thresholds, use case limits
- • Derivative Works: Complex fine-tuning implications
- • Legal Review: Required for commercial use
Enterprise Considerations
- • Legal Compliance: OSI-approved preferred
- • Liability: No warranty in any open source
- • IP Rights: Unclear derivative work ownership
- • Commercial Support: Available for some models
- • Risk Assessment: Balance capability vs legal risk
Performance Insights
Key Performance Factors
- • Inference Speed: Llama 3 > Mistral > Qwen > Gemma
- • Reasoning: DeepSeek R1 > Phi-4 > Llama 3.3
- • Multilingual: Qwen 2.5 ≈ Llama 3.3 > others
- • Code Quality: DeepSeek Coder > Qwen Coder > Phi-4
- • Fine-tuning Speed: Smaller models train 2-5x faster
Cost Considerations
- • Training Cost: Grows roughly quadratically with model size (compute ∝ parameters × training tokens, and compute-optimal token counts grow with parameter count)
- • Inference Cost: DeepSeek models 90% cheaper than others
- • Hardware: 70B models require $10K+ in GPUs
- • Cloud Training: $13 (LoRA) vs $322 (full fine-tuning)
- • Long-term: Consider inference volume costs
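Cloud fine-tuning cost comparisons like the LoRA-vs-full figure above reduce to GPUs × hours × hourly rate. A sketch with illustrative numbers (the $2.20/GPU-hour rate and the run durations are assumptions; check your provider's pricing and your own training-time estimates):

```python
# Back-of-envelope cloud fine-tuning cost: GPUs x hours x hourly rate.

def training_cost(num_gpus: int, hours: float,
                  usd_per_gpu_hour: float) -> float:
    """Total rental cost in USD for a training run."""
    return round(num_gpus * hours * usd_per_gpu_hour, 2)

# e.g. a short single-GPU QLoRA run vs a multi-GPU full fine-tune:
print(training_cost(1, 6, 2.20))    # single rented A100, LoRA
print(training_cost(8, 18, 2.20))   # eight GPUs, full fine-tune
```

The order-of-magnitude gap (tens vs hundreds of dollars) matches the LoRA-vs-full comparison above: full fine-tuning needs more GPUs for more hours, while LoRA's small trainable footprint keeps both low.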
Quick Decision Guide
Start Here (Budget < $5K)
- • General: Phi-4 (14B) - MIT license
- • Commercial: Mistral Small 3.1 - Apache 2.0
- • Hardware: RTX 4090 or cloud instances
- • Technique: QLoRA 4-bit fine-tuning
Scale Up (Budget $5K-50K)
- • Performance: Llama 3.3 70B or Qwen 2.5 72B
- • Commercial: Check licensing carefully
- • Hardware: 2-4x A100/H100 GPUs
- • Technique: DeepSpeed ZeRO + LoRA
Enterprise (Budget $50K+)
- • Performance: DeepSeek R1 for reasoning
- • Reliable: Llama 3.3 for production
- • Infrastructure: Multi-node clusters
- • Support: Consider commercial partnerships
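The three budget tiers above can be expressed as a small lookup, useful if you want to apply the guide's logic programmatically. The thresholds and picks simply mirror the tiers as written:

```python
# The Quick Decision Guide as a tiny function: budget in -> tier out.

def pick_model(budget_usd: float) -> dict:
    """Map a hardware/training budget to the guide's recommended tier."""
    if budget_usd < 5_000:
        return {"model": "Phi-4 14B", "technique": "QLoRA 4-bit",
                "hardware": "RTX 4090 or cloud instances"}
    if budget_usd < 50_000:
        return {"model": "Llama 3.3 70B or Qwen 2.5 72B",
                "technique": "DeepSpeed ZeRO + LoRA",
                "hardware": "2-4x A100/H100 GPUs"}
    return {"model": "DeepSeek R1 (reasoning) / Llama 3.3 (production)",
            "technique": "DeepSpeed ZeRO Stage 3 / FSDP",
            "hardware": "multi-node clusters"}

print(pick_model(3_000)["model"])
```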