Choosing the Right Base Model

A comprehensive guide to selecting the right foundation model for your fine-tuning project, based on performance, licensing, hardware requirements, and use-case specifics.


Model Selection Decision Framework

Start Here: Define Requirements

  • Use Case: Chat, code, analysis, multimodal
  • Languages: English-only vs multilingual
  • Context Length: Short vs long documents
  • Latency: Real-time vs batch processing
  • Budget: Hardware and inference costs

Licensing Considerations

  • Commercial Use: Apache 2.0 > MIT > Custom
  • Enterprise: Check derivative work clauses
  • Attribution: Required for most licenses
  • Liability: No warranty in open source
  • Patents: Apache 2.0 provides protection

Hardware Constraints

  • 7B Models: 14-16GB VRAM (consumer)
  • 13B Models: 26-30GB VRAM (prosumer)
  • 30B+ Models: 60GB+ VRAM (enterprise)
  • 70B+ Models: Multiple GPUs required
  • Quantization: 50-75% memory reduction
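The VRAM figures above follow from a simple rule of thumb: roughly 2 bytes per parameter at FP16, half that at INT8, a quarter at INT4, plus overhead for the KV cache and activations. A minimal sketch (the 15% overhead factor is an assumption for planning, not a measurement):

```python
# Back-of-envelope VRAM estimate for inference.
# Assumes ~2 bytes/param at FP16 and a flat ~15% overhead for
# KV cache and activations -- rough planning numbers only.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gb(params_billions: float, precision: str = "fp16",
                     overhead: float = 0.15) -> float:
    """Rough VRAM (GB) to hold the weights plus runtime overhead."""
    weights_gb = params_billions * BYTES_PER_PARAM[precision]
    return round(weights_gb * (1 + overhead), 1)

for size in (7, 13, 70):
    print(f"{size}B: {estimate_vram_gb(size)} GB fp16, "
          f"{estimate_vram_gb(size, 'int4')} GB int4")
```

For a 7B model this lands in the 14-16GB range quoted above, and INT4 quantization recovers the ~75% reduction noted in the last bullet.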

🚀 2025 Breakthrough Models (Just Released)

DeepSeek V3.1

685B • MIT • Hybrid thinking mode beats GPT-5

OpenAI GPT-OSS

120B/20B • Apache 2.0 • OpenAI's first open models

IBM Granite 3.0

8B • Apache 2.0 • Enterprise-ready, 116 languages

Gemma 3 270M

270M • Edge AI • 0.75% battery usage

Qwen-Image-Edit

20B • Apache 2.0 • Advanced image editing with text rendering

OpenVLA

7B • MIT • Vision-language-action for robotics

Cisco Foundation-sec

8B • Apache 2.0 • First open cybersecurity LLM

YOLO v11

Variable • AGPL-3.0 • Latest object detection, 22% fewer params

Top Recommendations by Use Case

Chat & Conversation

  • Ultra-Budget: TinyLlama 1.1B, Gemma 3 270M
  • Budget: SmolLM3 3B, CroissantLLM 1.3B
  • Balanced: IBM Granite 3.0 8B, OpenAI GPT-OSS 20B
  • Premium: DeepSeek V3.1 (685B), Qwen 2.5-Max

Code Generation

  • Enterprise: IBM Granite 3.0 (116 languages)
  • Specialized: StarCoder 15B, DeepSeek Coder V2
  • Latest: OpenAI GPT-OSS 120B, DeepSeek V3.1
  • Edge: MobileLLM-R1 (math/coding on mobile)

Analysis & Reasoning

  • State-of-Art: DeepSeek V3.1 (hybrid thinking)
  • Compact: MobileLLM-R1 (2-5x performance boost)
  • Enterprise: Qwen 2.5-Max, IBM Granite 3.0
  • Agentic: ChatGLM-4.5 (task decomposition)

Enterprise Use

  • Latest Flagship: DeepSeek V3.1, Qwen 2.5-Max
  • Enterprise-Ready: IBM Granite 3.0 series
  • OpenAI Open: GPT-OSS 120B/20B (Apache 2.0)
  • Cost-Effective: ChatGLM-4.5 (cheaper than DeepSeek)

Multilingual

  • 46 Languages: BLOOM 176B (BigScience)
  • Chinese/English: Yi 1.5 34B, Baichuan 4
  • French/English: CroissantLLM (truly bilingual)
  • Japanese: Rakuten AI 2.0 (business-optimized)

Edge & Mobile

  • Ultra-Efficient: Gemma 3 270M (0.75% battery)
  • Reasoning: MobileLLM-R1 950M (2-5x boost)
  • Compact: TinyLlama 1.1B, CroissantLLM 1.3B
  • Quantized: GGUF format (Q2-Q8 levels)

Computer Vision

  • Object Detection: YOLO v11, YOLOv10, Grounding DINO
  • Segmentation: SAM 2 (44 FPS), TinySAM
  • Vision-Language: LLaVA 1.6, Florence-2, MiniGPT-4
  • Document AI: Granite-Docling-258M, PaddleOCR 3.0, TrOCR

Search & Retrieval

  • Image Embedding: CLIP-ViT-L/14, OpenCLIP, SigLIP 2
  • Text Retrieval: ColBERT-v2, E5-Large-v2, BGE-M3
  • Reranking: BGE Reranker v2-M3, Jina Reranker v2
  • Neural Search: OpenVision, all-MiniLM-L6-v2
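The embedding and reranker models above are typically combined in a two-stage pipeline: a cheap vector-similarity pass narrows the candidate set, then a heavier reranker reorders the survivors. A toy sketch with made-up vectors (in practice the embeddings would come from a model such as E5-Large-v2 or BGE-M3, and the rerank step from a cross-encoder like BGE Reranker v2-M3):

```python
# Stage 1 of retrieve-then-rerank: cosine similarity over embeddings.
# Vectors here are hand-picked toy values, not real model outputs.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = sorted(doc_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

docs = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [0.9, 0.3]}
print(retrieve([1.0, 0.0], docs))  # most similar documents first
```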

Audio & Speech

  • Speech Recognition: Wav2Vec2, SpeechT5
  • Speaker Tasks: WavLM (verification, diarization)
  • Synthesis: SpeechT5 (unified speech-text)
  • Self-supervised: Wav2Vec2 (representation learning)

Domain Specialists

  • Finance: FinGPT 7B, BloombergGPT 50B
  • Medical: BioGPT, Palmyra-Med 70B, OpenBioLLM 70B
  • Legal: LawLLM 7B (US legal system)
  • Cybersecurity: Cisco Foundation-sec-8B, Trend Cybertron

Time Series & Forecasting

  • Foundation: Chronos-T5 (250x faster), TimesFM 200M
  • Best Performance: Moirai 2.0 (#1 GIFT-Eval)
  • Business: Prophet (seasonality), NeuralProphet
  • Zero-shot: TimesFM (100B time-points trained)

Tabular & Structured Data

  • Deep Learning: TabNet (attention-based)
  • Gradient Boosting: XGBoost, LightGBM
  • Competitions: XGBoost (proven winner)
  • Efficiency: LightGBM (fast training)

Specialized Applications

  • Creative: Qwen-Image-Edit, InstantID, MusicGen
  • Robotics: OpenVLA 7B, SmolVLA 450M
  • Scientific: UMA (Meta), ChemBERTa-2, BioGPT
  • Security: Cisco Foundation-sec, Trend Cybertron

Detailed Model Comparison

| Model | Size | License | VRAM (FP16) | Strengths | Best For |
|---|---|---|---|---|---|
| Llama 3.3 70B | 70B | Custom (restrictive) | 140GB | Proven, multilingual, community | General purpose, enterprise |
| Mistral Small 3.1 | 22B | Apache 2.0 | 44GB | Fast, commercial-friendly | Commercial deployment |
| Qwen 2.5 72B | 72B | Apache 2.0 | 144GB | Data analysis, structured output | Enterprise data tasks |
| Gemma 3 27B | 27B | Custom (restrictive) | 54GB | Efficient, Google ecosystem | Research, prototyping |
| Phi-4 14B | 14B | MIT | 28GB | Strong reasoning, compact | Resource-constrained |
| DeepSeek R1 | 671B | MIT | 1342GB+ | Advanced reasoning, coding | Research, complex tasks |
| SmolLM3 3B | 3B | Apache 2.0 | 6GB | Multilingual, long context (64k) | Edge devices, mobile |
| VibeVoice 1.5B | 1.5B | MIT (disabled) | 4GB | Text-to-speech, 90min audio | Voice synthesis (research) |
| Qwen2.5-VL 7B | 7B | Apache 2.0 | 14GB | Vision, OCR, video understanding | Multimodal applications |
| ModernBERT | 139M/395M | Apache 2.0 | 1-2GB | Embeddings, 8k context | Text embeddings, RAG |
| Nomic-Embed v2 | 100M | Apache 2.0 | 500MB | MoE embeddings, 100 languages | Multilingual embeddings |
| FLUX.1 [dev] | 12B | Custom (non-commercial) | 24GB | Text-to-image, best quality | Image generation (research) |
| FLUX.1 [schnell] | 12B | Apache 2.0 | 24GB | Fast text-to-image generation | Commercial image generation |
| Stable Diffusion 3 | 2B/8B | Custom (restrictive) | 4-16GB | Text-to-image, established | Legacy image generation |
| Whisper Large v3 | 1.55B | MIT | 3GB | Speech recognition, 99 languages | Speech-to-text applications |
| Distil-Whisper v3 | 756M | MIT | 1.5GB | 6x faster, 49% smaller than Whisper | Real-time transcription |
| OpenAI GPT-OSS 120B | 120B | Apache 2.0 | 240GB | OpenAI's first open-weight model, o4-mini level | General purpose, reasoning |
| OpenAI GPT-OSS 20B | 20B | Apache 2.0 | 40GB | Compact version, o3-mini level performance | Edge deployment, reasoning |
| Qwen3 235B-A22B | 235B | Apache 2.0 | 470GB | MoE, 119 languages, beats DeepSeek R1 | Multilingual, enterprise |
| Qwen3 32B | 32B | Apache 2.0 | 64GB | Dense model, excellent multilingual | Production deployment |
| OLMo 2 32B | 32B | Apache 2.0 | 64GB | Fully open, beats GPT-3.5 Turbo | Research, transparency |
| NVIDIA Nemotron Nano 9B | 9B | Apache 2.0 | 18GB | Mamba-Transformer hybrid, 6x faster | Real-time reasoning |
| Command R+ 104B | 104B | CC-BY-NC-4.0 | 208GB | RAG optimized, tool use, 10 languages | Enterprise RAG, agents |
| MiniCPM-o 2.6 | 8B | Apache 2.0 | 16GB | Multimodal, beats GPT-4o on vision | Mobile multimodal |
| OpenBioLLM 70B | 70B | Apache 2.0 | 140GB | Medical domain, beats Med-PaLM-2 | Healthcare, biomedical |
| StarCoder 15B | 15B | OpenRAIL | 30GB | Code generation, 80+ languages | Code completion, development |
| MusicGen | 3.3B | CC-BY-NC-4.0 | 7GB | Music generation from text prompts | Audio/music creation |
| OpenSora 2.0 | Transformer | Apache 2.0 | Variable | Video generation, commercial quality | Video production |
| DeepSeek V3.1 | 685B | MIT | 1370GB | Hybrid thinking mode, beats GPT-5 | Advanced reasoning, research |
| Qwen 2.5-Max | ~70B | Apache 2.0 | 140GB | Alibaba's latest, beats DeepSeek V3 | Enterprise, multimodal |
| IBM Granite 3.0 8B | 8B | Apache 2.0 | 16GB | Enterprise model, 116 programming languages | Enterprise workflows, tools |
| Yi 1.5 34B | 34B | Apache 2.0 | 68GB | Bilingual (Chinese/English), reasoning | 01.AI flagship, bilingual |
| Baichuan 4 | 13B | Apache 2.0 | 26GB | Chinese domain specialist (law, finance) | Chinese business applications |
| ChatGLM-4.5 | ~13B | Apache 2.0 | 26GB | Agentic AI, cheaper than DeepSeek | Agent workflows, Chinese |
| CroissantLLM | 1.3B | MIT | 3GB | Truly bilingual French-English | French language applications |
| BLOOM | 176B | BigScience OpenRAIL-M | 352GB | 46 languages, 13 programming languages | Multilingual research |
| Rakuten AI 2.0 | MoE | Apache 2.0 | Variable | Japanese-optimized, MoE architecture | Japanese business applications |
| FinGPT | 7B | MIT | 14GB | Financial domain, sentiment analysis | Financial analysis, trading |
| BloombergGPT | 50B | Research only | 100GB | Finance-specific training data | Financial NLP, research |
| Palmyra-Med 70B | 70B | Commercial license | 140GB | Medical domain, beats Med-PaLM-2 | Healthcare applications |
| LawLLM | 7B | Apache 2.0 | 14GB | US legal system specialist | Legal research, compliance |
| Gemma 3 270M | 270M | Gemma License | 600MB | Ultra-efficient edge AI, 0.75% battery | Mobile, edge devices |
| TinyLlama | 1.1B | Apache 2.0 | 2.2GB | Compact LLaMA architecture | Resource-constrained devices |
| MobileLLM-R1 | 950M | Apache 2.0 | 2GB | Edge reasoning, 2-5x performance boost | Mobile reasoning, math |
| Cisco Foundation-sec-8B | 8B | Apache 2.0 | 16GB | Security-focused, threat detection | Cybersecurity, SOC operations |
| Trend Cybertron | 8B | Open Source | 16GB | Autonomous cybersecurity agents | Security automation, defense |
| Qwen-Image-Edit | 20B | Apache 2.0 | 40GB | Precise image editing, text rendering | Image editing, visual design |
| InstantID | Diffusion | Apache 2.0 | 8GB | Identity-preserving generation | Avatar creation, face swapping |
| ControlNet | Various | Apache 2.0 | Variable | Controlled image generation | Guided image synthesis |
| OpenVLA | 7B | MIT | 14GB | Vision-language-action for robots | Robotic manipulation |
| SmolVLA | 450M | Apache 2.0 | 1GB | Compact robotics model | Lightweight robotics |
| UMA (Meta) | Variable | Open Source | Variable | Universal atomic simulation, 10000x faster DFT | Materials science, chemistry |
| ChemBERTa-2 | 110M | MIT | 500MB | Chemical foundation model, SMILES | Drug discovery, chemistry |
| BioGPT | 355M | MIT | 1GB | Biomedical text generation, 78.2% PubMedQA | Biomedical research, literature |
| IBM SMILES-TED | Transformer | Apache 2.0 | Variable | 91M SMILES samples, chemical synthesis | Materials discovery, green chemistry |
| YOLO v11 | Varies (n,s,m,l,x) | AGPL-3.0 | Variable | Latest object detection, 22% fewer params | Real-time object detection |
| YOLOv10 | Varies (n,s,m,l,x) | AGPL-3.0 | Variable | End-to-end detection, no NMS needed | Efficient object detection |
| SAM 2 | Transformer | Apache 2.0 | Variable | Segment anything in images/videos, 44 FPS | Image/video segmentation |
| Florence-2 | 230M/770M | MIT | 1-2GB | Lightweight VLM, captioning, detection | Vision-language tasks |
| Grounding DINO | Transformer | Apache 2.0 | Variable | Open-set detection, 52.5 AP COCO zero-shot | Zero-shot object detection |
| LLaVA 1.6 | 7B/13B/34B | Apache 2.0 | 14-68GB | Large language and vision assistant | Multimodal conversations |
| MiniGPT-4 | 7B/13B | BSD 3-Clause | 14-26GB | Aligned vision encoder with LLM | Image understanding, creativity |
| BLIP-2 | 2.7B/7.8B | BSD 3-Clause | 6-16GB | Q-Former bridging vision and language | Vision-language pre-training |
| PaLI-3 | 5B | Apache 2.0 | 10GB | Multilingual vision-language, 100+ languages | Multilingual VL tasks |
| PaddleOCR 3.0 | Various | Apache 2.0 | Variable | PP-OCRv5, 13-point accuracy gain | OCR, document parsing |
| TrOCR | Transformer | MIT | Variable | End-to-end text recognition | Handwritten text OCR |
| Donut | 200M | MIT | 1GB | OCR-free document understanding | Document AI, form parsing |
| LayoutLMv3 | 134M | MIT | 500MB | Document understanding, 83.37 ANLS DocVQA | Document layout analysis |
| Granite-Docling-258M | 258M | Apache 2.0 | 1GB | End-to-end document conversion, 30x faster | Enterprise document processing |
| CLIP (OpenAI) | ViT-L/14 | MIT | Variable | Vision-language contrastive learning | Image embeddings, zero-shot |
| OpenCLIP | ViT-G/14 | Apache 2.0 | Variable | Open source CLIP implementation | Large-scale image embeddings |
| SigLIP 2 | Various | Apache 2.0 | Variable | Multilingual vision-language, sigmoid loss | Improved semantic understanding |
| OpenVision | Various | Apache 2.0 | Variable | 2-3x faster training than CLIP | Efficient vision encoding |
| BGE Reranker v2-M3 | 600M | Apache 2.0 | 1.2GB | Multilingual reranking, SOTA performance | RAG, search reranking |
| Jina Reranker v2 | Base | Apache 2.0 | Variable | 6x faster, multilingual, function-calling | Agentic RAG, code search |
| ColBERT | BERT-based | MIT | Variable | Efficient neural search with late interaction | Information retrieval |
| E5-Large-v2 | 335M | MIT | 1.3GB | Microsoft's text embedding model | Text similarity, retrieval |
| Chronos | Various | Apache 2.0 | Variable | Time series foundation model, 250x faster | Time series forecasting |
| TimesFM | 200M | Apache 2.0 | 800MB | Google's time series model, 100B time-points | Zero-shot forecasting |
| Moirai 2.0 | Transformer | Apache 2.0 | Variable | #1 on GIFT-Eval benchmark, decoder-only | Universal forecasting |
| Prophet | Statistical | MIT | Light | Meta's forecasting tool with seasonality | Business forecasting |
| NeuralProphet | Neural | MIT | Variable | 55-92% accuracy improvement over Prophet | Interpretable forecasting |
| Wav2Vec2 | Large | MIT | Variable | Self-supervised speech representation | Speech recognition, ASR |
| WavLM | 316M | MIT | 1.2GB | Speaker verification, diarization | Speaker tasks, speech processing |
| SpeechT5 | Transformer | MIT | Variable | Unified speech-text pre-training | Speech synthesis, recognition |
| TabNet | Various | Apache 2.0 | Variable | Attention-based tabular learning | Structured data, tabular ML |
| XGBoost | Tree-based | Apache 2.0 | Light | Extreme gradient boosting | Tabular data, competitions |
| LightGBM | Tree-based | MIT | Light | Fast gradient boosting framework | Efficient tabular learning |

Hardware Requirements

Consumer Hardware (12-24GB)

  • RTX 4090: 24GB - 7B models in FP16, up to 13B quantized
  • RTX 4080: 16GB - up to 7B models (FP16 or quantized)
  • Ultra-Light: Gemma 3 270M, TinyLlama 1.1B
  • Recommended: CroissantLLM 1.3B, IBM Granite 3.0 8B
  • Edge Reasoning: MobileLLM-R1 950M
  • Search/Embedding: CLIP, all-MiniLM-L6-v2
  • Audio: Wav2Vec2, SpeechT5
  • Tabular: XGBoost, LightGBM, TabNet
  • Quantization: GGUF Q4/Q8, QLoRA 4-bit
  • Mobile: 48 tokens/sec on Snapdragon X Elite

Professional (48-80GB)

  • A100 80GB: Single GPU up to 30B
  • H100 80GB: Faster training, larger batches
  • Recommended: OpenAI GPT-OSS 20B, Qwen 2.5-Max
  • Specialists: BioGPT, Cisco Foundation-sec, ChemBERTa
  • Regional: Yi 1.5 34B, Baichuan 4, ChatGLM-4.5
  • Time Series: TimesFM 200M, Chronos-T5, Moirai 2.0
  • Retrieval: BGE Reranker v2-M3, ColBERT-v2, E5-Large-v2
  • Techniques: DeepSpeed ZeRO Stage 2
  • Fine-tuning: Full parameter or large LoRA
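The DeepSpeed ZeRO Stage 2 technique mentioned above is driven by a JSON config file passed to the trainer. A minimal sketch (batch sizes and offload settings here are illustrative assumptions, not tuned recommendations):

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

Stage 2 partitions optimizer states and gradients across GPUs (Stage 3, used at enterprise scale, also partitions the parameters themselves); the CPU offload trades step time for headroom on the GPU.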

Enterprise (Multi-GPU)

  • 2-8x H100: 70B+ models
  • Multi-node: 400B+ models like DeepSeek R1
  • Latest Flagship: DeepSeek V3.1 685B (MIT license)
  • Enterprise: OpenAI GPT-OSS 120B, BLOOM 176B
  • Advanced: Qwen-Image-Edit 20B, OpenVLA 7B
  • Scientific: UMA (Meta), BloombergGPT 50B
  • Vision: SAM 2, YOLO v11, Florence-2, OpenCLIP
  • Audio/Speech: WavLM 316M, large Wav2Vec2 models
  • Techniques: DeepSpeed ZeRO Stage 3, FSDP
  • Infrastructure: InfiniBand, NVLink

Licensing & Legal Considerations

Permissive Licenses (Recommended)

  • Apache 2.0: Mistral, Qwen, EleutherAI models
  • MIT: Phi-4, some research models
  • Benefits: Commercial use, modification, distribution
  • Requirements: Attribution, license inclusion
  • Patent Protection: Apache 2.0 provides coverage

Custom Licenses (Caution)

  • Meta Llama: Custom license with restrictions
  • Gemma: Terms of Use with commercial limits
  • Restrictions: Revenue thresholds, use case limits
  • Derivative Works: Complex fine-tuning implications
  • Legal Review: Required for commercial use

Enterprise Considerations

  • Legal Compliance: OSI-approved preferred
  • Liability: No warranty in any open source
  • IP Rights: Unclear derivative work ownership
  • Commercial Support: Available for some models
  • Risk Assessment: Balance capability vs legal risk

Performance Insights

Key Performance Factors

  • Inference Speed: Llama 3 > Mistral > Qwen > Gemma
  • Reasoning: DeepSeek R1 > Phi-4 > Llama 3.3
  • Multilingual: Qwen 2.5 ≈ Llama 3.3 > others
  • Code Quality: DeepSeek Coder > Qwen Coder > Phi-4
  • Fine-tuning Speed: Smaller models train 2-5x faster

Cost Considerations

  • Training Cost: Scales quadratically with model size
  • Inference Cost: DeepSeek models 90% cheaper than others
  • Hardware: 70B models require $10K+ in GPUs
  • Cloud Training: $13 (LoRA) vs $322 (full fine-tuning)
  • Long-term: Consider inference volume costs
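Cloud fine-tuning cost is essentially GPU-count x hours x hourly rate. A tiny sketch with illustrative numbers chosen to land near the $13 (LoRA) vs $322 (full fine-tune) figures above; the rates and durations are assumptions, not provider quotes:

```python
# Hypothetical cloud fine-tuning cost: GPUs x hours x $/GPU-hour.
# All inputs below are illustrative assumptions.
def training_cost(gpu_count: int, hours: float,
                  rate_per_gpu_hour: float) -> float:
    return round(gpu_count * hours * rate_per_gpu_hour, 2)

lora_cost = training_cost(1, 6, 2.20)   # short QLoRA run, one GPU
full_cost = training_cost(8, 18, 2.24)  # full fine-tune, eight GPUs
print(lora_cost, full_cost)
```

The gap comes from both axes at once: LoRA needs fewer GPUs and finishes faster, which is why the two totals differ by more than an order of magnitude.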

Quick Decision Guide

Start Here (Budget < $5K)

  • General: Phi-4 (14B) - MIT license
  • Commercial: Mistral Small 3.1 - Apache 2.0
  • Hardware: RTX 4090 or cloud instances
  • Technique: QLoRA 4-bit fine-tuning
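Why QLoRA fits the budget tier: LoRA trains only two small low-rank matrices per adapted weight matrix instead of the weights themselves, so the trainable fraction is tiny. A pure-arithmetic sketch (the layer shapes and rank are illustrative, not taken from a specific model):

```python
# LoRA replaces the update to a d x d weight matrix with B @ A,
# where A is (rank x d) and B is (d x rank) -- 2 * d * rank
# trainable values per adapted matrix instead of d * d.
def lora_trainable_params(d_model: int, n_layers: int, rank: int,
                          matrices_per_layer: int = 4) -> int:
    return n_layers * matrices_per_layer * 2 * d_model * rank

full = 7_000_000_000  # ~7B params for a full fine-tune
lora = lora_trainable_params(d_model=4096, n_layers=32, rank=16)
print(f"{lora:,} trainable params ({100 * lora / full:.2f}% of full)")
```

At these assumed shapes the adapter is roughly 0.24% of the full model, which is what lets a 4-bit base model plus LoRA gradients fit on a single RTX 4090.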

Scale Up (Budget $5K-50K)

  • Performance: Llama 3.3 70B or Qwen 2.5 72B
  • Commercial: Check licensing carefully
  • Hardware: 2-4x A100/H100 GPUs
  • Technique: DeepSpeed ZeRO + LoRA

Enterprise (Budget $50K+)

  • Performance: DeepSeek R1 for reasoning
  • Reliable: Llama 3.3 for production
  • Infrastructure: Multi-node clusters
  • Support: Consider commercial partnerships
