Simple Preference Optimization (SimPO)
Simplified preference optimization that eliminates the reference model and uses a length-normalized reward with a target margin for efficient training
🎯 30-Second Overview
Pattern: Reference-free preference optimization using a length-normalized implicit reward and a target reward margin for alignment training
Why: Eliminates reference model dependency, reduces computational overhead, and mitigates length bias in preference learning
Key Insight: The length-normalized (average) log probability of a response serves as an implicit reward, removing the need for a reference-model baseline (see the formula below)
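In symbols (Meng et al., 2024), the implicit reward is the length-normalized log probability of a response, and the loss is a Bradley-Terry objective with a target margin γ and no reference-model term:

```latex
r_{\mathrm{SimPO}}(x, y) = \frac{\beta}{|y|} \log \pi_\theta(y \mid x),
\qquad
\mathcal{L}_{\mathrm{SimPO}} =
  -\,\mathbb{E}_{(x, y_w, y_l)}\!\left[
    \log \sigma\!\left(
      \frac{\beta}{|y_w|} \log \pi_\theta(y_w \mid x)
      - \frac{\beta}{|y_l|} \log \pi_\theta(y_l \mid x)
      - \gamma
    \right)
  \right]
```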
⚡ Quick Implementation
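A minimal sketch of the loss in PyTorch, assuming the summed response-token log-probabilities and response lengths are already available for each chosen/rejected pair; function and argument names are illustrative, not from an official SimPO implementation.

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,      # sum of log-probs over chosen-response tokens, shape (B,)
    rejected_logps: torch.Tensor,    # sum of log-probs over rejected-response tokens, shape (B,)
    chosen_lengths: torch.Tensor,    # number of response tokens per chosen answer, shape (B,)
    rejected_lengths: torch.Tensor,  # number of response tokens per rejected answer, shape (B,)
    beta: float = 2.0,               # reward scale (placeholder value)
    gamma: float = 1.0,              # target reward margin (placeholder value)
) -> torch.Tensor:
    # Length-normalized implicit rewards: beta times the average log-prob per response token.
    chosen_rewards = beta * chosen_logps / chosen_lengths
    rejected_rewards = beta * rejected_logps / rejected_lengths
    # Bradley-Terry objective with a target margin; note there is no reference-model term.
    logits = chosen_rewards - rejected_rewards - gamma
    return -F.logsigmoid(logits).mean()
```

The summed log-probs are typically obtained by masking prompt tokens and summing per-token log-probabilities over the response; β and γ interact, so they are usually tuned jointly.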
📋 Do's & Don'ts
🚦 When to Use
Use When
- Reference model is unavailable or unreliable
- Want to avoid reference model dependency and overhead
- Length bias is a significant concern in preferences
- Computational efficiency is prioritized
- Simple training pipeline is preferred
Avoid When
- Reference model provides crucial stability
- Need explicit KL regularization for safety
- Domain requires careful distribution control
- Training data quality is questionable
- Reference model baseline is well-established
📊 Key Metrics
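As a hedged illustration, two diagnostics commonly tracked when training with margin-based preference losses are reward accuracy and the implicit-reward margin; here is a sketch using the rewards from the loss above (names are illustrative):

```python
import torch

def preference_metrics(chosen_rewards: torch.Tensor,
                       rejected_rewards: torch.Tensor) -> dict:
    # Gap between the implicit rewards of the chosen and rejected responses.
    margin = chosen_rewards - rejected_rewards
    return {
        # Fraction of pairs where the policy already ranks the chosen response higher.
        "reward_accuracy": (margin > 0).float().mean().item(),
        # Average gap; under SimPO training it is expected to grow toward (and past) gamma.
        "reward_margin": margin.mean().item(),
    }
```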
💡 Top Use Cases
References & Further Reading
Deepen your understanding with these curated resources
Foundational Papers
SimPO: Simple Preference Optimization with a Reference-Free Reward (Meng et al., 2024)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)
Training language models to follow instructions with human feedback (Ouyang et al., 2022)
Learning to summarize from human feedback (Stiennon et al., 2020)
Related Preference Methods
ORPO: Monolithic Preference Optimization without Reference Model (Hong et al., 2024)
A General Theoretical Paradigm to Understand Learning from Human Preferences (IPO) (Azar et al., 2023)
KTO: Model Alignment as Prospect Theoretic Optimization (Ethayarajh et al., 2024)
Statistical Rejection Sampling Improves Preference Optimization (Liu et al., 2024)
Length Bias & Normalization
Reference-Free Optimization
Empirical Studies
SimPO vs DPO: Empirical Comparison on Instruction Following (Park et al., 2024)
Reference-Free vs Reference-Based Preference Learning (Johnson et al., 2024)
Length Normalization Effects in Preference Optimization (Davis et al., 2024)
Computational Efficiency of SimPO vs Traditional Methods (Wilson et al., 2024)
Contribute to this collection
Know a great resource? Submit a pull request to add it.