Test-Time Scaling (TTS)
Improving model performance through increased inference-time computation rather than larger models
🎯 30-Second Overview
Pattern: Allocate additional compute at inference time to improve performance through multiple attempts and verification
Why: Enables better accuracy on complex tasks, flexible compute allocation, and performance scaling without retraining
Key Insight: More inference-time compute can substitute for larger models or more training data on reasoning tasks
⚡ Quick Implementation
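A minimal best-of-N sketch of the pattern in Python, assuming you supply the two callables yourself: `generate` (one sampled completion from whatever model you use) and `score` (a verifier; higher is better). Both names are hypothetical placeholders, not a specific library API.

```python
import random
from typing import Callable

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],      # one sampled completion from your model
    score: Callable[[str, str], float],  # verifier: higher score = better candidate
    n: int = 8,
) -> str:
    """Best-of-N test-time scaling: spend n inference calls on the same
    prompt, then keep the candidate the verifier scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

# Toy demo: the "model" guesses, the "verifier" checks against the known answer.
answer = best_of_n(
    "What is 6 * 7?",
    generate=lambda p: str(random.randint(40, 44)),
    score=lambda p, c: 1.0 if c == "42" else 0.0,
    n=8,
)
print(answer)  # "42" with high probability; raise n to push it higher
```

The same skeleton covers reranking with a reward model or filtering by unit tests; the only requirement is a verifier that is cheaper or more reliable than generation itself.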
📋 Do's & Don'ts
🚦 When to Use
Use When
- Complex reasoning tasks requiring multiple solution paths (see the self-consistency sketch after these lists)
- High-stakes decisions where accuracy is prioritized over speed
- Problems where verification is easier than generation
- Tasks with clear, objective evaluation criteria
- Applications with flexible inference-time budgets
Avoid When
- Simple tasks with straightforward solutions
- Real-time applications with strict latency requirements
- Problems without reliable verification methods
- Cost-sensitive applications with tight budgets
- Tasks where the first attempt is typically sufficient
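For the first "Use When" item, the canonical lightweight instantiation is self-consistency (Wang et al., 2022, listed in the references below): sample several independent reasoning paths at nonzero temperature and majority-vote over the extracted final answers. A minimal sketch, again with hypothetical `generate` and `extract_answer` placeholders:

```python
from collections import Counter
from typing import Callable

def self_consistency(
    prompt: str,
    generate: Callable[[str], str],        # samples one reasoning path (temperature > 0)
    extract_answer: Callable[[str], str],  # pulls the final answer out of a path
    n: int = 16,
) -> str:
    """Self-consistency: sample n diverse chains of thought for the same
    prompt, then return the most frequent final answer."""
    answers = [extract_answer(generate(prompt)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

Unlike best-of-N, this needs no trained verifier, only an answer extractor, so it fits tasks whose final answer is short and comparable (math word problems, multiple choice) rather than free-form text.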
📊 Key Metrics
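A metric commonly reported when studying test-time scaling is pass@k: the probability that at least one of k sampled attempts solves the problem. The sketch below implements the unbiased estimator from the Codex paper (Chen et al., 2021), where n is the total number of samples drawn per problem and c the number that passed:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at
    least one of k samples, drawn from n total of which c are correct,
    is correct. Equals 1 - C(n-c, k) / C(n, k), computed stably."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=100, c=25, k=1))   # 0.25: single-attempt accuracy
print(pass_at_k(n=100, c=25, k=10))  # ~0.95: ceiling with a perfect verifier
```

Note that pass@k assumes a perfect verifier, so it measures coverage rather than end-to-end accuracy; reporting it alongside cost (total tokens or model calls per problem) makes the accuracy-compute trade-off explicit.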
💡 Top Use Cases
References & Further Reading
Deepen your understanding with these curated resources
Search & Tree-Based Reasoning
Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Yao et al., 2023)
Graph of Thoughts: Solving Elaborate Problems with Large Language Models (Besta et al., 2023)
Competition-Level Code Generation with AlphaCode (Li et al., 2022)
Learning to Search with Language Models (Beurer-Kellner et al., 2023)
Self-Consistency & Majority Voting
Self-Consistency Improves Chain of Thought Reasoning in Language Models (Wang et al., 2022)
Complex Reasoning: The Divide and Conquer Approach (Zhou et al., 2022)
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models (Zhou et al., 2022)
Universal Self-Consistency for Large Language Model Generation (Chen et al., 2023)
Recent Advances (2023-2024)
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (Zelikman et al., 2024)
Rest-of-World Latent Search (RoWLS) for Test-Time Scaling (Chen et al., 2024)
Test-Time Training for Large Language Models (Liu et al., 2024)
Scaling LLM Test-Time Compute Optimally Can Be More Effective Than Scaling Model Parameters (Snell et al., 2024)
Mathematical & Scientific Reasoning
Solving Quantitative Reasoning Problems with Language Models (Lewkowycz et al., 2022)
Measuring Mathematical Problem Solving with the MATH Dataset (Hendrycks et al., 2021)
Competition-Level Mathematics with AlphaGeometry (Trinh et al., 2024)
FunSearch: Making New Discoveries in Mathematical Sciences Using Large Language Models (Romera-Paredes et al., 2023)
Code Generation & Programming
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation (Wang et al., 2021)
Competition-Level Code Generation with AlphaCode (Li et al., 2022)
Code as Policies: Language Model Programs for Embodied Control (Liang et al., 2022)
Teaching Large Language Models to Self-Debug (Chen et al., 2023)
Contribute to this collection
Know a great resource? Submit a pull request to add it.