
Advanced RAG (ARAG)

Enhanced RAG with pre-retrieval and post-retrieval optimizations including query expansion, reranking, and context curation

Complexity: medium
Category: Knowledge Retrieval (RAG)

🎯 30-Second Overview

Pattern: Enhanced retrieval pipeline with query preprocessing, multi-stage retrieval, neural reranking, and context optimization

Why: Addresses limitations of naive RAG through query understanding, relevance scoring, and context quality optimization

Key Insight: Optimizing the query before retrieval and reranking and filtering the results after retrieval improves answer accuracy and relevance over naive RAG

⚡ Quick Implementation

1. Pre-process: Query expansion, rewriting, routing
2. Multi-retrieve: Multiple retrieval strategies and sources
3. Rerank: Neural rerankers (BGE, Cohere, etc.)
4. Filter: Relevance scoring and context selection
5. Generate: Context-optimized generation with citations

Example: expand_query → multi_retrieve → neural_rerank → filter_context → generate_with_citations
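
The five steps above can be sketched end to end as follows. This is a minimal, stdlib-only illustration rather than a reference implementation: the corpus `DOCS`, the `SYNONYMS` table, and both toy retrievers are hypothetical stand-ins, and the token-overlap score inside `neural_rerank` is only a placeholder for a real neural reranker such as BGE or Cohere. The function names follow the example chain; everything else is an assumption.

```python
from collections import defaultdict

# Toy corpus (hypothetical); a real system would query a search index or vector store.
DOCS = {
    "d1": "Reranking reorders retrieved passages using a cross-encoder model.",
    "d2": "Query expansion adds synonyms and related terms before retrieval.",
    "d3": "Bananas are rich in potassium.",
}

# Hypothetical synonym table standing in for HyDE, query2doc, or LLM rewriting.
SYNONYMS = {"rerank": ["reranking", "reorder"], "expand": ["expansion"]}


def tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".,?!").lower() for t in text.split() if t.strip(".,?!")]


def expand_query(query):
    """Step 1: return the original query plus a synonym-expanded variant, if any."""
    extra = [s for t in tokens(query) for s in SYNONYMS.get(t, [])]
    return [query, query + " " + " ".join(extra)] if extra else [query]


def keyword_retrieve(query, k):
    """Rank docs by shared tokens (stand-in for a lexical retriever such as BM25)."""
    q = set(tokens(query))
    scored = [(len(q & set(tokens(text))), doc_id) for doc_id, text in DOCS.items()]
    return [d for s, d in sorted(scored, key=lambda x: (-x[0], x[1])) if s > 0][:k]


def trigram_retrieve(query, k):
    """Rank docs by shared character trigrams (stand-in for a dense retriever)."""
    def grams(text):
        s = text.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)}
    q = grams(query)
    scored = [(len(q & grams(text)), doc_id) for doc_id, text in DOCS.items()]
    return [d for s, d in sorted(scored, key=lambda x: (-x[0], x[1])) if s > 0][:k]


def multi_retrieve(queries, k=3):
    """Step 2: fuse every (query, retriever) ranking with reciprocal rank fusion."""
    fused = defaultdict(float)
    for q in queries:
        for retriever in (keyword_retrieve, trigram_retrieve):
            for rank, doc_id in enumerate(retriever(q, k), start=1):
                fused[doc_id] += 1.0 / (60 + rank)  # 60 is a commonly used constant
    return sorted(fused, key=lambda d: (-fused[d], d))


def neural_rerank(query, doc_ids):
    """Step 3: score (query, doc) pairs. Jaccard overlap is a PLACEHOLDER
    for a cross-encoder; swap in a real reranker here."""
    q = set(tokens(query))
    scored = []
    for doc_id in doc_ids:
        d = set(tokens(DOCS[doc_id]))
        scored.append((doc_id, len(q & d) / len(q | d)))
    return sorted(scored, key=lambda x: (-x[1], x[0]))


def filter_context(scored, threshold=0.05, max_docs=2):
    """Step 4: keep docs at or above the threshold, capped at max_docs."""
    return [doc_id for doc_id, score in scored if score >= threshold][:max_docs]


def generate_with_citations(query, doc_ids):
    """Step 5: build a numbered-source prompt; a real system would send it to an LLM."""
    context = "\n".join(f"[{i}] {DOCS[d]}" for i, d in enumerate(doc_ids, 1))
    return (
        "Answer using only the sources below and cite them as [n].\n"
        f"{context}\nQuestion: {query}"
    )


def answer(query):
    """Run the full chain and return (prompt, kept_doc_ids)."""
    queries = expand_query(query)
    candidates = multi_retrieve(queries)
    ranked = neural_rerank(queries[-1], candidates)
    kept = filter_context(ranked)
    return generate_with_citations(query, kept), kept
```

Calling `answer("How does rerank work?")` on this toy corpus keeps only `d1`: the unexpanded query shares no tokens with any document, so the match comes from the expanded variant, which is the kind of recall gain step 1 is meant to provide. The threshold and `max_docs` values are arbitrary and would need tuning on real data.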

📋 Do's & Don'ts

✅ Implement query expansion (HyDE, query2doc)
✅ Use neural rerankers (BGE-reranker, Cohere rerank)
✅ Apply sentence window retrieval for context preservation
✅ Implement relevance filtering with confidence thresholds
✅ Use multiple embedding models for retrieval diversity
❌ Skip query preprocessing and expansion
❌ Rely solely on semantic similarity for ranking
❌ Ignore document quality and freshness signals
❌ Over-retrieve without proper filtering mechanisms
❌ Neglect context window optimization

🚦 When to Use

Use When

  • Production RAG systems requiring high accuracy
  • Complex queries needing contextual understanding
  • Large knowledge bases with noisy content
  • Multi-domain or heterogeneous data sources
  • Applications requiring source attribution

Avoid When

  • Simple factual Q&A with clean data
  • Resource-constrained environments
  • Real-time applications (<100 ms latency)
  • Small knowledge bases with high-quality content
  • Proof-of-concept or prototype systems

📊 Key Metrics

Retrieval Precision: Relevant docs in top-k after reranking
Answer Faithfulness: Generated content grounded in retrieved docs
Context Relevance: Retrieved context relevance to query
Reranking Effectiveness: NDCG@k improvement vs base retrieval
Query Understanding: Semantic similarity after expansion/rewriting
End-to-End Latency: Including pre-processing and reranking overhead
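
Retrieval precision and reranking effectiveness can be computed from ranked document IDs and graded relevance labels. The sketch below assumes hand-labeled relevance judgments (the `relevance` dictionary and both rankings are invented for illustration) and uses the linear-gain form of DCG, gain / log2(rank + 1); some evaluations use 2^gain - 1 instead.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Share of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


def dcg_at_k(gains, k):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevance, k):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    gains = [relevance.get(d, 0) for d in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Hypothetical graded judgments (0 = irrelevant, 3 = highly relevant).
relevance = {"a": 3, "b": 2, "c": 0, "d": 1}
base_ranking = ["c", "a", "d", "b"]      # order from first-stage retrieval
reranked = ["a", "b", "c", "d"]          # order after reranking

# Reranking effectiveness: NDCG@k improvement over base retrieval.
improvement = ndcg_at_k(reranked, relevance, 3) - ndcg_at_k(base_ranking, relevance, 3)
```

A positive `improvement` means the reranker moved higher-graded documents toward the top of the list. In practice the score would be averaged over a labeled query set rather than read off a single query.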

💡 Top Use Cases

Enterprise Search: Complex queries over large corporate knowledge bases with reranking
Legal Research: Multi-hop reasoning over case law with query expansion and relevance filtering
Medical Q&A: Clinical queries with domain-specific rerankers and confidence scoring
Technical Documentation: Developer queries with code-aware retrieval and context optimization
Research Assistant: Academic queries with citation tracking and multi-source retrieval
