Agentic RAG in Practice: Strategies for Smarter Retrieval
TL;DR
- Standard RAG struggles when query complexity increases.
- Advanced RAG adds pre/post retrieval logic plus branching strategies.
- Agentic RAG introduces an orchestrator that selects strategies and self-corrects.
- Metadata filtering, re-ranking, and hybrid search dramatically improve accuracy.
- AWS now provides notebooks, APIs, and patterns for production-grade agentic RAG.
This session centered on a consistent theme: most RAG issues aren't LLM issues—they’re retrieval issues. AWS laid out how adding structure, branching, and agentic reasoning to retrieval pipelines drastically improves quality and reduces retries.
Foundational RAG Patterns
Standard RAG is simple: embed content, chop it into chunks, retrieve the top matches, and feed them to the LLM. It's straightforward, scalable, and works great for direct, narrow queries. But as soon as questions become multi-step, ambiguous, or domain-specific, the system often falls apart.
Standard RAG Flow
- User issues a query.
- Retriever pulls chunks based on embedding similarity.
- LLM generates an answer using whatever context it was given.
The problem: every query gets exactly the same retrieval strategy.
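As a concrete baseline, the standard flow can be sketched in a few lines of Python. The character-frequency `embed` and the string-formatting `answer` below are toy stand-ins for a real embedding model and LLM call, included only to make the shape of the pipeline explicit.

```python
# Minimal sketch of standard RAG: embed, retrieve top-k by similarity,
# then generate from whatever context came back.

def embed(text: str) -> list[float]:
    # Toy embedding: character-frequency vector (real systems use a model).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    qv = embed(query)
    ranked = sorted(chunks, key=lambda c: similarity(qv, embed(c)), reverse=True)
    return ranked[:top_k]

def answer(query: str, chunks: list[str]) -> str:
    context = "\n".join(retrieve(query, chunks))
    # A real system would call an LLM here with the query plus context.
    return f"Answer based on:\n{context}"
```

Note that nothing in this flow inspects the query: a one-word lookup and a multi-part research question take the identical path, which is exactly the limitation the rest of the session addresses.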
Advanced RAG
Advanced RAG inserts additional intelligence on both sides of retrieval:
Pre-retrieval steps
- Query rewriting
- Classification
- Applying metadata constraints
- Relevance expansion
Post-retrieval steps
- Re-ranking
- Filtering
- Merging and deduplicating
The result: a far more context-aware pipeline that can adapt retrieval to the shape of the user's question.
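Sketched in Python, with the pre-retrieval constraint expressed as a metadata filter and the post-retrieval steps as re-ranking plus deduplication. The `Doc` schema and the `rewrite_query` heuristic are illustrative assumptions, not a specific AWS API.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    domain: str   # metadata used as a pre-retrieval constraint
    score: float  # raw retriever score

def rewrite_query(query: str) -> str:
    # Placeholder: real systems use an LLM or rules to expand/normalize.
    return query.strip().lower()

def pre_filter(docs: list[Doc], domain: str) -> list[Doc]:
    # Pre-retrieval: restrict the candidate pool via metadata.
    return [d for d in docs if d.domain == domain]

def post_process(docs: list[Doc], top_k: int = 3) -> list[Doc]:
    # Post-retrieval: re-rank by score, drop duplicates, keep top-k.
    seen, out = set(), []
    for d in sorted(docs, key=lambda d: d.score, reverse=True):
        if d.text not in seen:
            seen.add(d.text)
            out.append(d)
    return out[:top_k]
```

The key design point is that both halves are independent of the retriever itself, so they can be layered onto an existing vector store without reindexing.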
Advanced RAG Techniques
Conditional Branching
This technique routes a query to the single most appropriate vector store using rules or heuristics. For example, internal HR policy questions, product documentation lookups, and code-snippet searches can each go to a separate store.
A lightweight routing step drastically improves relevance when you have multiple domains.
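A routing step can be as simple as keyword rules over the query text. The store names and keyword lists below are invented for illustration; production routers often use a small classifier or an LLM call instead of hand-written rules.

```python
# Rule-based router: pick one vector store per query.
# Store names and keywords are hypothetical examples.
ROUTES = {
    "hr_policies": ("vacation", "benefits", "payroll", "leave"),
    "product_docs": ("install", "configure", "api", "error"),
    "code_snippets": ("function", "class", "snippet", "example"),
}

def route(query: str, default: str = "product_docs") -> str:
    q = query.lower()
    for store, keywords in ROUTES.items():
        if any(k in q for k in keywords):
            return store
    return default  # fall back when no rule matches
```

Even this crude version prevents the common failure mode of HR answers being polluted by product-documentation chunks, at the cost of one cheap string scan per query.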
Parallel Branching & Retrieval Fusion
Instead of routing to just one store, the system can:
- Reformulate the query in multiple ways.
- Send each version to different vector stores (or different retrieval methods).
- Combine the results through a fusion step.
This pattern is ideal for broad questions, underspecified queries, or highly heterogeneous knowledge bases.
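One common choice for the fusion step is reciprocal rank fusion (RRF), which rewards documents that rank well across several result lists without needing comparable raw scores. The session did not prescribe a specific fusion method, so RRF here is an assumption; the constant `k = 60` is the conventional default from the RRF literature.

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    # Each document earns 1 / (k + rank) from every list it appears in;
    # documents ranked highly in multiple lists rise to the top.
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only consumes rank positions, it can merge results from heterogeneous retrievers (dense, sparse, different stores) whose scores live on incompatible scales.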
Query Reformulation in Bedrock
AWS demonstrated the RAG API’s ability to automatically generate multiple sub-queries from a single user query. Each sub-query is independently retrieved, and the system then pools and ranks the results.
The benefit is improved recall without manually authoring prompt templates or hand-tuning retrieval logic. It’s especially effective when the initial user query lacks specificity.
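The underlying pattern can be illustrated with a deliberately naive decomposition heuristic. Bedrock performs this reformulation with a model; the simple "and"-splitting below is an assumption standing in for that step, and `pooled_retrieve` shows the pooling half.

```python
def decompose(query: str) -> list[str]:
    # Naive stand-in for model-driven sub-query generation:
    # split a compound question on " and ".
    parts = [p.strip() for p in query.split(" and ") if p.strip()]
    return parts or [query]

def pooled_retrieve(query: str, retrieve) -> list[str]:
    # Retrieve each sub-query independently, then pool unique results
    # (a real system would also rank the pooled set).
    pooled: list[str] = []
    for sub in decompose(query):
        for doc in retrieve(sub):
            if doc not in pooled:
                pooled.append(doc)
    return pooled
```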
Self-Corrective Agentic RAG
This was the highlight of the session. Instead of retrieval being a fixed pipeline, you introduce a central agent that orchestrates the entire workflow.
A self-corrective loop looks like this:
- User posts a question.
- The agent retrieves context and evaluates relevance.
- The agent selects a strategy:
  - Query expansion
  - Query decomposition
  - Combined strategies
- The LLM generates a response.
- A quality check evaluates:
  - Relevance
  - Completeness
  - Factual accuracy
- If the response fails, the agent loops and adapts.
- Once it produces a satisfactory answer, or exhausts its retry budget, it finalizes.
This is the emerging canonical pattern for reliable RAG—dynamic, adaptive, and quality-aware.
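A minimal sketch of that loop, with `retrieve`, `generate`, and `check` injected as callables so any retriever, LLM, and quality gate can be plugged in. The strategy ladder (expand, then decompose, then combined) and the retry budget are assumptions about one reasonable ordering, not a prescribed sequence.

```python
def self_corrective_rag(query, retrieve, generate, check, max_attempts=3):
    # Escalate through strategies until the quality gate passes
    # or the retry budget runs out.
    strategies = ["expand", "decompose", "combined"]
    answer = None
    for attempt in range(max_attempts):
        strategy = strategies[min(attempt, len(strategies) - 1)]
        context = retrieve(query, strategy)
        answer = generate(query, context)
        if check(answer):  # relevance / completeness / accuracy gate
            return answer, attempt + 1
    return answer, max_attempts  # finalize best effort after retries
```

The important property is that failure is cheap: a rejected answer costs one more loop iteration with a different strategy, rather than surfacing a bad response to the user.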
Enhancing RAG Accuracy
The talk separated accuracy into two flows: ingestion and retrieval.
Ingestion Improvements
- Better chunking strategy (structural + semantic)
- Parsing using foundation models for accuracy
- Multimodal parsing for scanned documents or images
- Metadata labeling (critical for filtering and access control)
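For instance, a toy structural chunker that splits on paragraph boundaries and packs paragraphs into a size budget. Real pipelines layer semantic boundary detection on top and attach metadata to each chunk; the character budget here is an arbitrary illustrative choice.

```python
def chunk(text: str, max_chars: int = 200) -> list[str]:
    # Structural pass: split on blank-line paragraph boundaries,
    # then merge adjacent paragraphs until the size budget is hit.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks
```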
Retrieval Improvements
- Metadata filtering (including tenant isolation)
- Re-ranking using cross-encoders or LLM-based scoring
- Hybrid search (sparse + dense)
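Hybrid search can be sketched as a weighted sum of a sparse (keyword-overlap) score and a dense (embedding) score. The Jaccard sparse scorer, the injected `dense_score` callable, and the 0.5/0.5 weighting below are illustrative choices, not the specific implementation AWS showed.

```python
def sparse_score(query: str, doc: str) -> float:
    # Jaccard overlap of word sets: a crude stand-in for BM25.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

def hybrid_rank(query, docs, dense_score, alpha=0.5):
    # Blend dense and sparse signals; alpha trades one off against the other.
    scored = [
        (alpha * dense_score(query, d) + (1 - alpha) * sparse_score(query, d), d)
        for d in docs
    ]
    return [d for s, d in sorted(scored, reverse=True)]
```

The sparse term keeps exact keywords (IDs, error codes, product names) from being washed out by embedding similarity, which is why hybrid search tends to outperform either signal alone on technical corpora.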
A notable callout: access control for vector stores using metadata filtering with Amazon Bedrock Knowledge Bases.
Key Takeaways
- RAG failures typically arise from misaligned retrieval strategies, not bad LLMs.
- An orchestrator agent can analyze query complexity and select the appropriate retrieval strategy upfront.
- This reduces retries and significantly improves output quality.
- There is an emerging abstract RAG workflow that production teams can adopt to “right size” retrieval based on query type.
- Chunking, metadata, hybrid search, and re-ranking remain high-leverage accuracy tools.
- AWS now provides a complete notebook demonstrating self-corrective agentic RAG patterns.
Notebook reference:
https://github.com/aws-samples/amazon-bedrock-samples/tree/main/rag/knowledge-bases/use-case-examples/agentic-self-corrective-rag-kb-langraph
Further Reading & Resources
- AWS re:Invent session catalog: https://reinvent.awsevents.com