Advanced Architectural Frameworks for Scalable, Production-Grade Agentic RAG Pipelines

Sanjay Nakharu Prasad Kumar

doi:10.15662/IJRAI.2026.0901001

Authors

Sanjay Nakharu Prasad Kumar, Senior IEEE Member, USA


Keywords:

Retrieval-Augmented Generation, Agentic AI, Production Architecture, Context Engineering, Vector Databases, Knowledge Graphs, Infrastructure as Code

Abstract

The evolution of artificial intelligence from monolithic generative models to modular, retrieval-augmented architectures represents a fundamental shift in enterprise software engineering. This paper examines production-grade Retrieval-Augmented Generation (RAG) systems and introduces a six-layer architectural framework that addresses the limitations of standalone large language models through distributed computing, autonomous reasoning, and rigorous evaluation protocols. Our analysis indicates that modern RAG architectures require systematic context engineering rather than simple retrieval algorithms; the empirical evidence shows 2-3× improvements in GPU utilization from advanced inference engines and retrieval recall of up to 90% from layered retrieval strategies. The framework gives enterprise organizations a blueprint for building reliable, scalable AI systems that can process millions of documents while maintaining low latency and high grounding fidelity.

