Trustworthy LLM Agents for Autonomous Decision-Making

Authors

  • Shraddha Rajeshwar Iyer, R.L.S. Govt. College, Kaladera, Jaipur, Rajasthan, India

DOI:

https://doi.org/10.15662/IJRAI.2024.0701002

Keywords:

Trustworthy AI, LLM Agents, Autonomous Decision-Making, Explainability, Human-Centered AI, DevOps for AI, Policy-as-a-Service (PaaS), Self-Monitoring Agents, Trust Metrics, Runtime Compliance

Abstract

As large language models (LLMs) increasingly serve as autonomous agents, making decisions and taking actions with minimal human oversight, the question of trustworthiness becomes paramount. This paper examines the challenges and approaches involved in building trustworthy LLM-based agents capable of autonomous decision-making, focusing on the landscape before 2022. We explore foundational principles of trustworthy AI agents, such as reliability, safety, explainability, human-centered design, and policy-aware behavior. Core frameworks include human-centered trust metrics for autonomous systems and DevOps-aligned trust across AI lifecycles. These guide the development of LLM agents that adapt, self-monitor, and respect human values. Our literature review emphasizes integrating continuous monitoring and agile development practices for AI agents, extending trustworthy design into runtime. Recognizing agents' socio-technical complexity, we also consider policy-as-a-service approaches that embed ethical and regulatory norms into agent behavior. The methodology proposes designing autonomous agents with layered trust mechanisms: robust planning systems, self-reflection capabilities, human-in-the-loop checkpoints for high-stakes decisions, and runtime policy enforcement interfaces (a minimal sketch of this layering follows the abstract). Advantages of such designs include adaptability, improved reliability, accountability, and compliance. Challenges include modeling human trust in AI, aligning complex behaviors with regulation, developing real-time oversight infrastructure, and balancing autonomy against safety. We conclude that while trustworthy LLM agents remain aspirational, pre-2022 foundations provide a coherent roadmap combining engineering, process, and policy perspectives. Future directions include formal verification of agent behaviors, interpretable reasoning modules, multi-agent trust protocols, and holistic development-to-deployment pipelines that bake in trust from design to runtime.
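To make the layered trust mechanisms concrete, the sketch below shows one hypothetical way to compose them: a planning layer, a one-shot self-reflection retry, a runtime policy check, and a human-in-the-loop gate for high-risk decisions. All names here (TrustGatedAgent, Decision, risk_threshold) are illustrative assumptions, not constructs from the paper, and the planner callable is a stand-in for an actual LLM call.

```python
# Illustrative sketch only: composes the four trust layers named in the
# abstract. The plan/policy/approval callables are stubs for real systems.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    action: str
    rationale: str
    risk: float  # agent's self-assessed risk score in [0, 1]


class TrustGatedAgent:
    def __init__(self,
                 plan: Callable[[str], Decision],
                 policy_check: Callable[[Decision], bool],
                 human_approve: Callable[[Decision], bool],
                 risk_threshold: float = 0.7):
        self.plan = plan                    # robust planning layer (e.g., an LLM)
        self.policy_check = policy_check    # runtime policy-as-a-service hook
        self.human_approve = human_approve  # human-in-the-loop checkpoint
        self.risk_threshold = risk_threshold

    def decide(self, task: str) -> Optional[Decision]:
        decision = self.plan(task)
        # Self-reflection layer: re-plan once if no rationale was produced.
        if not decision.rationale:
            decision = self.plan(task + " (explain your reasoning)")
        # Runtime policy enforcement: block non-compliant actions outright.
        if not self.policy_check(decision):
            return None
        # Human checkpoint: high-stakes decisions require explicit approval.
        if decision.risk >= self.risk_threshold and not self.human_approve(decision):
            return None
        return decision


# Minimal usage with stubbed layers; a real deployment would wire in an
# LLM planner, a policy service, and a review interface.
agent = TrustGatedAgent(
    plan=lambda task: Decision("reply", "low-impact informational answer", 0.2),
    policy_check=lambda d: "delete" not in d.action,
    human_approve=lambda d: False,  # stand-in for a human review UI
)
print(agent.decide("summarize the incident report"))
```

The design choice worth noting is that each layer can only veto, never escalate privileges: a decision must pass every gate, which keeps the safety checks composable and auditable.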

References

1. Martínez-Fernández, S., Franch, X., Jedlitschka, A., Oriol, M., & Trendowicz, A. (2020). Developing and Operating Artificial Intelligence Models in Trustworthy Autonomous Systems. arXiv preprint.

2. He, H., Gray, J., Cangelosi, A., Meng, Q., McGinnity, T. M., & Mehnen, J. (2021). The Challenges and Opportunities of Human-Centered AI for Trustworthy Robots and Autonomous Systems. arXiv preprint.

3. Morris, A., Siegel, H., & Kelly, J. (2020). Towards a Policy-as-a-Service Framework to Enable Compliant, Trustworthy AI and HRI Systems in the Wild. arXiv preprint.

Published

2024-01-01

How to Cite

Trustworthy LLM Agents for Autonomous Decision-Making. (2024). International Journal of Research and Applied Innovations, 7(1), 10106-10108. https://doi.org/10.15662/IJRAI.2024.0701002