Autonomous Code Generation with Reinforcement-Learning-Based LLMs
DOI: https://doi.org/10.15662/IJRAI.2025.0806003

Keywords: Autonomous Code Generation, Reinforcement Learning, Large Language Models, Code Synthesis, Functional Correctness, Execution-Based Feedback, Human Feedback Alignment, Code Optimization

Abstract
The integration of reinforcement learning (RL) with large language models (LLMs) has significantly advanced autonomous code generation. Traditional supervised learning approaches often overlook complexities inherent in code synthesis, such as functional correctness and adherence to coding standards. Reinforcement learning addresses these challenges by enabling models to learn from feedback, optimizing the code generation process. This paper explores the synergy between RL and LLMs in code generation, highlighting key methodologies, advancements, and applications. We examine frameworks such as CodeRL, which employs an actor-critic architecture to refine code outputs using functional-correctness feedback. We also discuss the role of execution-based feedback, as seen in PPOCoder, which applies Proximal Policy Optimization to improve code generation. The paper further examines reinforcement learning from human feedback (RLHF), focusing on its application in aligning LLM outputs with human preferences. Finally, we address the challenges associated with these approaches, including the need for diverse training data and the difficulty of reward signal design. Through a comprehensive review, this paper provides insights into the current landscape of RL-enhanced LLMs for code generation and outlines directions for future research.
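The execution-based feedback described above can be illustrated with a minimal sketch: a generated program is run against unit tests and the outcome is mapped to a scalar reward that an RL algorithm such as PPO could then optimize. The function name and the specific reward values below are assumptions chosen for illustration; they are not the actual CodeRL or PPOCoder implementation.

```python
import subprocess
import sys
import tempfile

def execution_reward(candidate_code: str, test_code: str,
                     timeout: float = 5.0) -> float:
    """Map a generated program's execution outcome to a scalar reward.

    Reward scheme (an assumption for this sketch):
      +1.0  all tests pass
      -0.3  tests run but an assertion fails
      -0.6  runtime error or timeout
      -1.0  syntax error (program does not parse)
    """
    # A program that does not parse gets the lowest reward.
    try:
        compile(candidate_code, "<candidate>", "exec")
    except SyntaxError:
        return -1.0

    # Write candidate plus tests to a temp file and run it in a subprocess,
    # so crashes and infinite loops cannot take down the training process.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n" + test_code)
        path = f.name

    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return -0.6

    if result.returncode == 0:
        return 1.0
    # Distinguish wrong behaviour (failed assertion) from a crash.
    if b"AssertionError" in result.stderr:
        return -0.3
    return -0.6
```

For example, a correct `def add(a, b): return a + b` paired with `assert add(2, 3) == 5` would receive +1.0, while a version returning `a - b` would receive -0.3. Graded rewards like these give the policy a denser learning signal than a binary pass/fail outcome.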
References
1. Le, H., Wang, Y., Gotmare, A. D., Savarese, S., & Hoi, S. C. H. (2022). CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning. arXiv.
2. Shojaee, P., Jain, A., Tipirneni, S., & Reddy, C. K. (2023). Execution-based Code Generation using Deep Reinforcement Learning. arXiv.
3. Jain, A., Adiole, C., Chaudhuri, S., Reps, T., & Jermaine, C. (2023). Coarse-Tuning Models of Code with Reinforcement Learning Feedback. arXiv.
4. Wikipedia contributors. (2023). Reinforcement learning. Wikipedia.
5. Wikipedia contributors. (2023). Reinforcement learning from human feedback. Wikipedia.
6. Kashyap, A. (2024). Harnessing Advanced Reinforcement Learning and Transformers: Shaping the Future of Autonomous Generation. Medium.
7. Rasheed, Z., Sami, M. A., Kemell, K.-K., Waseem, M., Saari, M., Systä, K., & Abrahamsson, P. (2024). CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology. arXiv.