Foundation Models for Code: Accuracy vs. Security Trade-offs
DOI: https://doi.org/10.15662/IJRAI.2022.0503002

Keywords: Foundation models, Code generation, Codex, CodeT5, Accuracy vs. security, Vulnerabilities in generated code, Secure-by-design AI coding, Prompt engineering, 2021 code models

Abstract
Foundation models trained on code, such as Codex, CodeT5, and other large language models (LLMs), have demonstrated significant prowess in generating accurate and functional code snippets across multiple programming languages in 2021. For instance, Codex achieved roughly 29% success on HumanEval problems with a single sample and scaled up to ~70% with repeated sampling [1]. Parallel advances such as CodeT5 enhanced both code understanding and generation through identifier-aware pre-training, improving performance on defect detection and code synthesis [2]. Despite such promising accuracy, concerns regarding the security of AI-generated code also emerged in 2021. Industry reports and analyses warned that code generated by LLMs often introduced vulnerabilities, such as SQL injection, cross-site scripting (XSS), and insecure dependencies, in as many as 30–50% of outputs in some analyses [3], [5]. Several studies emphasized that LLMs replicate insecure patterns present in their training data and may omit necessary security controls unless explicitly directed to include them [4]. This paper explores the trade-offs between functional accuracy and security in foundation models for code. We review key 2021-era models, identify common classes of generated vulnerabilities, and compare performance across benchmarks. We also investigate how prompt engineering, dataset curation, and inference-time filters can mitigate risk without degrading utility. Our methodology includes empirical evaluation of Codex and CodeT5 on both functional correctness (HumanEval) and security analysis using common vulnerability patterns. Results highlight that accuracy improvements often come with a non-negligible increase in security risk, underscoring the need for design decisions that balance both dimensions. We discuss strategies for achieving safer deployments in developer tools, emphasizing secure-by-default generation, post-generation scanning, and developer awareness.
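As a concrete illustration of the methodology sketched above, the listing below is a minimal sketch rather than the full evaluation harness. The pass_at_k function implements the unbiased pass@k estimator used for HumanEval in Chen et al. [1]; the INSECURE_PATTERNS table and flag_insecure_patterns helper are hypothetical simplifications of the "common vulnerability patterns" check, standing in for a proper static analyzer, and the sample counts in the usage example are illustrative.

import re
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator from Chen et al. (2021):
    # 1 - C(n - c, k) / C(n, k), computed in a numerically stable product form.
    # n: samples generated per problem, c: samples passing the tests, k: budget.
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Illustrative insecure-pattern checks (our own simplified regexes, not a
# production scanner); a real study would rely on an established analyzer.
INSECURE_PATTERNS = {
    "possible SQL injection (string-built query)":
        re.compile(r"execute\(\s*[\"'].*(%s|\{.*\}|\+)", re.IGNORECASE),
    "use of eval on dynamic input": re.compile(r"\beval\("),
    "hard-coded credential":
        re.compile(r"(password|api_key)\s*=\s*[\"'][^\"']+[\"']", re.IGNORECASE),
}

def flag_insecure_patterns(code: str) -> list:
    # Return the names of any insecure patterns matched in a generated sample.
    return [name for name, pattern in INSECURE_PATTERNS.items() if pattern.search(code)]

if __name__ == "__main__":
    # Illustrative numbers: 200 samples per problem, 58 of them pass the unit tests.
    print(round(pass_at_k(n=200, c=58, k=1), 3))    # per-problem pass@1 (0.29)
    print(round(pass_at_k(n=200, c=58, k=100), 3))  # per-problem pass@100
    sample = "cursor.execute(\"SELECT * FROM users WHERE name = '%s'\" % name)"
    print(flag_insecure_patterns(sample))

In the evaluation itself, per-problem pass@k values are averaged over the benchmark, and the regex check above is only a placeholder for the security analysis stage; it indicates where post-generation scanning fits in the pipeline rather than how it is implemented.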
References
1. Chen, M., et al. (2021). Evaluating Large Language Models Trained on Code. arXiv.
2. Wang, Y., et al. (2021). CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. arXiv.
3. Security analysis of AI-generated code: vulnerability prevalence of roughly 30–50%. Medium.
4. AI-generated code replicates insecure patterns and omits security logic. Cloud Security Alliance; Qwietᴬᴵ, "Preventing the Unpreventable".
5. AI code generation tools may produce insecure code by default. TechTarget.