An Architecture-Centric Framework for Secure and Scalable AI Deployment Using Amazon Web Services
DOI: https://doi.org/10.15662/IJRAI.2024.0705018

Keywords: AI deployment, Amazon Web Services, cloud computing, security framework, scalability, machine learning, cloud architecture

Abstract
This article presents an architecture-centric framework for the secure and scalable deployment of Artificial Intelligence (AI) applications on Amazon Web Services (AWS). As AI adoption accelerates across industries, the need to protect and govern AI systems grows correspondingly. The proposed framework addresses both security and scalability concerns and offers a comprehensive approach to deploying AI models and applications in the cloud. By integrating best practices from cloud computing, machine learning, and cybersecurity, the framework makes AI deployments on AWS robust, resilient, and adaptable to evolving threats. Its key components include secure data processing, access control, scalable infrastructure management, and continuous monitoring of AI workloads. The paper also emphasizes the need for appropriate security measures, such as encryption, identity management, and vulnerability testing, while preserving the ability to scale AI models without significant performance degradation. A case study demonstrates how the framework can be put into practice and confirms its effectiveness in real-world deployments. The results show that the framework can handle complex AI workloads, preserve data confidentiality, and maintain regulatory compliance, supported by efficient resource allocation and cost management on AWS.
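To make the access-control component concrete, the sketch below shows one way the principle of least privilege described above could be expressed as an AWS IAM policy document. This is an illustrative example, not the framework's actual implementation: the function name, bucket, object key, and log-group path are all hypothetical placeholders, and the statement set (read a single S3 model artifact, write CloudWatch logs) would need to be adapted to a real workload. The code builds a plain JSON policy and makes no AWS API calls.

```python
import json


def least_privilege_inference_policy(bucket: str, model_key: str) -> str:
    """Build an IAM policy document (as a JSON string) granting read-only
    access to one S3 model artifact plus permission to write CloudWatch
    logs. All resource names here are illustrative placeholders.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow fetching only the single model artifact, not the
                # whole bucket -- the narrowest grant that still works.
                "Sid": "ReadModelArtifact",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{model_key}"],
            },
            {
                # Allow the inference service to emit logs to a dedicated
                # (hypothetical) log group for continuous monitoring.
                "Sid": "WriteInferenceLogs",
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": ["arn:aws:logs:*:*:log-group:/ai/inference:*"],
            },
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(least_privilege_inference_policy(
        "my-model-bucket", "models/v1/model.tar.gz"))
```

In practice such a policy would be attached to the execution role of the inference service (for example via IAM role policies), so that a compromised component can read exactly one artifact and nothing else.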