A Systematic Framework for Experiment Tracking and Model Promotion in Enterprise MLOps Using MLflow and Databricks

Janardhan Reddy Kasireddy

doi:10.15662/IJRAI.2023.0601006

Authors

Janardhan Reddy Kasireddy, Lead Data Engineer, Info Drive Systems, USA

Keywords:

MLflow, Databricks, MLOps, model registry, experiment tracking, reproducibility, CI/CD for ML

Abstract

The rapid adoption of Machine Learning (ML) in business settings has exposed the need for structured procedures to coordinate experiments, models, and deployment pipelines. This paper introduces an experiment tracking and model promotion framework for MLOps based on MLflow and Databricks, and demonstrates a practical way to improve reproducibility, governance, and operational efficiency. The Wine Quality Dataset, a familiar benchmark for regression and classification tasks, was used to evaluate the reproducibility of ML experiments and the efficiency of the model promotion workflow. Feature engineering, hyperparameter optimization, and model comparison across Random Forest, Gradient Boosting, and XGBoost algorithms were run and tracked systematically with MLflow. Findings show that MLflow substantially reduced errors, especially those arising from manual tracking, improved the reproducibility of experimental results by more than 80 times, and enabled validated models to be promoted to production with minimal human intervention. In addition, tight integration with Databricks provided the scalability needed for teams to collaborate on model development, and audit trails supported compliance. The paper concludes that structured MLOps using MLflow and Databricks can significantly improve ML lifecycle management, reproducibility, and operational reliability.
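The promotion step summarized above can be illustrated with a minimal sketch. It is not the paper's implementation and it does not call MLflow or Databricks: the `RunRecord` structure, the choice of RMSE as the comparison metric, the `min_improvement` threshold, and all metric values are hypothetical and chosen only for illustration. The sketch shows the underlying idea of selecting the best tracked run and promoting it only when it beats the current production model by a set margin.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRecord:
    """One tracked experiment run (hypothetical structure, not an MLflow type)."""
    run_id: str
    model_name: str
    params: dict = field(default_factory=dict)
    rmse: float = float("inf")  # validation error; lower is better


def best_run(runs: list) -> RunRecord:
    """Return the run with the lowest validation RMSE."""
    if not runs:
        raise ValueError("no runs to compare")
    return min(runs, key=lambda r: r.rmse)


def should_promote(candidate: RunRecord,
                   production: Optional[RunRecord],
                   min_improvement: float = 0.01) -> bool:
    """Promotion gate: promote when there is no production model yet, or when
    the candidate lowers RMSE by at least `min_improvement` (assumed threshold)."""
    if production is None:
        return True
    return production.rmse - candidate.rmse >= min_improvement


# Illustrative values only; these are NOT results reported in the paper.
runs = [
    RunRecord("run-rf", "RandomForest", {"n_estimators": 200}, rmse=0.62),
    RunRecord("run-gb", "GradientBoosting", {"learning_rate": 0.1}, rmse=0.60),
    RunRecord("run-xgb", "XGBoost", {"max_depth": 6}, rmse=0.58),
]
current_production = RunRecord("run-prod", "GradientBoosting", {}, rmse=0.60)

candidate = best_run(runs)
decision = should_promote(candidate, current_production)
print(candidate.model_name, "promote" if decision else "hold")
```

In a tracked setting, the run records would come from the experiment tracking store rather than a hand-written list, and a positive decision would trigger a registry update instead of a print statement.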
