Intelligent Data Lakehouse and MLOps Pipelines for Scalable Predictive Analytics in Cloud-Based Enterprise Systems
DOI:
https://doi.org/10.15662/IJRAI.2025.0806036
Keywords:
Intelligent Data Lakehouse, MLOps Pipelines, Cloud-Based Enterprise Systems, Predictive Analytics, Machine Learning, Scalable Analytics, Data Integration, Feature Engineering, Model Deployment, Cloud-Native Architecture
Abstract
Enterprise systems increasingly rely on large-scale, cloud-based data platforms to support decision-making, operational efficiency, and strategic planning. Traditional data warehouses and lakes often struggle with scalability, integration, and real-time analytics requirements. The emergence of data lakehouse architectures, combined with machine learning operations (MLOps) pipelines, provides an integrated solution for scalable, high-performance predictive analytics in modern enterprise environments.
This research proposes an intelligent data lakehouse architecture integrated with MLOps pipelines to enable scalable predictive analytics in cloud-based enterprise systems. The framework unifies structured, semi-structured, and unstructured data while supporting real-time ingestion, preprocessing, feature engineering, and model deployment. MLOps pipelines automate model training, testing, versioning, deployment, and monitoring, ensuring reproducibility, reliability, and continuous improvement of predictive models.
The architecture leverages cloud-native technologies, including distributed storage, containerized services, and orchestration tools, to optimize resource allocation and scalability. Predictive analytics models provide insights for operational optimization, financial forecasting, customer behavior analysis, and risk assessment. The study highlights the benefits of combining intelligent data lakehouses with MLOps pipelines, including improved model performance, operational efficiency, and governance, while addressing challenges such as data heterogeneity, pipeline complexity, and cross-cloud interoperability.
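The MLOps stages named in the abstract (training, testing, versioning, deployment, and monitoring) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the one-variable linear model, the in-memory `Registry`, and the names `run_pipeline`, `needs_retraining`, `max_mae`, and `drift_threshold` are all hypothetical, chosen only to show how a quality gate and a drift check might fit together.

```python
import hashlib
import json
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class ModelVersion:
    version: str       # content hash of parameters and training data
    slope: float
    intercept: float
    test_mae: float    # mean absolute error on held-out data


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for a model registry."""
    versions: List[ModelVersion] = field(default_factory=list)
    deployed: Optional[ModelVersion] = None


def train(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return slope, my - slope * mx


def mae(params: Tuple[float, float], xs: List[float], ys: List[float]) -> float:
    slope, intercept = params
    return mean(abs(slope * x + intercept - y) for x, y in zip(xs, ys))


def fingerprint(params, xs, ys) -> str:
    """Derive a reproducible version id from parameters and training data."""
    payload = json.dumps({"params": params, "xs": xs, "ys": ys}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


def run_pipeline(registry, train_data, test_data, max_mae: float) -> ModelVersion:
    """Train, test, version, and deploy only if the quality gate passes."""
    params = train(*train_data)
    score = mae(params, *test_data)
    candidate = ModelVersion(fingerprint(params, *train_data), *params, score)
    registry.versions.append(candidate)  # every run is recorded
    better = registry.deployed is None or score < registry.deployed.test_mae
    if score <= max_mae and better:
        registry.deployed = candidate
    return candidate


def needs_retraining(registry, xs, ys, drift_threshold: float) -> bool:
    """Monitoring step: flag the deployed model when live error drifts."""
    m = registry.deployed
    return mae((m.slope, m.intercept), xs, ys) > drift_threshold


if __name__ == "__main__":
    reg = Registry()
    run_pipeline(reg, ([0, 1, 2, 3], [1, 3, 5, 7]), ([4, 5], [9, 11]), max_mae=0.5)
    print(reg.deployed)
    print(needs_retraining(reg, [4, 5], [20, 25], drift_threshold=1.0))
```

In a real deployment the registry, the quality gate, and the monitoring check would typically be separate services run by an orchestrator; collapsing them into one module here only keeps the control flow visible.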





