Cloud-Native AI Pipelines for Real-Time Cyber Threat Intelligence

Dileep Valiki

doi:10.15662/IJRAI.2024.0706036

Authors

Dileep Valiki, Independent Researcher, India

Keywords:

Cloud-Native AI Pipelines, Ransomware Detection Systems, Cyber Threat Intelligence (CTI), Real-Time Event Correlation Engines, Deep Learning for Cybersecurity, Online Learning Architectures, Continuous Model Retraining, Cloud Data Lake Analytics, Tactical Event Reconciliation, Distributed Cloud-Native Processing, Malicious Pattern Recognition, Adaptive Threat Detection Models, Open-Source Security Frameworks, Automated Incident Response Systems, Indicator and Trigger-Based Detection, Scalable Microservices Architectures, Continuous Deployment (CD) in AI Systems, Modus Operandi-Based Detection, Feature Engineering in Cybersecurity, Intelligent Security Operations Pipelines

Abstract

Cloud-native pipelines are expected to analyze data held in cloud data lakes, using a reconciliation process to detect and handle new types of tactical events relevant to cyber threat intelligence. This paper discusses the data-related, source-related, and service-related considerations and requirements for real-time pipelines that deploy deep learning processes, and applies them to build a Cloud-native AI Pipeline for Ransomware Detection. The architecture is built on open-source components and is organized as a collection of repositories containing several online learning processes that use cloud-native AI techniques.

Cyber threat detection is a dynamic process: it depends on detecting malicious event patterns from indicators and triggers, drawn from relevant and up-to-date data sources, so that reactive responses can follow. The technical challenge is that these detections, their supporting indicators, and the tactical responses must be created and activated automatically and in real time. An extensible and scalable event correlation engine has been proposed that can detect malicious events as they occur. However, detecting new types of events is hampered by the reliance on expert-driven feature extraction and on supervised classifiers that recognize only specific threats, which limits the detection space to the attack types on which the model has already been trained.
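To make the idea of real-time event correlation concrete, the following is a minimal sketch and not the engine referred to above. It assumes events arrive in timestamp order and raises a detection when every required indicator has been seen on the same host within a sliding time window. The `Event` and `WindowCorrelator` names and the indicator strings are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One observed security event (hypothetical schema)."""
    timestamp: float  # seconds since an arbitrary epoch
    host: str
    indicator: str    # e.g. "shadow_copy_deleted" (illustrative name)


class WindowCorrelator:
    """Flags a host when all required indicators occur within one time window.

    Assumption: events for a given host are observed in non-decreasing
    timestamp order.
    """

    def __init__(self, required, window_seconds):
        self.required = frozenset(required)
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # host -> events still inside the window

    def observe(self, event):
        """Record an event; return True if the pattern is now complete for its host."""
        recent = self._recent[event.host]
        recent.append(event)
        # Evict events that have fallen out of the sliding window.
        while recent and event.timestamp - recent[0].timestamp > self.window_seconds:
            recent.popleft()
        seen = {e.indicator for e in recent}
        return self.required <= seen


# Illustrative usage with made-up indicators.
correlator = WindowCorrelator({"shadow_copy_deleted", "mass_file_rename"}, window_seconds=60)
first = correlator.observe(Event(0.0, "host-a", "shadow_copy_deleted"))
second = correlator.observe(Event(30.0, "host-a", "mass_file_rename"))
```

A fixed indicator set like this also illustrates the limitation described above: the correlator can only recognize patterns that someone has already written down.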

In online learning processes under continuous deployment, the model is trained continuously as new data is ingested into the data store, which enables it to recognize new types of events as they appear. A distributed cloud-native process architecture allows adaptation processes to be deployed continually into production, where they can use data stored in cloud data lakes for continuous retraining and adaptation. A new, integrated, end-to-end visual Cloud-native AI Pipeline for Ransomware Detection, based on recognizing the modus operandi of ransomware, is implemented. The data-related, source-related, and service-related considerations for real-time pipelines that use cloud-native AI techniques to generate deep learning processes are also described.
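The online learning loop described here can be sketched in a few lines. The example below is an illustration under stated assumptions, not the paper's model: it swaps the deep learning process for a plain logistic regression updated by stochastic gradient descent, one labeled event at a time, so that the update step is easy to follow. The class name, the two-feature layout, and the learning rate are all hypothetical.

```python
import math


class OnlineLogisticModel:
    """Logistic regression trained one example at a time (illustrative stand-in)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Return the estimated probability that feature vector x is malicious."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() numerically safe
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """Take one gradient step on a single labeled example (y is 0 or 1)."""
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


# Illustrative stream: feature 0 is active for malicious events,
# feature 1 for benign events (both features are made up).
model = OnlineLogisticModel(n_features=2)
for _ in range(200):
    model.update([1.0, 0.0], 1)  # newly ingested malicious example
    model.update([0.0, 1.0], 0)  # newly ingested benign example
```

Because `update` touches only the current example, the same call can run inside a streaming consumer each time a labeled record lands in the data lake, with no need to retrain from scratch.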
