Designing and deploying scalable MLOps pipelines for continuous artificial intelligence model training and delivery

Authors

Phanish Lakkarasu
Senior Site Reliability Engineer, Qualys, Foster City, CA 94404 USA

Synopsis

Artificial Intelligence (AI) systems are rapidly being adopted across industries to address important business use cases. Healthcare, financial services, and public safety organizations are increasingly integrating AI into their operations to enhance decision making, improve business processes, provide superior customer service, ensure compliance with regulations, and reduce costs. AI systems are already impacting real business processes: improving early-stage diagnosis of diseases to enable timely preventive care, assisting financial advisors in recommending tailored investment plans, predicting violations of laws and rules on social media to enable timely intervention, and helping major retailers optimize their supply chains and improve delivery efficiency. These advantages are driving the adoption of AI for mission-critical tasks where the cost of failure to the business can be extremely high. For such high-stakes applications, the level of explainability and interpretability required from AI systems is correspondingly high, since these businesses are risking the core trust and customer relationships that have been built over many years (Liang et al., 2024; Mallardi et al., 2024; Jain & Das, 2025).

Because of their high complexity, AI systems cannot be designed, developed, and deployed by a single organization or by a small team within one. Consequently, AI systems are built across different functional teams within an organization, and often across organizations. Within an organization, different teams are responsible for meeting specific business objectives around success factors such as accuracy, latency, and prediction pricing, while AI services orchestrate the relevant workflows across teams and data and infrastructure services integrate data and compute resources at scale. This gives rise to a large number of assets across the stages of the ML life cycle, both in isolation and at scale. All of these assets must be managed holistically and systematically to enable enterprise-grade ML delivery: an integrated, cohesive process for training AI and ML models and delivering AI-powered business applications that satisfy the preferred business objectives, explainability and interpretability constraints, and accuracy, latency, and agility requirements while managing business risk (Matthew, 2022; Singla, 2023; Slade, 2024).
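One way to make such business objectives enforceable inside a delivery pipeline is to encode them as explicit promotion gates that candidate models must pass before deployment. The following minimal Python sketch illustrates the idea under stated assumptions; the class names, metrics, and thresholds are hypothetical examples, not specifics from this chapter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelCandidate:
    """A trained model awaiting promotion (names/metrics are illustrative)."""
    name: str
    accuracy: float    # offline evaluation accuracy
    latency_ms: float  # measured p95 serving latency, milliseconds


@dataclass
class DeliveryGate:
    """Promotion gate encoding business objectives such as accuracy and latency."""
    min_accuracy: float = 0.90
    max_latency_ms: float = 100.0

    def approve(self, candidate: ModelCandidate) -> bool:
        # A candidate is promoted only if it meets every objective.
        return (candidate.accuracy >= self.min_accuracy
                and candidate.latency_ms <= self.max_latency_ms)


def promote(candidates: List[ModelCandidate], gate: DeliveryGate) -> List[str]:
    """Return the names of candidates that satisfy the delivery gate."""
    return [c.name for c in candidates if gate.approve(c)]


if __name__ == "__main__":
    gate = DeliveryGate(min_accuracy=0.92, max_latency_ms=50.0)
    candidates = [
        ModelCandidate("fraud-v3", accuracy=0.95, latency_ms=42.0),
        ModelCandidate("fraud-v4", accuracy=0.97, latency_ms=80.0),  # too slow
    ]
    print(promote(candidates, gate))  # only fraud-v3 passes both thresholds
```

In a production pipeline the same gating logic would typically run as an automated step between training and deployment, with thresholds versioned alongside the pipeline definition so that changes to business objectives are auditable.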

Published

6 June 2025

How to Cite

Lakkarasu, P. (2025). Designing and deploying scalable MLOps pipelines for continuous artificial intelligence model training and delivery. In Designing Scalable and Intelligent Cloud Architectures: An End-to-End Guide to AI Driven Platforms, MLOps Pipelines, and Data Engineering for Digital Transformation (pp. 28-42). Deep Science Publishing. https://doi.org/10.70593/978-93-49910-08-9_3