Best practices in building secure, compliant, and resilient cloud-native architectures for artificial intelligence workloads
Synopsis
Cloud-native architectures are a strategy for building and running applications that exploits the advantages of the cloud computing delivery model. Using cloud-native technologies, developers and organizations can build and run scalable applications in environments that overcome the constraints of traditional data centers and proprietary technology stacks. Today, these architectures can be characterized with stronger definitions, more accurate terms, and broader consensus. In particular, we posit that cloud-native architectures are distributed application architectures built on a set of recurring patterns, such as service-oriented or microservices design and API-first design, developed using standard protocols and a bi-modal data approach that exploits the co-evolution of shared, common data and distributed, purpose-specific data structures. At the same time, the landscape of cloud-native development is changing rapidly with the introduction of enabling technologies such as containers and container orchestration systems. Over the last ten years, we have witnessed an increasing democratization of cloud services, with the proliferation of several types of first-party cloud resources, from serverless compute to managed data storage and machine learning services. In parallel, the growing availability of third-party solutions from the community and from vendors has enriched the cloud ecosystem. These evolutions are reshaping the motivations and drivers for organizations to adopt cloud-native development and have pushed cloud-native development practices toward increasing convergence. By adopting a growing number of cloud services and third-party products, organizations reduce the implementation and operational burden of building cloud-native architectures (Celeste & Michael, 2021; Bauskar, 2025; Jorepalli, 2025).
Consequently, applications deployed in cloud-native architectures must give heightened attention to resiliency, performance, and security requirements that set them apart from typical enterprise workloads. AI workloads in particular, such as AI training and validation, as well as AI inferencing deployed in edge and fulfillment applications, must undergo greater scrutiny around resilience, security, and compliance because of their higher associated risk (Theodoropoulos et al., 2023; Ugwueze, 2024).