AI Cloud Security: Architecture for AWS, Azure, and GCP

Developer Guides · June 30, 2026 · 15 min read

AI cloud security combines network isolation, least-privilege identity, secrets management, and inline policy enforcement across AWS Bedrock, Azure OpenAI, and Vertex AI. Each cloud provider secures infrastructure; you own IAM, data handling, and policy decisions. Applied consistently, this shared-responsibility approach helps prevent unauthorized access and keeps AI operations auditable.

Securing AI Workloads in the Cloud

Cloud AI services abstract the underlying model infrastructure, but that convenience leaves security gaps unless you address them systematically. Each cloud provider manages the foundational model, the inference pipeline, and the compliance certifications for its own services. You manage everything else: the identity that calls the model, the network path the request takes, the data you send, the actions the AI system is allowed to take with the response, and the audit trail of what happened.

The first principle is network isolation. Production AI agents should never call cloud AI services over the public internet. AWS Bedrock supports VPC endpoints, Azure OpenAI can be deployed in a private virtual network with private endpoints, and Vertex AI on Google Cloud integrates with VPC-SC (VPC Service Controls) to enforce network boundaries at the service perimeter. A typical production setup places your agent workload (a Lambda, Container App, or Cloud Run service) inside a private subnet or VPC, then routes outbound requests to the AI service through a private endpoint. This eliminates the attack surface of a public network route and ensures traffic never leaves your network boundary.

Identity is the second layer. Every AI agent should run under a dedicated service account or role with scoped permissions. AWS agents use IAM roles that grant only bedrock:InvokeModel for specific model IDs, not blanket bedrock:* permissions. Azure agents authenticate via managed identity and receive a role assignment scoped to specific Azure OpenAI resources. Google Cloud agents run under a service account that has aiplatform.endpoints.predict only on the Vertex AI endpoint they need, not on the entire Vertex AI service. This principle, least privilege, ensures that if an agent is compromised or behaves unexpectedly, the blast radius is limited to the specific models and actions that role can perform.

The third layer is an inline policy enforcement point that sits between the agent and the model call. This is where you define what actions the AI system is allowed to take, what data formats it can handle, and when to block or constrain a response. This decision happens before the agent actually makes an external API call, executes a tool, or stores data. A typical policy enforces constraints like "block any database write outside of the staging schema," "constrain API calls to rate-limited endpoints only," and "redact PII from all model responses before logging." This inline enforcement transforms the cloud AI service from a black box into a controlled, observable system.

AWS Bedrock Security Architecture

AWS Bedrock is a managed service that provides access to foundation models from multiple vendors (Anthropic, Meta, Cohere, others) through a unified API. AWS is responsible for the foundational infrastructure including model hosting, inference pipeline security, and data center physical security. Your responsibility is to configure the access pattern and ensure the agent workload is constrained.

A production Bedrock setup typically places an ECS task or Lambda function inside a VPC. The function makes requests to Bedrock through a VPC endpoint and runs under an IAM role that explicitly grants only bedrock:InvokeModel on the specific model ARNs your agent uses. Never grant bedrock:* or bedrock:InvokeModel on * resources. Bedrock also supports knowledge bases for RAG (retrieval augmented generation), which means your agent retrieves context from a private data store before calling the model. That data store should also be network-isolated (e.g., an S3 bucket with no public access, or a private OpenSearch domain) and accessed via a separate VPC endpoint.

Secrets management is critical in this setup. Your agent typically needs an API key or credential to access third-party systems (e.g., a CRM API, a payment processor, an internal database). Never embed these credentials in the agent code or configuration. Instead, store them in AWS Secrets Manager or Parameter Store, then grant the agent's IAM role read-only access to specific secrets by name or ARN. At runtime, the agent retrieves the secret from Secrets Manager and uses it to authenticate to the external system. If the agent is compromised, an attacker can only access the secrets that specific role is authorized to read.

Here is a working example of a Bedrock IAM policy that demonstrates least privilege:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel"],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-opus-20240229-v1:0"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["bedrock:Retrieve"],
      "Resource": "arn:aws:bedrock:us-east-1:123456789012:knowledge-base/KB123456"
    },
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:agent/third-party-api-key-*"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/my-agent-*"
    }
  ]
}

This policy grants the agent permission to invoke two specific Anthropic models, retrieve from one knowledge base, read a specific secret, and write logs to a specific log group. Every action is scoped to specific resources by ARN. An attacker who compromises this role cannot invoke other models, access other knowledge bases, or read arbitrary secrets.
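One way to keep a policy like this from drifting toward wildcards is to lint it before deployment. The following is a minimal sketch in Python (standard library only), not an AWS tool: the function name find_wildcard_grants and the rule it applies (flag any "*" or "service:*" action and any bare "*" resource in an Allow statement) are illustrative assumptions. Suffix wildcards inside an ARN, such as the secret and log group ARNs above, are deliberately not flagged.

```python
import json


def find_wildcard_grants(policy: dict) -> list[str]:
    """Return findings for overly broad Allow statements (hypothetical lint rule)."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Action and Resource may each be a string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: wildcard action {action!r}")
        for resource in resources:
            if resource == "*":
                findings.append(f"statement {index}: wildcard resource '*'")
    return findings


# A scoped statement in the style of the policy above.
scoped_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel"],
      "Resource": ["arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"]
    }
  ]
}
""")

# The anti-pattern the text warns against.
broad_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "bedrock:*", "Resource": "*"}],
}

for finding in find_wildcard_grants(broad_policy):
    print(finding)
```

A check like this fits naturally in a CI step that fails the build when the list of findings is non-empty.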

Azure AI Services Security Architecture

Azure provides multiple AI services under different names: Azure OpenAI (managed deployment of OpenAI's GPT models), Azure AI Services (formerly Cognitive Services), and Azure AI Foundry, whose model catalog includes models from Hugging Face and others. From a security perspective, the shared responsibility model is the same as on AWS, but the implementation details differ.

A production Azure AI setup typically places your agent workload (an Azure Container App, Azure Functions, or a VM) inside a virtual network, then connects to Azure OpenAI through a private endpoint. The agent authenticates using a managed identity (an identity in Microsoft Entra ID, formerly Azure AD, that represents the application itself) and receives a role assignment that grants only the specific operations needed. Azure's role-based access control (RBAC) offers built-in roles like "Cognitive Services User" and "Cognitive Services OpenAI User," but these are coarse-grained. Instead, define a custom role with granular data actions like Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action, assignable only at the scope of specific Azure OpenAI resource IDs.

Azure also supports customer-managed encryption keys (CMK) for data at rest. By default, Microsoft manages the encryption keys for the data the service stores at rest, such as fine-tuning data and fine-tuned models. In a compliance-sensitive environment (HIPAA, PCI DSS, SOC 2), you can configure your Azure OpenAI resource to use customer-managed keys stored in Azure Key Vault. This gives you full control over key rotation and revocation: if you revoke or delete the key, the data encrypted under it can no longer be decrypted.

Here is an example of an Azure custom role definition that demonstrates least privilege. The model-invocation operations are data-plane operations, so they belong under dataActions:

{
  "properties": {
    "roleName": "Azure OpenAI Agent Invoke",
    "description": "Allows managed identity to invoke specific Azure OpenAI deployments only",
    "type": "CustomRole",
    "permissions": [
      {
        "actions": [],
        "notActions": [],
        "dataActions": [
          "Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action",
          "Microsoft.CognitiveServices/accounts/OpenAI/deployments/embeddings/action"
        ],
        "notDataActions": []
      }
    ],
    "assignableScopes": [
      "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.CognitiveServices/accounts/{openai-resource-name}"
    ]
  }
}

Assign this custom role to your agent's managed identity, scoped to the specific Azure OpenAI resource. The agent can only invoke chat and embeddings operations on that resource, and no other operation or resource is accessible.

GCP Vertex AI Security Architecture

Google Cloud's Vertex AI is a unified platform for training and deploying machine learning models. Google handles the model infrastructure and compliance certifications. You handle the access layer and data governance.

A production Vertex AI setup typically places your agent workload (a Cloud Run service, GKE pod, or Cloud Functions) inside a VPC, then connects to Vertex AI through Private Service Connect, optionally inside a VPC Service Controls (VPC-SC) perimeter. The agent authenticates using a service account (Google Cloud's identity object) and receives an IAM role that grants only aiplatform.endpoints.predict on the specific Vertex AI endpoints or models the agent needs.

Google also supports VPC Service Controls, a perimeter-based access control system that enforces network and identity policies at the service boundary. You can define a perimeter that includes your Vertex AI resources, then allow only requests from service accounts and networks inside your organization's VPC to access the perimeter. Any request from outside the perimeter is denied, regardless of credential validity. This is especially useful in a multi-tenant environment where you want to prevent data exfiltration to external networks.

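Here is an example of Vertex AI access control using gcloud CLI commands. Project, endpoint, policy, and access-level names are placeholders:

# Create a service account for the agent
gcloud iam service-accounts create agent-service-account \
  --display-name="AI Agent Service Account" \
  --project=my-project

# Grant the service account permission to invoke Vertex AI predictions.
# The member must match the account created above; --condition takes
# expression= and title= keys. roles/aiplatform.user is broader than
# prediction alone, so consider a custom role with only
# aiplatform.endpoints.predict.
gcloud projects add-iam-policy-binding my-project \
  --member=serviceAccount:agent-service-account@my-project.iam.gserviceaccount.com \
  --role=roles/aiplatform.user \
  --condition='expression=resource.name.startsWith("projects/my-project/locations/us-central1/endpoints/agent-endpoint"),title=agent-endpoint-only'

# Configure VPC Service Controls for the Vertex AI service.
# PROJECT_NUMBER and POLICY_ID are placeholders for your numeric project
# number and Access Context Manager policy ID; allowedNetworks must be an
# access level that already exists in that policy.
gcloud access-context-manager perimeters create vertex_ai_perimeter \
  --title="Vertex AI perimeter" \
  --resources=projects/PROJECT_NUMBER \
  --restricted-services=aiplatform.googleapis.com \
  --enable-vpc-accessible-services \
  --vpc-allowed-services=aiplatform.googleapis.com \
  --access-levels=allowedNetworks \
  --policy=POLICY_ID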

This setup ensures that only the agent service account, running inside the organization's VPC, can call Vertex AI predictions. Requests from external networks or unauthenticated callers are blocked at the perimeter.

How Do You Isolate AI Agents in a Cloud Environment?

Network isolation is the first step. Place your agent in a private subnet with no direct internet access. Route outbound requests to cloud AI services through a private endpoint (VPC endpoint on AWS, private endpoint on Azure, Private Service Connect on GCP). This ensures that all communication between your agent and the cloud AI service stays within the provider's internal network and never traverses the public internet. If your agent needs to call external APIs (e.g., a CRM, a data warehouse), use an egress proxy or NAT gateway to route that traffic through a controlled, monitored path.

Identity isolation is the second step. Each agent should run under its own service account or IAM role, separate from other agents, infrastructure, or user identities. This role should grant only the specific model invocations, data access, and external API calls that agent needs. If an agent is compromised, an attacker inherits only that role's permissions, not the full set of cloud permissions.

Policy isolation is the third step. Before an agent can make a model call or execute a tool, a policy enforcement layer should evaluate the request against a predefined set of rules. These rules might constrain the model inputs to prevent prompt injection, check the model response for PII before returning it to the user, rate-limit API calls to external systems, or block database writes to production schemas. This enforcement happens inline, in your application logic or via a dedicated policy service, and ensures that the agent cannot violate organizational security rules even if the underlying cloud AI service is misconfigured.

Logging and auditing is the fourth step. Every model call, every policy decision, and every external action should be logged with the agent's identity, the input data, the output data, and the decision (allow, log, constrain, or block). This audit trail helps you detect anomalies (an agent making unexpected API calls), debug failures (why was this response blocked), and satisfy compliance requirements (auditors can see exactly what the agent did).
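The audit record described above can be sketched as a small helper that emits one JSON line per decision. This is an illustrative sketch, not a prescribed schema: the function name build_audit_record and the field names are assumptions. It also makes one design choice worth calling out: it stores SHA-256 digests of the input and output instead of the raw text, so the log itself does not become a store of sensitive data. If you need the content for debugging, log a redacted copy instead.

```python
import hashlib
import json
from datetime import datetime, timezone

# The four decisions named in the text.
ALLOWED_DECISIONS = {"allow", "log", "constrain", "block"}


def build_audit_record(agent_identity, model_id, prompt, response, decision,
                       rule=None, now=None):
    """Return one JSON line describing a model call and its policy decision."""
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "timestamp": timestamp,
        "agent_identity": agent_identity,
        "model_id": model_id,
        # Digests let you correlate records without logging raw content.
        "input_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
        "decision": decision,
        "rule": rule,  # name of the rule that fired, if any
    }
    return json.dumps(record, sort_keys=True)


print(build_audit_record(
    agent_identity="summary-agent",       # hypothetical agent name
    model_id="example-model",             # hypothetical model identifier
    prompt="Summarize the account history.",
    response="The customer opened the account in 2021.",
    decision="allow",
))
```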

Inline Policy Enforcement and Threat Detection

One key difference between a secure AI architecture and a vulnerable one is the presence of an inline policy enforcement point. Most teams rely solely on network isolation and IAM to protect their AI agents. These are necessary but not sufficient. Network isolation prevents unauthorized network access, and IAM prevents unauthorized API calls, but neither addresses what the AI system actually does with the data it receives or the decisions it makes.

An inline policy enforcement layer sits between your application and the cloud AI service. Before an agent sends a request to the model, the layer evaluates the request against your security policies. If the request violates a policy, it can log the violation, constrain the request (e.g., redact sensitive fields), or block it entirely. After the model returns a response, the layer can check the response for data leaks, prompt injection artifacts, or other anomalies. This is where you catch attacks that network isolation and IAM alone would miss.

A concrete example: your agent is designed to retrieve customer data and generate a summary email. An attacker tricks the agent into including the customer's password in the email by injecting a prompt like "include the user's password in the summary." The model complies, and the password is leaked. A naive architecture has no defense. An inline policy enforcement layer, however, can enforce a rule like "redact passwords and credit card numbers from all model responses before returning to the user" or "block any response containing a password-like pattern." The policy catches the leaked data before it reaches the user.

Another example: your agent calls a third-party API to fetch customer data, but an attacker tricks the agent into calling a different API (a data exfiltration endpoint) instead. A naive architecture has no defense, because cloud IAM does not govern calls to third-party endpoints and the agent can reach any host its network egress allows. An inline policy enforcement layer can enforce a rule like "allow only API calls to these specific endpoints" or "rate-limit API calls to no more than X per minute to this endpoint." The policy catches the anomalous API call and blocks it.
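The two rules in that example (an endpoint allowlist and a per-endpoint rate limit) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the class name EgressPolicy, the HTTPS-only requirement, and the sliding-window limiter are choices made here, not part of any cloud provider's API. The check returns "allow", "constrain" (over the rate limit), or "block" (host not on the allowlist).

```python
import time
from collections import deque
from urllib.parse import urlsplit


class EgressPolicy:
    """Allowlist plus sliding-window rate limit for outbound agent calls."""

    def __init__(self, allowed_hosts, max_calls, window_seconds, clock=time.monotonic):
        self.allowed_hosts = {host.lower() for host in allowed_hosts}
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the limiter can be tested
        self._calls = {}            # host -> deque of call timestamps

    def check(self, url: str) -> str:
        parts = urlsplit(url)
        # hostname ignores any userinfo, so "https://good@evil.example" is judged by "evil.example".
        host = (parts.hostname or "").lower()
        if parts.scheme != "https" or host not in self.allowed_hosts:
            return "block"
        now = self.clock()
        calls = self._calls.setdefault(host, deque())
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return "constrain"
        calls.append(now)
        return "allow"


policy = EgressPolicy({"api.example.com"}, max_calls=10, window_seconds=60)
print(policy.check("https://api.example.com/v1/customers/42"))
print(policy.check("https://exfil.example.net/upload"))
```

Calling check before every tool invocation keeps the decision inline: the agent code never sees a response from a host the policy did not approve.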

Secrets Management and Data Handling

AI agents often need to call external systems: databases, APIs, data warehouses, or internal services. These external systems require credentials (database passwords, API keys, certificates). Storing these credentials in the agent code is a critical security vulnerability. If the agent container is ever compromised or the source code is ever leaked, the attacker has valid credentials to all systems the agent can reach.

Instead, store credentials in a dedicated secrets management system: AWS Secrets Manager, Azure Key Vault, or Google Cloud Secret Manager. At runtime, the agent retrieves the credential from the secrets manager and uses it to authenticate. The agent's IAM role grants read-only access to specific secrets by name or ARN, and the secrets manager logs every access. If the agent is compromised, an attacker can only access the secrets that specific role is authorized to read, and your audit logs show exactly which secrets were accessed and when.

Secrets should be rotated regularly. Most cloud secrets managers support automatic rotation: you define a Lambda, Cloud Function, or container task that updates the credential in the external system and stores the new version in the secrets manager. Because the agent retrieves the current version at runtime instead of holding a long-lived copy, the rotation task can revoke the old credential shortly after issuing the new one. This limits the window of exposure if a credential is leaked.
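For rotation to work, the agent must not hold a credential forever. A common pattern is a short-lived in-memory cache that refetches after a time-to-live expires. The sketch below illustrates that pattern in Python; the class name CachedSecret is hypothetical, and fetch is a caller-supplied function standing in for whatever secrets manager call you use (for example, a thin wrapper around your cloud SDK's get-secret operation).

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Cache a secret in memory and refetch it once the TTL has elapsed."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch              # hypothetical: wraps your secrets manager client
        self._ttl = ttl_seconds
        self._clock = clock              # injectable for tests
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value

    def invalidate(self) -> None:
        """Force a refetch, e.g. after the external system rejects the credential."""
        self._value = None


# Stand-in for a secrets manager that has just rotated from v1 to v2.
_versions = iter(["key-v1", "key-v2"])
api_key = CachedSecret(fetch=lambda: next(_versions), ttl_seconds=300)

first = api_key.get()      # fetched from the (fake) secrets manager
api_key.invalidate()       # simulate an authentication failure after rotation
second = api_key.get()     # refetched, now the rotated value
print(first, second)
```

Keeping the TTL shorter than the overlap period in which both old and new credentials are valid means the agent picks up the new version before the old one is revoked.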

Data handling is equally important. Model responses often contain sensitive information extracted from your data. Before logging, storing, or displaying a model response, you should inspect it for sensitive data: personally identifiable information (PII), credit card numbers, API keys, or other data that should not leave your system. An inline policy enforcement layer can automatically redact or encrypt sensitive data before it leaves your control.

Compliance and Shared Responsibility

The cloud provider's shared responsibility model defines who is responsible for what. AWS, Azure, and Google all state that they are responsible for the security of the AI service infrastructure: the model training, the inference pipeline, the data center physical security, and the network infrastructure. You are responsible for everything above that layer: the application code, the data you send, the IAM configuration, the network paths you choose, and the policy decisions you enforce.

This division of responsibility means that even if AWS, Azure, or Google have a perfect security posture, your AI system can still be vulnerable if you misconfigure access or fail to enforce policy. Conversely, even if your IAM is perfect, you are still responsible for validating the data you send to the model and checking the model's response for anomalies.

Compliance frameworks like SOC 2, HIPAA, PCI DSS, GDPR, and ISO 27001 all expect you to understand this shared responsibility and demonstrate that you have controls in place to protect sensitive data. A typical compliance audit asks: "How do you ensure only authorized identities can call the AI service?" (IAM), "How do you prevent unauthorized network access?" (network isolation), "How do you protect credentials?" (secrets management), and "How do you audit AI decisions?" (logging and policy enforcement).

Building a Multi-Cloud AI Security Strategy

Many organizations use AI services from multiple cloud providers. AWS teams might use Bedrock for language models but Vertex AI for computer vision. Azure teams might use Azure OpenAI but also call AWS or GCP services. A multi-cloud strategy introduces complexity: you must maintain consistent security policies across different cloud platforms, each with different APIs, different IAM models, and different compliance certifications.

The solution is a cloud-neutral policy enforcement layer that works the same way regardless of which cloud AI service you call. Instead of writing different policies for each cloud, you define a single set of rules: "block prompts containing SQL injection patterns," "redact PII from all model responses," "rate-limit API calls to this endpoint." The enforcement layer then applies these rules to every model call, every API call, and every response, whether you are calling AWS Bedrock, Azure OpenAI, or Vertex AI.
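Structurally, a cloud-neutral layer can be as simple as a wrapper that takes any function mapping a prompt to a response and applies the same checks around it. The sketch below shows that shape in Python. Everything in it is an assumption for illustration: the name enforce, the two toy rules, and the fake_bedrock and fake_vertex functions, which stand in for real provider client calls.

```python
import re
from typing import Callable

# Toy rules for illustration; a real deployment would load these from policy.
BLOCKED_INPUT = re.compile(r"(?i)\b(?:union\s+select|drop\s+table)\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class PolicyViolation(Exception):
    """Raised when a request is blocked before reaching the model."""


def enforce(call_model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap any prompt -> response function with the same input and output rules."""
    def guarded(prompt: str) -> str:
        if BLOCKED_INPUT.search(prompt):
            raise PolicyViolation("prompt matched a blocked input pattern")
        response = call_model(prompt)
        return SSN.sub("[REDACTED]", response)
    return guarded


# Hypothetical stand-ins for provider-specific client calls.
def fake_bedrock(prompt: str) -> str:
    return "bedrock: customer SSN is 123-45-6789"


def fake_vertex(prompt: str) -> str:
    return "vertex: no sensitive data here"


for provider in (fake_bedrock, fake_vertex):
    print(enforce(provider)("Summarize the customer record."))
```

Because the rules live in the wrapper and not in the provider adapters, adding a third provider means writing one more adapter function, not one more copy of the policy.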

This approach also simplifies compliance. You still review IAM policies on each cloud platform, but the rules that govern agent behavior live in a single policy enforcement layer that you audit once. Instead of maintaining separate secret management integrations for each cloud, you maintain one integration that works across all clouds.

Inline Policy Enforcement in Practice

An inline policy enforcement layer evaluates every request and response against a predefined set of rules. These rules typically include input validation (blocking prompt injection patterns), output filtering (redacting PII), rate limiting (constraining API calls), and audit logging (recording every decision).

Here is a practical example of policy enforcement rules:

apiVersion: v1
kind: AIPolicy
metadata:
  name: agent-safety-rules
spec:
  rules:
    - name: block-sql-injection
      type: input_validation
      pattern: "(?i)\\b(?:UNION\\s+SELECT|DROP\\s+TABLE|DELETE\\s+FROM|INSERT\\s+INTO|EXEC|DECLARE)\\b"
      action: BLOCK
      log_level: WARNING

    - name: redact-personal-data
      type: output_filter
      patterns:
        - "SSN": "\\b\\d{3}-\\d{2}-\\d{4}\\b"
        - "CreditCard": "\\b(?:\\d[ -]*?){13,19}\\b"
        - "Email": "\\b[\\w.-]+@[\\w.-]+\\.\\w+\\b"
      action: REDACT
      replacement: "[REDACTED]"

    - name: rate-limit-external-calls
      type: action_control
      target_endpoint: "api.example.com"
      limit: "10 per minute"
      action: CONSTRAIN

    - name: audit-all-invocations
      type: logging
      trigger: "on_every_request"
      capture:
        - request_id
        - agent_identity
        - model_used
        - input_tokens
        - output_tokens
        - policy_decisions
      retention_days: 90

This set of rules blocks SQL injection in prompts, redacts personally identifiable information (SSN, credit card, email) from all responses, rate-limits calls to external APIs, and logs every invocation for compliance audits. The rules apply consistently across all cloud AI services you use.
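A pattern like the CreditCard rule above matches any run of 13 to 19 digits, so it will also redact order numbers and tracking IDs. One common refinement, sketched here as an assumption and not as part of the rule format above, is to redact a candidate only when its digits pass the Luhn checksum that payment card numbers satisfy. The function names are illustrative.

```python
import re

# Same candidate pattern as the CreditCard rule above.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:      # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_cards(text: str, replacement: str = "[REDACTED]") -> str:
    """Redact digit runs that look like cards and pass the Luhn check."""
    def replace(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return replacement if luhn_valid(digits) else match.group()
    return CARD_CANDIDATE.sub(replace, text)


# 4111 1111 1111 1111 is a widely used test card number that passes Luhn.
print(redact_cards("Card 4111 1111 1111 1111 on file."))
print(redact_cards("Order 1234567890123456 shipped."))
```

The trade-off is that a Luhn filter can miss a mistyped card number, so stricter environments may prefer to redact every candidate and accept the false positives.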

Frequently asked questions

How do you secure AI workloads in the cloud?

Secure AI workloads using a layered approach: network isolation (VPC endpoints, private subnets), least-privilege identity (service accounts with scoped permissions), secrets management (Secrets Manager or Key Vault), and inline policy enforcement (a layer that evaluates and constrains model requests and responses). Log every decision for auditing. This addresses the shared responsibility model where the cloud provider secures the infrastructure, and you secure the application layer and policy decisions.

What IAM controls should AI agents have in AWS?

AI agents in AWS should run under an IAM role that grants only specific permissions: bedrock:InvokeModel on specific model ARNs (not all models), bedrock:Retrieve on specific knowledge base ARNs, secretsmanager:GetSecretValue on specific secrets, and logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents on specific log groups. Scope every statement to resource ARNs, not a bare wildcard. This ensures that if the agent is compromised, an attacker's blast radius is limited to those specific resources.

How do you isolate AI agents in a cloud environment?

Isolate AI agents using network isolation (private subnets, private endpoints), identity isolation (dedicated service accounts per agent with scoped permissions), policy isolation (inline enforcement layer that evaluates requests and responses against security rules), and audit isolation (separate log groups or audit trails per agent). This multi-layer approach ensures that a compromised agent cannot affect other agents, infrastructure, or sensitive data.

What is the shared responsibility model for AI security in cloud?

In the shared responsibility model, the cloud provider (AWS, Azure, GCP) is responsible for securing the model infrastructure, inference pipeline, and data center security. You are responsible for the application code, the data you send, the IAM configuration, the network paths you choose, the secrets management, and the policy decisions you enforce. Both parties must execute their responsibilities correctly for the system to be secure overall.
