Zero Trust for AI Agents: Architecture and Implementation
Zero trust for AI means treating every tool call and external action from an AI agent as an access request that must be authenticated, authorized, and inspected in real time. Instead of trusting an agent because it's deployed inside your network, you assume the agent or its model could be compromised and verify every decision at runtime. This applies NIST SP 800-207 zero trust principles directly to AI systems: verify explicitly, grant least privilege, assume breach, and inspect continuously. The result is runtime enforcement of policy at the agent-to-tool boundary, where each action is denied by default unless an explicit policy permits it.
The Problem: Why Traditional Trust Models Fail for AI
AI agents operate differently from traditional applications. A traditional web service runs code you wrote and deployed. An AI agent runs code you wrote, but its behavior depends on model outputs you did not write, cannot fully predict, and cannot replay. The model responds to prompts and user input in ways that often surprise developers.
This gap between predictable application code and probabilistic model output breaks traditional trust models. You cannot firewall an LLM the way you firewall a database. You cannot assume an agent will stay inside its intended scope because a user prompt or injected input can shift its reasoning in milliseconds.
The attack surface is broad: prompt injection (an attacker embeds instructions in user input that override your system prompt), jailbreaks (an attacker uses conversational techniques to make the model ignore its constraints), hallucination attacks (an attacker exploits the model's tendency to fabricate information), and tool misuse (an attacker tricks the agent into calling tools with dangerous parameters). An API key embedded in a tool call can leak into the model's context, where an attacker can extract it. A model fine-tuned by an adversary can deliberately bypass guardrails.
Traditional security assumes you control the code. AI security assumes you don't fully control the behavior.
Zero Trust Principles for AI
NIST SP 800-207 defines seven basic tenets of zero trust. This guide distills them into five principles for AI agents:
Verify explicitly. Every tool call must prove its identity and legitimacy before execution. The agent must authenticate itself. The requested action must be validated against what the agent is supposed to do. The tool must confirm the request is authentic.
Grant least privilege. An agent should have access only to the tools and data it absolutely needs for its task. If an agent is built to summarize documents, it should not have permission to delete files or modify configurations. If an agent can read customer data, it should not be able to read competitor data.
Assume breach. Assume the model weights could be manipulated, the prompt could be injected, or the runtime could be exploited. Design the system so that a single compromise does not cascade. If one tool call is malicious, it should be caught before execution, not after. If a model outputs a dangerous instruction, that instruction should be blocked at the policy layer, not reach the tool.
Verify continuously. One-time authentication is not enough. Every action must be re-evaluated in context. An agent's permissions might change between tool calls. Policies might be updated. New threat signals might emerge. The system must check policy on every request, not just the first.
Inspect and log everything. Every decision the system makes to allow or deny an action must be recorded, auditable, and tied to the specific agent, model, user, and request that triggered it.
Mapping Zero Trust to AI Architecture
Zero trust for AI requires decision points at every AI-to-system boundary.
Identity and authentication. The agent must have a verifiable identity. This is typically an API key, a JWT, or an OAuth credential that the agent uses to identify itself to the tools it wants to call. The identity should be scoped: an agent running a report-generation task should have a different identity than an agent running a data-deletion task. Implement identity by having the orchestration layer (the code that calls the LLM and routes its outputs to tools) sign each tool request with the agent's credentials.
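One way to sketch the signing step is an HMAC over a canonical JSON body. The key table, the envelope fields, and the 60-second freshness window below are assumptions made for illustration; they are not taken from any specific product or standard.

```python
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional

# Hypothetical per-agent secrets; a real deployment would load these from a secrets manager.
AGENT_KEYS = {"report-generator-v1": b"demo-secret-do-not-use"}


def _canonical(body: Dict[str, Any]) -> bytes:
    # Canonical JSON so the signer and the verifier hash identical bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign_tool_request(agent_id: str, tool: str, params: Dict[str, Any],
                      issued_at: Optional[int] = None) -> Dict[str, Any]:
    """Build a tool request envelope signed with the agent's key."""
    body = {
        "agent_id": agent_id,
        "tool": tool,
        "params": params,
        "issued_at": int(time.time()) if issued_at is None else issued_at,
    }
    signature = hmac.new(AGENT_KEYS[agent_id], _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_tool_request(envelope: Dict[str, Any], max_age_seconds: int = 60,
                        now: Optional[int] = None) -> bool:
    """Check the signature and freshness of an envelope; any failure means deny."""
    body = envelope["body"]
    key = AGENT_KEYS.get(body.get("agent_id"))
    if key is None:
        return False  # unknown agent
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope.get("signature", "")):
        return False  # body was altered or signed with a different key
    current = int(time.time()) if now is None else now
    return 0 <= current - body["issued_at"] <= max_age_seconds
```

Because the signature covers the tool name and parameters, changing either after signing makes verification fail. Scoping one key per agent identity keeps a leaked report-generation key from authorizing a deletion agent's requests.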
Authorization and policy. Before a tool call executes, a policy engine must evaluate whether the agent's identity should be allowed to make that specific call. Policies are written in a declarative language. A simple policy might say "the reporting-agent can call the fetch-sales-data tool between 8am and 6pm, Monday through Friday, and only with query parameters that match the pattern ^[a-z0-9_]+$." A strict policy might say "all tool calls require an explicit approve-then-execute flow: the system logs the request, a human approves it, and only then does the tool run." Policies are evaluated at the control plane before the tool is invoked.
Runtime inspection. Between the policy decision and the tool execution, the system must inspect the actual parameters the agent is about to send. Does the request match the policy? Does the parameter data look like a prompt injection? Does the size and structure of the request deviate from normal patterns? If inspection finds a problem, the request is blocked, logged, and (optionally) escalated to a human.
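A minimal sketch of the inspection step might look like the following. The three regular expressions and the 4 KB size ceiling are illustrative assumptions; real detection needs far broader coverage than a handful of patterns.

```python
import json
import re
from typing import Any, Dict, Iterator, List

# Illustrative heuristics only (assumed patterns, not a complete detector).
SUSPICIOUS_PATTERNS = {
    "path_traversal": re.compile(r"\.\./|\.\.\\"),
    "sql_injection": re.compile(r"(?i)\b(union\s+select|drop\s+table)\b|;\s*--"),
    "prompt_injection": re.compile(r"(?i)ignore (all )?(previous|prior) instructions"),
}
MAX_REQUEST_BYTES = 4096  # assumed ceiling for a "normal" request


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string nested inside dicts, lists, and tuples."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def inspect_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    findings: List[str] = []
    if len(json.dumps(params).encode()) > MAX_REQUEST_BYTES:
        findings.append("oversized_request")
    for text in iter_strings(params):
        for name, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(text) and name not in findings:
                findings.append(name)
    return findings
```

A caller would block, log, and optionally escalate whenever the returned list is non-empty.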
Audit and decision records. Every allow/block decision is recorded immediately in an immutable ledger with the full context: timestamp, agent identity, tool, parameters, policy rule, inspection result, human decision (if applicable), and outcome. This ledger is the ground truth for "what did the system allow and why" and is essential for compliance, incident response, and model tuning.
Implementation: A Deny-by-Default Policy Model
The simplest zero trust policy for AI is deny-by-default: every tool call is denied unless an explicit rule permits it.
Here is how you express this in a policy language:
# Zero trust policy for an AI agent
agent_id: report-generator-v1
default_action: DENY
rules:
  - name: "read_sales_data"
    tool: fetch_sales_data
    action: ALLOW
    conditions:
      - parameter: query_type
        match: "^(monthly|quarterly|annual)$"
      - parameter: date_range_days
        max_value: 90
      - time_of_day:
          start: "09:00"
          end: "17:00"
          timezone: "America/New_York"
  - name: "read_customer_list"
    tool: fetch_customer_names
    action: ALLOW
    conditions:
      - parameter: limit
        max_value: 1000
  - name: "write_report"
    tool: save_report
    action: ALLOW
    conditions:
      - parameter: file_path
        match: "^/reports/[a-z0-9_-]+\\.pdf$"
      - parameter: file_size_bytes
        max_value: 10485760  # 10 MB
  - name: "delete_operations"
    tool: delete_file
    action: DENY
    conditions: []
  - name: "high_risk_api_call"
    tool: ".*"
    action: CONSTRAIN
    conditions:
      - threat_score: "high"
    enforcement: "require_human_approval"
# Enforcement
enforcement:
  - decision: ALLOW
    action: execute_tool
  - decision: CONSTRAIN
    action: "log_and_queue_for_approval"
  - decision: DENY
    action: "log_and_return_error"
  - decision: LOG
    action: "record_but_allow"
In this policy:
- DENY is the default. Unless a rule explicitly permits a tool call, it is rejected.
- Conditions are conjunctive. All conditions in a rule must match for the rule to apply. If a condition fails, the action does not apply.
- ALLOW executes immediately. If a rule matches and the action is ALLOW, the tool is called.
- CONSTRAIN gates the tool call. The request is logged, queued for human approval, and the agent receives a response indicating the tool requires review.
- DENY blocks silently or with an error. The agent is told the tool is unavailable (from the agent's perspective, the policy is invisible).
- LOG records but permits. Useful for monitoring new patterns without yet enforcing them.
This policy structure gives you several enforcement options. In strict environments (healthcare, finance, government), you might set routine tools to ALLOW but high-risk tools to CONSTRAIN, so a human approves dangerous operations before they execute. In permissive environments, you might set most tools to ALLOW and use LOG to monitor new patterns before enforcing rules against them.
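To make the evaluation order concrete, here is a minimal sketch of a deny-by-default evaluator over a subset of the policy above, written as plain Python data. The first-match-wins ordering and the choice to treat the tool field as a regular expression are assumptions; the policy language itself does not specify them.

```python
import re
from typing import Any, Dict

# A subset of the rules above, expressed as Python data for illustration.
POLICY = {
    "default_action": "DENY",
    "rules": [
        {
            "name": "read_sales_data",
            "tool": "fetch_sales_data",
            "action": "ALLOW",
            "conditions": [
                {"parameter": "query_type", "match": r"^(monthly|quarterly|annual)$"},
                {"parameter": "date_range_days", "max_value": 90},
            ],
        },
        {"name": "delete_operations", "tool": "delete_file", "action": "DENY", "conditions": []},
    ],
}


def condition_holds(condition: Dict[str, Any], params: Dict[str, Any]) -> bool:
    """Check one parameter condition; a missing parameter never satisfies it."""
    value = params.get(condition["parameter"])
    if value is None:
        return False
    if "match" in condition and not re.fullmatch(condition["match"], str(value)):
        return False
    if "max_value" in condition:
        if not isinstance(value, (int, float)) or value > condition["max_value"]:
            return False
    return True


def evaluate(policy: Dict[str, Any], tool: str, params: Dict[str, Any]) -> Dict[str, str]:
    """Return the first rule whose tool and conditions all match, else the default."""
    for rule in policy["rules"]:
        if not re.fullmatch(rule["tool"], tool):
            continue
        # Conditions are conjunctive: every one must hold for the rule to apply.
        if all(condition_holds(c, params) for c in rule["conditions"]):
            return {"decision": rule["action"], "rule": rule["name"]}
    return {"decision": policy["default_action"], "rule": "default"}
```

A call to a tool with no rule, such as a hypothetical send_email, falls through every rule and lands on the default DENY, which is the property the policy relies on.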
Continuous Verification in Practice
Deny-by-default is a starting point, not the finish line. Zero trust requires continuous verification throughout the agent's operation.
Re-evaluate on each request. Every tool call, not just the first, is checked against the current policy. If a policy changes between tool calls (an administrator adds a new rule or revokes a permission), the next tool call is evaluated under the new policy.
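The key implementation detail is that the enforcement point reads the policy at call time rather than caching an earlier decision. A small sketch, assuming an in-memory store and a simple per-agent allow-list (both hypothetical simplifications):

```python
import threading
from typing import Dict, Set


class PolicyStore:
    """Holds the current allow-list per agent; updates are visible to the next check."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._allowed: Dict[str, Set[str]] = {}
        self.version = 0

    def set_allowed_tools(self, agent_id: str, tools: Set[str]) -> None:
        """Replace an agent's allow-list, as an administrator update would."""
        with self._lock:
            self._allowed[agent_id] = set(tools)
            self.version += 1

    def check(self, agent_id: str, tool: str) -> Dict[str, object]:
        """Evaluate against the policy as it stands right now."""
        with self._lock:
            allowed = tool in self._allowed.get(agent_id, set())
            return {"decision": "ALLOW" if allowed else "DENY", "policy_version": self.version}
```

Returning the policy version with each decision also lets the audit record show which revision of the policy produced it.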
Context-aware decisions. The policy engine should know not just what the agent wants to do, but why and when. Did the user who triggered the agent have permission to request this action? Is the agent operating outside its normal time window (middle of the night, during a holiday)? Has the agent made an unusual number of calls in a short time? Policies can reference these signals.
# Context-aware policy rule
- name: "bulk_export_requires_approval"
  tool: export_customer_data
  action: CONSTRAIN
  conditions:
    - parameter: export_size_rows
      min_value: 10000
    - user_permission: "data_export"
      must_have: true
    - time_context:
        unusual_hour: true
        or_call_count_exceeds:
          window_minutes: 60
          threshold: 5
  enforcement: "require_human_approval_and_log"
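The call-count signal in the rule above needs a per-agent counter over a time window. One possible sketch uses a sliding window of timestamps; the class name and the choice to pass the clock in explicitly are assumptions made to keep the example deterministic.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class CallRateTracker:
    """Tracks recent call timestamps per agent in a sliding window."""

    def __init__(self, window_seconds: int = 3600, threshold: int = 5) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def record_and_check(self, agent_id: str, now: float) -> bool:
        """Record a call at `now`; return True if the window count exceeds the threshold."""
        calls = self._calls[agent_id]
        calls.append(now)
        # Drop timestamps that have aged out of the window.
        while calls and calls[0] <= now - self.window_seconds:
            calls.popleft()
        return len(calls) > self.threshold
```

With a 60-minute window and a threshold of 5, the sixth call inside the hour is the first one flagged.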
Threat detection integration. As the policy engine evaluates each request, it can run threat-detection checks in parallel: is the tool call attempting a prompt injection, a SQL injection, a path traversal? Does the parameter contain PII that the agent should not be able to access? Is the request signature consistent with the agent's normal pattern, or is it anomalous? If any check flags a threat, the action is constrained or denied.
Behavior analysis. Over time, the system learns what normal looks like for each agent. Normal for a reporting agent is low-volume, predictable queries. Normal for a customer-service agent is varied queries with high concurrency. If an agent's behavior suddenly changes (a reporting agent makes 100 calls in 10 minutes, or requests data it never asks for), the policy engine escalates those requests for review.
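As a rough illustration of a behavioral baseline, the sketch below flags a call count that sits far above an agent's historical mean. The z-score approach, the threshold of 3.0, and the five-sample minimum are assumptions; production systems typically use richer features than call volume alone.

```python
import statistics
from typing import List


def is_anomalous(history: List[int], current: int, z_threshold: float = 3.0) -> bool:
    """Flag `current` if it is more than z_threshold standard deviations above the baseline."""
    if len(history) < 5:
        return False  # too little data to judge (assumed minimum sample size)
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return current > mean  # flat baseline: any increase stands out
    return (current - mean) / stdev > z_threshold
```

Here `history` would hold call counts per fixed interval (for example, per 10 minutes) for one agent, and a flagged interval would feed the policy engine as a signal rather than block anything on its own.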
Role-Based and Attribute-Based Access Control for Agents
Traditional applications use role-based access control (RBAC): a user has a role (admin, editor, viewer), and the role determines what the user can do. AI agents need something similar, but with a twist: the agent's permissions might depend not just on the agent's role, but on the user who invoked it, the task being executed, and the data being accessed.
Attribute-based access control (ABAC) extends RBAC by evaluating not just roles, but attributes. An agent might have permission to read sales data (a role-based permission), but only for regions the invoking user has access to (an attribute-based condition).
# ABAC rule for an agent
- name: "read_regional_sales"
  tool: fetch_sales_data
  action: ALLOW
  conditions:
    - agent_role: "reporting"
      has_permission: "sales_read"
    - invoking_user_attribute: "region"
      matches_parameter: "region"
    - data_classification: "internal"
      is_less_than_or_equal: "user_clearance"
This rule says: the agent can call fetch_sales_data only if the agent has the reporting role and sales_read permission, and the region in the tool call matches a region the invoking user is assigned to, and the data classification does not exceed the user's clearance level. If any condition fails, the tool call is denied.
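The same rule can be sketched as a plain function to show how the three conditions combine. The classification ordering and the dictionary shapes for the agent and user are assumptions introduced for this example.

```python
from typing import Any, Dict

# Assumed ordering of classification levels, lowest to highest.
CLASSIFICATION_RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


def abac_allows(agent: Dict[str, Any], user: Dict[str, Any],
                params: Dict[str, Any], data_classification: str) -> bool:
    """Return True only if every condition of the read_regional_sales rule holds."""
    if agent.get("role") != "reporting":
        return False
    if "sales_read" not in agent.get("permissions", ()):
        return False
    # The requested region must be one the invoking user is assigned to.
    if params.get("region") not in user.get("regions", ()):
        return False
    data_rank = CLASSIFICATION_RANK.get(data_classification)
    user_rank = CLASSIFICATION_RANK.get(user.get("clearance"))
    if data_rank is None or user_rank is None:
        return False  # unknown labels fail closed
    return data_rank <= user_rank
```

Note that the agent's own permissions never widen: the invoking user's regions and clearance can only narrow what the agent is allowed to fetch.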
Audit Trails and Compliance
Zero trust for AI is not complete without a persistent, tamper-evident audit trail. Every decision is recorded at the moment it is made, signed cryptographically, and appended to an immutable log.
What to log:
- Request metadata. Timestamp, agent identity, user identity, request ID, originating system.
- Tool call details. Tool name, full parameters, size, hashes (to detect tampering).
- Policy evaluation. Which policy rule matched (or why no rule matched), all conditions evaluated and their results, final decision (ALLOW/DENY/CONSTRAIN/LOG).
- Threat signals. Any threat detection results, anomaly scores, behavior analysis findings.
- Execution outcome. Did the tool run? If so, did it succeed? What did it return? Any errors?
- Approval details (if applicable). Who approved, when, comments, duration to approval.
The log itself should be signed and stored in a system that prevents retroactive modification. For regulated industries (healthcare, financial, government), the audit trail is often a compliance requirement. For everyone, it is the evidence you need when something goes wrong.
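One common way to make retroactive edits detectable is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is an in-memory illustration under that assumption; it shows tamper evidence only and does not replace signing or write-once storage.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # assumed starting value for the first entry


def _entry_hash(prev_hash: str, record: Dict[str, Any]) -> str:
    # Hash the previous hash together with a canonical encoding of the record.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


class AuditChain:
    """Append-only list of decision records linked by SHA-256 hashes."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, record: Dict[str, Any]) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        entry_hash = _entry_hash(prev_hash, record)
        self.entries.append({"record": record, "prev_hash": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if _entry_hash(prev_hash, entry["record"]) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

Changing a past decision from DENY to ALLOW alters that entry's recomputed hash, so verification fails from that point on unless every later hash is rewritten too.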
Zero Trust vs. Traditional Perimeter Security for AI
The old model was: "We control the network. We control the servers. We deployed the code. Therefore, we trust it." Behind the firewall, you might trust all local services without checking every request.
The zero trust model for AI says: "We deployed this code, but it calls a model we did not train. The model's outputs are probabilistic, not deterministic. We do not trust the model without evidence. We trust nothing by default." Every action, internal or external, is checked.
The practical difference is enforcement density. Traditional security might check once at the network perimeter and then trust everything inside. Zero trust for AI checks at every agent-to-tool boundary and continuously re-verifies. It trades latency (every tool call incurs some policy-evaluation overhead) for certainty (you know exactly what actions are happening and why).
For AI, that trade is worth it. The attack surface is broader, the attacker's reach is deeper, and the cost of misunderstanding what the model will do is higher.
How Vaikora Helps
Vaikora is a runtime control platform for AI agents. It sits between your agent and your tools as a policy enforcement point. When your agent wants to call a tool, it goes through Vaikora first.
Vaikora evaluates the tool call against deny-by-default policies you define. It inspects the parameters for threats (prompt injection, data exfiltration, anomalies). It checks the agent's identity and current permissions. It logs every decision to a tamper-proof audit chain. If a tool call is allowed, it passes through; if it is denied or flagged for review, Vaikora blocks it and logs why.
Vaikora's open-core components (the LLM gateway and MCP server) are MIT-licensed and can be self-hosted. The commercial Control Plane adds pre-built compliance policies (SOC 2, HIPAA, GDPR, PCI DSS, ISO 27001), an approvals queue for constrained actions, and continuous threat detection.
Getting Started with Zero Trust for AI
Start with an inventory. Map all the tools your agents can call, all the data those tools access, and the actions those tools perform. Rank them by risk: deletion is high-risk, reading is lower-risk, and writing depends on what is being written.
Write deny-by-default policies. For each agent, write a policy that denies everything by default and explicitly allows only the tools and parameters it needs. Start with a few concrete allowances (e.g., "allow monthly sales reports between 9am and 5pm") and tighten as you learn.
Implement runtime inspection. Add a policy engine between your agent and your tools. Evaluate each tool call against the policy. Log the decision. If a threat is detected, constrain or deny.
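A policy enforcement point can be as simple as a wrapper that owns the tool registry, so the agent has no path to a tool except through it. The sketch below is a generic illustration with hypothetical names; it is not the API of any particular product.

```python
from typing import Any, Callable, Dict, List


class ToolGateway:
    """Sits between the agent and its tools; nothing runs without a decision."""

    def __init__(
        self,
        tools: Dict[str, Callable[..., Any]],
        decide: Callable[[str, str, Dict[str, Any]], str],
    ) -> None:
        self.tools = tools
        self.decide = decide  # policy function returning "ALLOW", "CONSTRAIN", or "DENY"
        self.log: List[Dict[str, Any]] = []
        self.pending: List[Dict[str, Any]] = []

    def call(self, agent_id: str, tool: str, params: Dict[str, Any]) -> Dict[str, Any]:
        # Unknown tools are denied before the policy is consulted.
        decision = self.decide(agent_id, tool, params) if tool in self.tools else "DENY"
        if decision not in ("ALLOW", "CONSTRAIN"):
            decision = "DENY"  # fail closed on anything unexpected
        request = {"agent_id": agent_id, "tool": tool, "params": params}
        self.log.append({**request, "decision": decision})
        if decision == "ALLOW":
            return {"status": "ok", "result": self.tools[tool](**params)}
        if decision == "CONSTRAIN":
            self.pending.append(request)  # held for human approval
            return {"status": "pending_approval"}
        return {"status": "error", "error": "tool unavailable"}


def demo_decide(agent_id: str, tool: str, params: Dict[str, Any]) -> str:
    """Stand-in policy for the example; a real one would evaluate rules."""
    return {"fetch_sales_data": "ALLOW", "export_customer_data": "CONSTRAIN"}.get(tool, "DENY")


gateway = ToolGateway(
    tools={
        "fetch_sales_data": lambda query_type: f"{query_type} report",
        "export_customer_data": lambda: "rows",
        "delete_file": lambda path: None,
    },
    decide=demo_decide,
)
```

The decision is logged before the tool runs, so even a tool call that crashes leaves a record of what was permitted.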
Collect audit trails. Every decision is logged with full context. Use the logs to understand what is happening, to debug policy rules, and to meet compliance requirements.
Monitor and improve. Watch the logs. Are policies blocking legitimate requests? Are they too permissive? Are threat-detection rules catching real attacks? Adjust the policies and rules based on evidence.
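A quick way to start answering those questions is to summarize decisions per tool from the audit log. This sketch assumes each log entry is a dictionary with hypothetical "tool" and "decision" fields.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def decisions_by(entries: Iterable[Mapping[str, str]], key: str = "tool") -> Dict[str, Counter]:
    """Count decisions grouped by the given field of each log entry."""
    summary: Dict[str, Counter] = {}
    for entry in entries:
        summary.setdefault(entry[key], Counter())[entry["decision"]] += 1
    return summary


def deny_rate(counts: Counter) -> float:
    """Fraction of decisions that were DENY; 0.0 when there is no data."""
    total = sum(counts.values())
    return counts["DENY"] / total if total else 0.0
```

A tool with a high deny rate is worth a look in either direction: the policy may be blocking legitimate work, or something may be probing that tool.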
Frequently Asked Questions
What is zero trust for AI?
Zero trust for AI means treating every tool call and external action from an AI agent as an access request that must be authenticated, authorized, and inspected before execution. Instead of trusting an agent because it is deployed in your network, you assume the agent or its model could be compromised and verify every decision at runtime, using policies that deny by default and allow only explicitly permitted actions.
How do you apply zero trust to AI agents?
Apply zero trust by implementing four layers at the agent-to-tool boundary: identity verification (the agent must authenticate itself), authorization (the agent must have a policy-granted permission for the tool), runtime inspection (parameters are checked for threats and anomalies), and audit logging (every decision is recorded). Write deny-by-default policies for each agent, specifying exactly which tools it can call and under what conditions. Re-evaluate policy on every request, not just the first.
What is least privilege for AI systems?
Least privilege for AI means an agent has access only to the exact tools and data it needs to accomplish its specific task, no more. A reporting agent should not have permission to delete files. A customer-service agent should not access employee records. Implement least privilege through attribute-based access control (ABAC) policies that specify which tools each agent can call, what parameters are allowed, and under what conditions.
How does zero trust differ for AI versus traditional applications?
Traditional applications run code you wrote and deployed, so you can rely on code review and testing to control behavior. AI agents run code you wrote, but their behavior depends on model outputs you cannot fully predict or replay. Zero trust for AI must account for probabilistic, context-dependent behavior and continuous re-verification. It must inspect not just identity and authorization, but also the actual parameters and threat signals in each request, because the model could produce unexpected outputs at any time.
What is an AI agent zero trust policy?
An AI agent zero trust policy is a declarative ruleset that specifies which tools an agent can call, under what conditions, and what parameters are allowed. A policy rule typically includes a tool name, an action (ALLOW, DENY, CONSTRAIN, or LOG), and conditions that must all match for the rule to apply. Conditions can reference agent attributes, user attributes, time of day, data classification, request size, and threat signals. Policies are evaluated at runtime on every tool call and enforced by a policy engine.
Why is audit logging important for zero trust AI?
Audit logging is essential because it provides evidence of what actions the system allowed and why. Every decision (allow, deny, or constrain) is recorded with full context: agent identity, tool, parameters, policy rule, threat signals, and outcome. This log is the ground truth for compliance (regulated industries often require audit trails), incident response (you can trace what happened during a breach), and policy tuning (you can see which rules are working and which need adjustment).
What tools are needed to implement zero trust for AI?
You need a policy language (to define deny-by-default rules), a policy engine (to evaluate each tool call against the policy), runtime inspection (to check parameters for threats), threat detection (to identify injections, anomalies, and other signals), and audit logging (to record every decision). A policy engine can be built in-house or purchased as a service. Threat detection can use pattern matching, machine learning, or both. Audit logging typically relies on an immutable, append-only system (a dedicated audit database or service).
How do I implement least privilege for an AI agent?
Start by listing all the tools your agent needs to call. For each tool, specify the exact parameters and data the agent should be able to access. Write a policy rule that allows the tool only with those parameters and under conditions that match your use case (time of day, user role, request rate). Deny all other tools. Use attribute-based conditions to tie permissions to the invoking user's attributes (region, department, data clearance) so that the agent's effective permissions are narrowed based on context.
What is the difference between role-based and attribute-based access control for AI?
Role-based access control (RBAC) grants permissions based on the roles assigned to a principal (e.g., "reporting agent"). Attribute-based access control (ABAC) grants permissions based on multiple attributes: the agent's role, the invoking user's role and region, the data's classification, the time of day, and other context. ABAC is more precise and suited to AI, where permissions often depend on who triggered the agent, what task it is running, and what data is involved.
Can zero trust policies be automated, or do they always require human approval?
Zero trust policies can be automated with different enforcement levels. ALLOW actions execute immediately without human review. LOG actions are recorded but allowed. CONSTRAIN actions are queued for human approval. DENY actions are blocked. You can set different enforcement levels for different tools and conditions: allow reading public data immediately, require approval for writing data, deny high-risk operations entirely. Over time, as you gain confidence in your policies, you can move tools from CONSTRAIN to ALLOW.
How do I detect when an AI agent is being attacked?
Look for anomalies in the audit log: unusual tool calls, unexpected parameters, requests at unusual times of day or from unfamiliar regions, high error rates, or sudden spikes in request volume. Threat detection tools can check parameters for signs of prompt injection, SQL injection, or path traversal. Behavior analysis can flag when an agent deviates from its normal pattern. Integrate these signals into your policy rules (e.g., CONSTRAIN high-anomaly requests for human review).
Is zero trust for AI compliant with HIPAA, GDPR, and other regulations?
Zero trust policies and audit trails support compliance by enforcing least privilege (only the data an agent needs), logging access (audit trail of who accessed what), and enabling approval workflows (human review of sensitive actions). However, zero trust alone is not sufficient for compliance. You must also implement data minimization, encryption, retention policies, and other controls required by your regulatory framework. Use zero trust as one component of a broader compliance program.
See Vaikora enforce policy on your AI
Open-core AI runtime control. Self-host the MIT gateway free, or run the hosted Control Plane.