Vaikora › Blog › Detection & SOC
AI Agent IOCs: Detection Rules for AI Threats in SIEM
AI agent threat detection in SIEM requires understanding a new class of indicators: not just file hashes or IP addresses, but behavioral patterns unique to autonomous agent execution. Unlike traditional malware, compromised or malicious AI agents operate through API calls, tool invocations, token consumption anomalies, and prompt injection artifacts. Building effective detection rules means mapping agent-specific IOCs (indicators of compromise) to your SIEM and correlating behavioral signals that flag unauthorized or anomalous agent activity before it causes damage.
What Are AI Agent IOCs and Why They Matter
Traditional IOCs focus on binary artifacts: malware hashes, C2 domains, compromised credentials. AI agents operate differently. They consume tokens, invoke external tools, write to datastores, and make autonomous decisions, all logged by the agent runtime. An AI agent IOC is any observable signal that a model or agent has been compromised, is behaving anomalously, or is executing unauthorized actions. These include API key exfiltration patterns, token consumption spikes, repeated policy violations, tool invocation sequences that don't match historical patterns, and prompt injection signatures in API logs.
The urgency is clear: AI agents are increasingly autonomous, often with credentials and tool access that rival human administrators. A compromised agent can trigger many rapid tool invocations, exfiltrate data through multiple parallel API calls, or attempt to modify audit logs. Traditional rule-based detection misses these patterns because they don't fit the malware playbook. You need agent-specific rules.
AI-Specific Indicators for SIEM Detection
Behavioral Anomalies in Token Consumption
Large language models are metered by token usage. A normal agent's token consumption depends on its function: simple lookups might consume 100, 500 tokens per execution, while complex reasoning tasks can consume 3,000, 10,000 tokens. Token spikes that exceed your agent's historical patterns are a key signal. Build detection rules that flag:
- Token consumption exceeding the 99th percentile of an agent's historical average, sustained over a time window. (Adjust the percentile and window based on your environment: a high-frequency agent might use a 95th percentile threshold, while a low-frequency one might use 99.5th.)
- Request rate increasing dramatically within a short window. For example, if an agent normally makes 5 requests per hour, a spike to 50 requests in 5 minutes warrants investigation.
- Out-of-hours execution at rates inconsistent with scheduled maintenance.
Most LLM API services (OpenAI, Anthropic, Azure OpenAI) log these metrics. Ingest token usage and request rate logs into your SIEM and correlate by agent ID or API key. The key principle: establish a baseline for each agent and alert on significant deviations.
Tool Invocation Chains and Policy Violations
Agents invoke tools: APIs, functions, databases, cloud services. Each invocation is an opportunity for detection. A rule might flag:
- Repeated tool invocations to the same endpoint within seconds, suggesting automated enumeration or data exfiltration.
- Tool invocations to sensitive or restricted functions that the agent historically never calls.
- Attempts to invoke tools that the agent's policy explicitly forbids (policy violations caught by runtime controls are the signal).
If you use a runtime control system that enforces policy on agent actions, those violations are gold: every BLOCK or CONSTRAIN decision from the policy engine is a behavioral signal. Correlate policy decision logs with agent identity and timestamp.
Prompt Injection and Jailbreak Signatures
Attackers target agents by injecting prompts or system message overrides into user input or retrieved data. Prompt injection patterns often include:
- Abrupt topic shifts in user input or retrieved documents: "ignore previous instructions" or "switch to developer mode."
- Repeated attempts to invoke tools outside the agent's defined role.
- Requests for sensitive operations (credential access, data deletion) that contradict the agent's normal function.
Parse agent logs for these keywords and phrases. Flag user inputs that contain instruction overrides or role-swap language. If your agent logs the full prompt before inference, inspect it for injection markers.
API Key and Credential Anomalies
An AI agent's most dangerous asset is its API credentials. Detection should focus on:
- API key usage from unusual IP addresses, geographies, or times.
- API keys associated with one agent being used by another service or user account.
- API key creation or rotation events coinciding with anomalous agent behavior (a spike in token usage shortly after a new key is created can signal key compromise and immediate abuse).
Correlate IAM logs (API key creation, rotation, revocation) with agent execution logs to detect credential compromise.
Building Detection Rules in Microsoft Sentinel with KQL
Sentinel is widely deployed in enterprise SOCs. Here's a practical KQL rule that detects token consumption anomalies:
let baseline = AzureOpenAICalls
| where TimeGenerated > ago(30d)
| summarize avg_tokens = avg(TokensUsed), stdev_tokens = stdev(TokensUsed) by AgentId, bin(TimeGenerated, 1h)
| summarize avg_baseline = avg(avg_tokens), stdev_baseline = avg(stdev_tokens) by AgentId;
AzureOpenAICalls
| where TimeGenerated > ago(1h)
| join (baseline) on AgentId
| where TokensUsed > (avg_baseline + (3 * stdev_baseline))
| summarize token_count = sum(TokensUsed), request_count = count() by AgentId, TimeGenerated
| where token_count > 50000 or request_count > 100
| project AgentId, token_count, request_count, TimeGenerated
This rule calculates a dynamic baseline over 30 days, then flags any agent that consumes tokens more than three standard deviations above average in a single hour. Adjust the threshold based on your environment and agent function.
For tool invocation detection:
AgentToolInvocations
| where TimeGenerated > ago(1h)
| where ToolName in ("DeleteData", "ModifyPolicy", "ExportSecrets")
| summarize invoke_count = count() by AgentId, ToolName, TimeGenerated
| where invoke_count > 10
| project AgentId, ToolName, invoke_count, TimeGenerated, AlertSeverity = "High"
This rule flags any agent that attempts high-risk operations (data deletion, policy modification, secret export) more than 10 times in an hour. Adjust the threshold to match your environment's risk tolerance.
For prompt injection detection, parse agent input logs:
AgentPrompts
| where TimeGenerated > ago(1h)
| where UserInput contains "ignore previous instructions"
or UserInput contains "switch to developer mode"
or UserInput contains "act as an admin"
| project AgentId, UserInput, TimeGenerated, AlertSeverity = "Medium"
Tuning Thresholds to Your Environment
The thresholds shown above (99th percentile, 3 standard deviations, 10 invocations per hour) are starting points, not universal rules. Your environment's normal behavior is the truth. Spend 1, 2 weeks observing agent logs without alerting, calculate actual baselines, and then set thresholds that reduce false positives while catching real anomalies. A threshold that works for one agent (high-volume data processor) may not work for another (low-frequency lookup service). Baseline per agent, not globally.
Detection Rules in Splunk with SPL
Splunk is often deployed alongside or instead of Sentinel. The same detection principles apply in SPL (Search Processing Language), but the syntax differs. Here are practical Splunk rules that mirror the Sentinel examples.
For token consumption anomalies:
sourcetype="llm_api_usage"
| stats avg(tokens_used) as avg_tokens, stdev(tokens_used) as stdev_tokens by agent_id
| eval baseline_low = avg_tokens - (3 * stdev_tokens), baseline_high = avg_tokens + (3 * stdev_tokens)
| append [search sourcetype="llm_api_usage" earliest=-1h | eval deviation = (tokens_used - avg_tokens) / stdev_tokens]
| where deviation > 3 or tokens_used > baseline_high
| stats sum(tokens_used) as total_tokens, count as request_count by agent_id, _time
| where total_tokens > 50000 or request_count > 100
| table agent_id, total_tokens, request_count, _time
This rule builds a statistical baseline of token usage per agent, then flags outliers in the last hour. The deviation calculation identifies how many standard deviations a request exceeds the mean.
For high-risk tool invocations:
sourcetype="agent_tool_invocations" tool_name IN ("DeleteData", "ModifyPolicy", "ExportSecrets", "RotateKey")
| stats count as invoke_count by agent_id, tool_name
| where invoke_count > 10
| eval severity = "High"
| table agent_id, tool_name, invoke_count, severity
This rule counts invocations of sensitive tools per agent and flags when any agent exceeds a threshold (10 in a time window). Add any agent-specific dangerous tools to the tool_name list. Adjust the invoke_count threshold based on your acceptable risk profile.
For prompt injection attempts:
sourcetype="agent_requests"
| search user_input="*ignore previous*" OR user_input="*developer mode*" OR user_input="*jailbreak*" OR user_input="*act as admin*"
| stats count as injection_attempts by agent_id, user_id, _time
| where injection_attempts >= 2
| eval severity = "Medium"
| table agent_id, user_id, injection_attempts, _time, severity
This rule detects prompt injection keywords in user input logs and flags users or agents making repeated injection attempts. Adjust keywords based on your observed attack patterns. The count threshold (2 in a time window) can be tuned down if false positives are low in your environment.
Splunk Baseline and Alerting Best Practices
Splunk's eventstats and stats commands can build per-agent baselines without storing them externally. Use eventstats to tag anomalies on each event:
sourcetype="llm_api_usage" earliest=-30d
| eventstats avg(tokens_used) as avg_tokens, stdev(tokens_used) as stdev_tokens by agent_id
| eval is_anomaly = if(tokens_used > (avg_tokens + (3 * stdev_tokens)), 1, 0)
| search is_anomaly=1 earliest=-1h
| table _time, agent_id, tokens_used, avg_tokens, is_anomaly
This approach labels every event, making it easy to alert on anomalies as they arrive. Save this as a scheduled search, generate an alert when is_anomaly=1, and pipe anomalies to your ticketing system.
Log Sources and Onboarding for AI Detection
Effective agent detection depends on ingesting the right logs. Identify these data sources and field mappings before building rules.
Essential Log Sources
LLM API Usage Logs: OpenAI, Anthropic, Azure OpenAI, or self-hosted LLM serving layers emit logs for every API call. Fields to capture: timestamp, agent_id, user_id, api_key or api_key_hash, tokens_used (input and output separately), model_version, request_duration_ms, response_status (success, error, throttled), cost_estimate.
Agent Execution and Tool Invocation Logs: The agent framework (LangChain, AutoGen, Vaikora, or custom) logs every tool the agent invokes. Fields: timestamp, agent_id, agent_version, tool_name, tool_parameters (sanitized of secrets), invocation_result (success or failure reason), tool_response_summary (truncated for security), execution_time_ms, policy_decision (ALLOW, BLOCK, CONSTRAIN, LOG).
Runtime Policy Decision Logs: If your agent runs behind a policy enforcement layer (the Vaikora gateway, a custom policy engine, or a model endpoint with safety filters), those decisions are critical signals. Fields: timestamp, agent_id, decision (ALLOW, BLOCK, CONSTRAIN, LOG), policy_violated (which rule or guardrail fired), attempted_action (tool name, parameter subset, or prompt text), user_id, session_id.
IAM and Key Management Logs: Track API key lifecycle events: creation, rotation, revocation, usage location, and associated agents or services. Fields: timestamp, key_id, agent_id, action (created, rotated, revoked), actor (user or service), ip_address, geographic_location, key_age_days.
Gateway and Egress Logs: If agents make external API calls, log them. Fields: timestamp, agent_id, destination_url, http_method, response_status_code, bytes_transferred, latency_ms, user_agent. These logs catch exfiltration attempts or reconnaissance calls to unexpected endpoints.
Model Output and Reasoning Logs: Some deployments log the model's full output (or a summary). If captured, fields include: timestamp, agent_id, input_prompt (or hash), output_summary, confidence_score, content_flags (if any safety filter was triggered).
Field Normalization and Correlation Keys
Before correlating logs across systems, normalize field names. Define a common schema:
| Source Type | Native Field | Normalized Field | Type | Notes |
|---|---|---|---|---|
| LLM API usage | request_id | agent_request_id | string | Unique per LLM call |
| Agent tool log | agentId | agent_id | string | Canonical agent identifier |
| IAM log | keyId | api_key_id | string | API key identifier, hashed if sensitive |
| Policy log | timestamp | event_timestamp | unix_epoch_seconds | Synchronized across all sources |
| Gateway log | src_ip | caller_ip | ip | Source IP of the request |
Normalize enum fields too: map policy decisions to a canonical set [ALLOW, BLOCK, CONSTRAIN, LOG] across all policy engines. Map tool invocation results to [success, error, timeout, unauthorized].
Ingestion Checklist
Before declaring a log source "ready for detection":
- Confirm the source is ingesting data into your SIEM at least daily (latency < 1 hour is ideal for real-time alerts).
- Verify all required fields are present and populated on at least 95% of records.
- Run a manual baseline query to understand cardinality: how many unique agents, users, API keys, and tools appear in the data over 7 days.
- Test a simple correlation query (e.g., join agent logs to policy decision logs on agent_id and timestamp) to confirm field alignment.
- Document the source, field mappings, data owner, and retention period in a data dictionary.
Once this groundwork is done, building detection rules becomes straightforward.
Response Playbook: When an AI Agent IOC Fires
Detection is only half the battle. When a rule triggers, your response must be swift and methodical. This playbook walks through triage, containment, and root-cause analysis for agent-specific incidents.
Immediate Triage (Minutes 0-5)
When a high-confidence alert fires:
-
Confirm the alert is real. Check the SIEM to verify the triggering event exists in the raw logs. False positives happen when thresholds are poorly tuned or data quality issues cause spurious spikes. Pull the raw agent log, policy decision, and API usage records.
-
Identify the agent and its credentials. Extract the agent_id, API key (or key_id), and API key age from the alert context. If the key is very new (created in the last hour), the spike might be legitimate warm-up behavior, but treat it as higher risk. If the key is old and previously stable, the spike is more alarming.
-
Scope the blast radius. Ask: which tools does this agent have access to? Which data stores or APIs can it reach? Cross-reference the agent's configuration against the tool invocation logs to see what damage it could inflict. A reporting agent with read-only database access is lower risk than an agent with delete permissions on a production database.
-
Check for concurrent signals. Did the alert fire in isolation, or are there related alerts? Scan the SIEM for policy violations, unusual API key usage, or repeated prompt injection attempts from the same user or IP in the last 30 minutes. If multiple signals fire together, confidence is higher.
Immediate Containment (Minutes 5-10)
Once triage is complete:
-
Revoke the agent's API credentials immediately. If the agent uses an API key, revoke it in your LLM provider's console (OpenAI, Anthropic, Azure, etc.) and in your IAM system. Do not wait for approval; a compromised agent key is an active attack. Document the revocation time and reason in an incident ticket.
-
Kill the agent's active sessions. If your agent framework or gateway tracks sessions, terminate all sessions for the identified agent. This stops any in-flight tool invocations or API calls.
-
Apply a blanket BLOCK policy. If you have a runtime policy engine (like the Vaikora gateway), immediately apply a policy that blocks all tool invocations by the suspect agent. The policy should log the block decision so you can trace what the agent tried to do.
-
Notify the agent's owner and stakeholders. Send an urgent alert to the team that owns the agent, the data/tool owners whose resources the agent accesses, and your incident response team. Include the agent ID, the triggering IOC, the blast radius assessment, and the containment actions taken.
Investigation and Root Cause (Hours 1-24)
After immediate containment:
-
Preserve the signed decision log as evidence. If your policy engine or gateway logs decisions in an append-only, cryptographically signed format, export these logs to a secure, read-only store. This log is your evidence of what the agent tried to do and what your system prevented or allowed.
-
Analyze the prompt and tool invocation sequence. Pull the full sequence of: - User inputs that fed the agent. - The agent's reasoning or intermediate steps (if logged). - Tool invocations the agent made, in order. - The responses from each tool.
Look for patterns: Did the agent suddenly start invoking tools it never called before? Did the prompts shift to requesting sensitive operations? Did tool parameters change (e.g., querying a different table or user list)?
- Determine the injection vector. If prompt injection is suspected, trace the source: - Did malicious input come from a user account? - Did it come from data the agent retrieved from an external source (API, database, file)? - Did it come from a compromised system or tool that returned unexpected data?
Cross-reference user IDs with authentication logs (MFA usage, login times, IP addresses). If the injection came from external data, notify the data owner that their system may be compromised.
-
Review the agent's configuration and access policies. Check: - When was the agent last deployed or updated? Was there a recent code change that introduced new tool access or weakened input validation? - What tools was the agent authorized to invoke? Should some of these tools have been blocked or constrained? - What data does the agent have access to? Should sensitive data have been redacted or masked?
-
Correlate with IAM events. Check if the API key was recently rotated, if the agent was recently given new tool access, or if credentials for associated services (databases, APIs the agent calls) were changed. A key rotation shortly before an alert can indicate proactive containment of a previous compromise.
Post-Incident (Days 2-7)
After the immediate response:
-
Update your detection rules. If the attack revealed a new pattern, add a rule to catch similar behavior. If your threshold was tuned too high and missed the early warning signs, lower it or add intermediate alert levels.
-
Patch the injection vector. If the root cause was insufficient input validation, update the agent's prompt template or input parser. If it was a data source returning unexpected content, work with the data owner to add validation. If it was a configuration issue, review all similar agents.
-
Conduct a post-mortem. Document the incident timeline, the detection that caught it, the response actions, and lessons learned. Identify what worked (e.g., "policy block prevented tool invocation") and what didn't (e.g., "false positive rate is too high"). Adjust thresholds, playbooks, and tooling based on findings.
-
Re-establish trust in the agent. Once the root cause is fixed, reissue credentials, deploy the patched agent, and monitor its behavior closely for the next 1-2 weeks. Use an even tighter baseline during this period to catch regression.
Correlating Signals: From Individual IOCs to High-Confidence Alerts
Individual IOCs are signals; correlated signals are alerts. Build multi-stage detection that combines low-confidence indicators into high-confidence alarms:
- Token spike (medium confidence) + policy violation (medium confidence) = high-confidence compromise alert.
- Unusual API key usage location (medium confidence) + first-ever invocation of sensitive tool (medium confidence) = escalate to IR team.
- Prompt injection keyword detected (low confidence) + tool invocation to restricted function within 1 minute (medium confidence) = investigate.
Use Sentinel's correlation rules (formerly fusion detection) or custom KQL joins to combine these signals across a time window appropriate to your agent design (5, 30 minutes is typical).
Advanced correlation strategies:
Cross-time-window correlation. Don't just look at a 1-minute window. Correlate events spanning 5, 15, or 60 minutes to catch slow-moving exfiltration: a high rate of tool invocations to a database query tool, each query returning 1,000 records, spread across 30 minutes. Individual queries might seem normal, but the aggregate pattern is suspicious.
User and resource-based correlation. When an agent runs on behalf of a user, correlate the agent's behavior with the user's own behavior. If user alice normally has the agent run twice per day, and suddenly the agent runs 100 times in 2 hours under alice's user context, flag it. Combine this with API key location (if alice's usual location is US and the agent's API key is now used from a Chinese IP), and confidence increases.
Credential and tool correlation. When an API key is new or recently rotated, apply tighter thresholds for the first week. A key that's been in service for months is less suspicious if it suddenly spikes; a key that's 3 hours old and already spiking is very suspicious. Combine this with the tools being invoked: if a new key is used to invoke a tool the agent never invoked before, that's a second signal.
Time-of-day correlation. Agents often run on schedules. If an agent runs every weekday at 9 AM, and it suddenly runs at 3 AM on a Sunday, that's a signal. Combine this with a token consumption spike or unusual tool invocation, and you have a composite alert worth high confidence.
Runtime Control as the Detection and Response Layer
Detection rules catch compromises after they've begun. Runtime policy enforcement stops them before execution. A control plane that enforces policy on every agent action, logging ALLOW, BLOCK, CONSTRAIN, and LOG decisions, creates a complete audit trail that feeds into SIEM detection. Every BLOCK decision is a blocked attack; every CONSTRAIN decision is an attack that was allowed but limited. Correlating these policy decisions with agent behavior in your SIEM gives you both detection and evidence of successful defense.
Frequently asked questions
What IOCs do AI agents generate?
AI agents generate behavioral IOCs: token consumption spikes, unusual tool invocation sequences, API key usage anomalies, and policy violations. Unlike traditional malware, agent IOCs are quantitative (high request rates, token count deviations) and behavioral (repeated calls to restricted functions) rather than file-based (hashes, signatures).
How do you detect AI agent threats in a SIEM?
Use agent-specific detection rules: correlate token consumption logs and request rates against each agent's historical baseline, monitor tool invocation patterns, flag policy violations from runtime controls, and detect prompt injection keywords in input logs. Combine multiple signals across a time window to raise high-confidence alerts.
What anomalies indicate AI agent compromise?
Sudden spikes in token consumption relative to baseline, requests from unusual geographies or times, repeated invocations of sensitive tools, policy violations, attempts to invoke tools outside the agent's defined role, and API key usage from unexpected locations all indicate compromise or malicious behavior.
How do you write detection rules for AI agent threats?
Write rules in your SIEM (KQL for Sentinel, SPL for Splunk) that baseline normal agent behavior for each agent, then flag deviations: token consumption exceeding historical thresholds, tool invocation rates above baseline, and policy decision violations. Correlate multiple signals into composite alerts. Remember: thresholds are environment-dependent; adjust them based on your actual baseline data, not industry-wide numbers.
What's the difference between AI threat detection and traditional malware detection?
Traditional malware detection hunts for known hashes and signatures. AI threat detection monitors behavioral signals unique to agents: token usage, API calls, model outputs, and policy decisions. Agents don't install malware; they abuse their own credentials and tool access.
How should I log agent activity for SIEM ingestion?
Log agent execution events (API calls, token usage, model responses), tool invocations with parameters, policy decisions (ALLOW, BLOCK, CONSTRAIN, LOG), API key usage and rotation events, and user inputs. Timestamp everything and include agent ID, user, and resource identifiers for correlation.
See Vaikora enforce policy on your AI
Open-core AI runtime control. Self-host the MIT gateway free, or run the hosted Control Plane.
Get a demo Self-host the gateway
Vaikora