Agentic Security
Behavry's Agentic Security program (AOC, short for Agents of Chaos) addresses attack vectors that are unique to autonomous AI agents and that traditional security tools, which assume human-initiated actions, are not designed to detect.
Inspired by Shapira et al., "Agents of Chaos" (arXiv:2602.20021, Feb 2026).
Why agents are different
Traditional security assumes a human is in the loop. AI agents:
- Execute at machine speed — no natural pause for humans to catch mistakes
- Trust external content — fetched web pages, API responses, and file reads become agent context
- Have broad authority — a single agent may have access to filesystem, email, GitHub, and Slack simultaneously
- Are coachable — adversaries can embed instructions in data the agent will read
The AOC threat model
| AOC | Threat | Status |
|---|---|---|
| AOC-1 | Inbound injection — instructions embedded in tool results | Shipped (Sprint AOC-1, enhanced in AOC-1.5) |
| AOC-2 | Blast radius — agent takes actions with disproportionate impact | Shipped (Sprint Y, March 2026) |
| AOC-3 | Requester identity — who actually instructed this action? | Shipped (Phases 1-2 in Sprint AOC-3, Phase 3 in Sprint U) |
| AOC-4 | Trust reset — agent that was safe suddenly becomes unsafe | Shipped (Sprint V, March 2026) |
All four AOC controls are fully implemented and tested. The Red Team Policy Automation Loop (Sprint RT) ties them together as a cross-cutting adaptive defense layer — see below.
AOC-1: Inbound Injection
An attacker who controls an external resource (a GitHub Gist, a shared doc, a webhook response) embeds imperative instructions. The agent fetches it, incorporates it into context, and acts on it. Behavry scans tool results before they reach the agent. Enhanced in Sprint AOC-1.5 with content trust domain tagging and behavioral drift detection.
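As a rough illustration of scanning a tool result before it reaches the agent, the sketch below tags the content with a trust domain and checks it against a few imperative-instruction patterns. The pattern list, the `ScanResult` type, the `scan_tool_result` function, and the `"external"`/`"internal"` domain names are all hypothetical and chosen for illustration; they are not Behavry's actual rules or API.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns; a production scanner would need a far richer rule set.
INJECTION_PATTERNS = [
    re.compile(r"\bignore (all |any )?(previous|prior|above) instructions\b", re.I),
    re.compile(r"\byou (must|should) now\b", re.I),
    re.compile(
        r"\b(send|forward|upload) .{0,40}\b(credentials|api key|token|password)s?\b",
        re.I,
    ),
]


@dataclass
class ScanResult:
    trust_domain: str   # assumed tag: "external" or "internal"
    matches: list[str]  # patterns that fired on this content

    @property
    def blocked(self) -> bool:
        # Assumption: only content from an untrusted domain is blocked.
        return self.trust_domain == "external" and bool(self.matches)


def scan_tool_result(text: str, trust_domain: str) -> ScanResult:
    """Scan a tool result and record which patterns matched."""
    matches = [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
    return ScanResult(trust_domain, matches)
```

In this sketch the trust domain decides whether a match blocks the content, so the same text fetched from an internal source would be tagged but still delivered.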
AOC-2: Blast Radius Limits
Even when individual actions pass policy, their cumulative scope can be disproportionate. Behavry evaluates every action against configurable blast radius thresholds — shallow deletes are denied outright, while bulk operations, mass messaging, config path writes, and protected file access trigger HITL escalation. Exception hardening prevents policy erosion through frequency monitoring and baseline poisoning detection.
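The escalation triggers described above could be evaluated along these lines. This is a minimal sketch: the `Action` and `Thresholds` types, the default limits, and the example paths are assumptions made for illustration, and the sketch covers only the HITL escalation cases, not the outright-deny rule or exception hardening.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"  # route to a human reviewer (HITL)


@dataclass
class Action:
    kind: str              # e.g. "send_message", "write", "read" (assumed names)
    target_count: int = 1  # number of files, rows, or recipients affected
    path: str = ""


@dataclass
class Thresholds:
    # All defaults are illustrative, not Behavry's shipped values.
    max_bulk_targets: int = 50
    max_recipients: int = 20
    config_prefixes: tuple[str, ...] = ("/etc/", ".github/")
    protected_names: frozenset[str] = frozenset({".env", "id_rsa"})


def evaluate(action: Action, t: Thresholds) -> Decision:
    """Return ESCALATE when an action crosses any blast radius threshold."""
    if action.kind == "send_message" and action.target_count > t.max_recipients:
        return Decision.ESCALATE  # mass messaging
    if action.kind == "write" and action.path.startswith(t.config_prefixes):
        return Decision.ESCALATE  # config path write
    if action.path and action.path.rsplit("/", 1)[-1] in t.protected_names:
        return Decision.ESCALATE  # protected file access
    if action.target_count > t.max_bulk_targets:
        return Decision.ESCALATE  # bulk operation
    return Decision.ALLOW
```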
AOC-3: Requester Identity
When an agent acts, it often does so on behalf of a human user. If the identity of the actual requester is not propagated, a compromised agent can masquerade as any user. Behavry adds X-Requester-Id to every action, alerts on mismatches, and verifies requester claims cryptographically through delegation token chains (Phase 3, Sprint U).
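One way to picture a delegation token chain is as a list of signed links, where each link binds the delegator, the delegate, and the previous link's signature. The sketch below uses HMAC-SHA256 with per-principal shared keys purely for brevity; the `Link` type, the key registry, and the function names are assumptions, and a real deployment would more likely use asymmetric signatures.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    issuer: str     # who delegates
    subject: str    # who receives the delegation
    signature: str  # HMAC over issuer, subject, and the previous signature


def sign_link(key: bytes, issuer: str, subject: str, prev_sig: str) -> Link:
    msg = f"{issuer}|{subject}|{prev_sig}".encode()
    return Link(issuer, subject, hmac.new(key, msg, hashlib.sha256).hexdigest())


def verify_chain(chain: list[Link], keys: dict[str, bytes],
                 requester_id: str, agent_id: str) -> bool:
    """Check that the chain starts at the claimed requester (the value
    carried in X-Requester-Id), ends at the acting agent, and that every
    link is correctly signed and connected to the one before it."""
    if not chain or chain[0].issuer != requester_id or chain[-1].subject != agent_id:
        return False
    prev_sig, expected_issuer = "", requester_id
    for link in chain:
        key = keys.get(link.issuer)
        if key is None or link.issuer != expected_issuer:
            return False
        expected = sign_link(key, link.issuer, link.subject, prev_sig).signature
        if not hmac.compare_digest(expected, link.signature):
            return False
        prev_sig, expected_issuer = link.signature, link.subject
    return True
```

A mismatch between the `X-Requester-Id` value and the root of the chain makes `verify_chain` return `False`, which is the condition an alert could be raised on.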
AOC-4: Cross-Session Trust Reset Detection
An agent that has consistently operated safely may suddenly reverse its behavior across sessions. The Trust Reset Detector tracks action dispositions (allow, block, escalate) per agent and fires alerts when it detects cross-session behavior reversals, requester session cycling, or unexpected workflow participants. Workflow Behavioral Baselines use an exponentially weighted moving average (EWMA) to track normal operating patterns and flag anomalies in tool distribution, delegation depth, session duration, and participant sets.
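To make the EWMA idea concrete, here is a minimal sketch of a baseline over a single numeric metric such as delegation depth or session duration. The `EwmaBaseline` class, the smoothing factor, the warm-up length, and the three-standard-deviation threshold are illustrative assumptions rather than Behavry's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class EwmaBaseline:
    alpha: float = 0.2      # smoothing factor: higher reacts faster
    threshold: float = 3.0  # flag deviations beyond this many std devs
    warmup: int = 5         # observations to collect before flagging
    mean: float = 0.0
    var: float = 0.0
    n: int = 0

    def observe(self, x: float) -> bool:
        """Return True if x deviates from the baseline, then update it."""
        anomalous = False
        if self.n >= self.warmup:
            std = max(self.var ** 0.5, 1e-9)
            anomalous = abs(x - self.mean) > self.threshold * std
        if self.n == 0:
            self.mean = x
        else:
            # Standard incremental EWMA updates for mean and variance.
            delta = x - self.mean
            self.mean += self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.n += 1
        return anomalous
```

For example, a baseline fed delegation depths alternating between 2 and 3 would not flag either value after warm-up, but a sudden depth of 20 would be flagged. Note that this sketch still folds flagged values into the baseline; a hardened version might exclude them.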
Red Team Policy Automation Loop
The Policy Generator (Sprint RT) is a cross-cutting adaptive defense layer that subscribes to the event bus and watches for security events from all four AOC controls. When it detects a repeated attack pattern, it automatically generates a candidate Open Policy Agent (OPA) policy, written in Rego, to block it.
How it works:
- Security events arrive from AOC detectors (injection blocked, drift detected, behavior reversal, session cycling)
- The generator computes a normalized pattern signature for deduplication
- Confidence scoring evaluates pattern frequency, severity, and corroboration across multiple sources
- A candidate Rego policy is generated from templates and proposed for admin review
- If the tenant has auto-activation enabled and confidence exceeds the threshold, the policy is activated immediately
Candidate policies are managed through the dashboard's Policy Suggestions page, where admins can review, edit, test, approve, or reject each candidate.
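The deduplication, scoring, and activation steps listed under "How it works" can be sketched as follows. Everything here is an assumption made for illustration: the event fields (`type`, `tool`, `payload`, `severity`, `source`), the normalization rules, the confidence weights, and the 0.8 threshold. The sketch stops before Rego template rendering.

```python
import hashlib
import json
import re
from dataclasses import dataclass, field


def pattern_signature(event: dict) -> str:
    """Normalize an event so repeats of the same attack hash identically."""
    payload = re.sub(r"\d+", "<n>", event["payload"].lower())
    payload = re.sub(r"\s+", " ", payload).strip()
    canonical = json.dumps(
        {"type": event["type"], "tool": event["tool"], "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


@dataclass
class PatternStats:
    count: int = 0
    max_severity: float = 0.0  # assumed to be in [0, 1]
    sources: set[str] = field(default_factory=set)

    def confidence(self) -> float:
        # Hypothetical weighting of frequency, severity, and corroboration.
        frequency = min(self.count / 5, 1.0)
        corroboration = min(len(self.sources) / 2, 1.0)
        return round(0.4 * frequency + 0.4 * self.max_severity + 0.2 * corroboration, 3)


def record(stats: dict[str, PatternStats], event: dict) -> PatternStats:
    """Fold one security event into the per-signature statistics."""
    entry = stats.setdefault(pattern_signature(event), PatternStats())
    entry.count += 1
    entry.max_severity = max(entry.max_severity, event["severity"])
    entry.sources.add(event["source"])
    return entry


def disposition(entry: PatternStats, auto_activate: bool, threshold: float = 0.8) -> str:
    if entry.count < 2:
        return "observe"   # not yet a repeated pattern
    if auto_activate and entry.confidence() > threshold:
        return "activate"  # tenant opted in and confidence is high
    return "propose"       # queue the candidate for admin review
```

Keeping `propose` as the default outcome means a candidate only bypasses admin review when the tenant has explicitly enabled auto-activation.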