AOC-4: Trust Reset Detection
Shipped: March 2026 (Sprint V)
The problem
A behavioral baseline is only useful if it can be trusted. An adversary who resets an agent's context to a "clean" state — by opening a new session, switching channels, or cycling through identities — can bypass all within-session monitoring. The agent forgets its prior suspicion and complies with the same request it previously blocked.
AOC-4 addresses three attack vectors:
- Behavior reversal — an agent blocks an action in Session A, then allows the identical action in Session B after a context reset
- Session cycling — a requester rapidly opens multiple sessions for the same tool, probing for a permissive one
- Workflow baseline deviation — a multi-agent workflow suddenly exhibits different participant sets, delegation depths, tool distributions, or session durations
Trust Reset Detector
The TrustResetDetector is an event bus subscriber that maintains a per-agent rolling buffer of action dispositions. Every tool call, policy deny, and policy escalation is recorded as an ActionDispositionEntry — a lightweight in-memory record of who did what, when, and whether it was allowed.
Action classification
Raw action strings from tool calls are mapped to five stable semantic classes, ensuring that synonymous operations are compared correctly across sessions:
| Action class | Raw actions mapped |
|---|---|
| read | read, get, list, search, query |
| write | write, create, update, put, patch |
| delete | delete, remove |
| execute | execute, run, call, invoke |
| send | send, post, publish, message |
Detection conditions
The detector evaluates three conditions on every incoming disposition event:
Condition A — Cross-session reversal (HIGH severity)
Fires BEHAVIOR_REVERSAL when the same (agent_id, requester_id, action_class) tuple appears with an opposite disposition in a different session within a 2-hour window.
The "dangerous direction" is blocked/escalated followed by allowed — the agent stopped resisting. The reverse direction (allowed then blocked) is also flagged, as it may indicate policy manipulation.
Session A (10:00): agent-1 / user@corp.com / delete → blocked
Session B (10:45): agent-1 / user@corp.com / delete → allowed ← BEHAVIOR_REVERSAL (Condition A)
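A minimal sketch of the Condition A check, under stated assumptions: `Entry` is a hypothetical stand-in for ActionDispositionEntry, the field names are illustrative, and "opposite disposition" is read as a flip between the allowed side and the blocked/escalated side.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

REVERSAL_WINDOW = timedelta(hours=2)


@dataclass(frozen=True)
class Entry:  # hypothetical stand-in for ActionDispositionEntry
    agent_id: str
    requester_id: str
    action_class: str
    session_id: str
    disposition: str  # "allowed", "blocked", or "escalated"
    at: datetime


def _resisted(disposition: str) -> bool:
    # Assumption: blocked and escalated sit on the same side of a reversal.
    return disposition in ("blocked", "escalated")


def is_cross_session_reversal(history, new):
    """True if an earlier entry in another session has the opposite disposition."""
    key = (new.agent_id, new.requester_id, new.action_class)
    for old in history:
        if ((old.agent_id, old.requester_id, old.action_class) == key
                and old.session_id != new.session_id
                and timedelta(0) <= new.at - old.at <= REVERSAL_WINDOW
                and _resisted(old.disposition) != _resisted(new.disposition)):
            return True
    return False


blocked = Entry("agent-1", "user@corp.com", "delete", "A", "blocked",
                datetime(2026, 3, 1, 10, 0))
allowed = Entry("agent-1", "user@corp.com", "delete", "B", "allowed",
                datetime(2026, 3, 1, 10, 45))
print(is_cross_session_reversal([blocked], allowed))  # True
```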
Condition B — Accumulated block override (HIGH severity)
Fires BEHAVIOR_REVERSAL when an agent has accumulated 3 or more blocked or escalated dispositions (REVERSAL_BLOCK_THRESHOLD) for a specific (requester_id, action_class) pair across prior sessions within the look-back window (REVERSAL_WINDOW_HOURS), and a new session allows the same action.
This catches adversaries who have been persistently denied but eventually find a session context where the agent complies.
Sessions 1-4: agent-1 / attacker@ext / execute → blocked (4 times)
Session 5: agent-1 / attacker@ext / execute → allowed ← BEHAVIOR_REVERSAL (Condition B)
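The accumulated-override check can be sketched as a count over one agent's buffer. The `Disposition` tuple and the function name are hypothetical, and the time-window filter is omitted for brevity, so this is an illustration of the counting rule only.

```python
from collections import namedtuple

REVERSAL_BLOCK_THRESHOLD = 3

# Hypothetical record shape; the real entry carries more fields.
Disposition = namedtuple(
    "Disposition", "requester_id action_class session_id disposition"
)


def is_accumulated_block_override(prior, new):
    """prior: earlier entries for ONE agent; new: the incoming entry."""
    if new.disposition != "allowed":
        return False
    denials = sum(
        1
        for e in prior
        if e.requester_id == new.requester_id
        and e.action_class == new.action_class
        and e.session_id != new.session_id
        and e.disposition in ("blocked", "escalated")
    )
    return denials >= REVERSAL_BLOCK_THRESHOLD


prior = [Disposition("attacker@ext", "execute", f"s{i}", "blocked") for i in range(1, 5)]
new = Disposition("attacker@ext", "execute", "s5", "allowed")
print(is_accumulated_block_override(prior, new))  # True
```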
Condition C — Session cycling (MEDIUM severity)
Fires REQUESTER_SESSION_CYCLING when the same requester opens 3 or more distinct sessions within 30 minutes, each requesting the same tool, with varying dispositions (some allowed, some blocked).
This pattern indicates an attacker probing for a permissive session configuration.
Session X (14:00): user@ext / write_file → blocked
Session Y (14:10): user@ext / write_file → blocked
Session Z (14:20): user@ext / write_file → allowed ← REQUESTER_SESSION_CYCLING
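A minimal sketch of the cycling check for a single requester and tool. The tuple layout and function name are hypothetical, and treating an escalation as a denial when testing for "varying dispositions" is an assumption.

```python
from datetime import datetime, timedelta

SESSION_CYCLING_WINDOW = timedelta(minutes=30)
SESSION_CYCLING_SESSION_THRESHOLD = 3


def is_session_cycling(events, now):
    """events: (session_id, disposition, timestamp) tuples for ONE requester and tool."""
    by_session = {}
    for session_id, disposition, at in events:
        if timedelta(0) <= now - at <= SESSION_CYCLING_WINDOW:
            by_session.setdefault(session_id, set()).add(disposition)
    outcomes = set().union(*by_session.values())
    # "Varying" here means at least one allow and at least one denial.
    mixed = "allowed" in outcomes and bool(outcomes & {"blocked", "escalated"})
    return len(by_session) >= SESSION_CYCLING_SESSION_THRESHOLD and mixed


day = datetime(2026, 3, 1)
probe = [
    ("X", "blocked", day.replace(hour=14, minute=0)),
    ("Y", "blocked", day.replace(hour=14, minute=10)),
    ("Z", "allowed", day.replace(hour=14, minute=20)),
]
print(is_session_cycling(probe, now=day.replace(hour=14, minute=20)))  # True
```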
Cooldown and alert management
BEHAVIOR_REVERSAL events have a 5-minute per-agent cooldown to prevent alert storms after legitimate policy changes that shift disposition patterns. If the same agent triggers multiple reversals within the cooldown window, only the first alert fires.
REQUESTER_SESSION_CYCLING has no cooldown — each detection fires immediately.
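The per-agent cooldown can be sketched as a timestamp map. The class name is hypothetical, and whether a suppressed reversal extends the window is not stated above; this sketch assumes it does not.

```python
from datetime import datetime, timedelta

REVERSAL_COOLDOWN = timedelta(minutes=5)


class ReversalCooldown:
    """Suppress repeat BEHAVIOR_REVERSAL alerts per agent (illustrative only)."""

    def __init__(self):
        self._last_alert = {}  # agent_id -> time of the last alert that fired

    def should_alert(self, agent_id, now):
        last = self._last_alert.get(agent_id)
        if last is not None and now - last < REVERSAL_COOLDOWN:
            return False  # still cooling down; the window is not extended
        self._last_alert[agent_id] = now
        return True


cooldown = ReversalCooldown()
t0 = datetime(2026, 3, 1, 10, 45)
print(cooldown.should_alert("agent-1", t0))                         # True
print(cooldown.should_alert("agent-1", t0 + timedelta(minutes=2)))  # False
print(cooldown.should_alert("agent-1", t0 + timedelta(minutes=5)))  # True
```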
Persistent storage
Action disposition records are held in memory as a per-agent deque (max 500 entries) for fast lookup. A background task in main.py flushes dirty buffers to the action_disposition_records TimescaleDB hypertable every 60 seconds, so detection state survives process restarts, apart from any records written since the last flush.
The hypertable is indexed on (agent_id, action_class, requester_id) for efficient cross-session queries during recovery.
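The in-memory side of this design can be sketched as a bounded deque per agent plus a list of entries that have not been flushed yet. All names here are hypothetical, and the database write is deliberately left to the caller because the storage API is not described in this document.

```python
from collections import defaultdict, deque

MAX_ENTRIES = 500


class DispositionBuffers:
    """Per-agent rolling buffers with dirty tracking (illustrative sketch)."""

    def __init__(self):
        self._buffers = defaultdict(lambda: deque(maxlen=MAX_ENTRIES))
        self._pending = defaultdict(list)  # entries recorded since the last flush

    def record(self, agent_id, entry):
        self._buffers[agent_id].append(entry)  # oldest entry drops at capacity
        self._pending[agent_id].append(entry)

    def history(self, agent_id):
        return list(self._buffers[agent_id])

    def drain_dirty(self):
        """Return and clear unflushed entries; the caller persists them."""
        dirty = dict(self._pending)
        self._pending = defaultdict(list)
        return dirty


buffers = DispositionBuffers()
for i in range(501):
    buffers.record("agent-1", i)
print(len(buffers.history("agent-1")))  # 500
```

A periodic task would call `drain_dirty()` on each tick and write the returned entries in one batch; keeping pending entries separate from the rolling buffer means evicted records are still persisted.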
Tuning constants
| Constant | Default | Description |
|---|---|---|
| REVERSAL_WINDOW_HOURS | 2 | Look-back window for Conditions A and B |
| REVERSAL_BLOCK_THRESHOLD | 3 | Minimum prior blocks required for Condition B |
| SESSION_CYCLING_WINDOW_MINUTES | 30 | Time window for Condition C |
| SESSION_CYCLING_SESSION_THRESHOLD | 3 | Minimum distinct sessions for Condition C |
| REVERSAL_COOLDOWN_MINUTES | 5 | Per-agent cooldown on BEHAVIOR_REVERSAL alerts |
Workflow Baseline Monitor
The WorkflowBaselineMonitor tracks per-workflow behavioral metrics across sessions and fires alerts when a live session deviates significantly from the established baseline. It operates alongside the Trust Reset Detector — while the detector focuses on individual agent reversals, the baseline monitor watches for anomalies in multi-agent workflow patterns.
Baseline construction
Each time a workflow session closes, the monitor updates the workflow's baseline using exponentially weighted moving averages (EWMA, alpha = 0.2), which weight recent sessions more heavily while still reflecting historical patterns. The baseline tracks:
- Average session duration (seconds)
- Average participant count
- Average events per session
- Average delegation depth
- Tool distribution (normalized call fractions per tool)
- Inter-agent latency (average milliseconds between agent transitions)
- Recent participants (union of agents seen in the last 5 sessions)
A minimum of 3 completed sessions (MIN_BASELINE_SAMPLES) is required before anomaly detection engages. Before that threshold, only participant tracking runs — there is no history to compare against.
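The EWMA update can be sketched as follows. The function names are hypothetical, and how the first session seeds the baseline is not specified above, so the sketch only covers the update step.

```python
ALPHA = 0.2
MIN_BASELINE_SAMPLES = 3


def ewma(previous, observed, alpha=ALPHA):
    """Blend a new observation into the running average."""
    return alpha * observed + (1 - alpha) * previous


def update_tool_distribution(baseline, observed, alpha=ALPHA):
    """Apply the same EWMA per tool; tools missing on one side count as 0.0."""
    tools = set(baseline) | set(observed)
    return {
        t: ewma(baseline.get(t, 0.0), observed.get(t, 0.0), alpha) for t in tools
    }


def detection_enabled(completed_sessions):
    return completed_sessions >= MIN_BASELINE_SAMPLES


# A 200 s session moves a 100 s baseline one fifth of the way toward it.
print(ewma(100.0, 200.0))
```

If both the baseline and the observed tool distributions are normalized, the blended distribution also sums to 1, so no renormalization is needed after the update.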
Anomaly detection
Four anomaly types are monitored in real time:
Unexpected participant
Fires WORKFLOW_PARTICIPANT_UNEXPECTED when an agent participates in a workflow session but is not in the recent participant set. This catches unauthorized agent injection — a prompt injection that introduces a new agent into an established workflow.
Delegation depth spike
Fires WORKFLOW_DEPTH_SPIKE when the current session's maximum delegation depth exceeds 2x the baseline average, with the threshold floored at avg_depth + 2 so that shallow baselines do not fire on small increases. A sudden depth increase indicates possible unauthorized sub-agent spawning.
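One plausible reading of this rule is that the threshold is the larger of 2x the baseline average and avg_depth + 2. The sketch below encodes that interpretation; it is an assumption, and the function names are hypothetical.

```python
def depth_spike_threshold(avg_depth):
    # Assumed interpretation: 2x the average, but never below avg_depth + 2.
    return max(2 * avg_depth, avg_depth + 2)


def is_depth_spike(current_max_depth, avg_depth):
    return current_max_depth > depth_spike_threshold(avg_depth)


print(is_depth_spike(3, avg_depth=1.0))  # False: threshold is 3.0
print(is_depth_spike(4, avg_depth=1.0))  # True
print(is_depth_spike(9, avg_depth=4.0))  # True: threshold is 8.0
```

Under this reading the floor dominates for baselines shallower than 2, where doubling alone would flag a single extra level of delegation.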
Tool distribution divergence
Fires WORKFLOW_TOOL_DISTRIBUTION_ANOMALY when the Bray-Curtis dissimilarity between the session's tool usage distribution and the baseline exceeds 0.5. Bray-Curtis ranges from 0.0 (identical) to 1.0 (completely different), making it a natural fit for comparing proportional distributions.
Baseline: {search: 0.6, read_file: 0.3, write_file: 0.1}
Observed: {search: 0.1, delete_file: 0.7, write_file: 0.2}
Divergence: 0.80 → WORKFLOW_TOOL_DISTRIBUTION_ANOMALY
This anomaly type is also triggered by scope probe patterns: when 3 or more DELEGATION_SCOPE_PROBE events accumulate within a single workflow session, the monitor fires WORKFLOW_TOOL_DISTRIBUTION_ANOMALY with anomaly_type: "scope_probe_pattern". This correlates individual delegation scope probes (from AOC-3 Phase 3) into a workflow-level signal of possible prompt injection attempting to enumerate permissions.
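The Bray-Curtis comparison can be sketched with the standard formula, the sum of absolute differences divided by the sum of both distributions. The function name and the 0.0 result for two empty distributions are assumptions of this sketch.

```python
TOOL_DIVERGENCE_THRESHOLD = 0.5


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two tool-usage distributions."""
    tools = set(p) | set(q)
    numerator = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)
    denominator = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in tools)
    return numerator / denominator if denominator else 0.0


baseline = {"search": 0.6, "read_file": 0.3, "write_file": 0.1}
observed = {"search": 0.1, "delete_file": 0.7, "write_file": 0.2}

score = bray_curtis(baseline, observed)
print(round(score, 2), score > TOOL_DIVERGENCE_THRESHOLD)  # 0.8 True
```

For the example distributions the absolute differences sum to 1.6 and the totals sum to 2.0, giving about 0.80, which is above the 0.5 threshold.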
Session duration anomaly
Fires WORKFLOW_DURATION_ANOMALY when a completed session's duration exceeds 3x the baseline average. Abnormally long sessions may indicate data exfiltration, resource abuse, or an attacker maintaining a persistent presence.
Event types
| Event type | Severity | Trigger |
|---|---|---|
| BEHAVIOR_REVERSAL | high | Cross-session disposition flip (Conditions A/B) |
| REQUESTER_SESSION_CYCLING | medium | Rapid session rotation with varying dispositions |
| WORKFLOW_PARTICIPANT_UNEXPECTED | medium | Unknown agent in established workflow |
| WORKFLOW_DEPTH_SPIKE | medium | Delegation depth exceeds 2x baseline |
| WORKFLOW_TOOL_DISTRIBUTION_ANOMALY | medium | Tool usage diverged from baseline or scope probe pattern |
| WORKFLOW_DURATION_ANOMALY | medium | Session duration exceeds 3x baseline |
Architecture
                    Event Bus
                        │
            ┌───────────┴───────────┐
            ▼                       ▼
   TrustResetDetector    WorkflowBaselineMonitor
            │                       │
     Per-agent deque   Per-workflow session state
      (maxlen=500)      (_SessionState tracking)
            │                       │
   ┌────────┴────────┐     ┌────────┴────────┐
   │ Condition A/B/C │     │ Anomaly checks  │
   │ detection logic │     │ on each event + │
   └────────┬────────┘     │ on session close│
            │              └────────┬────────┘
            ▼                       ▼
    BEHAVIOR_REVERSAL      WORKFLOW_*_ANOMALY
   REQUESTER_SESSION_      events + DB alerts
     CYCLING events                 │
            │                       ▼
            ▼               WorkflowBaseline
    ActionDisposition         (EWMA update)
   Record (TimescaleDB
 hypertable, 60s flush)
Both detectors subscribe to the event bus as wildcard listeners (event_bus.subscribe(None, handler)) and filter internally for relevant event types (TOOL_CALL, POLICY_DENY, POLICY_ESCALATE, and DELEGATION_SCOPE_PROBE).
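The wildcard subscription pattern can be sketched with a toy bus. Only the `subscribe(None, handler)` call shape comes from the text above; the `EventBus` class, its `publish` method, and the dictionary event format are hypothetical stand-ins for illustration.

```python
from collections import defaultdict


class EventBus:
    """Toy bus: subscribing with event_type=None registers a wildcard listener."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        typed = self._handlers.get(event["type"], [])
        wildcard = self._handlers.get(None, [])
        for handler in typed + wildcard:
            handler(event)


RELEVANT_TYPES = {"TOOL_CALL", "POLICY_DENY", "POLICY_ESCALATE", "DELEGATION_SCOPE_PROBE"}
seen = []


def handler(event):
    # The listener receives everything and filters internally.
    if event["type"] not in RELEVANT_TYPES:
        return
    seen.append(event["type"])


event_bus = EventBus()
event_bus.subscribe(None, handler)
event_bus.publish({"type": "TOOL_CALL"})
event_bus.publish({"type": "HEARTBEAT"})  # ignored by the filter
print(seen)  # ['TOOL_CALL']
```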
Integration with other AOC modules
AOC-4 builds on signals from the other three AOC modules:
- AOC-1 (Inbound Injection Detection): injection findings may cause POLICY_DENY or POLICY_ESCALATE events that feed into the Trust Reset Detector. An attacker who triggers injection detection in one session and retries in another will be caught by Condition A or B.
- AOC-1.5 (Content Trust Domains + Behavioral Drift): the DriftDetector fires INJECTION_CONDITIONING_SUSPECTED and BEHAVIORAL_DRIFT_DETECTED events. While AOC-4 does not directly consume these, the Red Team Policy Automation Loop (Sprint RT) generates policy candidates from both drift and trust reset signals.
- AOC-3 (Requester Identity): the requester_id field is central to all three detection conditions. Without AOC-3's header propagation, the detector cannot correlate actions across sessions by requester. Phase 3 delegation tokens provide DELEGATION_SCOPE_PROBE events that feed into the Workflow Baseline Monitor's scope probe pattern detection.