AOC-4: Trust Reset Detection

Shipped: March 2026 (Sprint V)

The problem

A behavioral baseline is only useful if it can be trusted. An adversary who resets an agent's context to a "clean" state — by opening a new session, switching channels, or cycling through identities — can bypass all within-session monitoring. The agent forgets its prior suspicion and complies with the same request it previously blocked.

AOC-4 addresses three attack vectors:

Behavior reversal — an agent blocks an action in Session A, then allows the identical action in Session B after a context reset
Session cycling — a requester rapidly opens multiple sessions for the same tool, probing for a permissive one
Workflow baseline deviation — a multi-agent workflow suddenly exhibits different participant sets, delegation depths, tool distributions, or session durations

Trust Reset Detector

The TrustResetDetector is an event bus subscriber that maintains a per-agent rolling buffer of action dispositions. Every tool call, policy deny, and policy escalation is recorded as an ActionDispositionEntry — a lightweight in-memory record of who did what, when, and whether it was allowed.

Action classification

Raw action strings from tool calls are mapped to five stable semantic classes, ensuring that synonymous operations are compared correctly across sessions:

| Action class | Raw actions mapped |
| --- | --- |
| `read` | read, get, list, search, query |
| `write` | write, create, update, put, patch |
| `delete` | delete, remove |
| `execute` | execute, run, call, invoke |
| `send` | send, post, publish, message |
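As a minimal sketch of the mapping above, the table can be inverted into a lookup from raw verb to action class. The function name `classify_action` and the rule of taking the leading token of a compound name (for example `write_file`) are assumptions for illustration, not the shipped implementation.

```python
import re
from typing import Optional

# Class -> raw verbs, copied from the table above.
ACTION_CLASSES = {
    "read": ("read", "get", "list", "search", "query"),
    "write": ("write", "create", "update", "put", "patch"),
    "delete": ("delete", "remove"),
    "execute": ("execute", "run", "call", "invoke"),
    "send": ("send", "post", "publish", "message"),
}

# Inverted index: raw verb -> class, for constant-time lookup.
_RAW_TO_CLASS = {raw: cls for cls, raws in ACTION_CLASSES.items() for raw in raws}


def classify_action(raw_action: str) -> Optional[str]:
    """Return the semantic class of a raw action string, or None if unmapped.

    Assumption: the verb is the first token when the string is split on
    underscores, hyphens, dots, or whitespace.
    """
    token = re.split(r"[_\-.\s]+", raw_action.strip().lower())[0]
    return _RAW_TO_CLASS.get(token)
```

With this sketch, `classify_action("remove")` and `classify_action("delete")` both yield `"delete"`, which is what lets dispositions be compared across sessions that phrase the same operation differently.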

Detection conditions

The detector evaluates three conditions on every incoming disposition event:

Condition A — Cross-session reversal (HIGH severity)

Fires BEHAVIOR_REVERSAL when the same (agent_id, requester_id, action_class) tuple appears with an opposite disposition in a different session within a 2-hour window.

The "dangerous direction" is blocked/escalated followed by allowed — the agent stopped resisting. The reverse direction (allowed then blocked) is also flagged, as it may indicate policy manipulation.

Session A (10:00): agent-1 / user@corp.com / delete → blocked
Session B (10:45): agent-1 / user@corp.com / delete → allowed   ← BEHAVIOR_REVERSAL (Condition A)
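The check for Condition A could look roughly like the following. `Disposition` is a hypothetical stand-in for ActionDispositionEntry (its real fields are not shown in this document), and the function name is illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

REVERSAL_WINDOW = timedelta(hours=2)
BLOCKING = {"blocked", "escalated"}


@dataclass(frozen=True)
class Disposition:  # hypothetical stand-in for ActionDispositionEntry
    agent_id: str
    requester_id: str
    action_class: str
    session_id: str
    disposition: str  # "allowed", "blocked", or "escalated"
    at: datetime


def find_cross_session_reversal(
    history: Iterable[Disposition], new: Disposition
) -> Optional[Disposition]:
    """Return the most recent earlier entry that `new` reverses, if any."""
    new_key = (new.agent_id, new.requester_id, new.action_class)
    for old in reversed(list(history)):
        if new.at - old.at > REVERSAL_WINDOW:
            continue  # outside the look-back window
        same_key = (old.agent_id, old.requester_id, old.action_class) == new_key
        # Opposite disposition in either direction counts as a reversal.
        opposite = (old.disposition in BLOCKING) != (new.disposition in BLOCKING)
        if same_key and old.session_id != new.session_id and opposite:
            return old
    return None
```

Returning the matched earlier entry, rather than a boolean, lets the caller tell the dangerous direction (blocked then allowed) from the reverse one when building the alert.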

Condition B — Accumulated block override (HIGH severity)

Fires BEHAVIOR_REVERSAL when an agent has accumulated 3 or more blocked or escalated dispositions (REVERSAL_BLOCK_THRESHOLD) for a specific (requester_id, action_class) pair across prior sessions within the look-back window, and a new session then allows the same action.

This catches adversaries who have been persistently denied but eventually find a session context where the agent complies.

Sessions 1-4: agent-1 / attacker@ext / execute → blocked (4 times)
Session 5:    agent-1 / attacker@ext / execute → allowed   ← BEHAVIOR_REVERSAL (Condition B)
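A sketch of the Condition B count, assuming the look-back window from the tuning constants applies and that every blocked or escalated disposition in a session other than the current one counts toward the threshold. `Entry` and the function name are hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Sequence

REVERSAL_WINDOW_HOURS = 2
REVERSAL_BLOCK_THRESHOLD = 3


class Entry(NamedTuple):  # hypothetical, simplified disposition record
    requester_id: str
    action_class: str
    session_id: str
    disposition: str  # "allowed", "blocked", or "escalated"
    at: datetime


def is_accumulated_block_override(history: Sequence[Entry], new: Entry) -> bool:
    """True when `new` is allowed after enough prior-session blocks."""
    if new.disposition != "allowed":
        return False
    cutoff = new.at - timedelta(hours=REVERSAL_WINDOW_HOURS)
    prior_blocks = sum(
        1
        for e in history
        if e.requester_id == new.requester_id
        and e.action_class == new.action_class
        and e.session_id != new.session_id
        and e.disposition in ("blocked", "escalated")
        and e.at >= cutoff
    )
    return prior_blocks >= REVERSAL_BLOCK_THRESHOLD
```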

Condition C — Session cycling (MEDIUM severity)

Fires REQUESTER_SESSION_CYCLING when the same requester opens 3 or more distinct sessions within 30 minutes, each requesting the same tool, with varying dispositions (some allowed, some blocked).

This pattern indicates an attacker probing for a permissive session configuration.

Session X (14:00): user@ext / write_file → blocked
Session Y (14:10): user@ext / write_file → blocked
Session Z (14:20): user@ext / write_file → allowed   ← REQUESTER_SESSION_CYCLING
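The cycling check can be sketched as a count of distinct sessions plus a test that outcomes differ. `ToolRequest` and `is_session_cycling` are illustrative names, and collapsing dispositions to an allowed/not-allowed flag is a simplifying assumption.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Sequence

SESSION_CYCLING_WINDOW_MINUTES = 30
SESSION_CYCLING_SESSION_THRESHOLD = 3


class ToolRequest(NamedTuple):  # hypothetical, simplified record
    requester_id: str
    tool: str
    session_id: str
    allowed: bool
    at: datetime


def is_session_cycling(history: Sequence[ToolRequest], new: ToolRequest) -> bool:
    """True when one requester hits the same tool from enough distinct
    sessions inside the window and the outcomes are mixed."""
    cutoff = new.at - timedelta(minutes=SESSION_CYCLING_WINDOW_MINUTES)
    relevant = [
        e
        for e in [*history, new]
        if e.requester_id == new.requester_id and e.tool == new.tool and e.at >= cutoff
    ]
    sessions = {e.session_id for e in relevant}
    outcomes = {e.allowed for e in relevant}
    # Both True and False must be present for the dispositions to "vary".
    return len(sessions) >= SESSION_CYCLING_SESSION_THRESHOLD and len(outcomes) == 2
```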

Cooldown and alert management

BEHAVIOR_REVERSAL events have a 5-minute per-agent cooldown to prevent alert storms after legitimate policy changes that shift disposition patterns. If the same agent triggers multiple reversals within the cooldown window, only the first alert fires.

REQUESTER_SESSION_CYCLING has no cooldown — each detection fires immediately.
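One way to express the per-agent cooldown is a map from agent to the time of its last emitted alert. This is a sketch under the assumption that suppressed alerts do not extend the window; the class name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict

REVERSAL_COOLDOWN_MINUTES = 5


class ReversalCooldown:
    """Tracks when each agent last emitted a BEHAVIOR_REVERSAL alert."""

    def __init__(self, minutes: int = REVERSAL_COOLDOWN_MINUTES) -> None:
        self._window = timedelta(minutes=minutes)
        self._last_alert: Dict[str, datetime] = {}

    def should_fire(self, agent_id: str, now: datetime) -> bool:
        last = self._last_alert.get(agent_id)
        if last is not None and now - last < self._window:
            return False  # still cooling down; suppress this alert
        self._last_alert[agent_id] = now
        return True
```

REQUESTER_SESSION_CYCLING would bypass this gate entirely, since it has no cooldown.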

Persistent storage

Action disposition records are held in memory as a per-agent deque (max 500 entries) for fast lookup. A background task in main.py flushes dirty buffers to the action_disposition_records TimescaleDB hypertable every 60 seconds, so detection state survives process restarts.

The hypertable is indexed on (agent_id, action_class, requester_id) for efficient cross-session queries during recovery.
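The buffer-and-flush pattern might be sketched as below. The `DispositionBuffer` class, the pending-row list used to track what is dirty, and the `write_rows` callable (standing in for the database insert) are all assumptions; the real flush task in main.py is not shown in this document.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List

MAX_ENTRIES_PER_AGENT = 500


class DispositionBuffer:
    def __init__(self) -> None:
        # Bounded per-agent history used for detection lookups.
        self._entries: Dict[str, Deque[dict]] = defaultdict(
            lambda: deque(maxlen=MAX_ENTRIES_PER_AGENT)
        )
        # Rows recorded since the last successful flush.
        self._pending: List[dict] = []

    def record(self, agent_id: str, entry: dict) -> None:
        self._entries[agent_id].append(entry)
        self._pending.append(entry)

    def recent(self, agent_id: str) -> List[dict]:
        return list(self._entries[agent_id])

    def flush(self, write_rows: Callable[[List[dict]], None]) -> int:
        """Write pending rows; keep them queued if the write raises."""
        rows = list(self._pending)
        if rows:
            write_rows(rows)
        self._pending.clear()
        return len(rows)
```

A periodic task would call `flush` every 60 seconds. Because the pending list is cleared only after `write_rows` returns, a failed write is retried on the next cycle.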

Tuning constants

| Constant | Default | Description |
| --- | --- | --- |
| `REVERSAL_WINDOW_HOURS` | 2 | Look-back window for Conditions A and B |
| `REVERSAL_BLOCK_THRESHOLD` | 3 | Minimum prior blocks required for Condition B |
| `SESSION_CYCLING_WINDOW_MINUTES` | 30 | Time window for Condition C |
| `SESSION_CYCLING_SESSION_THRESHOLD` | 3 | Minimum distinct sessions for Condition C |
| `REVERSAL_COOLDOWN_MINUTES` | 5 | Per-agent cooldown on BEHAVIOR_REVERSAL alerts |

Workflow Baseline Monitor

The WorkflowBaselineMonitor tracks per-workflow behavioral metrics across sessions and fires alerts when a live session deviates significantly from the established baseline. It operates alongside the Trust Reset Detector — while the detector focuses on individual agent reversals, the baseline monitor watches for anomalies in multi-agent workflow patterns.

Baseline construction

Each time a workflow session closes, the monitor updates the workflow's baseline using exponentially weighted moving averages (EWMA, alpha = 0.2), which give recent sessions more weight while still reflecting historical patterns. The baseline tracks the following metrics:

Average session duration (seconds)
Average participant count
Average events per session
Average delegation depth
Tool distribution (normalized call fractions per tool)
Inter-agent latency (average milliseconds between agent transitions)
Recent participants (union of agents seen in the last 5 sessions)

A minimum of 3 completed sessions (MIN_BASELINE_SAMPLES) is required before anomaly detection engages. Before that threshold, only participant tracking runs — there is no history to compare against.
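The EWMA update and the sample gate can be illustrated with two of the scalar metrics. `WorkflowBaselineSketch` is a hypothetical class, and seeding the averages with the first session's values is an assumption.

```python
from dataclasses import dataclass

EWMA_ALPHA = 0.2
MIN_BASELINE_SAMPLES = 3


def ewma(previous: float, observed: float, alpha: float = EWMA_ALPHA) -> float:
    """Blend a new observation into the running average."""
    return alpha * observed + (1.0 - alpha) * previous


@dataclass
class WorkflowBaselineSketch:
    samples: int = 0
    avg_duration_s: float = 0.0
    avg_participants: float = 0.0

    def update(self, duration_s: float, participants: int) -> None:
        if self.samples == 0:
            # Assumption: the first closed session seeds the averages.
            self.avg_duration_s = float(duration_s)
            self.avg_participants = float(participants)
        else:
            self.avg_duration_s = ewma(self.avg_duration_s, duration_s)
            self.avg_participants = ewma(self.avg_participants, participants)
        self.samples += 1

    @property
    def ready(self) -> bool:
        """Anomaly detection engages only after enough closed sessions."""
        return self.samples >= MIN_BASELINE_SAMPLES
```

With alpha = 0.2, a baseline duration of 100 seconds followed by a 200-second session moves the average to 0.2 × 200 + 0.8 × 100 = 120 seconds.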

Anomaly detection

Four anomaly types are monitored in real time:

Unexpected participant

Fires WORKFLOW_PARTICIPANT_UNEXPECTED when an agent participates in a workflow session but is not in the recent participant set. This catches unauthorized agent injection — a prompt injection that introduces a new agent into an established workflow.
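The recent participant set (the union of agents seen in the last 5 sessions) could be kept as a bounded queue of per-session sets. The class name is hypothetical, and treating every agent as expected until at least one session has closed is an assumption.

```python
from collections import deque
from typing import Deque, FrozenSet, Iterable, Set

RECENT_SESSIONS = 5


class RecentParticipants:
    def __init__(self) -> None:
        # Oldest session drops out automatically once the queue is full.
        self._sessions: Deque[FrozenSet[str]] = deque(maxlen=RECENT_SESSIONS)

    def close_session(self, participants: Iterable[str]) -> None:
        self._sessions.append(frozenset(participants))

    def known(self) -> Set[str]:
        return set().union(*self._sessions)

    def is_unexpected(self, agent_id: str) -> bool:
        # With no closed sessions there is nothing to compare against.
        return bool(self._sessions) and agent_id not in self.known()
```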

Delegation depth spike

Fires WORKFLOW_DEPTH_SPIKE when the current session's maximum delegation depth exceeds 2x the baseline average, with the threshold never lower than avg_depth + 2 (that is, the larger of the two values applies). A sudden depth increase indicates possible unauthorized sub-agent spawning.
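One reading of the depth rule is a threshold equal to the larger of 2x the average and the average plus 2. The sketch below encodes that interpretation; the function names are illustrative.

```python
def depth_spike_threshold(avg_depth: float) -> float:
    """Larger of 2x the baseline average and the average plus 2 (assumed reading)."""
    return max(2.0 * avg_depth, avg_depth + 2.0)


def is_depth_spike(current_max_depth: int, avg_depth: float) -> bool:
    return current_max_depth > depth_spike_threshold(avg_depth)
```

Under this reading the additive floor dominates for baselines shallower than 2, so a workflow averaging depth 1 alerts at depth 4 rather than at depth 3.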

Tool distribution divergence

Fires WORKFLOW_TOOL_DISTRIBUTION_ANOMALY when the Bray-Curtis dissimilarity between the session's tool usage distribution and the baseline exceeds 0.5. Bray-Curtis ranges from 0.0 (identical) to 1.0 (completely different), making it a natural fit for comparing proportional distributions.

Baseline:  {search: 0.6, read_file: 0.3, write_file: 0.1}
Observed:  {search: 0.1, delete_file: 0.7, write_file: 0.2}
Divergence: 0.80 → WORKFLOW_TOOL_DISTRIBUTION_ANOMALY
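For reference, Bray-Curtis dissimilarity under its standard definition is the sum of absolute differences divided by the sum of all values, with tools missing from one side counted as 0. A minimal sketch, with an illustrative function name and threshold constant:

```python
from typing import Mapping

TOOL_DIVERGENCE_THRESHOLD = 0.5


def bray_curtis(p: Mapping[str, float], q: Mapping[str, float]) -> float:
    """Bray-Curtis dissimilarity between two tool-usage distributions."""
    tools = set(p) | set(q)
    numerator = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)
    denominator = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in tools)
    # Two empty distributions are treated as identical.
    return numerator / denominator if denominator else 0.0


baseline = {"search": 0.6, "read_file": 0.3, "write_file": 0.1}
observed = {"search": 0.1, "delete_file": 0.7, "write_file": 0.2}
is_anomaly = bray_curtis(baseline, observed) > TOOL_DIVERGENCE_THRESHOLD
```

When both inputs are normalized fractions, the denominator is 2, so the measure reduces to half the sum of absolute differences.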

This anomaly type is also triggered by scope probe patterns: when 3 or more DELEGATION_SCOPE_PROBE events accumulate within a single workflow session, the monitor fires WORKFLOW_TOOL_DISTRIBUTION_ANOMALY with anomaly_type: "scope_probe_pattern". This correlates individual delegation scope probes (from AOC-3 Phase 3) into a workflow-level signal of possible prompt injection attempting to enumerate permissions.

Session duration anomaly

Fires WORKFLOW_DURATION_ANOMALY when a completed session's duration exceeds 3x the baseline average. Abnormally long sessions may indicate data exfiltration, resource abuse, or an attacker maintaining a persistent presence.

Event types

| Event type | Severity | Trigger |
| --- | --- | --- |
| `BEHAVIOR_REVERSAL` | high | Cross-session disposition flip (Conditions A/B) |
| `REQUESTER_SESSION_CYCLING` | medium | Rapid session rotation with varying dispositions |
| `WORKFLOW_PARTICIPANT_UNEXPECTED` | medium | Unknown agent in established workflow |
| `WORKFLOW_DEPTH_SPIKE` | medium | Delegation depth exceeds 2x baseline |
| `WORKFLOW_TOOL_DISTRIBUTION_ANOMALY` | medium | Tool usage diverged from baseline or scope probe pattern |
| `WORKFLOW_DURATION_ANOMALY` | medium | Session duration exceeds 3x baseline |

Architecture

                     Event Bus
                        │
            ┌───────────┴───────────┐
            ▼                       ▼
   TrustResetDetector     WorkflowBaselineMonitor
            │                       │
   Per-agent deque          Per-workflow session state
   (maxlen=500)             (_SessionState tracking)
            │                       │
   ┌────────┴────────┐    ┌────────┴────────┐
   │ Condition A/B/C │    │ Anomaly checks  │
   │ detection logic │    │ on each event + │
   └────────┬────────┘    │ on session close│
            │             └────────┬────────┘
            ▼                      ▼
   BEHAVIOR_REVERSAL      WORKFLOW_*_ANOMALY
   REQUESTER_SESSION_     events + DB alerts
   CYCLING events         │
            │             ▼
            ▼        WorkflowBaseline
   ActionDisposition  (EWMA update)
   Record (TimescaleDB
   hypertable, 60s flush)

Both detectors subscribe to the event bus as wildcard listeners (event_bus.subscribe(None, handler)) and filter internally for relevant event types (TOOL_CALL, POLICY_DENY, POLICY_ESCALATE, and DELEGATION_SCOPE_PROBE).
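The wildcard subscription pattern can be illustrated with a toy bus. `ToyEventBus` and `DetectorSketch` are hypothetical stand-ins; only the `subscribe(None, handler)` call shape and the event type names come from the description above, and the dict-shaped event is an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

RELEVANT_EVENT_TYPES = {
    "TOOL_CALL",
    "POLICY_DENY",
    "POLICY_ESCALATE",
    "DELEGATION_SCOPE_PROBE",
}

Handler = Callable[[Dict], None]


class ToyEventBus:
    """Stand-in bus: a subscription with event_type None receives everything."""

    def __init__(self) -> None:
        self._subs: List[Tuple[Optional[str], Handler]] = []

    def subscribe(self, event_type: Optional[str], handler: Handler) -> None:
        self._subs.append((event_type, handler))

    def publish(self, event: Dict) -> None:
        for event_type, handler in self._subs:
            if event_type is None or event_type == event.get("type"):
                handler(event)


class DetectorSketch:
    def __init__(self) -> None:
        self.seen: List[Dict] = []

    def handle(self, event: Dict) -> None:
        # Wildcard listeners see every event, so filter here.
        if event.get("type") not in RELEVANT_EVENT_TYPES:
            return
        self.seen.append(event)
```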

Integration with other AOC modules

AOC-4 builds on signals from the other three AOC modules:

AOC-1 (Inbound Injection Detection) — injection findings may cause POLICY_DENY or POLICY_ESCALATE events that feed into the Trust Reset Detector. An attacker who triggers injection detection in one session and retries in another will be caught by Condition A or B.
AOC-1.5 (Content Trust Domains + Behavioral Drift) — the DriftDetector fires INJECTION_CONDITIONING_SUSPECTED and BEHAVIORAL_DRIFT_DETECTED events. While AOC-4 does not directly consume these, the Red Team Policy Automation Loop (Sprint RT) generates policy candidates from both drift and trust reset signals.
AOC-3 (Requester Identity) — the requester_id field is central to all three detection conditions. Without AOC-3's header propagation, the detector cannot correlate actions across sessions by requester. Phase 3 delegation tokens provide DELEGATION_SCOPE_PROBE events that feed into the Workflow Baseline Monitor's scope probe pattern detection.
