Behavioral Monitor
Behavry's behavioral monitoring layer is composed of four detection systems that run concurrently, each subscribed to the internal async event bus in main.py. Together they cover acute anomalies, cross-session trust manipulation, multi-agent workflow deviations, and gradual conditioning attacks.
Architecture overview
All four systems consume events from the same event bus and write alerts to the same alerts table and SSE stream. They are designed to be complementary: the original monitor catches acute threshold breaches, the trust reset detector catches cross-session manipulation, the workflow baseline monitor catches multi-agent orchestration anomalies, and the drift detector catches slow-moving conditioning patterns that no single-event threshold would flag.
| System | Scope | Detection style | Event types produced |
|---|---|---|---|
| Behavioral Monitor | Per-agent | Z-score anomaly vs rolling baseline | FREQUENCY_SPIKE, NEW_RESOURCE_ACCESS, ERROR_RATE_ELEVATED, DATA_VOLUME_SPIKE, BASELINE_DRIFT |
| Trust Reset Detector | Per-agent, cross-session | Disposition reversal pattern matching | BEHAVIOR_REVERSAL, REQUESTER_SESSION_CYCLING |
| Workflow Baseline Monitor | Per-workflow | EWMA baseline comparison | WORKFLOW_PARTICIPANT_UNEXPECTED, WORKFLOW_DEPTH_SPIKE, WORKFLOW_TOOL_DISTRIBUTION_ANOMALY, WORKFLOW_DURATION_ANOMALY |
| Drift Detector | Per-agent | Linear trend analysis, windowed counting | INJECTION_CONDITIONING_SUSPECTED, BEHAVIORAL_DRIFT_DETECTED |
Behavioral Monitor (original)
The original Behavioral Monitor tracks what each agent does over time, builds per-agent baselines, and fires alerts when behavior deviates from those baselines.
What it tracks
| Metric | Description |
|---|---|
| Tool call frequency | Calls per minute; alerts on spikes vs rolling baseline |
| Resource access patterns | New paths/servers an agent has never touched before |
| Error rates | Denial and failure rates vs normal |
| Data volume | Bytes transferred; DLP-adjacent but behavioral |
| Action diversity | Agent suddenly doing things far outside its usual scope |
| Session duration | Unusually long or short sessions |
How baselines work
Baselines are computed per-agent over a rolling 7-day window stored in TimescaleDB. Each metric tracks a rolling mean and standard deviation. Anomaly detection uses z-score comparison with a default 2-sigma threshold — the z-score is normalized to a 0.0-1.0 anomaly score, where values at or above 4 sigma map to 1.0.
A minimum of 5 baseline samples must accumulate before anomaly detection engages for a given metric. Before that threshold, no alerts fire (there is insufficient history to distinguish anomalous behavior from normal ramp-up).
Baselines update continuously — a new access pattern becomes "normal" after it occurs enough times within the window.
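As an illustrative sketch of the scoring described above — the function name and exact normalization curve are assumptions, not the actual monitor code — a z-score can be mapped to the 0.0-1.0 anomaly score like this:

```python
# Illustrative sketch: z-score anomaly scoring against a rolling baseline.
# Scores below the sigma threshold map to 0.0; 4+ sigma maps to 1.0.

def anomaly_score(value: float, mean: float, std: float,
                  threshold_sigma: float = 2.0, max_sigma: float = 4.0) -> float:
    """Return a 0.0-1.0 anomaly score for a metric observation."""
    if std <= 0:
        # No variance in the baseline yet; treat as non-anomalous.
        return 0.0
    z = abs(value - mean) / std
    if z < threshold_sigma:
        return 0.0
    return min(z / max_sigma, 1.0)
```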
Risk scoring
Every agent has a Behavry Risk Score (0-100) computed across six dimensions:
| Dimension | Weight |
|---|---|
| Policy denial rate | 25% |
| Anomaly frequency | 20% |
| Data volume | 15% |
| New resource access | 15% |
| Session behavior | 15% |
| Escalation outcomes | 10% |
The score maps to a risk tier:
| Tier | Score | Policy behavior |
|---|---|---|
| Low | 0-25 | Standard enforcement |
| Medium | 26-50 | Enhanced logging |
| High | 51-75 | Escalate borderline actions |
| Critical | 76-100 | Block and suspend |
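A minimal sketch of the weighted score and tier mapping, assuming each dimension is pre-normalized to 0-100 (dimension key names are illustrative):

```python
# Illustrative six-dimension risk score; weights follow the table above.

WEIGHTS = {
    "policy_denial_rate": 0.25,
    "anomaly_frequency": 0.20,
    "data_volume": 0.15,
    "new_resource_access": 0.15,
    "session_behavior": 0.15,
    "escalation_outcomes": 0.10,
}

def risk_score(dimensions: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores (each 0-100) -> 0-100."""
    return sum(WEIGHTS[k] * dimensions.get(k, 0.0) for k in WEIGHTS)

def risk_tier(score: float) -> str:
    """Map a 0-100 risk score to its tier."""
    if score <= 25:
        return "low"
    if score <= 50:
        return "medium"
    if score <= 75:
        return "high"
    return "critical"
```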
Anomaly severity mapping
The normalized anomaly score (0.0-1.0) maps to platform-wide severity levels:
| Score range | Severity |
|---|---|
| 0.7+ | Critical |
| 0.5-0.69 | High |
| 0.3-0.49 | Medium |
| Below 0.3 | Low |
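The mapping is a straightforward banding, sketched here (the function name is illustrative):

```python
def severity_for(score: float) -> str:
    """Map a normalized 0.0-1.0 anomaly score to a platform severity level."""
    if score >= 0.7:
        return "critical"
    if score >= 0.5:
        return "high"
    if score >= 0.3:
        return "medium"
    return "low"
```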
Alerts
The original monitor emits alerts for:
- FREQUENCY_SPIKE — call rate exceeds baseline by a statistically significant margin
- NEW_RESOURCE_ACCESS — agent accesses a server/path it hasn't before
- ERROR_RATE_ELEVATED — denial rate exceeds baseline thresholds in a session
- DATA_VOLUME_SPIKE — bytes transferred exceed baseline by a statistically significant margin
- BASELINE_DRIFT — sustained behavior shift over 24+ hours
All alerts are visible in the dashboard Alerts tab and stream via SSE.
Trust Reset Detector
Added in Sprint V (AOC-4, March 2026). Source: backend/behavry/monitor/trust_reset_detector.py
The Trust Reset Detector addresses a class of attacks where a requester exploits the fact that AI agents lose memory of prior suspicion when a new session starts. If an agent blocked a requester's action in Session A, the requester can open Session B on a new channel and the agent may comply — effectively resetting the trust relationship.
This attack pattern is documented in Shapira et al., "Agents of Chaos" (arXiv:2602.20021, Feb 2026), specifically Case Study #8 (new channel causes loss of prior suspicion) and Case Study #15 (circular verification / echo-chamber reinforcement).
Action class derivation
Raw actions from tool calls are mapped to five semantic action classes for cross-session comparison:
| Action class | Raw actions mapped |
|---|---|
| read | read, get, list, search, query |
| write | write, create, update, put, patch |
| delete | delete, remove |
| execute | execute, run, call, invoke |
| send | send, post, publish, message |
Unrecognized actions pass through as-is (lowercased).
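A sketch of this derivation, following the table above (the function name is an assumption):

```python
# Illustrative mapping of raw tool-call actions to semantic action classes.

ACTION_CLASSES = {
    "read": {"read", "get", "list", "search", "query"},
    "write": {"write", "create", "update", "put", "patch"},
    "delete": {"delete", "remove"},
    "execute": {"execute", "run", "call", "invoke"},
    "send": {"send", "post", "publish", "message"},
}

def action_class(raw_action: str) -> str:
    """Return the semantic class for a raw action, or the lowercased
    raw action when unrecognized."""
    action = raw_action.lower()
    for cls, members in ACTION_CLASSES.items():
        if action in members:
            return cls
    return action
```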
Detection conditions
The detector evaluates three conditions against a per-agent rolling buffer of the last 500 ActionDispositionEntry records:
Condition A: Opposite disposition in new session (HIGH severity)
Fires BEHAVIOR_REVERSAL when the same agent handles the same action class for the same requester with an opposite disposition (blocked/escalated vs. allowed) in a different session, within a 2-hour look-back window.
Condition B: Accumulated blocks overridden (HIGH severity)
Fires BEHAVIOR_REVERSAL when an agent has accumulated 3 or more blocked/escalated events for a specific requester and action class across prior sessions, but a new session allows the same action. This catches cases where the reversal window for Condition A has passed but the pattern is still suspicious.
Condition C: Session cycling (MEDIUM severity)
Fires REQUESTER_SESSION_CYCLING when the same requester opens 3 or more distinct sessions within 30 minutes, each requesting the same tool, with varying dispositions. This indicates the requester is probing for a session where the agent will comply.
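Condition A can be sketched as a scan over the per-agent buffer; the record fields and function name here are assumptions, not the actual trust_reset_detector.py code:

```python
# Illustrative sketch of Condition A: an "allowed" disposition for an
# (agent, requester, action class) tuple that was blocked or escalated
# in a different session within the 2-hour look-back window.

from dataclasses import dataclass

LOOKBACK_SECONDS = 2 * 3600

@dataclass
class DispositionEntry:
    agent_id: str
    requester_id: str
    action_class: str
    session_id: str
    disposition: str  # "allowed", "blocked", or "escalated"
    ts: float         # unix seconds

def is_behavior_reversal(entry: DispositionEntry,
                         history: list[DispositionEntry]) -> bool:
    """Return True when `entry` reverses a prior block/escalation."""
    if entry.disposition != "allowed":
        return False
    for prior in history:
        if (prior.agent_id == entry.agent_id
                and prior.requester_id == entry.requester_id
                and prior.action_class == entry.action_class
                and prior.session_id != entry.session_id
                and prior.disposition in ("blocked", "escalated")
                and entry.ts - prior.ts <= LOOKBACK_SECONDS):
            return True
    return False
```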
Cooldown and storage
- A 5-minute per-agent cooldown prevents alert storms after legitimate policy changes that shift disposition patterns. Only BEHAVIOR_REVERSAL is subject to this cooldown; session cycling alerts fire independently.
- In-memory records are persisted to the action_disposition_records TimescaleDB hypertable every 60 seconds via a background flush loop in main.py. This ensures detection state survives process restarts with bounded latency.
- The hypertable is indexed on (agent_id, action_class, requester_id) for efficient cross-session queries.
Workflow Baseline Monitor
Added in Sprint V (March 2026). Source: backend/behavry/monitor/workflow_baseline_monitor.py
The Workflow Baseline Monitor tracks behavioral metrics across multi-agent workflow sessions and fires alerts when a live session deviates from established per-workflow baselines. While the original Behavioral Monitor operates at the individual agent level, the Workflow Baseline Monitor operates at the workflow level, comparing the collective behavior of all agents participating in a workflow session.
Per-session tracking
For each active workflow session, the monitor tracks:
| Metric | Description |
|---|---|
| Participants | Set of agent IDs that have participated in the session |
| Event count | Total tool call, deny, and escalate events |
| Max delegation depth | Deepest delegation chain observed in the session |
| Tool call distribution | Per-tool call counts (converted to fractions for comparison) |
| Scope probe count | Number of DELEGATION_SCOPE_PROBE events in the session |
| Inter-agent call times | Timestamps and agent IDs for latency measurement |
EWMA baselines
Baselines are updated on session close using an Exponential Weighted Moving Average with alpha = 0.2. This gives recent sessions more weight while retaining memory of older patterns. Baselines are stored in the workflow_baselines database table and include:
- Average session duration
- Average participant count
- Average events per session
- Average delegation depth
- Tool distribution (fractional, EWMA-merged)
- Inter-agent latency
- Recent participants (union of last 5 sessions, capped at 50)
- Sample count
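The EWMA update on session close can be sketched as follows; the tool-distribution merge handles tools missing from either side by treating them as 0 (function names are illustrative):

```python
# Minimal EWMA baseline update (alpha = 0.2): recent sessions carry more
# weight while older patterns decay gradually rather than disappearing.

ALPHA = 0.2

def ewma(old: float, observed: float, alpha: float = ALPHA) -> float:
    """Blend a new observation into the running baseline."""
    return alpha * observed + (1 - alpha) * old

def merge_tool_distribution(baseline: dict[str, float],
                            session: dict[str, float],
                            alpha: float = ALPHA) -> dict[str, float]:
    """EWMA-merge two fractional tool distributions; absent tools count as 0."""
    tools = set(baseline) | set(session)
    return {t: ewma(baseline.get(t, 0.0), session.get(t, 0.0), alpha)
            for t in tools}
```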
Anomaly detection
A minimum of 3 completed sessions (MIN_BASELINE_SAMPLES) is required before anomaly detection engages. Before that threshold, only participant tracking runs.
WORKFLOW_PARTICIPANT_UNEXPECTED (MEDIUM severity)
Fires when an agent ID participates in a workflow session but is not in the recent participant set derived from the baseline. This can indicate unauthorized agent injection into an established workflow.
WORKFLOW_DEPTH_SPIKE (MEDIUM severity)
Fires when the observed delegation depth exceeds max(2x average depth, average depth + 2). A depth spike can indicate an unauthorized sub-agent spawn or a prompt injection that causes excessive delegation.
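The max(2x, +2) form means shallow baselines still need a meaningful jump before firing; a one-line sketch:

```python
def depth_spike(observed_depth: int, avg_depth: float) -> bool:
    """True when observed delegation depth exceeds max(2x avg, avg + 2)."""
    return observed_depth > max(2 * avg_depth, avg_depth + 2)
```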
WORKFLOW_TOOL_DISTRIBUTION_ANOMALY (MEDIUM severity)
Fires on session close when the Bray-Curtis dissimilarity between the session's tool usage distribution and the baseline distribution exceeds 0.5. Bray-Curtis dissimilarity ranges from 0.0 (identical) to 1.0 (completely different) and handles missing tools gracefully. This alert also fires when 3 or more DELEGATION_SCOPE_PROBE events accumulate in a single session, which is treated as a signal of a prompt injection attempting to enumerate agent permissions.
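A minimal sketch of Bray-Curtis over fractional tool distributions, with tools absent from one side counted as 0:

```python
# Bray-Curtis dissimilarity between two tool usage distributions:
# 0.0 for identical distributions, 1.0 for completely disjoint ones.

def bray_curtis(a: dict[str, float], b: dict[str, float]) -> float:
    tools = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in tools)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in tools)
    return num / den if den else 0.0
```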
WORKFLOW_DURATION_ANOMALY (MEDIUM severity)
Fires on session close when the session duration exceeds 3x the baseline average. Anomalously long sessions can indicate a stalled delegation chain or an attacker keeping a session alive to exfiltrate data.
Drift Detector
Added in Sprint AOC-1.5 (March 2026). Source: backend/behavry/monitor/drift_detector.py
The Drift Detector catches gradual behavioral shifts that are invisible to threshold-based anomaly detection. Where the original Behavioral Monitor handles acute spikes, the Drift Detector handles slow-moving patterns where each individual step is within normal bounds but the trend over time is malicious.
Injection conditioning detection
INJECTION_CONDITIONING_SUSPECTED (HIGH severity)
Fires when an agent receives 4 or more inbound injection findings (of any severity) within a 15-minute window. This matches the "boiling frog" attack pattern where an attacker who controls multiple external resources embeds low-severity injection patterns across several fetches, gradually shifting the agent's context without triggering any single critical alert.
The detector subscribes to INBOUND_INJECTION_DETECTED and INBOUND_INJECTION_BLOCKED events and maintains a per-agent sliding window of finding timestamps (bounded at 100 entries).
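The per-agent sliding window can be sketched as a bounded deque of timestamps; the class and method names are assumptions, not the drift_detector.py implementation:

```python
# Sliding-window count of inbound injection findings per agent: an alert
# is suspected once 4+ findings land within 15 minutes. The bounded deque
# caps per-agent memory at 100 timestamps.

from collections import defaultdict, deque

CONDITIONING_THRESHOLD = 4
WINDOW_SECONDS = 15 * 60

class ConditioningWindow:
    def __init__(self) -> None:
        self._findings: dict[str, deque[float]] = defaultdict(
            lambda: deque(maxlen=100))

    def record(self, agent_id: str, ts: float) -> bool:
        """Record a finding; return True when the window threshold is hit."""
        window = self._findings[agent_id]
        window.append(ts)
        recent = [t for t in window if ts - t <= WINDOW_SECONDS]
        return len(recent) >= CONDITIONING_THRESHOLD
```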
Behavioral drift detection
BEHAVIORAL_DRIFT_DETECTED (MEDIUM severity)
Fires when the agent's risk score shows a sustained upward linear trend over the last 10 tool call events. The detector computes a simple linear regression slope on the risk score history; if the slope exceeds 0.05 (per sample, on a 0-1 normalized scale), the alert fires.
This catches an attacker who can incrementally escalate privilege where each individual step is acceptable on its own but the cumulative trajectory is not. Threshold-based detection would not fire because no single event crosses the threshold.
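The slope test reduces to an ordinary least-squares fit over the last 10 samples; a sketch (names mirror the tuning constants below, but the function bodies are assumptions):

```python
# Least-squares slope over the last N risk score samples (0-1 normalized);
# drift is flagged when the slope exceeds 0.05 per sample.

DRIFT_MIN_SAMPLES = 10
DRIFT_SLOPE_THRESHOLD = 0.05

def slope(samples: list[float]) -> float:
    """Simple linear regression slope with x = 0, 1, ..., n-1."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

def drift_detected(history: list[float]) -> bool:
    """True when the recent risk score trend rises faster than the threshold."""
    if len(history) < DRIFT_MIN_SAMPLES:
        return False
    return slope(history[-DRIFT_MIN_SAMPLES:]) > DRIFT_SLOPE_THRESHOLD
```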
Cooldown
Both alert types respect a 30-minute per-agent, per-alert-type cooldown to avoid alert storms when a legitimate policy change shifts the risk baseline. The cooldown is tracked independently for injection_conditioning and behavioral_drift.
Tuning constants
| Constant | Default | Description |
|---|---|---|
| CONDITIONING_THRESHOLD | 4 | Findings in window to trigger conditioning alert |
| CONDITIONING_WINDOW_MINUTES | 15 | Sliding window for finding count |
| DRIFT_MIN_SAMPLES | 10 | Minimum risk score samples before slope analysis |
| DRIFT_SLOPE_THRESHOLD | 0.05 | Minimum upward slope to trigger drift alert |
| ALERT_COOLDOWN_MINUTES | 30 | Per-agent, per-alert-type cooldown |
Combined alert types
Across all four monitoring systems, Behavry produces the following alert and event types:
| Event type | Source | Severity | Trigger |
|---|---|---|---|
| FREQUENCY_SPIKE | Behavioral Monitor | Varies | Call rate anomaly vs baseline |
| NEW_RESOURCE_ACCESS | Behavioral Monitor | Medium | First-time resource access |
| ERROR_RATE_ELEVATED | Behavioral Monitor | Varies | Denial rate anomaly vs baseline |
| DATA_VOLUME_SPIKE | Behavioral Monitor | Varies | Transfer volume anomaly vs baseline |
| BASELINE_DRIFT | Behavioral Monitor | Medium | Sustained 24h+ behavior shift |
| BEHAVIOR_REVERSAL | Trust Reset Detector | High | Cross-session disposition flip (Conditions A/B) |
| REQUESTER_SESSION_CYCLING | Trust Reset Detector | Medium | 3+ sessions in 30 min with varying dispositions |
| WORKFLOW_PARTICIPANT_UNEXPECTED | Workflow Baseline Monitor | Medium | Unknown agent in workflow |
| WORKFLOW_DEPTH_SPIKE | Workflow Baseline Monitor | Medium | Delegation depth exceeds max(2x avg, avg + 2) |
| WORKFLOW_TOOL_DISTRIBUTION_ANOMALY | Workflow Baseline Monitor | Medium | Tool usage diverged from baseline |
| WORKFLOW_DURATION_ANOMALY | Workflow Baseline Monitor | Medium | Session duration exceeds 3x baseline |
| INJECTION_CONDITIONING_SUSPECTED | Drift Detector | High | 4+ injection findings in 15 min |
| BEHAVIORAL_DRIFT_DETECTED | Drift Detector | Medium | Sustained upward risk score trend |
All alerts are visible in the dashboard Alerts tab and stream to connected clients via SSE in real time.