Behavioral Monitor
Behavry's behavioral monitoring layer is composed of four detection systems that run concurrently, each subscribed to the internal async event bus in main.py. Together they cover acute anomalies, cross-session trust manipulation, multi-agent workflow deviations, and gradual conditioning attacks.
Architecture overview
All four systems consume events from the same event bus and write alerts to the same alerts table and SSE stream. They are designed to be complementary: the original monitor catches acute threshold breaches, the trust reset detector catches cross-session manipulation, the workflow baseline monitor catches multi-agent orchestration anomalies, and the drift detector catches slow-moving conditioning patterns that no single-event threshold would flag.
| System | Scope | Detection style | Event types produced |
|---|---|---|---|
| Behavioral Monitor | Per-agent | Z-score anomaly vs rolling baseline | FREQUENCY_SPIKE, NEW_RESOURCE_ACCESS, ERROR_RATE_ELEVATED, DATA_VOLUME_SPIKE, BASELINE_DRIFT |
| Trust Reset Detector | Per-agent, cross-session | Disposition reversal pattern matching | BEHAVIOR_REVERSAL, REQUESTER_SESSION_CYCLING |
| Workflow Baseline Monitor | Per-workflow | EWMA baseline comparison | WORKFLOW_PARTICIPANT_UNEXPECTED, WORKFLOW_DEPTH_SPIKE, WORKFLOW_TOOL_DISTRIBUTION_ANOMALY, WORKFLOW_DURATION_ANOMALY |
| Drift Detector | Per-agent | Linear trend analysis, windowed counting | INJECTION_CONDITIONING_SUSPECTED, BEHAVIORAL_DRIFT_DETECTED |
Behavioral Monitor (original)
The original Behavioral Monitor tracks what each agent does over time, builds per-agent baselines, and fires alerts when behavior deviates from those baselines.
What it tracks
| Metric | Description |
|---|---|
| Tool call frequency | Calls per minute; alerts on spikes vs rolling baseline |
| Resource access patterns | New paths/servers an agent has never touched before |
| Error rates | Denial and failure rates vs normal |
| Data volume | Bytes transferred; DLP-adjacent but behavioral |
| Action diversity | Agent suddenly doing things far outside its usual scope |
| Session duration | Unusually long or short sessions |
How baselines work
Baselines are computed per-agent over a rolling 7-day window stored in TimescaleDB. Each metric tracks a rolling mean and standard deviation. Anomaly detection uses z-score comparison with a default 2-sigma threshold — the z-score is normalized to a 0.0-1.0 anomaly score, where values at or above 4 sigma map to 1.0.
A minimum of 5 baseline samples must accumulate before anomaly detection engages for a given metric. Before that threshold, no alerts fire (there is insufficient history to distinguish anomalous behavior from normal ramp-up).
Baselines update continuously — a new access pattern becomes "normal" after it occurs enough times within the window.
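As an illustrative sketch of the scoring described above — the function name and exact normalization curve are assumptions, not the actual monitor code — a z-score can be mapped to the 0.0-1.0 anomaly score like this:

```python
# Illustrative sketch: z-score anomaly scoring against a rolling baseline.
# Scores below the sigma threshold map to 0.0; 4+ sigma maps to 1.0.

def anomaly_score(value: float, mean: float, std: float,
                  threshold_sigma: float = 2.0, max_sigma: float = 4.0) -> float:
    """Return a 0.0-1.0 anomaly score for a metric observation."""
    if std <= 0:
        # No variance in the baseline yet; treat as non-anomalous.
        return 0.0
    z = abs(value - mean) / std
    if z < threshold_sigma:
        return 0.0
    return min(z / max_sigma, 1.0)
```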
Risk scoring
Every agent has a Behavry Risk Score (0-100) computed across six dimensions:
| Dimension | Weight |
|---|---|
| Policy denial rate | 25% |
| Anomaly frequency | 20% |
| Data volume | 15% |
| New resource access | 15% |
| Session behavior | 15% |
| Escalation outcomes | 10% |
The score maps to a risk tier:
| Tier | Score | Policy behavior |
|---|---|---|
| Low | 0-25 | Standard enforcement |
| Medium | 26-50 | Enhanced logging |
| High | 51-75 | Escalate borderline actions |
| Critical | 76-100 | Block and suspend |
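A minimal sketch of the weighted score and tier mapping, assuming each dimension is pre-normalized to 0-100 (dimension key names are illustrative):

```python
# Illustrative six-dimension risk score; weights follow the table above.

WEIGHTS = {
    "policy_denial_rate": 0.25,
    "anomaly_frequency": 0.20,
    "data_volume": 0.15,
    "new_resource_access": 0.15,
    "session_behavior": 0.15,
    "escalation_outcomes": 0.10,
}

def risk_score(dimensions: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores (each 0-100) -> 0-100."""
    return sum(WEIGHTS[k] * dimensions.get(k, 0.0) for k in WEIGHTS)

def risk_tier(score: float) -> str:
    """Map a 0-100 risk score to its tier."""
    if score <= 25:
        return "low"
    if score <= 50:
        return "medium"
    if score <= 75:
        return "high"
    return "critical"
```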
Anomaly severity mapping
The normalized anomaly score (0.0-1.0) maps to platform-wide severity levels:
| Score range | Severity |
|---|---|
| 0.7+ | Critical |
| 0.5-0.69 | High |
| 0.3-0.49 | Medium |
| Below 0.3 | Low |
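The mapping is a straightforward banding, sketched here (the function name is illustrative):

```python
def severity_for(score: float) -> str:
    """Map a normalized 0.0-1.0 anomaly score to a platform severity level."""
    if score >= 0.7:
        return "critical"
    if score >= 0.5:
        return "high"
    if score >= 0.3:
        return "medium"
    return "low"
```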
Alerts
The original monitor emits alerts for:
- FREQUENCY_SPIKE — call rate exceeds baseline by a statistically significant margin
- NEW_RESOURCE_ACCESS — agent accesses a server/path it hasn't before
- ERROR_RATE_ELEVATED — denial rate exceeds baseline thresholds in a session
- DATA_VOLUME_SPIKE — bytes transferred exceed baseline by a statistically significant margin
- BASELINE_DRIFT — sustained behavior shift over 24+ hours
All alerts are visible in the dashboard Alerts tab and stream via SSE.
Trust Reset Detector
Added in Sprint V (AOC-4, March 2026). Source: backend/behavry/monitor/trust_reset_detector.py
The Trust Reset Detector addresses a class of attacks where a requester exploits the fact that AI agents lose memory of prior suspicion when a new session starts. If an agent blocked a requester's action in Session A, the requester can open Session B on a new channel and the agent may comply — effectively resetting the trust relationship.
This attack pattern is documented in Shapira et al., "Agents of Chaos" (arXiv:2602.20021, Feb 2026), specifically Case Study #8 (new channel causes loss of prior suspicion) and Case Study #15 (circular verification / echo-chamber reinforcement).
Action class derivation
Raw actions from tool calls are mapped to five semantic action classes for cross-session comparison:
| Action class | Raw actions mapped |
|---|---|
| read | read, get, list, search, query |
| write | write, create, update, put, patch |
| delete | delete, remove |
| execute | execute, run, call, invoke |
| send | send, post, publish, message |
Unrecognized actions pass through as-is (lowercased).
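A sketch of this derivation, following the table above (the function name is an assumption):

```python
# Illustrative mapping of raw tool-call actions to semantic action classes.

ACTION_CLASSES = {
    "read": {"read", "get", "list", "search", "query"},
    "write": {"write", "create", "update", "put", "patch"},
    "delete": {"delete", "remove"},
    "execute": {"execute", "run", "call", "invoke"},
    "send": {"send", "post", "publish", "message"},
}

def action_class(raw_action: str) -> str:
    """Return the semantic class for a raw action, or the lowercased
    raw action when unrecognized."""
    action = raw_action.lower()
    for cls, members in ACTION_CLASSES.items():
        if action in members:
            return cls
    return action
```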
Detection conditions
The detector evaluates three conditions against a per-agent rolling buffer of the last 500 ActionDispositionEntry records:
Condition A: Opposite disposition in new session (HIGH severity)
Fires BEHAVIOR_REVERSAL when the same agent handles the same action class for the same requester with an opposite disposition (blocked/escalated vs. allowed) in a different session, within a 2-hour look-back window.
Condition B: Accumulated blocks overridden (HIGH severity)
Fires BEHAVIOR_REVERSAL when an agent has accumulated 3 or more blocked/escalated events for a specific requester and action class across prior sessions, but a new session allows the same action. This catches cases where the reversal window for Condition A has passed but the pattern is still suspicious.
Condition C: Session cycling (MEDIUM severity)
Fires REQUESTER_SESSION_CYCLING when the same requester opens 3 or more distinct sessions within 30 minutes, each requesting the same tool, with varying dispositions. This indicates the requester is probing for a session where the agent will comply.
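Condition A can be sketched as a scan over the per-agent buffer; the record fields and function name here are assumptions, not the actual trust_reset_detector.py code:

```python
# Illustrative sketch of Condition A: an "allowed" disposition for an
# (agent, requester, action class) tuple that was blocked or escalated
# in a different session within the 2-hour look-back window.

from dataclasses import dataclass

LOOKBACK_SECONDS = 2 * 3600

@dataclass
class DispositionEntry:
    agent_id: str
    requester_id: str
    action_class: str
    session_id: str
    disposition: str  # "allowed", "blocked", or "escalated"
    ts: float         # unix seconds

def is_behavior_reversal(entry: DispositionEntry,
                         history: list[DispositionEntry]) -> bool:
    """Return True when `entry` reverses a prior block/escalation."""
    if entry.disposition != "allowed":
        return False
    for prior in history:
        if (prior.agent_id == entry.agent_id
                and prior.requester_id == entry.requester_id
                and prior.action_class == entry.action_class
                and prior.session_id != entry.session_id
                and prior.disposition in ("blocked", "escalated")
                and entry.ts - prior.ts <= LOOKBACK_SECONDS):
            return True
    return False
```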
Cooldown and storage
- A 5-minute per-agent cooldown prevents alert storms after legitimate policy changes that shift disposition patterns. Only BEHAVIOR_REVERSAL is subject to this cooldown; session cycling alerts fire independently.
- In-memory records are persisted to the action_disposition_records TimescaleDB hypertable every 60 seconds via a background flush loop in main.py. This ensures detection state survives process restarts with bounded latency.
- The hypertable is indexed on (agent_id, action_class, requester_id) for efficient cross-session queries.
Workflow Baseline Monitor
Added in Sprint V (March 2026). Source: backend/behavry/monitor/workflow_baseline_monitor.py
The Workflow Baseline Monitor tracks behavioral metrics across multi-agent workflow sessions and fires alerts when a live session deviates from established per-workflow baselines. While the original Behavioral Monitor operates at the individual agent level, the Workflow Baseline Monitor operates at the workflow level, comparing the collective behavior of all agents participating in a workflow session.
Per-session tracking
For each active workflow session, the monitor tracks:
| Metric | Description |
|---|---|
| Participants | Set of agent IDs that have participated in the session |
| Event count | Total tool call, deny, and escalate events |
| Max delegation depth | Deepest delegation chain observed in the session |
| Tool call distribution | Per-tool call counts (converted to fractions for comparison) |
| Scope probe count | Number of DELEGATION_SCOPE_PROBE events in the session |
| Inter-agent call times | Timestamps and agent IDs for latency measurement |
EWMA baselines
Baselines are updated on session close using an Exponential Weighted Moving Average with alpha = 0.2. This gives recent sessions more weight while retaining memory of older patterns. Baselines are stored in the workflow_baselines database table and include:
- Average session duration
- Average participant count
- Average events per session
- Average delegation depth
- Tool distribution (fractional, EWMA-merged)
- Inter-agent latency
- Recent participants (union of last 5 sessions, capped at 50)
- Sample count
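The EWMA update on session close can be sketched as follows; the tool-distribution merge handles tools missing from either side by treating them as 0 (function names are illustrative):

```python
# Minimal EWMA baseline update (alpha = 0.2): recent sessions carry more
# weight while older patterns decay gradually rather than disappearing.

ALPHA = 0.2

def ewma(old: float, observed: float, alpha: float = ALPHA) -> float:
    """Blend a new observation into the running baseline."""
    return alpha * observed + (1 - alpha) * old

def merge_tool_distribution(baseline: dict[str, float],
                            session: dict[str, float],
                            alpha: float = ALPHA) -> dict[str, float]:
    """EWMA-merge two fractional tool distributions; absent tools count as 0."""
    tools = set(baseline) | set(session)
    return {t: ewma(baseline.get(t, 0.0), session.get(t, 0.0), alpha)
            for t in tools}
```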
Anomaly detection
A minimum of 3 completed sessions (MIN_BASELINE_SAMPLES) is required before anomaly detection engages. Before that threshold, only participant tracking runs.
WORKFLOW_PARTICIPANT_UNEXPECTED (MEDIUM severity)
Fires when an agent ID participates in a workflow session but is not in the recent participant set derived from the baseline. This can indicate unauthorized agent injection into an established workflow.
WORKFLOW_DEPTH_SPIKE (MEDIUM severity)
Fires when the observed delegation depth exceeds max(2x average depth, average depth + 2). A depth spike can indicate an unauthorized sub-agent spawn or a prompt injection that causes excessive delegation.
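The max(2x, +2) form means shallow baselines still need a meaningful jump before firing; a one-line sketch:

```python
def depth_spike(observed_depth: int, avg_depth: float) -> bool:
    """True when observed delegation depth exceeds max(2x avg, avg + 2)."""
    return observed_depth > max(2 * avg_depth, avg_depth + 2)
```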
WORKFLOW_TOOL_DISTRIBUTION_ANOMALY (MEDIUM severity)
Fires on session close when the Bray-Curtis dissimilarity between the session's tool usage distribution and the baseline distribution exceeds 0.5. Bray-Curtis dissimilarity ranges from 0.0 (identical) to 1.0 (completely different) and handles missing tools gracefully. This alert also fires when 3 or more DELEGATION_SCOPE_PROBE events accumulate in a single session, which is treated as a signal of a prompt injection attempting to enumerate agent permissions.
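A minimal sketch of Bray-Curtis over fractional tool distributions, with tools absent from one side counted as 0:

```python
# Bray-Curtis dissimilarity between two tool usage distributions:
# 0.0 for identical distributions, 1.0 for completely disjoint ones.

def bray_curtis(a: dict[str, float], b: dict[str, float]) -> float:
    tools = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in tools)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in tools)
    return num / den if den else 0.0
```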
WORKFLOW_DURATION_ANOMALY (MEDIUM severity)
Fires on session close when the session duration exceeds 3x the baseline average. Anomalously long sessions can indicate a stalled delegation chain or an attacker keeping a session alive to exfiltrate data.
Drift Detector
Added in Sprint AOC-1.5 (March 2026). Source: backend/behavry/monitor/drift_detector.py
The Drift Detector catches gradual behavioral shifts that are invisible to threshold-based anomaly detection. Where the original Behavioral Monitor handles acute spikes, the Drift Detector handles slow-moving patterns where each individual step is within normal bounds but the trend over time is malicious.
Injection conditioning detection
INJECTION_CONDITIONING_SUSPECTED (HIGH severity)
Fires when an agent receives 4 or more inbound injection findings (of any severity) within a 15-minute window. This matches the "boiling frog" attack pattern where an attacker who controls multiple external resources embeds low-severity injection patterns across several fetches, gradually shifting the agent's context without triggering any single critical alert.
The detector subscribes to INBOUND_INJECTION_DETECTED and INBOUND_INJECTION_BLOCKED events and maintains a per-agent sliding window of finding timestamps (bounded at 100 entries).
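The per-agent sliding window can be sketched as a bounded deque of timestamps; the class and method names are assumptions, not the drift_detector.py implementation:

```python
# Sliding-window count of inbound injection findings per agent: an alert
# is suspected once 4+ findings land within 15 minutes. The bounded deque
# caps per-agent memory at 100 timestamps.

from collections import defaultdict, deque

CONDITIONING_THRESHOLD = 4
WINDOW_SECONDS = 15 * 60

class ConditioningWindow:
    def __init__(self) -> None:
        self._findings: dict[str, deque[float]] = defaultdict(
            lambda: deque(maxlen=100))

    def record(self, agent_id: str, ts: float) -> bool:
        """Record a finding; return True when the window threshold is hit."""
        window = self._findings[agent_id]
        window.append(ts)
        recent = [t for t in window if ts - t <= WINDOW_SECONDS]
        return len(recent) >= CONDITIONING_THRESHOLD
```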
Behavioral drift detection
BEHAVIORAL_DRIFT_DETECTED (MEDIUM severity)
Fires when the agent's risk score shows a sustained upward linear trend over the last 10 tool call events. The detector computes a simple linear regression slope on the risk score history; if the slope exceeds 0.05 (per sample, on a 0-1 normalized scale), the alert fires.
This catches an attacker who can incrementally escalate privilege where each individual step is acceptable on its own but the cumulative trajectory is not. Threshold-based detection would not fire because no single event crosses the threshold.
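The slope test reduces to an ordinary least-squares fit over the last 10 samples; a sketch (names mirror the tuning constants below, but the function bodies are assumptions):

```python
# Least-squares slope over the last N risk score samples (0-1 normalized);
# drift is flagged when the slope exceeds 0.05 per sample.

DRIFT_MIN_SAMPLES = 10
DRIFT_SLOPE_THRESHOLD = 0.05

def slope(samples: list[float]) -> float:
    """Simple linear regression slope with x = 0, 1, ..., n-1."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

def drift_detected(history: list[float]) -> bool:
    """True when the recent risk score trend rises faster than the threshold."""
    if len(history) < DRIFT_MIN_SAMPLES:
        return False
    return slope(history[-DRIFT_MIN_SAMPLES:]) > DRIFT_SLOPE_THRESHOLD
```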
Cooldown
Both alert types respect a 30-minute per-agent, per-alert-type cooldown to avoid alert storms when a legitimate policy change shifts the risk baseline. The cooldown is tracked independently for injection_conditioning and behavioral_drift.
Tuning constants
| Constant | Default | Description |
|---|---|---|
| CONDITIONING_THRESHOLD | 4 | Findings in window to trigger conditioning alert |
| CONDITIONING_WINDOW_MINUTES | 15 | Sliding window for finding count |
| DRIFT_MIN_SAMPLES | 10 | Minimum risk score samples before slope analysis |
| DRIFT_SLOPE_THRESHOLD | 0.05 | Minimum upward slope to trigger drift alert |
| ALERT_COOLDOWN_MINUTES | 30 | Per-agent, per-alert-type cooldown |
Combined alert types
Across all four monitoring systems, Behavry produces the following alert and event types:
| Event type | Source | Severity | Trigger |
|---|---|---|---|
| FREQUENCY_SPIKE | Behavioral Monitor | Varies | Call rate anomaly vs baseline |
| NEW_RESOURCE_ACCESS | Behavioral Monitor | Medium | First-time resource access |
| ERROR_RATE_ELEVATED | Behavioral Monitor | Varies | Denial rate anomaly vs baseline |
| DATA_VOLUME_SPIKE | Behavioral Monitor | Varies | Transfer volume anomaly vs baseline |
| BASELINE_DRIFT | Behavioral Monitor | Medium | Sustained 24h+ behavior shift |
| BEHAVIOR_REVERSAL | Trust Reset Detector | High | Cross-session disposition flip (Conditions A/B) |
| REQUESTER_SESSION_CYCLING | Trust Reset Detector | Medium | 3+ sessions in 30 min with varying dispositions |
| WORKFLOW_PARTICIPANT_UNEXPECTED | Workflow Baseline Monitor | Medium | Unknown agent in workflow |
| WORKFLOW_DEPTH_SPIKE | Workflow Baseline Monitor | Medium | Delegation depth exceeds max(2x avg, avg + 2) |
| WORKFLOW_TOOL_DISTRIBUTION_ANOMALY | Workflow Baseline Monitor | Medium | Tool usage diverged from baseline |
| WORKFLOW_DURATION_ANOMALY | Workflow Baseline Monitor | Medium | Session duration exceeds 3x baseline |
| INJECTION_CONDITIONING_SUSPECTED | Drift Detector | High | 4+ injection findings in 15 min |
| BEHAVIORAL_DRIFT_DETECTED | Drift Detector | Medium | Sustained upward risk score trend |
All alerts are visible in the dashboard Alerts tab and stream to connected clients via SSE in real time.