Behavioral Monitor

Behavry's behavioral monitoring layer is composed of four detection systems that run concurrently, each subscribed to the internal async event bus in main.py. Together they cover acute anomalies, cross-session trust manipulation, multi-agent workflow deviations, and gradual conditioning attacks.

Architecture overview

All four systems consume events from the same event bus and write alerts to the same alerts table and SSE stream. They are designed to be complementary: the original monitor catches acute threshold breaches, the trust reset detector catches cross-session manipulation, the workflow baseline monitor catches multi-agent orchestration anomalies, and the drift detector catches slow-moving conditioning patterns that no single-event threshold would flag.

| System | Scope | Detection style | Event types produced |
| --- | --- | --- | --- |
| Behavioral Monitor | Per-agent | Z-score anomaly vs rolling baseline | FREQUENCY_SPIKE, NEW_RESOURCE_ACCESS, ERROR_RATE_ELEVATED, DATA_VOLUME_SPIKE, BASELINE_DRIFT |
| Trust Reset Detector | Per-agent, cross-session | Disposition reversal pattern matching | BEHAVIOR_REVERSAL, REQUESTER_SESSION_CYCLING |
| Workflow Baseline Monitor | Per-workflow | EWMA baseline comparison | WORKFLOW_PARTICIPANT_UNEXPECTED, WORKFLOW_DEPTH_SPIKE, WORKFLOW_TOOL_DISTRIBUTION_ANOMALY, WORKFLOW_DURATION_ANOMALY |
| Drift Detector | Per-agent | Linear trend analysis, windowed counting | INJECTION_CONDITIONING_SUSPECTED, BEHAVIORAL_DRIFT_DETECTED |

Behavioral Monitor (original)

The original Behavioral Monitor tracks what each agent does over time, builds per-agent baselines, and fires alerts when behavior deviates from those baselines.

What it tracks

| Metric | Description |
| --- | --- |
| Tool call frequency | Calls per minute; alerts on spikes vs rolling baseline |
| Resource access patterns | New paths/servers an agent has never touched before |
| Error rates | Denial and failure rates vs normal |
| Data volume | Bytes transferred; DLP-adjacent but behavioral |
| Action diversity | Agent suddenly doing things far outside its usual scope |
| Session duration | Unusually long or short sessions |

How baselines work

Baselines are computed per-agent over a rolling 7-day window stored in TimescaleDB. Each metric tracks a rolling mean and standard deviation. Anomaly detection compares each new observation to that baseline as a z-score, with a default alert threshold of 2 sigma. The z-score is then normalized to a 0.0-1.0 anomaly score, with values at or above 4 sigma mapping to 1.0.

A minimum of 5 baseline samples must accumulate before anomaly detection engages for a given metric. Before that threshold, no alerts fire (there is insufficient history to distinguish anomalous behavior from normal ramp-up).

Baselines update continuously, so a new access pattern becomes part of the "normal" baseline once it recurs often enough within the 7-day window.
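The z-score check and normalization described above can be sketched as follows. This is a minimal illustration, not the monitor's actual code: the function name, the use of the absolute deviation, the linear scaling between 0 and 4 sigma, and the handling of a zero-variance baseline are all assumptions. Only the 2-sigma threshold, the 4-sigma ceiling, and the 5-sample minimum come from the text.

```python
from statistics import mean, stdev

MIN_BASELINE_SAMPLES = 5   # from the text: no alerts before 5 samples
SIGMA_THRESHOLD = 2.0      # from the text: default alert threshold
SIGMA_CEILING = 4.0        # from the text: 4 sigma or more maps to 1.0


def anomaly_score(history, observed):
    """Return (score, is_anomalous) for one metric of one agent.

    Assumption: the score scales linearly with |z| and is clamped at 1.0.
    """
    if len(history) < MIN_BASELINE_SAMPLES:
        return 0.0, False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Assumption: on a perfectly flat baseline, any deviation is maximal.
        z = 0.0 if observed == mu else SIGMA_CEILING
    else:
        z = abs(observed - mu) / sigma
    score = min(z / SIGMA_CEILING, 1.0)
    return score, z >= SIGMA_THRESHOLD
```

With a baseline of `[10, 12, 11, 9, 13]` calls per minute, an observation of 20 is more than 4 sigma from the mean and scores 1.0, while an observation equal to the mean scores 0.0.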

Risk scoring

Every agent has a Behavry Risk Score (0-100) computed across six dimensions:

| Dimension | Weight |
| --- | --- |
| Policy denial rate | 25% |
| Anomaly frequency | 20% |
| Data volume | 15% |
| New resource access | 15% |
| Session behavior | 15% |
| Escalation outcomes | 10% |

The score maps to a risk tier:

| Tier | Score | Policy behavior |
| --- | --- | --- |
| Low | 0-25 | Standard enforcement |
| Medium | 26-50 | Enhanced logging |
| High | 51-75 | Escalate borderline actions |
| Critical | 76-100 | Block and suspend |
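One way to read the weights and tiers above is as a weighted sum followed by a range lookup. The sketch below assumes each dimension is first reduced to a 0-100 sub-score; the text does not say how the sub-scores are derived, and the dictionary keys and function names are hypothetical.

```python
# Weights in percent, taken from the dimension table; they sum to 100.
WEIGHTS = {
    "policy_denial_rate": 25,
    "anomaly_frequency": 20,
    "data_volume": 15,
    "new_resource_access": 15,
    "session_behavior": 15,
    "escalation_outcomes": 10,
}

# Upper bound (inclusive) of each tier, taken from the tier table.
TIERS = [(25, "Low"), (50, "Medium"), (75, "High"), (100, "Critical")]


def risk_score(sub_scores):
    """Combine per-dimension sub-scores (assumed 0-100) into one 0-100 score."""
    total = sum(w * sub_scores.get(name, 0) for name, w in WEIGHTS.items()) / 100
    return round(min(max(total, 0), 100))


def risk_tier(score):
    """Map a 0-100 risk score to its tier name."""
    for upper, name in TIERS:
        if score <= upper:
            return name
    return "Critical"
```

For example, an agent with a policy denial sub-score of 80 and an anomaly frequency sub-score of 50, and zero elsewhere, scores 30 under these assumptions and lands in the Medium tier.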

Anomaly severity mapping

The normalized anomaly score (0.0-1.0) maps to platform-wide severity levels:

| Score range | Severity |
| --- | --- |
| 0.7+ | Critical |
| 0.5-0.69 | High |
| 0.3-0.49 | Medium |
| Below 0.3 | Low |
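The mapping above amounts to a descending threshold lookup. The sketch below is illustrative only; the function name is hypothetical, and it treats each range's lower bound as inclusive so that scores between the printed ranges (for example 0.695) still resolve to a level.

```python
# (inclusive lower bound, severity), checked from highest to lowest
SEVERITY_BANDS = [(0.7, "Critical"), (0.5, "High"), (0.3, "Medium")]


def severity(score):
    """Map a normalized anomaly score (0.0-1.0) to a severity level."""
    for lower, name in SEVERITY_BANDS:
        if score >= lower:
            return name
    return "Low"
```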

Alerts

The original monitor emits alerts for:

  • FREQUENCY_SPIKE — call rate exceeds baseline by a statistically significant margin
  • NEW_RESOURCE_ACCESS — agent accesses a server/path it hasn't before
  • ERROR_RATE_ELEVATED — denial rate exceeds baseline thresholds in a session
  • DATA_VOLUME_SPIKE — bytes transferred exceeds baseline by a statistically significant margin
  • BASELINE_DRIFT — sustained behavior shift over 24+ hours

All alerts are visible in the dashboard Alerts tab and stream via SSE.


Trust Reset Detector

Added in Sprint V (AOC-4, March 2026). Source: backend/behavry/monitor/trust_reset_detector.py

The Trust Reset Detector addresses a class of attacks where a requester exploits the fact that AI agents lose memory of prior suspicion when a new session starts. If an agent blocked a requester's action in Session A, the requester can open Session B on a new channel and the agent may comply — effectively resetting the trust relationship.

This attack pattern is documented in Shapira et al., "Agents of Chaos" (arXiv:2602.20021, Feb 2026), specifically Case Study #8 (new channel causes loss of prior suspicion) and Case Study #15 (circular verification / echo-chamber reinforcement).

Action class derivation

Raw actions from tool calls are mapped to five semantic action classes for cross-session comparison:

| Action class | Raw actions mapped |
| --- | --- |
| read | read, get, list, search, query |
| write | write, create, update, put, patch |
| delete | delete, remove |
| execute | execute, run, call, invoke |
| send | send, post, publish, message |

Unrecognized actions pass through as-is (lowercased).

Detection conditions

The detector evaluates three conditions against a per-agent rolling buffer of the last 500 ActionDispositionEntry records:

Condition A: Opposite disposition in new session (HIGH severity)

Fires BEHAVIOR_REVERSAL when the same agent handles the same action class for the same requester with an opposite disposition (blocked/escalated vs. allowed) in a different session, within a 2-hour look-back window.
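As a rough illustration of Condition A, the check below scans one agent's buffer for an earlier entry that matches on requester and action class, comes from a different session, carries the opposite disposition, and falls inside the 2-hour window. The field names on `ActionDispositionEntry` are hypothetical (the text names the record type but not its fields), and the sketch treats the reversal as symmetric, so allowed-then-blocked also matches.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta

BUFFER_SIZE = 500                      # from the text: last 500 records per agent
REVERSAL_WINDOW = timedelta(hours=2)   # from the text: 2-hour look-back
RESTRICTIVE = {"blocked", "escalated"}


@dataclass(frozen=True)
class ActionDispositionEntry:
    # Hypothetical fields; the real record layout is not given in the text.
    requester_id: str
    session_id: str
    action_class: str
    disposition: str      # "allowed", "blocked", or "escalated"
    timestamp: datetime


def is_reversal(buffer, new):
    """Condition A: opposite disposition for the same requester and action
    class, in a different session, within the look-back window."""
    new_restrictive = new.disposition in RESTRICTIVE
    for old in buffer:
        age = new.timestamp - old.timestamp
        if (
            old.requester_id == new.requester_id
            and old.action_class == new.action_class
            and old.session_id != new.session_id
            and (old.disposition in RESTRICTIVE) != new_restrictive
            and timedelta(0) <= age <= REVERSAL_WINDOW
        ):
            return True
    return False
```

A per-agent buffer would be created as `deque(maxlen=BUFFER_SIZE)`, so the oldest record is dropped automatically once 500 entries are held.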

Condition B: Accumulated blocks overridden (HIGH severity)

Fires BEHAVIOR_REVERSAL when an agent has accumulated 3 or more blocked/escalated events for a specific requester and action class across prior sessions, but a new session allows the same action. This catches cases where the reversal window for Condition A has passed but the pattern is still suspicious.
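Condition B drops the time window and counts instead. The sketch below is an assumption-laden illustration: `Entry` is a hypothetical stand-in for the stored disposition record, and "prior sessions" is read as any session other than the one being evaluated.

```python
from typing import NamedTuple

BLOCK_THRESHOLD = 3                    # from the text: 3 or more prior blocks
RESTRICTIVE = {"blocked", "escalated"}


class Entry(NamedTuple):
    # Hypothetical stand-in for ActionDispositionEntry.
    requester_id: str
    session_id: str
    action_class: str
    disposition: str


def overrides_accumulated_blocks(history, new):
    """Condition B: a new session allows an action that earlier sessions
    blocked or escalated at least BLOCK_THRESHOLD times."""
    if new.disposition != "allowed":
        return False
    prior_blocks = sum(
        1
        for e in history
        if e.requester_id == new.requester_id
        and e.action_class == new.action_class
        and e.session_id != new.session_id
        and e.disposition in RESTRICTIVE
    )
    return prior_blocks >= BLOCK_THRESHOLD
```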

Condition C: Session cycling (MEDIUM severity)

Fires REQUESTER_SESSION_CYCLING when the same requester opens 3 or more distinct sessions within 30 minutes, each requesting the same tool, with varying dispositions. This indicates the requester is probing for a session where the agent will comply.
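Condition C can be sketched as a windowed count of distinct sessions. In this illustration the `Entry` fields are hypothetical, and "varying dispositions" is interpreted as more than one distinct disposition appearing in the window; the real detector may define variation differently.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

CYCLING_WINDOW = timedelta(minutes=30)   # from the text
MIN_SESSIONS = 3                         # from the text


class Entry(NamedTuple):
    # Hypothetical stand-in for the stored disposition record.
    requester_id: str
    session_id: str
    tool: str
    disposition: str
    timestamp: datetime


def is_session_cycling(history, new):
    """Condition C: one requester, one tool, 3+ distinct sessions inside the
    window, and the outcomes are not all the same."""
    recent = [
        e
        for e in list(history) + [new]
        if e.requester_id == new.requester_id
        and e.tool == new.tool
        and timedelta(0) <= new.timestamp - e.timestamp <= CYCLING_WINDOW
    ]
    sessions = {e.session_id for e in recent}
    dispositions = {e.disposition for e in recent}
    return len(sessions) >= MIN_SESSIONS and len(dispositions) > 1
```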

Cooldown and storage

  • A 5-minute per-agent cooldown prevents alert storms after legitimate policy changes that shift disposition patterns. Only BEHAVIOR_REVERSAL is subject to this cooldown; session cycling alerts fire independently.
  • In-memory records are persisted to the action_disposition_records TimescaleDB hypertable every 60 seconds via a background flush loop in main.py. Detection state therefore survives process restarts, with only the records accumulated since the last flush at risk.
  • The hypertable is indexed on (agent_id, action_class, requester_id) for efficient cross-session queries.

Workflow Baseline Monitor

Added in Sprint V (March 2026). Source: backend/behavry/monitor/workflow_baseline_monitor.py

The Workflow Baseline Monitor tracks behavioral metrics across multi-agent workflow sessions and fires alerts when a live session deviates from established per-workflow baselines. While the original Behavioral Monitor operates at the individual agent level, the Workflow Baseline Monitor operates at the workflow level, comparing the collective behavior of all agents participating in a workflow session.

Per-session tracking

For each active workflow session, the monitor tracks:

| Metric | Description |
| --- | --- |
| Participants | Set of agent IDs that have participated in the session |
| Event count | Total tool call, deny, and escalate events |
| Max delegation depth | Deepest delegation chain observed in the session |
| Tool call distribution | Per-tool call counts (converted to fractions for comparison) |
| Scope probe count | Number of DELEGATION_SCOPE_PROBE events in the session |
| Inter-agent call times | Timestamps and agent IDs for latency measurement |

EWMA baselines

Baselines are updated on session close using an Exponentially Weighted Moving Average (EWMA) with alpha = 0.2: each closed session contributes 20% of the new baseline value and the previous baseline contributes 80%. This gives recent sessions more weight while retaining memory of older patterns. Baselines are stored in the workflow_baselines database table and include:

  • Average session duration
  • Average participant count
  • Average events per session
  • Average delegation depth
  • Tool distribution (fractional, EWMA-merged)
  • Inter-agent latency
  • Recent participants (union of last 5 sessions, capped at 50)
  • Sample count
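The EWMA update for a scalar metric, and its per-tool extension for the tool distribution, might look like the sketch below. The function names are hypothetical, and treating a tool absent from one side as a fraction of 0.0 is an assumption about how the "EWMA-merged" distribution is built.

```python
ALPHA = 0.2   # from the text


def ewma(baseline, observed, alpha=ALPHA):
    """Blend one closed session's value into the running baseline."""
    return alpha * observed + (1 - alpha) * baseline


def merge_tool_distribution(baseline, session, alpha=ALPHA):
    """Apply the EWMA per tool; a tool missing from either side counts as 0.0."""
    tools = set(baseline) | set(session)
    return {
        tool: ewma(baseline.get(tool, 0.0), session.get(tool, 0.0), alpha)
        for tool in tools
    }
```

With a baseline duration of 100 seconds, a 200-second session moves the baseline to about 120 seconds. Because both inputs to `merge_tool_distribution` are fractions that sum to 1, the merged distribution also sums to 1.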

Anomaly detection

A minimum of 3 completed sessions (MIN_BASELINE_SAMPLES) is required before anomaly detection engages. Before that threshold, only participant tracking runs.

WORKFLOW_PARTICIPANT_UNEXPECTED (MEDIUM severity)

Fires when an agent ID participates in a workflow session but is not in the recent participant set derived from the baseline. This can indicate unauthorized agent injection into an established workflow.

WORKFLOW_DEPTH_SPIKE (MEDIUM severity)

Fires when the observed delegation depth exceeds max(2x average depth, average depth + 2). A depth spike can indicate an unauthorized sub-agent spawn or a prompt injection that causes excessive delegation.
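The two-part threshold is easiest to see with numbers. In the sketch below (function names are hypothetical), the additive term dominates for shallow workflows and the multiplicative term dominates once the average depth exceeds 2.

```python
def depth_spike_threshold(avg_depth):
    """Threshold from the text: max(2x average depth, average depth + 2)."""
    return max(2 * avg_depth, avg_depth + 2)


def is_depth_spike(observed_depth, avg_depth):
    """True when the observed delegation depth exceeds the threshold."""
    return observed_depth > depth_spike_threshold(avg_depth)
```

For an average depth of 1.5 the threshold is 3.5, so a depth of 3 does not fire but a depth of 4 does. For an average depth of 4 the threshold is 8.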

WORKFLOW_TOOL_DISTRIBUTION_ANOMALY (MEDIUM severity)

Fires on session close when the Bray-Curtis dissimilarity between the session's tool usage distribution and the baseline distribution exceeds 0.5. Bray-Curtis dissimilarity ranges from 0.0 (identical) to 1.0 (completely different) and remains well defined when a tool appears in only one of the two distributions. The same alert also fires when 3 or more DELEGATION_SCOPE_PROBE events accumulate in a single session, which is treated as a sign of a prompt injection attempting to enumerate agent permissions.
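For reference, Bray-Curtis dissimilarity between two distributions is the sum of absolute per-tool differences divided by the sum of all values. The sketch below is a plain-Python illustration with hypothetical function and constant names; it is not taken from workflow_baseline_monitor.py.

```python
DISSIMILARITY_THRESHOLD = 0.5   # from the text
SCOPE_PROBE_THRESHOLD = 3       # from the text


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two {tool: fraction} mappings.
    A tool missing from one mapping is treated as 0.0."""
    tools = set(p) | set(q)
    numerator = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)
    denominator = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in tools)
    return numerator / denominator if denominator else 0.0


def tool_distribution_anomaly(session_dist, baseline_dist, scope_probes):
    """True when either trigger described in the text is met."""
    return (
        bray_curtis(session_dist, baseline_dist) > DISSIMILARITY_THRESHOLD
        or scope_probes >= SCOPE_PROBE_THRESHOLD
    )
```

Two sessions that use entirely different tools score 1.0, and identical distributions score 0.0. When both inputs are fractions summing to 1, the value equals half the sum of the absolute differences.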

WORKFLOW_DURATION_ANOMALY (MEDIUM severity)

Fires on session close when the session duration exceeds 3x the baseline average. Anomalously long sessions can indicate a stalled delegation chain or an attacker keeping a session alive to exfiltrate data.


Drift Detector

Added in Sprint AOC-1.5 (March 2026). Source: backend/behavry/monitor/drift_detector.py

The Drift Detector catches gradual behavioral shifts that are invisible to threshold-based anomaly detection. Where the original Behavioral Monitor handles acute spikes, the Drift Detector handles slow-moving patterns in which each individual step is within normal bounds but the trend over time points to an attack.

Injection conditioning detection

INJECTION_CONDITIONING_SUSPECTED (HIGH severity)

Fires when an agent receives 4 or more inbound injection findings (of any severity) within a 15-minute window. This matches the "boiling frog" attack pattern where an attacker who controls multiple external resources embeds low-severity injection patterns across several fetches, gradually shifting the agent's context without triggering any single critical alert.

The detector subscribes to INBOUND_INJECTION_DETECTED and INBOUND_INJECTION_BLOCKED events and maintains a per-agent sliding window of finding timestamps (bounded at 100 entries).
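A per-agent sliding window of this kind can be sketched with a bounded deque. The constant names below match the tuning constants documented for the Drift Detector; the function name, the module-level dictionary, and the choice to count by timestamp comparison are assumptions for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

CONDITIONING_THRESHOLD = 4
CONDITIONING_WINDOW_MINUTES = 15
MAX_FINDINGS = 100   # from the text: window bounded at 100 entries

# agent_id -> timestamps of recent injection findings (hypothetical store)
findings_by_agent = defaultdict(lambda: deque(maxlen=MAX_FINDINGS))


def record_finding(agent_id, timestamp):
    """Record one injection finding; return True when the conditioning
    threshold is reached within the sliding window."""
    window = findings_by_agent[agent_id]
    window.append(timestamp)
    cutoff = timestamp - timedelta(minutes=CONDITIONING_WINDOW_MINUTES)
    recent = sum(1 for t in window if t >= cutoff)
    return recent >= CONDITIONING_THRESHOLD
```

Four findings one minute apart trip the check on the fourth call, while four findings spaced ten minutes apart never have more than two inside the window.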

Behavioral drift detection

BEHAVIORAL_DRIFT_DETECTED (MEDIUM severity)

Fires when the agent's risk score shows a sustained upward linear trend over the last 10 tool call events. The detector computes a simple linear regression slope on the risk score history; if the slope exceeds 0.05 (per sample, on a 0-1 normalized scale), the alert fires.

This catches an attacker who escalates privilege incrementally, where each individual step is acceptable on its own but the cumulative trajectory is not. Threshold-based detection would not fire because no single event crosses a threshold.
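The slope test can be written as an ordinary least-squares fit against the sample index. The sketch below uses the documented constant names `DRIFT_MIN_SAMPLES` and `DRIFT_SLOPE_THRESHOLD`; the function names are hypothetical, and it assumes the risk history is already on the 0-1 normalized scale.

```python
DRIFT_MIN_SAMPLES = 10
DRIFT_SLOPE_THRESHOLD = 0.05


def regression_slope(values):
    """Least-squares slope of values against their index 0..n-1.
    Requires at least two values."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def is_drifting(risk_history):
    """True when the last DRIFT_MIN_SAMPLES scores trend upward faster
    than the threshold."""
    if len(risk_history) < DRIFT_MIN_SAMPLES:
        return False
    recent = list(risk_history)[-DRIFT_MIN_SAMPLES:]
    return regression_slope(recent) > DRIFT_SLOPE_THRESHOLD
```

A slope of 0.05 per sample corresponds to a rise of 0.45 across a 10-sample window, so a history that climbs by 0.06 per event is flagged while a flat history is not.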

Cooldown

Both alert types respect a 30-minute per-agent, per-alert-type cooldown to avoid alert storms when a legitimate policy change shifts the risk baseline. The cooldown is tracked independently for injection_conditioning and behavioral_drift.
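A cooldown keyed on both agent and alert type can be sketched as a dictionary of last-fired timestamps. `ALERT_COOLDOWN_MINUTES` is the documented constant; the function name and the in-memory dictionary are assumptions for illustration.

```python
from datetime import datetime, timedelta

ALERT_COOLDOWN_MINUTES = 30

# (agent_id, alert_type) -> time the alert last fired (hypothetical store)
last_fired = {}


def should_fire(agent_id, alert_type, now):
    """Return True and start a new cooldown if this alert type is not
    currently cooling down for this agent."""
    key = (agent_id, alert_type)
    previous = last_fired.get(key)
    if previous is not None and now - previous < timedelta(minutes=ALERT_COOLDOWN_MINUTES):
        return False
    last_fired[key] = now
    return True
```

Because the key includes the alert type, a suppressed behavioral_drift alert does not block an injection_conditioning alert for the same agent.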

Tuning constants

| Constant | Default | Description |
| --- | --- | --- |
| CONDITIONING_THRESHOLD | 4 | Findings in window to trigger conditioning alert |
| CONDITIONING_WINDOW_MINUTES | 15 | Sliding window for finding count |
| DRIFT_MIN_SAMPLES | 10 | Minimum risk score samples before slope analysis |
| DRIFT_SLOPE_THRESHOLD | 0.05 | Minimum upward slope to trigger drift alert |
| ALERT_COOLDOWN_MINUTES | 30 | Per-agent, per-alert-type cooldown |

Combined alert types

Across all four monitoring systems, Behavry produces the following alert and event types:

| Event type | Source | Severity | Trigger |
| --- | --- | --- | --- |
| FREQUENCY_SPIKE | Behavioral Monitor | Varies | Call rate anomaly vs baseline |
| NEW_RESOURCE_ACCESS | Behavioral Monitor | Medium | First-time resource access |
| ERROR_RATE_ELEVATED | Behavioral Monitor | Varies | Denial rate anomaly vs baseline |
| DATA_VOLUME_SPIKE | Behavioral Monitor | Varies | Transfer volume anomaly vs baseline |
| BASELINE_DRIFT | Behavioral Monitor | Medium | Sustained 24h+ behavior shift |
| BEHAVIOR_REVERSAL | Trust Reset Detector | High | Cross-session disposition flip (Conditions A/B) |
| REQUESTER_SESSION_CYCLING | Trust Reset Detector | Medium | 3+ sessions in 30 min with varying dispositions |
| WORKFLOW_PARTICIPANT_UNEXPECTED | Workflow Baseline Monitor | Medium | Unknown agent in workflow |
| WORKFLOW_DEPTH_SPIKE | Workflow Baseline Monitor | Medium | Delegation depth exceeds max(2x baseline, baseline + 2) |
| WORKFLOW_TOOL_DISTRIBUTION_ANOMALY | Workflow Baseline Monitor | Medium | Tool usage diverged from baseline, or 3+ scope probes in a session |
| WORKFLOW_DURATION_ANOMALY | Workflow Baseline Monitor | Medium | Session duration exceeds 3x baseline |
| INJECTION_CONDITIONING_SUSPECTED | Drift Detector | High | 4+ injection findings in 15 min |
| BEHAVIORAL_DRIFT_DETECTED | Drift Detector | Medium | Sustained upward risk score trend |

All alerts are visible in the dashboard Alerts tab and stream to connected clients via SSE in real time.
