Adversarial Red Team Swarm (ARS)

Feature rows 20 + 21 — Sprints ARS / ARS.1

ARS is the internal lab that runs continuously on demo.behavry.ai. Its findings feed the Red Team → Policy Automation Loop.

What this is

Adversarial Red Team Swarm is Behavry's continuous internal red team. It runs 25 evasion techniques across 7 categories against a live instance of the product and scores how often each one slips through detection. The sprint ARS.1 extension adds an Adaptive Adversarial Evolution loop — a mutation engine that takes techniques ARS has failed on and generates variants until one works, then feeds the new variant back into the corpus.

The goal is to keep the product's detection surface ahead of what real attackers will try. Every week of ARS runs is a dataset of known-effective attacks we can use to test policy changes, regression-test detection, and publish into the community policy library as attack fingerprints.

The 25 techniques, grouped

Category	Techniques
Imperative override	Direct ignore, delayed ignore, role reset, step replacement
Context smuggling	Hidden tool envelopes, Markdown poisoning, code-block poisoning
Trust exploitation	Trusted-source spoofing, domain-tag forgery
Exfiltration	URL param leak, image-src leak, DNS rebinding, Markdown-image leak
Tool hijack	Fake tool name, fake tool args, schema confusion
Behavioral drift	Gradual imperative, style conditioning, persona drift
Credential solicitation	Direct key ask, debug ruse, environment dump

Each technique is a Python class under backend/behavry/adversarial/ with:

A probe() method that generates one attack payload
A success_criteria() method that decides, given the agent's response and the audit trail, whether the attack succeeded
Metadata: category, difficulty, references

How a run works

An APScheduler job (backend/behavry/adversarial/service.py) fires every 15 minutes:

Pick targets — choose an agent persona from the demo fleet; pick one technique from the corpus
Execute — run the probe against the live proxy as if it were a real request
Score — compare the response against success_criteria; record the outcome
Record — write ars.probe and ars.result events with full payloads
Evolve (ARS.1) — on failure, pass the technique + the detection signature to the mutation engine, which returns a variant; queue the variant for the next run

Adaptive evolution loop

The evolution engine (sprint ARS.1) turns ARS from a static test suite into an autonomous adversary:

Input: a failed technique + the detection signature that caught it (rule IDs, severity, matched span)
Transformation: language model generates a semantically equivalent variant that avoids the flagged surface — synonym substitution, encoding changes, structural rearrangement, added noise
Output: a new probe with a parent pointer back to the original technique

Evolved probes are tagged evolution_generation > 0 so you can always tell a synthesized variant from a hand-written one. Successful evolved probes are promoted into the canonical corpus; unsuccessful ones are retained for trend analysis.

Dashboard

demo.behavry.ai exposes an ARS dashboard at Security → ARS with:

Per-technique success rate over time
"New effective techniques" feed — variants the evolution engine has found that now work
Coverage matrix — techniques × categories × agent personas
"Policies this would have caught" — a dry-run against the tenant's current policy set

Where results go

Audit log — every probe + result, normal audit_events schema
Community library — high-signal attacks publish as DLP patterns / injection patterns / policy templates
Red Team → Policy Automation Loop — auto-generated candidate policies that would have blocked the successful attacks, pending review by policy_author

Inbound Injection Scanner — primary defense ARS targets
Intent Drift Detection — another ARS target (behavioral-drift techniques)
Red Team → Policy Automation Loop — closes the loop from finding to policy
Community Library — where successful attack fingerprints are published

What this is​

The 25 techniques, grouped​

How a run works​

Adaptive evolution loop​

Dashboard​

Where results go​

Related​