Data Protection Pipeline
Data Protection is included on the Enterprise plan. Phase 1 (classify / redact) ships today; Phase 2 (full KMS-backed encryption, additional providers) is in progress.
What this is
By default, Behavry stores metadata about every tool call — who did it, what tool, against what target, what policy decision — but does not store the payload itself. That's the right default for most tenants: less data at rest, less to leak, faster audit log.
Some tenants want different defaults. The Data Protection Pipeline lets you choose, per tenant (or per agent class), how payloads are handled:
| Mode | Payload at rest |
|---|---|
| full | Complete payload stored in audit log |
| metadata_only | Default: only metadata, no payload |
| redacted | Payload stored with DLP-matched fields replaced by [redacted] placeholders |
| encrypted | Full payload stored, encrypted at rest with a tenant-held key (KMS-backed) |
The four stages
The pipeline (backend/behavry/proxy/dp_pipeline.py) runs four stages in order for every in-flight payload:
1. Classify
Detect data categories in the payload:
- Run the DLP scanner to tag segments (`pii`, `pci:*`, `hipaa:phi`, `gdpr:*`, etc.)
- Run the injection scanner to tag adversarial content
- Record the tag set in the audit event's `dlp_findings`
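As a rough illustration of the classify stage, the sketch below tags segments with regular expressions and collects the distinct categories. Everything in it is hypothetical: the patterns, the `pci:pan` tag, the `Finding` type, and the `classify` and `tag_set` functions are stand-ins, since the real DLP and injection scanners are not shown on this page.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors; the real DLP scanner's rules are not shown here.
PATTERNS = {
    "pii": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),      # email-like strings
    "pci:pan": re.compile(r"\b(?:\d[ -]?){13,16}\b"),   # card-number-like digit runs
}


@dataclass(frozen=True)
class Finding:
    category: str
    start: int
    end: int


def classify(payload: str) -> list[Finding]:
    """Return one Finding per matched segment, ordered by position."""
    findings = [
        Finding(category, m.start(), m.end())
        for category, pattern in PATTERNS.items()
        for m in pattern.finditer(payload)
    ]
    return sorted(findings, key=lambda f: f.start)


def tag_set(findings: list[Finding]) -> list[str]:
    """Distinct categories, e.g. for an audit event's dlp_findings field."""
    return sorted({f.category for f in findings})
```

Keeping the character offsets on each finding lets a later stage replace exactly the matched span without re-scanning the payload.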
2. Redact
If the mode is redacted:
- Replace matched segments with `[redacted:{category}]` placeholders
- Preserve length where possible so downstream consumers don't break
- Keep a redaction map in memory for the encrypt stage
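The redact steps can be sketched roughly as follows. This is a minimal illustration, not the pipeline's code: `Finding`, `redact`, and the shape of the redaction map are assumptions, and the sketch does not attempt length preservation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    category: str
    start: int
    end: int


def redact(payload: str, findings: list[Finding]) -> tuple[str, dict[str, str]]:
    """Replace each matched span with a [redacted:{category}] placeholder.

    Returns the redacted text and an in-memory map from a numbered
    key to the original value (hypothetical map layout).
    """
    out: list[str] = []
    redaction_map: dict[str, str] = {}
    cursor = 0
    for i, f in enumerate(sorted(findings, key=lambda f: f.start)):
        if f.start < cursor:
            continue  # skip spans that overlap one already redacted
        out.append(payload[cursor:f.start])
        out.append(f"[redacted:{f.category}]")
        redaction_map[f"{i}:{f.category}"] = payload[f.start:f.end]
        cursor = f.end
    out.append(payload[cursor:])
    return "".join(out), redaction_map
```

For example, `redact("mail alice@example.com now", [Finding("pii", 5, 22)])` yields the text `mail [redacted:pii] now` and a map holding the original address.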
3. Dispose
Decide what to do with the redacted-or-not payload:
- `metadata_only` → drop the payload entirely before write
- `full` → keep it as-is
- `redacted` → keep the redacted copy
- `encrypted` → hand off to stage 4
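The dispose decision amounts to a dispatch on the mode. The sketch below is one way to express it; the `Mode` enum and the `dispose` signature are hypothetical and only the four mode names come from this page.

```python
from enum import Enum
from typing import Optional


class Mode(str, Enum):
    FULL = "full"
    METADATA_ONLY = "metadata_only"
    REDACTED = "redacted"
    ENCRYPTED = "encrypted"


def dispose(mode: Mode, payload: str, redacted: Optional[str]) -> tuple[Optional[str], bool]:
    """Return (payload_to_write, needs_encryption) for the given mode."""
    if mode is Mode.METADATA_ONLY:
        return None, False       # drop the payload before the write
    if mode is Mode.FULL:
        return payload, False    # keep as-is
    if mode is Mode.REDACTED:
        if redacted is None:
            raise ValueError("redacted mode requires a redacted copy")
        return redacted, False   # keep the redacted copy
    if mode is Mode.ENCRYPTED:
        return payload, True     # hand off to the encrypt stage
    raise ValueError(f"unknown mode: {mode!r}")
```

Returning `None` for `metadata_only` makes it hard to write a payload by accident: the caller has nothing to persist.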
4. Encrypt
For encrypted mode:
- Request an envelope-encrypted payload from the KMS client (`backend/behavry/proxy/kms_client.py`)
- The key is tenant-scoped; Behavry never holds plaintext for tenants in `encrypted` mode
- Write the ciphertext + key reference to the audit row
- Decryption requires the same KMS key and is audited
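A sketch of how the encrypt stage might hand off to a KMS client and shape the audit row. The `KmsClient` protocol, the `Envelope` type, the `envelope_encrypt` method, and the `payload_ciphertext` / `payload_key_ref` column names are all assumptions for illustration; the actual interface of `kms_client.py` is not documented here.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Envelope:
    ciphertext: bytes  # payload encrypted under a data key
    key_ref: str       # reference to the tenant-scoped KMS key


class KmsClient(Protocol):
    """Hypothetical interface; the real kms_client.py API may differ."""

    def envelope_encrypt(self, key_alias: str, plaintext: bytes) -> Envelope: ...


def encrypt_stage(client: KmsClient, key_alias: str, payload: str, audit_row: dict) -> dict:
    """Return a copy of audit_row carrying ciphertext and key reference only."""
    envelope = client.envelope_encrypt(key_alias, payload.encode("utf-8"))
    row = dict(audit_row)
    row.pop("payload", None)  # make sure no plaintext field is written
    row["payload_ciphertext"] = envelope.ciphertext
    row["payload_key_ref"] = envelope.key_ref
    return row
```

Passing the client in as a parameter keeps the stage independent of the provider, so the local and AWS KMS backends can share the same write path.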
KMS providers
Today the pipeline supports:
- Local (dev only): symmetric AES-256 with a 32-byte key from `BEHAVRY_LOCAL_ENCRYPTION_KEY`
- AWS KMS: production-ready, uses `kms:Encrypt`/`kms:Decrypt` with a Customer Master Key (CMK)
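For the local provider, a startup check along these lines can catch a malformed key early. This is only a sketch: the `load_local_key` helper is hypothetical, and it assumes the variable holds 64 hex characters, which this page does not specify.

```python
import os

KEY_ENV_VAR = "BEHAVRY_LOCAL_ENCRYPTION_KEY"


def load_local_key(environ=os.environ) -> bytes:
    """Read the local dev key and check that it decodes to 32 bytes (AES-256).

    Assumes hex encoding; the actual encoding may differ.
    """
    raw = environ.get(KEY_ENV_VAR)
    if not raw:
        raise RuntimeError(f"{KEY_ENV_VAR} is not set")
    try:
        key = bytes.fromhex(raw.strip())
    except ValueError as exc:
        raise RuntimeError(f"{KEY_ENV_VAR} is not valid hex") from exc
    if len(key) != 32:
        raise RuntimeError(f"{KEY_ENV_VAR} must decode to 32 bytes, got {len(key)}")
    return key
```

Under the same hex assumption, `secrets.token_hex(32)` from the Python standard library produces a suitable dev value.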
Azure Key Vault and GCP Cloud KMS are scoped for a follow-up phase — this is the "Partial" in the Sprint DP status.
Configuration
Settings → Limits → Data protection.
- Mode: the four options above
- Override by agent class: e.g. healthcare-agents → `redacted`, research-agents → `metadata_only`
- KMS provider: local (dev) / AWS KMS (production)
- Key alias: the CMK alias to use (`alias/behavry-{tenant-slug}` by default)
- Retention override: data-protection mode can set a shorter retention window than the tenant default
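To make the override and alias rules concrete, here is a small sketch of how the effective mode and default key alias could be resolved. The `resolve_mode` and `default_key_alias` helpers and the dictionary shape of the overrides are hypothetical; only the mode names, the example agent classes, and the alias pattern come from the settings above.

```python
from typing import Mapping, Optional

VALID_MODES = {"full", "metadata_only", "redacted", "encrypted"}


def resolve_mode(tenant_mode: str, overrides: Mapping[str, str], agent_class: Optional[str]) -> str:
    """Pick the agent-class override if one exists, else the tenant mode."""
    mode = overrides.get(agent_class, tenant_mode) if agent_class else tenant_mode
    if mode not in VALID_MODES:
        raise ValueError(f"unknown data-protection mode: {mode!r}")
    return mode


def default_key_alias(tenant_slug: str) -> str:
    """Default CMK alias, following the alias/behavry-{tenant-slug} pattern."""
    return f"alias/behavry-{tenant_slug}"


# Example: a tenant on metadata_only with two agent-class overrides.
overrides = {"healthcare-agents": "redacted", "research-agents": "metadata_only"}
effective = resolve_mode("metadata_only", overrides, "healthcare-agents")  # "redacted"
```

Validating the resolved mode at lookup time means a typo in an override fails loudly instead of silently falling back to a weaker mode.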
Costs
Encrypted mode carries a KMS request cost (one envelope-encryption call per audit write) and a small latency overhead (tens of milliseconds). Redacted mode has no runtime cost beyond the DLP scan that already happens. Full mode has no runtime cost but the largest at-rest data size.
Migration path
Tenants typically start with the default (metadata_only) and move to redacted or encrypted as part of a compliance roll-out (HIPAA, PCI, GDPR). Switching modes takes effect on the next audit write; existing events are left in their original form.
Related
- DLP Scanner
- Environment variables: `BEHAVRY_LOCAL_ENCRYPTION_KEY`, AWS KMS vars
- Compliance overview