SIEM Integration

Behavry ships with native SIEM integration that forwards audit events from the proxy pipeline to your security operations stack. Every policy decision, DLP finding, behavioral alert, and escalation can be streamed to one or more SIEM destinations in near-real-time, giving SOC teams full visibility into AI agent activity without polling or manual export.

Why SIEM integration matters

  • Compliance evidence -- Continuous, machine-readable audit delivery supports the centralized log collection and review requirements in SOC 2 CC7.2, NIST 800-53 AU-6, and ISO 27001 A.12.4.
  • SOC visibility -- AI agent actions appear alongside your existing endpoint, network, and identity telemetry. Analysts can correlate agent behavior with other signals in a single pane of glass.
  • Threat correlation -- SIEM rules can fire on Behavry event types such as INBOUND_INJECTION_DETECTED, BEHAVIOR_REVERSAL, or BLAST_RADIUS_ESCALATION, enabling automated playbooks that respond to AI-specific threats.

Supported destinations

| Destination | Type key | Transport | Auth method | Format |
| --- | --- | --- | --- | --- |
| Splunk | splunk_hec | HTTPS (HEC) | Authorization: Splunk {token} | JSON (newline-delimited HEC events) |
| Microsoft Sentinel | sentinel | HTTPS (Log Analytics API) | HMAC-SHA256 SharedKey | JSON array |
| Google Chronicle | chronicle | HTTPS (UDM batchCreate) | Service account JWT (OAuth2) | UDM SecurityEvent |
| IBM QRadar | qradar | TLS syslog (port 6514) | Certificate-based | LEEF 2.0 inside RFC 5424 frames |
| Generic Syslog | syslog | TCP, UDP, or TLS | None / certificate | RFC 5424 |
| Custom Webhook | webhook | HTTPS | HMAC-SHA256 (X-Behavry-Signature) | JSON array |

All destinations support per-destination event filtering, configurable batch sizes, and independent retry policies.


Architecture

The SIEM pipeline is an extension of Behavry's internal async event bus. It runs entirely in-process -- no external queues or message brokers are required.

Event Bus (all BehavryEvents)
|
v
SIEMDispatcher (wildcard subscriber)
| - converts BehavryEvent -> EventMetadata (no raw payload)
| - loads active destinations per tenant
| - evaluates Python-native event_filter per destination
| - enqueues matching events into per-destination asyncio.Queue (max 10,000)
|
+---> [Queue: dest-1] ---> SIEMBatchWorker ---> SplunkHECConnector
+---> [Queue: dest-2] ---> SIEMBatchWorker ---> SentinelConnector
+---> [Queue: dest-N] ---> SIEMBatchWorker ---> WebhookConnector

Each destination gets its own in-memory queue and background SIEMBatchWorker task. The worker flushes when the batch reaches batch_size events or when flush_interval_secs elapses -- whichever comes first. On graceful shutdown, remaining events in the batch are flushed before the task exits.
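
The flush rule above (size or interval, whichever comes first, plus a final flush on shutdown) can be sketched with plain asyncio. This is an illustrative sketch only, not the actual SIEMBatchWorker: the function name `batch_worker` and the `deliver` callback are hypothetical, and cancellation is assumed to be the shutdown signal.

```python
import asyncio
from typing import Awaitable, Callable, List


async def batch_worker(
    queue: asyncio.Queue,
    deliver: Callable[[List[dict]], Awaitable[None]],
    batch_size: int = 100,
    flush_interval_secs: float = 30.0,
) -> None:
    """Drain `queue` into batches; flush on size or on elapsed interval."""
    loop = asyncio.get_running_loop()
    batch: List[dict] = []
    deadline = loop.time() + flush_interval_secs
    try:
        while True:
            timeout = max(0.0, deadline - loop.time())
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                pass  # interval elapsed without a new event
            if len(batch) >= batch_size or loop.time() >= deadline:
                if batch:
                    await deliver(batch)
                    batch = []
                deadline = loop.time() + flush_interval_secs
    except asyncio.CancelledError:
        # Assumed shutdown path: flush whatever is still buffered, then exit.
        # If cancellation arrives while deliver() is running, the same batch
        # is sent again here, so this sketch is at-least-once.
        if batch:
            await deliver(batch)
        raise
```

With `batch_size=2` and three queued events, the worker delivers one full batch immediately and the remaining event either when the interval elapses or when the task is cancelled.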


Event filtering

Every destination can define an event_filter that controls which events are forwarded. Filtering is evaluated in Python (no OPA round-trip) for performance. A destination with no filter receives all events.

Supported filter fields:

| Field | Type | Description |
| --- | --- | --- |
| min_severity | string | Minimum severity level: info, low, medium, high, critical. Events below this threshold are dropped. |
| event_types | string[] | Allowlist of event type strings (e.g., ["tool_call", "INBOUND_INJECTION_DETECTED"]). Only matching events pass. |
| agent_ids | string[] | Allowlist of agent IDs. Only events from these agents pass. |
| policy_results | string[] | Allowlist of policy results (e.g., ["deny", "escalate"]). Only events with matching results pass. |

Filters are AND-combined: an event must pass every specified filter field to be forwarded.

Example filter

Forward only deny and escalate decisions at medium severity or above:

{
  "event_filter": {
    "min_severity": "medium",
    "policy_results": ["deny", "escalate"]
  }
}
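
As a rough illustration of the AND-combined semantics, the sketch below evaluates a filter against a single event. It is not Behavry's implementation: the function name `passes_filter` and the event keys (`severity`, `event_type`, `agent_id`, `policy_result`) are assumptions, and an event without a severity is treated as `info`.

```python
from typing import Any, Mapping, Optional

# Severity levels in ascending order, as listed for min_severity.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]

# (filter field, assumed event key) pairs for the allowlist-style fields.
ALLOWLIST_FIELDS = (
    ("event_types", "event_type"),
    ("agent_ids", "agent_id"),
    ("policy_results", "policy_result"),
)


def passes_filter(
    event: Mapping[str, Any],
    event_filter: Optional[Mapping[str, Any]],
) -> bool:
    """Return True if the event satisfies every field present in the filter."""
    if not event_filter:
        return True  # no filter: forward everything

    min_severity = event_filter.get("min_severity")
    if min_severity is not None:
        level = SEVERITY_ORDER.index(event.get("severity", "info"))
        if level < SEVERITY_ORDER.index(min_severity):
            return False

    for filter_key, event_key in ALLOWLIST_FIELDS:
        allowed = event_filter.get(filter_key)
        if allowed is not None and event.get(event_key) not in allowed:
            return False
    return True
```

Under these assumptions, a `high` severity `deny` event passes the example filter, while a `critical` severity `allow` event does not, because it fails the `policy_results` allowlist.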

Data isolation

The SIEM pipeline enforces strict data isolation consistent with the Data Protection (DP) pipeline. Events forwarded to SIEM destinations use the EventMetadata envelope, which contains only behavioral metadata:

| Included | Excluded |
| --- | --- |
| Event ID, timestamp, event type | Request body (request_body) |
| Agent ID, session ID | Response body (response_body) |
| Tool name, MCP server, action | Raw payload content |
| Policy result and reason | DLP finding content (count only) |
| Behavioral score, risk tier | Redacted/encrypted payload fields |
| DLP findings count (integer) | Any field processed by the DP pipeline |
| Causal depth, alert type/severity | |

This ensures that sensitive data classified by the DLP scanner (credentials, PII, PHI, financial data) never reaches external SIEM infrastructure, even if the destination is misconfigured.
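
One way to get this property is an allowlist projection: only named metadata fields are copied, so anything else is dropped by construction. The sketch below is hypothetical and is not the actual EventMetadata class. The field names are guesses based on the table above, and the raw event is assumed to carry a `dlp_findings` list.

```python
from typing import Any, Dict, Mapping

# Assumed field names; the real EventMetadata envelope may differ.
ALLOWED_FIELDS = (
    "event_id", "timestamp", "event_type",
    "agent_id", "session_id",
    "tool_name", "mcp_server", "action",
    "policy_result", "policy_reason",
    "behavioral_score", "risk_tier",
    "causal_depth", "alert_type", "alert_severity",
)


def to_metadata(raw_event: Mapping[str, Any]) -> Dict[str, Any]:
    """Copy only allowlisted fields; everything else is left behind."""
    meta = {key: raw_event[key] for key in ALLOWED_FIELDS if key in raw_event}
    # DLP findings are reduced to an integer count, never their content.
    meta["dlp_findings_count"] = len(raw_event.get("dlp_findings") or [])
    return meta
```

Because the projection is an allowlist rather than a denylist, a newly added payload field stays out of SIEM output until someone explicitly adds it.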


Output formats

JSON (default)

Events are serialized as JSON objects using EventMetadata.to_dict(). Splunk HEC wraps each event in the standard HEC envelope with time, host, source, sourcetype, and index fields. Sentinel and Chronicle apply their own format wrappers on top of the JSON payload.
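
For reference, a newline-delimited HEC payload could be assembled roughly as follows. This is a sketch under assumptions: `to_hec_payload` is a hypothetical helper, each event is assumed to carry an epoch-seconds `timestamp`, and the `host` and `source` defaults are placeholders. The `sourcetype` and `index` defaults mirror the Splunk walkthrough in this document.

```python
import json
from typing import Any, Iterable, Mapping


def to_hec_payload(
    events: Iterable[Mapping[str, Any]],
    *,
    host: str = "behavry-proxy",        # placeholder value
    source: str = "behavry",            # placeholder value
    sourcetype: str = "behavry:audit",
    index: str = "security",
) -> str:
    """Wrap each event in an HEC envelope and join them with newlines."""
    lines = []
    for event in events:
        envelope = {
            "time": event["timestamp"],  # assumed to be epoch seconds
            "host": host,
            "source": source,
            "sourcetype": sourcetype,
            "index": index,
            "event": dict(event),
        }
        lines.append(json.dumps(envelope, separators=(",", ":")))
    return "\n".join(lines)
```

The resulting string is what would be sent as the POST body, with one JSON object per line rather than a JSON array.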

LEEF 2.0

The LEEF 2.0 serializer (to_leef()) produces IBM-standard log lines for QRadar consumption:

LEEF:2.0|Behavry|BehavryProxy|1.0|{event_id}|{TAB-delimited extensions}

Extension fields include devTime, src (agent ID), dst (tool name), severity (1-10 scale), usrName, policy_result, dlp_findings, session_id, and causal_depth. All field values are injection-safe -- tabs, newlines, and carriage returns are replaced with spaces so a value cannot break out of its field or start a new log line.

Severity mapping (1-10 scale):

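| Risk tier | LEEF severity | Policy result | LEEF severity |
| --- | --- | --- | --- |
| critical | 10 | dlp_block | 8 |
| high | 7 | deny | 7 |
| medium | 5 | escalate | 5 |
| low | 3 | allow | 3 |
| info | 1 | | |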
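
The line format and severity mapping can be sketched as below. This is not the actual to_leef() serializer. The event keys are assumed, usrName is omitted because its source field is not described here, and taking the higher of the two mapped severities is an assumption, since the document does not say how the risk-tier and policy-result mappings are combined.

```python
from typing import Any, Mapping

RISK_TIER_SEVERITY = {"critical": 10, "high": 7, "medium": 5, "low": 3, "info": 1}
POLICY_RESULT_SEVERITY = {"dlp_block": 8, "deny": 7, "escalate": 5, "allow": 3}


def _clean(value: Any) -> str:
    """Replace tabs, newlines, and carriage returns with spaces."""
    return str(value).replace("\t", " ").replace("\n", " ").replace("\r", " ")


def leef_severity(risk_tier: Any, policy_result: Any) -> int:
    # Assumption: use whichever mapping yields the higher severity.
    return max(
        RISK_TIER_SEVERITY.get(risk_tier, 1),
        POLICY_RESULT_SEVERITY.get(policy_result, 1),
    )


def to_leef_line(event: Mapping[str, Any]) -> str:
    """Build one LEEF 2.0 line with tab-delimited key=value extensions."""
    extensions = {
        "devTime": event.get("timestamp", ""),
        "src": event.get("agent_id", ""),
        "dst": event.get("tool_name", ""),
        "severity": leef_severity(event.get("risk_tier"), event.get("policy_result")),
        "policy_result": event.get("policy_result", ""),
        "dlp_findings": event.get("dlp_findings_count", 0),
        "session_id": event.get("session_id", ""),
        "causal_depth": event.get("causal_depth", 0),
    }
    header = f"LEEF:2.0|Behavry|BehavryProxy|1.0|{_clean(event.get('event_id', ''))}|"
    return header + "\t".join(f"{k}={_clean(v)}" for k, v in extensions.items())
```

Under this assumption, a `medium` risk tier combined with a `deny` result yields severity 7.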

CEF

CEF output is available through the existing audit export endpoint (GET /api/v1/audit/export?format=cef). This pre-dates the SIEM module and remains supported for backward compatibility.


Retry and resilience

Each destination has independent retry configuration:

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| retry_max_attempts | 5 | 1--20 | Maximum delivery attempts per batch |
| retry_backoff_secs | 10 | 1--300 | Base backoff interval in seconds |

Exponential backoff with jitter

Failed deliveries use the formula:

wait = min(backoff_secs * 2^attempt + uniform(0, backoff_secs), 3600)

The jitter component prevents thundering-herd effects when multiple destinations recover simultaneously. The maximum wait is capped at 3,600 seconds (1 hour).
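
The formula translates directly into a few lines of Python. In this sketch the function name `retry_wait` is hypothetical, `attempt` is assumed to be zero-based (the document does not say), and the random source is injectable so the deterministic part can be checked.

```python
import random
from typing import Callable


def retry_wait(
    attempt: int,
    backoff_secs: float = 10.0,
    cap_secs: float = 3600.0,
    uniform: Callable[[float, float], float] = random.uniform,
) -> float:
    """wait = min(backoff_secs * 2^attempt + uniform(0, backoff_secs), cap)."""
    return min(backoff_secs * 2 ** attempt + uniform(0, backoff_secs), cap_secs)
```

With the default `retry_backoff_secs` of 10 and zero-based attempts, the base waits are 10, 20, 40, 80, and 160 seconds, each with up to 10 seconds of jitter added. The 3,600-second cap first applies at attempt 9 (10 * 512 = 5,120 seconds before capping).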

Auto-disable

After 10 consecutive failures, the destination is automatically disabled (enabled = false) and a SIEM_DESTINATION_UNHEALTHY alert is fired on the event bus. This alert appears in the dashboard Alerts page and can itself be forwarded to other healthy SIEM destinations.

To re-enable a destination after resolving the underlying issue, use PATCH /api/v1/siem/destinations/{id} with {"enabled": true}.

Dead letter queue

When all retry attempts are exhausted for a batch, the events are written to the dead letter queue (DLQ). DLQ payloads are encrypted with AES-256-GCM when a KMS client is available; if none is configured, they are stored unencrypted.

Each DLQ entry records:

  • Destination ID and tenant ID
  • Batch ID (UUID)
  • List of event IDs in the batch
  • Encrypted payload
  • Attempt count
  • Last attempt timestamp and next scheduled retry
  • Error message (truncated to 500 characters)

DLQ management

List DLQ entries

curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  "https://behavry.example.com/api/v1/siem/dlq?destination_id=DEST_ID" | jq

Response:

{
  "items": [
    {
      "id": "...",
      "destination_id": "...",
      "batch_id": "a1b2c3d4-...",
      "event_ids": ["evt-1", "evt-2", "evt-3"],
      "attempt_count": 5,
      "last_attempt_at": "2026-03-17T10:30:00Z",
      "next_retry_at": "2026-03-17T11:30:00Z",
      "error_message": "HEC returned 503: Service Unavailable",
      "resolved": false,
      "created_at": "2026-03-17T10:25:00Z"
    }
  ],
  "total": 1
}

Retry DLQ entries

Re-queue all unresolved DLQ entries for a destination:

curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" \
  "https://behavry.example.com/api/v1/siem/destinations/DEST_ID/retry-dlq" | jq

Response:

{
  "queued": 15,
  "message": "Re-queued 15 events from 3 DLQ entries"
}

Discard a DLQ entry

Mark a DLQ entry as resolved without retrying:

curl -s -X POST -H "Authorization: Bearer $ADMIN_TOKEN" \
"https://behavry.example.com/api/v1/siem/dlq/ENTRY_ID/discard" | jq

Credential security

Destination credentials (HEC tokens, shared keys, service account JSON, webhook secrets) are encrypted at rest using AES-256-GCM via the KMS abstraction layer (the same kms_client.py used by the Data Protection pipeline).

Key security properties:

  • Encrypted storage -- Credentials are encrypted before being written to the siem_destinations table. The encryption context includes tenant_id and purpose: "siem_credential".
  • Never returned on read -- GET and LIST responses include credential_configured: true/false and an optional credential_hint (last 4 characters of the token), but never the full credential.
  • Re-encryption on update -- PATCH with a new credential field re-encrypts and replaces the stored ciphertext.
  • Decrypted per-batch -- Credentials are decrypted only at delivery time, within the batch worker's flush loop. They are not cached in memory.

API reference

All SIEM endpoints require admin JWT authentication and are scoped to the current tenant.

Destinations CRUD

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/siem/destinations | Create a new destination |
| GET | /api/v1/siem/destinations | List all destinations |
| GET | /api/v1/siem/destinations/{id} | Get a destination by ID |
| PATCH | /api/v1/siem/destinations/{id} | Update a destination |
| DELETE | /api/v1/siem/destinations/{id} | Soft-delete (disable) a destination |

Operations

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/siem/destinations/{id}/test | Send a synthetic test event to verify connectivity |
| GET | /api/v1/siem/destinations/{id}/health | Get health stats (last delivery, failure count, DLQ depth) |
| POST | /api/v1/siem/destinations/{id}/retry-dlq | Re-queue all unresolved DLQ entries |

Dead letter queue

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/siem/dlq | List DLQ entries (optional ?destination_id= filter) |
| POST | /api/v1/siem/dlq/{id}/discard | Mark a DLQ entry as resolved |

Audit export enhancements

The existing audit export endpoint has been extended with SIEM-related capabilities:

| Feature | Description |
| --- | --- |
| LEEF format | GET /api/v1/audit/export?format=leef returns LEEF 2.0 formatted output |
| Cursor pagination | Response includes X-Next-Cursor and X-Total-Events headers for efficient iteration over large result sets (no 10,000-event cap) |
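
A client could walk the export with a loop like the one below. This is a sketch under assumptions: the name of the query parameter that carries the cursor is not documented here, so `cursor` is a guess, and `make_fetcher` and `iter_export_pages` are hypothetical helper names. Iteration stops when the response has no X-Next-Cursor header.

```python
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional, Tuple

# A fetcher takes the current cursor and returns (body, next_cursor).
FetchPage = Callable[[Optional[str]], Tuple[str, Optional[str]]]


def make_fetcher(base_url: str, token: str, fmt: str = "leef") -> FetchPage:
    """Build a fetcher for GET /api/v1/audit/export using urllib."""
    def fetch(cursor: Optional[str]) -> Tuple[str, Optional[str]]:
        params = {"format": fmt}
        if cursor:
            params["cursor"] = cursor  # assumed parameter name
        url = f"{base_url}/api/v1/audit/export?{urllib.parse.urlencode(params)}"
        request = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {token}"}
        )
        with urllib.request.urlopen(request) as response:
            body = response.read().decode("utf-8")
            return body, response.headers.get("X-Next-Cursor")
    return fetch


def iter_export_pages(fetch_page: FetchPage) -> Iterator[str]:
    """Yield page bodies until no further cursor is returned."""
    cursor: Optional[str] = None
    while True:
        body, cursor = fetch_page(cursor)
        yield body
        if not cursor:
            break
```

Usage would look like `for page in iter_export_pages(make_fetcher("https://behavry.example.com", token)): ...`, processing each page as it arrives instead of holding the full export in memory.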

Setup example: Splunk HEC

This walkthrough creates a Splunk HEC destination, verifies connectivity, and confirms event delivery.

1. Create the destination

curl -s -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  "https://behavry.example.com/api/v1/siem/destinations" \
  -d '{
    "name": "Splunk Production",
    "destination_type": "splunk_hec",
    "format": "json",
    "endpoint_url": "https://splunk.corp.example.com:8088",
    "credential": {
      "token": "your-hec-token-here",
      "index": "security"
    },
    "event_filter": {
      "min_severity": "low",
      "policy_results": ["deny", "escalate", "dlp_block"]
    },
    "batch_size": 100,
    "flush_interval_secs": 30,
    "retry_max_attempts": 5,
    "retry_backoff_secs": 10
  }' | jq

Response:

{
  "id": "d1e2f3a4-...",
  "name": "Splunk Production",
  "destination_type": "splunk_hec",
  "format": "json",
  "endpoint_url": "https://splunk.corp.example.com:8088",
  "credential_configured": true,
  "credential_hint": "...here",
  "event_filter": {
    "min_severity": "low",
    "policy_results": ["deny", "escalate", "dlp_block"]
  },
  "batch_size": 100,
  "flush_interval_secs": 30,
  "retry_max_attempts": 5,
  "retry_backoff_secs": 10,
  "enabled": true,
  "last_delivery_at": null,
  "last_error": null,
  "consecutive_failures": 0,
  "created_at": "2026-03-17T12:00:00Z"
}

2. Test connectivity

curl -s -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  "https://behavry.example.com/api/v1/siem/destinations/d1e2f3a4-.../test" | jq

Response:

{
  "delivered": true,
  "latency_ms": 142.5,
  "error": null
}

The test endpoint sends a synthetic event with event_type: "tool_call", agent_id: "test-agent", and action: "test_connectivity". It exercises the full delivery path including credential decryption, payload formatting, and network transport.

3. Check health

curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
  "https://behavry.example.com/api/v1/siem/destinations/d1e2f3a4-.../health" | jq

Response:

{
  "destination_id": "d1e2f3a4-...",
  "last_delivery_at": "2026-03-17T12:01:30Z",
  "consecutive_failures": 0,
  "last_error": null,
  "enabled": true,
  "dlq_depth": 0
}

4. Verify in Splunk

In your Splunk search console, run:

index=security sourcetype=behavry:audit | head 10

Each event will contain the EventMetadata fields: event_type, agent_id, session_id, tool_name, policy_result, behavioral_score, dlp_findings_count, risk_tier, and causal_depth.


Connector-specific notes

Microsoft Sentinel

Credential fields: workspace_id, shared_key, and optional log_type (defaults to BehavryAudit_CL). Events are delivered to the Log Analytics Data Collector API with HMAC-SHA256 SharedKey authentication.
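
For readers unfamiliar with SharedKey authentication, the request headers for the Log Analytics Data Collector API are typically built as sketched below. This follows Microsoft's published signing scheme rather than Behavry's connector source, and `sentinel_headers` is a hypothetical helper name.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from typing import Dict, Optional


def sentinel_headers(
    workspace_id: str,
    shared_key: str,
    body: bytes,
    log_type: str = "BehavryAudit_CL",
    date: Optional[str] = None,
) -> Dict[str, str]:
    """Build SharedKey headers for a POST to /api/logs."""
    # RFC 1123 date, e.g. "Tue, 17 Mar 2026 12:00:00 GMT".
    date = date or formatdate(usegmt=True)
    string_to_sign = (
        f"POST\n{len(body)}\napplication/json\nx-ms-date:{date}\n/api/logs"
    )
    # The workspace shared key is base64-encoded; decode it before signing.
    digest = hmac.new(
        base64.b64decode(shared_key),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return {
        "Content-Type": "application/json",
        "Authorization": f"SharedKey {workspace_id}:{signature}",
        # Passed through as configured; whether the connector adjusts the
        # "_CL" suffix before sending is not stated in this document.
        "Log-Type": log_type,
        "x-ms-date": date,
    }
```

The signature covers the body length and the `x-ms-date` value, so the same date string must be sent in the header that was used for signing.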

Google Chronicle

The credential is the full Google service account JSON (the same file you download from the GCP console). The connector authenticates via OAuth2 with the malachite-ingestion scope and delivers events to the UDM unstructuredlogentries:batchCreate endpoint. Requires the google-auth and requests Python packages.

IBM QRadar

Events are delivered as LEEF 2.0 messages inside RFC 5424 syslog frames over TLS (default port 6514). Each message includes a structured data element with agent_id, policy_result, and tool fields for QRadar indexing.
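
To make the framing concrete, the sketch below wraps a LEEF line in an RFC 5424 message and applies the octet-counting prefix that RFC 5425 defines for syslog over TLS. Several details are assumptions: the SD-ID `behavry@32473` (32473 is the example enterprise number used in RFC 5424), the hostname and app name, and the facility and severity defaults. The helper name `rfc5425_frame` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def _sd_escape(value: object) -> str:
    """Escape backslash, double quote, and ']' as RFC 5424 requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def rfc5425_frame(
    leef_line: str,
    *,
    agent_id: str,
    policy_result: str,
    tool: str,
    severity: int = 5,                 # syslog severity 0-7 (assumed default)
    facility: int = 16,                # local0 (assumed default)
    hostname: str = "behavry-proxy",   # placeholder value
    app_name: str = "behavry",         # placeholder value
    timestamp: Optional[datetime] = None,
) -> bytes:
    """Return 'MSG-LEN SP SYSLOG-MSG' as bytes, ready to write to a TLS socket."""
    pri = facility * 8 + severity
    moment = timestamp or datetime.now(timezone.utc)
    ts = moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    structured_data = (
        f'[behavry@32473 agent_id="{_sd_escape(agent_id)}" '
        f'policy_result="{_sd_escape(policy_result)}" tool="{_sd_escape(tool)}"]'
    )
    # PROCID and MSGID are left as the nil value "-".
    message = f"<{pri}>1 {ts} {hostname} {app_name} - - {structured_data} {leef_line}"
    payload = message.encode("utf-8")
    return str(len(payload)).encode("ascii") + b" " + payload
```

The length prefix lets the receiver split messages on a stream transport without relying on newlines.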

Generic Syslog

Endpoint URL determines the transport: tls://host:port, tcp://host:port, or udp://host:port. Defaults to TCP on port 514 if no scheme is specified. The optional facility credential field overrides the default syslog facility (local0 / 16).
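
The endpoint rules can be illustrated with a small parser. This is a sketch, not the connector's code: the helper names are hypothetical, and falling back to port 514 when a scheme is given without a port is an assumption. The PRI helper shows how the facility (default local0 = 16) combines with a syslog severity in RFC 5424.

```python
from typing import Tuple
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"tcp", "udp", "tls"}


def parse_syslog_endpoint(endpoint_url: str) -> Tuple[str, str, int]:
    """Return (transport, host, port) for a syslog endpoint URL."""
    if "://" not in endpoint_url:
        # No scheme given: default to TCP, as described above.
        endpoint_url = "tcp://" + endpoint_url
    parts = urlsplit(endpoint_url)
    scheme = parts.scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported syslog transport: {scheme!r}")
    if not parts.hostname:
        raise ValueError("syslog endpoint is missing a host")
    # Assumption: port 514 is used whenever the URL omits a port.
    return scheme, parts.hostname, parts.port or 514


def syslog_pri(severity: int, facility: int = 16) -> int:
    """RFC 5424 PRI value: facility * 8 + severity (0-7)."""
    return facility * 8 + severity
```

For example, `tls://siem.example.com:6514` parses to the TLS transport on port 6514, and a bare `siem.example.com` parses to TCP on port 514.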

Custom Webhook

Every request includes three Behavry-specific headers:

  • X-Behavry-Signature: sha256={hmac_hex_digest} computed over the JSON body using the configured secret.
  • X-Behavry-Event-Count: Number of events in the batch.
  • X-Behavry-Timestamp: Unix timestamp of the request.

The receiving endpoint should verify the HMAC signature before processing the payload. Any 2xx response is treated as successful delivery.
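
On the receiving side, verification might look like the sketch below. It assumes the HMAC is computed over the exact raw request bytes and that the secret is available as bytes. The freshness check on X-Behavry-Timestamp and its 300-second window are additions for illustration, not documented behavior, and `verify_webhook` is a hypothetical helper name.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(
    secret: bytes,
    body: bytes,
    signature_header: str,
    timestamp_header: str,
    max_skew_secs: float = 300.0,   # illustrative window, not a documented value
    now: Optional[float] = None,
) -> bool:
    """Check X-Behavry-Signature and, optionally, timestamp freshness."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), signature_header.encode()):
        return False
    try:
        sent_at = float(timestamp_header)
    except ValueError:
        return False
    current = time.time() if now is None else now
    return abs(current - sent_at) <= max_skew_secs
```

Verify against the raw body before any JSON parsing, since re-serializing the payload can change whitespace or key order and invalidate the digest. As described above, the signature covers only the body, so the timestamp check here limits stale deliveries but is not itself authenticated.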