Troubleshooting

Startup Issues

TimescaleDB extension not found

Symptom:

Could not enable TimescaleDB (may not be installed)

Cause: The PostgreSQL container is running an image that does not include the TimescaleDB extension.

Fix: Use timescale/timescaledb:latest-pg16 (not plain postgres:16):

# docker-compose.yml
db:
  image: timescale/timescaledb:latest-pg16

If you stay on plain PostgreSQL, audit events are still written to a regular table instead of a hypertable. Core functionality is preserved, but time-series queries and compression won't work.


OPA connection refused / Base policy push failed

Symptom:

Base policy push failed: httpx.ConnectError

Cause: OPA container isn't ready yet, or BEHAVRY_OPA_URL is misconfigured.

Fix:

  1. Check OPA is running: docker ps | grep opa
  2. Test directly: curl http://localhost:8181/health
  3. Ensure BEHAVRY_OPA_URL=http://localhost:8181 (local dev) or http://opa:8181 (Docker Compose)
  4. OPA must start before the backend. Check depends_on + health check in docker-compose.yml

With BEHAVRY_OPA_FAIL_CLOSED=true (the default), all tool calls are denied until OPA is healthy. The app still starts, but agents can't do anything.
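To confirm OPA is reachable before digging further, the health check from step 2 can be polled from a script. This is a minimal sketch using only the standard library; the function names, retry count, and delay are illustrative and not part of Behavry.

```python
import time
import urllib.error
import urllib.request


def opa_is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        url = f"{base_url.rstrip('/')}/health"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx status.
        return False


def wait_for_opa(base_url: str, attempts: int = 30, delay: float = 1.0,
                 probe=opa_is_healthy) -> bool:
    """Poll the health endpoint until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_for_opa("http://localhost:8181") returns False if OPA never becomes healthy, which points at the container or at BEHAVRY_OPA_URL rather than at the backend.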


BEHAVRY_ADMIN_PASSWORD must be set in production

Cause: BEHAVRY_ENV=production but no admin password configured.

Fix: Set BEHAVRY_ADMIN_PASSWORD to a strong password.


JWT keys not configured warning on startup

Symptom:

JWT keys not configured — auto-generating for development

This is normal in development. Keys are regenerated on every restart, so sessions don't persist across restarts in dev.

For production: set BEHAVRY_JWT_PRIVATE_KEY and BEHAVRY_JWT_PUBLIC_KEY.


Authentication Issues

401 Unauthorized from the dashboard

Symptom: All API calls return 401. You're logged in but can't see data.

Causes and fixes:

  1. Token expired: Log out and log back in.
  2. Wrong localStorage key: Check browser DevTools → Application → Local Storage for behavry_admin_token.
  3. JWT keys rotated: If the server restarted in dev mode, old tokens are invalid. Log out and back in.
  4. Clerk mode mismatch: Ensure VITE_CLERK_PUBLISHABLE_KEY is set in dashboard/.env if using Clerk mode.

Agent token rejected: Invalid or expired agent token

Causes:

  1. Token expired — re-authenticate with POST /api/v1/auth/token
  2. Agent is suspended (status: suspended) — reactivate via the Agents UI
  3. Session revoked — create a new session

Session revoked or expired

The agent's session was explicitly revoked (via API) or the DB was reset. Re-authenticate.


CORS Errors

Symptom:

Access to fetch at 'http://localhost:8000/api/v1/...' from origin 'http://localhost:5173' has been blocked by CORS policy

Fix: Ensure BEHAVRY_CORS_ORIGINS_STR includes the dashboard origin:

# Development
BEHAVRY_CORS_ORIGINS_STR=http://localhost:3000,http://localhost:5173

# Production
BEHAVRY_CORS_ORIGINS_STR=https://your-dashboard.com

Restart the backend after changing env vars.
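A quick way to sanity-check the value is to compare it against the exact origin shown in the browser error. The sketch below assumes the backend splits the variable on commas and compares origins as exact strings; that parsing behaviour is an assumption, and the helper names are illustrative.

```python
def parse_origins(value: str) -> list[str]:
    """Split a comma-separated origin list, dropping blanks and surrounding spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def origin_allowed(origin: str, value: str) -> bool:
    """Exact string match: scheme, host, and port must all agree."""
    return origin in parse_origins(value)
```

Under this assumption, a trailing slash (http://localhost:5173/) or a different port would not match the origin the browser sends.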


OPA / Policy Issues

All tool calls denied with "No policy matched"

Cause: OPA started but policies weren't pushed.

Debug:

# Check what's in OPA
curl http://localhost:8181/v1/policies

# Expected: list including "policies/base/rbac.rego"
# If empty: policies/base/ directory was not found at startup

Fix: Ensure the policies/base/ directory exists relative to the backend (../policies/base from backend/behavry/main.py resolves to the repo root policies/base/).

# From the repo root
ls policies/base/
# Should show: rbac.rego resource_access.rego
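The same check can be scripted. The sketch below reads OPA's policy list and extracts the policy IDs; it relies on the Policy API returning a JSON object with a result array whose entries carry an id field. The function names are illustrative.

```python
import json
import urllib.request


def policy_ids(policies_response: dict) -> list[str]:
    """Extract policy IDs from a GET /v1/policies response body."""
    return [policy["id"] for policy in policies_response.get("result", [])]


def fetch_policy_ids(opa_url: str = "http://localhost:8181") -> list[str]:
    """Ask a running OPA instance which policies it has loaded."""
    url = f"{opa_url.rstrip('/')}/v1/policies"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return policy_ids(json.load(resp))
```

An empty list from fetch_policy_ids() corresponds to the "If empty" case above: the base policies were never pushed.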

OPA returns deny for everything even with correct permissions

Cause: Agent JWT doesn't include the expected permissions.

Debug:

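# Decode the agent JWT payload (the middle segment is base64url without padding, which plain base64 -d can reject)
python3 -c 'import base64, json, sys; p = sys.argv[1].split(".")[1]; print(json.dumps(json.loads(base64.urlsafe_b64decode(p + "=" * (-len(p) % 4))), indent=2))' "eyJ..."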

Expected:

{"sub": "...", "roles": ["filesystem-reader"], "permissions": ["filesystem:read"], "risk_tier": "medium"}

If permissions is empty: the agent has no roles assigned. Assign a role via the Agents UI or POST /api/v1/agents/{id}/roles.
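For repeated debugging, a small helper is easier than a shell pipeline. This sketch decodes the payload without verifying the signature, so it is suitable for inspection only; the function name is illustrative.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the claims of a JWT without verifying it (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Check decode_jwt_payload(token).get("permissions") against the permission the denied tool call needs.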


Custom policy not taking effect

  1. Check the policy status is active (not draft)
  2. Check it was synced to OPA: curl http://localhost:8181/v1/policies
  3. Test the policy directly against OPA:
    curl -X POST http://localhost:8181/v1/data/behavry/authz \
    -H "Content-Type: application/json" \
    -d '{"input": {"agent": {"id": "test", "roles": [], "permissions": [], "risk_tier": "low"}, "request": {"tool_name": "write_file", "action": "write", "resource": "/tmp/test.txt", "parameters": {}, "mcp_server": "filesystem"}}}'
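The curl test in step 3 can also be run from Python, which makes it easier to vary the input. The sketch below builds the same input document and posts it to OPA's Data API. When no value exists at the queried path, OPA answers without a result key; the shape of the result under behavry/authz is not documented here, so it is printed as-is. All function names are illustrative.

```python
import json
import urllib.request


def build_authz_input(tool_name, action, resource, mcp_server,
                      roles=(), permissions=(), risk_tier="low", agent_id="test"):
    """Build the same input document as the curl example."""
    return {
        "input": {
            "agent": {"id": agent_id, "roles": list(roles),
                      "permissions": list(permissions), "risk_tier": risk_tier},
            "request": {"tool_name": tool_name, "action": action,
                        "resource": resource, "parameters": {},
                        "mcp_server": mcp_server},
        }
    }


def query_opa(document, opa_url="http://localhost:8181", path="behavry/authz"):
    """POST the input document to /v1/data/<path> and return the parsed reply."""
    request = urllib.request.Request(
        f"{opa_url.rstrip('/')}/v1/data/{path}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        return json.load(resp)


def describe(response: dict) -> str:
    """Summarise a Data API reply for a human reader."""
    if "result" not in response:
        return "undefined: OPA has no value at this path"
    return json.dumps(response["result"], indent=2, sort_keys=True)
```

For example, print(describe(query_opa(build_authz_input("write_file", "write", "/tmp/test.txt", "filesystem")))) reproduces the curl call. An "undefined" answer suggests the policy package was never loaded.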

Database Issues

asyncpg.exceptions.ConnectionDoesNotExistError

Cause: The database connection dropped and the pool could not recover.

Fix: Restart the backend. In production, make sure the network path to the database is stable and that BEHAVRY_DB_POOL_SIZE and BEHAVRY_DB_MAX_OVERFLOW are large enough that the pool is not exhausted.


Could not create hypertable warning

Cause: TimescaleDB extension not installed, or hypertable already exists.

Fix: Usually benign. On restarts after the first run, the warning appears because the hypertable already exists. If it appears on the first run and TimescaleDB is not installed, audit events are written to a plain table (functional but not optimized).


Dashboard Issues

Blank screen / 404 on page refresh

Cause: Nginx isn't configured to serve the React SPA for all routes (history API fallback missing).

Fix: Ensure the nginx config includes:

location / {
    try_files $uri $uri/ /index.html;
}

SSE stream disconnects frequently

Cause: Proxy or load balancer buffering or closing idle connections.

Fix:

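  1. Increase the proxy read timeout and turn off response buffering for the SSE location in nginx: proxy_read_timeout 3600s; proxy_buffering off;
  2. Lower BEHAVRY_SSE_KEEPALIVE_SECONDS so keepalive messages are sent more often and the connection never looks idle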

Dark mode not persisting

Cause: localStorage is unavailable or behavry_theme key is missing.

Fix: Open DevTools → Application → Local Storage → verify behavry_theme is set to dark or light. If missing, toggle once via the sidebar toggle.


Escalations Not Resolving

Agent stuck waiting, admin approved but agent didn't get the response

Cause: The backend was restarted between when the escalation was created and when it was approved. The asyncio.Future is in-memory and lost on restart.

Symptom: The escalation shows approved in the database, but the agent got a timeout error.

Fix: There is no server-side fix at present. The waiting agent eventually receives a timeout error (the timeout is configured by risk tier). This is a known limitation of the in-memory asyncio.Future approach. Planned improvement: replace it with Redis pub/sub for durability across restarts.

Workaround for now: Have agents retry the failed operation after the escalation is approved when this occurs.
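On the agent side, the retry workaround can be sketched as a small wrapper. EscalationTimeout is a hypothetical exception standing in for whatever error the agent's client raises on timeout, and the retry count and delay are illustrative. Whether a retried call reuses the earlier approval or opens a new escalation is not covered here.

```python
import asyncio


class EscalationTimeout(Exception):
    """Hypothetical error raised when an escalated call times out."""


async def call_with_retry(operation, retries: int = 2, delay: float = 1.0):
    """Await operation(), retrying when it fails with EscalationTimeout."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return await operation()
        except EscalationTimeout as exc:
            last_error = exc
            if attempt < retries:
                await asyncio.sleep(delay)
    # Every attempt timed out; surface the last error to the caller.
    raise last_error
```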


Webhook Delivery Failures

Webhook not receiving events

  1. Check BEHAVRY_WEBHOOK_URL is set correctly
  2. Check BEHAVRY_WEBHOOK_MIN_SEVERITY — default is high; alerts with lower severity won't be sent
  3. Check backend logs for Webhook delivery failed
  4. Test the endpoint manually: curl -X POST <webhook_url> -d '{"test": true}'
  5. Verify HMAC: the X-Behavry-Signature header should match hmac-sha256(secret, body)
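The signature check in step 5 can be sketched on the receiving side as follows. It assumes the header carries a lowercase hex digest with no prefix; confirm the encoding against a real delivery. The function names are illustrative.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute HMAC-SHA256 over the raw request body as a hex digest."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Compare the X-Behavry-Signature header with the expected digest.

    Assumes a hex-encoded digest; uses a constant-time comparison.
    """
    expected = sign(secret, body).encode("ascii")
    received = header_value.strip().encode("utf-8")
    return hmac.compare_digest(expected, received)
```

Compute the digest over the raw bytes of the request body as received. Re-serialising parsed JSON can change whitespace or key order and break the match.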