Cost Attribution & Token Economics

Feature row 38 — Sprint CA

Cost Attribution is included on the Professional and Enterprise plans.

What this is

Cost Attribution turns proxy traffic into dollar figures. Behavry extracts token usage from every LLM call (OpenAI, Anthropic, Google, Ollama, NemoClaw) that flows through the proxy, multiplies by the current model pricing, and rolls the result up per agent, per model, per provider, and per tenant.

You get two things out of it:

  • Attribution — who spent what. "Agent X cost $14.20 yesterday; $11.80 of that was Claude Opus; here are the workflows that used it."
  • Optimization signal — which agents are running on models that are too expensive for what they do, and which tool-heavy agents would benefit from Context Window Governance.

Pricing

Default pricing for 14 models is seeded at startup from backend/behavry/analytics/cost_attribution.py:

Provider    Model               Input $/1k   Output $/1k
OpenAI      gpt-4o              2.50         10.00
OpenAI      gpt-4o-mini         0.15         0.60
OpenAI      gpt-4.1             2.00         8.00
OpenAI      gpt-4.1-mini        0.40         1.60
OpenAI      gpt-4.1-nano        0.10         0.40
OpenAI      o3                  2.00         8.00
OpenAI      o3-mini             1.10         4.40
OpenAI      o4-mini             1.10         4.40
Anthropic   claude-sonnet-4-*   3.00         15.00
Anthropic   claude-haiku-3-5-*  0.80         4.00
Anthropic   claude-opus-4-*     15.00        75.00
Google      gemini-2.5-pro      1.25         10.00
Google      gemini-2.5-flash    0.15         0.60
Google      gemini-2.0-flash    0.10         0.40

Tenants can override any row, or add custom models, through the pricing API. Ollama is treated as free by default (local inference); tenants can set a synthetic per-1k rate to capture compute cost.
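
As a rough illustration of how a per-1k rate turns token counts into a dollar estimate, the sketch below hard-codes a few rows from the default table and resolves wildcard model patterns with fnmatch. The names lookup_rate and estimate_cost, the dictionary layout, and the way overrides are passed in are assumptions for illustration; they are not the contents of cost_attribution.py or the pricing API.

```python
from fnmatch import fnmatchcase

# (provider, model pattern) -> (input $/1k, output $/1k).
# A few rows copied from the default table; the layout is hypothetical.
DEFAULT_PRICING = {
    ("openai", "gpt-4o"): (2.50, 10.00),
    ("openai", "gpt-4o-mini"): (0.15, 0.60),
    ("anthropic", "claude-sonnet-4-*"): (3.00, 15.00),
    ("anthropic", "claude-opus-4-*"): (15.00, 75.00),
    ("ollama", "*"): (0.0, 0.0),  # local inference is free by default
}


def lookup_rate(provider, model, overrides=None):
    """Return (input_rate, output_rate) per 1k tokens, or None if unknown.

    Tenant overrides are consulted before the defaults; within each table
    an exact model name wins over a wildcard pattern.
    """
    for table in (overrides or {}, DEFAULT_PRICING):
        exact = table.get((provider, model))
        if exact is not None:
            return exact
        for (prov, pattern), rate in table.items():
            if prov == provider and fnmatchcase(model, pattern):
                return rate
    return None


def estimate_cost(provider, model, input_tokens, output_tokens, overrides=None):
    """Estimated dollars for one call, or None when no pricing row matches."""
    rate = lookup_rate(provider, model, overrides)
    if rate is None:
        return None
    in_rate, out_rate = rate
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate
```

A tenant that wants Ollama traffic to carry a compute cost would, in this sketch, pass an override such as {("ollama", "*"): (0.01, 0.01)}; the override then takes precedence over the free default.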

How tokens get counted

Two integration paths write to the same audit_events table:

  1. AI proxies (anthropic_proxy.py, openai_proxy.py, gemini_proxy.py, ollama_proxy.py, nemoclaw_proxy.py) publish events with input_tokens and output_tokens parsed from the upstream response body.
  2. The MCP proxy writes its own audit rows directly, and the CostAuditWriter subscriber enriches them with token counts extracted from the tool response.

Both paths flow through extract_token_usage() → lookup_pricing() → estimated_cost on the event row.

When the upstream response does not include token counts (some streaming paths), a fallback tokenizer estimates them and tags the row with token_source=estimated.
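
The extraction step can be pictured with the sketch below. The usage field names are the ones each provider's public API returns (OpenAI Chat Completions, Anthropic Messages, Gemini generateContent, Ollama); everything else is an assumption: the function name extract_usage, its signature, the "reported" label, and the four-characters-per-token heuristic standing in for the fallback tokenizer are illustrative and do not describe Behavry's extract_token_usage().

```python
def rough_token_count(text):
    """Crude stand-in for a real tokenizer: about 4 characters per token."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def extract_usage(provider, body, prompt_text="", completion_text=""):
    """Return (input_tokens, output_tokens, token_source).

    `body` is the parsed JSON response from the upstream provider.
    The two text arguments are only used when the body has no counts.
    """
    usage = body.get("usage") or {}
    if provider == "openai":
        pair = (usage.get("prompt_tokens"), usage.get("completion_tokens"))
    elif provider == "anthropic":
        pair = (usage.get("input_tokens"), usage.get("output_tokens"))
    elif provider == "google":
        meta = body.get("usageMetadata") or {}
        pair = (meta.get("promptTokenCount"), meta.get("candidatesTokenCount"))
    elif provider == "ollama":
        pair = (body.get("prompt_eval_count"), body.get("eval_count"))
    else:
        pair = (None, None)

    if pair[0] is not None and pair[1] is not None:
        return pair[0], pair[1], "reported"  # hypothetical label

    # No counts upstream (for example, some streaming paths): estimate instead.
    return (
        rough_token_count(prompt_text),
        rough_token_count(completion_text),
        "estimated",
    )
```

Keeping the source tag next to the counts lets downstream reports separate provider-reported spend from estimated spend.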

Queries

All cost aggregation uses the TimescaleDB time_bucket() function, so daily and hourly rollups stay cheap as the table grows.
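
For intuition, the grouping that time_bucket performs can be mimicked in memory. The sketch below is a plain-Python equivalent of an hourly rollup, not the query Behavry runs; the comment shows a roughly corresponding SQL shape, in which the timestamp column name created_at is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Roughly the SQL shape being imitated (column names are assumptions):
#   SELECT time_bucket('1 hour', created_at) AS bucket,
#          sum(estimated_cost) AS cost
#   FROM audit_events
#   GROUP BY bucket
#   ORDER BY bucket;


def hourly_rollup(events):
    """Sum cost per hour.

    `events` is an iterable of (timestamp, estimated_cost) pairs.
    Returns a list of (bucket_start, total_cost) sorted by bucket.
    """
    totals = defaultdict(float)
    for ts, cost in events:
        # Truncate the timestamp to the start of its hour.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        totals[bucket] += cost
    return sorted(totals.items())


sample = [
    (datetime(2025, 1, 1, 10, 5, tzinfo=timezone.utc), 0.25),
    (datetime(2025, 1, 1, 10, 55, tzinfo=timezone.utc), 0.5),
    (datetime(2025, 1, 1, 11, 1, tzinfo=timezone.utc), 1.0),
]
rollup = hourly_rollup(sample)
```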

API

Routes: backend/behavry/analytics/cost_api.py, pricing_routes.py.

Method   Path                                 Purpose
GET      /api/v1/cost/summary?period=7d       Tenant total, breakdown by provider
GET      /api/v1/cost/by-agent?period=30d     Per-agent spend with rank
GET      /api/v1/cost/by-model?period=30d     Per-model spend with token volume
GET      /api/v1/cost/timeseries?bucket=1h    Time-series for charting
GET      /api/v1/cost/pricing                 Current pricing table
POST     /api/v1/cost/pricing                 Add or override a pricing row

All endpoints respect tenant isolation via RLS.
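
A minimal client sketch for the GET routes, using only the standard library. The paths and query parameters come from the table above; the base URL, the bearer-token header, and the assumption that responses are JSON are placeholders, since the authentication scheme and response schema are not described here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def cost_url(base_url, endpoint, **params):
    """Build a cost API URL such as .../api/v1/cost/by-agent?period=30d."""
    url = f"{base_url.rstrip('/')}/api/v1/cost/{endpoint}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def fetch_cost(base_url, token, endpoint, **params):
    """GET a cost endpoint and decode the body as JSON.

    The Authorization header format is an assumption; substitute whatever
    credentials your deployment expects.
    """
    request = Request(
        cost_url(base_url, endpoint, **params),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(request, timeout=10) as response:
        return json.load(response)
```

For example, fetch_cost("https://behavry.example", token, "timeseries", bucket="1h") would request the hourly series; the hostname is a placeholder.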

Dashboard

Observability → Cost shows:

  • Headline: yesterday / last 7d / last 30d spend
  • Stacked area chart by provider
  • Top 10 agents by spend
  • Top 10 tool patterns by token burn
  • Projected monthly spend based on the current 7-day rate
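
One plausible reading of the projection is a straight-line extrapolation: take the average daily spend over the last 7 days and scale it to the length of the current month. The sketch below assumes that formula; the dashboard may instead use a flat 30 days or another method, and the function name is hypothetical.

```python
import calendar
from datetime import date


def projected_monthly_spend(last_7d_spend, today):
    """Extrapolate the trailing 7-day spend to the full calendar month."""
    daily_rate = last_7d_spend / 7
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return daily_rate * days_in_month
```

Under this assumption, a tenant that spent $70.00 over the last 7 days in a 30-day month is projected at $300.00 for the month.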