AI Asset Dependency & Data Lineage
Dependency & Lineage is included on the Enterprise plan.
What this is
AI Surface Mapping gives you a flat inventory of every AI-capable asset in your environment. Dependency & Lineage turns that flat inventory into a directed graph: which agent feeds data into which model, which model is wired to which vector store, which vector store pulls from which SaaS connector, which downstream system an agent writes to.
Once the graph exists you can ask the questions that matter: "If this data source is compromised, which agents touch it?", "What would blast-radius limits actually contain?", and "Which high-risk assets are reachable from a low-trust entry point?"
Graph model
The graph lives in `backend/behavry/discovery/dependency.py` and `lineage.py`. Nodes are AI assets from the discovery catalog; edges are typed relationships:
| Relationship | Meaning |
|---|---|
| `feeds_data` | Source provides data that lands in the target's context / training set |
| `authenticates_as` | Source acts on behalf of the target (delegation chain) |
| `invokes_tool` | Source calls a tool hosted or exposed by the target |
| `writes_to` | Source writes output to the target |
| `embeds_from` | Source computes embeddings from the target's content |
| `shares_secret` | Two assets share a credential (tight coupling) |
Edges carry a weight representing how much data flows across them (events / day over the last 30 days).
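As a rough illustration of the model above, the typed edges could be represented as follows. This is a minimal sketch, not the contents of `dependency.py`: the class names, field names, and asset IDs are all hypothetical, and only the six relationship names and the meaning of the weight come from this page.

```python
from dataclasses import dataclass
from enum import Enum


class Relationship(str, Enum):
    """The six edge types listed in the table above."""
    FEEDS_DATA = "feeds_data"
    AUTHENTICATES_AS = "authenticates_as"
    INVOKES_TOOL = "invokes_tool"
    WRITES_TO = "writes_to"
    EMBEDS_FROM = "embeds_from"
    SHARES_SECRET = "shares_secret"


@dataclass(frozen=True)
class Edge:
    # Hypothetical field names; IDs stand in for discovery-catalog asset IDs.
    source: str
    target: str
    relationship: Relationship
    weight: float = 0.0  # events / day over the last 30 days


def outgoing(edges, node_id):
    """Return every edge whose source is the given node."""
    return [e for e in edges if e.source == node_id]


# Made-up example assets, for illustration only.
edges = [
    Edge("agent:support-bot", "model:chat", Relationship.FEEDS_DATA, 120.0),
    Edge("store:kb-vectors", "connector:wiki", Relationship.EMBEDS_FROM, 40.0),
    Edge("agent:support-bot", "system:ticketing", Relationship.WRITES_TO, 15.0),
]

for edge in outgoing(edges, "agent:support-bot"):
    print(edge.relationship.value, "->", edge.target, edge.weight)
```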
Transitive risk
Each node has a base risk score from the Behavioral Risk Framework. Transitive risk is computed via BFS outward from a node:
transitive_risk(node) = base_risk(node)
+ sum(edge_weight * base_risk(neighbor) * decay^depth)
decay (default 0.6) attenuates each contribution by its distance from the starting node, so a risky neighbor one hop away contributes more than one four hops away (a factor of 0.6 versus 0.6^4 ≈ 0.13, taking depth as the hop count). The result is a single score that captures not just this asset's risk but the risk of everything it can reach.
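The formula can be sketched as a breadth-first traversal. This is an illustrative reading, not the code in `lineage.py`, and it rests on several assumptions: depth is the hop count (direct neighbors are depth 1), each node is counted once at the depth where BFS first reaches it, the weight used is that of the edge it was first reached through, and traversal stops at a `max_depth` cutoff. The function signature and graph layout are hypothetical.

```python
from collections import deque


def transitive_risk(graph, base_risk, start, decay=0.6, max_depth=3):
    """Sum base risk over reachable nodes, attenuated by hop distance.

    graph: {node: [(neighbor, edge_weight), ...]}  (hypothetical layout)
    base_risk: {node: score}
    """
    total = base_risk[start]
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the cutoff
        for neighbor, weight in graph.get(node, []):
            if neighbor in visited:
                continue  # count each node once, at its shortest distance
            visited.add(neighbor)
            hop = depth + 1
            total += weight * base_risk[neighbor] * decay ** hop
            queue.append((neighbor, hop))
    return total


# Edge weights are set to 1.0 here for readability.
graph = {"agent": [("model", 1.0)], "model": [("store", 1.0)], "store": []}
base_risk = {"agent": 10.0, "model": 20.0, "store": 50.0}

# 10 + 20 * 0.6 + 50 * 0.6**2 = 40 (up to float rounding)
print(round(transitive_risk(graph, base_risk, "agent"), 2))
```

How raw events-per-day weights are scaled before entering the sum is not specified on this page, so the example sidesteps it with unit weights.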
Data-flow scoring
Flow scoring is the dual question: how much data is actually moving along this edge, and is that volume consistent with stated purpose? A stale edge (zero events in 14 days) gets deprioritized; a hot edge between a low-trust source and a high-value sink gets flagged for review.
Scoring output goes into the Dependency → Flows report, sorted by concern level.
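The two rules described above (stale edges deprioritized, hot low-trust-to-high-value edges flagged) might look like this in outline. It is a sketch under stated assumptions: the `Flow` fields, the 0-to-1 trust and value scales, the numeric thresholds, and the three concern labels are all hypothetical. Only the 14-day staleness window comes from this page.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    # All field names and scales are hypothetical.
    source: str
    target: str
    events_last_14d: int
    events_per_day: float
    source_trust: float  # 0.0 (untrusted) to 1.0 (trusted)
    target_value: float  # 0.0 (low value) to 1.0 (high value)


def concern_level(flow, hot_threshold=100.0, low_trust=0.3, high_value=0.7):
    """Classify an edge; thresholds are illustrative placeholders."""
    if flow.events_last_14d == 0:
        return "stale"  # zero events in 14 days: deprioritize
    is_hot = flow.events_per_day >= hot_threshold
    if is_hot and flow.source_trust <= low_trust and flow.target_value >= high_value:
        return "review"  # hot edge from low-trust source to high-value sink
    return "normal"


# Order used to sort a report by concern, most urgent first.
ORDER = {"review": 0, "normal": 1, "stale": 2}

flows = [
    Flow("agent:old", "store:archive", 0, 0.0, 0.9, 0.2),
    Flow("agent:byo", "db:customers", 4200, 300.0, 0.1, 0.95),
    Flow("agent:ci", "model:chat", 700, 50.0, 0.8, 0.5),
]
report = sorted(flows, key=lambda f: ORDER[concern_level(f)])
for f in report:
    print(concern_level(f), f.source, "->", f.target)
```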
Visualization
Discovery → Dependency Graph renders the graph as a D3 force-directed layout. Controls:
- Highlight a node: its transitive reach lights up in one color, incoming reach in another
- Filter by edge type: show only `feeds_data`, hide `shares_secret`, etc.
- Risk overlay: color nodes by base risk or transitive risk
- Time window: recompute edge weights for the last 7 / 30 / 90 days
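The highlight control distinguishes what a node can reach from what can reach it. Conceptually these are the same traversal run over forward and reversed edges, as in the sketch below. This is illustrative Python, not the front-end code, and the function name and the edge-pair layout are hypothetical.

```python
from collections import deque


def reach(edges, start, incoming=False):
    """Return the set of nodes reachable from start.

    edges: iterable of (source, target) pairs.
    incoming=True follows edges backwards, giving the nodes that can reach start.
    """
    adjacency = {}
    for source, target in edges:
        a, b = (target, source) if incoming else (source, target)
        adjacency.setdefault(a, []).append(b)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # report only the other nodes
    return seen


pairs = [("connector", "store"), ("store", "model"), ("agent", "model")]
print(sorted(reach(pairs, "store")))                  # outgoing reach
print(sorted(reach(pairs, "model", incoming=True)))   # incoming reach
```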
API
Routes are defined in `backend/behavry/discovery/relationship_routes.py`.
| Method | Path | Purpose |
|---|---|---|
| GET | `/api/v1/discovery/graph` | Full graph (nodes + edges) for the tenant |
| GET | `/api/v1/discovery/graph/{node_id}/reach?depth=3` | BFS from a node with transitive risk |
| GET | `/api/v1/discovery/flows?min_weight=10` | Flow-scored edge list |
| GET | `/api/v1/discovery/relationships` | Edge list only |
| POST | `/api/v1/discovery/relationships` | Manually add an edge not captured by automatic discovery |
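A client might build requests against these routes roughly as follows. Only the paths and the `depth` query parameter come from the table; the host name, the bearer-token authentication, and the JSON field names in the POST body are assumptions made for illustration, so check the actual request schema before relying on them. The sketch constructs the requests without sending them.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://behavry.example.com"  # hypothetical host
TOKEN = "REPLACE_ME"                      # assumption: bearer-token auth


def reach_request(node_id, depth=3):
    """Build the GET request for a node's transitive reach."""
    path = f"/api/v1/discovery/graph/{quote(node_id, safe='')}/reach"
    url = f"{BASE_URL}{path}?{urlencode({'depth': depth})}"
    return Request(url, headers={"Authorization": f"Bearer {TOKEN}"}, method="GET")


def add_edge_request(source, target, relationship):
    """Build the POST request for a manual edge.

    The body field names are hypothetical; the real schema is not
    documented on this page.
    """
    body = json.dumps(
        {"source": source, "target": target, "relationship": relationship}
    ).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    }
    return Request(
        f"{BASE_URL}/api/v1/discovery/relationships",
        data=body,
        headers=headers,
        method="POST",
    )


req = reach_request("node-123", depth=2)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req) and json.load() the response.
```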
How edges are populated
Three sources:
- Passive observation: the MCP proxy, browser extension, and AI proxies record who called what, which becomes `invokes_tool` and `writes_to` edges automatically
- Connector enrichment: SaaS admin-API connectors (`backend/behavry/discovery/connectors/`) pull "this agent has access to this resource" edges
- Manual annotation: auditors can add authoritative edges for things the product can't see (e.g. vendor-managed LLMs)
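To make the passive-observation path concrete, the sketch below aggregates observed events into weighted `invokes_tool` and `writes_to` edges, using events per day over a 30-day window as the weight. The event tuple layout and the `kind` labels are hypothetical; the real proxies' record format is not described on this page.

```python
from collections import Counter

WINDOW_DAYS = 30  # matches the 30-day weight window described earlier

# Hypothetical mapping from observed event kinds to edge types.
KIND_TO_RELATIONSHIP = {"tool_call": "invokes_tool", "write": "writes_to"}


def edges_from_events(events, window_days=WINDOW_DAYS):
    """Aggregate (source, target, kind) events into weighted edges.

    Returns {(source, target, relationship): events_per_day}.
    Events of an unrecognized kind are ignored.
    """
    counts = Counter()
    for source, target, kind in events:
        relationship = KIND_TO_RELATIONSHIP.get(kind)
        if relationship is None:
            continue
        counts[(source, target, relationship)] += 1
    return {key: n / window_days for key, n in counts.items()}


# Made-up observations: 60 tool calls and 30 writes inside the window.
events = [("agent:a", "tool:search", "tool_call")] * 60
events += [("agent:a", "db:tickets", "write")] * 30
events += [("agent:a", "tool:search", "heartbeat")]  # ignored

for (source, target, rel), weight in edges_from_events(events).items():
    print(f"{source} -{rel}-> {target}: {weight} events/day")
```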
Related
- AI Surface Mapping — the node inventory this graph runs on
- Behavioral Risk Framework — where base risk comes from
- Citizen Coder Governance — feeds assets from Replit / Lovable / v0 / Bolt / Cursor / Windsurf into the graph