AI Asset Dependency & Data Lineage

Feature row 31 — Sprint AD.1

Dependency & Lineage is included on the Enterprise plan.

What this is

AI Surface Mapping gives you a flat inventory of every AI-capable asset in your environment. Dependency & Lineage turns that flat inventory into a directed graph: which agent feeds data into which model, which model is wired to which vector store, which vector store pulls from which SaaS connector, and which downstream system an agent writes to.

Once the graph exists you can ask the questions that matter:

If this data source is compromised, which agents touch it?
What would blast-radius limits actually contain?
Which high-risk assets are reachable from a low-trust entry point?
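
Two of these questions reduce to reachability queries over a directed graph. The following is a minimal sketch, assuming the graph is available as a plain adjacency dict whose edges point in the direction data moves. The asset names and the `reachable` and `reverse` helpers are hypothetical and are not part of the product's code.

```python
from collections import deque


def reachable(adjacency, start):
    """Return every node reachable from `start` by following directed edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)  # report only the other assets, not the start itself
    return seen


def reverse(adjacency):
    """Flip every edge so that source -> target becomes target -> source."""
    flipped = {}
    for source, targets in adjacency.items():
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped


# Hypothetical graph for illustration only.
graph = {
    "crm-connector": ["vector-store"],
    "vector-store": ["support-model"],
    "support-model": ["support-agent"],
    "support-agent": ["ticket-system"],
    "public-chat-widget": ["support-agent"],
}

# "If this data source is compromised, which assets touch it?"
exposed = reachable(graph, "crm-connector")

# "Which assets are reachable from a low-trust entry point?"
from_widget = reachable(graph, "public-chat-widget")

# The inverse question: what can ultimately influence this sink?
upstream = reachable(reverse(graph), "ticket-system")

print(sorted(exposed), sorted(from_widget), sorted(upstream))
```

Running the same search on the reversed graph answers the incoming-reach variant of each question without a second algorithm.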

Graph model

The graph lives in backend/behavry/discovery/dependency.py and lineage.py. Nodes are AI assets from the discovery catalog; edges are typed relationships:

Relationship	Meaning
`feeds_data`	Source provides data that lands in the target's context / training set
`authenticates_as`	Source acts on behalf of the target (delegation chain)
`invokes_tool`	Source calls a tool hosted or exposed by the target
`writes_to`	Source writes output to the target
`embeds_from`	Source computes embeddings from the target's content
`shares_secret`	Two assets share a credential (tight coupling)

Edges carry a weight representing how much data flows across them (events / day over the last 30 days).
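
As a rough illustration of this model, the sketch below represents typed, weighted edges in Python. The names `EdgeType`, `Edge`, `events_per_day`, and `filter_edges` are assumptions made for this example; the actual definitions in dependency.py and lineage.py are not shown in this document and may differ.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(str, Enum):
    """The six relationship types listed in the table above."""
    FEEDS_DATA = "feeds_data"
    AUTHENTICATES_AS = "authenticates_as"
    INVOKES_TOOL = "invokes_tool"
    WRITES_TO = "writes_to"
    EMBEDS_FROM = "embeds_from"
    SHARES_SECRET = "shares_secret"


@dataclass(frozen=True)
class Edge:
    """A directed, typed edge between two asset IDs (hypothetical shape)."""
    source: str
    target: str
    type: EdgeType
    events_per_day: float  # weight: events / day over the trailing window


def filter_edges(edges, keep):
    """Return only the edges whose type is in `keep`."""
    keep = set(keep)
    return [e for e in edges if e.type in keep]


# Hypothetical assets for illustration only.
edges = [
    Edge("support-agent", "ticket-system", EdgeType.WRITES_TO, 120.0),
    Edge("crm-connector", "vector-store", EdgeType.FEEDS_DATA, 45.5),
    Edge("support-agent", "billing-agent", EdgeType.SHARES_SECRET, 0.0),
]

data_edges = filter_edges(edges, [EdgeType.FEEDS_DATA, EdgeType.WRITES_TO])
print([(e.source, e.target, e.type.value) for e in data_edges])
```

Modelling the relationship as an enum rather than a free-form string means an unknown edge type fails loudly at construction time instead of silently creating a new category.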

Transitive risk

Each node has a base risk score from the Behavioral Risk Framework. Transitive risk is computed via BFS outward from a node:

transitive_risk(node) = base_risk(node)
                      + sum(edge_weight * base_risk(neighbor) * decay^depth)

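The sum runs over the nodes that the BFS reaches, and depth is the number of hops from the starting node. decay (default 0.6) attenuates each contribution by distance, so a risky neighbor one hop away contributes more than one four hops away. The result is a single score that captures not just this asset's own risk but the risk of everything it can reach.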
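
The formula can be sketched as a breadth-first traversal. This is an illustration under stated assumptions, not the code in dependency.py: each reachable node is counted once, at the depth where BFS first discovers it, using the weight of the edge it was discovered through, and the traversal stops at a `max_depth` cutoff. The example weights are scaled to the range 0 to 1; whether the product normalizes raw events / day before applying the formula is not stated here.

```python
from collections import deque


def transitive_risk(graph, base_risk, start, decay=0.6, max_depth=3):
    """Base risk of `start` plus the decayed, weighted risk of reachable nodes.

    graph: {node: [(neighbor, edge_weight), ...]}
    Assumption: a node contributes once, at its first-discovered BFS depth.
    """
    total = base_risk[start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth cutoff
        for neighbor, weight in graph.get(node, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            total += weight * base_risk[neighbor] * decay ** (depth + 1)
            queue.append((neighbor, depth + 1))
    return total


# Hypothetical three-node chain: agent -> model -> store
graph = {
    "agent": [("model", 1.0)],
    "model": [("store", 0.5)],
}
base_risk = {"agent": 2.0, "model": 5.0, "store": 10.0}

# 2.0 + (1.0 * 5.0 * 0.6) + (0.5 * 10.0 * 0.6**2), approximately 6.8
score = transitive_risk(graph, base_risk, "agent")
print(round(score, 2))
```

With `max_depth=1` only the direct neighbor contributes, which makes it easy to see how much of a score comes from assets further out.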

Data-flow scoring

Flow scoring asks the complementary question: how much data is actually moving along this edge, and is that volume consistent with the edge's stated purpose? A stale edge (zero events in 14 days) gets deprioritized; a hot edge between a low-trust source and a high-value sink gets flagged for review.

Scoring output goes into the Dependency → Flows report, sorted by concern level.
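
The two rules described above (stale edges are deprioritized, hot low-trust to high-value edges are flagged) can be sketched as a small classifier. Everything numeric here is an assumption: the thresholds `HOT_EVENTS_PER_DAY`, `LOW_TRUST`, and `HIGH_VALUE`, the 0 to 1 trust and value scales, and the `Flow` record shape are hypothetical and do not reflect the product's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    source: str
    target: str
    events_last_14d: int
    events_per_day: float
    source_trust: float  # hypothetical scale: 0.0 untrusted .. 1.0 trusted
    target_value: float  # hypothetical scale: 0.0 low value .. 1.0 high value


# Hypothetical thresholds; the real cut-offs are not documented here.
HOT_EVENTS_PER_DAY = 100.0
LOW_TRUST = 0.3
HIGH_VALUE = 0.7

# Lower number sorts first in the report.
ORDER = {"review": 0, "normal": 1, "stale": 2}


def concern_level(flow):
    """Classify a flow as 'review', 'normal', or 'stale'."""
    if flow.events_last_14d == 0:
        return "stale"
    hot = flow.events_per_day >= HOT_EVENTS_PER_DAY
    if hot and flow.source_trust <= LOW_TRUST and flow.target_value >= HIGH_VALUE:
        return "review"
    return "normal"


def flows_report(flows):
    """Sort flows by concern level, then by volume (busiest first)."""
    return sorted(flows, key=lambda f: (ORDER[concern_level(f)], -f.events_per_day))


flows = [
    Flow("internal-etl", "archive", 0, 0.0, 0.9, 0.2),
    Flow("public-chat-widget", "customer-db", 4200, 300.0, 0.1, 0.9),
    Flow("support-agent", "ticket-system", 700, 50.0, 0.8, 0.5),
]

report = flows_report(flows)
print([(f.source, concern_level(f)) for f in report])
```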

Visualization

Discovery → Dependency Graph renders the graph as a D3 force-directed layout. Controls:

Highlight a node — its transitive reach lights up in one color, incoming reach in another
Filter by edge type — show only feeds_data, hide shares_secret, etc.
Risk overlay — color nodes by base risk or transitive risk
Time window — recompute edge weights for the last 7 / 30 / 90 days

API

Routes: backend/behavry/discovery/relationship_routes.py.

Method	Path	Purpose
`GET`	`/api/v1/discovery/graph`	Full graph (nodes + edges) for the tenant
`GET`	`/api/v1/discovery/graph/{node_id}/reach?depth=3`	BFS from a node with transitive risk
`GET`	`/api/v1/discovery/flows?min_weight=10`	Flow-scored edge list
`GET`	`/api/v1/discovery/relationships`	Edge list only
`POST`	`/api/v1/discovery/relationships`	Manually add an edge not captured by automatic discovery
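
For the two parameterized GET routes, request URLs can be assembled as below. This sketch only builds URLs and sends nothing. The host `behavry.example.com` and the node ID are placeholders, and authentication, response shapes, and the POST body schema are not covered by this page, so they are deliberately left out.

```python
from urllib.parse import quote, urlencode, urljoin

BASE_URL = "https://behavry.example.com"  # placeholder host, not a real endpoint


def reach_url(node_id, depth=3):
    """URL for the BFS reach route of a single node."""
    path = f"/api/v1/discovery/graph/{quote(node_id, safe='')}/reach"
    return urljoin(BASE_URL, path) + "?" + urlencode({"depth": depth})


def flows_url(min_weight=10):
    """URL for the flow-scored edge list, filtered by minimum weight."""
    path = "/api/v1/discovery/flows"
    return urljoin(BASE_URL, path) + "?" + urlencode({"min_weight": min_weight})


reach = reach_url("asset-123")  # hypothetical node ID
flows = flows_url(min_weight=25)
print(reach)
print(flows)
```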

How edges are populated

Three sources:

Passive observation — the MCP proxy, browser extension, and AI proxies record who called what, which becomes invokes_tool and writes_to edges automatically
Connector enrichment — SaaS admin-API connectors (backend/behavry/discovery/connectors/) pull "this agent has access to this resource" edges
Manual annotation — auditors can add authoritative edges for things the product can't see (e.g. vendor-managed LLMs)
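
To make the passive-observation path concrete, the sketch below aggregates observed call events into weighted edges, using events / day as the weight. The event record shape (`source`, `target`, `relationship` keys) and the `edges_from_events` helper are hypothetical; the real proxy and extension log formats are not described on this page.

```python
from collections import Counter


def edges_from_events(events, window_days=30):
    """Aggregate observed events into weighted edges.

    Returns {(source, target, relationship): events_per_day}.
    """
    counts = Counter(
        (e["source"], e["target"], e["relationship"]) for e in events
    )
    return {key: count / window_days for key, count in counts.items()}


# Hypothetical observations over a 30-day window.
events = (
    [{"source": "support-agent", "target": "search-tool",
      "relationship": "invokes_tool"}] * 60
    + [{"source": "support-agent", "target": "ticket-system",
        "relationship": "writes_to"}] * 15
)

observed = edges_from_events(events, window_days=30)
for (source, target, relationship), weight in observed.items():
    print(source, relationship, target, weight)
```

Keying on the full (source, target, relationship) triple keeps two different relationships between the same pair of assets as separate edges.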

Related

AI Surface Mapping — the node inventory this graph runs on
Behavioral Risk Framework — where base risk comes from
Citizen Coder Governance — feeds assets from Replit / Lovable / v0 / Bolt / Cursor / Windsurf into the graph
