Core Concepts

The mental models you need to understand how Graphify works.

The Knowledge Graph Model

A knowledge graph is a network of nodes (things) connected by edges (relationships). In Graphify:

Nodes represent code entities (functions, classes, modules), concepts from documents, and rationale fragments (design decisions from comments)
Edges represent relationships: calls, imports, contains, inherits, semantically_similar_to, rationale_for
Communities are clusters of densely-connected nodes discovered automatically
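
As a rough illustration of this model, the sketch below stores nodes and edges in plain Python dataclasses. The class names (`Node`, `Edge`, `KnowledgeGraph`) and the example node ids are hypothetical and are not Graphify's internal types; only the relationship names and `file_type` values come from this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    label: str
    file_type: str  # e.g. "code", "document", "rationale"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str  # e.g. "calls", "imports", "contains", "rationale_for"


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        # Reject edges whose endpoints are unknown, so the graph stays consistent.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(edge)

    def neighbors(self, node_id: str) -> set:
        # Treat edges as undirected when collecting neighbors.
        found = set()
        for edge in self.edges:
            if edge.source == node_id:
                found.add(edge.target)
            elif edge.target == node_id:
                found.add(edge.source)
        return found


graph = KnowledgeGraph()
graph.add_node(Node("auth", "auth", "code"))
graph.add_node(Node("auth.login", "login", "code"))
graph.add_node(Node("why:1", "Tokens expire early to limit replay", "rationale"))
graph.add_edge(Edge("auth", "auth.login", "contains"))
graph.add_edge(Edge("why:1", "auth.login", "rationale_for"))
```

Here the `login` function ends up connected both to the module that contains it and to the rationale fragment that explains it.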

Figure: Example knowledge graph (interactive on the original page). Legend: Class/Module, Function, Rationale, Concept, Import, Inferred.

Confidence Tiers

Every relationship in the graph is tagged with a confidence level. This is Graphify's honesty mechanism — you always know what was found vs. what was guessed.

Tier	Score	Example
EXTRACTED	1.0 (always)	An `import` statement, a direct function call
INFERRED	0.5 – 0.9	Two functions likely calling each other (name matching)
AMBIGUOUS	Unscored	A possible relationship flagged for human review
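
The table can be read as a consistency rule between a tier and its score. The sketch below encodes that rule; the `Confidence` enum and `is_valid_score` helper are hypothetical names, and representing "unscored" as `None` is an assumption.

```python
from enum import Enum
from typing import Optional


class Confidence(Enum):
    EXTRACTED = "EXTRACTED"
    INFERRED = "INFERRED"
    AMBIGUOUS = "AMBIGUOUS"


def is_valid_score(tier: Confidence, score: Optional[float]) -> bool:
    """Check that a score is consistent with its tier, per the table above."""
    if tier is Confidence.EXTRACTED:
        return score == 1.0  # always exactly 1.0
    if tier is Confidence.INFERRED:
        return score is not None and 0.5 <= score <= 0.9
    # AMBIGUOUS edges are unscored; None stands in for "no score" here.
    return score is None
```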

How Confidence Influences the System

Report sorting: AMBIGUOUS connections surface first in "surprising connections"
Question generation: The system generates questions about ambiguous and inferred relationships
Surprise scoring: Lower confidence + cross-file-type = higher surprise score
God node filtering: INFERRED edges on god nodes trigger review questions

Node Types

Type	`file_type`	Created By
Function	`code`	tree-sitter AST parsing
Class	`code`	tree-sitter AST parsing
Import / Module	`code`	tree-sitter AST parsing
Concept	`document` / `paper`	Claude semantic extraction
Visual Concept	`image`	Claude vision extraction
Rationale	`rationale`	Comment patterns: `# WHY:`, `# NOTE:`, `# HACK:`
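
To make the rationale row concrete, here is a minimal sketch of matching those comment markers with a regular expression. The pattern and the `extract_rationale` helper are assumptions for illustration; this page does not specify how Graphify matches comments, and a plain regex like this one would also match markers that appear inside string literals.

```python
import re

# Matches "# WHY: ...", "# NOTE: ...", and "# HACK: ..." comments.
RATIONALE_PATTERN = re.compile(r"#\s*(WHY|NOTE|HACK):\s*(.+)")


def extract_rationale(source: str):
    """Return (line_number, marker, text) for each rationale comment found."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = RATIONALE_PATTERN.search(line)
        if match:
            found.append((lineno, match.group(1), match.group(2).strip()))
    return found


sample = """\
def retry(request):
    # WHY: the upstream API drops ~1% of requests
    # TODO: make the limit configurable
    return send(request)  # HACK: no backoff yet
"""
rationale = extract_rationale(sample)
```

In this sample the `WHY` and `HACK` comments are picked up, while the `TODO` comment is ignored because it is not one of the listed markers.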

Communities

Graphify automatically groups related nodes into communities. Think of them as neighborhoods in your codebase — nodes that are densely connected end up grouped together.

Figure: Community clusters (interactive on the original page). Three example communities are shown: Authentication (log, ses, tok, val), Database (Usr, qry, mig), and API Layer (rtr, hnd, mid, res).
Topology-Based, Not Embedding-Based

The clustering is graph-topology-based. There is no separate vector database or embedding step. Semantic similarity edges that Claude extracts (semantically_similar_to) are already in the graph as INFERRED edges, so they influence community detection directly.

Leiden vs Louvain: Leiden (via graspologic) is preferred — it guarantees well-connected communities and runs faster. If graspologic is not installed, Graphify falls back to NetworkX's Louvain with tuned parameters (max_level=10, threshold=1e-4).

Oversized Community Splitting

If a community contains more than 25% of graph nodes (and at least 10 nodes), Graphify recursively extracts its subgraph and re-runs Leiden. This prevents one giant cluster from dominating the analysis.
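
The splitting rule can be sketched as follows. This is an illustration under assumptions, not Graphify's implementation: `detect` is a hypothetical stand-in for re-running community detection on a subgraph, the 25% threshold is measured against the whole graph at every level, and `max_depth` is an added safety limit that this page does not mention.

```python
def is_oversized(size: int, total_nodes: int,
                 max_fraction: float = 0.25, min_size: int = 10) -> bool:
    """True if a community has at least min_size nodes and more than 25% of the graph."""
    return size >= min_size and size > max_fraction * total_nodes


def split_oversized(communities, total_nodes, detect, max_depth=3):
    """Recursively split oversized communities using the supplied detect callable."""
    result = []
    for community in communities:
        if max_depth > 0 and is_oversized(len(community), total_nodes):
            parts = detect(community)
            if len(parts) > 1:  # only recurse if detection made progress
                result.extend(
                    split_oversized(parts, total_nodes, detect, max_depth - 1)
                )
                continue
        result.append(community)
    return result


def halve(community):
    # Toy detector for the example: split a community into two halves.
    ordered = sorted(community)
    mid = len(ordered) // 2
    return [set(ordered[:mid]), set(ordered[mid:])]


communities = [set(range(30)), set(range(30, 40))]
result = split_oversized(communities, total_nodes=40, detect=halve)
```

With 40 nodes in total, the 30-node community is split repeatedly until every piece has fewer than 10 nodes, while the 10-node community is left alone because it does not exceed 25% of the graph.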

Cohesion score: Each community gets a score = actual_intra_edges / max_possible_edges (range 0.0–1.0). Communities below 0.15 are flagged as candidates for splitting.
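
A small worked version of the cohesion formula is sketched below. It assumes an undirected simple graph, so a community of n nodes has at most n(n-1)/2 internal edges. Returning 1.0 for communities with fewer than two nodes is also an assumption, made so that singletons are not flagged.

```python
LOW_COHESION_THRESHOLD = 0.15  # threshold stated above


def cohesion(members, edges) -> float:
    """actual_intra_edges / max_possible_edges for one community."""
    members = set(members)
    n = len(members)
    if n < 2:
        return 1.0  # assumption: a singleton is treated as trivially cohesive
    # Count each undirected edge once and ignore self-loops.
    intra = {
        frozenset((u, v))
        for u, v in edges
        if u in members and v in members and u != v
    }
    max_possible = n * (n - 1) // 2
    return len(intra) / max_possible


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "x")]
score = cohesion({"a", "b", "c", "d"}, edges)  # 3 of 6 possible edges
needs_review = score < LOW_COHESION_THRESHOLD
```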

God Nodes

God nodes are the highest-degree nodes, the ones everything else connects through. They are the architectural pillars: removing a god node would disconnect large parts of the graph.

God Node Filtering

Nodes are sorted by degree (edge count), then filtered:

Remove file-level hubs (label matches source filename — just containers)
Remove method stubs (.method_name() — too granular)
Remove isolates (degree ≤ 1 — not well-connected)
Remove concept nodes (empty source_file — manually injected)
Return top N remaining by degree
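
The filtering steps above can be sketched as a single pass over degree-sorted nodes. The node dictionary layout (`id`, `label`, `source_file`) and the `god_nodes` helper are hypothetical, and "label matches source filename" is interpreted here as equality with the file's base name, which is an assumption.

```python
import os
from collections import Counter


def god_nodes(nodes, edges, top_n=10):
    """Return (node_id, degree) pairs for the top_n god-node candidates."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    ranked = sorted(nodes, key=lambda n: degree[n["id"]], reverse=True)
    kept = []
    for node in ranked:
        label = node["label"]
        source_file = node.get("source_file", "")
        if not source_file:
            continue  # concept node with no source file
        if label == os.path.basename(source_file):
            continue  # file-level hub, just a container
        if label.startswith(".") and label.endswith("()"):
            continue  # method stub such as ".close()"
        if degree[node["id"]] <= 1:
            continue  # not well connected
        kept.append((node["id"], degree[node["id"]]))
    return kept[:top_n]


nodes = [
    {"id": "auth.py", "label": "auth.py", "source_file": "src/auth.py"},
    {"id": "Session", "label": "Session", "source_file": "src/auth.py"},
    {"id": ".close()", "label": ".close()", "source_file": "src/auth.py"},
    {"id": "login", "label": "login", "source_file": "src/auth.py"},
    {"id": "helper", "label": "helper", "source_file": "src/util.py"},
    {"id": "Zero Trust", "label": "Zero Trust", "source_file": ""},
]
edges = [
    ("auth.py", "Session"), ("auth.py", "login"), ("auth.py", ".close()"),
    ("Session", ".close()"), ("Session", "login"),
    ("helper", "Zero Trust"), ("Zero Trust", "Session"),
]
top = god_nodes(nodes, edges, top_n=5)
```

In this toy graph only `Session` and `login` survive: the file node, the method stub, the low-degree helper, and the source-less concept are all filtered out.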

Surprising Connections

Edges ranked by a multi-factor surprise score:

Surprise Score Breakdown

The score combines the following factors:

Confidence tier
Crosses file types (code ↔ paper)
Different top-level directories
Cross-community
Semantic similarity edge
Peripheral to hub
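
One way to combine the factors listed above is a simple additive score, sketched below. Every number in it is hypothetical: this page names the factors but not their weights, so the tier points, the one-point-per-factor rule, the degree thresholds, and the edge field names are all assumptions made for illustration.

```python
# Hypothetical points per tier: lower confidence contributes more surprise.
TIER_POINTS = {"EXTRACTED": 0, "INFERRED": 1, "AMBIGUOUS": 2}


def top_level_dir(path: str) -> str:
    """First path component, e.g. 'src' for 'src/auth/login.py'."""
    return path.replace("\\", "/").split("/", 1)[0]


def surprise_score(edge: dict, hub_degree: int = 10, peripheral_degree: int = 2) -> int:
    """Additive surprise score; each satisfied factor adds one point (assumed)."""
    score = TIER_POINTS[edge["confidence"]]
    score += int(edge["source_file_type"] != edge["target_file_type"])
    score += int(top_level_dir(edge["source_path"]) != top_level_dir(edge["target_path"]))
    score += int(edge["source_community"] != edge["target_community"])
    score += int(edge["relation"] == "semantically_similar_to")
    low = min(edge["source_degree"], edge["target_degree"])
    high = max(edge["source_degree"], edge["target_degree"])
    score += int(low <= peripheral_degree and high >= hub_degree)
    return score


surprising = {
    "confidence": "AMBIGUOUS", "relation": "semantically_similar_to",
    "source_file_type": "code", "target_file_type": "paper",
    "source_path": "src/auth/login.py", "target_path": "docs/paper.pdf",
    "source_community": 0, "target_community": 3,
    "source_degree": 1, "target_degree": 12,
}
ordinary = {
    "confidence": "EXTRACTED", "relation": "calls",
    "source_file_type": "code", "target_file_type": "code",
    "source_path": "src/auth/login.py", "target_path": "src/auth/session.py",
    "source_community": 0, "target_community": 0,
    "source_degree": 4, "target_degree": 5,
}
```

Under these assumed weights, an ambiguous code-to-paper link across directories and communities scores far higher than a direct call between two functions in the same package.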

Hyperedges

Some relationships connect 3+ nodes and can't be expressed as pairwise edges. Examples:

All classes implementing a shared protocol
All functions in an authentication flow
All concepts from a paper section forming one idea
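
A hyperedge can be modeled as a labeled set of member nodes rather than a pair. The sketch below uses hypothetical names (`Hyperedge`, `hyperedges_for`, and the example node ids) and is not Graphify's storage format.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Hyperedge:
    label: str
    members: FrozenSet[str]

    def __post_init__(self):
        # Two-node relationships are ordinary edges, so require three or more.
        if len(self.members) < 3:
            raise ValueError("a hyperedge connects three or more nodes")


def hyperedges_for(node_id, hyperedges):
    """Labels of every hyperedge that includes the given node."""
    return [h.label for h in hyperedges if node_id in h.members]


auth_flow = Hyperedge(
    "authentication flow",
    frozenset({"login", "validate_token", "refresh_session"}),
)
storage = Hyperedge(
    "implements Storage protocol",
    frozenset({"S3Store", "DiskStore", "MemoryStore"}),
)
```

Keeping the group as one object preserves the fact that the members belong together, which would be lost if the group were expanded into separate pairwise edges.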

Hyperedge Visualization

Hyperedges are rendered as shaded convex hulls in graph.html.
