Core Concepts

The mental models you need to understand how Graphify works.

The Knowledge Graph Model

A knowledge graph is a network of nodes (things) connected by edges (relationships). In Graphify:

  • Nodes represent code entities (functions, classes, modules), concepts from documents, and rationale fragments (design decisions from comments)
  • Edges represent relationships: calls, imports, contains, inherits, semantically_similar_to, rationale_for
  • Communities are clusters of densely connected nodes discovered automatically

Figure: Example knowledge graph. Node labels: auth, UserSvc, hash_pw, validate, session, WHY, concept, jwt. Legend: Class/Module, Function, Rationale, Concept, Import, Inferred.
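
To make the model concrete, here is a minimal sketch of nodes and typed edges as plain Python data structures. The class names (Node, Edge, KnowledgeGraph), their fields, and the specific relationships between the example nodes are assumptions for illustration; they are not Graphify's internal types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    label: str
    file_type: str  # e.g. "code", "document", "rationale"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str  # e.g. "calls", "imports", "contains", "rationale_for"


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: list = field(default_factory=list)  # list of Edge

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        # Reject edges that point at nodes the graph does not know about.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(edge)

    def neighbors(self, node_id: str) -> set:
        # Treat edges as undirected when collecting neighbors.
        found = set()
        for edge in self.edges:
            if edge.source == node_id:
                found.add(edge.target)
            elif edge.target == node_id:
                found.add(edge.source)
        return found


# Hypothetical example data, loosely based on the node labels in the figure.
graph = KnowledgeGraph()
graph.add_node(Node("auth", "auth", "code"))
graph.add_node(Node("UserSvc", "UserSvc", "code"))
graph.add_node(Node("hash_pw", "hash_pw", "code"))
graph.add_node(Node("why_1", "WHY", "rationale"))
graph.add_edge(Edge("auth", "UserSvc", "contains"))
graph.add_edge(Edge("UserSvc", "hash_pw", "calls"))
graph.add_edge(Edge("why_1", "hash_pw", "rationale_for"))
```

Calling graph.neighbors("hash_pw") returns the ids of both the class that calls the function and the rationale fragment attached to it, which is the kind of lookup a graph representation makes cheap.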

Confidence Tiers

Every relationship in the graph is tagged with a confidence level. This is Graphify's honesty mechanism — you always know what was found vs. what was guessed.

Confidence Levels

| Tier | Score | Meaning | Example |
|---|---|---|---|
| EXTRACTED | 1.0 (always) | Found directly in source | An import statement, a direct function call |
| INFERRED | 0.5 – 0.9 | Reasonable deduction | Two functions likely calling each other (name matching) |
| AMBIGUOUS | Unscored | Uncertain; flagged for review | A possible relationship flagged for human review |
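
The tier rules above can be written down as a small validity check. This is a sketch under stated assumptions: the Tier enum and validate_confidence function are hypothetical names, and an unscored edge is represented here as None.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    EXTRACTED = "EXTRACTED"
    INFERRED = "INFERRED"
    AMBIGUOUS = "AMBIGUOUS"


def validate_confidence(tier: Tier, score: Optional[float]) -> bool:
    """Return True if the score is consistent with the tier's documented range."""
    if tier is Tier.EXTRACTED:
        return score == 1.0  # EXTRACTED is always 1.0
    if tier is Tier.INFERRED:
        return score is not None and 0.5 <= score <= 0.9
    # AMBIGUOUS edges are unscored (assumption: stored as None).
    return score is None
```

For example, validate_confidence(Tier.INFERRED, 0.95) returns False, because 0.95 falls outside the 0.5 – 0.9 range reserved for inferred edges.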

How Confidence Influences the System

  • Report sorting: AMBIGUOUS connections surface first in "surprising connections"
  • Question generation: The system generates questions about ambiguous and inferred relationships
  • Surprise scoring: Lower confidence + cross-file-type = higher surprise score
  • God node filtering: INFERRED edges on god nodes trigger review questions

Node Types

| Type | file_type | Created By |
|---|---|---|
| Function | code | tree-sitter AST parsing |
| Class | code | tree-sitter AST parsing |
| Import / Module | code | tree-sitter AST parsing |
| Concept | document / paper | Claude semantic extraction |
| Visual Concept | image | Claude vision extraction |
| Rationale | rationale | Comment patterns: # WHY:, # NOTE:, # HACK: |
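
As an illustration of the comment-pattern approach for rationale nodes, the sketch below scans source text for the three markers listed in the table. The regular expression and the extract_rationale function are assumptions, not Graphify's actual extractor; in particular, this simple version would also match markers that appear inside string literals.

```python
import re

# Matches "# WHY:", "# NOTE:" or "# HACK:" followed by the rationale text.
RATIONALE_PATTERN = re.compile(r"#\s*(WHY|NOTE|HACK):\s*(.+)")


def extract_rationale(source: str) -> list:
    """Return (line_number, marker, text) for each rationale comment found."""
    fragments = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = RATIONALE_PATTERN.search(line)
        if match:
            fragments.append((lineno, match.group(1), match.group(2).strip()))
    return fragments


# Hypothetical input; the function is never executed, only scanned as text.
sample = """def hash_pw(password):
    # WHY: bcrypt is slow on purpose, which limits brute-force attempts
    return bcrypt_hash(password)  # NOTE: cost factor is set elsewhere
"""

fragments = extract_rationale(sample)
```

Each returned tuple carries enough information to create a rationale node and link it back to the code entity on or near that line.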

Communities

Graphify automatically groups related nodes into communities. Think of them as neighborhoods in your codebase — nodes that are densely connected end up grouped together.

Figure: Community clusters. Authentication (log, ses, tok, val), Database (Usr, qry, mig), API Layer (rtr, hnd, mid, res).

Topology-Based, Not Embedding-Based

The clustering is graph-topology-based. There is no separate vector database or embedding step. Semantic similarity edges that Claude extracts (semantically_similar_to) are already in the graph as INFERRED edges, so they influence community detection directly.

Leiden vs Louvain: Leiden (via graspologic) is preferred — it guarantees well-connected communities and runs faster. If graspologic is not installed, Graphify falls back to NetworkX's Louvain with tuned parameters (max_level=10, threshold=1e-4).
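
The fallback described above could look roughly like the following. This is a sketch, not Graphify's code: detect_communities and the seed value are hypothetical, and max_level is only passed when the installed NetworkX version accepts it, since older releases of louvain_communities lack that parameter.

```python
import inspect
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import louvain_communities


def detect_communities(graph: nx.Graph, seed: int = 42) -> list:
    """Prefer Leiden (graspologic); fall back to NetworkX Louvain."""
    try:
        from graspologic.partition import leiden
    except ImportError:
        leiden = None

    if leiden is not None:
        # leiden returns a mapping of node -> community id.
        membership = leiden(graph, random_seed=seed)
        groups = {}
        for node, community_id in membership.items():
            groups.setdefault(community_id, set()).add(node)
        return list(groups.values())

    # Louvain fallback with the parameters named in the text.
    kwargs = {"threshold": 1e-4, "seed": seed}
    if "max_level" in inspect.signature(louvain_communities).parameters:
        kwargs["max_level"] = 10
    return [set(c) for c in louvain_communities(graph, **kwargs)]


# Hypothetical test graph: two 5-node cliques joined by a single edge.
demo = nx.Graph()
group_a = [f"a{i}" for i in range(5)]
group_b = [f"b{i}" for i in range(5)]
demo.add_edges_from(combinations(group_a, 2))
demo.add_edges_from(combinations(group_b, 2))
demo.add_edge("a0", "b0")

communities = detect_communities(demo)
```

On a graph like this, either algorithm should place each clique in its own community, because the single bridging edge is far weaker than the connections inside each clique.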

Oversized Community Splitting

If a community contains more than 25% of graph nodes (and at least 10 nodes), Graphify recursively extracts its subgraph and re-runs Leiden. This prevents one giant cluster from dominating the analysis.
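
The split condition can be stated as a one-line predicate. The function name and parameter names below are hypothetical; the two thresholds are the ones given in the paragraph above.

```python
def is_oversized(community_size: int, total_nodes: int,
                 max_fraction: float = 0.25, min_size: int = 10) -> bool:
    """True if a community holds more than 25% of the graph and at least 10 nodes."""
    return community_size >= min_size and community_size > max_fraction * total_nodes
```

A community of 30 nodes in a 100-node graph qualifies, while a community of 8 nodes in a 20-node graph does not: it exceeds 25% of the graph but falls below the 10-node minimum.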

Cohesion score: Each community gets a score = actual_intra_edges / max_possible_edges (range 0.0–1.0). Communities below 0.15 are flagged as candidates for splitting.
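
A minimal sketch of the cohesion formula, assuming an undirected simple graph so that max_possible_edges is n(n-1)/2 for a community of n nodes. The function name, the edge representation, and the choice to return 0.0 for communities with fewer than two nodes are assumptions.

```python
def cohesion(members: set, edges: list) -> float:
    """actual_intra_edges / max_possible_edges for one community."""
    n = len(members)
    if n < 2:
        return 0.0  # assumption: no pair of nodes means no measurable cohesion
    # Count each undirected edge once, ignoring self-loops and edges
    # that leave the community.
    intra = {
        frozenset((u, v))
        for u, v in edges
        if u in members and v in members and u != v
    }
    max_possible = n * (n - 1) / 2
    return len(intra) / max_possible


# Hypothetical example: a 4-node chain plus one edge that leaves the community.
score = cohesion(
    {"a", "b", "c", "d"},
    [("a", "b"), ("b", "c"), ("c", "d"), ("a", "x")],
)
needs_review = score < 0.15
```

Here three of the six possible internal edges exist, so the score is 0.5 and the community is not flagged.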

God Nodes

God nodes are the highest-degree nodes in the graph: the entities everything else connects through. They are the architectural pillars, and removing one would disconnect large parts of the graph.

God Node Filtering

Nodes are sorted by degree (edge count), then filtered:

  1. Remove file-level hubs (label matches source filename — just containers)
  2. Remove method stubs (.method_name() — too granular)
  3. Remove isolates (degree ≤ 1 — not well-connected)
  4. Remove concept nodes (empty source_file — manually injected)
  5. Return top N remaining by degree
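
The five steps above can be sketched as a single filtering pass. The NodeInfo fields, the method-stub regular expression, and the way a label is compared against the source filename are assumptions made for illustration; Graphify's own matching rules may differ.

```python
import os
import re
from dataclasses import dataclass


@dataclass
class NodeInfo:
    label: str
    source_file: str  # empty string for manually injected concept nodes
    degree: int


# Assumed shape of a method stub label, e.g. ".save()".
METHOD_STUB = re.compile(r"^\.\w+\(\)$")


def god_nodes(nodes: list, top_n: int = 10) -> list:
    ranked = sorted(nodes, key=lambda n: n.degree, reverse=True)
    kept = []
    for node in ranked:
        # 1. File-level hubs: label equals the source filename.
        if node.source_file and node.label == os.path.basename(node.source_file):
            continue
        # 2. Method stubs are too granular.
        if METHOD_STUB.match(node.label):
            continue
        # 3. Isolates: degree of 1 or less.
        if node.degree <= 1:
            continue
        # 4. Concept nodes: empty source_file.
        if not node.source_file:
            continue
        kept.append(node)
    # 5. Top N remaining by degree.
    return kept[:top_n]


# Hypothetical input data.
candidates = [
    NodeInfo("auth.py", "src/auth.py", 40),
    NodeInfo("UserSvc", "src/auth.py", 25),
    NodeInfo(".save()", "src/db.py", 12),
    NodeInfo("helper", "src/util.py", 1),
    NodeInfo("OAuth", "", 9),
    NodeInfo("validate", "src/auth.py", 7),
]
top = [n.label for n in god_nodes(candidates, top_n=2)]
```

In this example the file hub, the method stub, the isolate, and the injected concept are all dropped, leaving UserSvc and validate as the reported god nodes.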

Surprising Connections

Edges are ranked by a multi-factor surprise score. As described under Confidence Tiers, lower confidence and links that cross file types both raise the score.
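
To show how such a score could combine the two factors named on this page (lower confidence and differing file types), here is a toy sketch. The weights, the formula, and the treatment of unscored AMBIGUOUS edges as confidence 0.0 are invented for illustration and are not Graphify's actual values.

```python
from typing import Optional

CROSS_TYPE_BONUS = 0.5  # hypothetical weight


def surprise_score(confidence: Optional[float],
                   source_type: str, target_type: str) -> float:
    """Higher when confidence is low and when the edge crosses file types."""
    # Assumption: unscored (AMBIGUOUS) edges count as zero confidence,
    # so they surface first.
    effective = 0.0 if confidence is None else confidence
    score = 1.0 - effective
    if source_type != target_type:
        score += CROSS_TYPE_BONUS
    return score


# Hypothetical edges: (name, confidence, source file_type, target file_type).
edges = [
    ("extracted_same_type", 1.0, "code", "code"),
    ("inferred_cross_type", 0.6, "code", "document"),
    ("ambiguous_same_type", None, "code", "code"),
    ("ambiguous_cross_type", None, "code", "image"),
]
ranked = sorted(edges, key=lambda e: surprise_score(e[1], e[2], e[3]), reverse=True)
ranking = [name for name, *_ in ranked]
```

Under these assumed weights, ambiguous edges rank ahead of inferred ones, and an extracted same-type edge scores zero.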

Hyperedges

Some relationships connect 3+ nodes and can't be expressed as pairwise edges. Examples:

  • All classes implementing a shared protocol
  • All functions in an authentication flow
  • All concepts from a paper section forming one idea

Figure: Hyperedge visualization. The "Authentication Flow" hyperedge groups the nodes login, verify, token, and session.

Hyperedges are rendered as shaded convex hulls in graph.html.
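
One simple way to represent a hyperedge is as a labeled set of member nodes. The Hyperedge class and the hyperedges_for helper below are hypothetical; the minimum of three members follows the definition given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    label: str
    members: frozenset

    def __post_init__(self):
        # A relationship between two nodes is an ordinary pairwise edge.
        if len(self.members) < 3:
            raise ValueError("a hyperedge must connect at least 3 nodes")


def hyperedges_for(node: str, hyperedges: list) -> list:
    """Return every hyperedge that includes the given node."""
    return [h for h in hyperedges if node in h.members]


auth_flow = Hyperedge(
    "Authentication Flow",
    frozenset({"login", "verify", "token", "session"}),
)
containing_token = hyperedges_for("token", [auth_flow])
```

Storing the group as one object keeps the shared meaning ("these four nodes form one flow") intact, which a set of six separate pairwise edges would not convey.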
