Core Concepts
The mental models you need to understand how Graphify works.
The Knowledge Graph Model
A knowledge graph is a network of nodes (things) connected by edges (relationships). In Graphify:
- Nodes represent code entities (functions, classes, modules), concepts from documents, and rationale fragments (design decisions from comments)
- Edges represent relationships: calls, imports, contains, inherits, semantically_similar_to, rationale_for
- Communities are clusters of densely-connected nodes discovered automatically
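As a rough illustration of the model above, the sketch below stores nodes and typed edges in plain Python containers. The class names (`Node`, `Edge`, `KnowledgeGraph`), their fields, and the sample node ids are hypothetical and are not Graphify's actual data structures; only the relationship labels come from the list above.

```python
from dataclasses import dataclass, field

# Relationship labels listed in the text above.
RELATIONS = {
    "calls", "imports", "contains", "inherits",
    "semantically_similar_to", "rationale_for",
}


@dataclass(frozen=True)
class Node:
    id: str
    label: str
    file_type: str  # e.g. "code", "document", "rationale"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str

    def __post_init__(self):
        # Reject labels that are not in the documented relationship list.
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation}")


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, edge):
        # Both endpoints must already exist in the graph.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("edge endpoint missing from graph")
        self.edges.append(edge)

    def neighbors(self, node_id):
        """Return ids adjacent to node_id, ignoring edge direction."""
        out = set()
        for e in self.edges:
            if e.source == node_id:
                out.add(e.target)
            elif e.target == node_id:
                out.add(e.source)
        return out


# Hypothetical sample data: two code entities and one rationale fragment.
graph = KnowledgeGraph()
graph.add_node(Node("auth.login", "login", "code"))
graph.add_node(Node("db.get_user", "get_user", "code"))
graph.add_node(Node("why-1", "WHY: cache lookups", "rationale"))
graph.add_edge(Edge("auth.login", "db.get_user", "calls"))
graph.add_edge(Edge("why-1", "db.get_user", "rationale_for"))
```

Here `graph.neighbors("db.get_user")` returns both the calling function and the rationale fragment, which is the sense in which code and design intent live in one graph.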
Confidence Tiers
Every relationship in the graph is tagged with a confidence level. This is Graphify's honesty mechanism — you always know what was found vs. what was guessed.
| Tier | Score | Example |
|---|---|---|
| EXTRACTED | 1.0 (always) | An import statement, a direct function call |
| INFERRED | 0.5 – 0.9 | Two functions likely calling each other (name matching) |
| AMBIGUOUS | Unscored | A possible relationship flagged for human review |
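The table can be read as a set of constraints on each edge's score. The sketch below encodes them as a small validator; the `Tier` enum and `score_is_valid` function are illustrative names, and representing "Unscored" as `None` is an assumption.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    EXTRACTED = "EXTRACTED"
    INFERRED = "INFERRED"
    AMBIGUOUS = "AMBIGUOUS"


def score_is_valid(tier: Tier, score: Optional[float]) -> bool:
    """Check a score against the range the table gives for its tier."""
    if tier is Tier.EXTRACTED:
        return score == 1.0                      # always exactly 1.0
    if tier is Tier.INFERRED:
        return score is not None and 0.5 <= score <= 0.9
    return score is None                         # AMBIGUOUS carries no score
```

For example, `score_is_valid(Tier.INFERRED, 0.95)` is `False`, because 0.95 falls outside the 0.5 to 0.9 band reserved for inferred edges.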
How Confidence Influences the System
- Report sorting: AMBIGUOUS connections surface first in "surprising connections"
- Question generation: The system generates questions about ambiguous and inferred relationships
- Surprise scoring: Lower confidence + cross-file-type = higher surprise score
- God node filtering: INFERRED edges on god nodes trigger review questions
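To make the surprise-scoring rule concrete, here is a toy ranking that rewards low confidence and a mismatch in file type. The formula, the 0.5 cross-type bonus, and the treatment of unscored edges as confidence 0.0 are all assumptions chosen for illustration; they are not Graphify's actual weights.

```python
def surprise_score(confidence, source_file_type, target_file_type):
    """Toy score: higher when confidence is low and file types differ."""
    # Assumption: AMBIGUOUS edges (confidence None) count as confidence 0.0.
    conf = 0.0 if confidence is None else confidence
    uncertainty = 1.0 - conf
    # Illustrative weight for edges that cross file types.
    cross_type_bonus = 0.5 if source_file_type != target_file_type else 0.0
    return uncertainty + cross_type_bonus


# (source, target, confidence, source file_type, target file_type)
edges = [
    ("a", "b", 1.0, "code", "code"),       # EXTRACTED, same type
    ("c", "d", 0.6, "code", "document"),   # INFERRED, cross type
    ("e", "f", None, "code", "paper"),     # AMBIGUOUS, cross type
]

ranked = sorted(
    edges,
    key=lambda e: surprise_score(e[2], e[3], e[4]),
    reverse=True,
)
```

Under these assumed weights the AMBIGUOUS cross-type edge ranks first and the EXTRACTED same-type edge ranks last, which matches the ordering the bullets describe.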
Node Types
| Type | file_type | Created By |
|---|---|---|
| Function | code | tree-sitter AST parsing |
| Class | code | tree-sitter AST parsing |
| Import / Module | code | tree-sitter AST parsing |
| Concept | document / paper | Claude semantic extraction |
| Visual Concept | image | Claude vision extraction |
| Rationale | rationale | Comment patterns: # WHY:, # NOTE:, # HACK: |
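The rationale row relies on comment markers. A minimal sketch of that kind of matching is shown below; the regular expression and the `extract_rationale` helper are assumptions for illustration, and a line-based regex like this one would also match a marker that appears inside a string literal.

```python
import re

# Markers named in the table above; the regex itself is an assumption.
RATIONALE_RE = re.compile(r"#\s*(WHY|NOTE|HACK):\s*(.+)")


def extract_rationale(source: str):
    """Return (line_number, marker, text) for each rationale comment."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = RATIONALE_RE.search(line)
        if match:
            found.append((lineno, match.group(1), match.group(2).strip()))
    return found


# Hypothetical input file contents.
sample = """\
def retry(fn):
    # WHY: the upstream API is flaky
    for _ in range(3):  # HACK: fixed retry count until config lands
        fn()
"""

fragments = extract_rationale(sample)
```

Each tuple in `fragments` could then become a Rationale node linked to the enclosing function with a rationale_for edge.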
Communities
Graphify automatically groups related nodes into communities. Think of them as neighborhoods in your codebase — nodes that are densely connected end up grouped together.
Example communities: Authentication, Database, API Layer.
Topology-Based, Not Embedding-Based
The clustering is graph-topology-based. There is no separate vector database or embedding step. Semantic similarity edges that Claude extracts (semantically_similar_to) are already in the graph as INFERRED edges, so they influence community detection directly.
Leiden vs Louvain: Leiden (via graspologic) is preferred — it guarantees well-connected communities and runs faster. If graspologic is not installed, Graphify falls back to NetworkX's Louvain with tuned parameters (max_level=10, threshold=1e-4).
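The fallback described above amounts to an availability check on an optional dependency. The sketch below shows one way to express it with the standard library only; `pick_backend` and `LOUVAIN_PARAMS` are hypothetical names, and how Graphify actually performs the check is not stated in this document.

```python
import importlib.util

# Parameters quoted in the text for the NetworkX Louvain fallback.
LOUVAIN_PARAMS = {"max_level": 10, "threshold": 1e-4}


def pick_backend() -> str:
    """Prefer Leiden when graspologic is importable, else fall back to Louvain."""
    # find_spec looks the package up without importing it.
    if importlib.util.find_spec("graspologic") is not None:
        return "leiden"
    return "louvain"


backend = pick_backend()
params = LOUVAIN_PARAMS if backend == "louvain" else {}
```

Checking with `find_spec` avoids paying the import cost of a heavy library just to learn whether it is installed.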
Oversized Community Splitting
If a community contains more than 25% of graph nodes (and at least 10 nodes), Graphify recursively extracts its subgraph and re-runs Leiden. This prevents one giant cluster from dominating the analysis.
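The size test and the recursion can be sketched as follows. The function names are illustrative, and `detect` is a stand-in for re-running community detection on the subgraph (here any callable that takes a node list and returns a list of node lists); the guard against non-shrinking splits is an added safety assumption.

```python
def is_oversized(community_size, total_nodes, max_fraction=0.25, min_size=10):
    """True if the community holds more than 25% of the graph and >= 10 nodes."""
    return community_size >= min_size and community_size > max_fraction * total_nodes


def split_recursively(nodes, total_nodes, detect):
    """Split an oversized community until every part passes the size test."""
    if not is_oversized(len(nodes), total_nodes):
        return [nodes]
    parts = detect(nodes)
    # Assumption: stop if detection cannot produce strictly smaller parts.
    if len(parts) <= 1 or any(len(p) >= len(nodes) for p in parts):
        return [nodes]
    result = []
    for part in parts:
        result.extend(split_recursively(part, total_nodes, detect))
    return result


def halve(nodes):
    """Toy detector used only for this demonstration."""
    mid = len(nodes) // 2
    return [nodes[:mid], nodes[mid:]]


parts = split_recursively(list(range(40)), total_nodes=100, detect=halve)
```

With 100 nodes in the graph, a 40-node community is split once into two 20-node parts, and neither part exceeds the 25% threshold afterwards.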
Cohesion score: Each community gets a score = actual_intra_edges / max_possible_edges (range 0.0–1.0). Communities below 0.15 are flagged as candidates for splitting.
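A minimal version of that ratio is shown below. It assumes an undirected simple graph, so max_possible_edges is n(n-1)/2 for a community of n nodes, and it returns 0.0 for communities with fewer than two nodes; both choices are assumptions, as is the `cohesion` function name.

```python
def cohesion(members, edges):
    """actual_intra_edges / max_possible_edges for one community."""
    members = set(members)
    n = len(members)
    if n < 2:
        return 0.0  # assumption: no pair exists, so report zero cohesion
    max_possible = n * (n - 1) // 2
    # Count each undirected pair once and ignore self-loops.
    intra = {
        frozenset((u, v))
        for u, v in edges
        if u in members and v in members and u != v
    }
    return len(intra) / max_possible


community = {"a", "b", "c", "d"}
edges = [("a", "b"), ("b", "c"), ("b", "a"), ("a", "x")]
score = cohesion(community, edges)   # 2 intra edges out of 6 possible
needs_review = score < 0.15          # threshold from the text above
```

In this example the duplicate pair and the edge leaving the community are both excluded, so the score is 2/6 and the community is not flagged.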
God Nodes
God nodes are the highest-degree concepts: the ones everything else connects through. They're the architectural pillars. If you removed a god node, large parts of the graph would disconnect.
God Node Filtering
Nodes are sorted by degree (edge count), then filtered:
- Remove file-level hubs (label matches source filename; just containers)
- Remove method stubs (.method_name(); too granular)
- Remove isolates (degree ≤ 1; not well-connected)
- Remove concept nodes (empty source_file; manually injected)
- Return top N remaining by degree
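The filter steps above can be sketched as a single pass over degree-sorted nodes. The node dictionaries, the `god_nodes` function, the exact basename comparison, and the method-stub regex are assumptions made for illustration rather than Graphify's implementation.

```python
import os
import re

# Assumption: a method stub is a label of the form ".name()".
METHOD_STUB_RE = re.compile(r"^\.\w+\(\)$")


def god_nodes(nodes, top_n=10):
    """nodes: list of dicts with 'label', 'source_file', and 'degree' keys."""
    kept = []
    for node in sorted(nodes, key=lambda n: n["degree"], reverse=True):
        source = node.get("source_file") or ""
        if source and node["label"] == os.path.basename(source):
            continue  # file-level hub: label is just the filename
        if METHOD_STUB_RE.match(node["label"]):
            continue  # method stub: too granular
        if node["degree"] <= 1:
            continue  # isolate: not well-connected
        if not source:
            continue  # concept node: no source file
        kept.append(node)
    return kept[:top_n]


# Hypothetical sample data.
sample_nodes = [
    {"label": "auth.py", "source_file": "src/auth.py", "degree": 40},
    {"label": "AuthService", "source_file": "src/auth.py", "degree": 25},
    {"label": ".validate()", "source_file": "src/auth.py", "degree": 12},
    {"label": "Caching strategy", "source_file": "", "degree": 9},
    {"label": "get_user", "source_file": "src/db.py", "degree": 7},
    {"label": "helper", "source_file": "src/util.py", "degree": 1},
]

top = [n["label"] for n in god_nodes(sample_nodes, top_n=5)]
```

Only `AuthService` and `get_user` survive here: the highest-degree node is dropped because it is merely the file container, which is the case the first filter exists to catch.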
Surprising Connections
Edges are ranked by a multi-factor surprise score. Lower confidence and cross-file-type connections both raise it.
Hyperedges
Some relationships connect 3+ nodes and can't be expressed as pairwise edges. Examples:
- All classes implementing a shared protocol
- All functions in an authentication flow
- All concepts from a paper section forming one idea
Hyperedges are rendered as shaded convex hulls in graph.html.
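One simple way to represent such a group is a labeled set of node ids, as in the sketch below. The `Hyperedge` class, the `hyperedges_for` helper, and the sample node ids are hypothetical; the minimum of three members follows the definition above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    label: str
    members: frozenset

    def __post_init__(self):
        # A group of two nodes is just an ordinary pairwise edge.
        if len(self.members) < 3:
            raise ValueError("a hyperedge needs at least 3 nodes")


def hyperedges_for(node_id, hyperedges):
    """Return the labels of every hyperedge that contains node_id."""
    return [h.label for h in hyperedges if node_id in h.members]


# Hypothetical sample data.
auth_flow = Hyperedge(
    "authentication flow",
    frozenset({"login", "verify_token", "refresh_session"}),
)
protocol = Hyperedge(
    "implements Serializable",
    frozenset({"User", "Order", "Invoice", "login"}),
)

groups = hyperedges_for("login", [auth_flow, protocol])
```

Keeping the group as one object preserves its identity; expanding it into pairwise edges would lose the fact that the members belong together as a unit.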