# Graphify

**Turn any folder into a queryable knowledge graph.**

An AI coding assistant skill that reads your files, builds a persistent knowledge graph, and gives you back structure you didn't know was there.
## The Problem
AI coding assistants have no persistent memory of your codebase. Every session, they re-read raw files to establish context. This is expensive in tokens and loses structural understanding — the AI sees text, not architecture.
Graphify creates a compact, structured representation that persists across sessions. You pay the indexing cost once; every subsequent query reads the graph instead of raw files.
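The persistence model can be sketched with NetworkX (which Graphify builds on): pay the indexing cost once by serializing the graph, then answer later questions from the serialized form. The node names and the node-link JSON shape below are illustrative assumptions, not Graphify's actual `graph.json` schema.

```python
import json

import networkx as nx

# Toy graph standing in for an indexed codebase (node names are made up).
g = nx.DiGraph()
g.add_edge("app.py", "auth.py", relation="imports")
g.add_edge("auth.py", "hash_password", relation="defines")

# Pay the indexing cost once: serialize the structure
# (in practice this would be written to graph.json on disk).
payload = json.dumps(nx.node_link_data(g))

# Weeks later: reload and query the graph instead of re-reading files.
g2 = nx.node_link_graph(json.loads(payload), directed=True)
print(sorted(g2.successors("auth.py")))  # entities auth.py points at
```

The point of the round trip: the second session never touches `app.py` or `auth.py`; structural questions are answered from the graph alone.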
## Origin Story
On April 2, 2026, Andrej Karpathy posted about his raw/ folder workflow — dropping papers, code, screenshots, and notes into a folder for LLM consumption — and issued a public challenge:
> "I think there is room here for an incredible new product instead of a hacky collection of scripts."
Safi Shamsi built Graphify within 48 hours.
## What You Get
- **`graph.html`**: Interactive graph visualization. Click nodes, search, filter by community.
- **`GRAPH_REPORT.md`**: God nodes, surprising connections, suggested questions — a one-page architecture overview.
- **`graph.json`**: Persistent graph for querying weeks later without re-reading files.
- **`cache/`**: SHA256 cache — re-runs only process changed files.
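The cache check reduces to a content-hash comparison. A minimal sketch, assuming a flat path-to-digest map (a simplification of whatever Graphify actually stores under `cache/`):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Content hash of a file; identical bytes always hash identically."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths, cache: dict) -> list:
    """Return only files whose hash differs from the cached digest,
    updating the cache as a side effect. Unchanged files are filtered
    out before any extraction work happens, so re-runs skip them."""
    out = []
    for p in paths:
        digest = sha256_of(p)
        if cache.get(str(p)) != digest:
            cache[str(p)] = digest
            out.append(p)
    return out
```

On a second run with the same cache (persisted between sessions), an untouched file produces the same digest and is dropped from the work list.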
## What You'll Learn
- **Core Concepts**: Knowledge graphs, confidence tiers, communities, god nodes, and the surprise score algorithm.
- **Architecture & Pipeline**: The 7-stage pipeline, module map, data flow, and design principles that make it composable.
- **Implementation Deep Dive**: AST extraction, Leiden clustering, security model, caching strategy, and testing.
- **Use Cases & Workflows**: Onboarding, research integration, code review, incremental builds, and CLI commands.
- **Tech Stack & Ecosystem**: NetworkX, tree-sitter, Leiden, vis.js, platform integrations, and how to extend Graphify.
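Two of the concepts above are easy to sketch with NetworkX: god nodes (hubs whose degree far exceeds the average) and communities. Graphify uses Leiden clustering, which needs `igraph`/`leidenalg`; the closely related Louvain method built into NetworkX stands in for it here, and the "twice the average degree" threshold is an illustrative choice, not Graphify's rule.

```python
import networkx as nx

# Stand-in graph; in Graphify this would be the codebase graph.
g = nx.karate_club_graph()

# God nodes: nodes with far more connections than average.
avg_degree = sum(d for _, d in g.degree()) / g.number_of_nodes()
god_nodes = [n for n, d in g.degree() if d > 2 * avg_degree]

# Community detection (Louvain here as a stand-in for Leiden).
communities = nx.community.louvain_communities(g, seed=42)

print(god_nodes)        # the handful of hubs worth flagging in a report
print(len(communities)) # clusters usable as filters in graph.html
```

Every node lands in exactly one community, which is what makes communities useful as a visualization filter.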
## How It Compares
| Tool | Approach | Key Difference |
|---|---|---|
| Graphify | AST + LLM hybrid | Code parsed free via tree-sitter; LLM only for docs/images |
| Microsoft GraphRAG | Fully LLM-driven | Designed for enterprise text, not codebases; needs vector DB |
| code-review-graph | tree-sitter + SQLite | Narrower scope — optimizes review file sets only |
| FalkorDB CodeGraph | GraphRAG-SDK + Neo4j | Heavy infrastructure; requires external database |
## Key Architectural Insight
Graphify splits extraction into two passes:
- **Pass 1**: Deterministic AST parsing via tree-sitter for code — free, fast, reproducible. Same source always produces the same extraction.
- **Pass 2**: LLM subagents (Claude/GPT-4) for docs, papers, and images — probabilistic but honest via confidence tiers (EXTRACTED / INFERRED / AMBIGUOUS).
This hybrid means you only pay LLM costs for unstructured content. On a mixed corpus, code extraction is instant and free; semantic extraction costs tokens but only runs on changed files (cached by SHA256).
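The two-pass split comes down to a routing decision per file. A minimal sketch — the suffix set and pass names are illustrative assumptions, not Graphify's actual configuration:

```python
from pathlib import Path

# Illustrative: suffixes a tree-sitter grammar can parse deterministically.
CODE_SUFFIXES = {".py", ".js", ".ts", ".go", ".rs", ".java"}


def choose_pass(path: Path) -> str:
    """Pass 1 (AST) for code: free and reproducible.
    Pass 2 (LLM) for everything else: costs tokens, which is why it
    only runs on files whose SHA256 is absent from the cache."""
    return "ast" if path.suffix in CODE_SUFFIXES else "llm"


print(choose_pass(Path("auth.py")))    # code -> deterministic AST pass
print(choose_pass(Path("paper.pdf")))  # unstructured -> LLM pass
```

Because the routing is by file, a mostly-code corpus incurs almost no LLM cost at all.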
*Built from safishamsi/graphify*