# Graphify

**Turn any folder into a queryable knowledge graph.**

An AI coding assistant skill that reads your files, builds a persistent knowledge graph, and gives you back structure you didn't know was there.
## The Problem
AI coding assistants have no persistent memory of your codebase. Every session, they re-read raw files to establish context. This is expensive in tokens and loses structural understanding — the AI sees text, not architecture.
Graphify creates a compact, structured representation that persists across sessions. You pay the indexing cost once; every subsequent query reads the graph instead of raw files.
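The persistence model can be sketched with NetworkX (which Graphify builds on): pay the indexing cost once by serializing the graph, then answer later questions from the serialized form. The node names and the node-link JSON shape below are illustrative assumptions, not Graphify's actual `graph.json` schema.

```python
import json

import networkx as nx

# Toy graph standing in for an indexed codebase (node names are made up).
g = nx.DiGraph()
g.add_edge("app.py", "auth.py", relation="imports")
g.add_edge("auth.py", "hash_password", relation="defines")

# Pay the indexing cost once: serialize the structure
# (in practice this would be written to graph.json on disk).
payload = json.dumps(nx.node_link_data(g))

# Weeks later: reload and query the graph instead of re-reading files.
g2 = nx.node_link_graph(json.loads(payload), directed=True)
print(sorted(g2.successors("auth.py")))  # entities auth.py points at
```

The point of the round trip: the second session never touches `app.py` or `auth.py`; structural questions are answered from the graph alone.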
## Origin Story
On April 2, 2026, Andrej Karpathy posted about his raw/ folder workflow — dropping papers, code, screenshots, and notes into a folder for LLM consumption — and issued a public challenge:
> "I think there is room here for an incredible new product instead of a hacky collection of scripts."
Safi Shamsi built Graphify within 48 hours.
## What You Get
- **`graph.html`**: Interactive graph visualization. Click nodes, search, filter by community.
- **`GRAPH_REPORT.md`**: God nodes, surprising connections, suggested questions — a one-page architecture overview.
- **`graph.json`**: Persistent graph for querying weeks later without re-reading files.
- **`cache/`**: SHA256 cache — re-runs only process changed files.
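The cache check reduces to a content-hash comparison. A minimal sketch, assuming a flat path-to-digest map (a simplification of whatever Graphify actually stores under `cache/`):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Content hash of a file; identical bytes always hash identically."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths, cache: dict) -> list:
    """Return only files whose hash differs from the cached digest,
    updating the cache as a side effect. Unchanged files are filtered
    out before any extraction work happens, so re-runs skip them."""
    out = []
    for p in paths:
        digest = sha256_of(p)
        if cache.get(str(p)) != digest:
            cache[str(p)] = digest
            out.append(p)
    return out
```

On a second run with the same cache (persisted between sessions), an untouched file produces the same digest and is dropped from the work list.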
## What You'll Learn
- **Core Concepts**: Knowledge graphs, confidence tiers, communities, god nodes, and the surprise score algorithm.
- **Architecture & Pipeline**: The 7-stage pipeline, module map, data flow, and design principles that make it composable.
- **Implementation Deep Dive**: AST extraction, Leiden clustering, security model, caching strategy, and testing.
- **Use Cases & Workflows**: Onboarding, research integration, code review, incremental builds, and CLI commands.
- **Tech Stack & Ecosystem**: NetworkX, tree-sitter, Leiden, vis.js, platform integrations, and how to extend Graphify.
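Two of the concepts above are easy to sketch with NetworkX: god nodes (hubs whose degree far exceeds the average) and communities. Graphify uses Leiden clustering, which needs `igraph`/`leidenalg`; the closely related Louvain method built into NetworkX stands in for it here, and the "twice the average degree" threshold is an illustrative choice, not Graphify's rule.

```python
import networkx as nx

# Stand-in graph; in Graphify this would be the codebase graph.
g = nx.karate_club_graph()

# God nodes: nodes with far more connections than average.
avg_degree = sum(d for _, d in g.degree()) / g.number_of_nodes()
god_nodes = [n for n, d in g.degree() if d > 2 * avg_degree]

# Community detection (Louvain here as a stand-in for Leiden).
communities = nx.community.louvain_communities(g, seed=42)

print(god_nodes)        # the handful of hubs worth flagging in a report
print(len(communities)) # clusters usable as filters in graph.html
```

Every node lands in exactly one community, which is what makes communities useful as a visualization filter.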
## How It Compares
| Tool | Approach | Key Difference |
|---|---|---|
| Graphify | AST + LLM hybrid | Code parsed free via tree-sitter; LLM only for docs/images |
| Microsoft GraphRAG | Fully LLM-driven | Designed for enterprise text, not codebases; needs vector DB |
| code-review-graph | tree-sitter + SQLite | Narrower scope — optimizes review file sets only |
| FalkorDB CodeGraph | GraphRAG-SDK + Neo4j | Heavy infrastructure; requires external database |
## Key Architectural Insight
Graphify splits extraction into two passes:
- **Pass 1**: Deterministic AST parsing via tree-sitter for code — free, fast, reproducible. Same source always produces the same extraction.
- **Pass 2**: LLM subagents (Claude/GPT-4) for docs, papers, and images — probabilistic but honest via confidence tiers (EXTRACTED / INFERRED / AMBIGUOUS).
This hybrid means you only pay LLM costs for unstructured content. On a mixed corpus, code extraction is instant and free; semantic extraction costs tokens but only runs on changed files (cached by SHA256).
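The two-pass split comes down to a routing decision per file. A minimal sketch — the suffix set and pass names are illustrative assumptions, not Graphify's actual configuration:

```python
from pathlib import Path

# Illustrative: suffixes a tree-sitter grammar can parse deterministically.
CODE_SUFFIXES = {".py", ".js", ".ts", ".go", ".rs", ".java"}


def choose_pass(path: Path) -> str:
    """Pass 1 (AST) for code: free and reproducible.
    Pass 2 (LLM) for everything else: costs tokens, which is why it
    only runs on files whose SHA256 is absent from the cache."""
    return "ast" if path.suffix in CODE_SUFFIXES else "llm"


print(choose_pass(Path("auth.py")))    # code -> deterministic AST pass
print(choose_pass(Path("paper.pdf")))  # unstructured -> LLM pass
```

Because the routing is by file, a mostly-code corpus incurs almost no LLM cost at all.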
*Built from safishamsi/graphify*