Home

Graphify

Turn any folder into a queryable knowledge graph

An AI coding assistant skill that reads your files, builds a persistent knowledge graph, and gives you back structure you didn't know was there.

71.5x — fewer tokens per query
19 — languages supported
0 — infrastructure required

The Problem

AI coding assistants have no persistent memory of your codebase. Every session, they re-read raw files to establish context. This is expensive in tokens and loses structural understanding — the AI sees text, not architecture.

Graphify creates a compact, structured representation that persists across sessions. You pay the indexing cost once; every subsequent query reads the graph instead of raw files.

Origin Story

On April 2, 2026, Andrej Karpathy posted about his raw/ folder workflow — dropping papers, code, screenshots, and notes into a folder for LLM consumption — and issued a public challenge:

"I think there is room here for an incredible new product instead of a hacky collection of scripts."

Safi Shamsi built Graphify within 48 hours.

What You Get

graph.html

Interactive graph visualization. Click nodes, search, filter by community.

GRAPH_REPORT.md

God nodes, surprising connections, suggested questions — a one-page architecture overview.

graph.json

Persistent graph for querying weeks later without re-reading files.
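A sketch of what querying that persisted graph might look like. The schema here (a `nodes` list and an `edges` list with `source`/`target`/`relation` fields) is an assumption for illustration; Graphify's actual graph.json layout may differ.

```python
import json

# Hypothetical graph.json content -- the real schema may differ.
graph = json.loads('''{
  "nodes": [
    {"id": "auth.py", "type": "file"},
    {"id": "login",   "type": "function"}
  ],
  "edges": [
    {"source": "auth.py", "target": "login", "relation": "defines"}
  ]
}''')

def neighbors(graph, node_id):
    """Return the ids connected to node_id by any edge, in either direction."""
    found = []
    for e in graph["edges"]:
        if e["source"] == node_id:
            found.append(e["target"])
        elif e["target"] == node_id:
            found.append(e["source"])
    return found

print(neighbors(graph, "auth.py"))  # -> ['login']
```

The point of the format: a question like "what does auth.py define?" becomes a cheap lookup over a few kilobytes of JSON instead of a re-read of the source tree.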

cache/

SHA256 cache — re-runs only process changed files.
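The cache mechanism amounts to comparing each file's current SHA256 digest against the digest recorded on the last run. A minimal sketch of that skip logic, assuming a simple path-to-digest mapping (the names and cache layout here are illustrative, not Graphify's actual internals):

```python
import hashlib
import pathlib
import tempfile

def sha256_of(path):
    """Hash a file's bytes; identical content always yields the same digest."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def changed_files(paths, cache):
    """Return only the paths whose current digest differs from the cached one."""
    return [p for p in paths if cache.get(str(p)) != sha256_of(p)]

# Demo with a throwaway file; a real run would persist `cache` under cache/.
tmp = pathlib.Path(tempfile.mkdtemp()) / "a.txt"
tmp.write_text("hello")

cache = {}                                  # first run: nothing cached yet
assert changed_files([tmp], cache) == [tmp]

cache[str(tmp)] = sha256_of(tmp)            # simulate indexing + caching
assert changed_files([tmp], cache) == []    # second run: file is skipped
```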

How It Compares

Tool | Approach | Key Difference
Graphify | AST + LLM hybrid | Code parsed free via tree-sitter; LLM only for docs/images
Microsoft GraphRAG | Fully LLM-driven | Designed for enterprise text, not codebases; needs vector DB
code-review-graph | tree-sitter + SQLite | Narrower scope — optimizes review file sets only
FalkorDB CodeGraph | GraphRAG-SDK + Neo4j | Heavy infrastructure; requires external database

Key Architectural Insight

Graphify splits extraction into two passes:

Pass 1: Deterministic AST parsing via tree-sitter for code — free, fast, reproducible. The same source always produces the same extraction.

Pass 2: LLM subagents (Claude/GPT-4) for docs, papers, and images — probabilistic but honest, with each extraction tagged by confidence tier (EXTRACTED / INFERRED / AMBIGUOUS).

This hybrid means you only pay LLM costs for unstructured content. On a mixed corpus, code extraction is instant and free; semantic extraction costs tokens but only runs on changed files (cached by SHA256).
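The split above comes down to routing each file to the right pass by type. A minimal sketch of that dispatch, assuming illustrative extension lists (Graphify's actual routing rules are not documented here):

```python
from pathlib import Path

# Hypothetical extension sets -- the real tool supports 19 languages.
AST_EXTENSIONS = {".py", ".js", ".ts", ".go", ".rs", ".java"}  # pass 1: tree-sitter
LLM_EXTENSIONS = {".md", ".txt", ".pdf", ".png", ".jpg"}       # pass 2: LLM subagent

def route(path):
    """Decide which extraction pass handles a file."""
    ext = Path(path).suffix.lower()
    if ext in AST_EXTENSIONS:
        return "ast"   # deterministic and free: parsed locally, no tokens spent
    if ext in LLM_EXTENSIONS:
        return "llm"   # probabilistic: output tagged EXTRACTED/INFERRED/AMBIGUOUS
    return "skip"

print(route("src/main.py"))       # -> ast
print(route("notes/design.md"))   # -> llm
```

Because only the "llm" branch costs tokens, and the cache filters that branch down to changed files, LLM spend scales with the volume of edited unstructured content rather than with repository size.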

Built from safishamsi/graphify