HNSW Graph Construction
Qdrant builds an HNSW (Hierarchical Navigable Small World) graph for each segment's vector data. The graph is a multi-layer structure where layer 0 contains all points and each higher layer contains an exponentially smaller random subset, enabling O(log N) approximate nearest neighbor search.
During construction, each new point is inserted by finding the entry point at the topmost layer and performing a greedy search downward. At each layer from the point's assigned level down to 0, a beam search with width ef_construct finds candidate neighbors. Up to m neighbors are selected using a heuristic that balances proximity and diversity. Bidirectional edges are added between the new point and its selected neighbors.
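The per-layer beam search at the heart of construction can be sketched as follows. This is an illustrative simplification, not Qdrant's internals: the dict-based graph layout and function names are invented for the sketch, and `ef` plays the role of ef_construct at build time.

```python
import heapq

def search_layer(query, entry, adjacency, vectors, ef):
    """Greedy beam search on one HNSW layer (illustrative sketch).

    adjacency: {node_id: [neighbor ids]}, vectors: {node_id: tuple of floats}.
    Returns up to ef nearest candidates as (distance, node_id), sorted.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    d0 = dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    best = [(-d0, entry)]        # max-heap of the ef best results so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break                # closest candidate is already worse than
                                 # the worst of the best set: search is done
        for nb in adjacency[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(query, vectors[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # evict the current worst
    return sorted((-d, n) for d, n in best)
```

Insertion repeats this search from the point's assigned level down to layer 0, then prunes each layer's candidate list to m neighbors with the diversity heuristic.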
m (default 16): connections per node. Higher = better recall, more memory. Layer 0 uses m*2 connections.
ef_construct (default 100): build-time search width. Higher = better graph quality, slower indexing.
hnsw_ef (default 128): query-time beam width. Higher = better recall, slower queries.
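The build-time parameters map onto the collection's hnsw_config in Qdrant's REST API. A minimal creation request might look like this (the collection name and vector size are placeholders):

```
PUT /collections/my_collection
{
  "vectors": { "size": 768, "distance": "Cosine" },
  "hnsw_config": { "m": 16, "ef_construct": 100 }
}
```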
HNSW Search Algorithm
Search Flow Through the HNSW Graph
The hnsw_ef parameter controls recall vs. latency at query time. Higher values explore more nodes for better results at the cost of more distance computations. Setting exact: true bypasses HNSW entirely for a brute-force scan with perfect recall.
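Both knobs are set per request via the params object of a search call (collection name and vector values are placeholders):

```
POST /collections/my_collection/points/search
{
  "vector": [0.1, 0.2, 0.3],
  "limit": 10,
  "params": { "hnsw_ef": 256, "exact": false }
}
```

Setting "exact": true makes Qdrant ignore hnsw_ef and scan vectors directly.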
Filterable HNSW
Standard HNSW has a critical problem with filtered search: if a filter excludes, say, 90% of points, the subgraph induced by the matching points can become disconnected, and greedy traversal dead-ends before reaching the true neighbors. Qdrant solves this with a multi-strategy approach.
Query-Time Strategy Selection
The query planner dynamically selects the best strategy per segment based on filter selectivity:
Filter Strategy Decision Flow
The ACORN algorithm (introduced in v1.16) further improves filtered search by exploring two-hop neighbors during HNSW traversal: when a neighbor doesn't match the filter, ACORN checks that neighbor's neighbors, finding matching points that would otherwise require many more hops.
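The two-hop idea can be sketched as a candidate-expansion step. This is a simplification for illustration, not Qdrant's actual code; the graph layout and function names are invented:

```python
def expand_candidates(node, adjacency, matches_filter):
    """Collect filter-matching candidates around `node` (two-hop sketch).

    Matching direct neighbors are taken as-is; for each non-matching
    neighbor, its own neighbors are scanned, so the traversal can "hop
    over" filtered-out points instead of dead-ending on them.
    """
    candidates = set()
    for nb in adjacency[node]:
        if matches_filter(nb):
            candidates.add(nb)
        else:
            for nb2 in adjacency[nb]:   # second hop through a filtered-out node
                if nb2 != node and matches_filter(nb2):
                    candidates.add(nb2)
    return candidates
```

Under a highly selective filter this keeps the effective search graph connected at the cost of examining more edges per visited node.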
Quantization
Quantization compresses vectors from 32-bit floats to lower-precision representations, reducing memory footprint while maintaining search quality.
Scalar Quantization (int8)
Each float dimension is mapped linearly to [0, 255]. Reduces memory 4x with typically less than 1% recall loss. Works with virtually all embedding models -- the safest default choice.
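The linear mapping can be sketched in a few lines. This toy version uses plain min/max bounds; Qdrant additionally supports clipping outliers via a quantile setting, and the function names here are illustrative:

```python
def sq_params(values):
    """Derive the linear mapping for int8-style scalar quantization."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0   # guard against a constant vector
    return lo, scale

def quantize(x, lo, scale):
    return round((x - lo) / scale)     # integer in [0, 255]: 1 byte vs 4

def dequantize(q, lo, scale):
    return lo + q * scale              # reconstruction error is at most scale
```

Each stored dimension shrinks from a 4-byte float to 1 byte, giving the 4x memory reduction.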
Product Quantization
Divides vectors into sub-vectors and quantizes each with a learned codebook. Achieves 8-32x compression but with larger accuracy trade-offs. Best for high-dimensional vectors (768+).
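The encode/decode round trip can be sketched as a nearest-centroid lookup per sub-vector. The codebooks are assumed pre-trained (e.g. by k-means); this sketch is illustrative, not Qdrant's implementation:

```python
def pq_encode(vector, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    m = len(codebooks)                 # number of sub-quantizers
    sub_len = len(vector) // m
    codes = []
    for i, book in enumerate(codebooks):
        sub = vector[i * sub_len:(i + 1) * sub_len]
        codes.append(min(range(len(book)),
                         key=lambda c: sum((a - b) ** 2
                                           for a, b in zip(sub, book[c]))))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating centroids."""
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out
```

With 256 centroids per codebook each code fits in 1 byte, so a 768-dim float32 vector (3072 bytes) compresses to m bytes; m controls the compression/accuracy trade-off.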
Binary Quantization
Each float dimension becomes a single bit. Achieves 32x compression and enables fast Hamming-distance comparison. Works well only with models whose embeddings tolerate sign-only representation (e.g. OpenAI text-embedding-3-large, Cohere embed-v3).
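The bit-per-dimension encoding and the Hamming comparison it enables look like this (an illustrative sketch; real implementations pack bits into SIMD-friendly words):

```python
def binarize(vector):
    """One bit per dimension: 1 if the component is positive, packed
    into a single Python int so XOR gives the differing dimensions."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Hamming distance: count of dimensions whose signs disagree."""
    return bin(a ^ b).count("1")
```

A 1536-dim float32 vector (6144 bytes) shrinks to 192 bytes of bits, and distance becomes XOR plus a popcount.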
Storage Engine
Qdrant stores vectors in configurable formats: in-memory (fastest, most RAM), memory-mapped files (mmap -- uses OS page cache, allows datasets larger than RAM), and chunked mmap (optimized for large segments). Payloads use RocksDB-backed or mmap storage.
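Storage placement is chosen per collection. For example, vectors can be forced onto mmap storage with on_disk, and the optimizer told at what segment size (in KB) to convert to mmap; the collection name and threshold below are illustrative:

```
PUT /collections/my_collection
{
  "vectors": { "size": 768, "distance": "Cosine", "on_disk": true },
  "optimizers_config": { "memmap_threshold": 20000 }
}
```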
The payload indexing system (StructPayloadIndex) supports keyword, full-text, numeric range, geo-bounding box, datetime, UUID, and boolean indexes. These are built explicitly via API call -- you create an index on a field, and Qdrant builds the appropriate structure.
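Creating a payload index is a single REST call naming the field and its schema (collection and field name are placeholders):

```
PUT /collections/my_collection/index
{
  "field_name": "category",
  "field_schema": "keyword"
}
```

Other field_schema values select the other index types, e.g. "integer", "float", "text", "geo", "datetime", "uuid", "bool".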
Every mutation passes through the per-shard Write-Ahead Log before being applied to segments, ensuring crash recovery. On restart, uncommitted WAL entries are replayed to restore state.
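The recovery idea can be sketched generically. This is an in-memory toy of WAL replay, not Qdrant's log format: every operation is appended to the log before it is applied, and on restart any entries newer than the last durably applied operation are re-applied.

```python
def replay(wal, applied_up_to, apply):
    """Re-apply WAL entries newer than the last persisted operation.

    wal: list of (op_num, operation) in append order;
    applied_up_to: highest op_num known to be durable in segment
    storage before the crash.  Returns the new high-water mark.
    """
    for op_num, operation in wal:
        if op_num > applied_up_to:
            apply(operation)               # redo the uncommitted mutation
    return wal[-1][0] if wal else applied_up_to
```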