HNSW Graph Construction

Qdrant builds an HNSWHierarchical Navigable Small World -- a multi-layer graph index where layer 0 has all points, and each higher layer has an exponentially smaller random subset. Enables O(log N) approximate nearest neighbor search. graph for each segment's vector data. The graph is a multi-layered structure where layer 0 contains all points and each successive layer contains an exponentially decreasing random subset.

During construction, each new point is inserted by finding the entry point at the topmost layer and performing a greedy search downward. At each layer from the point's assigned level down to 0, a beam search with width ef_construct finds candidate neighbors. Up to m neighbors are selected using a heuristic that balances proximity and diversity. Bidirectional edges are added between the new point and its selected neighbors.

Key HNSW Parameters m (default 16): connections per node. Higher = better recall, more memory. Layer 0 uses m*2 connections.
ef_construct (default 100): build-time search width. Higher = better graph quality, slower indexing.
hnsw_ef (default 128): query-time beam width. Higher = better recall, slower queries.

HNSW Search Algorithm

Search Flow Through the HNSW Graph

1. Enter at the top layer via stored entry point (highest-level node)
2. Greedy descent (layers L_max to 1): move to nearest neighbor at each layer, beam size = 1
3. Beam search at layer 0: expand candidates using priority queue with width = hnsw_ef
4. Score unvisited neighbors, add better-than-worst to candidate and result queues
5. Return top-k results ranked by similarity score

The hnsw_ef parameter controls recall vs. latency at query time. Higher values explore more nodes for better results at the cost of more distance computations. Setting exact: true bypasses HNSW entirely for a brute-force scan with perfect recall.

Filterable HNSW

Standard HNSW has a critical problem with filtered search: if you filter out 90% of points, the graph becomes disconnected. Qdrant solves this with a multi-strategy approach.

Query-Time Strategy Selection

The query planner dynamically selects the best strategy per segment based on filter selectivity:

Filter Strategy Decision Flow

Evaluate filter selectivity: what fraction of points match?
Large result set: HNSW traversal, skip non-matching nodes
Medium: HNSW with extended ef to compensate for skipped nodes
Small: retrieve IDs from payload index, compute exact distances

The ACORNAn algorithm introduced in v1.16 that explores two-hop neighbors during filtered HNSW traversal. When a neighbor doesn't match the filter, ACORN checks its neighbors, finding matching points that would otherwise require many more hops. algorithm (v1.16+) further improves filtered search by performing two-hop exploration -- when a neighbor doesn't match, it checks that neighbor's neighbors.

Quantization

Quantization compresses vectors from 32-bit floats to lower-precision representations, reducing memory footprint while maintaining search quality.

Scalar Quantization (int8)

Each float dimension is mapped linearly to [0, 255]. Reduces memory 4x with typically less than 1% recall loss. Works with virtually all embedding models -- the safest default choice.

Product Quantization

Divides vectors into sub-vectors and quantizes each with a learned codebook. Achieves 8-32x compression but with larger accuracy trade-offs. Best for high-dimensional vectors (768+).

Binary Quantization

Each float dimension becomes a single bit. Achieves 32x compression and enables fast Hamming distance. Only works with compatible models (OpenAI text-embedding-3-large, Cohere embed-v3).

Asymmetric Quantization (v1.15+) The query vector remains unquantized while stored vectors are quantized. Distances are computed between full-precision query and quantized database vectors, improving accuracy without increasing memory usage. This is the default when quantization is enabled.

Storage Engine

Qdrant stores vectors in configurable formats: in-memory (fastest, most RAM), memory-mapped files (mmap -- uses OS page cache, allows datasets larger than RAM), and chunked mmap (optimized for large segments). Payloads use RocksDB-backed or mmap storage.

The payload indexing system (StructPayloadIndex) supports keyword, full-text, numeric range, geo-bounding box, datetime, UUID, and boolean indexes. These are built explicitly via API call -- you create an index on a field, and Qdrant builds the appropriate structure.

Every mutation passes through the per-shard Write-Ahead Log before being applied to segments, ensuring crash recovery. On restart, uncommitted WAL entries are replayed to restore state.