Implementation Details — Qdrant Course

🗂 Source Map

lib/segment/src/index/hnsw_index/hnsw.rs -- HNSWIndex struct
lib/segment/src/index/hnsw_index/graph_layers.rs -- GraphLayers + search
lib/segment/src/vector_storage/query_scorer/mod.rs -- QueryScorer trait
lib/segment/src/types.rs -- Distance enum
lib/segment/src/segment/mod.rs -- Segment struct
lib/segment/src/entry/entry_point.rs -- SegmentEntry trait
lib/collection/src/shards/local_shard/mod.rs -- LocalShard struct
lib/segment/src/payload_storage/payload_storage_enum.rs -- PayloadStorageEnum
lib/segment/src/index/hnsw_index/graph_links.rs -- GraphLinks storage

Repository: qdrant/qdrant @ v1.17.1

Getting Started

bash

# Start Qdrant server
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Install Python client
pip install qdrant-client

python

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")

# Create a collection of 384-dimensional vectors compared by cosine distance
client.create_collection(
    collection_name="my_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Upsert points with vectors and payloads
client.upsert(collection_name="my_collection", points=[
    PointStruct(id=1, vector=[0.05, 0.61, 0.76, ...],
               payload={"city": "Berlin", "price": 299}),
])

# Search with filtering
results = client.query_points(
    collection_name="my_collection",
    query=[0.2, 0.1, 0.9, ...],
    query_filter={"must": [{"key": "city", "match": {"value": "Berlin"}}]},
    limit=10,
)

Source Code Walkthrough

HNSWIndex -- The Core Index Structure

HNSWIndex is the primary vector index. It wraps the multi-layer graph together with handles to vector storage, optional quantized vectors, and the payload index used for filtering.

Rust lib/segment/src/index/hnsw_index/hnsw.rs

pub struct HNSWIndex {
    id_tracker: Arc<AtomicRefCell<IdTrackerEnum>>,
    vector_storage: Arc<AtomicRefCell<VectorStorageEnum>>,
    quantized_vectors: Arc<AtomicRefCell<Option<QuantizedVectors>>>,
    payload_index: Arc<AtomicRefCell<StructPayloadIndex>>,
    config: HnswGraphConfig,
    path: PathBuf,
    graph: GraphLayers,
    searches_telemetry: HNSWSearchesTelemetry,
    is_on_disk: bool,
}

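Note the Arc<AtomicRefCell<...>> pattern: Arc gives shared ownership, and AtomicRefCell adds interior mutability with borrow rules checked at runtime (many readers or one writer). AtomicRefCell panics on a conflicting borrow rather than blocking, so the surrounding code is responsible for keeping writes exclusive. The optional quantized_vectors is populated only when quantization is configured.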

GraphLayers -- The Multi-Layer HNSW Graph

Rust lib/segment/src/index/hnsw_index/graph_layers.rs

pub struct GraphLayers {
    pub(super) hnsw_m: HnswM,
    pub(super) links: GraphLinks,
    pub(super) entry_points: EntryPoints,
    pub(super) visited_pool: VisitedPool,
}

// The core search method signature
fn search_on_level(
    &self,
    level_entry: ScoredPointOffset,
    level: usize,
    ef: usize,
    points_scorer: &mut FilteredScorer,
    is_stopped: &AtomicBool,
) -> CancellableResult<FixedLengthPriorityQueue<ScoredPointOffset>>

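FilteredScorer wraps a distance scorer with a filter check, so the beam search skips non-matching points during traversal instead of filtering results afterwards. VisitedPool hands out reusable visited lists, so a search does not score the same point twice and does not have to reallocate its bookkeeping on every query.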
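To make the interplay of ef, the visited set, and the filter concrete, here is a minimal Python sketch of a beam search on a single layer. It is a simplification under stated assumptions, not Qdrant's implementation: the graph is a plain dict of adjacency lists, a dot product stands in for the real scorer, and the names links, vectors, and passes_filter are hypothetical.

```python
import heapq

def search_on_level(links, vectors, query, entry, ef, passes_filter):
    """Toy single-layer beam search. Higher score means closer."""
    def score(idx):
        # Dot product stands in for the real distance scorer (assumption).
        return sum(a * b for a, b in zip(vectors[idx], query))

    visited = {entry}
    entry_score = score(entry)
    # Max-heap of points still to expand (scores negated for heapq).
    candidates = [(-entry_score, entry)]
    # Min-heap of the best results so far, worst on top, at most ef entries.
    nearest = [(entry_score, entry)] if passes_filter(entry) else []

    while candidates:
        neg_score, current = heapq.heappop(candidates)
        # Stop when the best open candidate is worse than a full result set.
        if len(nearest) >= ef and -neg_score < nearest[0][0]:
            break
        for neighbor in links[current]:
            if neighbor in visited:
                continue  # already handled, never score twice
            visited.add(neighbor)
            if not passes_filter(neighbor):
                continue  # filtered out: neither scored nor expanded
            s = score(neighbor)
            if len(nearest) < ef or s > nearest[0][0]:
                heapq.heappush(candidates, (-s, neighbor))
                heapq.heappush(nearest, (s, neighbor))
                if len(nearest) > ef:
                    heapq.heappop(nearest)  # drop the current worst
    return sorted(nearest, reverse=True)

# Hypothetical five-point graph; point 2 is the true nearest neighbor.
vectors = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.7, 0.7], 4: [0.1, 0.9]}
links = {0: [1, 3], 1: [0, 3], 2: [3, 4], 3: [0, 1, 2, 4], 4: [2, 3]}
query = [0.0, 1.0]

unfiltered = search_on_level(links, vectors, query, entry=0, ef=2,
                             passes_filter=lambda idx: True)
filtered = search_on_level(links, vectors, query, entry=0, ef=2,
                           passes_filter=lambda idx: idx != 2)
```

With the filter excluding point 2, the search still reaches point 4 through point 3. In this toy version a filtered-out point is never expanded, so a restrictive filter can disconnect parts of the graph.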

QueryScorer -- Distance Computation

Rust lib/segment/src/vector_storage/query_scorer/mod.rs

pub trait QueryScorer {
    type TVector: ?Sized;
    fn score_stored(&self, idx: PointOffsetType) -> ScoreType;
    fn score_stored_batch(
        &self, ids: &[PointOffsetType], scores: &mut [ScoreType]
    );
    fn score(&self, v2: &Self::TVector) -> ScoreType;
    fn score_internal(
        &self, a: PointOffsetType, b: PointOffsetType
    ) -> ScoreType;
}

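score_stored_batch scores several stored vectors in one call, which gives the implementation room for optimizations such as CPU prefetching. The trait shown here has no metric parameter of its own; implementations are generic over a metric type (TMetric), which plugs in different distance functions without changing the search algorithm.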
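The shape of this interface can be mirrored in a few lines of Python. The sketch below is illustrative only: ToyQueryScorer, dot, and neg_manhattan are hypothetical names, the storage is a plain list, and the metric is passed as a function rather than a generic type parameter.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neg_manhattan(a, b):
    # Negated in this toy so that, like dot, a larger score means closer.
    return -sum(abs(x - y) for x, y in zip(a, b))

class ToyQueryScorer:
    """Scores a fixed query against stored vectors with a pluggable metric."""

    def __init__(self, query, storage, metric):
        self.query = query
        self.storage = storage  # list index plays the role of PointOffsetType
        self.metric = metric

    def score_stored(self, idx):
        return self.metric(self.query, self.storage[idx])

    def score_stored_batch(self, ids, scores):
        # Fills a caller-provided buffer, like the &mut [ScoreType] argument.
        for pos, idx in enumerate(ids):
            scores[pos] = self.score_stored(idx)

    def score(self, v2):
        return self.metric(self.query, v2)

    def score_internal(self, a, b):
        # Compares two stored points; the query is not involved.
        return self.metric(self.storage[a], self.storage[b])

storage = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 2.0]

dot_scores = [0.0] * 3
ToyQueryScorer(query, storage, dot).score_stored_batch([0, 1, 2], dot_scores)

manhattan_scores = [0.0] * 3
ToyQueryScorer(query, storage, neg_manhattan).score_stored_batch([0, 1, 2], manhattan_scores)
```

Swapping the metric changes the scores but not the calling code, which is the point of keeping the distance function out of the search loop.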

Distance Enum

Rust lib/segment/src/types.rs

pub enum Distance {
    Cosine,
    Euclid,
    Dot,
    Manhattan,
}

Cosine and Dot are similarities, so larger scores are better; Euclid and Manhattan are distances, so smaller values are better. Qdrant pre-normalizes Cosine vectors at insertion, so at query time Cosine reduces to a dot product on unit vectors.
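The equivalence between Cosine and Dot on unit vectors is easy to check numerically. The following standalone Python sketch uses made-up vectors and helper names (normalize, euclid) and does not call any Qdrant code; it also shows the two sort directions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def normalize(v):
    n = norm(v)
    return [x / n for x in v]

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [3.0, 4.0]
stored = [1.0, 2.0]

# Full cosine formula versus a plain dot product on pre-normalized vectors.
full_cosine = cosine(query, stored)
dot_on_units = dot(normalize(query), normalize(stored))

# Similarities sort descending, distances sort ascending.
points = {"a": [3.0, 4.0], "b": [4.0, 3.0], "c": [-3.0, -4.0]}
by_dot = sorted(points, key=lambda k: dot(query, points[k]), reverse=True)
by_euclid = sorted(points, key=lambda k: euclid(query, points[k]))
```

Normalizing once at insertion moves the two square roots and the division out of the query path, which is why the pre-normalization pays off.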

Segment -- The Storage Unit

Rust lib/segment/src/segment/mod.rs

pub struct Segment {
    pub uuid: Uuid,
    pub version: Option<SeqNumberType>,
    pub segment_path: PathBuf,
    pub id_tracker: Arc<AtomicRefCell<IdTrackerEnum>>,
    pub vector_data: HashMap<VectorNameBuf, VectorData>,
    pub payload_index: Arc<AtomicRefCell<StructPayloadIndex>>,
    pub payload_storage: Arc<AtomicRefCell<PayloadStorageEnum>>,
    pub appendable_flag: bool,
    pub segment_type: SegmentType,
    pub segment_config: SegmentConfig,
}

The vector_data HashMap maps each vector name to a VectorData struct (index, storage, and quantized representations). This is how named vectors work: every vector name gets its own independent index. appendable_flag distinguishes write-optimized (appendable) segments from read-optimized ones.
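A toy model of the name-to-VectorData mapping may help. In this Python sketch, ToySegment and ToyVectorData are hypothetical stand-ins, and a brute-force dot-product scan replaces the per-name HNSW index.

```python
from dataclasses import dataclass, field

@dataclass
class ToyVectorData:
    """Per-name storage; a brute-force scan stands in for the real index."""
    dim: int
    storage: dict = field(default_factory=dict)  # point id -> vector

    def search(self, query, limit):
        scored = [(sum(q * x for q, x in zip(query, vec)), pid)
                  for pid, vec in self.storage.items()]
        scored.sort(reverse=True)
        return [pid for _, pid in scored[:limit]]

@dataclass
class ToySegment:
    vector_data: dict  # vector name -> ToyVectorData

    def upsert(self, point_id, vectors):
        # One point may carry several named vectors of different sizes.
        for name, vec in vectors.items():
            data = self.vector_data[name]
            if len(vec) != data.dim:
                raise ValueError(f"{name}: expected dim {data.dim}, got {len(vec)}")
            data.storage[point_id] = vec

    def search(self, name, query, limit=1):
        # Only the index registered under `name` is consulted.
        return self.vector_data[name].search(query, limit)

segment = ToySegment({"text": ToyVectorData(dim=2), "image": ToyVectorData(dim=3)})
segment.upsert(1, {"text": [1.0, 0.0], "image": [0.0, 0.0, 1.0]})
segment.upsert(2, {"text": [0.0, 1.0], "image": [0.0, 1.0, 0.0]})

text_hit = segment.search("text", [1.0, 0.0])
image_hit = segment.search("image", [0.0, 1.0, 0.0])
```

Because each name has its own storage and index, the two searches can return different points for the same collection, and the vectors can have different dimensions.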

LocalShard -- Shard Coordination

Rust lib/collection/src/shards/local_shard/mod.rs

pub struct LocalShard {
    collection_name: CollectionId,
    pub(super) segments: LockedSegmentHolder,
    pub(super) wal: RecoverableWal,
    pub(super) update_handler: Arc<Mutex<UpdateHandler>>,
    pub(super) path: PathBuf,
    pub(super) optimizers: ArcSwap<Vec<Arc<Optimizer>>>,
}

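Holding the optimizers in an ArcSwap lets the whole optimizer list be replaced atomically, so configuration changes take effect without a restart. On crash recovery, the WAL (RecoverableWal) replays the entries that were logged but not yet committed.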
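The replay idea can be sketched generically in Python. This shows write-ahead-log recovery in general, not the internals of RecoverableWal: the entry layout (sequence number, operation, key, value) and the function names are assumptions made for the example.

```python
def apply(state, op, key, value):
    if op == "upsert":
        state[key] = value
    elif op == "delete":
        state.pop(key, None)

def replay(wal, state, applied_version):
    """Re-applies every logged operation newer than the persisted version."""
    for seq, op, key, value in wal:
        if seq <= applied_version:
            continue  # already reflected in the persisted state
        apply(state, op, key, value)
        applied_version = seq
    return applied_version

# Hypothetical crash: three operations were logged, only the first was persisted.
wal = [
    (1, "upsert", 1, "a"),
    (2, "upsert", 2, "b"),
    (3, "delete", 1, None),
]
state = {1: "a"}

version = replay(wal, state, applied_version=1)
# Replaying a second time is a no-op because every entry is now <= version.
version_again = replay(wal, state, applied_version=version)
```

Tracking the last applied sequence number is what makes replay safe to repeat: entries at or below that number are skipped, so a second crash during recovery does not double-apply operations.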