High-Level Design
Qdrant follows a client-server architecture: the server is a standalone Rust binary exposing REST and gRPC APIs. (gRPC, a high-performance RPC framework built on HTTP/2 and Protocol Buffers, is served on port 6334 for low-latency, high-throughput operations.) The system is layered, with each layer having a clear responsibility, from API handling down to storage.
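As a concrete illustration of the client-server model, a minimal sketch of a search request against Qdrant's HTTP API is shown below. The host, collection name, and vector are placeholders; the endpoint path and body shape follow Qdrant's public REST API, but check the API reference for your version before relying on them.

```python
import json

# Placeholder host and collection; Qdrant serves REST on 6333 by default
# (gRPC on 6334, as noted above).
host = "http://localhost:6333"
collection = "my_collection"

# Body of a points-search request: query vector, result count,
# and a flag asking the server to return stored payloads.
body = {
    "vector": [0.1, 0.2, 0.3, 0.4],
    "limit": 5,
    "with_payload": True,
}

# The request a client would issue (not actually sent here).
request_line = f"POST {host}/collections/{collection}/points/search"
print(request_line)
print(json.dumps(body))
```

In practice most applications use an official client library rather than raw HTTP, but the wire-level shape stays this simple.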
[Figure: Qdrant System Architecture]
Design Decisions
Rust over C++ or Go
Qdrant chose Rust for memory safety without garbage collection overhead. A vector database holds large data in memory and performs latency-sensitive operations. Rust's ownership model prevents memory leaks and data races at compile time, while zero-cost abstractions and SIMD intrinsics deliver C-level performance. No GC means no unpredictable pauses during search.
Client-Server over Embedded
Unlike embedded databases, Qdrant runs as a server. This enables multi-client access, horizontal scaling via sharding and replication, and operational features like rolling upgrades. The trade-off is network overhead, but for AI applications making API calls, this is negligible compared to embedding model latency.
HNSW over IVF or LSH
HNSW provides the best combination of recall and speed for general-purpose vector search. Its graph structure also lends itself to filter-aware extensions -- Qdrant can skip non-matching nodes during traversal, which is much harder with partition-based indexes like IVF.
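The filter-aware traversal idea can be sketched on a toy proximity graph: the walk visits non-matching nodes (so the graph stays connected) but only matching nodes are eligible as results. This is a simplified single-layer illustration, not Qdrant's actual HNSW implementation; the graph, vectors, and payload values are made up.

```python
import heapq

# Toy proximity graph: node -> neighbor list.
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 4], 4: [2, 3]}
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0),
           3: (2.0, 0.0), 4: (1.0, 1.0)}
payload = {0: "red", 1: "blue", 2: "red", 3: "blue", 4: "red"}

def dist2(a, b):
    """Squared Euclidean distance (order-preserving, cheaper than sqrt)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def filtered_search(query, entry, allow):
    """Best-first walk from an entry node. Non-matching nodes are
    traversed for connectivity but never returned as results."""
    visited = {entry}
    frontier = [(dist2(vectors[entry], query), entry)]
    best = None
    while frontier:
        d, node = heapq.heappop(frontier)
        if allow(node) and (best is None or d < best[0]):
            best = (d, node)
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (dist2(vectors[nb], query), nb))
    return best

# Nearest point to (1.9, 0.1) whose payload is "blue".
result = filtered_search((1.9, 0.1), entry=0,
                         allow=lambda n: payload[n] == "blue")
```

A partition-based index like IVF cannot do this as naturally: if the filter eliminates most of a partition, the index still pays to scan it or loses recall by skipping it entirely.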
Segment-Based Storage
The segment architecture separates write-optimized and read-optimized data paths. New writes go to small, unindexed appendable segments. The optimizer periodically seals these and builds HNSW indexes, creating immutable search-optimized segments. This avoids the tension between write and search performance.
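The write/read split above can be sketched in a few lines. The class and function names here are illustrative, not Qdrant's internal types: writes land in a mutable appendable segment, and an optimizer step seals it into an immutable read-optimized segment (which, in Qdrant, is where the HNSW index would be built).

```python
class AppendableSegment:
    """Write-optimized: accepts inserts, searched by brute force."""
    def __init__(self):
        self.points = []

    def insert(self, point_id, vector):
        self.points.append((point_id, vector))

class IndexedSegment:
    """Read-optimized and immutable; in Qdrant this is where an
    HNSW index over the points would live."""
    def __init__(self, points):
        self.points = tuple(points)  # sealed: no further writes

def optimize(segment):
    """Seal an appendable segment into an immutable indexed one
    (stand-in for Qdrant's background optimizer)."""
    return IndexedSegment(segment.points)

# New writes go to the small appendable segment...
seg = AppendableSegment()
for i in range(3):
    seg.insert(i, [float(i)] * 4)

# ...which the optimizer later seals for search.
sealed = optimize(seg)
```

Because sealed segments are immutable, index construction never blocks incoming writes: new data simply accumulates in a fresh appendable segment.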