Vectors in RAM: N * D * 4 bytes for float32. 1M vectors at 1536d = ~5.7 GB.
HNSW graph: ~N * m * 2 * 8 bytes. 1M vectors, m=16 = ~256 MB.
With scalar quantization (int8): Vector RAM drops to N * D * 1 byte -- 4x reduction. 1M at 1536d = ~1.4 GB.
With mmap: Vectors on disk, RAM depends on OS page cache. Full dataset can exceed RAM.
Rule of thumb: Budget ~2 bytes/dimension/point for quantized + indexed storage. 10M points at 768d = ~15 GB.
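The budgeting formulas above are simple arithmetic; a small helper (names are illustrative, not part of any Qdrant API) reproduces the figures:

```python
def ram_bytes(n, dim, bytes_per_dim=4, m=16):
    """Estimate RAM per the rules above: vectors = N * D * bytes
    (4 for float32, 1 for int8), HNSW links = ~N * m * 2 * 8 bytes."""
    vectors = n * dim * bytes_per_dim
    graph = n * m * 2 * 8
    return vectors, graph

vec, graph = ram_bytes(1_000_000, 1536)
print(round(vec / 2**30, 1))    # ~5.7 GB of float32 vectors
print(graph // 10**6)           # ~256 MB of HNSW links

vec_int8, _ = ram_bytes(1_000_000, 1536, bytes_per_dim=1)
print(round(vec_int8 / 2**30, 1))  # ~1.4 GB with int8 quantization
```

Note the mixed units: vector sizes are reported in binary GB (GiB), the graph in decimal MB, matching the figures quoted above.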
Cosine: Measures angle between vectors, ignoring magnitude. Best for most text embedding models (OpenAI, sentence-transformers). Qdrant pre-normalizes at insert time, making it as fast as Dot Product.
Dot Product: Measures both direction and magnitude. Use when magnitude carries meaning (popularity-weighted embeddings). Identical to Cosine on L2-normalized vectors.
Euclidean: Straight-line distance. Use for geometric applications or when the model documentation specifically recommends L2.
When in doubt, use Cosine. It is the most universally compatible choice.
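The equivalence claimed above (dot product equals cosine on L2-normalized vectors) is easy to verify, and is exactly why pre-normalizing at insert time lets the cheaper dot product stand in for cosine:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

a, b = [3.0, 4.0], [4.0, 3.0]
an, bn = normalize(a), normalize(b)
# Dot product of normalized vectors == cosine of the originals.
print(abs(dot(an, bn) - cosine(a, b)) < 1e-9)  # True
```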
Scalar (int8): 4x memory reduction, <1% recall loss. Works with all models. Safest default.
Binary: 32x reduction with Hamming distance. Only for compatible models (OpenAI text-embedding-3-large, Cohere embed-v3). Test recall first.
Product: 8-32x reduction for high-dim vectors (1536+). Larger accuracy trade-off.
No quantization: When accuracy is paramount and RAM is abundant, or for small datasets.
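A minimal sketch of the idea behind scalar quantization: map each float to one byte using a per-vector min/max range. This is illustrative only; production implementations differ in details (e.g., Qdrant's scalar quantization exposes a quantile option to clip outliers before computing the range).

```python
def quantize(v):
    """Map floats to uint8 codes over the vector's [min, max] range."""
    lo, hi = min(v), max(v)
    scale = (hi - lo) / 255 or 1.0  # avoid div-by-zero for constant vectors
    codes = [round((x - lo) / scale) for x in v]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Approximate reconstruction: error is at most one quantization step."""
    return [lo + c * scale for c in codes]

v = [0.12, -0.53, 0.98, 0.07]
codes, lo, scale = quantize(v)
restored = dequantize(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(v, restored))
print(max_err < scale)  # True: 4x smaller storage, bounded error
```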
Payload filtering (recommended): Add tenant_id to every point, create keyword index, include in every query filter. Qdrant's filter-aware HNSW handles this efficiently.
Shard-per-tenant (v1.16+): Large tenants get dedicated shards, small tenants share a fallback shard. Physical isolation with cost efficiency.
Collection-per-tenant: Strongest isolation but scales poorly beyond ~100 tenants due to per-collection overhead.
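The payload-filtering pattern can be sketched as a query body in the shape of Qdrant's REST filter schema (plain dict; the tenant_id field name follows the convention above, and field names should be checked against your Qdrant version's API reference):

```python
def tenant_query(tenant_id, query_vector, limit=10):
    """Build a search body that scopes results to one tenant via a
    `must` filter on the indexed tenant_id keyword field."""
    return {
        "vector": query_vector,
        "limit": limit,
        "filter": {
            "must": [
                {"key": "tenant_id", "match": {"value": tenant_id}}
            ]
        },
    }

body = tenant_query("acme-corp", [0.1, 0.2, 0.3])
print(body["filter"]["must"][0]["key"])  # tenant_id
```

The important discipline is that the filter is attached to every query path, not applied post-hoc, so the filter-aware HNSW traversal can use it.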
1. Enable scalar quantization with always_ram=true.
2. Tune hnsw_ef (start at 128; lower for latency, raise for recall).
3. Build payload indexes on all filtered fields.
4. Use indexed_only: true to skip unindexed segments.
5. Right-size the HNSW m parameter (16-32).
6. Monitor the optimizer: a rising segment count means it is falling behind.
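The collection-level and query-level knobs from this checklist can be sketched as Qdrant REST-style request bodies (plain dicts here; verify field names against the API reference for your Qdrant version):

```python
# Collection creation body: quantization kept in RAM (step 1) and
# a right-sized HNSW graph (step 5).
collection_config = {
    "vectors": {"size": 1536, "distance": "Cosine"},
    "hnsw_config": {"m": 16},
    "quantization_config": {
        "scalar": {"type": "int8", "always_ram": True}
    },
}

# Per-query search params: recall/latency knob (step 2) and
# skipping unindexed segments (step 4).
search_params = {
    "hnsw_ef": 128,
    "indexed_only": True,
}

print(collection_config["quantization_config"]["scalar"]["always_ram"])  # True
```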
Performance: Qdrant is typically 5-20x faster, especially with filtering. SIMD-optimized, filterable HNSW vs. pgvector's general-purpose extension.
Scalability: Qdrant supports horizontal sharding and replication. pgvector is limited to PostgreSQL's options.
Features: Qdrant offers sparse vectors, named multi-vectors, multiple quantization methods, hybrid search with RRF.
pgvector wins when: small dataset (<1M vectors), simple filtering, you want a single database, operational simplicity matters most.
100M+ vectors with appropriate configuration. 100M at 768d with scalar quantization needs ~77 GB for quantized vectors + ~26 GB for the HNSW graph (m=16, per the formula above). At ~103 GB total, it fits on 128 GB RAM.
Beyond ~200M vectors, distributed deployment with multiple shards is recommended. Benchmarks show 3ms p99 latency for 1M OpenAI embeddings.
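The arithmetic behind the 100M-point budget is a direct application of the earlier formulas (decimal GB here). The vector term is fixed by the dimensionality; the graph term depends on the HNSW m value, shown with m=16, the rule-of-thumb default used earlier:

```python
n, dim, m = 100_000_000, 768, 16
quantized_gb = n * dim * 1 / 10**9   # int8: 1 byte per dimension
graph_gb = n * m * 2 * 8 / 10**9     # HNSW links: N * m * 2 * 8 bytes
print(round(quantized_gb, 1))                # 76.8
print(round(graph_gb, 1))                    # 25.6
print(round(quantized_gb + graph_gb, 1))     # 102.4
```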
Upserts: New points go to the appendable segment. Old versions are soft-deleted via ID tracker. During search, only the latest version is returned.
Deletes: Soft-deleted via ID tracker. Points remain in HNSW graph but are excluded from results. VacuumOptimizer periodically rebuilds segments without deleted points.
Why not modify HNSW in-place? Removing graph nodes requires complex rewiring that degrades quality. Soft-delete + periodic rebuild is simpler, faster per operation, and produces better graph quality.
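The soft-delete pattern can be illustrated with a toy segment (class and method names are illustrative, not Qdrant's internals): tombstones accumulate in an ID-tracker set, search filters them out without touching the graph, and a vacuum pass rebuilds without them.

```python
class Segment:
    """Toy model of soft-delete + periodic rebuild."""

    def __init__(self):
        self.points = {}      # point_id -> vector (stand-in for the HNSW graph)
        self.deleted = set()  # ID tracker's tombstones

    def upsert(self, point_id, vector):
        self.points[point_id] = vector
        self.deleted.discard(point_id)  # a re-upserted ID is live again

    def delete(self, point_id):
        self.deleted.add(point_id)  # graph untouched; excluded at search time

    def search_ids(self):
        # Search sees only live points, even though deleted ones remain stored.
        return [pid for pid in self.points if pid not in self.deleted]

    def vacuum(self):
        # VacuumOptimizer analogue: rebuild the segment without deleted points.
        self.points = {p: v for p, v in self.points.items()
                       if p not in self.deleted}
        self.deleted.clear()

seg = Segment()
seg.upsert(1, [0.1])
seg.upsert(2, [0.2])
seg.delete(1)
print(seg.search_ids())   # [2]  (point 1 hidden but still stored)
seg.vacuum()
print(len(seg.points))    # 1   (point 1 physically removed)
```

The delete path is O(1), and the expensive graph rebuild is amortized across many operations, which is the trade-off described above.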