Should You Use DuckDB?
Whether DuckDB is a good fit comes down to your workload. Four common scenarios:

If your workload is analytical, fits on one machine, and doesn't require multi-user concurrency, you're in DuckDB's sweet spot -- fast OLAP performance with zero server setup.

If your workload is analytical but the data may exceed single-machine limits, DuckDB still works well up to roughly 500GB--1TB with disk spilling. For truly massive datasets, consider MotherDuck (cloud DuckDB) or distributed systems like ClickHouse, Spark, or BigQuery.

If you need concurrency, DuckDB supports multiple concurrent readers but only one writer at a time. Read-heavy workloads can open the same database in READ_ONLY mode from multiple processes; for full read-write concurrency, a client-server database like PostgreSQL is more appropriate.

If your workload is transactional, DuckDB is the wrong tool: it is designed for OLAP, not OLTP. For point reads and writes, use PostgreSQL, MySQL, or SQLite -- DuckDB allows one writer at a time and is optimized for scanning, not key-value lookups.
When to Use It
DuckDB excels in five categories of workloads, all centered on analytical processing without the overhead of a server.
Ad-Hoc Analytical Queries
You have Parquet, CSV, or JSON files and need answers fast. DuckDB queries these directly without a loading step -- SELECT AVG(price) FROM 'sales.parquet' WHERE region = 'EMEA' just works. This is DuckDB's sweet spot: a data scientist with a laptop and a question.
Data Pipeline Transformations
Replace pandas or Spark for small-to-medium ETL transformations (up to ~100GB). DuckDB's SQL engine is faster and more memory-efficient than pandas for aggregations and joins. FinQore cut their financial ETL pipeline from 8 hours to 8 minutes by replacing PostgreSQL with DuckDB.
Embedded Analytics
Ship DuckDB inside your product to provide analytical query capabilities without requiring users to set up a database server. Evidence uses DuckDB as a universal SQL engine for BI, and Rill uses it as their analytics backbone (3x--30x faster than SQLite for analytical queries).
Notebook Data Exploration
In Jupyter, R, or Observable notebooks, DuckDB provides instant SQL querying of DataFrames, Arrow tables, and files. Hex reports 5--10x speedups in notebook execution after switching to DuckDB.
Browser-Based Analytics
DuckDB compiles to WebAssembly, enabling analytical queries directly in web browsers. South Australia's government uses duckdb-wasm for their climate change data dashboard. Mosaic used it to explore 18M data points from the Gaia star catalog entirely in-browser.
When NOT to Use It
DuckDB is deliberately focused on analytical workloads. These are the scenarios where it is the wrong tool.
High-Concurrency OLTP Workloads
DuckDB is not a replacement for PostgreSQL, MySQL, or SQLite for transaction-heavy applications with hundreds of concurrent users doing point reads and writes. It supports one writer at a time and is optimized for scanning, not key-value lookups.
Multi-Terabyte Distributed Datasets
DuckDB runs on a single machine. If your data exceeds what one machine can handle (typically beyond ~500GB--1TB), use distributed systems like Apache Spark, ClickHouse, or BigQuery. MotherDuck extends DuckDB to the cloud with hybrid execution, but for truly massive datasets, native distributed systems are more appropriate.
Real-Time Streaming Ingestion
DuckDB is designed for batch-oriented analytical queries, not continuous streaming. For real-time event processing, use Kafka + Flink or similar streaming architectures. DuckDB works well for querying the results of streaming pipelines after they land in Parquet files.
Multi-User Shared Database
DuckDB is an embedded database -- it's designed for single-user or single-application access. If multiple services need to share a database with concurrent read-write access, use a client-server database like PostgreSQL.
Real-World Examples
Production deployments across industries show DuckDB handling everything from carbon analytics to AI dataset exploration.
Watershed -- Carbon Analytics
Watershed processes carbon footprint data for enterprises. They store customer datasets as Parquet files on Google Cloud Storage (largest: ~750MB, 17M rows) and use DuckDB to translate natural-language analytics requests into SQL. DuckDB handles 75,000 daily queries with 10x performance gains over their previous PostgreSQL setup.
Okta -- Enterprise Security
Okta, a Fortune 500 identity provider, uses DuckDB to process security telemetry at massive scale -- 7.5 trillion records. DuckDB's ability to efficiently scan and aggregate columnar data makes it viable for security analytics workloads that would traditionally require dedicated distributed infrastructure.
Hugging Face -- AI Dataset Access
Hugging Face integrated DuckDB to provide SQL querying across their 150,000+ AI/ML datasets. Users query datasets directly using hf:// protocol URLs, making it trivial to explore and filter training data without downloading entire datasets.
NSW Department of Education -- Data Portal
Australia's NSW Department of Education uses DuckDB as part of a modern data stack (Dagster + dbt + dlt + Evidence) for their education data portal. DuckDB enables local-first development and testing of analytical pipelines before deploying to production.
Ibis Project -- Large-Scale Analytics
The Ibis team processed 1.1 billion PyPI package download rows in 38 seconds on a laptop using only 1GB of RAM, demonstrating DuckDB's efficiency for large-scale analytical processing on commodity hardware.