High-Level Design

MLflow follows a client-server architecture with a clear separation between metadata storage (backend store) and file storage (artifact store). The tracking server acts as the gateway for all operations.

This design was deliberate: ML workflows produce two fundamentally different types of data. Metadata is small, structured, and needs efficient querying. Artifacts are large, unstructured, and need scalable storage. Mixing them would compromise one or the other.

System Components


MLflow Architecture

ML Client (SDK): logging and queries
Tracking Server: REST API + UI
Backend Store: metadata (SQL)
Artifact Store: files (S3/GCS)
Model Registry: versioning

Design Decisions

REST API over Direct Database Access

Clients never talk to the backend store directly — everything goes through the tracking server’s REST API. This provides access control, schema evolution, and the ability to swap storage backends without changing client code. The trade-off is an extra network hop, mitigated by batched and async logging.
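The batching mitigation can be sketched in pure Python: the client queues metric writes and flushes them as a single request, so the cost of the extra hop is paid once per batch rather than once per call. The class names and `FakeTransport` below are illustrative stand-ins, not MLflow's actual client internals (though `/api/2.0/mlflow/runs/log-batch` is a real endpoint of its REST API).

```python
import json

class FakeTransport:
    """Stands in for an HTTP session posting to the tracking server's REST API."""
    def __init__(self):
        self.requests = []

    def post(self, path, payload):
        self.requests.append((path, payload))

class BatchingClient:
    """Queues metric writes and flushes them in one REST call (illustrative sketch)."""
    def __init__(self, transport, batch_size=100):
        self.transport = transport
        self.batch_size = batch_size
        self.pending = []

    def log_metric(self, run_id, key, value, step=0):
        self.pending.append({"run_id": run_id, "key": key, "value": value, "step": step})
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.transport.post("/api/2.0/mlflow/runs/log-batch", json.dumps(self.pending))
            self.pending = []

transport = FakeTransport()
client = BatchingClient(transport, batch_size=50)
for step in range(100):
    client.log_metric("run-1", "loss", 1.0 / (step + 1), step)
client.flush()
print(len(transport.requests))  # 100 metric writes collapse into 2 HTTP requests
```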

Separation of Metadata and Artifacts

This is MLflow’s most important architectural decision. Parameters and metrics go to a fast SQL database; model files and plots go to cheap object storage. You get the query performance of PostgreSQL for experiment comparison and the scale of S3 for artifact durability.
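A toy model of this split, using an in-memory SQLite database for metadata and a dict standing in for an object store (the function names are illustrative, not MLflow's storage code):

```python
import sqlite3

# Metadata goes into a relational store that supports efficient querying;
# artifact bytes go into a flat key/value object store (stand-in for S3/GCS).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE params (run_id TEXT, key TEXT, value TEXT)")

object_store = {}  # path -> bytes

def log_param(run_id, key, value):
    db.execute("INSERT INTO params VALUES (?, ?, ?)", (run_id, key, str(value)))

def log_artifact(run_id, path, data):
    object_store[f"{run_id}/{path}"] = data

log_param("run-1", "lr", 0.01)
log_param("run-2", "lr", 0.1)
log_artifact("run-1", "model.pkl", b"\x80\x04...")  # large blob never touches the DB

# Experiment comparison is a SQL query over small, structured rows:
rows = db.execute("SELECT run_id, value FROM params WHERE key = 'lr'").fetchall()
print(rows)  # [('run-1', '0.01'), ('run-2', '0.1')]
```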

Why Stateless? The tracking server stores no state itself — all state lives in the backend and artifact stores. This means you can run multiple server instances behind a load balancer for high availability. A server crash loses no data, because data has already been persisted to the stores.
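The property can be illustrated with a toy model: two "server" instances share one backend, so losing an instance loses no data. The names here are illustrative, not MLflow internals.

```python
# Because servers hold no state, any instance can serve any request.
shared_backend = {}  # stands in for the shared SQL backend store

class TrackingServer:
    def __init__(self, backend):
        self.backend = backend  # reference to shared store; no instance-local state

    def log_param(self, run_id, key, value):
        self.backend.setdefault(run_id, {})[key] = value

    def get_param(self, run_id, key):
        return self.backend[run_id][key]

server_a = TrackingServer(shared_backend)
server_b = TrackingServer(shared_backend)

server_a.log_param("run-1", "lr", 0.01)   # request routed to instance A
del server_a                               # instance A "crashes"
print(server_b.get_param("run-1", "lr"))  # instance B still serves the data: 0.01
```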

Pluggable Storage Backends

Both stores are abstracted behind interfaces (AbstractStore for tracking, artifact repository interfaces for artifacts). You can run SQLite locally, PostgreSQL in staging, and Databricks in production — without changing ML code.
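A minimal sketch of scheme-based store resolution, assuming MLflow's general approach of picking an implementation from the tracking URI's scheme; the store classes below are empty placeholders, not MLflow's actual implementations:

```python
from urllib.parse import urlparse

# Placeholder store implementations behind a common interface.
class SqlAlchemyStore: ...
class FileStore: ...
class RestStore: ...

STORE_REGISTRY = {
    "sqlite": SqlAlchemyStore,
    "postgresql": SqlAlchemyStore,
    "file": FileStore,
    "http": RestStore,
    "https": RestStore,
}

def get_store(tracking_uri):
    """Resolve a store implementation from the URI scheme."""
    scheme = urlparse(tracking_uri).scheme or "file"
    return STORE_REGISTRY[scheme]()

# Same ML code, different environments — only the URI changes:
print(type(get_store("sqlite:///mlflow.db")).__name__)              # SqlAlchemyStore
print(type(get_store("https://mlflow.example.com")).__name__)       # RestStore
```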

Fluent API with Thread-Local Context

The fluent API (mlflow.log_param()) uses thread-local storage to track the active run. This makes the API clean for interactive use but can cause confusion in multi-threaded code. The explicit MlflowClient API avoids this by requiring run IDs for every call.
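The thread-local behavior can be demonstrated with the standard library alone. This models the mechanism, not MLflow's actual context code: a run "started" on the main thread is invisible to a worker thread, which is exactly the confusion the explicit client API avoids.

```python
import threading

# Minimal model of the fluent API's active-run tracking: each thread
# sees its own "active run" slot.
_context = threading.local()

def start_run(run_id):
    _context.run_id = run_id

def active_run():
    return getattr(_context, "run_id", None)

start_run("run-main")
print(active_run())  # 'run-main' on the main thread

seen = []
worker = threading.Thread(target=lambda: seen.append(active_run()))
worker.start()
worker.join()
print(seen)  # [None] — the worker thread has no active run
```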