High-Level Design
MLflow follows a client-server architecture with a clear separation between metadata storage (backend store) and file storage (artifact store). The tracking server acts as the gateway for all operations.
This design was deliberate: ML workflows produce two fundamentally different types of data. Metadata is small, structured, and needs efficient querying. Artifacts are large, unstructured, and need scalable storage. Mixing them would compromise one or the other.
System Components
[Interactive diagram: MLflow Architecture]
Design Decisions
REST API over Direct Database Access
Clients never talk to the backend store directly — everything goes through the tracking server’s REST API. This provides access control, schema evolution, and the ability to swap storage backends without changing client code. The trade-off is an extra network hop, mitigated by batched and async logging.
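The batching mitigation can be sketched as follows. This is a toy illustration of the idea, not MLflow's real client internals: `BatchingLogger` and its attributes are invented names, and `sent_batches` stands in for what would be POST requests to the server's batch-logging endpoint.

```python
from dataclasses import dataclass, field

@dataclass
class BatchingLogger:
    """Toy sketch: buffer metric calls locally and send them to the
    tracking server in one batched request instead of one per call."""
    batch_size: int = 3
    _buffer: list = field(default_factory=list)
    sent_batches: list = field(default_factory=list)  # stands in for REST calls

    def log_metric(self, key, value):
        self._buffer.append((key, value))
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._buffer:
            # One simulated batched POST to the tracking server
            self.sent_batches.append(list(self._buffer))
            self._buffer.clear()

logger = BatchingLogger(batch_size=3)
for step in range(7):
    logger.log_metric("loss", 1.0 / (step + 1))
logger.flush()
print(len(logger.sent_batches))  # → 3 (batches of 3, 3, and 1)
```

Seven individual `log_metric` calls become three network round trips, which is why the extra hop through the REST API stays cheap in practice.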
Separation of Metadata and Artifacts
This is MLflow’s most important architectural decision. Parameters and metrics go to a fast SQL database; model files and plots go to cheap object storage. You get the query performance of PostgreSQL for experiment comparison and the scale of S3 for artifact durability.
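A deployment reflecting this split might look like the sketch below, using the tracking server's `--backend-store-uri` and `--artifacts-destination` flags. The hostname, database credentials, and bucket name are placeholders, not values from the source.

```shell
# Hypothetical deployment: metadata in PostgreSQL, artifacts in S3.
# Hostnames, credentials, and bucket names are placeholders.
mlflow server \
  --backend-store-uri postgresql://mlflow:secret@db.internal:5432/mlflow \
  --artifacts-destination s3://ml-artifacts/mlflow \
  --host 0.0.0.0 --port 5000
```

Queries like "show runs where accuracy > 0.9" hit the SQL database; downloading a 2 GB model file streams from object storage. Neither store has to do the other's job.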
Pluggable Storage Backends
Both stores are abstracted behind interfaces (AbstractStore for tracking, artifact repository interfaces for artifacts). You can run SQLite locally, PostgreSQL in staging, and Databricks in production — without changing ML code.
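The pattern behind this can be sketched with a toy analogue of the store interface. The class names and methods below are simplified inventions for illustration, not MLflow's actual `AbstractStore` API:

```python
from abc import ABC, abstractmethod

class AbstractTrackingStore(ABC):
    """Toy analogue of a pluggable tracking-store interface."""
    @abstractmethod
    def log_param(self, run_id: str, key: str, value: str) -> None: ...
    @abstractmethod
    def get_param(self, run_id: str, key: str) -> str: ...

class InMemoryStore(AbstractTrackingStore):
    """Stands in for SQLite in local development; a PostgreSQL-backed
    implementation would satisfy the same interface."""
    def __init__(self):
        self._params = {}
    def log_param(self, run_id, key, value):
        self._params[(run_id, key)] = value
    def get_param(self, run_id, key):
        return self._params[(run_id, key)]

def train(store: AbstractTrackingStore, run_id: str):
    # ML code depends only on the interface, never on a concrete backend,
    # so swapping SQLite -> PostgreSQL -> Databricks needs no change here.
    store.log_param(run_id, "lr", "0.01")

store = InMemoryStore()
train(store, "run-1")
print(store.get_param("run-1", "lr"))  # → 0.01
```

The `train` function is the point: it never names a backend, so the storage choice becomes a deployment decision rather than a code change.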
Fluent API with Thread-Local Context
The fluent API (mlflow.log_param()) uses thread-local storage to track the active run. This makes the API clean for interactive use but can cause confusion in multi-threaded code. The explicit MlflowClient API avoids this by requiring run IDs for every call.