The Building Blocks of MLflow

MLflow organizes ML work around a few core abstractions. Understanding these concepts is the foundation for everything else in the platform.

Experiment
A named container for related training runs. Groups all attempts at solving one ML problem.

Analogy: Think of an experiment as a lab notebook. A scientist has one notebook per research question. Inside that notebook, every attempt (run) is recorded — different reagent amounts, temperatures, and observations. The notebook keeps everything organized.

Experiments define the organizational boundary for finding, comparing, and sharing related work. They also set the default artifact storage location for all runs within them.

Run
A single execution of ML code — one training session with specific inputs and outputs.

Analogy: If the experiment is the lab notebook, a run is a single page. It records exactly what you did (parameters), what happened (metrics), and what you produced (artifacts). Each page has a unique ID for future reference.

Runs are the atomic unit of work in MLflow. Every parameter, metric, artifact, and tag is tied to a run. Comparing models means comparing runs.

Parameters
Key-value pairs describing inputs — hyperparameters like learning rate, batch size, model type.

Analogy: Parameters are recipe ingredients. When a chef tries a new recipe, they write down “2 cups flour, 1 cup sugar, 350°F oven.” If it works, they know exactly what to reproduce. If it fails, they know what to change.

Parameters are logged once per run and are immutable. They answer the "what did I configure?" question, which is essential for reproducibility. MLflow accepts parameter values up to 6000 characters long.

Metrics
Numerical measurements tracking model performance — accuracy, loss, F1 score, latency.

Analogy: Metrics are vital signs monitors. Just as a heart monitor tracks heart rate and oxygen over time, metrics track your model’s health during training — is accuracy improving? Is loss plateauing? Is overfitting starting?

Unlike parameters, metrics can be logged at different steps (epochs), creating time series. MLflow stores the full history for learning curve visualization and comparison across runs.

Artifacts
Output files — model weights, plots, data samples, configuration files.

Analogy: Artifacts are physical lab products. The notebook records what you did, but the test tube (model file), photographs (plots), and data printouts (eval results) are the artifacts. You store them carefully because they are what you ship to production.

Artifacts are stored separately from metadata (in S3, GCS, or local filesystem) because they can be gigabytes in size. This separation is what makes MLflow scalable.

Model (MLmodel Format)
A standard packaging format wrapping model weights with metadata, flavors, and dependencies.

Analogy: The MLmodel format is a shipping container for models. Just as a shipping container has standardized dimensions so any port can handle it, the MLmodel format provides a standardized wrapper so any deployment system can serve the model — regardless of framework.

The key is the flavor system: a model can have multiple flavors (e.g., sklearn + python_function). The pyfunc flavor is the universal interface — any MLflow model can be called with model.predict(data).
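An abridged sketch of what an MLmodel file looks like for a scikit-learn model carrying both flavors; field values here are hypothetical placeholders, and real files include additional fields such as `mlflow_version` and `signature`:

```yaml
artifact_path: model
flavors:
  python_function:              # universal flavor: load with mlflow.pyfunc
    loader_module: mlflow.sklearn
    python_version: 3.10.12
  sklearn:                      # native flavor: load with mlflow.sklearn
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.3.0
run_id: <run-id-of-producing-run>   # ties the model back to its run
```

A deployment tool that only understands pyfunc can serve this model, while a data scientist can still load it back as a native sklearn estimator.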

Model Registry
A centralized store for versioning, staging, and promoting models to production.

Analogy: The model registry is a wine cellar catalog. Each wine (model) has a name and vintage (version). The sommelier tracks which are aging (staging), ready to serve (production), or retired (archived). Anyone can look up a wine’s origin and status.

The registry provides aliases (“champion”, “challenger”), version tracking, and audit trails. It answers the critical question: “which model is deployed right now?”

How They Fit Together

Here is the flow from writing ML code to deploying a model in production:

MLflow Lifecycle Flow:

Set Experiment → Start Run → Log Params → Log Metrics → Log Model → Register Model → Deploy

When you run a training script, MLflow creates a run inside your experiment. During training, parameters are logged once, and metrics are logged at each step. The trained model is saved as an artifact using the MLmodel format. If it performs well, you register it in the model registry, creating a new version that can be promoted to production.

💡
Key Insight: The separation between metadata (params, metrics, tags in the backend store) and files (models, plots in the artifact store) is what makes MLflow scale. Small structured data goes to a fast database; large binary files go to cheap object storage.