MinIO acts as an object storage backend for workflow artifacts, offering a scalable and S3-compatible way to store and retrieve the outputs of your computational processes.

Let’s see it in action. Imagine a simple workflow that processes a CSV file, generates a plot, and saves both the processed data and the plot as artifacts.

from minio import Minio
from minio.error import S3Error
import pandas as pd
import matplotlib.pyplot as plt
import os

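# MinIO connection details (minioadmin/minioadmin are the default credentials
# of a local development instance; use real credentials anywhere else)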
MINIO_ENDPOINT = "localhost:9000"
MINIO_ACCESS_KEY = "minioadmin"
MINIO_SECRET_KEY = "minioadmin"
BUCKET_NAME = "workflow-artifacts"

# Initialize MinIO client
client = Minio(
    MINIO_ENDPOINT,
    access_key=MINIO_ACCESS_KEY,
    secret_key=MINIO_SECRET_KEY,
    secure=False,  # Set to True if using TLS
)

# --- Workflow Step 1: Load and Process Data ---
def process_data(input_csv_path):
    df = pd.read_csv(input_csv_path)
    df['new_column'] = df['value'] * 2
    processed_csv_path = "processed_data.csv"
    df.to_csv(processed_csv_path, index=False)
    return processed_csv_path

# --- Workflow Step 2: Generate Plot ---
def generate_plot(processed_csv_path):
    df = pd.read_csv(processed_csv_path)
    plt.figure(figsize=(10, 6))
    plt.bar(df['category'], df['new_column'])
    plt.title("Processed Data Plot")
    plt.xlabel("Category")
    plt.ylabel("Processed Value")
    plot_path = "processed_plot.png"
    plt.savefig(plot_path)
    plt.close()  # Release the figure so repeated calls do not accumulate open figures
    return plot_path

# --- Workflow Step 3: Upload Artifacts to MinIO ---
def upload_artifact(file_path, object_name):
    try:
        # Make bucket if it doesn't exist
        found = client.bucket_exists(BUCKET_NAME)
        if not found:
            client.make_bucket(BUCKET_NAME)
            print(f"Bucket '{BUCKET_NAME}' created.")
        else:
            print(f"Bucket '{BUCKET_NAME}' already exists.")

        # Upload the file
        client.fput_object(BUCKET_NAME, object_name, file_path)
        print(f"Successfully uploaded '{file_path}' to MinIO as '{object_name}' in bucket '{BUCKET_NAME}'.")
    except S3Error as exc:
        print(f"Error uploading artifact: {exc}")
        raise  # Re-raise so the cleanup step does not delete files that were never uploaded

if __name__ == "__main__":
    # Create a dummy input CSV for demonstration
    dummy_data = {'category': ['A', 'B', 'C', 'D'], 'value': [10, 20, 15, 25]}
    dummy_df = pd.DataFrame(dummy_data)
    input_csv = "input_data.csv"
    dummy_df.to_csv(input_csv, index=False)

    # Run workflow
    processed_data_file = process_data(input_csv)
    plot_file = generate_plot(processed_data_file)

    # Upload artifacts
    upload_artifact(processed_data_file, "processed_data/data.csv")
    upload_artifact(plot_file, "plots/report.png")

    # Clean up local files
    os.remove(input_csv)
    os.remove(processed_data_file)
    os.remove(plot_file)

This script demonstrates a basic workflow: loading data, transforming it, generating a plot, and then storing both the transformed data and the plot in MinIO. The minio-py SDK handles the interaction. MinIO is an object store rather than a filesystem: each file is uploaded as an object, identified by its name, inside a bucket.

The core problem MinIO solves here is providing a centralized, durable, and accessible location for the outputs of complex or distributed workflows. Instead of scattering files across local disks, ad hoc cloud buckets, and network shares, all artifacts land in one place. This matters for reproducibility and auditing, and it lets downstream processes find and consume these outputs by bucket and object name alone.
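To make the "downstream processes" point concrete, here is a minimal sketch of a consumer that fetches every artifact under a prefix. It assumes a client object that behaves like the minio-py client from the script above (list_objects and fget_object are real minio-py methods); the helper name download_artifacts and the local directory layout are illustrative choices, not part of the original workflow.

```python
import os


def download_artifacts(client, bucket_name, prefix, dest_dir):
    """Download every object under `prefix` into `dest_dir`; return the local paths."""
    downloaded = []
    # recursive=True walks the whole prefix instead of stopping at the first "/".
    for obj in client.list_objects(bucket_name=bucket_name, prefix=prefix, recursive=True):
        if obj.is_dir:
            continue
        # Mirror the object name below dest_dir, e.g. plots/report.png.
        local_path = os.path.join(dest_dir, *obj.object_name.split("/"))
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        client.fget_object(
            bucket_name=bucket_name,
            object_name=obj.object_name,
            file_path=local_path,
        )
        downloaded.append(local_path)
    return downloaded
```

With the script's client, download_artifacts(client, BUCKET_NAME, "plots/", "downloads") would be one way for a later step to pick up the uploaded plot.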

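Internally, MinIO is an object storage server that can run on a single node or as a distributed cluster. In multi-drive deployments it uses erasure coding: each uploaded object is split into data and parity shards that are spread across drives, and a read reassembles the object from whichever shards are available, so the data survives the loss of some drives or nodes. A single-drive development instance stores objects without that redundancy, but the API is the same. Because MinIO is S3-compatible, you can also use standard S3 tools and SDKs, which makes it interoperable with a large ecosystem of applications.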
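The idea of splitting an object into pieces and rebuilding it after a loss can be illustrated with a toy example. This sketch uses a single XOR parity shard, which is deliberately much simpler than the Reed-Solomon erasure coding MinIO uses and can recover from only one missing shard. The function names are hypothetical and nothing here reflects MinIO's actual implementation.

```python
from functools import reduce


def split_with_parity(data: bytes, data_shards: int) -> list:
    """Split data into equal-size shards and append one XOR parity shard."""
    shard_len = -(-len(data) // data_shards)  # ceiling division
    padded = data.ljust(shard_len * data_shards, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(data_shards)]
    # Parity byte i is the XOR of byte i from every data shard.
    parity = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*shards))
    return shards + [parity]


def recover_shard(shards: list) -> bytes:
    """Rebuild the single missing shard (marked None) by XOR-ing the others."""
    present = [s for s in shards if s is not None]
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*present))


pieces = split_with_parity(b"processed_data.csv contents", 4)
lost = pieces[2]
pieces[2] = None  # simulate one failed drive
assert recover_shard(pieces) == lost
```

Real erasure coding generalizes this so that several shards can be lost at once, at the cost of storing more parity.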

The key levers you control are:

  • Endpoint, Access Key, Secret Key: These define how your application connects to the MinIO instance.
  • Bucket Name: This is the top-level container for your objects, analogous to a filesystem directory. You define logical buckets for different types of artifacts or workflows.
  • Object Name: This is the unique identifier for an object within a bucket, akin to a file path. The namespace inside a bucket is flat, but you can use prefixes (e.g., processed_data/data.csv) to group objects as if they lived in directories.
  • secure parameter: Determines whether to use TLS/SSL for the connection.
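As a rough sketch of how these levers might be managed outside the script, the snippet below reads the connection settings from environment variables and builds prefixed object names. The environment variable names, the MinioSettings class, and the workflow/run_id naming scheme are all assumptions made for illustration; MinIO itself does not prescribe any of them.

```python
import os
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class MinioSettings:
    endpoint: str
    access_key: str
    secret_key: str
    secure: bool


def load_settings(env=os.environ) -> MinioSettings:
    """Read connection settings from environment variables (names are hypothetical)."""
    return MinioSettings(
        endpoint=env.get("MINIO_ENDPOINT", "localhost:9000"),
        access_key=env["MINIO_ACCESS_KEY"],  # KeyError here means credentials are missing
        secret_key=env["MINIO_SECRET_KEY"],
        secure=env.get("MINIO_SECURE", "false").lower() in ("1", "true", "yes"),
    )


def artifact_name(workflow: str, run_id: str, filename: str) -> str:
    """Build a prefixed object name such as 'csv-report/run-001/data.csv'."""
    # Object names always use "/" as the separator, regardless of the local OS.
    return posixpath.join(workflow, run_id, filename)
```

The resulting settings map directly onto the constructor used earlier: Minio(settings.endpoint, access_key=settings.access_key, secret_key=settings.secret_key, secure=settings.secure).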

When uploading, the client.fput_object(BUCKET_NAME, object_name, file_path) call is the workhorse. It takes the local file_path, reads its content, and sends it to MinIO to be stored as an object named object_name within BUCKET_NAME. The MinIO server then handles the distribution and persistence of that data.
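One refinement worth considering: fput_object accepts a content_type argument, and without it objects are stored as application/octet-stream. The sketch below guesses the type from the file extension so that, for example, a PNG plot is served with the right header. The wrapper name upload_with_content_type is hypothetical, and the client is assumed to behave like the minio-py client used in the script.

```python
import mimetypes


def upload_with_content_type(client, bucket_name, object_name, file_path):
    """Upload a file, setting Content-Type from its extension."""
    guessed, _ = mimetypes.guess_type(file_path)
    # Fall back to the generic binary type when the extension is unknown.
    content_type = guessed or "application/octet-stream"
    return client.fput_object(
        bucket_name=bucket_name,
        object_name=object_name,
        file_path=file_path,
        content_type=content_type,
    )
```

In the script above, upload_with_content_type(client, BUCKET_NAME, "plots/report.png", plot_file) could stand in for the plain fput_object call.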

A subtle but powerful aspect of using MinIO for workflow artifacts is that it can double as a cache. If a workflow step would produce an artifact that has already been uploaded under the same object name, you can skip the re-computation. This is only safe when the object name identifies the inputs, for example by including a run ID or a hash of the input data, since an identical name alone does not guarantee identical content. By default, fput_object overwrites an existing object (on a bucket with versioning enabled it adds a new version instead). To avoid redundant work, check for the object first with client.stat_object, or enable versioning if your workflow needs to retain multiple versions of an artifact. Either approach speeds up iterative development.
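A minimal sketch of that existence check might look like the following. It assumes a client that behaves like minio-py's, where stat_object raises an S3Error whose code is "NoSuchKey" for a missing object. To keep the sketch runnable without a server it inspects the exception's code attribute rather than importing S3Error; in real code you would catch S3Error directly. The helper names and the compute/upload callables are hypothetical.

```python
def artifact_exists(client, bucket_name, object_name):
    """Return True if the object is already stored, False if it is missing."""
    try:
        client.stat_object(bucket_name=bucket_name, object_name=object_name)
        return True
    except Exception as exc:
        # Assumption: a missing object surfaces as an error with code "NoSuchKey".
        if getattr(exc, "code", None) == "NoSuchKey":
            return False
        raise  # Anything else (auth, network) is a real failure.


def run_step_cached(client, bucket_name, object_name, compute, upload):
    """Run compute() and upload its result only when the artifact is not stored yet."""
    if artifact_exists(client, bucket_name, object_name):
        return False  # cache hit: nothing recomputed
    file_path = compute()
    upload(file_path, object_name)
    return True
```

With the script above, compute could be a lambda wrapping process_data(input_csv) and upload could be the existing upload_artifact function.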

The next step in managing workflow artifacts might involve implementing versioning or exploring MinIO’s lifecycle policies for automatic cleanup of old artifacts.
