Argo Workflows doesn’t run your workflow steps itself; it orchestrates them. The workflow controller is the component that makes this happen: it constantly watches the Kubernetes API for Workflow resources and acts on each one it sees.

Let’s see the controller in action. Imagine we’ve just submitted a simple workflow manifest with argo submit or kubectl create (kubectl apply rejects manifests that rely on generateName):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: hello
  templates:
  - name: hello
    container:
      image: alpine:latest
      command: ["echo", "Hello World!"]

Once the Workflow exists, the Argo Workflows controller, running as a Deployment in your Kubernetes cluster, notices the new resource. Its primary job is to reconcile the desired state (the Workflow manifest) with the actual state of the cluster, which it does by creating the Pods that execute the steps defined in your workflow.
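
To make that translation concrete, here is a minimal Python sketch of how one container template might be turned into a Pod manifest. It is an illustration under stated assumptions, not the controller's actual code (which is written in Go): the function name build_pod_manifest and the pod naming scheme are hypothetical, and the real controller also adds init containers, sidecars, volumes, and annotations that are omitted here.

```python
from typing import Any, Dict


def build_pod_manifest(
    workflow_name: str, template: Dict[str, Any], namespace: str = "argo"
) -> Dict[str, Any]:
    """Translate one container template into a bare-bones Pod manifest (illustrative only)."""
    container = template["container"]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            # Hypothetical naming scheme: <workflow>-<template>.
            "name": f"{workflow_name}-{template['name']}",
            "namespace": namespace,
            # Label that ties the pod back to the workflow that owns it.
            "labels": {"workflows.argoproj.io/workflow": workflow_name},
        },
        "spec": {
            # A step runs once; retries are decided by the controller, not the kubelet.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": container["image"],
                    "command": list(container.get("command", [])),
                    "args": list(container.get("args", [])),
                    "env": list(container.get("env", [])),
                }
            ],
        },
    }


# The hello template from the manifest above, with a made-up generated name.
hello_template = {
    "name": "hello",
    "container": {"image": "alpine:latest", "command": ["echo", "Hello World!"]},
}
pod = build_pod_manifest("hello-world-abc12", hello_template)
print(pod["metadata"]["name"])
print(pod["spec"]["containers"][0]["command"])
```

The point of the sketch is the direction of the data flow: everything in the Pod spec is derived from the workflow template, so the Workflow remains the single source of truth.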

Here’s the mental model:

  1. Watch and Inform: The controller continuously watches the Kubernetes API server for Workflow resources. When a new one appears, it’s added to an internal queue.
  2. Reconciliation Loop: The controller picks up a Workflow from the queue and begins its reconciliation process. It checks the spec of the Workflow to understand what needs to be done.
  3. Resource Generation: For each step (template) in the workflow, the controller translates it into a Kubernetes native resource, most commonly a Pod. It populates the Pod spec with the container image, command, arguments, environment variables, and any other configurations specified in the workflow template.
  4. Status Updates: As the Pods are created, the controller monitors their status. It updates the status field of the Workflow resource in the API server to reflect progress (e.g., Pending, Running, Succeeded, Failed, Error).
  5. Events and Notifications: The controller emits Kubernetes Events as workflows change state. Notifications on completion or failure are usually implemented in the workflow itself with an exit handler (onExit); the artifactRepository setting plays no part in this.
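
The first four steps can be sketched as a toy, in-memory loop. This is a simplified model and not Argo's implementation: the dictionaries stand in for the API server, the function names (on_workflow_event, on_pod_event, reconcile, run_once) are hypothetical, and only the entrypoint template is handled.

```python
from collections import deque
from typing import Deque, Dict

# In-memory stand-ins for the API server: workflow objects and pod phases by name.
workflows: Dict[str, dict] = {}
pod_phases: Dict[str, str] = {}
queue: Deque[str] = deque()


def on_workflow_event(name: str, manifest: dict) -> None:
    """Step 1: a watch callback stores the object and enqueues its key."""
    manifest.setdefault("status", {"phase": "Pending"})
    workflows[name] = manifest
    queue.append(name)


def on_pod_event(pod_name: str, phase: str, owner: str) -> None:
    """A pod changed phase: record it and re-queue the workflow that owns it."""
    pod_phases[pod_name] = phase
    queue.append(owner)


def reconcile(name: str) -> None:
    """Steps 2-4: read the spec, ensure the pod exists, roll its phase up into status."""
    wf = workflows[name]
    entrypoint = wf["spec"]["entrypoint"]
    template = next(t for t in wf["spec"]["templates"] if t["name"] == entrypoint)
    pod_name = f"{name}-{template['name']}"
    # Create the pod only if it is missing, so reconciling twice is harmless.
    pod_phases.setdefault(pod_name, "Pending")
    pod_phase = pod_phases[pod_name]
    # Simplification: terminal pod phases are copied over, anything else is Running.
    wf["status"]["phase"] = pod_phase if pod_phase in ("Succeeded", "Failed") else "Running"


def run_once() -> None:
    """Drain the queue, reconciling each workflow key in turn."""
    while queue:
        reconcile(queue.popleft())


spec = {"entrypoint": "hello", "templates": [{"name": "hello"}]}
on_workflow_event("hello-world-abc12", {"spec": spec})
run_once()
first = workflows["hello-world-abc12"]["status"]["phase"]

on_pod_event("hello-world-abc12-hello", "Succeeded", "hello-world-abc12")
run_once()
final = workflows["hello-world-abc12"]["status"]["phase"]
print(first, "->", final)  # prints "Running -> Succeeded"
```

Note that reconcile never runs anything itself. It only compares what the spec asks for with what exists, creates what is missing, and records what it observes.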

The core configuration for the Argo Workflows controller resides in a Kubernetes ConfigMap, named workflow-controller-configmap by default, in the namespace where the controller is deployed (typically argo).

Here’s a snippet of what that ConfigMap might look like:

apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  # controls artifact storage for workflow outputs
  artifactRepository: |
    s3:
      bucket: my-argo-artifacts
      keyPrefix: argo-workflows
      endpoint: s3.amazonaws.com
      region: us-east-1
      insecure: false  # false means TLS is used when talking to the endpoint
  # default settings for all workflows
  workflowDefaults: |
    spec:
      archiveLogs: true
      activeDeadlineSeconds: 7200
  # maximum number of workflows running at once, across all namespaces
  parallelism: "100"
  # maximum number of workflows running at once within a single namespace
  namespaceParallelism: "20"

The artifactRepository section is crucial. It tells Argo Workflows where to store the artifacts, and optionally the archived logs, produced by your workflow steps. Without it, steps that declare output artifacts will fail and logs disappear along with their pods; output parameters still work. The workflowDefaults section holds a partial Workflow spec that is merged into every workflow, which lets you set sensible defaults such as a maximum run time (activeDeadlineSeconds) or whether to automatically archive logs (archiveLogs). The parallelism and namespaceParallelism keys limit how many workflows may run at the same time; they count workflows, not pods.

Not every tuning knob lives in the ConfigMap. How many workflows the controller reconciles concurrently is set with the --workflow-workers flag on the controller Deployment, and log verbosity with --loglevel. If you have a high volume of short-lived workflows and the controller is falling behind, raising --workflow-workers can help. Raising parallelism lets more workflows run at once, but be mindful of your Kubernetes cluster’s resource capacity.

A common point of confusion is how the controller interacts with the Kubernetes API. It doesn’t directly execute commands in pods; it creates pods and then watches their lifecycle events. The actual execution is carried out by the kubelet and the container runtime (e.g., containerd) on your worker nodes.

If workflows seem stuck in the Pending state and their pods are not being created, check the controller’s logs first. A frequent cause is missing RBAC permissions for the controller’s ServiceAccount, which must be able to create, get, list, watch, and delete Pods and to read and update Workflow resources in every namespace it manages (or cluster-wide for a cluster-scoped install). The other usual suspect is the parallelism limit: once that many workflows are running, newly submitted ones wait in Pending until a slot frees up. If the limit is too low for your submission rate, workflows will back up.
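
The backlog behaviour can be pictured with a small simulation. This is a hedged sketch, not Argo's scheduler: the admit function and the limit value are hypothetical, and it models only a global cap on concurrently running workflows, with the remainder left Pending in submission order.

```python
from typing import Dict, List


def admit(submitted: List[str], running: List[str], limit: int) -> Dict[str, str]:
    """Return a phase per workflow: fill free slots in order, leave the rest Pending."""
    free_slots = max(limit - len(running), 0)
    phases = {name: "Running" for name in running}
    for index, name in enumerate(submitted):
        phases[name] = "Running" if index < free_slots else "Pending"
    return phases


# Two workflows already running, three newly submitted, and a cap of three.
phases = admit(submitted=["wf-c", "wf-d", "wf-e"], running=["wf-a", "wf-b"], limit=3)
backlog = [name for name, phase in phases.items() if phase == "Pending"]
print(backlog)  # prints ['wf-d', 'wf-e']
```

With a cap of three and two workflows already running, only one new workflow starts. The other two are not broken; they are simply waiting for a slot, which is why a Pending backlog with no pods is worth checking against the configured limit before suspecting anything else.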
