Argo Workflows doesn’t just run tasks; it orchestrates them, and managing how many run at once is key to efficiency and stability.
Let’s see it in action with a simple workflow that has a few parallel steps.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parallel-example-
spec:
  entrypoint: parallel-steps
  templates:
    - name: parallel-steps
      dag:
        tasks:
          - name: echo-a
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Hello from A"
          - name: echo-b
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Hello from B"
          - name: echo-c
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Hello from C"
    - name: echo-template
      inputs:
        parameters:
          - name: message # declared so the template can read the argument passed by each task
      container:
        image: alpine:latest
        command: ["echo"]
        args: ["{{=inputs.parameters.message}}"]
When you submit this, Argo Workflows will spin up pods for echo-a, echo-b, and echo-c concurrently. This is the default behavior for tasks defined within a dag template that don’t have explicit dependencies on each other. They’ll run as fast as your Kubernetes cluster can provision the pods.
The core concept here is parallelism, which Argo Workflows manages primarily through the dag template and its task definitions. Within a dag, tasks with no dependency links between them are candidates for parallel execution. Tasks that do declare dependencies wait for their upstream tasks to finish, and any that become ready at the same time also run in parallel.
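To make the dependency rule concrete, here is a minimal sketch of a diamond-shaped DAG. The workflow and task names are illustrative, not from the examples above; the only feature it relies on is the dependencies field of a DAG task. echo-b and echo-c both wait for echo-a and then run side by side, while echo-d waits for both of them.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: diamond-example- # illustrative name
spec:
  entrypoint: diamond
  templates:
    - name: diamond
      dag:
        tasks:
          - name: echo-a # no dependencies: starts immediately
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "A runs first"
          - name: echo-b # becomes ready when echo-a finishes
            dependencies: [echo-a]
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "B runs alongside C"
          - name: echo-c # also waits only for echo-a, so it runs in parallel with echo-b
            dependencies: [echo-a]
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "C runs alongside B"
          - name: echo-d # waits for both branches
            dependencies: [echo-b, echo-c]
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "D runs last"
    - name: echo-template
      inputs:
        parameters:
          - name: message
      container:
        image: alpine:latest
        command: ["echo"]
        args: ["{{inputs.parameters.message}}"]
```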
The real power comes when you start managing this parallelism. What if you have 100 tasks that can run in parallel, but your cluster can only handle 10 pods at a time without choking? You don’t want to overwhelm your system. This is where Argo’s built-in mechanisms come into play.
The most direct way to control parallelism for a group of tasks is the parallelism field on the template that contains the dag or steps block. The field belongs to the template itself, as a sibling of dag or steps, not inside it. Let’s say you want to run those three echo tasks, but only allow two of them to run at any given moment:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: controlled-parallel-example-
spec:
  entrypoint: controlled-parallel-steps
  templates:
    - name: controlled-parallel-steps
      parallelism: 2 # <-- This limits concurrent tasks in this DAG template
      dag:
        tasks:
          - name: echo-a
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Hello from A"
          - name: echo-b
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Hello from B"
          - name: echo-c
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Hello from C"
    - name: echo-template
      inputs:
        parameters:
          - name: message # declared so the template can read the argument passed by each task
      container:
        image: alpine:latest
        command: ["echo"]
        args: ["{{=inputs.parameters.message}}"]
In this modified example, even though echo-a, echo-b, and echo-c are all ready to run, only two will start. Once one of those finishes, the third will then be able to start. This parallelism field acts as a throttle for the tasks within that specific template.
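The same throttle scales to the larger case mentioned earlier, where 100 tasks are ready but the cluster should only run 10 pods at a time. The following is a minimal sketch, assuming the tasks can be generated with withSequence; the workflow, template, and task names are illustrative. The parallelism field sits on the template, next to dag.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: fan-out-example- # illustrative name
spec:
  entrypoint: fan-out
  templates:
    - name: fan-out
      parallelism: 10 # at most 10 of the generated tasks run at the same time
      dag:
        tasks:
          - name: echo-item
            template: echo-template
            withSequence:
              count: "100" # expands this one task into 100 tasks, with item running from 0 to 99
            arguments:
              parameters:
                - name: message
                  value: "Hello from item {{item}}"
    - name: echo-template
      inputs:
        parameters:
          - name: message
      container:
        image: alpine:latest
        command: ["echo"]
        args: ["{{inputs.parameters.message}}"]
```

As each pod finishes, the next waiting task is started, so the number of running pods for this template stays at or below 10 until all 100 are done.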
You can also apply parallelism constraints at the workflow level itself, which affects all DAGs and steps within that workflow. This is done via the parallelism field in the WorkflowSpec:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: global-parallel-example-
spec:
  parallelism: 5 # <-- Limits total concurrent pods for this entire workflow
  entrypoint: parallel-steps
  templates:
    - name: parallel-steps
      dag:
        tasks:
          - name: echo-a
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Hello from A"
          - name: echo-b
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Hello from B"
          - name: echo-c
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Hello from C"
    - name: echo-template
      inputs:
        parameters:
          - name: message # declared so the template can read the argument passed by each task
      container:
        image: alpine:latest
        command: ["echo"]
        args: ["{{=inputs.parameters.message}}"]
Here, spec.parallelism: 5 limits how many pods Argo Workflows will run at the same time for this entire workflow run. With only three tasks the limit is never reached in this example, but if the workflow had multiple DAGs or steps, the limit would apply across all of them. This is crucial for preventing runaway resource consumption on your Kubernetes cluster.
Beyond DAGs, the steps template also supports parallelism. A steps template is a list of rows (step groups): the steps within one row run in parallel, and the rows themselves run one after another.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-parallel-example-
spec:
  entrypoint: steps-parallel
  templates:
    - name: steps-parallel
      steps:
        - - name: step-a
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step A"
          - name: step-b
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step B"
        - - name: step-c
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step C"
          - name: step-d
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step D"
    - name: echo-template
      inputs:
        parameters:
          - name: message # declared so the template can read the argument passed by each step
      container:
        image: alpine:latest
        command: ["echo"]
        args: ["{{=inputs.parameters.message}}"]
In this steps structure, the first row has step-a and step-b. By default, these would run in parallel. The second row has step-c and step-d, which would also run in parallel after both step-a and step-b complete.
To limit how many steps run at once, you again use the parallelism field on the template that holds the steps block. There is no separate parallelism setting for an individual row; the template’s limit applies to every row in it:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: controlled-steps-parallel-example-
spec:
  entrypoint: controlled-steps-parallel
  templates:
    - name: controlled-steps-parallel
      parallelism: 2 # <-- At most two steps from this template run at once
      steps:
        - - name: step-a
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step A"
          - name: step-b
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step B"
          - name: step-c
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step C"
        # Second row: starts only after every step in the first row has finished
        - - name: step-d
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step D"
          - name: step-e
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step E"
    - name: echo-template
      inputs:
        parameters:
          - name: message # declared so the template can read the argument passed by each step
      container:
        image: alpine:latest
        command: ["echo"]
        args: ["{{=inputs.parameters.message}}"]
If the template sets parallelism: 2, only two of step-a, step-b, and step-c run at once, and the third starts when one of them finishes. The second row (step-d and step-e) fits within the limit, so both start together. If you set parallelism: 1 instead, every step in the template would run one at a time, even those defined in the same row. To throttle only the second row, move it into its own steps template and set parallelism there.
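Because parallelism is a property of a template rather than of a single row, one way to throttle just one row is to move that row into a nested steps template with its own limit. The following is a minimal sketch under that assumption; the names main, throttled-row, and second-row are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: throttled-row-example- # illustrative name
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        # First row: no limit on this template, so step-a and step-b run in parallel
        - - name: step-a
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step A"
          - name: step-b
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step B"
        # Second row: a single step that invokes the throttled nested template
        - - name: throttled-row
            template: second-row
    - name: second-row
      parallelism: 1 # step-d and step-e run one at a time
      steps:
        - - name: step-d
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step D"
          - name: step-e
            template: echo-template
            arguments:
              parameters:
                - name: message
                  value: "Step E"
    - name: echo-template
      inputs:
        parameters:
          - name: message
      container:
        image: alpine:latest
        command: ["echo"]
        args: ["{{inputs.parameters.message}}"]
```

The limit of 1 guarantees that step-d and step-e never overlap, but it does not by itself promise which of the two starts first.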
The most overlooked aspect of controlling parallelism isn’t setting limits but understanding their scope. The parallelism field on a template only counts the tasks or steps that the template launches directly; pods created by nested dag or steps templates that it invokes do not count toward its limit. The parallelism field on the WorkflowSpec is a global limit for the entire workflow. When both are present, both apply, so the more restrictive one wins for the tasks in that template. For example, if spec.parallelism is 5 and the template’s parallelism is 2, at most 2 tasks from that template run concurrently, and the global limit of 5 still caps the workflow as a whole.
This granular control is essential for optimizing resource utilization and for preventing the cascading failures that come from overloading your Kubernetes cluster. When you see workflows taking too long or failing due to resource exhaustion, checking these parallelism settings is a good first step.
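The 5-and-2 case can be written out as a single manifest. This is a minimal sketch with illustrative names, assuming the tasks are generated with withItems; it shows where each of the two limits is declared rather than a recommended configuration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: scoped-limits-example- # illustrative name
spec:
  parallelism: 5 # workflow-wide cap on concurrently running pods
  entrypoint: main
  templates:
    - name: main
      parallelism: 2 # tighter cap for the tasks launched by this template
      dag:
        tasks:
          - name: echo-item
            template: echo-template
            withItems: ["A", "B", "C", "D"] # expands into four independent tasks
            arguments:
              parameters:
                - name: message
                  value: "Hello from {{item}}"
    - name: echo-template
      inputs:
        parameters:
          - name: message
      container:
        image: alpine:latest
        command: ["echo"]
        args: ["{{inputs.parameters.message}}"]
```

All four tasks are ready at once, but the template limit of 2 is the tighter one here, so they run two at a time.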
The next logical challenge is managing dependencies between parallel tasks, which leads into advanced DAG features like when conditions and depends.