Argo Workflows can pause and later resume, and there are two distinct mechanisms for doing so. The first needs no human at all: a step simply keeps running until some condition is met, and the workflow waits for it. The second is an explicit suspend that waits for a manual resume.
Let’s see it in action. Imagine a workflow that needs to wait for an external event, like a file appearing in an S3 bucket.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pause-resume-example-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        # Each outer list item is a step group. Groups run in order, so
        # process-data starts only after wait-for-file succeeds. The
        # dependencies field is not used here; it is only valid on DAG tasks.
        - - name: wait-for-file
            template: external-signal
        - - name: process-data
            template: process
    - name: external-signal
      container:
        image: alpine
        command: ["sh", "-c"]
        args:
          - |
            echo "Waiting for signal..."
            # In a real scenario, this would poll an external system.
            # For demonstration, we simulate a delay and then exit successfully.
            sleep 60
            echo "Signal received!"
            # This is the key: a zero exit code lets the workflow proceed.
            # While the container is still running, the workflow waits here;
            # a non-zero exit code fails the step and stops the workflow.
    - name: process
      container:
        image: alpine
        command: ["echo", "Processing data..."]
In this example, the wait-for-file step is a placeholder. In a real-world scenario, this step would contain logic to poll an external system (like S3, a database, or an API) and wouldn’t exit until a specific condition is met. When it exits successfully, the workflow moves on to the process-data step; if it fails, the workflow stops. The "pause" here is implicit: the workflow is simply waiting for wait-for-file to finish.
The core problem Argo Workflows solves with this pattern is orchestrating tasks that depend on external, asynchronous events. It allows complex processes to be broken down into stages, where each stage can represent a waiting period for something outside the workflow’s immediate control to happen. Note that this kind of wait is not free: the polling step’s pod stays scheduled and holds its requested CPU and memory for as long as it waits, so keep the poller lightweight. A suspend template, covered later, holds no pod at all.
Internally, when a step completes, Argo marks that step as successful. In a steps template, the next step group is scheduled once every step in the current group has succeeded. If a step is designed to "wait" (like our placeholder external-signal), it simply keeps running its container until that container exits. The workflow engine doesn’t actively "pause" and "resume" in the sense of a debugger; it simply waits for the current active task to finish.
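The same ordering can also be written explicitly. In a steps template the order comes from the nesting of the list, whereas a dag template names each task's prerequisites with a dependencies field, which is only valid on DAG tasks. A minimal sketch of the same two-stage pipeline as a DAG, reusing the external-signal and process templates from the example; the template name main-dag is hypothetical:

```yaml
# Hypothetical DAG variant of the main template. To try it, add it to
# spec.templates and point spec.entrypoint at main-dag.
- name: main-dag
  dag:
    tasks:
      - name: wait-for-file
        template: external-signal
      - name: process-data
        template: process
        # Runs only after wait-for-file has succeeded.
        dependencies: [wait-for-file]
```

The DAG form tends to pay off once several tasks can run in parallel; for a strictly linear pipeline like this one, steps is usually the simpler choice.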
The primary lever you control is the logic within the container of the "waiting" step. This container needs to implement the polling or waiting mechanism. It could be a simple sleep in a script, a loop checking for an S3 object’s existence, or a loop that queries a database or an API. The exit code of this container dictates whether the workflow proceeds (zero) or halts (non-zero).
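As a rough sketch of such a loop, the template below polls S3 until an object exists. The template name, bucket, object key, and 30-second interval are all placeholders. It also assumes that the pod already has AWS credentials (for example through a service account or a mounted secret) and that the amazon/aws-cli image provides a shell. The activeDeadlineSeconds field bounds the wait, so a file that never arrives eventually fails the step instead of blocking forever.

```yaml
# Sketch of a polling template; it could stand in for external-signal.
- name: wait-for-s3-object
  activeDeadlineSeconds: 3600   # fail the step if nothing arrives within an hour
  container:
    image: amazon/aws-cli
    command: ["sh", "-c"]
    args:
      - |
        # my-example-bucket and incoming/data.csv are hypothetical names.
        # head-object exits non-zero while the object does not exist.
        until aws s3api head-object --bucket my-example-bucket --key incoming/data.csv >/dev/null 2>&1; do
          echo "Object not found yet; retrying in 30s..."
          sleep 30
        done
        echo "Object found, continuing."
```

Because every failure of head-object is treated as "not there yet", a permissions error would also keep the loop spinning until the deadline; a production poller would probably want to tell those cases apart.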
You can also use suspend templates to explicitly pause a workflow. These are not about waiting for an external event, but rather for a manual resume action.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: manual-pause-resume-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: initial-task
            template: hello
        # Print a notice first: nothing runs while the workflow is suspended.
        - - name: announce-pause
            template: manual-resume-message
        - - name: pause-for-review
            template: suspend-template
        # Step groups run in order, so resume-task starts only after the resume.
        - - name: resume-task
            template: goodbye
    - name: hello
      container:
        image: alpine
        command: ["echo", "Hello!"]
    - name: suspend-template
      # suspend takes no nested template; its only option is an optional duration.
      suspend: {}
    - name: manual-resume-message
      container:
        image: alpine
        command: ["echo", "Pausing for review. Waiting for manual resume."]
    - name: goodbye
      container:
        image: alpine
        command: ["echo", "Goodbye!"]
When the suspend-template is reached, the workflow stops scheduling new steps and is reported as suspended (the argo CLI shows its status as Running (Suspended)). Unless a duration is set, it stays in this state indefinitely until someone triggers a resume action, typically with argo resume followed by the workflow name, or through the Argo UI. This is useful for human-in-the-loop processes, like manual approvals before proceeding.
The suspend template itself doesn’t run a container; its sole purpose is to halt the workflow execution at that point. Note that suspend has no template field and cannot trigger another template. Its only option is duration, after which the workflow resumes on its own. A message or notification task that tells operators the workflow is waiting has to be an ordinary step of its own.
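When a pause should end by itself, a suspend template can carry a duration. A minimal sketch; the template name timed-pause and the 20-second value are illustrative:

```yaml
# Hypothetical timed pause: resumes automatically after 20 seconds,
# or earlier if someone resumes the workflow manually.
- name: timed-pause
  suspend:
    duration: "20s"
```

Leaving duration out gives the open-ended pause used for manual approvals.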
The resume action isn’t about the workflow checking a condition; it’s an external command that tells the Argo controller to mark the suspended node as succeeded. The controller then schedules the next step that can be executed (in our example, resume-task, which follows pause-for-review).
The most surprising thing about suspend templates is how little they do: nothing executes during the suspension, and no pod exists for it. The suspension is a state managed by the controller, not an active task, so it costs no cluster resources while it waits. If you want to display a message while the workflow is suspended, you’d typically have a separate step before the suspend that prints it, or you’d rely on the workflow’s status and events in the Argo UI to indicate that it’s paused.
The next concept you’ll likely encounter is handling retries and error conditions within these waiting or suspended states.