AWS Step Functions is a serverless orchestration service that lets you coordinate multiple AWS services into business workflows.

Here’s a minimal Step Functions workflow definition:

{
  "Comment": "A simple example of a Step Functions workflow.",
  "StartAt": "HelloWorld",
  "States": {
    "HelloWorld": {
      "Type": "Pass",
      "Result": "Hello World!",
      "Next": "Goodbye"
    },
    "Goodbye": {
      "Type": "Pass",
      "Result": "Goodbye World!",
      "End": true
    }
  }
}

This JSON defines a state machine. The StartAt field names the initial state, HelloWorld. It is a Pass state, which performs no work: by default it copies its input to its output, but because Result is set here, its output is the string "Hello World!" instead. The Next field then directs the workflow to the Goodbye state, another Pass state whose Result replaces its input with "Goodbye World!". Finally, "End": true marks Goodbye as the last state, so the execution finishes there.

This simple example demonstrates the core concept: defining a sequence of states and transitions. In a real-world scenario, these states would typically involve invoking AWS Lambda functions, interacting with SQS queues, calling DynamoDB, or even triggering other Step Functions workflows. The power lies in coordinating these disparate services into a coherent, resilient, and observable process.

Consider a common use case: an order processing system.

  1. CreateOrder: A Lambda function that receives order details, validates them, and stores them in a database (e.g., DynamoDB).
  2. ProcessPayment: Another Lambda function that calls a payment gateway API.
  3. UpdateInventory: A Lambda function that decrements stock levels in a separate inventory service.
  4. SendConfirmationEmail: A Lambda function that sends an email to the customer.

Step Functions orchestrates these, handling retries, error conditions, and parallel execution if needed.
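As a rough sketch, the four steps above could be laid out as the following state machine. The function ARNs are placeholders, the updateInventoryFunction and sendConfirmationEmailFunction names are hypothetical, and running the last two steps in parallel is an assumption made for illustration, not a requirement of the use case.

```json
{
  "Comment": "Sketch of the order workflow; all ARNs are placeholders",
  "StartAt": "CreateOrder",
  "States": {
    "CreateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:createOrderFunction",
      "Next": "ProcessPayment"
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:processPaymentFunction",
      "Next": "FulfillOrder"
    },
    "FulfillOrder": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "UpdateInventory",
          "States": {
            "UpdateInventory": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:updateInventoryFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "SendConfirmationEmail",
          "States": {
            "SendConfirmationEmail": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:sendConfirmationEmailFunction",
              "End": true
            }
          }
        }
      ],
      "End": true
    }
  }
}
```

Each branch of the Parallel state receives a copy of that state's input, and the state's output is an array containing one result per branch.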

The state machine definition describes the flow using states and transitions.

  • States: Represent individual steps in the workflow. Common types include:

    • Task: Executes a service action (e.g., invoking a Lambda function, sending a message to SQS).
    • Pass: Passes its input to its output, optionally injecting fixed data; useful for debugging or defining intermediate points.
    • Choice: Implements conditional logic based on the state’s input.
    • Parallel: Executes multiple branches of states concurrently.
    • Wait: Pauses execution for a specified duration.
    • Succeed / Fail: Explicitly ends the workflow in a success or failure state.
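  • Transitions: Define how the workflow moves from one state to another. A state names its successor with Next or marks itself as terminal with "End": true; a Choice state instead routes through the Next of whichever rule in its Choices array matches, falling back to its Default.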
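To make a few of these state types concrete, here is a minimal sketch that combines Wait, Choice, Succeed, and Fail. The paymentStatus input field, the state names, and the error name are all hypothetical, chosen only for illustration.

```json
{
  "Comment": "Sketch: pause briefly, then branch on a hypothetical paymentStatus field",
  "StartAt": "CoolDown",
  "States": {
    "CoolDown": {
      "Type": "Wait",
      "Seconds": 5,
      "Next": "CheckPayment"
    },
    "CheckPayment": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.paymentStatus",
          "StringEquals": "APPROVED",
          "Next": "OrderComplete"
        }
      ],
      "Default": "PaymentRejected"
    },
    "OrderComplete": {
      "Type": "Succeed"
    },
    "PaymentRejected": {
      "Type": "Fail",
      "Error": "PaymentRejected",
      "Cause": "paymentStatus was not APPROVED"
    }
  }
}
```

Note that a Choice state has no Next or End field of its own: every path out of it goes through a rule in Choices or through Default.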

Each run of a Step Functions state machine is called an execution. Each execution has a unique ARN and a recorded history of all state transitions, inputs, and outputs. This history is invaluable for debugging and auditing.

The input and output of each state are JSON. Step Functions uses the Amazon States Language (ASL), a JSON-based language, to define state machines. ASL lets you select and reshape data between states with JSONPath expressions in fields such as InputPath, Parameters, ResultPath, and OutputPath.

For instance, if a CreateOrder Lambda function returns an order ID, you can pass that ID to the ProcessPayment state:

{
  "Comment": "Order Processing Workflow",
  "StartAt": "CreateOrder",
  "States": {
    "CreateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:createOrderFunction",
      "ResultPath": "$.order",
      "Next": "ProcessPayment"
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:processPaymentFunction",
      "InputPath": "$.order.orderId",
      "ResultPath": "$.paymentResult",
      "Next": "UpdateInventory"
    },
    // ... other states
  }
}

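In CreateOrder, "ResultPath": "$.order" inserts the Lambda function’s result into the state’s input under a key named order, and that combined document becomes the state’s output. In ProcessPayment, "InputPath": "$.order.orderId" selects only the orderId value from the state’s input (which is the output of CreateOrder), so processPaymentFunction receives just that value. ProcessPayment’s own ResultPath, "$.paymentResult", is applied to the full state input rather than the filtered one, so the order data remains available to later states.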
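The trace below illustrates that data flow. It is not ASL, only an annotated snapshot of the JSON at each point, and every field name and value in it (customerId, items, total, the IDs) is made up for the example.

```json
{
  "createOrderStateInput": {
    "customerId": "c-42",
    "items": ["sku-1", "sku-2"]
  },
  "createOrderLambdaResult": {
    "orderId": "o-1001",
    "total": 59.9
  },
  "createOrderStateOutput": {
    "customerId": "c-42",
    "items": ["sku-1", "sku-2"],
    "order": {
      "orderId": "o-1001",
      "total": 59.9
    }
  },
  "processPaymentLambdaInput": "o-1001"
}
```

Because InputPath selects a single value here, the payment function receives a bare JSON string. If the function expects an object, the Parameters field can build one instead, for example with an entry such as "orderId.$": "$.order.orderId".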

The one thing most people don’t realize is how much control you have over the state transition itself, not just the task execution. The Retry and Catch fields, available on Task, Parallel, and Map states, let you define sophisticated error handling directly within the state machine definition. You can match errors by name with ErrorEquals, name a fallback state to transition to in a Catch block’s Next field (a state name, not an ARN), and configure retry policies with a maximum attempt count, exponential backoff, and jitter. This means you can build highly resilient workflows without writing explicit retry and error-routing code in every Lambda function.
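A minimal sketch of both fields on the ProcessPayment task might look like this. The retry values, the HandlePaymentFailure state, and its error name are assumptions chosen for illustration; tune them to the failure modes of your own service.

```json
{
  "Comment": "Sketch: retry transient errors, then route anything else to a failure state",
  "StartAt": "ProcessPayment",
  "States": {
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:processPaymentFunction",
      "Retry": [
        {
          "ErrorEquals": ["Lambda.ServiceException", "States.Timeout"],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2.0,
          "MaxDelaySeconds": 30,
          "JitterStrategy": "FULL"
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "HandlePaymentFailure"
        }
      ],
      "End": true
    },
    "HandlePaymentFailure": {
      "Type": "Fail",
      "Error": "PaymentFailed",
      "Cause": "ProcessPayment failed after retries were exhausted"
    }
  }
}
```

With these settings the task is retried up to three times, with base delays of roughly 2, 4, and 8 seconds that the jitter strategy then randomizes. Errors that are not retried, or that outlast the retries, fall through to Catch, which stores the error details under $.error and moves to HandlePaymentFailure.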

You’ll next want to explore how to integrate Step Functions with other AWS services, particularly for event-driven architectures.
