You can run containerized applications on Amazon Elastic Container Service (ECS) in two primary ways: as long-running services or as one-off jobs (standalone tasks). Choosing the right one matters for cost, reliability, and operational efficiency, and the distinction often trips people up because both start from the same building block, a Task Definition.

Let’s see what a long-running service looks like in action. Imagine you have a web application that needs to be available 24/7. You’d define an ECS Service, and ECS would ensure that a specified number of tasks (your containers) are always running and healthy. If a task crashes, ECS automatically starts a new one to replace it.

{
  "family": "my-web-app",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "web-server",
      "image": "nginx:latest",
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 80,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/my-web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "web-server"
        }
      }
    }
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/myWebAppTaskRole"
}

This Task Definition tells ECS how to run your Nginx web server. Then, you’d create an ECS Service pointing to this Task Definition. You’d configure it to maintain a "desired count" of, say, 3 tasks. ECS, using its control plane, would continuously monitor these tasks. If one fails its health checks or terminates unexpectedly, ECS immediately provisions a new one. This is the core of a long-running service: continuous availability and self-healing. It’s designed to be always on.
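As a rough illustration of the Service step, here is a minimal Python sketch that builds the parameters for the CreateService operation with a desired count of 3. The cluster name, service name, subnet ID, and security group ID are hypothetical placeholders, and the commented-out call assumes boto3 is installed and AWS credentials are configured.

```python
from typing import Any


def build_create_service_request(
    cluster: str,
    service_name: str,
    task_definition: str,
    desired_count: int,
    subnets: list[str],
    security_groups: list[str],
) -> dict[str, Any]:
    """Build keyword arguments for the ECS CreateService operation."""
    if desired_count < 0:
        raise ValueError("desired_count must be zero or greater")
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,  # family or family:revision
        "desiredCount": desired_count,      # ECS keeps this many tasks running
        "launchType": "FARGATE",
        # awsvpc network mode requires a network configuration.
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                "assignPublicIp": "DISABLED",
            }
        },
    }


# All names and IDs below are hypothetical placeholders.
service_request = build_create_service_request(
    cluster="demo-cluster",
    service_name="my-web-app-service",
    task_definition="my-web-app",
    desired_count=3,
    subnets=["subnet-0example"],
    security_groups=["sg-0example"],
)

# To actually create the service (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("ecs").create_service(**service_request)
```

Keeping the request as plain data makes it easy to review or unit-test before anything is created in an account.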

Now, contrast that with a one-off job. Think of a batch processing task, like analyzing a large dataset, generating a report, or sending out a batch of emails. These tasks have a defined start and a defined end. They don’t need to run continuously.

Here’s a Task Definition for a batch processing job:

{
  "family": "data-processor",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "processor-container",
      "image": "my-docker-repo/data-processor:v1.2",
      "command": ["python", "process.py", "--input-bucket", "my-data-bucket", "--output-key", "results/report.csv"],
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/data-processor",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "processor"
        }
      }
    }
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/dataProcessorTaskRole"
}

Instead of creating an ECS Service, you’d use the RunTask API operation or the AWS CLI command aws ecs run-task. This tells ECS to start one or more tasks based on your Task Definition and let them run to completion. Once the essential container’s process exits (the command in the Task Definition above, or the ENTRYPOINT/CMD from the image if no command is set), the task moves to STOPPED. ECS records the exit code, zero for success and non-zero for failure, but it doesn’t restart a standalone task either way; retrying a failed run is up to you or whatever launched it. This is the essence of a one-off job: execute and terminate.
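The run-to-completion flow can be sketched in Python roughly as follows. The helper job_succeeded and the wrapper run_job_and_wait are illustrative names, not part of any AWS SDK; the wrapper assumes boto3 is installed, credentials are configured, and the cluster and subnets you pass in exist.

```python
from typing import Any


def job_succeeded(task: dict[str, Any]) -> bool:
    """Return True only if the task has stopped and every container exited with code 0.

    `task` is one entry from the "tasks" list of a DescribeTasks response.
    """
    if task.get("lastStatus") != "STOPPED":
        return False
    containers = task.get("containers", [])
    # A container that never started has no exitCode; treat that as a failure.
    return bool(containers) and all(c.get("exitCode") == 0 for c in containers)


def run_job_and_wait(cluster: str, task_definition: str, subnets: list[str]) -> bool:
    """Start one standalone task, wait for it to stop, and report success."""
    import boto3  # imported lazily so job_succeeded works without boto3

    ecs = boto3.client("ecs")
    started = ecs.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",
        count=1,
        networkConfiguration={"awsvpcConfiguration": {"subnets": subnets}},
    )
    if not started["tasks"]:
        raise RuntimeError(f"RunTask placed no tasks: {started['failures']}")
    task_arn = started["tasks"][0]["taskArn"]

    # The waiter polls DescribeTasks for a bounded time; long jobs would
    # need a custom WaiterConfig or their own polling loop.
    ecs.get_waiter("tasks_stopped").wait(cluster=cluster, tasks=[task_arn])
    task = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])["tasks"][0]
    return job_succeeded(task)
```

Because ECS does not retry a standalone task, a check like this is where a caller would decide whether to rerun the job or raise an alert.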

The key difference lies in their lifecycle management. Services are designed for continuous uptime. They have a "desired count" and ECS actively works to maintain it. Jobs are designed for discrete execution. You tell ECS to run them, and they run until they naturally finish.

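When you call RunTask, you can specify parameters like count (how many copies of the task to start, up to 10 per call), launchType (FARGATE, EC2, or EXTERNAL), and networkConfiguration (required for awsvpc tasks such as the ones above). On the CLI these are --count, --launch-type, and --network-configuration. You can also set enableECSManagedTags to true (--enable-ecs-managed-tags) so that ECS tags each task with its cluster name, which helps with cost allocation. Crucially, you don’t set a desired count; you’re just asking for a specific execution.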
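Since a single RunTask call starts at most 10 tasks, larger fan-outs have to be split across calls. The sketch below builds one set of RunTask parameters per batch; plan_run_task_calls is a hypothetical helper, and the cluster name and subnet ID are placeholders.

```python
from typing import Any

MAX_TASKS_PER_CALL = 10  # RunTask accepts a count of at most 10 per call


def plan_run_task_calls(
    total_tasks: int,
    cluster: str,
    task_definition: str,
    subnets: list[str],
) -> list[dict[str, Any]]:
    """Split `total_tasks` into RunTask requests of at most 10 tasks each."""
    if total_tasks < 1:
        raise ValueError("total_tasks must be at least 1")
    calls = []
    remaining = total_tasks
    while remaining > 0:
        batch = min(remaining, MAX_TASKS_PER_CALL)
        calls.append(
            {
                "cluster": cluster,
                "taskDefinition": task_definition,
                "count": batch,
                "launchType": "FARGATE",
                "enableECSManagedTags": True,  # tags each task with its cluster
                "networkConfiguration": {
                    "awsvpcConfiguration": {"subnets": subnets}
                },
            }
        )
        remaining -= batch
    return calls


# Placeholder cluster and subnet; 25 tasks become batches of 10, 10, and 5.
calls = plan_run_task_calls(25, "demo-cluster", "data-processor", ["subnet-0example"])
print([call["count"] for call in calls])  # [10, 10, 5]

# To launch them (needs boto3 and AWS credentials):
#   import boto3
#   ecs = boto3.client("ecs")
#   for call in calls:
#       ecs.run_task(**call)
```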

The mental model to hold onto is this: Services are daemons; Jobs are scripts. You wouldn’t expect a script to keep running after it’s done its work, and you wouldn’t expect a daemon to stop after a single execution.

One subtle point that often causes confusion is how logs are handled. With the awslogs driver and an awslogs-stream-prefix, as in both Task Definitions above, ECS names each log stream prefix/container-name/task-id, so every task gets its own stream whether it was started by a Service or by run-task. For jobs, especially if you run multiple parallel tasks, you might also want your application to include identifiers in its logging output (e.g., "Processing batch X, task Y") so you can correlate output with the unit of work rather than only with the task that handled it.
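With the awslogs settings used in the Task Definitions above, each task writes to a stream named prefix/container-name/task-id. A small sketch of deriving that stream name from a task ARN follows; the ARN shown is a made-up example, and log_stream_name is an illustrative helper.

```python
def log_stream_name(stream_prefix: str, container_name: str, task_arn: str) -> str:
    """Derive the CloudWatch Logs stream name ECS uses for one container of a task."""
    # The task ID is the final path segment of the task ARN.
    task_id = task_arn.rsplit("/", 1)[-1]
    return f"{stream_prefix}/{container_name}/{task_id}"


# Made-up task ARN for illustration only.
example_arn = (
    "arn:aws:ecs:us-east-1:123456789012:"
    "task/demo-cluster/0123456789abcdef0123456789abcdef"
)
stream = log_stream_name("processor", "processor-container", example_arn)
print(stream)
# processor/processor-container/0123456789abcdef0123456789abcdef

# To read the events (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("logs").get_log_events(
#       logGroupName="/ecs/data-processor", logStreamName=stream
#   )
```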

The choice isn’t just about "always on" vs. "run once." It has implications for scaling, cost, and deployment strategies. Services can be auto-scaled based on metrics like CPU utilization or request counts. Jobs are typically scaled by simply running more of them in parallel via run-task. Deploying updates to a service involves strategies like rolling updates, while updating a job means you simply run a new version of the task.
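For the Service side of that comparison, scaling on CPU utilization is typically set up through Application Auto Scaling. The following is a sketch under assumptions: the cluster and service names, the capacity bounds, and the 60 percent target are placeholders, and build_cpu_target_tracking is a hypothetical helper that only prepares the request parameters.

```python
from typing import Any


def build_cpu_target_tracking(
    cluster: str,
    service_name: str,
    min_tasks: int,
    max_tasks: int,
    target_cpu_percent: float,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Build RegisterScalableTarget and PutScalingPolicy parameters for an ECS service."""
    if not 0 <= min_tasks <= max_tasks:
        raise ValueError("need 0 <= min_tasks <= max_tasks")
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service_name}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    scalable_target = {**common, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    scaling_policy = {
        **common,
        "PolicyName": f"{service_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu_percent,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
        },
    }
    return scalable_target, scaling_policy


# Placeholder names and numbers for illustration.
target, policy = build_cpu_target_tracking(
    "demo-cluster", "my-web-app-service", 3, 10, 60.0
)

# To apply them (needs boto3 and AWS credentials):
#   import boto3
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**target)
#   aas.put_scaling_policy(**policy)
```

Jobs have no equivalent policy; their parallelism is whatever the caller asks RunTask for.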

If you find yourself manually stopping tasks after their work is done when using aws ecs run-task, the container’s main process probably isn’t exiting when it should; fix the application so it exits on completion, or, if the workload is really meant to stay up, run it as a Service with a graceful shutdown mechanism. Conversely, if you’ve configured an ECS Service to have a desired count of 1 and manually stop it when it’s done, you’re paying for continuous availability you don’t need and missing out on the efficiency of the job model.

Understanding this fundamental difference is key to building resilient and cost-effective containerized applications on AWS. The next step is often understanding how to manage task placement and resource utilization for these different patterns.
