Auto-Scale Flink Jobs with the Kubernetes Operator and HPA (2026)

When the Flink Kubernetes Operator is paired with the Horizontal Pod Autoscaler (HPA), the HPA doesn’t change your Flink job’s parallelism directly. Instead, it changes the number of TaskManager pods that Flink can use. Flink’s JobManager then rescales the job onto those TaskManagers, provided the job runs with a scheduler that supports rescaling, such as reactive mode on a standalone-mode deployment.

Let’s see this in action. Imagine a Flink job processing a Kafka stream. We want it to handle more data as the Kafka topic’s partitions get more active.

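Here’s a simplified kustomization.yaml to deploy the Flink operator and a basic Flink cluster. Treat the paths as illustrative and check them against the operator release you actually install; the operator’s own documentation describes a Helm-based install.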

resources:
- https://raw.githubusercontent.com/apache/flink-kubernetes-operator/main/config/crd/flinkdeployment.yaml
- https://raw.githubusercontent.com/apache/flink-kubernetes-operator/main/config/manager/flink-operator-deployment.yaml
- https://raw.githubusercontent.com/apache/flink-kubernetes-operator/main/config/samples/flink-sample.yaml

Now, let’s define a FlinkDeployment for our job. Notice the task slot setting; slots are a key Flink concept. Each TaskManager offers a fixed number of slots, set when it starts, so a running TaskManager cannot gain slots. Capacity grows by adding TaskManager pods, each of which brings its own set of slots.

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-streaming-job
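spec:
  image: flink:1.17.1
  flinkVersion: v1_17
  # HPA-driven scaling needs standalone mode, where the operator manages
  # TaskManager pods through taskManager.replicas.
  mode: standalone
  flinkConfiguration:
    taskmanager.memory.process.size: "2048m"
    # This is important: Flink distributes tasks across slots.
    # Each TaskManager offers this many slots.
    # We'll let HPA control the *number* of TaskManagers.
    taskmanager.numberOfTaskSlots: "2"
    # Reactive mode lets the job rescale as TaskManagers come and go.
    scheduler-mode: reactive
  serviceAccount: flink-service-account
  jobManager:
    resource:
      memory: "1024m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
    replicas: 1 # Starting count; the HPA adjusts it afterwards
  # This is where we tell the operator to manage the job
  job:
    jarURI: "local:///opt/flink/jars/my-flink-job.jar"
    entryClass: com.example.MyStreamingJob
    parallelism: 4 # Fixed under the default scheduler; reactive mode ignores it and uses every available slot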
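
The slot arithmetic behind this manifest is easy to get wrong, so here is a minimal Python sketch of it. The helper name is hypothetical, and it assumes Flink's default slot sharing, where one slot can hold one parallel slice of the whole pipeline, so a job needs as many slots as its parallelism.

```python
import math


def task_managers_needed(parallelism: int, slots_per_tm: int) -> int:
    """Smallest number of TaskManagers whose combined slots cover the parallelism.

    Hypothetical helper; assumes one slot per parallel slice of the job.
    """
    if parallelism < 1 or slots_per_tm < 1:
        raise ValueError("parallelism and slots_per_tm must be positive")
    return math.ceil(parallelism / slots_per_tm)


# Values from the manifest above: parallelism 4, 2 slots per TaskManager.
print(task_managers_needed(4, 2))  # 2
# An odd parallelism leaves one slot unused on the last TaskManager.
print(task_managers_needed(5, 2))  # 3
```

With a single TaskManager there are only 2 slots, so a job fixed at parallelism 4 could not be fully scheduled until a second TaskManager joins.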

For autoscaling, we’ll use the Kubernetes Horizontal Pod Autoscaler (HPA). The HPA will watch a metric and adjust the number of TaskManager pods in our Flink cluster.

Here’s an HPA definition targeting our FlinkDeployment’s TaskManager replica count. We’ll scale based on CPU utilization.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-flink-taskmanagers
spec:
  scaleTargetRef:
    apiVersion: flink.apache.org/v1beta1
    kind: FlinkDeployment
    name: my-streaming-job
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60 # Target average CPU, as a percentage of each pod's CPU request
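
To get a feel for how the HPA turns that 60% target into a replica count, here is a rough Python sketch of the scaling rule described in the Kubernetes documentation: desired = ceil(current * currentMetric / targetMetric), skipped when the ratio is within a tolerance of 1.0. The function name is hypothetical, and the 0.1 tolerance is the documented default, assumed unchanged here. The real controller also handles missing metrics, pod readiness, and stabilization windows, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a single utilization metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas/maxReplicas bounds from the HPA spec.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(2, 90, 60))   # 3: two pods at 90% CPU need a third
print(desired_replicas(4, 30, 60))   # 2: four pods at 30% CPU can shrink to two
print(desired_replicas(2, 63, 60))   # 2: within tolerance, no change
print(desired_replicas(8, 120, 60))  # 10: capped by maxReplicas
```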

The crucial part is how Flink itself reacts. The HPA changes the TaskManager replica count through the FlinkDeployment’s scale subresource, the operator creates the extra pods, and Flink’s JobManager sees the new TaskManagers register. What happens next depends on the scheduler. In reactive mode, the JobManager restarts the job from its latest checkpoint at a higher parallelism so that it uses the new slots. With the default scheduler, a running job keeps its parallelism and the new slots sit idle; they only help if the job was still waiting for slots it needs.

Think of it this way:

HPA watches CPU (or memory, or custom metrics) across all TaskManager pods.
When the metric rises above the target (e.g., 60% CPU), HPA raises the replicas field in the FlinkDeployment’s taskManager spec; when it falls below the target, HPA lowers it.
The Flink Operator detects this change and creates or removes TaskManager pods until the desired replica count is met.
The Flink JobManager (running within its own pod) notices TaskManagers joining or leaving the cluster.
In reactive mode, the JobManager restarts the job from its latest checkpoint at a parallelism that matches the available slots.
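
The last step of that chain can be sketched as a small Python function. This assumes the job runs in Flink's reactive mode, where the job's parallelism follows the slot count. The function name is hypothetical, and the max parallelism of 128 is an assumed value for illustration, not something taken from the manifests above.

```python
def reactive_parallelism(task_managers: int, slots_per_tm: int,
                         max_parallelism: int) -> int:
    """Parallelism a reactive-mode job would run at for a given cluster size.

    The job takes every available slot, but never more than its max parallelism.
    """
    return min(task_managers * slots_per_tm, max_parallelism)


# Walk the HPA's replica range with 2 slots per TaskManager.
for replicas in (1, 2, 5, 10):
    print(replicas, reactive_parallelism(replicas, 2, max_parallelism=128))
# 1 2
# 2 4
# 5 10
# 10 20
```

If the max parallelism were lower than the slot count, say 8, the tenth TaskManager would add slots the job cannot use, which is worth keeping in mind when choosing maxReplicas.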

This means you define your cluster’s scaling behavior (e.g., HPA scaling TaskManager pods from 1 to 10) separately from the job, and Flink bridges the gap by rescaling the job onto whatever slots exist.

The most surprising part of this setup is that the HPA never controls Flink’s parallelism directly. The HPA scales the infrastructure (TaskManager pods), and Flink’s JobManager decides how work is distributed within that infrastructure. How the parallelism setting in the FlinkDeployment is treated depends on the scheduler. With the default scheduler, parallelism: 4 is a fixed requirement: the job needs four slots and never uses more. In reactive mode, Flink ignores the configured parallelism and runs at the highest parallelism the available slots allow, capped by the job’s max parallelism.

The next thing you’ll run into is managing Flink’s state during scaling events. When tasks are rebalanced, Flink needs to ensure that stateful operations (like aggregations or joins) can recover their state correctly, which often involves checkpointing and savepoints.
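
As a preview of why rescaling touches state, here is a small Python sketch of how keyed state is split into key groups and how those groups map to parallel subtasks. It mirrors the integer mapping in Flink's KeyGroupRangeAssignment (key group * parallelism / max parallelism), but treat it as an illustration rather than the implementation. The function names are hypothetical and the max parallelism of 128 is an assumed value.

```python
def operator_index_for_key_group(key_group: int, parallelism: int,
                                 max_parallelism: int) -> int:
    """Subtask index that owns a key group at the given parallelism."""
    return key_group * parallelism // max_parallelism


def key_groups_per_subtask(parallelism: int, max_parallelism: int) -> dict:
    """Map each subtask index to the list of key groups it owns."""
    assignment = {i: [] for i in range(parallelism)}
    for key_group in range(max_parallelism):
        owner = operator_index_for_key_group(key_group, parallelism, max_parallelism)
        assignment[owner].append(key_group)
    return assignment


# Rescaling from parallelism 2 to 4 splits each subtask's range in half.
for parallelism in (2, 4):
    for subtask, groups in key_groups_per_subtask(parallelism, 128).items():
        print(f"p={parallelism} subtask {subtask}: key groups {groups[0]}..{groups[-1]}")
# p=2 subtask 0: key groups 0..63
# p=2 subtask 1: key groups 64..127
# p=4 subtask 0: key groups 0..31
# p=4 subtask 1: key groups 32..63
# p=4 subtask 2: key groups 64..95
# p=4 subtask 3: key groups 96..127
```

Because whole key groups move between subtasks, the state for a key group has to be readable from a checkpoint or savepoint by whichever subtask owns it after the rescale.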