Elasticsearch aliases are the secret sauce for zero-downtime reindexing, letting you swap out index data without users noticing a flicker.

Let’s see it in action. Imagine we have an index named my_data_v1 containing our current data. We want to reindex this into a new index, my_data_v2, with some schema changes or data transformations.

# Create the initial index
PUT /my_data_v1
{
  "mappings": {
    "properties": {
      "message": { "type": "text" },
      "timestamp": { "type": "date" }
    }
  }
}

# Add some data
POST /my_data_v1/_doc/1
{
  "message": "This is the first document",
  "timestamp": "2023-10-27T10:00:00Z"
}

POST /my_data_v1/_doc/2
{
  "message": "This is the second document",
  "timestamp": "2023-10-27T10:01:00Z"
}

Now, instead of applications querying my_data_v1 directly, we’ll introduce an alias, my_data. This alias will point to my_data_v1.

# Create the alias pointing to the current index
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "my_data_v1",
        "alias": "my_data"
      }
    }
  ]
}

From this point on, all application queries and writes should use my_data instead of my_data_v1.

# Querying via the alias
GET /my_data/_search
{
  "query": {
    "match_all": {}
  }
}
# This will return documents from my_data_v1

Now, for the reindexing. We create a new index, my_data_v2, with the desired schema or transformations.

# Create the new index with a different mapping (e.g., adding a keyword field)
PUT /my_data_v2
{
  "mappings": {
    "properties": {
      "message": { "type": "text" },
      "timestamp": { "type": "date" },
      "message_keyword": { "type": "keyword" }
    }
  }
}

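We then use the _reindex API to copy data from my_data_v1 to my_data_v2. Because _reindex copies each document’s _source as-is, the new message_keyword field would stay empty on every copied document, so the request includes a small Painless script that fills it from message. Setting "op_type": "create" tells _reindex to only create documents that are missing from the destination. Notice we’re not touching the my_data alias yet.

# Reindex data from the old index to the new one, populating message_keyword on the way
POST /_reindex
{
  "source": {
    "index": "my_data_v1",
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "my_data_v2",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.message_keyword = ctx._source.message"
  }
}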

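This reindexing process can take time, but your applications continue to read and write to my_data, which still points to my_data_v1. Keep in mind that _reindex works from a snapshot of my_data_v1 taken when the request starts, so documents written through the alias after that point are not copied to my_data_v2; we’ll come back to this limitation later. Once _reindex is complete and my_data_v2 is fully populated, we perform the critical swap.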
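For a large index you would normally start the copy with `POST /_reindex?wait_for_completion=false`. That call returns a task ID instead of blocking, and you then poll `GET /_tasks/<task_id>` until it reports `"completed": true`. The Python sketch below shows that polling loop plus a document-count comparison you might run before swapping.

It is deliberately client-agnostic. `fetch_task` is a hypothetical callable you would wire to your HTTP client or to the official Elasticsearch client. The only response fields it relies on are `completed`, `error`, `response.failures` and the `_count` API's `count`.

Before counting, refresh my_data_v2 (`POST /my_data_v2/_refresh`) so every copied document is visible. Counts only line up if nothing was written to my_data_v1 during the copy.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical: any function that returns the JSON of GET /_tasks/<task_id> as a dict.
TaskFetcher = Callable[[str], Dict[str, Any]]


def wait_for_reindex(fetch_task: TaskFetcher, task_id: str,
                     poll_seconds: float = 5.0,
                     timeout_seconds: float = 3600.0) -> Dict[str, Any]:
    """Poll a task status until it completes and return its "response" (the reindex summary)."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_task(task_id)
        if status.get("completed"):
            if "error" in status:
                raise RuntimeError(f"task {task_id} failed: {status['error']}")
            summary = status.get("response", {})
            if summary.get("failures"):
                raise RuntimeError(f"reindex reported failures: {summary['failures']}")
            return summary
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still running after {timeout_seconds}s")
        time.sleep(poll_seconds)


def counts_match(old_count: Dict[str, Any], new_count: Dict[str, Any]) -> bool:
    """Compare two _count responses, e.g. for my_data_v1 and my_data_v2."""
    return old_count["count"] == new_count["count"]


if __name__ == "__main__":
    # Stand-in for GET /_tasks/<task_id>: reports "running" twice, then a finished copy.
    replies = iter([
        {"completed": False},
        {"completed": False},
        {"completed": True, "response": {"total": 2, "created": 2, "failures": []}},
    ])
    summary = wait_for_reindex(lambda _id: next(replies), "node-1:42", poll_seconds=0)
    print(summary["created"])                        # 2
    print(counts_match({"count": 2}, {"count": 2}))  # True
```

Only swap the alias once the task has finished cleanly and the counts agree.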

# The magic happens here: atomically swap the alias
POST /_aliases
{
  "actions": [
    {
      "remove": {
        "index": "my_data_v1",
        "alias": "my_data"
      }
    },
    {
      "add": {
        "index": "my_data_v2",
        "alias": "my_data"
      }
    }
  ]
}

This _aliases API call is atomic. In a single, instantaneous operation, the my_data alias is detached from my_data_v1 and attached to my_data_v2. Applications will immediately start writing to and reading from my_data_v2 without any interruption.
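To make "atomic" concrete, here is a toy in-memory model in Python. It is not Elasticsearch's implementation.

- The alias table is a plain dict from alias name to a set of index names.
- `apply_aliases` (a hypothetical helper) stages every action on a copy and returns the new table only if all of them succeed.
- In this model a reader sees either the old mapping or the new one, never an alias attached to zero or two indices.
- `swap_body` builds the same request body as the console call above.

```python
import copy
import json
from typing import Dict, Set

AliasTable = Dict[str, Set[str]]  # alias name -> names of the indices it points to


def swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Build an _aliases request body that moves `alias` from old_index to new_index."""
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}


def apply_aliases(table: AliasTable, body: dict) -> AliasTable:
    """Toy model: apply all actions to a copy, so the caller's table is never half-updated."""
    staged = copy.deepcopy(table)
    for action in body["actions"]:
        op, spec = next(iter(action.items()))
        alias, index = spec["alias"], spec["index"]
        members = staged.setdefault(alias, set())
        if op == "add":
            members.add(index)
        elif op == "remove":
            if index not in members:
                # Any failing action aborts the whole request; `table` stays as it was.
                raise LookupError(f"alias [{alias}] is not on index [{index}]")
            members.remove(index)
        else:
            raise ValueError(f"unsupported action: {op}")
        if not members:
            del staged[alias]
    return staged  # "publish" the new table only after every action succeeded


if __name__ == "__main__":
    live = {"my_data": {"my_data_v1"}}
    body = swap_body("my_data", "my_data_v1", "my_data_v2")
    print(json.dumps(body))
    live = apply_aliases(live, body)
    print(live)  # {'my_data': {'my_data_v2'}}
```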

The old index, my_data_v1, is still intact. You can keep it around for a while as a rollback option or delete it once you’re confident in the new index.

The key here is that the alias is the stable pointer. Your applications don’t know or care about the underlying index names; they only interact with the alias. This abstraction is what enables the zero-downtime switch. You can even perform rollbacks by simply swapping the alias back to my_data_v1.
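A rollback is the same swap with the index names reversed. Before issuing it, you can confirm where the alias currently points with `GET /_alias/my_data`. That response maps each index carrying the alias to its alias definitions, for example `{"my_data_v2": {"aliases": {"my_data": {}}}}`.

The Python sketch below reads that shape and builds the rollback body. `indices_for_alias` and `rollback_body` are hypothetical helpers. The sketch refuses to act unless the alias points at exactly one index, because a blind swap could otherwise detach the wrong one.

```python
import json
from typing import Any, Dict, List


def indices_for_alias(alias_response: Dict[str, Any], alias: str) -> List[str]:
    """Return the indices that carry `alias`, given a GET /_alias/<alias> response."""
    return sorted(index for index, info in alias_response.items()
                  if alias in info.get("aliases", {}))


def rollback_body(alias_response: Dict[str, Any], alias: str, previous_index: str) -> dict:
    """Build an _aliases body that points `alias` back at previous_index."""
    current = indices_for_alias(alias_response, alias)
    if len(current) != 1:
        raise ValueError(f"expected [{alias}] on exactly one index, found {current}")
    return {"actions": [
        {"remove": {"index": current[0], "alias": alias}},
        {"add": {"index": previous_index, "alias": alias}},
    ]}


if __name__ == "__main__":
    response = {"my_data_v2": {"aliases": {"my_data": {}}}}
    print(json.dumps(rollback_body(response, "my_data", "my_data_v1"), indent=2))
```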

When reindexing large datasets, the old index usually keeps receiving writes while the copy runs. The _reindex API doesn’t handle this "live" scenario directly. It is a one-off copy of my_data_v1 as it looked when the request started, so documents added, updated, or deleted afterwards are not reflected in my_data_v2. For continuous reindexing or live transformations, you’d typically also send incoming data to the new index, for example by pointing log shipping (like Filebeat) at it, or use a dedicated data pipeline tool.
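If you cannot pause writes, one common workaround is a second, incremental _reindex run right after the swap. It copies only documents written to my_data_v1 since the first copy started; this is not something _reindex does for you. The Python sketch below builds such a request body, under these assumptions:

- It assumes, hypothetically, that each document's `timestamp` field records when the document was written. Nothing in this example's data guarantees that.
- `"op_type": "create"` combined with `"conflicts": "proceed"` skips documents that already exist in my_data_v2, including ones written through the alias after the swap, instead of overwriting them.
- The script fills the message_keyword field that my_data_v2's mapping adds.

Updates to older documents and deletions made during the gap are not caught this way. That is why the pipeline-based approaches are the more robust option.

```python
import json
from datetime import datetime, timezone


def catch_up_reindex_body(source: str, dest: str, since: datetime) -> dict:
    """Build a POST /_reindex body that copies docs whose timestamp is >= `since`.

    Assumes `timestamp` reflects write time (a hypothetical guarantee in this sketch).
    """
    return {
        # Version conflicts (docs that already exist in dest) are counted, not fatal.
        "conflicts": "proceed",
        "source": {
            "index": source,
            "query": {"range": {"timestamp": {"gte": since.isoformat()}}},
        },
        # "create" never overwrites a document that already exists in dest.
        "dest": {"index": dest, "op_type": "create"},
        # Populate my_data_v2's message_keyword field from message.
        "script": {
            "lang": "painless",
            "source": "ctx._source.message_keyword = ctx._source.message",
        },
    }


if __name__ == "__main__":
    started = datetime(2023, 10, 27, 10, 0, 30, tzinfo=timezone.utc)  # hypothetical start time
    body = catch_up_reindex_body("my_data_v1", "my_data_v2", started)
    print(json.dumps(body, indent=2))
```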

The real power of aliases extends beyond just reindexing. You can have multiple aliases pointing to the same index, or a single alias pointing to multiple indices, which is crucial for features like time-based indices and routing queries across different data sets. For example, an alias my_data_latest could point to my_data_v2, while my_data still points to my_data_v1 for a gradual migration. Or, an alias all_my_data could point to both my_data_v1 and my_data_v2 to query all available documents. Searches through all_my_data span both indices, but indexing requests sent to it are rejected unless exactly one of them is marked with "is_write_index": true.
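The difference between reading and writing through a multi-index alias can be modeled in a few lines. This is a simplified Python sketch of the documented rule, not Elasticsearch's code:

- Searches fan out to every index behind the alias.
- An indexing request needs a single write index. That is either the alias's only index (unless it is explicitly marked `"is_write_index": false`) or the one index marked `"is_write_index": true`.

`all_my_data_actions` uses the real `is_write_index` alias option. The function names are hypothetical.

```python
from typing import Dict, List, Optional

# index name -> is_write_index flag (None means the flag was not set)
AliasMembers = Dict[str, Optional[bool]]


def read_targets(members: AliasMembers) -> List[str]:
    """Searches through an alias hit every index it points to."""
    return sorted(members)


def write_target(alias: str, members: AliasMembers) -> str:
    """Pick the single index that receives writes sent to `alias`."""
    flagged = [index for index, flag in members.items() if flag is True]
    if len(flagged) == 1:
        return flagged[0]
    if len(members) == 1:
        index, flag = next(iter(members.items()))
        if flag is not False:
            return index
    raise ValueError(f"no write index is defined for alias [{alias}]")


# _aliases body: all_my_data spans both versions, with my_data_v2 taking writes.
all_my_data_actions = {"actions": [
    {"add": {"index": "my_data_v1", "alias": "all_my_data"}},
    {"add": {"index": "my_data_v2", "alias": "all_my_data", "is_write_index": True}},
]}

if __name__ == "__main__":
    print(read_targets({"my_data_v1": None, "my_data_v2": None}))             # ['my_data_v1', 'my_data_v2']
    print(write_target("my_data", {"my_data_v1": None}))                      # my_data_v1
    print(write_target("all_my_data", {"my_data_v1": None, "my_data_v2": True}))  # my_data_v2
```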

One common pitfall is forgetting to update your application configurations to use the alias instead of the direct index name. If your application keeps writing to my_data_v1 after the alias has been switched to my_data_v2, your new data ends up in the old index, where searches through my_data no longer see it. Always make sure every read and write path in your applications is configured to use the alias.
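One way to catch this early is a startup or CI check that every index name in your application's configuration resolves to an alias rather than to a concrete index. The Python sketch below works on the response of `GET /_alias`, which maps index names to the aliases they carry.

- A name is treated as a concrete index if it appears as a top-level key of that response.
- A name is treated as an alias if it appears under some index's `aliases`.
- Anything else is reported as "unknown" for a manual look.

The configuration dict and both function names are hypothetical.

```python
from typing import Any, Dict


def classify(name: str, alias_response: Dict[str, Any]) -> str:
    """Return "index", "alias", or "unknown" for a name, given a GET /_alias response."""
    if name in alias_response:
        return "index"
    if any(name in info.get("aliases", {}) for info in alias_response.values()):
        return "alias"
    return "unknown"


def non_alias_targets(config: Dict[str, str], alias_response: Dict[str, Any]) -> Dict[str, str]:
    """Return {config key: classification} for every configured name that is not an alias."""
    problems = {}
    for key, name in config.items():
        kind = classify(name, alias_response)
        if kind != "alias":
            problems[key] = kind
    return problems


if __name__ == "__main__":
    aliases = {"my_data_v1": {"aliases": {}}, "my_data_v2": {"aliases": {"my_data": {}}}}
    config = {"search_index": "my_data", "ingest_index": "my_data_v1"}  # hypothetical settings
    print(non_alias_targets(config, aliases))  # {'ingest_index': 'index'}
```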

After successfully switching your alias, the next challenge is managing the lifecycle of the old index, my_data_v1.
