The _cat/indices API is failing because the Elasticsearch cluster cannot find the index you’re asking about, or it’s in a state where it’s not yet visible.

Here’s what’s actually broken: a shard for the requested index is either missing or inaccessible to the coordinating node, which prevents the _cat/indices API from gathering the metadata it needs. This usually points to a problem with shard allocation, disk space, or network connectivity between nodes.

Common Causes and Fixes:

  1. Index Not Yet Created or Typo in Name:

    • Diagnosis: Double-check the exact spelling of your index name. Index names are case-sensitive and must be lowercase. Run GET _cat/indices?v to list the indices the cluster knows about and compare.
    • Fix: Ensure you’re using the correct index name. If the index hasn’t been created yet, run the creation command, e.g., PUT my-new-index.
    • Why it works: The _cat/indices API looks up metadata for the index you name. If no index with that exact name exists, there is nothing to return and the request fails with a 404 index_not_found_exception.
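The name check above can be scripted. What follows is a minimal sketch rather than an official client: it assumes an unsecured cluster at http://localhost:9200 (adjust ES_URL for your setup), and the helper names list_index_names and diagnose_index_name are hypothetical. It lists index names through GET _cat/indices?format=json&h=index and suggests near matches for a mistyped name.

```python
import difflib
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without auth or TLS


def list_index_names(base_url=ES_URL):
    """Return the index names reported by GET _cat/indices."""
    url = f"{base_url}/_cat/indices?format=json&h=index"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return [row["index"] for row in json.load(resp)]


def diagnose_index_name(requested, existing):
    """Return ("found", [name]) or ("missing", [close matches])."""
    if requested in existing:
        return ("found", [requested])
    # Index names are lowercase, so compare against the lowercased request
    # to catch case typos as well as small spelling mistakes.
    candidates = difflib.get_close_matches(
        requested.lower(), existing, n=3, cutoff=0.6
    )
    return ("missing", candidates)
```

Calling diagnose_index_name("My-New-Index", list_index_names()) against a live cluster would report the name as missing and, if my-new-index exists, offer it as the likely intended name.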
  2. Disk Space Full on Data Nodes:

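    • Diagnosis: Check disk usage on your data nodes. Run GET _cat/allocation?v and look at the disk.percent column. With default settings, Elasticsearch stops allocating new shards to a node at 85% (low watermark), starts moving shards off the node at 90% (high watermark), and blocks writes to indices on that node at 95% (flood stage). The shards and disk.indices columns show how many shards each node holds and how much space they occupy. To find the largest indices, run GET _cat/indices?v&s=store.size:desc.
    • Fix: Free up space on the affected data nodes by deleting old indices or increasing disk capacity. If you need breathing room, you can temporarily raise cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high (and cluster.routing.allocation.disk.watermark.flood_stage, which is the setting that blocks writes), but this is a workaround. A more robust solution is to add disk space or implement an index lifecycle management policy.
    • Why it works: Elasticsearch protects nodes from running out of disk by refusing to allocate shards to nodes above the low watermark and by relocating shards away from nodes above the high watermark. Shards that cannot be placed on any node stay unassigned until space is freed.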
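The disk check can be automated along these lines. This is a sketch under stated assumptions: the cluster is reachable at http://localhost:9200 without authentication, the watermarks are the default percentages (85, 90, 95) rather than custom or absolute byte values, and the function names fetch_allocation and classify_disk_usage are made up for illustration. It reads the percentage column, which _cat/allocation reports as disk.percent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without auth or TLS


def fetch_allocation(base_url=ES_URL):
    """Return GET _cat/allocation rows as a list of dicts."""
    url = f"{base_url}/_cat/allocation?format=json&h=node,disk.percent"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def classify_disk_usage(rows, low=85, high=90, flood=95):
    """Map each node to the highest watermark its disk usage has reached."""
    report = {}
    for row in rows:
        percent = row.get("disk.percent")
        if percent is None:
            # The UNASSIGNED pseudo-row carries no disk figures.
            continue
        used = int(percent)  # the JSON output encodes numbers as strings
        if used >= flood:
            report[row["node"]] = "flood_stage"
        elif used >= high:
            report[row["node"]] = "high"
        elif used >= low:
            report[row["node"]] = "low"
        else:
            report[row["node"]] = "ok"
    return report
```

Running classify_disk_usage(fetch_allocation()) gives a per-node summary; any node not marked "ok" is a candidate for cleanup or extra capacity.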
  3. Shard Allocation Issues (Unassigned Shards):

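    • Diagnosis: Use GET _cat/shards?h=index,shard,prirep,state,unassigned.reason and filter for UNASSIGNED states. The unassigned.reason column tells you what triggered the unassigned state; common values include ALLOCATION_FAILED, NODE_LEFT, and INDEX_CREATED. For the reason a shard still cannot be placed, run GET _cluster/allocation/explain, which reports decisions such as no_valid_shard_copy when no node holds a usable copy of the shard.
    • Fix:
      • ALLOCATION_FAILED / no_valid_shard_copy: Often due to disk issues (see point 2) or node instability. Restarting the node or restoring disk health can help. Once the underlying cause is fixed, POST _cluster/reroute?retry_failed=true asks Elasticsearch to retry shards that have exhausted their allocation attempts. If a node is permanently offline, Elasticsearch promotes a replica to primary where one exists; if no copy survives, the shard has to be restored from a snapshot.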
      • NODE_LEFT: If a node has left the cluster and hasn’t rejoined, Elasticsearch will try to reallocate its shards. Ensure the node is back online or that the cluster has enough capacity to handle the loss.
      • Explicitly enable allocation: If allocation is disabled for maintenance, re-enable it with PUT _cluster/settings {"persistent": {"cluster.routing.allocation.enable": "all"}}.
    • Why it works: Shards must be allocated to a node for the index to be considered active and visible. Unassigned shards mean the cluster is actively trying (or unable) to place them.
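To see at a glance which reasons dominate, the rows from the _cat/shards request can be grouped by reason. This is a small illustrative sketch: group_unassigned is a hypothetical helper, the sample rows are invented, and it assumes the request was made with format=json so that each row is a dict keyed by column name.

```python
from collections import defaultdict


def group_unassigned(rows):
    """Group UNASSIGNED shards by their unassigned.reason value."""
    grouped = defaultdict(list)
    for row in rows:
        if row.get("state") != "UNASSIGNED":
            continue
        reason = row.get("unassigned.reason") or "UNKNOWN"
        kind = "primary" if row.get("prirep") == "p" else "replica"
        grouped[reason].append((row["index"], int(row["shard"]), kind))
    return dict(grouped)


# Invented sample data shaped like the JSON output of
# GET _cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason
sample_rows = [
    {"index": "logs", "shard": "0", "prirep": "p", "state": "STARTED",
     "unassigned.reason": None},
    {"index": "logs", "shard": "0", "prirep": "r", "state": "UNASSIGNED",
     "unassigned.reason": "NODE_LEFT"},
    {"index": "metrics", "shard": "1", "prirep": "p", "state": "UNASSIGNED",
     "unassigned.reason": "ALLOCATION_FAILED"},
]

for reason, shards in group_unassigned(sample_rows).items():
    print(reason, shards)
```

Unassigned primaries deserve attention first, since they are what turns an index red; unassigned replicas only reduce redundancy.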
  4. Network Partition or Node Unreachability:

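    • Diagnosis: Check cluster health with GET _cluster/health. A red status or a number_of_nodes lower than you expect suggests that nodes have dropped out, and GET _cat/nodes?v shows which nodes are currently in the cluster. Examine logs on your Elasticsearch nodes for network-related errors (e.g., connection refused, timed out). Use ping or traceroute from one node to another to test basic connectivity; note that these do not confirm the transport port itself is open.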
    • Fix: Resolve network issues. This could involve firewall rules, DNS resolution problems, or physical network hardware. Ensure all nodes in the cluster can communicate with each other on the transport layer (default port 9300).
    • Why it works: Nodes need to communicate to share shard status and manage cluster state. If nodes can’t reach each other, shards might appear missing or be marked as unavailable.
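Since ping does not exercise the transport port, a plain TCP connection test is a useful complement. The sketch below assumes the default transport port 9300 mentioned above; transport_reachable and unreachable_peers are hypothetical helper names. A successful connect only shows that the port accepts connections, not that the Elasticsearch handshake or TLS setup succeeds.

```python
import socket


def transport_reachable(host, port=9300, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def unreachable_peers(hosts, port=9300, timeout=3.0):
    """Return the hosts whose transport port could not be reached."""
    return [h for h in hosts if not transport_reachable(h, port, timeout)]
```

Run unreachable_peers from each node with the addresses of the other nodes; an asymmetric result (A reaches B but B cannot reach A) usually points to a firewall rule.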
  5. Corrupted Index or Shard Data:

    • Diagnosis: Check GET _cat/indices?v and look for indices whose health is red (the status column only distinguishes open from close). Examine Elasticsearch logs for exceptions related to shard corruption, such as CorruptIndexException or an IOException that names a shard or segment.
    • Fix: If corruption is detected, restoring from a snapshot is the safest option. If you don’t have a snapshot, you might have to delete the index and recreate it, accepting data loss. As a last resort, the elasticsearch-shard remove-corrupted-data tool, or POST _cluster/reroute with the allocate_stale_primary command, can bring a shard back at the cost of losing some data; both are advanced and risky.
    • Why it works: Corrupted shard files mean Elasticsearch cannot read or write data to that shard, making it unavailable for the index.
  6. Master Node Instability or Unavailability:

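    • Diagnosis: Check GET _cluster/health. If no master is currently elected, the request typically fails with a 503 and master_not_discovered_exception instead of returning a status; GET _cat/master shows the elected master when there is one. A red status with a high unassigned_shards count can also follow from master instability. Look for logs on master-eligible nodes indicating they are not being elected or are losing quorum.
    • Fix: Ensure you have a stable quorum of master-eligible nodes. Restart master-eligible nodes one by one if necessary, ensuring they can re-establish a quorum. Check the discovery.seed_hosts configuration. cluster.initial_master_nodes is only used when bootstrapping a brand-new cluster, so verify it was correct at that time and remove it once the cluster has formed.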
    • Why it works: The master node is responsible for managing cluster state, including shard allocation. If the master is unstable or unavailable, shard allocation decisions cannot be made, leading to unassigned shards and inaccessible indices.
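The quorum requirement comes down to simple majority arithmetic. The sketch below is a simplification that assumes every master-eligible node takes part in voting; in practice Elasticsearch manages its voting configuration itself, so treat the numbers as a rule of thumb. The function names are hypothetical.

```python
def quorum_size(master_eligible):
    """Smallest number of master-eligible nodes that forms a majority."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible):
    """How many master-eligible nodes can be lost while keeping a majority."""
    return master_eligible - quorum_size(master_eligible)


for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

The loop shows why three master-eligible nodes is the usual minimum for resilience: with two, a majority needs both nodes, so losing either one leaves the cluster without a master.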

After resolving these issues, you might still encounter an index_not_found_exception if the index was truly deleted or if you’re querying a different cluster.
