The primary difference between text and keyword in Elasticsearch isn’t how you query them, but how Elasticsearch processes the value at index time, before any search or aggregation runs.

Let’s see this in action. Imagine we have a simple index with two fields, message (analyzed as text) and status (not analyzed, hence keyword).

PUT my_logs
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text"
      },
      "status": {
        "type": "keyword"
      }
    }
  }
}

POST my_logs/_doc
{
  "message": "User logged in successfully.",
  "status": "SUCCESS"
}

POST my_logs/_doc
{
  "message": "User logged in successfully.",
  "status": "SUCCESS"
}

POST my_logs/_doc
{
  "message": "User login failed.",
  "status": "FAILURE"
}

Now, let’s query.

Searching text fields:

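If we search for "login" in the message field, Elasticsearch finds documents whose message contains the token "login", regardless of case. This is because the text field is analyzed at index time, and the match query analyzes the search string the same way. With the default standard analyzer only whole tokens match, so this query finds "User login failed." but not "User logged in successfully.".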

GET my_logs/_search
{
  "query": {
    "match": {
      "message": "login"
    }
  }
}

The analysis process for text fields can involve the steps below. The default standard analyzer performs only the first two; stop word removal and stemming apply only if you configure an analyzer that includes them (for example, the built-in english analyzer).

  1. Tokenization: Breaking the text into individual words (tokens). "User logged in successfully." becomes ["User", "logged", "in", "successfully"].
  2. Lowercasing: Converting all tokens to lowercase. ["user", "logged", "in", "successfully"].
  3. Stop word removal (optional, off by default): Removing common words like "in", "a", "the". ["user", "logged", "successfully"].
  4. Stemming (optional): Reducing words to their root form. "logged" might become "log".

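So, when you search for "login", the match query runs the search string through the same analyzer and looks up the resulting token, "login", in the inverted index. With the standard analyzer that matches "login" or "Login", but not "logins" or "logged in". An analyzer with stemming would additionally match forms that reduce to the same stem, such as "logins".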
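To check which tokens a text field actually produces, the _analyze API can be run against the field. This is a small sketch using the my_logs index defined above; with its default mapping the response should list the tokens user, logged, in, and successfully.

```
GET my_logs/_analyze
{
  "field": "message",
  "text": "User logged in successfully."
}
```

Running the same request with "text": "login" shows the single token the match query will look for.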

Searching keyword fields:

If we search for "SUCCESS" in the status field, it must be an exact match.

GET my_logs/_search
{
  "query": {
    "term": {
      "status": "SUCCESS"
    }
  }
}

Here, a term query is used because keyword fields are not analyzed, and a term query does not analyze the search value either. Keyword values are indexed as is, in their exact form. This makes them perfect for exact matches, sorting, and aggregations.
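Exactness includes case. As a quick illustration, the same term query with a lowercase value should return no hits against the sample documents, because "success" and "SUCCESS" are different terms in a keyword field (assuming no normalizer is configured on the field, as in the mapping above).

```
GET my_logs/_search
{
  "query": {
    "term": {
      "status": "success"
    }
  }
}
```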

Aggregation on keyword fields:

This is where the keyword type truly shines. If you want to count how many logs have each status, you aggregate on the status field.

GET my_logs/_search
{
  "size": 0,
  "aggs": {
    "status_counts": {
      "terms": {
        "field": "status"
      }
    }
  }
}

This will give you buckets for "SUCCESS" and "FAILURE", with their respective counts. This works because each distinct keyword value is treated as a single term.
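For the three sample documents indexed above, the aggregations part of the response should look roughly like this. It is abridged: top-level metadata such as took and hits is omitted.

```
{
  "aggregations": {
    "status_counts": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "SUCCESS", "doc_count": 2 },
        { "key": "FAILURE", "doc_count": 1 }
      ]
    }
  }
}
```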

Aggregation on text fields (and why it’s usually a bad idea):

If you try to aggregate on a text field like message directly, the request fails by default.

GET my_logs/_search
{
  "size": 0,
  "aggs": {
    "message_terms": {
      "terms": {
        "field": "message"
      }
    }
  }
}

By default this query returns an error, because text fields have no doc values and fielddata is disabled on them, so there is nothing for the aggregation to read. Even if it were allowed, the aggregation would run over individual tokens like "user", "logged", "in", "successfully" rather than whole messages, which is rarely what you want.

The fielddata caveat for text fields:

While text fields are optimized for full-text search (inverted index), they are not optimized for aggregations or sorting by default. To perform aggregations on a text field, you’d need to enable fielddata. However, fielddata consumes significant heap memory, and it’s generally discouraged. The correct approach is to use a multi-field mapping.
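For completeness, this is roughly what enabling fielddata on the existing message field looks like. It is shown only to make the trade-off concrete; the multi-field approach is usually the better choice.

```
PUT my_logs/_mapping
{
  "properties": {
    "message": {
      "type": "text",
      "fielddata": true
    }
  }
}
```

With this in place, a terms aggregation on message would bucket by individual tokens, not by whole messages.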

The Solution: Multi-fields

The idiomatic Elasticsearch way to handle this is to map a field as both text (for searching) and keyword (for aggregation and sorting).

PUT my_logs_multi
{
  "mappings": {
    "properties": {
      "log_message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

POST my_logs_multi/_doc
{
  "log_message": "User logged in successfully."
}

POST my_logs_multi/_doc
{
  "log_message": "User login failed."
}

Now, you can search the log_message field using full-text search:

GET my_logs_multi/_search
{
  "query": {
    "match": {
      "log_message": "failed"
    }
  }
}

And aggregate on the log_message.keyword sub-field:

GET my_logs_multi/_search
{
  "size": 0,
  "aggs": {
    "message_exact_terms": {
      "terms": {
        "field": "log_message.keyword"
      }
    }
  }
}

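This gives you the best of both worlds: efficient full-text search on the text field and precise aggregations/sorting on the keyword sub-field. The ignore_above: 256 setting tells Elasticsearch not to index values longer than 256 characters in the keyword sub-field. Such values are still searchable through the text field and still present in _source, but they will not appear in aggregations or sorts on log_message.keyword. It is a common setting (and the one dynamic mapping applies to strings) that keeps very long strings out of the keyword index.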
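Sorting works the same way as aggregating: search on the text field, sort on the keyword sub-field. A minimal sketch against the my_logs_multi index above:

```
GET my_logs_multi/_search
{
  "query": {
    "match": {
      "log_message": "user"
    }
  },
  "sort": [
    { "log_message.keyword": "asc" }
  ]
}
```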

The fundamental distinction is that text fields are analyzed into individual tokens for search relevance, while keyword fields are indexed as exact, unanalyzed values, making them ideal for exact matching, sorting, and aggregations.

When you start dealing with complex text analysis, like custom analyzers or phonetic matching, you’ll realize that the text field’s behavior is highly configurable, and understanding the analyzer chain is key to effective searching.
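As a starting point, here is a minimal sketch of a custom analyzer that adds the optional stop word and stemming steps described earlier. The index name my_logs_stemmed and the analyzer name log_analyzer are made up for illustration; lowercase, stop, and porter_stem are built-in token filters.

```
PUT my_logs_stemmed
{
  "settings": {
    "analysis": {
      "analyzer": {
        "log_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "porter_stem"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "analyzer": "log_analyzer"
      }
    }
  }
}

GET my_logs_stemmed/_analyze
{
  "analyzer": "log_analyzer",
  "text": "User logged in successfully."
}
```

The _analyze response should show that "in" has been dropped and "logged" reduced to "log", which is a quick way to confirm what a given analyzer chain does before indexing real data.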
