Elasticsearch mappings define how each field in a document is indexed and stored, much like a schema, and the choices you make have a large impact on both performance and the kinds of queries you can run later.

Let’s see this in action. Imagine we’re indexing documents about books.

PUT /books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      },
      "author": {
        "type": "keyword"
      },
      "publication_date": {
        "type": "date"
      },
      "genre": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "keyword"
          },
          "subgenre": {
            "type": "keyword"
          }
        }
      },
      "pages": {
        "type": "integer"
      }
    }
  }
}

In this example:

  • title is text with an english analyzer. This breaks the title into individual words (tokens) and normalizes them by lowercasing, removing English stop words, and stemming (e.g., "running" becomes "run"). This is great for full-text search, allowing a search for "running" to match a title like "Born to Run." Stemming is rule-based, though, so not every related word collapses to the same token: "runner" stays "runner," so the same search would not match "The Runner’s Journey."
  • author is keyword. This treats the entire author name as a single, unanalyzed, case-sensitive string. It’s perfect for exact matches or aggregations. An exact-match query for "J.R.R. Tolkien" finds the book, but a query for just "Tolkien" or "J.R.R." does not.
  • publication_date is date. Elasticsearch understands date formats and allows for range queries (e.g., books published between 2000 and 2010).
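  • genre is nested. This is crucial when a book has several genre entries. If genre were a plain object field, Elasticsearch would flatten the array into independent lists of genre.name and genre.subgenre values, losing track of which name belongs to which subgenre. A search for genre.name "Science Fiction" combined with genre.subgenre "Space Opera" could then match a book where the two values come from different genre entries. nested indexes each genre object as a separate hidden document, preserving the relationship between name and subgenre within that specific entry; such fields are searched with a nested query.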
  • pages is integer. This enables numerical range queries (e.g., books with more than 300 pages).
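
The difference between object and nested storage can be sketched with a toy model in plain Python. This is not Elasticsearch code: the sample book and the helper names flatten, match_flattened, and match_nested are made up for illustration, and the real engine works on Lucene documents rather than dicts. The cross-matching effect it shows is the one described for genre above.

```python
# Toy model only: mimics how an array of objects behaves when the field is
# mapped as `object` (values pooled per sub-field) versus `nested`
# (each object kept intact).

book = {
    "title": "Example Book",  # hypothetical sample document
    "genre": [
        {"name": "Science Fiction", "subgenre": "Cyberpunk"},
        {"name": "Fantasy", "subgenre": "Space Opera"},
    ],
}


def flatten(doc, field):
    """Object-style storage: one multi-valued list per sub-field."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def match_flattened(doc, field, criteria):
    """Each condition is checked against the pooled values independently."""
    flat = flatten(doc, field)
    return all(value in flat.get(f"{field}.{key}", [])
               for key, value in criteria.items())


def match_nested(doc, field, criteria):
    """All conditions must hold inside the same object."""
    return any(all(obj.get(key) == value for key, value in criteria.items())
               for obj in doc[field])


criteria = {"name": "Science Fiction", "subgenre": "Space Opera"}

print(flatten(book, "genre"))
print("flattened match:", match_flattened(book, "genre", criteria))  # True
print("nested match:", match_nested(book, "genre", criteria))        # False
```

The flattened version reports a match even though no single genre entry is both "Science Fiction" and "Space Opera"; the nested version does not.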

The problem Elasticsearch mappings solve is efficiently storing and querying diverse data. Like a relational database, Elasticsearch is schema-on-write: a field’s type is fixed when data is indexed. Unlike one, it does not force you to declare that schema upfront, because dynamic mapping can infer it. Through the mapping, you tell Elasticsearch how you want to index and search each field. The mapping dictates how data is stored internally, affecting disk space, indexing speed, and query performance.

Internally, Elasticsearch uses Apache Lucene. For text fields, it builds an inverted index, which maps terms (the analyzed words) to the documents containing them. keyword fields also use an inverted index, but the entire string is stored as a single term. Numeric and date fields are indexed in a structure called a BKD tree, which is highly optimized for range queries.
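
As a rough illustration of the inverted index idea, here is a minimal Python sketch. It is a toy, not Lucene: the analyzer only lowercases and splits on non-alphanumeric characters (no stemming or stop words, unlike the real english analyzer), and the two sample titles and all function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical sample data: document id -> field value.
docs = {
    1: "The Hobbit",
    2: "The Lord of the Rings",
}


def analyze_text(value):
    """Simplified text analysis: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", value.lower()) if tok]


def analyze_keyword(value):
    """Keyword-style analysis: the whole value is one term, unchanged."""
    return [value]


def build_inverted_index(documents, analyze):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in documents.items():
        for term in analyze(value):
            index[term].add(doc_id)
    return index


def lookup(index, term):
    """Return the sorted ids of documents containing the exact term."""
    return sorted(index.get(term, set()))


text_index = build_inverted_index(docs, analyze_text)
keyword_index = build_inverted_index(docs, analyze_keyword)

print(lookup(text_index, "the"))            # [1, 2]
print(lookup(text_index, "rings"))          # [2]
print(lookup(keyword_index, "The Hobbit"))  # [1]
print(lookup(keyword_index, "hobbit"))      # []
```

The last lookup returns nothing because the keyword-style index only contains the full, case-preserved string as a term.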

Perhaps the most surprising thing about Elasticsearch mappings is that, by default, Elasticsearch will guess your field types if you don’t provide a mapping. This dynamic mapping is convenient for quick setups but can lead to suboptimal indexing. For instance, a numeric ID sent as a JSON string (say, to preserve leading zeros, as in "00123") is mapped as text with a keyword sub-field, so numeric range queries on it won’t work. Likewise, a date string that doesn’t match one of the default date detection formats is mapped as text rather than date, preventing date range queries. Always define your mappings explicitly for production environments.
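
To make the guessing rules concrete, the sketch below approximates the default dynamic mapping decisions for scalar JSON values in plain Python. It is a simplification under stated assumptions: the function name guess_field_mapping and the sample values are invented, the date check is a rough regex stand-in for the real date detection formats (which accept more variants), and objects, arrays, and nulls are not handled.

```python
import re

# Rough stand-ins for the default date detection formats (ISO 8601 style
# dates with an optional time part, and yyyy/MM/dd). An approximation only.
ISO_DATE = re.compile(
    r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?"
)
SLASH_DATE = re.compile(r"\d{4}/\d{2}/\d{2}")


def looks_like_date(value):
    return bool(ISO_DATE.fullmatch(value) or SLASH_DATE.fullmatch(value))


def guess_field_mapping(value):
    """Approximate the default dynamic mapping for one scalar JSON value."""
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if looks_like_date(value):
            return {"type": "date"}
        # Any other string: full-text field plus an exact-value sub-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise ValueError(f"not handled in this sketch: {type(value).__name__}")


sample = {
    "isbn": "0001234567",             # numeric-looking string
    "publication_date": "2004-03-12",  # matches a default date format
    "published": "12 March 2004",      # date in a non-default format
    "pages": 320,
}

for field, value in sample.items():
    print(field, "->", guess_field_mapping(value)["type"])
```

Both the ID string and the free-form date end up as text, which is the kind of silent mismatch an explicit mapping avoids.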

A common pitfall is using text for fields you only intend to filter, sort, or aggregate on. A text field is analyzed, so exact-match filters on it behave unexpectedly, and sorting or aggregating on it is disabled by default (it requires enabling fielddata, which is memory-hungry). This is why dynamic mapping indexes strings twice: as a text field for full-text search and as a keyword sub-field (for example, title.keyword) for sorting and aggregations. If you don’t need full-text search on a field, explicitly mapping it as keyword alone saves index space and indexing time.
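
The two choices can be sketched as request bodies, written here as Python dicts so they can be printed as JSON. Nothing is sent to a cluster. The sub-field name raw and the aggregation name by_author are arbitrary names chosen for the example, not part of the books mapping above.

```python
import json

# Filter/aggregate only: a single keyword field, no analysis.
author_mapping = {"type": "keyword"}

# Full-text search plus exact sorting: a text field with a keyword
# sub-field (a "multi-field"). The sub-field name `raw` is arbitrary.
title_mapping = {
    "type": "text",
    "analyzer": "english",
    "fields": {"raw": {"type": "keyword"}},
}

# Body for creating an index with both fields.
create_index_body = {
    "mappings": {
        "properties": {"author": author_mapping, "title": title_mapping}
    }
}

# Terms aggregation over exact author values; `size: 0` skips the hits.
authors_agg_body = {
    "size": 0,
    "aggs": {"by_author": {"terms": {"field": "author"}}},
}

# Full-text match on the analyzed title, sorted by the exact title
# through the sub-field path `title.raw`.
search_sorted_body = {
    "query": {"match": {"title": "running"}},
    "sort": [{"title.raw": "asc"}],
}

for body in (create_index_body, authors_agg_body, search_sorted_body):
    print(json.dumps(body, indent=2))
```

The multi-field costs extra index space for title, which is only worth paying when both full-text search and exact sorting or aggregation are needed on the same field.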

When you need to change the mapping of an existing field, such as its type or analyzer, you can’t modify it in place (adding new fields to an existing mapping is allowed). You’ll need to reindex your data. This involves creating a new index with the desired mapping, then using the _reindex API to copy documents from the old index to the new one.
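
The two reindexing steps can be sketched as request bodies, again built as Python dicts and only printed, not sent. The new index name books_v2 and the example change (pages from integer to short) are assumptions made for illustration.

```python
import json

OLD_INDEX = "books"
NEW_INDEX = "books_v2"  # hypothetical name for the replacement index

# Step 1: body for `PUT /books_v2`, the full mapping with the change applied.
# Illustrative change (an assumption): `pages` becomes `short`.
new_index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "english"},
            "author": {"type": "keyword"},
            "publication_date": {"type": "date"},
            "genre": {
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},
                    "subgenre": {"type": "keyword"},
                },
            },
            "pages": {"type": "short"},
        }
    }
}

# Step 2: body for `POST /_reindex`, copying documents old -> new.
reindex_body = {
    "source": {"index": OLD_INDEX},
    "dest": {"index": NEW_INDEX},
}

steps = [
    ("PUT", f"/{NEW_INDEX}", new_index_body),
    ("POST", "/_reindex", reindex_body),
]

for method, path, body in steps:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Once the copy has finished and been verified, applications can be pointed at the new index and the old one deleted.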

The next concept you’ll grapple with is how to efficiently query these different field types, especially when dealing with complex aggregations or full-text search relevance.
