Elasticsearch can only search what it understands, and by default, it doesn’t understand your application logs.

Let’s see how structured logging with the Elastic Common Schema (ECS) makes Elasticsearch actually useful.

Imagine you have a web server that logs requests. Without structure, it’s just a blob of text:

192.168.1.10 - - [10/Oct/2023:13:55:36 +0000] "GET /api/users HTTP/1.1" 200 1234 "-" "Mozilla/5.0"

This is hard to search. You can’t easily filter by IP, HTTP method, status code, or response size. You’d be stuck with grep and a lot of pain.

Now, let’s structure it using ECS. ECS is an open specification that defines a common set of field names and data types for event data such as logs. It standardizes how to represent IP addresses, timestamps, HTTP requests, and more.

Here’s the same log, but structured according to ECS, ready for Elasticsearch:

{
  "@timestamp": "2023-10-10T13:55:36.000Z",
  "log.level": "info",
  "message": "Request processed successfully",
  "ecs": {
    "version": "1.12.0"
  },
  "process": {
    "pid": 12345,
    "thread.name": "main"
  },
  "client": {
    "ip": "192.168.1.10",
    "address": "192.168.1.10"
  },
  "server": {
    "address": "127.0.0.1",
    "port": 8080
  },
  "http": {
    "version": "1.1",
    "request": {
      "method": "GET"
    },
    "response": {
      "status_code": 200,
      "body": {
        "bytes": 1234
      }
    }
  },
  "user_agent": {
    "original": "Mozilla/5.0"
  },
  "url": {
    "original": "/api/users",
    "path": "/api/users"
  },
  "user": {
    "name": "anonymous"
  }
}

See the difference? @timestamp is an ISO 8601 string, client.ip is a distinct field, http.request.method is its own value, and http.response.status_code is a number.

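When Elasticsearch ingests this with ECS-aware mappings in place (for example, the index templates that Filebeat and Elastic Agent install), each field gets the correct type. Dynamic mapping alone is not enough: it would index client.ip as a string rather than as an ip field. You can then use Kibana to build dashboards and queries like: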

  • Show me all 5xx errors from the last hour: http.response.status_code >= 500 and @timestamp > now-1h
  • Count requests per IP address: x-axis: client.ip, y-axis: count()
  • Find all requests from a specific user agent: user_agent.original: "curl/7.68.0"
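
Outside Kibana, the first query in the list can also be written in Elasticsearch’s Query DSL. The sketch below only builds the request body as a Python dictionary; the function name server_errors_query is made up for illustration, the upper bound of 600 is an added assumption to keep the match to 5xx codes, and actually sending the body to a cluster is left out.

```python
import json


def server_errors_query(window: str = "1h") -> dict:
    """Build a Query DSL body matching 5xx responses within the last `window`."""
    return {
        "query": {
            "bool": {
                # Filters do not affect scoring, which suits yes/no log queries.
                "filter": [
                    {"range": {"http.response.status_code": {"gte": 500, "lt": 600}}},
                    {"range": {"@timestamp": {"gt": f"now-{window}"}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(server_errors_query(), indent=2))
```

Both range clauses only behave as intended when the status code is indexed as a number and @timestamp as a date, which is why the field types matter.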

This isn’t just about making searches easier; it’s about making your data actionable. Without structure, logs are just noise. With ECS, they become a searchable, analyzable dataset.

How it works internally

When you send logs to Elasticsearch (typically via Filebeat, Logstash, or directly via the API), Elasticsearch’s ingest pipeline or mapping configuration determines how the incoming data is processed and stored.

  1. Parsing: Raw log lines are broken down into key-value pairs. This can be done by log shippers (like Filebeat’s processors) or dedicated parsing stages (like Logstash’s grok or json filters).
  2. ECS Mapping: The parsed fields are then mapped to their corresponding ECS field names. For example, a captured IP address might be mapped to client.ip.
  3. Type Coercion: Elasticsearch assigns data types to these fields (keyword, text, long, date, ip, etc.). This is crucial for efficient searching and aggregation. For instance, http.response.status_code must be numeric (ECS defines it as a long), not text, to allow numerical comparisons and aggregations.
  4. Storage: The structured, typed data is indexed in Elasticsearch.
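
The first three steps can be sketched in plain Python for the access log line shown earlier. This is a minimal illustration rather than how Filebeat or Logstash implement it: the regular expression and the parse_access_line name are assumptions, and the sketch places the byte count under http.response.body.bytes and the user agent under the top-level user_agent field set, which is where ECS defines them.

```python
import re
from datetime import datetime, timezone

# Assumed pattern for the combined access log format used in the example line.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) HTTP/(?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line: str) -> dict:
    """Parse one access log line and return an ECS-shaped dictionary."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")

    # Step 1 (parsing) is the regex above; steps 2 and 3 happen below.
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z").astimezone(timezone.utc)
    event = {
        "@timestamp": ts.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "client": {"ip": m["ip"]},
        "http": {
            "version": m["version"],
            "request": {"method": m["method"]},
            "response": {"status_code": int(m["status"])},  # string -> number
        },
        "url": {"original": m["path"]},
        "user_agent": {"original": m["agent"]},
    }
    # "-" means "not present" in this log format, so those fields are omitted.
    if m["size"] != "-":
        event["http"]["response"]["body"] = {"bytes": int(m["size"])}
    if m["referrer"] != "-":
        event["http"]["request"]["referrer"] = m["referrer"]
    return event
```

Step 4, storage, would then be a matter of sending the resulting dictionary to Elasticsearch as a JSON document.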

The ECS structure itself is hierarchical. Fields are grouped logically. For example, all information related to an HTTP request lives under the http.request namespace. This keeps the schema organized and prevents naming collisions.
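
Because Elasticsearch treats a dotted field name and the equivalent nested object as the same path, a flat event can be expanded into the hierarchical form mechanically. The helper below is a small sketch of that idea; nest_dotted is a hypothetical name, and it assumes no key is both a leaf and a parent of another key.

```python
def nest_dotted(flat: dict) -> dict:
    """Expand keys like "http.request.method" into nested dictionaries."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create the intermediate object on first use, then descend into it.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


if __name__ == "__main__":
    print(nest_dotted({
        "client.ip": "192.168.1.10",
        "http.request.method": "GET",
        "http.response.status_code": 200,
    }))
```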

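You can extend ECS with custom fields, but it’s best to nest them under a top-level object named after your company or project to avoid conflicts with future ECS versions; ECS itself avoids proper nouns for its field sets. For example, mycompany.tenant_id. Avoid a leading underscore, since Elasticsearch uses that convention for its own metadata fields such as _id and _source.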

The key to successful ECS adoption is consistency. Every application, service, and log source needs to emit logs in the same structured format. This is often achieved by using logging libraries that have built-in ECS support or by configuring log shippers to transform logs before they reach Elasticsearch.

For instance, in Python, Elastic’s ecs-logging package provides formatters for the standard logging module. In Java, the ecs-logging-java project provides an ECS encoder for Logback and an ECS layout for Log4j2, so neither needs hand-written field mappings.
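
To show what such a library does under the hood, here is a minimal formatter built only on Python’s standard logging module. It is a sketch, not a replacement for a maintained ECS library: the class name MinimalEcsFormatter is made up, it covers only a handful of fields, and the hard-coded ECS version is an assumption taken from the example event above.

```python
import json
import logging
from datetime import datetime, timezone


class MinimalEcsFormatter(logging.Formatter):
    """Render each log record as one JSON line with a few ECS fields."""

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        event = {
            # ISO 8601 in UTC with millisecond precision, e.g. 2023-10-10T13:55:36.000Z
            "@timestamp": ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "log.level": record.levelname.lower(),
            "log.logger": record.name,
            "message": record.getMessage(),
            "ecs": {"version": "1.12.0"},  # assumed version, matching the example
            "process": {
                "pid": record.process,
                "thread": {"name": record.threadName},
            },
        }
        return json.dumps(event)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(MinimalEcsFormatter())
    logger = logging.getLogger("web")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("Request processed successfully")
```

Emitting one JSON object per line keeps the shipper’s job simple: it can decode each line as JSON instead of running a grok pattern.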

The ecs field itself, containing version, isn’t strictly for ingestion processing by Elasticsearch, but rather for consumers of the logs (like Kibana or custom scripts) to know which version of the ECS specification the log event conforms to. This allows for forward and backward compatibility checks when analyzing data over time.

When you configure your Elasticsearch ingest pipeline (or your Logstash conf), you’ll often see processors that directly map fields. For example, a Logstash configuration might look like this:

filter {
  json {
    source => "message" # If your logs are already JSON strings in the message field
  }
  mutate {
    rename => {
      "remote_addr" => "[client][ip]"
      "request_method" => "[http][request][method]"
      "status" => "[http][response][status_code]"
      # Bytes sent to the client describe the response, not the request
      "bytes_sent" => "[http][response][bytes]"
    }
    # Within one mutate block, rename runs before convert
    convert => {
      "[http][response][bytes]" => "integer"
      "[http][response][status_code]" => "integer"
    }
  }
  # ... other filters to populate other ECS fields
}

This explicit mapping and type conversion ensures that even if your application logs fields with slightly different names, they get normalized into the correct ECS structure before being indexed.
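
The same rename-then-convert idea can be prototyped in Python, which is handy for unit-testing a field mapping before committing it to a pipeline. The sketch below is an assumption-laden illustration, not a Logstash API: RENAMES, INTEGER_FIELDS, and normalize are hypothetical names, and the byte count is placed under http.response.bytes because it measures the response.

```python
# Application field name -> ECS field name (dotted form).
RENAMES = {
    "remote_addr": "client.ip",
    "request_method": "http.request.method",
    "status": "http.response.status_code",
    "bytes_sent": "http.response.bytes",
}

# ECS fields that must be numeric for range queries and aggregations.
INTEGER_FIELDS = {"http.response.status_code", "http.response.bytes"}


def normalize(raw: dict) -> dict:
    """Rename known fields to ECS names and coerce numeric ones to int."""
    event = {}
    for key, value in raw.items():
        target = RENAMES.get(key, key)  # unknown fields pass through unchanged
        if target in INTEGER_FIELDS:
            value = int(value)
        event[target] = value
    return event


if __name__ == "__main__":
    print(normalize({"remote_addr": "10.0.0.1", "status": "404", "bytes_sent": "512"}))
```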

The most surprising thing about ECS is how deeply it influences your entire observability strategy, not just your logging. Adopting ECS for logs naturally pushes you to think about common fields for metrics and traces, creating a unified data model across your entire stack. This makes correlating events across different data types far more straightforward than you might expect.

The next challenge you’ll face is unifying your metrics and traces with the same ECS structure.
