The ELK Stack, now known as the Elastic Stack, is surprisingly good at making your logs more accessible, but its true power lies in its ability to correlate events across disparate systems, not just store them.
Let’s see it in action. Imagine you’re running a simple web application with a separate API backend.
Application (Python/Flask Example)
from flask import Flask, request
import logging

app = Flask(__name__)

# Configure logging. StreamHandler writes to stderr here; in a real setup this
# would be a network handler, or a file handler whose output Filebeat ships to Logstash.
handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
app.logger.addHandler(handler)
app.logger.setLevel(logging.INFO)


@app.route('/')
def hello_world():
    app.logger.info(f"Received request for root endpoint from {request.remote_addr}")
    return 'Hello, World!'


@app.route('/api/v1/status')
def api_status():
    try:
        # Deliberately fail so the error path below produces a log entry
        result = 1 / 0
        app.logger.info("API status endpoint called successfully.")
        return {"status": "ok"}
    except ZeroDivisionError:
        # exc_info=True appends the full traceback to the log record
        app.logger.error("Division by zero encountered in API status endpoint.", exc_info=True)
        return {"status": "error", "message": "Internal server error"}, 500


if __name__ == '__main__':
    app.run(debug=True, port=5000)
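Multi-line tracebacks in plain-text logs are awkward to parse downstream, so one common alternative is to emit each record as a single JSON line. The following is a minimal sketch of that idea using only the standard library. The names `JsonLineFormatter`, `build_logger`, and the logger name `my_web_app_sketch` are hypothetical, and the field names are an assumption rather than anything the pipeline in this article requires.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each log record as one JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "@timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the traceback inside the same JSON object so it stays one event.
            payload["exception"] = self.formatException(record.exc_info)
        # json.dumps escapes embedded newlines, so the output is a single line.
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Return a logger (hypothetical name) that writes JSON lines to `stream`."""
    logger = logging.getLogger("my_web_app_sketch")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Usage: write to an in-memory buffer; a real setup would point at a log file.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("Received request for root endpoint from %s", "127.0.0.1")
print(buffer.getvalue().strip())
```

Swapping the buffer for a `logging.FileHandler` would give a shipper one self-contained event per line, traceback included.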
Logstash Configuration (logstash-nginx.conf)
input {
  beats {
    port => 5044
  }
}

filter {
  if [fileset][module] == "nginx" and [fileset][name] == "access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    geoip {
      source => "clientip"
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "nginx-logs-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
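When your application and Nginx server are running, and logs are being shipped to Logstash (e.g., via Filebeat), Logstash processes them. It uses Grok patterns to parse unstructured log lines into structured fields (like clientip, request, response). For Nginx access logs, the configuration above applies the COMBINEDAPACHELOG pattern, which works because Nginx's default combined log format matches Apache's. The date filter then sets the event's @timestamp from the parsed timestamp field, so events are indexed by when they happened rather than when Logstash received them. For geographic data, the geoip filter enriches the clientip with location information. Finally, Logstash sends these structured logs to Elasticsearch for indexing and storage.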
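To make the parsing step concrete, here is a rough Python approximation of what the grok and date filters do to one access-log line. It is a sketch, not Logstash's implementation: the regular expression is a simplified stand-in for COMBINEDAPACHELOG (the real pattern is more permissive), the field names assume the legacy, non-ECS grok naming, and `parse_access_line` and the sample line are made up for illustration.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# Simplified stand-in for the COMBINEDAPACHELOG grok pattern (assumption).
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line: str) -> Optional[Dict[str, str]]:
    """Turn one combined-format line into named fields, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    event = dict(match.groupdict())
    # Mirrors the date filter: derive @timestamp from the parsed timestamp field.
    parsed = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = parsed.isoformat()
    return event


# Hypothetical sample line in the combined log format.
SAMPLE_LINE = (
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /api/v1/status HTTP/1.1" 500 58 "-" "curl/8.1.2"'
)

event = parse_access_line(SAMPLE_LINE)
print(event["clientip"], event["request"], event["response"], event["@timestamp"])
```

Once the line is split into fields like these, a query such as "all 500 responses from this client" becomes a field filter instead of a text search.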
The core problem the Elastic Stack solves is the chaotic distribution of logs across countless servers and services. In a distributed system, when an error occurs, tracing its origin and impact across multiple components is a nightmare without a centralized, searchable system. The Elastic Stack provides this by:
- Collection: Agents like Filebeat, Metricbeat, or Packetbeat capture logs, metrics, or network data from sources.
- Processing & Enrichment: Logstash acts as a pipeline, parsing, filtering, and transforming raw data into a structured, queryable format. It can enrich data with information like GeoIP lookups or join it with other datasets.
- Storage & Indexing: Elasticsearch, a distributed search and analytics engine, stores the processed data and makes it searchable at scale.
- Visualization: Kibana provides a web interface for searching, visualizing, and creating dashboards from the data in Elasticsearch.
Here’s how you’d configure Filebeat to ship application logs to Logstash:
Filebeat Configuration (filebeat.yml)
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/my_app/*.log
    fields_under_root: true
    fields:
      app_name: my-web-app
      environment: production

output.logstash:
  hosts: ["localhost:5044"]

logging.level: info
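In this setup, Filebeat reads log files from /var/log/my_app/, adds custom fields (app_name, environment) to each log entry, and then forwards them to Logstash on port 5044. Because fields_under_root is true, the custom fields sit at the top level of each event rather than under a fields object. Logstash, configured with an input { beats { port => 5044 } }, will receive these. Note that with the Logstash configuration shown earlier, application events skip the Nginx-specific filter block and still land in the nginx-logs-* index, so you would normally add a conditional or a separate index for them. The app_name and environment fields are crucial for filtering and aggregating logs later in Kibana, allowing you to isolate issues within a specific application or environment.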
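As an illustration of how those custom fields get used, the sketch below builds an Elasticsearch query body that narrows results to one application and environment over a recent time window. It only constructs the JSON and sends nothing over the network. The function name `build_app_query` is hypothetical, and the `.keyword` sub-fields assume Elasticsearch's default dynamic mapping for strings; with an explicit mapping, the field names may differ.

```python
import json
from typing import Any, Dict


def build_app_query(app_name: str, environment: str, since: str = "now-15m") -> Dict[str, Any]:
    """Build a bool/filter query body for one app and environment (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact-match filters; ".keyword" assumes default dynamic mapping.
                    {"term": {"app_name.keyword": app_name}},
                    {"term": {"environment.keyword": environment}},
                    # Restrict to recent events using Elasticsearch date math.
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": 50,
    }


body = build_app_query("my-web-app", "production")
print(json.dumps(body, indent=2))
```

The printed body could be pasted into Kibana's Dev Tools console as the payload of a `_search` request, or sent with any HTTP client.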
The most surprising thing about the Elastic Stack's performance is how much of its efficiency comes from the inverted index. Instead of scanning through documents to find terms, Elasticsearch analyzes each text field into terms and, for every unique term, keeps a list of the documents that contain it. Searching for a word is then a lookup in a large, highly optimized dictionary followed by a read of that term's document list, so the cost depends mostly on how many documents match rather than on how many are stored. That is why full-text queries stay fast even on very large datasets.
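The idea is easy to see in a toy version. The sketch below builds an inverted index over three short log messages and answers a multi-term query by intersecting the per-term document sets. It is a teaching model only: the function names are made up, the tokenizer is a crude assumption, and Elasticsearch's real implementation (Lucene segments, compressed postings lists, relevance scoring) is far more elaborate.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms (crude analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each unique term to the set of document IDs that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return IDs of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "Division by zero encountered in API status endpoint.",
    2: "API status endpoint called successfully.",
    3: "Received request for root endpoint from 127.0.0.1",
}
index = build_inverted_index(docs)

print(sorted(search_all(index, "status endpoint")))  # [1, 2]
print(sorted(search_all(index, "division zero")))    # [1]
```

Notice that neither query ever touches document 3's text: the lookup goes from terms to document IDs, never the other way around.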
When you're setting up your Logstash pipelines, remember that filters run in the order they appear in the configuration, and that order matters immensely. A filter placed too early operates on fields that don't exist yet and does nothing, while a filter placed too late might be processing already-structured data unnecessarily. For instance, a date filter that reads the timestamp field has no effect if it comes before the grok filter that extracts timestamp from message. The same applies to JSON: json { source => "message" } works only when the message field holds nothing but a JSON string. If the JSON payload follows a prefix such as a syslog header, a preceding grok or dissect filter has to split the payload into its own field first, and the json filter's source must point at that field.
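The ordering effect can be imitated in a few lines of Python. This is a simulation under stated assumptions, not Logstash code: `parse_step` stands in for a grok filter, `date_step` for a date filter, and the bracketed log format and all names are invented for the example.

```python
import re
from datetime import datetime
from typing import Callable, Dict, List

Event = Dict[str, str]

# Hypothetical line format: "[timestamp] LEVEL detail"
LINE = re.compile(r"^\[(?P<timestamp>[^\]]+)\] (?P<level>[A-Z]+) (?P<detail>.*)$")


def parse_step(event: Event) -> Event:
    """Stand-in for grok: extract named fields from the raw message."""
    match = LINE.match(event.get("message", ""))
    if match:
        event.update(match.groupdict())
    return event


def date_step(event: Event) -> Event:
    """Stand-in for the date filter: acts only if the timestamp field exists."""
    if "timestamp" in event:
        parsed = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
        event["@timestamp"] = parsed.isoformat()
    return event


def run_pipeline(steps: List[Callable[[Event], Event]], event: Event) -> Event:
    """Apply the steps in list order to a copy of the event."""
    current = dict(event)
    for step in steps:
        current = step(current)
    return current


RAW = {"message": "[10/Oct/2023:13:55:36 +0000] ERROR Division by zero encountered"}

right_order = run_pipeline([parse_step, date_step], RAW)
wrong_order = run_pipeline([date_step, parse_step], RAW)

print("@timestamp" in right_order)  # True
print("@timestamp" in wrong_order)  # False
```

In the wrong order, the date step runs before the timestamp field exists, so the event ends up parsed but without a derived `@timestamp`, and nothing signals that the step was skipped.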
The next logical step after getting your logs flowing is to think about how you’ll monitor the health and performance of the ELK stack itself.