The most surprising thing about APM event filtering and sampling is that, done right, it costs you no critical insight. You are deliberately reducing the volume of data that gets stored and analyzed, which makes your APM system more efficient and cost-effective.
Let’s see this in action. Imagine you’re running a high-traffic e-commerce site. You’re getting thousands of http.request spans per minute, many of them for static assets like /favicon.ico or repetitive health checks. Storing all of that is wasteful.
Here’s a simplified OpenTelemetry Collector configuration showing how you might filter out those noisy requests before they even hit your APM backend:

    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      # Drop requests for specific paths (anchored regex = exact match)
      filter/drop_noisy_requests:
        spans:
          exclude:
            match_type: regexp
            attributes:
              - key: http.target
                value: '^/(favicon\.ico|healthz|ping)$'
      # Drop requests from specific user agents (e.g., bots).
      # Kept in its own processor: attributes listed together in one
      # exclude block must all match before a span is dropped.
      filter/drop_bots:
        spans:
          exclude:
            match_type: regexp
            attributes:
              - key: user_agent
                value: ".*(bot|crawler).*"  # Regex to match common bot user agents
      # You can also filter logs and metrics if needed
      #   logs:
      #     ...
      #   metrics:
      #     ...
      # Example of sampling if you want to keep high-volume, non-critical
      # traffic at a lower rate. This sampler keeps 10% of all traces in the
      # pipeline; limiting it to one route (e.g. /api/v1/analytics) needs a
      # policy-based sampler instead.
      # probabilistic_sampler/reduce_analytics_volume:
      #   sampling_percentage: 10
    exporters:
      debug:  # Replace with your actual APM backend exporter
        verbosity: detailed
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [filter/drop_noisy_requests, filter/drop_bots]  # append the sampler here if used
          exporters: [debug]

In this configuration, the filter/drop_noisy_requests processor has a spans.exclude.attributes section. With match_type set to regexp and an anchored pattern, a span whose http.target attribute is exactly /favicon.ico, /healthz, or /ping is dropped. A second processor, filter/drop_bots, applies a regular expression to the user_agent attribute to catch common bot traffic. The two rules live in separate processors because attributes listed together in a single exclude block must all match before a span is dropped.
The probabilistic_sampler/reduce_analytics_volume processor (commented out but illustrative) shows how you could use percentage sampling. Enabled, it keeps roughly 10% of the traces that reach it, across the whole pipeline. Restricting that rate to traces for the /api/v1/analytics route takes a policy-based sampler such as tail_sampling. Either way, sampling is useful for high-volume, less critical data where a representative sample is sufficient.
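Before deploying exclude rules, it can help to run the patterns against a few sample attribute values. The sketch below does that in Python. It assumes Python’s re.search behaves like the Collector’s regular-expression matching for these simple patterns (an unanchored search), and the anchored path pattern is one assumed way to express an exact match on the three paths. The function name is_dropped is hypothetical.

```python
import re

# Assumed patterns: an anchored alternation for the exact paths, and the
# bot pattern from the article. re.search stands in for the Collector's
# regexp matching (an assumption made for illustration only).
NOISY_PATH = re.compile(r"^/(favicon\.ico|healthz|ping)$")
BOT_AGENT = re.compile(r".*(bot|crawler).*")


def is_dropped(attributes: dict) -> bool:
    """Return True if either exclude rule matches the span attributes."""
    target = attributes.get("http.target", "")
    agent = attributes.get("user_agent", "")
    return bool(NOISY_PATH.search(target)) or bool(BOT_AGENT.search(agent))


if __name__ == "__main__":
    samples = [
        {"http.target": "/healthz"},
        {"http.target": "/healthz/details"},
        {"http.target": "/checkout", "user_agent": "Googlebot/2.1"},
    ]
    for attrs in samples:
        print(attrs, "->", "drop" if is_dropped(attrs) else "keep")
```

Without the ^ and $ anchors, /healthz/details would also be dropped, which is the kind of surprise this check is meant to catch. The bot pattern is case-sensitive, so a user agent containing only "Bot" would slip through.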
The problem this solves is the ever-increasing volume of telemetry data. APM systems are powerful, but they can become overwhelmed and prohibitively expensive if they ingest everything. Filtering and sampling allow you to surgically remove the noise and manage the volume of important, but high-frequency, data.
Internally, these processors examine each telemetry item (span, log record, metric) as it passes through the collector pipeline. Filters check whether the item’s attributes or other metadata match the defined include or exclude rules; when an exclude rule matches, the item is discarded. Samplers apply probabilistic logic (such as keeping a certain percentage) based on defined conditions. The key is that this happens before data is sent to your backend, saving network bandwidth, processing power at the backend, and storage costs.
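The flow described above can be sketched as a chain of small functions that each see the data before the exporter does. This is a toy model, not the Collector’s implementation: spans are plain dicts, and exclude_by_attribute, keep_percentage, and run_pipeline are hypothetical names introduced for illustration.

```python
import random
import re
from typing import Callable, Iterable, Iterator, List

Span = dict  # hypothetical minimal span: {"trace_id": str, "attributes": dict}
Processor = Callable[[Iterable[Span]], Iterator[Span]]


def exclude_by_attribute(key: str, pattern: str) -> Processor:
    """Build a filter stage that discards spans whose attribute matches."""
    regex = re.compile(pattern)

    def process(spans: Iterable[Span]) -> Iterator[Span]:
        for span in spans:
            value = span["attributes"].get(key)
            if isinstance(value, str) and regex.search(value):
                continue  # exclude rule matched: the span goes no further
            yield span

    return process


def keep_percentage(percentage: float, rng: random.Random) -> Processor:
    """Build a sampling stage that keeps about `percentage` of items.

    This decides per item; trace-aware samplers key on the trace ID instead.
    """

    def process(spans: Iterable[Span]) -> Iterator[Span]:
        for span in spans:
            if rng.random() * 100.0 < percentage:
                yield span

    return process


def run_pipeline(spans: Iterable[Span], processors: List[Processor]) -> List[Span]:
    """Apply processors in order; whatever is returned would be exported."""
    for processor in processors:
        spans = processor(spans)
    return list(spans)
```

Only the list returned by run_pipeline would reach the exporter, which is where the bandwidth and storage savings come from. Order matters: placing the filter first means the sampler never spends work on spans that would be dropped anyway.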
The exact levers you control are the receivers, processors, and exporters sections of your collector configuration. Within processors, you define filtering and sampling components. For filter, you specify spans, logs, or metrics, and then use include or exclude rules based on attributes (key-value pairs), names (span or metric names), or severity (for logs). For sampling, the usual choices are a fixed percentage (probabilistic_sampler in the Collector’s contrib distribution) or policy-based sampling that can target specific attributes (tail_sampling).
A nuanced point often missed is how sampling decisions are made. When you configure percentage sampling for traces, the decision to keep or drop is typically made once per trace, either at the head of the trace or by deriving it deterministically from the trace ID. Either way, every span belonging to that trace is treated consistently, so you don’t end up with fragmented traces where only some parts are sampled.
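One common way to get that consistency is to derive the decision from the trace ID itself, so any span of the same trace produces the same answer no matter when or where it is seen. The sketch below illustrates the idea with SHA-256. The hash choice and the function name keep_trace are assumptions made for illustration, not the algorithm any particular sampler uses.

```python
import hashlib


def keep_trace(trace_id: str, percentage: float) -> bool:
    """Decide whether to keep a trace using only its ID.

    The ID is hashed to a 64-bit integer and compared against a threshold
    proportional to `percentage`, so the result is stable for a given ID.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")  # 0 .. 2**64 - 1
    threshold = int(percentage / 100.0 * 2**64)
    return value < threshold


if __name__ == "__main__":
    # Three spans of one trace, arriving separately, share one decision.
    spans = [("trace-42", "GET /cart"), ("trace-42", "SELECT items"), ("trace-42", "render")]
    decisions = {keep_trace(trace_id, 10.0) for trace_id, _ in spans}
    print("consistent decision:", len(decisions) == 1)
```

Because the decision depends only on the trace ID and the configured percentage, several collector instances with the same settings would agree on which traces to keep without coordinating.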
The next concept you’ll likely explore is how to combine multiple processors in a pipeline to create sophisticated data processing workflows.