Kubernetes auto-instrumentation with Pixie and eBPF is, surprisingly, more about observing network behavior than about instrumenting application code.

Let’s see it in action. Imagine you have a simple two-service application running in Kubernetes: a frontend pod (frontend-app) and a backend pod (backend-app).

# frontend-app deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: your-docker-repo/frontend-service:latest
        ports:
        - containerPort: 8080

---
# backend-app deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: your-docker-repo/backend-service:latest
        ports:
        - containerPort: 5000

To get started with Pixie, you install it into your cluster, typically with the px CLI (px deploy) or a Helm chart. This deploys Vizier, Pixie’s in-cluster data plane, which includes a small per-node agent called the Pixie Edge Module (PEM), run as a DaemonSet. The magic of eBPF happens here: the PEM uses eBPF to tap into kernel network and execution events without modifying your application code or its container images.

Once Pixie is installed, you can query live traffic with a script written in PxL, Pixie’s Python-based query language. For instance, to see the HTTP requests traced in your frontend pods, run a script file with the px CLI:

px run -f http_request.pxl

Here’s a simplified http_request.pxl script:

# http_request.pxl
import px

# Read the last five minutes of traced HTTP traffic.
df = px.DataFrame(table='http_events', start_time='-5m')

# Resolve each event to its pod ("namespace/pod-name") and keep the frontend pods.
df.pod = df.ctx['pod']
df = df[px.contains(df.pod, 'default/frontend-app')]

# Keep only the columns of interest.
df = df[['time_', 'upid', 'remote_addr', 'remote_port', 'req_path', 'resp_status']]
px.display(df)

The output will look something like this, showing recent HTTP requests traced in the frontend-app pods:

2023-10-27 10:30:01.123 +0000 UTC | 12345:67890 | 10.1.2.3 | 5000 | /api/data | 200
2023-10-27 10:30:01.456 +0000 UTC | 12345:67890 | 10.1.2.3 | 5000 | /api/users | 500
2023-10-27 10:30:02.789 +0000 UTC | 12345:67890 | 10.1.2.3 | 5000 | /api/data | 200

This shows the timestamp, the unique process ID (UPID), the remote address and port (which are your backend-app’s IP and port in this case), the request path, and the HTTP response status. All this without adding any agents or libraries to your frontend-app container.

The core problem Pixie solves is the "observability gap" in dynamic, ephemeral environments like Kubernetes. Traditional APM tools often require code instrumentation, which is a pain in Kubernetes: you’d need to rebuild images, manage agents in pods, and deal with compatibility issues. Pixie leverages eBPF, a technology that allows you to run sandboxed programs within the Linux kernel. These eBPF programs can safely hook into kernel functions, like network socket operations or process execution, and collect data with very low overhead.

Pixie’s architecture splits into a control plane (Pixie Cloud) and an in-cluster data plane (Vizier). The Pixie Edge Module (PEM), the agent on each node, is the eBPF powerhouse. It attaches eBPF programs to kernel probes, mostly kprobes on the system calls used for networking, such as send and recv. These programs capture the data passing through those calls; the PEM then parses it into protocol messages (e.g., HTTP, gRPC, DNS) and keeps the structured records in in-memory tables inside the cluster. Vizier exposes this data through a query engine that runs scripts written in PxL (the Pixie Language, a Python-based language with Pandas-like syntax), as well as through the CLI and the UI.
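
To make the parsing step concrete, here is a minimal Python sketch of turning one captured request buffer and one captured response buffer into a structured record. It is an illustration only, not Pixie's actual parser: the HttpEvent type, the parse_http_exchange function, and the sample buffers are all hypothetical, and it reads only the first line of each HTTP/1.x message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HttpEvent:
    """Structured record built from raw bytes (hypothetical shape)."""
    method: str
    path: str
    status: int


def parse_http_exchange(request: bytes, response: bytes) -> Optional[HttpEvent]:
    """Pair one captured request buffer with its response buffer.

    Only the request line and the status line are parsed. Returns None
    when either buffer does not look like HTTP/1.x text.
    """
    try:
        req_line = request.split(b"\r\n", 1)[0].decode("ascii")
        resp_line = response.split(b"\r\n", 1)[0].decode("ascii")
        method, path, _version = req_line.split(" ", 2)
        _version, status, *_reason = resp_line.split(" ", 2)
        return HttpEvent(method, path, int(status))
    except (ValueError, UnicodeDecodeError):
        return None


# Sample buffers standing in for data captured at the send/recv syscalls.
captured_request = b"GET /api/data HTTP/1.1\r\nHost: backend-app:5000\r\n\r\n"
captured_response = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\n{}"

event = parse_http_exchange(captured_request, captured_response)
print(event)
```

A real protocol parser also has to handle partial buffers, pipelining, and matching requests to responses across many connections, which this sketch leaves out.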

You control all of this primarily through the Pixie UI, the px CLI, and scripts written in PxL, Pixie’s query language. You select namespaces, pods (via labels or names), and the type of data you want to look at (HTTP, gRPC, DNS, TCP, etc.). You can filter by specific ports, IP addresses, request paths, or even response codes. Pixie also allows you to deploy custom eBPF programs (for example, as bpftrace scripts) for more advanced use cases, though this is less common for basic auto-instrumentation.
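
The filters described above amount to simple predicates over the structured records. The following Python sketch shows the idea on in-memory rows; it is not Pixie code, and the HttpRecord type, the filter_records helper, and its parameter names are assumptions made for illustration. The sample rows reuse the paths and status codes from the example output earlier in the article.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class HttpRecord:
    """One traced HTTP request (hypothetical shape)."""
    pod: str
    remote_port: int
    req_path: str
    resp_status: int


def filter_records(
    records: Iterable[HttpRecord],
    pod_prefix: Optional[str] = None,
    remote_port: Optional[int] = None,
    path_prefix: Optional[str] = None,
    min_status: Optional[int] = None,
) -> List[HttpRecord]:
    """Keep records that match every filter that is set (None means 'any')."""
    kept = []
    for rec in records:
        if pod_prefix is not None and not rec.pod.startswith(pod_prefix):
            continue
        if remote_port is not None and rec.remote_port != remote_port:
            continue
        if path_prefix is not None and not rec.req_path.startswith(path_prefix):
            continue
        if min_status is not None and rec.resp_status < min_status:
            continue
        kept.append(rec)
    return kept


records = [
    HttpRecord("default/frontend-app", 5000, "/api/data", 200),
    HttpRecord("default/frontend-app", 5000, "/api/users", 500),
    HttpRecord("default/frontend-app", 5000, "/api/data", 200),
]

# Server errors on port 5000 only.
errors = filter_records(records, remote_port=5000, min_status=500)
print(errors)
```

In a PxL script the same narrowing would be expressed as boolean filters on a DataFrame rather than a Python loop.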

What most people don’t realize is that eBPF programs can also trace user-space function calls within your applications through uprobes, provided you have debugging symbols available. While Pixie’s default "auto-instrumentation" focuses on network traffic and common protocols, deeper application-level tracing is possible by writing custom eBPF programs that hook into specific functions exposed by your application’s libraries or runtime. By targeting the entry and exit points of critical functions, you can get metrics like function latency or error counts without modifying your application’s source code.
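
The entry/exit idea can be sketched without any kernel code. A common pattern is for the entry probe to store a timestamp keyed by thread, and for the exit probe to look it up and compute the difference. The Python below simulates that bookkeeping on a list of already-collected events; the event tuple layout and the function name handleRequest are hypothetical, and the sketch assumes the traced function is not recursive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (timestamp_ns, thread_id, function_name, "entry" or "exit")
Event = Tuple[int, int, str, str]


def function_latencies(events: Iterable[Event]) -> Dict[str, List[int]]:
    """Pair entry and exit events per thread and return latencies in ns.

    Mirrors the usual probe pattern: the entry handler records a start
    time keyed by (thread, function); the exit handler looks it up and
    records the elapsed time. Exits without a matching entry are ignored.
    """
    open_calls: Dict[Tuple[int, str], int] = {}
    latencies: Dict[str, List[int]] = defaultdict(list)
    for ts, tid, func, kind in sorted(events):
        key = (tid, func)
        if kind == "entry":
            open_calls[key] = ts
        elif kind == "exit" and key in open_calls:
            latencies[func].append(ts - open_calls.pop(key))
    return dict(latencies)


events = [
    (1000, 1, "handleRequest", "entry"),
    (1500, 2, "handleRequest", "entry"),
    (4000, 1, "handleRequest", "exit"),
    (9500, 2, "handleRequest", "exit"),
]

result = function_latencies(events)
print(result)
```

Keying on the thread ID is what keeps concurrent calls to the same function from being mixed up; a recursive function would need a per-thread stack instead of a single slot.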

The next concept you’ll likely run into is understanding how to correlate network events with application-level errors when the root cause isn’t immediately obvious from HTTP status codes alone.
