Falco’s performance overhead isn’t a bug; it’s a fundamental trade-off between comprehensive security visibility and system resource utilization.
Let’s see Falco in action. Imagine a simple web server running in Kubernetes. We’ll use kubectl to deploy a basic Nginx pod and then run Falco to monitor it.
# nginx-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
spec:
  containers:
    - name: nginx
      image: nginx:latest
      ports:
        - containerPort: 80
Apply this with kubectl apply -f nginx-pod.yaml.
Now, let’s assume Falco is already running in the cluster (e.g., as a DaemonSet). We can tail its output to see events:
kubectl logs -n falco falco-xxxx -f
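When we send a request to the Nginx server (for example, by running kubectl port-forward pod/nginx-test 8080:80 and then curl localhost:8080), Falco generates an alert if an enabled rule matches the resulting activity. With a rule that reports file access by containers, the JSON output might look like this (illustrative):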
{
  "output": "10:03:15.123456789: Notice A permitted syscall was made by the container (nginx-test) on host (k8s-node-1). File: /etc/nginx/html/index.html",
  "priority": "Notice",
  "rule": "File access by container",
  "time": "2023-10-27T10:03:15.123456789Z",
  "output_fields": {
    "container.id": "abcdef1234567890abcdef1234567890abcdef1234567890",
    "container.name": "nginx-test",
    "container.image": "nginx:latest",
    "fd.name": "/etc/nginx/html/index.html",
    "proc.name": "nginx"
  },
  "source": "syscall"
}
This event signifies that the nginx process within the nginx-test container accessed the /etc/nginx/html/index.html file. Falco, by default, hooks into kernel system calls (syscalls) via eBPF or kernel modules to observe these actions. The rules engine then evaluates these events against a set of predefined or custom rules to detect suspicious activity.
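Because each alert is a single JSON object per line, the log stream is easy to post-process. The following is a minimal sketch, not part of Falco, that tallies alerts per rule so you can see which rules fire most often. The function name tally_alerts is hypothetical, and it assumes JSON output is enabled and that every alert carries a "rule" key as in the example above.

```python
import json
from collections import Counter


def tally_alerts(lines):
    """Count Falco JSON alerts per rule name, skipping non-JSON log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain-text log lines mixed into the stream
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or interleaved output
        rule = event.get("rule")
        if rule:
            counts[rule] += 1
    return counts


if __name__ == "__main__":
    # Synthetic sample lines; real input would come from the Falco log stream.
    sample = [
        "some non-JSON log line",
        '{"rule": "File access by container", "priority": "Notice"}',
        '{"rule": "File access by container", "priority": "Notice"}',
        '{"rule": "Shell in container", "priority": "Warning"}',
    ]
    for rule, n in tally_alerts(sample).most_common():
        print(f"{n:6d}  {rule}")
```

In practice you could feed it the output of kubectl logs by iterating over sys.stdin instead of the synthetic sample.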
The problem Falco solves is providing fine-grained, runtime security visibility into containerized environments without relying solely on network logs or application-level instrumentation, which often miss low-level, malicious activities. It can detect things like:
- A container trying to execute a shell (/bin/sh, /bin/bash).
- A process attempting to read sensitive files like /etc/passwd or /etc/shadow.
- A container making unexpected network connections.
- A process modifying critical system binaries.
Internally, Falco works by:
- Capturing System Events: It uses eBPF (preferred) or a kernel module to tap into the kernel’s syscall stream. This is the most resource-intensive part.
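- Filtering Events: In recent versions, the in-kernel driver limits capture to the syscall types that the loaded rules (and Falco’s own state tracking) need and drops the rest, reducing the load on the userspace daemon. Conditions on fields such as file paths or process names are still evaluated in userspace.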
- Rule Matching (Userspace): The Falco daemon receives filtered events and matches them against its rules engine.
- Alerting: When a rule is matched, Falco triggers an alert, which can be sent to stdout, a file, syslog, or various output plugins (e.g., Kafka, Slack).
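The four stages above can be illustrated with a toy model. This is a simplified sketch for intuition only, not Falco’s implementation: the Event and Rule types, the run_pipeline function, and the sample rules are all hypothetical. It shows why dropping uninteresting syscall types early reduces the work done by the rule-matching stage.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Event:
    syscall: str
    proc_name: str
    path: str


@dataclass(frozen=True)
class Rule:
    name: str
    syscalls: FrozenSet[str]            # syscall types this rule cares about
    predicate: Callable[[Event], bool]  # field-level condition


def run_pipeline(events, rules):
    """Capture -> early filter by syscall type -> rule matching -> alerts."""
    wanted = set()
    for rule in rules:
        wanted |= rule.syscalls
    stats = {"captured": 0, "forwarded": 0}
    alerts = []
    for ev in events:
        stats["captured"] += 1
        if ev.syscall not in wanted:
            continue  # cheap early drop, standing in for the in-kernel filter
        stats["forwarded"] += 1
        for rule in rules:
            if ev.syscall in rule.syscalls and rule.predicate(ev):
                alerts.append(f"{rule.name}: {ev.proc_name} -> {ev.path}")
    return alerts, stats


if __name__ == "__main__":
    rules = [
        Rule("Read sensitive file", frozenset({"open", "openat"}),
             lambda e: e.path in ("/etc/shadow", "/etc/passwd")),
        Rule("Shell spawned", frozenset({"execve"}),
             lambda e: e.proc_name in ("sh", "bash")),
    ]
    events = [
        Event("read", "nginx", "/etc/nginx/html/index.html"),
        Event("openat", "cat", "/etc/shadow"),
        Event("execve", "bash", "/bin/bash"),
    ]
    alerts, stats = run_pipeline(events, rules)
    print(stats)
    for alert in alerts:
        print(alert)
```

The gap between "captured" and "forwarded" is the work saved by early filtering; the more syscall types your rules reference, the smaller that gap becomes.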
The performance overhead comes primarily from the kernel instrumentation and the volume of events generated. Every syscall, especially those involving file I/O, network activity, or process execution, can potentially be captured. The more rules you have enabled, and the more "noisy" your applications are (e.g., heavy I/O, frequent process forking), the higher the overhead.
Measuring Overhead:
The first step is always to establish a baseline. Run your application(s) under normal load without Falco, and then run them with Falco enabled but with a minimal set of rules. Use standard Linux performance tools:
- top/htop: Look for the Falco process (usually named falco) and monitor its CPU and memory usage. Sustained high CPU usage (e.g., above 15-20% of a core) for the Falco process itself, even with minimal rules, indicates a potential issue.
- perf: This is your most powerful tool. To see which syscalls the Falco userspace process itself makes, run:

      perf record -e 'syscalls:sys_enter_*' -p $(pidof falco) -- sleep 10
      perf report

  This records all syscall entries for the Falco process for 10 seconds. Analyze the report for disproportionately high counts of specific syscalls initiated by Falco. More importantly, you can use perf to sample the entire system and see how much time is spent in Falco’s kernel instrumentation. This is more advanced but crucial for deep dives:

      # Sample all CPUs with call graphs for 10 seconds
      perf record -a -g -- sleep 10
      perf report

  Look for kernel-side samples attributed to BPF programs (typically shown with a bpf_prog_ prefix) or to the Falco kernel module.
- strace (use with extreme caution on production): While not for measuring overhead directly, strace -p <falco_pid> can show you what syscalls Falco is making. This helps you understand why it’s consuming CPU.
- Falco’s own metrics: Falco can periodically emit internal metrics alongside its alerts. Enable them in falco.yaml (the key names below are those used by Falco 0.35 and later; check the falco.yaml shipped with your version):

      # In falco.yaml
      log_level: info
      json_output: true
      stdout_output:
        enabled: true
      metrics:
        enabled: true
        interval: 15m
        output_rule: true
        resource_utilization_enabled: true
        kernel_event_counters_enabled: true

  Then, kubectl logs -n falco falco-xxxx will show periodic JSON metrics snapshots, including event and drop counts and Falco’s own CPU and memory usage.
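To turn the top/htop observation into a number you can track over time, you can sample the process’s CPU counters directly. The sketch below is one way to do that on Linux, assuming the standard /proc/<pid>/stat layout (utime and stime are fields 14 and 15, in clock ticks). The function names are hypothetical and nothing here is Falco-specific.

```python
import os
import time


def cpu_ticks(stat_text):
    """Return utime + stime (clock ticks) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')' and index from the 'state' field.
    rest = stat_text.rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, elapsed_s, ticks_per_s):
    """CPU usage as a percentage of one core over the sampling interval."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return 100.0 * (ticks_after - ticks_before) / ticks_per_s / elapsed_s


def sample_pid(pid, interval_s=5.0):
    """Measure a live process's CPU usage over interval_s seconds (Linux only)."""
    hz = os.sysconf("SC_CLK_TCK")

    def read():
        with open(f"/proc/{pid}/stat") as f:
            return cpu_ticks(f.read())

    start = time.monotonic()
    before = read()
    time.sleep(interval_s)
    after = read()
    return cpu_percent(before, after, time.monotonic() - start, hz)


if __name__ == "__main__":
    # Synthetic stat line for demonstration; utime=150, stime=50.
    line = "1234 (falco) S 1 1234 1234 0 -1 4194560 100 0 0 0 150 50 0 0 20 0 4 0"
    print(cpu_ticks(line))
```

On a node, sample_pid(<falco_pid>) gives a reading you can record once without Falco rules loaded and again after each ruleset change.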
Reducing Overhead:
- Rule Optimization (Most Impactful):
  - Disable Unused Rules: Review the rule files Falco loads (including your rules.d/ directory) for rules you don’t need, and disable them. For example, if you never want alerts about shells spawned in containers, add an override to a local rules file that is loaded after the default rules:

        # In a local rules file loaded after the defaults (e.g., falco_rules.local.yaml)
        - rule: Terminal shell in container
          enabled: false

    Recent Falco releases can also enable or disable rules by name or tag from falco.yaml; check the documentation for your version for the exact syntax.
  - Use skip_events: For very noisy rules that you still want to monitor but with less frequency, use skip_events. This tells Falco to only trigger the rule after a certain number of matching events have occurred within a time window. Note that skip_events is not among the rule fields documented by upstream Falco, so confirm that your version accepts it before relying on it.

        # Example rule in a .yaml file within rules.d/
        - rule: Unusual network connection
          desc: Detect network connection to an unusual port
          condition: evt.type = connect and fd.type in (ipv4, ipv6) and fd.sport != 80 and fd.sport != 443 and fd.sport < 1024
          output: Unusual network connection attempt (user=%user.name user_uid=%user.uid comm=%proc.name comm_pid=%proc.pid comm_ppid=%proc.ppid container_id=%container.id container_name=%container.name port=%fd.sport)
          priority: WARNING
          skip_events: 5 # Only alert after 5 such events in a short window

  - Be Specific in Conditions: Instead of broad openat or execve rules, narrow them down. For example, instead of evt.type = execve, use evt.type = execve and proc.name = bash.
  - Avoid Excessive Wildcards: Wildcards in file paths or process names can lead to more events being evaluated.
- eBPF Program Tuning:
  - Kernel Module vs. eBPF: If you’re using the kernel module, consider switching to eBPF, which is generally less intrusive and, with the modern eBPF probe, comparable in efficiency. Recent Falco releases default to the modern eBPF probe when the kernel supports it.
  - eBPF Helper Functions: Falco’s eBPF program uses helpers to filter events early. Ensure your kernel is recent enough to support optimized eBPF features.
  - bpf_filter_enabled: Falco’s eBPF program has an internal mechanism to check if any rules are enabled. If you disable all rules, the eBPF program can be made even more passive.
- Resource Allocation:
  - CPU Limits/Requests: For Falco running as a Kubernetes DaemonSet, set appropriate CPU requests and limits. Start with cpu: "500m" and monitor. You might need 1 or 2 CPU cores for very busy clusters.
  - Memory: Falco’s memory usage is usually modest, but ensure it has enough. memory: "256Mi" is a common starting point.
- Output Configuration:
  - Reduce Output Verbosity: If you’re logging to stdout or a file, ensure the log level is appropriate (info is good, debug is very noisy).
  - Use Efficient Outputs: For high-volume environments, consider asynchronous outputs like Kafka or a dedicated logging system rather than writing directly to a file or stdout on every node.
- System-Level Tuning:
  - Kernel Parameters: In rare cases, kernel parameters related to networking or file system performance might indirectly affect syscall rates, but this is usually a last resort.
  - inotify Limits: If you have many rules that watch file system events (though Falco primarily uses syscalls, some rules might indirectly trigger more syscalls), ensure fs.inotify.max_user_watches is set appropriately on your nodes.
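The "alert only after N matches within a window" idea from the rule optimization list can also be applied outside Falco, in whatever consumes its alerts. The sketch below is a hypothetical consumer-side throttle (the class name CountInWindowThrottle and its parameters are illustrative, not a Falco API). It reduces alert volume only; it does not lower capture overhead on the node.

```python
from collections import defaultdict, deque


class CountInWindowThrottle:
    """Emit one alert per `threshold` matches seen within `window_s` seconds."""

    def __init__(self, threshold, window_s):
        self.threshold = threshold
        self.window_s = window_s
        self._seen = defaultdict(deque)  # rule name -> recent timestamps

    def should_emit(self, rule, ts):
        recent = self._seen[rule]
        recent.append(ts)
        # Forget matches that have fallen out of the time window.
        while recent and ts - recent[0] > self.window_s:
            recent.popleft()
        if len(recent) >= self.threshold:
            recent.clear()  # start counting afresh after emitting
            return True
        return False


if __name__ == "__main__":
    throttle = CountInWindowThrottle(threshold=3, window_s=10.0)
    for ts in (0.0, 1.0, 2.0, 3.0, 20.0, 21.0, 22.0):
        print(ts, throttle.should_emit("Unusual network connection", ts))
```

Counting per rule name keeps one noisy rule from suppressing alerts for another; keying on (rule, container) instead would be a reasonable variation.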
The most common mistake people make is enabling the default "all rules" set in production without understanding what each rule does and its potential impact. Start with a very minimal set of critical rules, measure the overhead, and then incrementally add more, re-measuring after each significant change.
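The incremental approach is easier to follow if you record a measurement after each batch of rules and compare it against a budget. Here is a minimal sketch under stated assumptions: the CPU figures are values you measured yourself (in cores), the per-step budget and headroom thresholds are arbitrary defaults, and the function names are hypothetical. Only the "500m" quantity notation comes from Kubernetes.

```python
def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity ("500m", "2", "1.5") to cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000.0
    return float(quantity)


def review_rollout(baseline_cores, steps, cpu_limit,
                   max_step_cores=0.05, headroom=0.8):
    """Flag rule batches that are costly or push Falco close to its CPU limit.

    steps is an ordered list of (label, measured_cores) pairs, one per batch
    of rules added, measured after that batch was enabled.
    """
    limit = parse_cpu(cpu_limit)
    previous = baseline_cores
    report = []
    for label, measured in steps:
        delta = measured - previous
        if measured >= headroom * limit:
            verdict = "near limit"
        elif delta > max_step_cores:
            verdict = "expensive"
        else:
            verdict = "ok"
        report.append((label, round(delta, 3), verdict))
        previous = measured
    return report


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    steps = [("shell rules", 0.07), ("file rules", 0.20), ("network rules", 0.42)]
    for label, delta, verdict in review_rollout(0.05, steps, "500m"):
        print(f"{label:15s} +{delta:.3f} cores  {verdict}")
```

A batch flagged as "expensive" is a candidate for narrower conditions; "near limit" suggests either raising the DaemonSet CPU limit or trimming rules before adding more.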
The next challenge you’ll likely face is dealing with the sheer volume of legitimate alerts generated by your optimized ruleset, leading to alert fatigue.