Falco, the real-time Kubernetes threat detection engine, is surprisingly difficult to get running correctly with its official Helm chart if you don’t understand its fundamental dependency on kernel modules.
Here’s how to get it installed and operational, focusing on the common pitfalls:
The Heart of the Matter: Kernel Modules
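Falco needs to inspect system calls. With the kernel module driver, it does this by loading a kernel module into each node's kernel. The chart can also be configured to use an eBPF-based driver instead, which has different requirements. If the driver isn't present or fails to load, Falco cannot see any system calls. Depending on the version, the pod either fails to start with a driver error or keeps running without ever producing a useful alert. This is the most common reason people think Falco is broken, whether it crash-loops or sits there installed but silent.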
Diagnosis:
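On a node where a Falco DaemonSet pod is running, check whether the module is loaded with lsmod | grep falco. The module is named falco; older releases used falco_probe. The driver loader's own output is in the pod's init container logs: kubectl logs -n falco <falco-pod-name> -c falco-driver-loader (kubectl describe pod lists the exact init container name for your chart version).
Also check dmesg | grep -i falco. You're looking for errors, or for the absence of any message showing the module being loaded.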
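If you need to run the lsmod check on many nodes, you can script it. The sketch below assumes a Linux node and reads /proc/modules, which is the file lsmod formats. The module names it looks for (falco, plus falco_probe for older releases) are assumptions about which Falco version you run.

```python
import os

# Names Falco's kernel module may be listed under in /proc/modules.
# "falco" is the current name; "falco_probe" is assumed for older releases.
FALCO_MODULE_NAMES = {"falco", "falco_probe"}


def find_falco_modules(proc_modules_text: str) -> list[str]:
    """Return the Falco module names found in the text of /proc/modules."""
    found = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first field on each line is the module name, as lsmod shows it.
        if fields and fields[0] in FALCO_MODULE_NAMES:
            found.append(fields[0])
    return found


if __name__ == "__main__":
    if os.path.exists("/proc/modules"):
        with open("/proc/modules") as f:
            loaded = find_falco_modules(f.read())
        if loaded:
            print("Falco module loaded:", ", ".join(loaded))
        else:
            print("No Falco module loaded; check the driver loader logs")
    else:
        print("/proc/modules not found (not a Linux host?)")
```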
Fix: The Helm chart tries to handle this. When a Falco pod starts, the chart's driver-loader init container first tries to download a prebuilt module for the node's kernel. If none is available, it compiles one locally. That build usually fails because the kernel headers for the node's running kernel are missing.
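- Install Kernel Headers: On Debian/Ubuntu nodes:
sudo apt-get update && sudo apt-get install -y linux-headers-$(uname -r) build-essential
On RHEL/CentOS/Fedora nodes:
sudo yum install -y kernel-devel-$(uname -r) make gcc
Why it works: These packages provide the kernel build tree and the compiler toolchain needed to build the Falco module against your running kernel. A full yum update is not needed first. It can also upgrade the kernel itself, leaving the headers you just installed out of step with the kernel the node boots next.
- Reinstall Falco: After installing the headers, delete the existing Falco release and reinstall it. Add the chart repository first if you haven't already:
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm uninstall falco --namespace falco
helm install falco falcosecurity/falco --namespace falco --create-namespace
Why it works: When the new pods start, the driver loader finds the headers and can compile and load the module on each node. Restarting the existing pods with kubectl rollout restart daemonset falco -n falco also re-runs the driver loader without a full reinstall.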
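The header packages installed above normally provide the kernel build tree at /lib/modules/<kernel release>/build, typically as a symlink into /usr/src. The sketch below checks for that tree on a node before you reinstall. It assumes that conventional layout, which some distributions may not follow.

```python
import os
import platform


def kernel_build_dir(release: str, modules_root: str = "/lib/modules") -> str:
    """Conventional location of the build tree for a given kernel release."""
    return os.path.join(modules_root, release, "build")


def headers_present(release: str, modules_root: str = "/lib/modules") -> bool:
    """True if the build tree exists; os.path.isdir follows the usual symlink."""
    return os.path.isdir(kernel_build_dir(release, modules_root))


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints on Linux
    if headers_present(release):
        print(f"Kernel build tree found for {release}")
    else:
        print(f"Missing {kernel_build_dir(release)}; install headers for {release}")
```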
Helm Chart Configuration: Custom Rules and falco.yaml
The Helm chart exposes many configuration options. Misunderstanding these can lead to Falco not loading rules, not reporting events, or having incorrect settings.
Diagnosis:
Check the Falco container logs (kubectl logs -n falco <falco-pod-name> -c falco) for errors raised while loading rules or reading the configuration. Rule errors normally name the offending rules file and rule. The exact wording varies between Falco versions.
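Many rule-loading errors come from entries that are missing a required field or use an unknown priority. The sketch below checks rule entries after you have parsed the rules YAML into Python objects, for example with the third-party PyYAML package (the sketch does not import it). It checks only the fields Falco requires on a new rule (rule, desc, condition, output, priority) and the eight documented priority levels. It skips macros, lists, and append/override entries. It does not compile conditions, so Falco's own log remains the final word.

```python
# Fields Falco expects on every new rule entry.
REQUIRED_RULE_FIELDS = ("rule", "desc", "condition", "output", "priority")

# The documented priority levels; compared case-insensitively here.
VALID_PRIORITIES = {
    "EMERGENCY", "ALERT", "CRITICAL", "ERROR",
    "WARNING", "NOTICE", "INFORMATIONAL", "DEBUG",
}


def validate_rules(entries: list) -> list[str]:
    """Return human-readable problems found in parsed rules-file entries."""
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected a mapping, got {type(entry).__name__}")
            continue
        if "rule" not in entry:
            continue  # macros, lists and version requirements are not checked here
        if entry.get("append") or "override" in entry:
            continue  # these only carry the fields being changed
        name = entry["rule"]
        missing = [field for field in REQUIRED_RULE_FIELDS if field not in entry]
        if missing:
            problems.append(f"rule {name!r}: missing {', '.join(missing)}")
        priority = entry.get("priority")
        if priority is not None and str(priority).upper() not in VALID_PRIORITIES:
            problems.append(f"rule {name!r}: unknown priority {priority!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical entries, as yaml.safe_load would return them from a rules file.
    example = [
        {"rule": "Arbitrary command executed in a container",
         "condition": "evt.type = execve", "priority": "LOUD"},
    ]
    for problem in validate_rules(example):
        print(problem)
```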
Fix: The most common issues stem from how rules and configuration are passed.
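- Custom Rules Files: If you're providing your own rules, pass them through the chart's customRules value and make sure the YAML is valid, including the indentation inside the block scalar.
# values.yaml snippet
customRules:
  custom-rules.yaml: |-
    - rule: Arbitrary command executed in a container
      desc: An arbitrary command was executed in a container.
      condition: evt.type in (execve, execveat) and evt.dir = < and container.id != host
      output: Arbitrary command executed in container (user=%user.name command=%proc.cmdline container=%container.name)
      priority: WARNING
      source: syscall
      tags: [container, process, unexpected]
Why it works: The chart turns each key under customRules into a file in /etc/falco/rules.d/ inside the Falco container, and Falco loads that directory after its default rules. A condition of container.id != host alone would match every system call made in any container. Restricting it to process-spawning events (execve/execveat) limits it to command executions. Output fields are referenced as %field.name.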
- Falco Configuration (falco.yaml): The Helm chart manages falco.yaml. In current chart versions, the keys under falco: in values.yaml are rendered into falco.yaml using the same snake_case names, so check them against the falco.yaml reference for your Falco version. Note that alerts only reach an integration if the corresponding output is enabled and configured.
# values.yaml snippet
falco:
  # Emit alerts as one JSON object per line
  json_output: true
  # Verbosity of Falco's own log messages (this is not audit logging)
  log_level: info
  # Forward alerts over HTTP, e.g. to a SIEM collector (the URL is a placeholder)
  http_output:
    enabled: true
    url: "http://siem-collector.example.com:8080/falco"
Why it works: These settings translate directly into the falco.yaml file, controlling how Falco formats and delivers its findings. json_output: true is crucial for many external integrations.
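With json_output enabled, each alert is a single JSON line in the Falco container log, interleaved with Falco's plain-text startup messages. The sketch below tallies alerts by priority and rule from a saved log, using the priority and rule fields of each alert. The file and script names in the usage comment are hypothetical.

```python
import json
import sys
from collections import Counter


def summarize_alerts(lines) -> Counter:
    """Count JSON alerts by (priority, rule), skipping Falco's plain-text log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # startup banners and other non-alert messages
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or otherwise malformed line
        if "rule" in event and "priority" in event:
            counts[(event["priority"], event["rule"])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names):
    #   kubectl logs -n falco <falco-pod-name> -c falco > falco.log
    #   python3 summarize_alerts.py falco.log
    with open(sys.argv[1]) as f:
        for (priority, rule), n in summarize_alerts(f).most_common():
            print(f"{n:6d}  {priority:<10} {rule}")
```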
Resource Limits and Requests
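Falco is a performance-sensitive application. If its pods don't have adequate resources, several things can go wrong. The kernel OOM killer terminates the container when it exceeds its memory limit. The kubelet can evict pods when the node runs short of memory. A tight CPU limit throttles Falco so that it falls behind and drops events. The result is intermittent or complete failure.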
Diagnosis:
Check the Falco DaemonSet pods with kubectl get pods -n falco -o wide. Look for a high RESTARTS count or an OOMKilled (Out Of Memory) status. kubectl describe pod <falco-pod-name> -n falco shows the reason for the last termination under Last State, along with recent events.
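To scan every Falco pod at once instead of describing them one by one, save the output of kubectl get pods -n falco -o json. Then look for containers whose current or previous termination reason is OOMKilled. The sketch below does this; the file and script names in its usage comment are hypothetical.

```python
import json
import sys


def _terminated_reason(container_status: dict, key: str):
    """Reason from state/lastState.terminated, or None if not terminated."""
    state = container_status.get(key) or {}
    return (state.get("terminated") or {}).get("reason")


def find_oom_killed(pod_list: dict) -> list[tuple[str, str, int]]:
    """Return (pod, container, restartCount) for containers killed with OOMKilled."""
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "?")
        for cs in pod.get("status", {}).get("containerStatuses") or []:
            # Check both the current state and the previous (lastState) termination.
            reasons = {_terminated_reason(cs, "state"), _terminated_reason(cs, "lastState")}
            if "OOMKilled" in reasons:
                hits.append((pod_name, cs.get("name", "?"), cs.get("restartCount", 0)))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names):
    #   kubectl get pods -n falco -o json > pods.json
    #   python3 find_oom.py pods.json
    with open(sys.argv[1]) as f:
        for pod, container, restarts in find_oom_killed(json.load(f)):
            print(f"{pod}/{container}: OOMKilled, restarts={restarts}")
```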
Fix:
Adjust resource requests and limits in your values.yaml file.
# values.yaml snippet
resources:
  requests:
    cpu: "200m"
    memory: "200Mi"
  limits:
    cpu: "500m"
    memory: "500Mi"
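Why it works: Requests reserve CPU and memory for Falco on each node, and limits cap what it may use. Keep the memory limit comfortably above Falco's observed peak usage (kubectl top pods -n falco, which requires metrics-server) so the OOM killer leaves it alone. The figures above are illustrative starting points. The chart ships its own defaults, which you can inspect with helm show values on the Falco chart, and a limit set below what Falco actually needs causes the very OOMKilled restarts you are trying to fix.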
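The Kubernetes API rejects a container whose request exceeds its limit, and mixed units (m, Mi, Gi) make such mistakes easy to miss. The sketch below converts a subset of Kubernetes quantity suffixes and compares requests with limits in a resources block like the one above. It does not handle exponent notation or the P and E suffixes.

```python
from decimal import Decimal

# Multipliers for a subset of Kubernetes quantity suffixes
# (no exponent notation, no P/E/Pi/Ei suffixes).
SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "m": Decimal(10) ** -3,
}


def parse_quantity(quantity) -> Decimal:
    """Convert e.g. '200m' (cores) or '500Mi' (bytes) to a plain Decimal."""
    text = str(quantity).strip()
    # Try two-letter suffixes before one-letter ones.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * SUFFIXES[suffix]
    return Decimal(text)


def check_resources(resources: dict) -> list[str]:
    """Report every resource whose request is larger than its limit."""
    requests = resources.get("requests") or {}
    limits = resources.get("limits") or {}
    return [
        f"{name}: request {requests[name]} exceeds limit {limits[name]}"
        for name in requests
        if name in limits and parse_quantity(requests[name]) > parse_quantity(limits[name])
    ]


if __name__ == "__main__":
    example = {"requests": {"cpu": "200m", "memory": "200Mi"},
               "limits": {"cpu": "500m", "memory": "500Mi"}}
    print(check_resources(example) or "requests fit within limits")
```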
Network Policies and Firewall Rules
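Falco can enrich its detections with context about your cluster (e.g., pod names, namespaces, labels). When that metadata is fetched from the Kubernetes API server, blocked connections severely diminish Falco's effectiveness. Older Falco releases fetch it directly; newer ones use a separate metadata collector component. Either way, network policies or external firewalls that block the API server connection cause the problem.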
Diagnosis:
Check the Falco pod logs for connection errors or timeouts when contacting the Kubernetes API server. If your chart version deploys a separate metadata collector, check its logs as well. The exact messages vary between versions.
Fix:
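Ensure that network policies (if you use them) allow egress traffic from the Falco pods to the Kubernetes API server. In-cluster clients reach it through the kubernetes Service in the default namespace on port 443. However, many network plugins evaluate policies against the real endpoint behind the Service. So allow the addresses and port listed by kubectl get endpoints kubernetes -n default (often port 6443). Also allow DNS (port 53) if the pods resolve the API server by name.
If your cluster uses a strict egress firewall, you'll need to allow the API server endpoint there as well.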
Why it works: This allows Falco to obtain the necessary metadata from the Kubernetes API to enrich its syscall-based detections, making alerts much more informative and actionable.
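As an illustration, a NetworkPolicy like the one generated below allows Falco pods to reach the API server and DNS. It is printed as JSON, which kubectl apply -f - accepts. The pod label app.kubernetes.io/name: falco, the policy name, and the example IP are assumptions, so check the real labels with kubectl get pods -n falco --show-labels. Once a policy selects the Falco pods for egress, it blocks everything it does not list. That includes any other destinations your Falco setup contacts, such as alert receivers or driver and rules downloads.

```python
import json


def falco_apiserver_egress_policy(apiserver_ips, port=6443, namespace="falco"):
    """Build a NetworkPolicy allowing Falco pods egress to the API server and DNS."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "falco-allow-apiserver", "namespace": namespace},
        "spec": {
            # Assumed pod label; verify it against your Falco pods.
            "podSelector": {"matchLabels": {"app.kubernetes.io/name": "falco"}},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    # One /32 block per API server endpoint (IPv4 assumed).
                    "to": [{"ipBlock": {"cidr": f"{ip}/32"}} for ip in apiserver_ips],
                    "ports": [{"protocol": "TCP", "port": port}],
                },
                # DNS to any destination, so names can still be resolved.
                {"ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}]},
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical address; take the real ones from
    # `kubectl get endpoints kubernetes -n default`.
    print(json.dumps(falco_apiserver_egress_policy(["10.0.0.1"]), indent=2))
```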
DaemonSet Node Affinity and Tolerations
Falco runs as a DaemonSet, meaning it should deploy a pod on each (or specific) node. If you have custom node taints or specific node selectors, Falco might not land on the nodes where you expect it, or it might not land at all.
Diagnosis:
Check kubectl get nodes and kubectl describe node <node-name> to see taints. Check kubectl get daemonset falco -n falco -o yaml for nodeSelector or affinity rules. Verify that the Falco pods are running on the intended nodes.
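A DaemonSet's status only counts the nodes it is allowed to run on. A node excluded by a taint or selector therefore never shows up as missing. The sketch below compares saved kubectl get nodes -o json output against Falco's pods and lists each uncovered node with its taints. It assumes the falco namespace holds only Falco DaemonSet pods; if other workloads run there, fetch the pods with a label selector instead. File and script names in the usage comment are hypothetical.

```python
import json
import sys


def format_taint(taint: dict) -> str:
    """Render a taint the way kubectl does: key[=value]:effect."""
    text = taint["key"]
    if taint.get("value"):
        text += "=" + taint["value"]
    return text + ":" + taint["effect"]


def uncovered_nodes(nodes: dict, pods: dict) -> dict[str, list[str]]:
    """Map each node lacking a Running Falco pod to that node's taints."""
    covered = {
        pod.get("spec", {}).get("nodeName")
        for pod in pods.get("items", [])
        if pod.get("status", {}).get("phase") == "Running"
    }
    return {
        node["metadata"]["name"]: [format_taint(t) for t in node.get("spec", {}).get("taints") or []]
        for node in nodes.get("items", [])
        if node["metadata"]["name"] not in covered
    }


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical file names):
    #   kubectl get nodes -o json > nodes.json
    #   kubectl get pods -n falco -o json > falco-pods.json
    #   python3 uncovered_nodes.py nodes.json falco-pods.json
    with open(sys.argv[1]) as f:
        nodes = json.load(f)
    with open(sys.argv[2]) as f:
        pods = json.load(f)
    for name, taints in uncovered_nodes(nodes, pods).items():
        print(f"{name}: no running Falco pod; taints: {', '.join(taints) or 'none'}")
```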
Fix:
Adjust the nodeSelector, affinity, and tolerations in your values.yaml to match your cluster’s node configuration.
# values.yaml snippet
# Example: Deploy Falco on nodes with the "falco-agent=true" label
nodeSelector:
  falco-agent: "true"
# Example: Tolerate the "special-node=true" taint
tolerations:
  - key: "special-node"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
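Why it works: nodeSelector restricts the DaemonSet to nodes carrying the matching label, and tolerations let its pods schedule onto nodes with matching taints. Helm replaces list values rather than merging them. A tolerations list in your values.yaml therefore replaces any default tolerations the chart defines, so re-add any of those you still need (for example, for control-plane nodes).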
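To check a tolerations list against a node's taints before deploying, you can reproduce the matching rule Kubernetes applies in a few lines. A toleration matches a taint when three conditions hold: its effect is empty or equal, its key is empty or equal, and either its operator is Exists or its value equals the taint's value. A pod can only be scheduled onto a node if every NoSchedule and NoExecute taint is matched; PreferNoSchedule is only a preference. This is a simplified sketch of that logic, not the scheduler itself. It ignores tolerationSeconds and the tolerations the DaemonSet controller adds automatically for node-condition taints.

```python
# Taint effects that block scheduling (PreferNoSchedule is only a preference).
HARD_EFFECTS = {"NoSchedule", "NoExecute"}


def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    effect = toleration.get("effect") or ""
    if effect and effect != taint.get("effect"):
        return False
    key = toleration.get("key") or ""
    # An empty key (only valid with operator Exists) matches every taint key.
    if key and key != taint.get("key"):
        return False
    operator = toleration.get("operator") or "Equal"
    if operator == "Exists":
        return True
    if operator == "Equal":
        return (toleration.get("value") or "") == (taint.get("value") or "")
    return False  # unknown operator


def untolerated_taints(tolerations: list[dict], taints: list[dict]) -> list[dict]:
    """Return the scheduling-blocking taints that no toleration matches."""
    return [
        taint for taint in taints
        if taint.get("effect") in HARD_EFFECTS
        and not any(tolerates(tol, taint) for tol in tolerations)
    ]


if __name__ == "__main__":
    # The toleration from the values.yaml example, checked against two sample taints.
    tolerations = [{"key": "special-node", "operator": "Equal", "value": "true", "effect": "NoSchedule"}]
    taints = [
        {"key": "special-node", "value": "true", "effect": "NoSchedule"},
        {"key": "node-role.kubernetes.io/control-plane", "effect": "NoSchedule"},
    ]
    for taint in untolerated_taints(tolerations, taints):
        print("not tolerated:", taint["key"])
```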
After successfully fixing these, the next hurdle is often understanding the vast array of rules and tuning them to reduce noise while catching genuine threats.