Aqua’s security posture management is only cost-effective when scan coverage is balanced against spend. Scanning too broadly, too often, or with inefficient configurations inflates cloud costs, while scanning too little leaves your environment vulnerable.

Let’s see how security and cost interact in practice. Imagine a Kubernetes cluster where Aqua is configured to rescan production images on a fixed schedule.

apiVersion: security.aqua.io/v1
kind: ImageScanPolicy
metadata:
  name: default-image-scan
spec:
  imageSelector:
    matchLabels:
      environment: production
  scanInterval: 24h
  severityThreshold: HIGH
  actions:
    - type: alert
    - type: quarantine
      ifSeverity: CRITICAL

This policy scans images labeled environment: production every 24 hours, alerts on HIGH severity vulnerabilities, and quarantines CRITICAL ones. The cost comes from the compute resources used for scanning, storage for scan results, and potentially the overhead of managing quarantined images.
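
The decision logic described above can be sketched in a few lines of Python. This is an illustration of one plausible reading of the policy, not Aqua's implementation: it assumes severityThreshold gates all actions and ifSeverity further restricts quarantine. The `Severity` enum and `actions_for` function are hypothetical names introduced for this example.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Ordered severities so they can be compared with >=."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def actions_for(findings, threshold=Severity.HIGH, quarantine_at=Severity.CRITICAL):
    """Return the actions the example policy would take for one image.

    Assumption: the worst finding decides the outcome.
    """
    if not findings:
        return []
    worst = max(findings)
    actions = []
    if worst >= threshold:
        actions.append("alert")
    if worst >= quarantine_at:
        actions.append("quarantine")
    return actions


print(actions_for([Severity.LOW, Severity.MEDIUM]))       # []
print(actions_for([Severity.HIGH]))                       # ['alert']
print(actions_for([Severity.MEDIUM, Severity.CRITICAL]))  # ['alert', 'quarantine']
```

Under this reading, only images with at least one CRITICAL finding incur the extra cost of quarantine handling.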

The core problem Aqua solves is providing visibility into vulnerabilities and compliance risks across your cloud-native environment without becoming an unmanageable cost center. It integrates into CI/CD pipelines, registries, and running workloads to detect and prevent security issues. Internally, Aqua uses a combination of static analysis (scanning images for known vulnerabilities and misconfigurations) and dynamic analysis (monitoring running containers for behavioral anomalies). The levers you control are the policies themselves: what gets scanned, how often, and what actions are triggered.

To optimize costs, you need to be granular. Start by analyzing your current scanning load. Are you scanning every single image, even temporary build images or development test images?
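
One way to start that analysis is to total scan counts by environment label. The sketch below uses a made-up inventory: the repository names, labels, and counts are all hypothetical, and in practice the rows would come from whatever scan history you can export.

```python
from collections import Counter

# Hypothetical inventory: (repository, environment label, scans in the last 30 days)
inventory = [
    ("payments/api", "production", 30),
    ("payments/api-build", "build", 410),
    ("web/frontend", "production", 30),
    ("web/frontend-dev", "development", 260),
    ("tools/ci-runner", "build", 150),
]


def scans_by_environment(rows):
    """Sum scan counts per environment label."""
    totals = Counter()
    for _repo, env, scans in rows:
        totals[env] += scans
    return totals


totals = scans_by_environment(inventory)
grand_total = sum(totals.values())

# Print environments from heaviest to lightest scan load
for env, scans in totals.most_common():
    print(f"{env:<12} {scans:>5} scans  {scans / grand_total:.0%} of total")
```

In this invented data set, build and development images account for most scans, which is the pattern the question above is meant to surface.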

Common Causes of High Aqua Costs and Their Fixes:

  1. Overly Broad Image Scanning Policies: Scanning every image in every repository, regardless of its criticality or lifecycle stage, consumes significant compute and storage.

    • Diagnosis: Review your ImageScanPolicy and ImageCompliancePolicy resources. Look for imageSelector fields that are too general (e.g., * or no specific labels). Check the Aqua UI for scan counts per repository.
    • Fix: Implement more specific imageSelector rules. For example, target only production images, or images with specific critical labels.
      spec:
        imageSelector:
          matchLabels:
            environment: production
            criticality: high
      
      This limits scanning to images explicitly marked as both production and high criticality, reducing the number of scans.
    • Why it works: Less data to scan means fewer compute cycles and less storage for results.
  2. Excessive Scan Frequency: Scanning images more often than necessary (e.g., hourly for stable production images) is wasteful.

    • Diagnosis: Examine the scanInterval in your ImageScanPolicy.
    • Fix: Set scanInterval according to how quickly each environment needs to learn about newly disclosed vulnerabilities. For stable production images, daily (24h) or even weekly (168h) may suffice, depending on your risk tolerance. Less critical environments can often run on a longer interval still.
      spec:
        scanInterval: 72h # Scan every 3 days for less critical environments
      
      This reduces the number of full scans performed over time.
    • Why it works: Fewer scans mean less CPU and I/O usage by Aqua’s scanning engine.
  3. Unnecessary Compliance Checks: Running extensive compliance checks on non-sensitive workloads or transient build jobs adds overhead.

    • Diagnosis: Review your ImageCompliancePolicy and RuntimePolicy configurations. Identify policies with broad imageSelector or workloadSelector that apply to many resources.
    • Fix: Use imageSelector and workloadSelector to target compliance checks only to environments and workloads where they are critical (e.g., PCI-DSS, HIPAA regulated environments).
      apiVersion: security.aqua.io/v1
      kind: ImageCompliancePolicy
      metadata:
        name: pci-compliance
      spec:
        imageSelector:
          matchLabels:
            environment: production
            compliance: pci-dss
        checks:
          - id: CIS_Kubernetes_1_23
          - id: PCI_DSS_v3_2_1
      
      This ensures expensive, deep compliance checks are only run on the specific set of images that require them.
    • Why it works: Compliance checks can be more resource-intensive than vulnerability scans; limiting their scope directly reduces compute load.
  4. Inefficient Runtime Policy Configurations: Runtime policies that are too broad or trigger on benign activities can generate excessive logs and alerts, increasing processing and storage costs.

    • Diagnosis: Analyze RuntimePolicy configurations for overly permissive rules or high-frequency event monitoring. Check the volume of runtime alerts generated in the Aqua UI.
    • Fix: Tune runtime rules to be more specific. Use matchLabels or namespace selectors to apply policies only to sensitive pods. Consider disabling overly chatty rules or adjusting their severity.
      apiVersion: security.aqua.io/v1
      kind: RuntimePolicy
      metadata:
        name: sensitive-workload-monitoring
      spec:
        workloadSelector:
          matchLabels:
            app: database
            environment: production
        rules:
          - type: FileIntegrity
            path: /etc/passwd
            action: alert
          - type: Network
            action: alert
            params:
              direction: egress
              port: 22
      
      This focuses runtime monitoring on specific, high-value targets.
    • Why it works: Runtime analysis can be resource-intensive; narrower focus means less data processed and fewer alerts generated.
  5. Overly Aggressive Severity Thresholds: Setting severityThreshold too low (e.g., LOW) for automatic actions like quarantine can lead to frequent, unnecessary blocking of legitimate deployments.

    • Diagnosis: Examine the severityThreshold in ImageScanPolicy and ImageCompliancePolicy.
    • Fix: Adjust severityThreshold to HIGH or CRITICAL for automated blocking actions. Use MEDIUM or LOW only for alerting.
      spec:
        severityThreshold: HIGH
        actions:
          - type: alert
          - type: quarantine
            ifSeverity: CRITICAL
      
      This prevents the system from blocking deployments due to minor vulnerabilities, reducing the operational overhead of unblocking them.
    • Why it works: Fewer low-priority findings block deployments and require manual intervention, which saves engineering time and prevents deployment disruptions.
  6. Lack of Image Lifecycle Management Integration: Continuously scanning old, unmaintained, or soon-to-be-deprecated images consumes resources without adding significant value.

    • Diagnosis: Review scan history for images that have not been deployed or updated in months.
    • Fix: Integrate Aqua scanning with your image lifecycle policies. Consider excluding images older than a certain age from regular scans, or implement automated deletion of stale images after a grace period. This can be managed through CI/CD scripts or Kubernetes admission controllers that check image age before deployment.
      # Example script to list images older than 90 days (not Aqua-specific; illustrates the concept).
      # Assumes GNU date, which supports the -d option.
      cutoff_date=$(date -d "90 days ago" +%s)  # compute the cutoff once, outside the loop
      # grep -v skips dev images here; adjust the pattern to your own naming scheme
      docker images --format "{{.Repository}}:{{.Tag}} {{.CreatedAt}}" | grep -v "my-repo/dev-" |
      while read -r img_name img_date_str _; do
        # CreatedAt looks like "2024-01-15 10:30:45 +0000 UTC"; the date field alone is enough
        img_date=$(date -d "$img_date_str" +%s) || continue
        if [ "$img_date" -lt "$cutoff_date" ]; then
          echo "Old image: $img_name"
          # docker rmi "$img_name" # Uncomment to actually delete
        fi
      done
      
      By proactively pruning old images, you reduce the attack surface and the number of images Aqua needs to manage and scan.
    • Why it works: Fewer images to track and scan means less persistent storage and fewer scan cycles over time.
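
To get a rough sense of how much selector scope and scan interval matter together, the sketch below counts scheduled scans per month for a made-up image inventory. It assumes matchLabels uses AND semantics (every listed label must match, and an empty selector matches everything), a 30-day month, and that scan count is a reasonable proxy for cost. The function names and image counts are hypothetical.

```python
HOURS_PER_MONTH = 30 * 24  # simplifying assumption: a 30-day month


def matches(labels, match_labels):
    """AND semantics: every selector key/value pair must be present on the image."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def monthly_scans(images, match_labels, interval_hours):
    """Scheduled scans per month for the images a selector picks up."""
    selected = [img for img in images if matches(img, match_labels)]
    return len(selected) * (HOURS_PER_MONTH // interval_hours)


# Hypothetical inventory: each dict is one image's labels
images = (
    [{"environment": "production", "criticality": "high"}] * 40
    + [{"environment": "production", "criticality": "low"}] * 160
    + [{"environment": "development"}] * 300
)

broad = monthly_scans(images, {}, 24)
narrow = monthly_scans(images, {"environment": "production", "criticality": "high"}, 24)
relaxed = monthly_scans(images, {"environment": "production"}, 72)

print(f"no selector, 24h:            {broad}")    # 15000
print(f"production + high, 24h:      {narrow}")   # 1200
print(f"production only, 72h:        {relaxed}")  # 2000
```

The absolute numbers mean nothing outside this example, but the comparison shows why tightening the selector and lengthening the interval are usually the first two levers to try.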

The next challenge you’ll likely encounter is managing the performance impact of Aqua’s admission controller on your Kubernetes API server.
