Bash scripts can extract values from simple YAML files without a dedicated parser such as yq by combining standard text utilities like grep, sed, and awk, though the approach quickly becomes unwieldy.

Let’s say we have a simple YAML file named config.yaml:

database:
  host: localhost
  port: 5432
  username: admin
  enabled: true

api_keys:
  - abcdef12345
  - ghijkl67890

settings:
  timeout: 30s
  retries: 3

We can read these values into Bash variables. To get database.host, we can use grep and awk:

db_host=$(grep '^  host:' config.yaml | awk '{print $2}')
echo "Database host: $db_host"

Output:

Database host: localhost

Other keys nested under database follow the same pattern. For database.port:

db_port=$(grep '^  port:' config.yaml | awk '{print $2}')
echo "Database port: $db_port"

Output:

Database port: 5432

Accessing array elements requires a bit more work. To get the first API key:

api_key_1=$(grep '^  -' config.yaml | sed -n '1p' | awk '{print $2}')
echo "First API key: $api_key_1"

Output:

First API key: abcdef12345

To get the second API key, change sed -n '1p' to sed -n '2p'.

Boolean values are treated as strings. To get database.enabled:

db_enabled=$(grep '^  enabled:' config.yaml | awk '{print $2}')
echo "Database enabled: $db_enabled"

Output:

Database enabled: true
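
Because the extracted value is just the string true, a script has to compare it as text. The following is a minimal sketch, assuming the same key layout as config.yaml; the temporary file and the db_status variable are introduced here only for illustration.

```bash
#!/usr/bin/env bash
# Recreate the relevant part of config.yaml in a temporary file
# so the sketch can run on its own.
config=$(mktemp)
cat > "$config" <<'EOF'
database:
  host: localhost
  enabled: true
EOF

db_enabled=$(grep '^  enabled:' "$config" | awk '{print $2}')

# Compare as a string. Writing `if $db_enabled; then` would instead
# execute the extracted value as a command, which is best avoided.
if [ "$db_enabled" = "true" ]; then
  db_status="on"
else
  db_status="off"
fi
echo "Database is $db_status"

rm -f "$config"
```

Any value other than the exact lowercase string true, including YAML spellings such as True, falls through to the else branch in this sketch.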

What Bash does here is treat the YAML file as plain lines of text and recover the structure by assumption: keys are indented with two spaces and followed by a colon, and list items are indented with two spaces and start with a hyphen. The grep command finds lines matching a specific pattern (e.g., lines starting with two spaces and host:), and awk then splits that line on whitespace and takes the second field, which is assumed to be the value. For lists, sed selects a specific line before awk extracts the value.
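
One consequence of these assumptions is that a pattern like '^  host:' matches a host key under any top-level section, not only database. A minimal sketch of a section-aware lookup follows. The yaml_get function is a hypothetical helper written for this example, and the cache section is invented sample data that is not part of config.yaml. It assumes top-level keys start in column 0, child keys are indented by exactly two spaces, and values contain no whitespace.

```bash
#!/usr/bin/env bash
# yaml_get is a hypothetical helper, not a Bash builtin or standard tool.
# Usage: yaml_get FILE SECTION KEY
yaml_get() {
  awk -v section="$2" -v key="$3" '
    # A line starting in column 0 (not a comment or list item) opens a new section.
    /^[^ #-]/ { current = $1; sub(/:$/, "", current); next }
    # Inside the requested section, print the value of the two-space-indented key.
    current == section && $1 == (key ":") && /^  [^ ]/ { print $2; exit }
  ' "$1"
}

# Sample file with the same key name under two sections (cache is made up).
config=$(mktemp)
cat > "$config" <<'EOF'
database:
  host: localhost
  port: 5432

cache:
  host: redis.internal
  port: 6379
EOF

db_host=$(yaml_get "$config" database host)
cache_host=$(yaml_get "$config" cache host)
echo "Database host: $db_host"
echo "Cache host: $cache_host"

rm -f "$config"
```

This still relies on fixed indentation and one level of nesting, so it narrows the match without making the approach a real parser.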

The mental model is one of line-by-line processing and pattern matching. You’re not really "parsing" in the sense of building an abstract syntax tree; you’re performing a series of text matches and extractions. Each level of nesting or each type of data structure (object, array) requires a new, specific command sequence. You control the output by refining the grep patterns, adjusting the sed line numbers, or modifying the awk field selection. It’s a brittle approach that relies heavily on the exact formatting of the YAML.
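
Rather than adjusting the sed line number for each list item, the whole list can be read into a Bash array in one pass. This is a minimal sketch that assumes Bash 4 or newer (for the mapfile builtin) and a file in which api_keys is the only list; the temporary file stands in for config.yaml.

```bash
#!/usr/bin/env bash
# Recreate the list portion of config.yaml in a temporary file.
config=$(mktemp)
cat > "$config" <<'EOF'
api_keys:
  - abcdef12345
  - ghijkl67890
EOF

# mapfile -t stores each input line as one array element,
# stripping the trailing newline.
mapfile -t api_keys < <(grep '^  - ' "$config" | awk '{print $2}')

echo "Found ${#api_keys[@]} API keys"
for i in "${!api_keys[@]}"; do
  echo "API key $((i + 1)): ${api_keys[$i]}"
done

rm -f "$config"
```

Since the grep pattern matches every two-space-indented list item in the file, a second list elsewhere in the YAML would end up in the same array.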

Even minor YAML variations break these commands. If a key has a space in it, if indentation changes from two spaces to four, or if a value contains a space or is wrapped in quotes, the simple grep/awk/sed pipelines return the wrong result or nothing at all. The same happens when a key name appears under more than one section, because the patterns never check the parent key. YAML’s flexibility is its strength for humans, but it’s a significant challenge for fixed-pattern text processing in Bash.
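
A few of these failure modes can be reproduced directly. The sketch below uses two small made-up files (the label key and the quoted host are illustrative and not part of config.yaml) and runs the same style of pipeline used earlier.

```bash
#!/usr/bin/env bash
# File 1: two-space indentation, but a quoted value and a value with a space.
two_space=$(mktemp)
cat > "$two_space" <<'EOF'
database:
  host: "localhost"
  label: primary database
EOF

# File 2: the same kind of key, indented with four spaces instead of two.
four_space=$(mktemp)
cat > "$four_space" <<'EOF'
database:
    port: 5432
EOF

quoted_host=$(grep '^  host:' "$two_space" | awk '{print $2}')
label=$(grep '^  label:' "$two_space" | awk '{print $2}')
port=$(grep '^  port:' "$four_space" | awk '{print $2}')

echo "host:  [$quoted_host]"  # the double quotes stay part of the value
echo "label: [$label]"        # only the first word survives
echo "port:  [$port]"         # empty: the pattern no longer matches

rm -f "$two_space" "$four_space"
```

None of these cases produces an error message, which is what makes the failures easy to miss in a larger script.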

The next hurdle you’ll encounter is handling more complex YAML structures, like nested lists or values that contain special characters, which will require significantly more convoluted Bash commands or, more practically, a dedicated YAML parsing tool.
