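awk and sed are the classic Unix text manipulation powerhouses. They are standalone programs rather than part of bash itself, but they ship with virtually every Unix-like system, and they let you slice, dice, and transform files straight from the shell without needing to write a full-blown script.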

Let’s see them in action. Imagine you have a log file, access.log, with lines like this:

192.168.1.100 - - [10/Oct/2023:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1234 "http://example.com/" "Mozilla/5.0..."
10.0.0.5 - - [10/Oct/2023:10:00:05 +0000] "POST /submit HTTP/1.1" 201 56 "http://example.com/form" "Chrome/118.0..."
192.168.1.100 - - [10/Oct/2023:10:00:10 +0000] "GET /images/logo.png HTTP/1.1" 200 5678 "http://example.com/" "Mozilla/5.0..."

You want to extract just the IP addresses and the HTTP status codes.

With awk, you can do this easily:

awk '{ print $1, $9 }' access.log

This outputs:

192.168.1.100 200
10.0.0.5 201
192.168.1.100 200

awk treats each line as a record and splits it into fields based on whitespace by default. $1 refers to the first field (the IP address), and $9 refers to the ninth field (the status code). print then outputs these fields, separated by a space (the default output field separator).
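To see the output field separator at work, here is a minimal sketch, assuming a POSIX-style shell with awk and mktemp available. The temporary file and the variable names (log, csv, glued) are illustrative only. It shows that the comma in print is what inserts the separator, and that you can change the separator through awk's OFS variable.

```shell
# Sample data: the three log lines from the example above, in a temp file.
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.100 - - [10/Oct/2023:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1234 "http://example.com/" "Mozilla/5.0..."
10.0.0.5 - - [10/Oct/2023:10:00:05 +0000] "POST /submit HTTP/1.1" 201 56 "http://example.com/form" "Chrome/118.0..."
192.168.1.100 - - [10/Oct/2023:10:00:10 +0000] "GET /images/logo.png HTTP/1.1" 200 5678 "http://example.com/" "Mozilla/5.0..."
EOF

# OFS is the output field separator; the comma in print inserts it.
csv=$(awk -v OFS=',' '{ print $1, $9 }' "$log")
printf '%s\n' "$csv"
# → 192.168.1.100,200
# → 10.0.0.5,201
# → 192.168.1.100,200

# Without the comma, awk concatenates the two fields with nothing between them.
glued=$(awk '{ print $1 $9 }' "$log")
printf '%s\n' "$glued"
# → 192.168.1.100200
# → 10.0.0.5201
# → 192.168.1.100200

rm -f "$log"
```

Forgetting the comma is an easy mistake to make, and the glued output is usually the first sign of it.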

Now, let’s say you want to find all lines where the status code is not 200 and extract the IP address and the requested URL.

awk '$9 != 200 { print $1, $7 }' access.log

This gives you:

10.0.0.5 /submit

Here, $9 != 200 is a pattern that awk evaluates for each line. If the condition is true (the 9th field is not equal to 200), then the action { print $1, $7 } is executed, printing the IP address ($1) and the requested URL ($7).
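Patterns are not limited to a single comparison. The sketch below, which again assumes a POSIX-style shell with awk and mktemp and uses illustrative variable names (log, pngs, big), shows a regular-expression match on one field and a compound condition built with &&. It relies on the sample format above, where $10 is the response size in bytes.

```shell
# Sample data: the three log lines from the example above, in a temp file.
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.100 - - [10/Oct/2023:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1234 "http://example.com/" "Mozilla/5.0..."
10.0.0.5 - - [10/Oct/2023:10:00:05 +0000] "POST /submit HTTP/1.1" 201 56 "http://example.com/form" "Chrome/118.0..."
192.168.1.100 - - [10/Oct/2023:10:00:10 +0000] "GET /images/logo.png HTTP/1.1" 200 5678 "http://example.com/" "Mozilla/5.0..."
EOF

# Regex pattern: true when the requested path ($7) ends in .png.
pngs=$(awk '$7 ~ /\.png$/ { print $1, $7 }' "$log")
printf '%s\n' "$pngs"
# → 192.168.1.100 /images/logo.png

# Compound pattern: 2xx responses whose size ($10) exceeds 1000 bytes.
big=$(awk '$9 >= 200 && $9 < 300 && $10 > 1000 { print $7, $10 }' "$log")
printf '%s\n' "$big"
# → /index.html 1234
# → /images/logo.png 5678

rm -f "$log"
```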

sed is more about stream editing – performing substitutions and transformations on text. Suppose you want to replace all occurrences of "HTTP/1.1" with "HTTP/2.0" in your log file.

sed 's/HTTP\/1\.1/HTTP\/2.0/g' access.log

This prints the modified log to standard output; access.log itself is left untouched:

192.168.1.100 - - [10/Oct/2023:10:00:01 +0000] "GET /index.html HTTP/2.0" 200 1234 "http://example.com/" "Mozilla/5.0..."
10.0.0.5 - - [10/Oct/2023:10:00:05 +0000] "POST /submit HTTP/2.0" 201 56 "http://example.com/form" "Chrome/118.0..."
192.168.1.100 - - [10/Oct/2023:10:00:10 +0000] "GET /images/logo.png HTTP/2.0" 200 5678 "http://example.com/" "Mozilla/5.0..."

The command s/HTTP\/1\.1/HTTP\/2.0/g is a substitution command. s denotes substitution, followed by the pattern to find (HTTP\/1\.1), the replacement string (HTTP\/2.0), and flags. The g flag means "global," so it replaces all occurrences on a line, not just the first. The forward slashes within the pattern and replacement are escaped with a backslash (\) because the slash itself is the delimiter in the s command. The dot in the pattern is escaped as well: an unescaped . is a regular-expression metacharacter that matches any single character, so HTTP\/1.1 would also match text such as HTTP/1x1. The replacement side is plain text, so its dot needs no escaping.
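If the backslashes get hard to read, sed lets you pick a different delimiter for the s command: whatever character follows the s is used. The sketch below uses | so the slashes need no escaping, and it also shows what an unescaped dot in the pattern does. The input strings and variable names (line, swapped, loose, strict) are made up for illustration.

```shell
line='"GET /index.html HTTP/1.1" 200 1234'

# With | as the delimiter, the slashes in the pattern need no backslashes.
swapped=$(printf '%s\n' "$line" | sed 's|HTTP/1\.1|HTTP/2.0|g')
printf '%s\n' "$swapped"
# → "GET /index.html HTTP/2.0" 200 1234

# An unescaped dot matches any character, so this look-alike is rewritten too.
loose=$(printf '%s\n' 'HTTP/1x1' | sed 's|HTTP/1.1|HTTP/2.0|')
printf '%s\n' "$loose"
# → HTTP/2.0

# With the dot escaped, the look-alike is left alone.
strict=$(printf '%s\n' 'HTTP/1x1' | sed 's|HTTP/1\.1|HTTP/2.0|')
printf '%s\n' "$strict"
# → HTTP/1x1
```

Choose a delimiter that does not appear in the pattern or the replacement; | and # are common picks for paths and URLs.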

You can combine awk and sed using pipes. For instance, to find lines with status code 404 and then remove the timestamp from those lines:

awk '$9 == 404 { print }' access.log | sed 's/\[.*\]//'

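This first filters for 404s (the three sample lines above contain none, so on that data the pipeline prints nothing), and then sed removes the [date:time +offset] part. The pattern \[.*\] matches a literal opening bracket, followed by any character (.) zero or more times (*), followed by a literal closing bracket. Because .* is greedy, the match runs to the last ] on the line, which is harmless here since the sample format has only one bracketed section per line. The space that followed the timestamp stays behind, so the output has two consecutive spaces at that point.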
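Here is a runnable sketch of that pipeline, assuming a POSIX-style shell with awk, sed, and mktemp. The 404 line is invented for illustration, since the sample log has none. This variant swaps .* for the bracket expression [^]]*, which stops at the first closing bracket, and it also consumes the trailing space.

```shell
# Sample data: one line from the example above plus a made-up 404 line.
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.100 - - [10/Oct/2023:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1234 "http://example.com/" "Mozilla/5.0..."
10.0.0.7 - - [10/Oct/2023:10:00:12 +0000] "GET /missing HTTP/1.1" 404 0 "-" "curl/8.0"
EOF

# Keep only 404 lines, then drop the bracketed timestamp and the space after it.
trimmed=$(awk '$9 == 404' "$log" | sed 's/\[[^]]*\] //')
printf '%s\n' "$trimmed"
# → 10.0.0.7 - - "GET /missing HTTP/1.1" 404 0 "-" "curl/8.0"

# Greedy .* runs to the LAST closing bracket on the line...
greedy=$(printf '%s\n' 'a [x] b [y] c' | sed 's/\[.*\]//')
printf '%s\n' "$greedy"
# → a  c

# ...while [^]]* stops at the FIRST one.
bounded=$(printf '%s\n' 'a [x] b [y] c' | sed 's/\[[^]]*\]//')
printf '%s\n' "$bounded"
# → a  b [y] c

rm -f "$log"
```

A pattern with no action, as in awk '$9 == 404', prints the matching line, so the explicit { print } is optional.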

The mental model to hold onto is that awk excels at structured data where lines can be broken into columns, while sed is for character-level transformations and pattern-based replacements across streams of text. awk’s strength lies in its ability to process data row by row and field by field, making it ideal for tabular data. sed’s strength is its ability to perform complex string manipulations and conditional edits on lines of text, treating the input as a stream. Both are fast: the common implementations are compiled C programs that stream through the input line by line, and they are installed as standalone utilities on virtually every Unix-like system rather than being part of the shell itself.

A common trap with sed is misinterpreting the g flag. If you use sed 's/foo/bar/' file and foo appears multiple times on a line, only the first instance on that line will be replaced. You must include the g flag (sed 's/foo/bar/g' file) if you intend to replace all occurrences on that line. Either way, sed still processes every line of the input; the flag only controls how many matches are replaced within each line.
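A quick way to convince yourself is to run the same substitution with and without the flag. This small sketch uses a made-up input string and illustrative variable names; the third command uses a numeric flag, which replaces only the Nth match on the line.

```shell
# No flag: only the first match on the line is replaced.
first=$(printf '%s\n' 'foo foo foo' | sed 's/foo/bar/')
printf '%s\n' "$first"
# → bar foo foo

# g flag: every match on the line is replaced.
all=$(printf '%s\n' 'foo foo foo' | sed 's/foo/bar/g')
printf '%s\n' "$all"
# → bar bar bar

# Numeric flag: only the second match on the line is replaced.
second=$(printf '%s\n' 'foo foo foo' | sed 's/foo/bar/2')
printf '%s\n' "$second"
# → foo bar foo
```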

Once you’ve mastered basic field manipulation with awk and substitutions with sed, you’ll naturally want to explore more advanced pattern matching, including regular expressions in both, and awk’s ability to perform calculations and manage state across multiple records.
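As a small taste of that, here is a sketch of awk keeping state across records, assuming a POSIX-style shell with awk, sort, and mktemp. The variable and array names (log, per_ip, total, hits, bytes) are illustrative. It counts requests per IP in an associative array and sums the response sizes from $10, with the END block running once after the last line.

```shell
# Sample data: the three log lines from the example above, in a temp file.
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.100 - - [10/Oct/2023:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1234 "http://example.com/" "Mozilla/5.0..."
10.0.0.5 - - [10/Oct/2023:10:00:05 +0000] "POST /submit HTTP/1.1" 201 56 "http://example.com/form" "Chrome/118.0..."
192.168.1.100 - - [10/Oct/2023:10:00:10 +0000] "GET /images/logo.png HTTP/1.1" 200 5678 "http://example.com/" "Mozilla/5.0..."
EOF

# Count requests per IP. awk's for-in order is unspecified, so sort the result.
per_ip=$(awk '{ hits[$1]++ } END { for (ip in hits) print ip, hits[ip] }' "$log" | LC_ALL=C sort)
printf '%s\n' "$per_ip"
# → 10.0.0.5 1
# → 192.168.1.100 2

# Sum the response sizes (field 10) across all lines.
total=$(awk '{ bytes += $10 } END { print bytes }' "$log")
printf '%s\n' "$total"
# → 6968

rm -f "$log"
```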
