How to Awk

Not only a command but a full-blown scripting language, Awk is a powerful tool for text processing. It’s a great way to quickly search through text files, extract and format data, and even perform basic calculations.

This post leans heavily on this and this amazing works.

Keep in mind

Not all Awk implementations are created equal. This post references the GNU version, although hopefully most of the information is generic enough to be of use in most awk implementations.

Basics

Awk operates on records and fields. By default, a record is a line (uses \n as a separator) and a field is a “word” (fields are separated by runs of spaces or tabs). It performs an action based on a pattern, as in “if it matches this, do that”.

Your basic Awk command looks something like /himom/ {print $0}. In this example, /himom/ is a pattern and {print $0} is an action. Regex patterns are delimited by / while actions are always within {}.

This reads like “On each record (line) that contains a match for the pattern himom, run the action print $0 (which prints the whole line).”

To call it on a file run: awk '/himom/ {print $0}' file (always use single quotes!).

Omit the pattern to perform the action on all lines. Omit the action to print each matching line (awk '/himom/ {print $0}' file and awk '/himom/' file do the same thing).
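Here is a small sketch of those shortcuts. The sample lines are made up for illustration and are piped in with printf instead of being read from a file:

```shell
# Hypothetical sample input, three records
data='himom hello
nothing here
say himom again'

# Pattern plus action, pattern only, and action only
explicit=$(printf '%s\n' "$data" | awk '/himom/ {print $0}')
implicit=$(printf '%s\n' "$data" | awk '/himom/')
all=$(printf '%s\n' "$data" | awk '{print $0}')

echo "$explicit"
echo "$implicit"
echo "$all"
```

The first two commands print the same two lines, while the last one prints every line.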

Positions

As you might expect, changing $0 for $n will print the nth field (word) instead of the whole record (line).
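A quick sketch of field selection, using a single made-up record as input:

```shell
# Hypothetical sample record with three whitespace-separated fields
line='2016-03-01 94.580002 93.610002'

# $1 is the first field, $3 is the third
first=$(printf '%s\n' "$line" | awk '{print $1}')
third=$(printf '%s\n' "$line" | awk '{print $3}')

# A comma between fields in print separates them with a space by default
both=$(printf '%s\n' "$line" | awk '{print $1, $3}')

echo "$first"
echo "$third"
echo "$both"
```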

Regex

The pattern '/himom/' is a shorthand for '$0 ~ /himom/'. This means that patterns can be applied on a per-column basis.

If you know your regex, you might expect the previous pattern to match only for lines containing only the word himom.

This is not the case. You can leverage the full power of extended regular expressions out of the box (unlike Sed or Grep, you don’t need -E here), but the default behavior is to match anything containing the given pattern.

Also, while ^ and $ usually designate beginning and end of line, here they indicate beginning and end of the string being matched, which is a single field when you match against one. This means that for awk '$1 ~ /-01$/', the line 2016-03-01 94.580002 93.610002 would match.
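To make the difference concrete, here is a sketch that anchors the same regex against a field and against the whole record. The sample rows are assumed for illustration:

```shell
# Hypothetical sample data: date, open, close
data='2016-03-01 94.580002 93.610002
2016-03-02 93.500000 94.250000
2016-04-01 95.000000 96.100000'

# $ anchors to the end of the first field only
by_field=$(printf '%s\n' "$data" | awk '$1 ~ /-01$/ {print $1}')

# $ anchors to the end of the whole record, and no record ends in -01
by_record=$(printf '%s\n' "$data" | awk '/-01$/ {print $1}')

echo "$by_field"
echo "by_record is empty: [$by_record]"
```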

As mentioned before, you can skip the pattern altogether. The command awk '{print $1}' file will just print the first field (word) on each record (line) in the file.

Not so Basics

Logical operators

Patterns can be mixed and matched with your typical logical operators.

awk '/bilbo/ && /frodo/ {print "My Precious"}' file
awk '/bilbo/ || /frodo/ {print "Is it you mister Frodo?"}' file

Or you can negate the match, as in “only perform the action on lines that DON’T match the pattern”.

awk '!/frodo/ { print "Pohtatoes" }' file
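A minimal sketch of the three operators side by side, run against a few made-up lines:

```shell
# Hypothetical sample input
data='bilbo and frodo
bilbo alone
sam'

# AND: both patterns must match the record
both=$(printf '%s\n' "$data" | awk '/bilbo/ && /frodo/ {print $0}')

# OR: at least one pattern must match
either=$(printf '%s\n' "$data" | awk '/bilbo/ || /frodo/ {print $0}')

# NOT: records that do not match the pattern
no_frodo=$(printf '%s\n' "$data" | awk '!/frodo/ {print $0}')

echo "$both"
echo "$either"
echo "$no_frodo"
```

For a single field, the negated match operator works too, as in $1 !~ /frodo/.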

Variables

BEGIN and END

Awk allows you to run specific actions before and after it does the processing.

awk 'BEGIN {print "I will be printed before the file is processed"}' file
awk 'END {print "I will be printed after the file is processed"}' file

This can be used to create headers and footers for your output, although more often than not you’ll use it as a safe space to set other variables such as…

IGNORECASE

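The match is case-sensitive by default. In GNU Awk you can change this behavior by setting this variable to 1 (it is a GNU extension, so other implementations treat it as an ordinary variable with no special effect):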

awk 'BEGIN {IGNORECASE=1} /fooBar/ {print $1}' file

Of course, you can still print your pretty header!

awk 'BEGIN {IGNORECASE=1; print "A nice header!"} /fooBar/ {print $1}' file

Notice how we separate the two statements within the BEGIN action with a ;.
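If you cannot count on GNU Awk, one portable alternative (my suggestion, not something from the works this post leans on) is to lowercase the record with tolower() before matching. The sketch below combines that with a BEGIN header and an END footer; the sample input and the counter name n are made up:

```shell
# Hypothetical sample input with mixed case
data='FooBar one
foobar two
baz three'

out=$(printf '%s\n' "$data" | awk '
  BEGIN { print "matches:" }                 # header, runs before any input
  tolower($0) ~ /foobar/ { print $2; n++ }   # case-insensitive without IGNORECASE
  END { print "total: " n }                  # footer, runs after all input
')

echo "$out"
```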

(Input) Record and Field separator (RS & FS)

As mentioned before, the default RS is \n while the default FS is whitespace. This might work for you, or it might not, but we can change these values!

Suppose you are working with a proper comma separated CSV.

Tonia,Ellerey,Tonia.Ellerey@yopmail.com,Tonia.Ellerey@gmail.com,firefighter
Joleen,Viddah,Joleen.Viddah@yopmail.com,Joleen.Viddah@gmail.com,police officer
Cherilyn,Kat,Cherilyn.Kat@yopmail.com,Cherilyn.Kat@gmail.com,firefighter
Janenna,Natica,Janenna.Natica@yopmail.com,Janenna.Natica@gmail.com,worker

Something like awk '{print $4}' file will not really work, but awk -v FS=, '{print $4}' file will:

Tonia.Ellerey@gmail.com
Joleen.Viddah@gmail.com
Cherilyn.Kat@gmail.com
Janenna.Natica@gmail.com

We simply use the -v flag to set the FS variable to ,.

There are multiple ways to set FS and RS. In fact, some versions of Awk might not have a -v flag available.

IMO however, this is the most reliable, simple and easy-to-read option when using Awk as a one liner.
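For reference, here is a sketch of three equivalent ways to set the field separator, using the first row of the CSV above as input: the -v assignment, the -F option, and an assignment inside BEGIN.

```shell
# First row of the sample CSV
row='Tonia,Ellerey,Tonia.Ellerey@yopmail.com,Tonia.Ellerey@gmail.com,firefighter'

# Three ways to tell awk that fields are separated by commas
with_v=$(printf '%s\n' "$row" | awk -v FS=, '{print $4}')
with_F=$(printf '%s\n' "$row" | awk -F, '{print $4}')
with_begin=$(printf '%s\n' "$row" | awk 'BEGIN {FS=","} {print $4}')

echo "$with_v"
echo "$with_F"
echo "$with_begin"
```

Keep in mind that none of these handle quoted fields that contain commas; this only works for simple CSVs like the one above.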

Fancy things you can do

Record and Field number (NR & NF)

Just like we can change RS and FS, we can play around with NR and NF too.

Say you are working with a file like the following:

This Is A File With A Header And A Bunch Of Useless Fields
GARBAGE Tonia Ellerey Tonia.Ellerey@yopmail.com Tonia.Ellerey@gmail.com firefighter
101 Joleen Viddah Joleen.Viddah@yopmail.com Joleen.Viddah@gmail.com police officer
102 Cherilyn Kat Cherilyn.Kat@yopmail.com Cherilyn.Kat@gmail.com firefighter
103 Janenna Natica Janenna.Natica@yopmail.com Janenna.Natica@gmail.com worker
104 Fredericka Friede Fredericka.Friede@yopmail.com Fredericka.Friede@gmail.com doctor
105 Corina Susannah Corina.Susannah@yopmail.com Corina.Susannah@gmail.com doctor
106 Glenda Tyson Glenda.Tyson@yopmail.com Glenda.Tyson@gmail.com doctor 7 7 7 7 7 7
107 Ofilia Knowling Ofilia.Knowling@yopmail.com Ofilia.Knowling@gmail.com worker 8 8 8 8 8 8

You know that there is a useless line just under the header and that all lines with 10 or more fields are incorrect.

We can filter on NF (the number of fields in the current record) and on NR (the number of the current record) to decide which lines awk should act on:

awk 'NF<10 && NR>2 {print $4}' file

“Print the 4th field of all records whose NR is greater than 2 (3rd line onwards) and whose NF is less than 10 (9 or fewer fields)”:

Joleen.Viddah@yopmail.com
Cherilyn.Kat@yopmail.com
Janenna.Natica@yopmail.com
Fredericka.Friede@yopmail.com
Corina.Susannah@yopmail.com
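When a filter like this misbehaves, it can help to print NR and NF for every record first. A small sketch, with a shortened made-up version of the file:

```shell
# Hypothetical, shortened sample input
data='a header line
GARBAGE x
101 Joleen Viddah'

# Print the record number and the field count of each line
out=$(printf '%s\n' "$data" | awk '{print NR ": " NF " fields"}')

echo "$out"
```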

Output Record and Field separator (ORS & OFS)

What if you want to format the output of your awk command?

Well for simple commands, something like awk 'NF<10 && NR>2 {print $3" <-> "$4}' file should do the trick (notice the <->):

Viddah <-> Joleen.Viddah@yopmail.com
Kat <-> Cherilyn.Kat@yopmail.com
Natica <-> Janenna.Natica@yopmail.com
Friede <-> Fredericka.Friede@yopmail.com
Susannah <-> Corina.Susannah@yopmail.com

There is another option that might be nicer for more complex commands:

awk -v OFS=' <-> ' 'NF<10 && NR>2 {print $3, $4}' file

The output is the same, but you can probably imagine that the second option scales better when the commands start getting fancy. There is also printf support in Awk, so you can get as fancy as you like!
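As a rough sketch of the other two formatting tools: printf gives you column alignment, and ORS changes what is printed after each record. The input lines here are made up:

```shell
# printf with a left-aligned, 10-character wide column for the surname
padded=$(printf '%s\n' '101 Joleen Viddah Joleen.Viddah@yopmail.com' |
  awk '{printf "%-10s| %s\n", $3, $4}')

# ORS replaces the newline that print adds after each record
joined=$(printf 'a\nb\nc\n' | awk -v ORS=';' '{print $1}')

echo "$padded"
echo "$joined"
```

Note that printf does not add ORS or a newline on its own, hence the explicit \n in the format string.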

Range

If the file you are working with has some kind of sorting, you might want to operate based on that instead of the NR.

You can use multiple matches to create a range on which to perform the action. So on a file like:

first line
second line
third line
fourth line
fifth line

The command awk '/second/ , /fourth/ {print $0}' file outputs:

second line
third line
fourth line
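The two ends of a range do not have to be regexes; any pattern works. A sketch using the same five lines, once with regexes and once with record numbers:

```shell
data='first line
second line
third line
fourth line
fifth line'

# Range delimited by two regex matches
by_regex=$(printf '%s\n' "$data" | awk '/second/, /fourth/ {print $1}')

# Same range delimited by record numbers
by_number=$(printf '%s\n' "$data" | awk 'NR==2, NR==4 {print $1}')

echo "$by_regex"
echo "$by_number"
```

If the end pattern never matches, the range simply runs until the end of the input.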

If statements

Yes, you can even fit if statements in your Awk command.

Say you want to print the 9th field only if the 5th one is greater than 50:

awk '{if ($5>50) print $9}' file
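An else branch fits on one line too. A minimal sketch with made-up names and scores:

```shell
# Hypothetical sample input: name and score
data='alice 72
bob 41'

# Print a different label depending on the second field
out=$(printf '%s\n' "$data" |
  awk '{ if ($2 > 50) print $1, "pass"; else print $1, "fail" }')

echo "$out"
```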

But wait! There’s more.

You can use ternary operations for more complex behavior! (please consider if it makes sense, you might want to write a script at this point…)

awk '/frodo/ ? /ring/ : /orcs/ { print "Either frodo with the ring, or the orcs" }' file

In pseudocode this reads:

if matches(frodo)
  if matches(ring)
    print "Either frodo with the ring, or the orcs"
  else
    do nothing
else if matches(orcs)
  print "Either frodo with the ring, or the orcs"
else
  do nothing

So for a file:

frodo
ring
orcs
frodo ring
frodo orcs
ring orcs
frodo ring orcs

The command:

awk '/frodo/ ? /ring/ : /orcs/ { print $0" --> Either frodo with the ring, or the orcs" }' file

Would output:

orcs --> Either frodo with the ring, or the orcs
frodo ring --> Either frodo with the ring, or the orcs
ring orcs --> Either frodo with the ring, or the orcs
frodo ring orcs --> Either frodo with the ring, or the orcs
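If the ternary form is hard to read, the same condition can be spelled out with && and ||. This is my own rewrite of the pattern, shown here as a sketch against the same seven lines:

```shell
data='frodo
ring
orcs
frodo ring
frodo orcs
ring orcs
frodo ring orcs'

# frodo with the ring, or no frodo but orcs
out=$(printf '%s\n' "$data" |
  awk '(/frodo/ && /ring/) || (!/frodo/ && /orcs/) {print $0}')

echo "$out"
```

It selects the same four lines as the ternary version.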

Some notes on Awk

Multi-file

Awk can read multiple files, but its behavior when doing so is not the most intuitive and some variables are slightly different.

This post only covers how to use Awk given a single file. If possible, I would advise using Awk in this way to keep things simple.

Scripting

This post only covers how to use it as a one liner from the command prompt, but Awk is much more than a command.

The {} around actions and the ; that separates statements within an action are there for a reason: Awk is not just a command, it’s a fully featured scripting language.

This for example, is a valid Awk script:

#!/usr/bin/awk -f
/hi/ {
  if($1 > $2){
    print "mom!"
  }
  else print "there!"
}

As you might imagine, this post barely scratches the surface of what can be done with Awk. There’s support for user defined variables, arrays and flow control (with things like next and exit).
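To give a taste of those features, here is a small sketch that uses next to skip comment lines and an array to count occurrences. The input format and the array name count are assumptions made for the example:

```shell
# Hypothetical sample input: a comment line, then name and job
data='# comment line
Tonia firefighter
Cherilyn firefighter
Janenna worker'

out=$(printf '%s\n' "$data" | awk '
  /^#/ { next }       # skip comment lines, no further rules run for them
  { count[$2]++ }     # tally records by their second field
  END { printf "firefighter=%d worker=%d\n", count["firefighter"], count["worker"] }
')

echo "$out"
```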

Have fun exploring!
