Using the awk command in Linux

What is awk?

Awk is a programming language with loops, arrays, functions and more, but nowadays it is mostly used for “one-liners”.

Awk is rarely the first choice for large, complex programs, but it is great when we need to add some extra functionality to our standard Linux commands.

We will have a look at some of its usage examples in this guide.

Text retrieval and processing.

awk is very useful when you have to find text strings in files or in output from another program.
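As a minimal sketch of the second case, awk can read from a pipe instead of a file. Here printf is only a stand-in for any program that writes lines to standard output, and the three words are made-up sample data:

```shell
# printf stands in for any program that prints lines to standard output.
# awk reads those lines from the pipe and prints the ones matching /mm/.
printf 'alpha\nbeta\ngamma\n' | awk '/mm/ {print}'
```

Only the line gamma contains "mm", so that is the only line printed.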

Before we start writing awk commands, let’s create a text file named input.txt with the following tab-separated content:

ID	Brand	Type	S/N
1	IBM	Laptop	123456
2	Apple	PC	654321
3	HP	PC	333777
4	Sony	Laptop	225588
5	Toshiba	PC	123654
6	Toshiba	Laptop	987654
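One possible way to create this file from the shell is sketched below. It assumes every column is separated by a single tab; the check at the end is just a convenience and not part of the original guide.

```shell
# printf reuses the format for every group of four arguments,
# so each group becomes one tab-separated line.
printf '%s\t%s\t%s\t%s\n' \
  ID Brand Type S/N \
  1 IBM Laptop 123456 \
  2 Apple PC 654321 \
  3 HP PC 333777 \
  4 Sony Laptop 225588 \
  5 Toshiba PC 123654 \
  6 Toshiba Laptop 987654 > input.txt

# Print the line number (NR) and field count (NF) of each line as a sanity check.
awk -F'\t' '{ print NR ": " NF " fields" }' input.txt
```

Every line should report 4 fields if the tabs were written correctly.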

We will use awk commands to filter data from this file without having to write any complex scripts in other programming languages like Python.

 

Find a row in a file.

The simplest use of awk is to find rows in a file based on a search string. Let’s find all rows in our file that contain the string “Toshiba”.

orkhans@matrix:~$ awk ' /Toshiba/ {print}' input.txt 
5	Toshiba	PC	123654
6	Toshiba	Laptop	987654

 

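The command awk ' /Toshiba/ {print}' input.txt searches for the string Toshiba and prints every row that contains it. The last argument, input.txt, specifies the input file.

The text between the slashes is a regular expression, so we can use a more general pattern such as /T.*a/ (a “T”, then any characters, then an “a”) instead of a plain string: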

orkhans@matrix:~$ awk ' /T.*a/ {print}' input.txt 
5	Toshiba	PC	123654
6	Toshiba	Laptop	987654
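Both commands above follow awk's general pattern { action } form, and either part can be left out. The sketch below illustrates this; the printf line only recreates input.txt (assuming tab-separated columns) so that the snippet runs on its own.

```shell
# Recreate the sample file so this snippet is self-contained.
printf '%s\t%s\t%s\t%s\n' ID Brand Type S/N 1 IBM Laptop 123456 2 Apple PC 654321 \
  3 HP PC 333777 4 Sony Laptop 225588 5 Toshiba PC 123654 6 Toshiba Laptop 987654 > input.txt

# No action: awk prints every line that matches the pattern.
awk '/Toshiba/' input.txt

# No pattern: the action runs on every line; here it prints the second column.
awk '{print $2}' input.txt
```

The first command prints the same two Toshiba rows as before. The second prints the Brand column of every row, including the header.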

 

Format output.

Let’s say we want to find all laptops in our file, but we don’t want to see their serial numbers (the S/N column) in the output.

awk allows you to do it very easily using the following command:

orkhans@matrix:~$ awk ' /Laptop/ {print $1 "\t" $2 "\t" $3}' input.txt
1	IBM	Laptop
4	Sony	Laptop
6	Toshiba	Laptop

We added $1, $2 and $3 to our command, which tell awk to output only columns 1, 2 and 3. The “\t” strings between them tell awk to put a tab between the columns.
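An alternative sketch, not part of the original command, sets awk's built-in OFS (output field separator) variable once instead of repeating “\t”. The printf line only recreates input.txt, assuming tab-separated columns.

```shell
# Recreate the sample file so this snippet is self-contained.
printf '%s\t%s\t%s\t%s\n' ID Brand Type S/N 1 IBM Laptop 123456 2 Apple PC 654321 \
  3 HP PC 333777 4 Sony Laptop 225588 5 Toshiba PC 123654 6 Toshiba Laptop 987654 > input.txt

# -v OFS='\t' sets the output field separator to a tab.
# A comma between print arguments inserts OFS between the values.
awk -v OFS='\t' '/Laptop/ {print $1, $2, $3}' input.txt
```

This prints the same three laptop rows as the command above. It is mostly a matter of taste, but OFS is easier to change when many columns are printed.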

 

Find a string in a specific column.

As you have learned from the previous example, awk allows us to work with individual columns. Besides formatting output, we can also search for a string in a specific column.

Let’s say we want to find an item with ID=1 and we use this command:

orkhans@matrix:~$ awk ' /1/ {print}' input.txt
1	IBM	Laptop	123456
2	Apple	PC	654321
5	Toshiba	PC	123654

 

The problem is that the S/N fields of items #2 and #5 also contain “1”. We should therefore search only in the ID field and print the matching rows.

A better command is awk ' $1~/1/ {print}' input.txt:

orkhans@matrix:~$ awk ' $1~/1/ {print}' input.txt 
1	IBM	Laptop	123456

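The command instructs awk to print only rows where the first column ($1) matches (~) the regular expression /1/. Note that this is still a substring match: an ID such as 10 or 21 would match too. Our file has no such IDs, so only row 1 is printed.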
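To see the difference between a regex match and an exact match, here is a small sketch. The file ids.txt and its IDs 10 and 21 are made up for illustration; they are not part of input.txt.

```shell
# Hypothetical data: the IDs 1, 10 and 21 all contain the character "1".
printf '%s\t%s\n' 1 first 10 tenth 21 twenty-first > ids.txt

# Regex match on column 1: prints all three rows.
awk '$1 ~ /1/' ids.txt

# Anchored regex (^ = start, $ = end): prints only the row whose ID is exactly 1.
awk '$1 ~ /^1$/' ids.txt

# Numeric comparison: also prints only the row with ID 1.
awk '$1 == 1' ids.txt
```

For ID lookups the == comparison is usually the safer choice, because it does not depend on which other IDs happen to be in the file.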

 

Specify a delimiter character.

In all our examples the columns are delimited (separated) by tabs, but we may have to deal with files delimited by other characters, for example “:”.

In this case we should tell awk to use another delimiter when doing manipulations:

awk -F: '{print $1}' /etc/passwd

The -F: part of the command tells awk to use “:” as the delimiter. This command outputs the first column (print $1) of every row in the /etc/passwd file, that is, the list of user names.
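The same idea can be tried without touching the real /etc/passwd. The sketch below uses a made-up file, passwd.sample, with two hypothetical entries in the usual seven-field passwd layout:

```shell
# Hypothetical sample in the /etc/passwd layout (seven colon-separated fields).
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/sh' > passwd.sample

# Print the user name (field 1) and the login shell (field 7).
# The comma in print separates them with a space, awk's default output separator.
awk -F: '{print $1, $7}' passwd.sample
```

Note that -F only changes how input is split. The output separator stays a space unless OFS is set as well.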

 

Arithmetic operations in awk.

We can also use arithmetic, comparison and logical operators in awk commands. Let’s say we need to find all items with IDs greater than 2 and less than 6.

The command will be:

orkhans@matrix:~$ awk '$1>2 && $1<6 {print}' input.txt
3	HP	PC	333777
4	Sony	Laptop	225588
5	Toshiba	PC	123654

We are not going to look at each arithmetic and boolean operator, but it is good to be aware that you can do this with awk.
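As one more hedged illustration of arithmetic in awk, the sketch below counts how many items of each type the file contains, using the ++ operator and an array. The variable name count is our own choice, and the printf line only recreates input.txt, assuming tab-separated columns.

```shell
# Recreate the sample file so this snippet is self-contained.
printf '%s\t%s\t%s\t%s\n' ID Brand Type S/N 1 IBM Laptop 123456 2 Apple PC 654321 \
  3 HP PC 333777 4 Sony Laptop 225588 5 Toshiba PC 123654 6 Toshiba Laptop 987654 > input.txt

# NR > 1 skips the header line; count[$3]++ adds one to the tally for that Type.
# The END block runs after the last line and prints each type with its total.
# The order of "for (t in count)" is not defined, so the result is piped through sort.
awk 'NR > 1 {count[$3]++} END {for (t in count) print t, count[t]}' input.txt | sort
```

With our file this reports 3 laptops and 3 PCs.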

Conclusion

This guide barely scratches the surface of awk’s capabilities. If you are interested in other features of this programming language, you can find a lot of great tutorials on the Internet.

 
