Using awk command in Linux
What is awk?
Awk is a full programming language with loops, arrays, functions and so on, but nowadays it is mostly used for “one-liners”.
Awk is rarely the first choice for complex programs, but it is great when we need to add some extra functionality to our standard Linux commands.
We will have a look at some of its usage examples in this guide.
Text retrieval and processing.
awk is very useful when you have to find text strings in files or in output from another program.
Before we start writing awk commands let’s create a text file input.txt with the following content:
ID  Brand    Type    S/N
1   IBM      Laptop  123456
2   Apple    PC      654321
3   HP       PC      333777
4   Sony     Laptop  225588
5   Toshiba  PC      123654
6   Toshiba  Laptop  987654
We will use awk commands to filter data from this file without having to write any complex scripts in other programming languages like Python.
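One way to create the file from the command line is sketched below. It assumes the columns are separated by single tab characters; that choice, and the row_count variable used for the sanity check, belong to this sketch rather than to the original setup.

```shell
# Write the sample table to input.txt, one row per line, tab-separated.
# printf reuses its format string for every group of four arguments.
printf '%s\t%s\t%s\t%s\n' \
    ID Brand Type S/N \
    1 IBM Laptop 123456 \
    2 Apple PC 654321 \
    3 HP PC 333777 \
    4 Sony Laptop 225588 \
    5 Toshiba PC 123654 \
    6 Toshiba Laptop 987654 > input.txt

# Sanity check: NR in the END block holds the number of rows read (header included).
row_count=$(awk 'END {print NR}' input.txt)
echo "$row_count"
```

The check should report 7 rows: one header plus six items.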
Find row in a file.
The simplest use of awk is to find a row in a file based on a search string. Let’s find all rows in our file containing the string “Toshiba”.
orkhans@matrix:~$ awk ' /Toshiba/ {print}' input.txt
5 Toshiba PC 123654
6 Toshiba Laptop 987654
The command awk ' /Toshiba/ {print}' input.txt searches for the string Toshiba and prints all rows containing that string. The last argument, input.txt, specifies the input file.
We might use regex /T.*a/ instead of a plain string:
orkhans@matrix:~$ awk ' /T.*a/ {print}' input.txt
5 Toshiba PC 123654
6 Toshiba Laptop 987654
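Printing the whole row is awk's default action, so the {print} part can be left out. The sketch below shows this with two sample rows fed in on standard input instead of input.txt, so that it runs on its own. The variable names are only illustrative.

```shell
# Two sample rows from the table, tab-separated.
rows=$(printf '4\tSony\tLaptop\t225588\n5\tToshiba\tPC\t123654\n')

# A pattern with no action prints every matching row.
short_form=$(printf '%s\n' "$rows" | awk '/Toshiba/')

# The same thing spelled out: $0 stands for the whole row.
long_form=$(printf '%s\n' "$rows" | awk '/Toshiba/ {print $0}')

echo "$short_form"
```

Both forms select the same Toshiba row.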
Format output.
Let’s say we want to find all laptops in our file but we don’t want to see their serial numbers (the S/N column) in the output.
awk lets us do this very easily with the following command:
orkhans@matrix:~$ awk ' /Laptop/ {print $1 "\t" $2 "\t" $3}' input.txt
1 IBM Laptop
4 Sony Laptop
6 Toshiba Laptop
We added $1, $2, $3 to our command, which tell awk to output only columns 1, 2 and 3. The “\t” strings between them tell awk to put tabs between the columns.
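An alternative to writing "\t" between every column is to set awk's output field separator, OFS, once and separate the columns with commas. This is a minimal sketch that reads three sample rows from standard input; the laptops variable is only there to capture the result.

```shell
# OFS is the output field separator; a comma in print inserts OFS between items.
laptops=$(printf '1 IBM Laptop 123456\n2 Apple PC 654321\n4 Sony Laptop 225588\n' |
    awk 'BEGIN {OFS = "\t"} /Laptop/ {print $1, $2, $3}')
echo "$laptops"
```

The BEGIN block runs once before any input is read, which makes it a convenient place for settings like this.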
Find a string in a specific column.
As the previous example shows, awk allows us to work with individual columns. Besides formatting output, we can also search for a string in a specific column.
Let’s say we want to find an item with ID=1 and we use this command:
orkhans@matrix:~$ awk ' /1/ {print}' input.txt
1 IBM Laptop 123456
2 Apple PC 654321
5 Toshiba PC 123654
The problem is that the S/N fields of items #2 and #5 also contain “1”. We should therefore search only the ID field and output the matching row.
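The correct command is awk ' $1~/1/ {print}' input.txt
orkhans@matrix:~$ awk ' $1~/1/ {print}' input.txt
1 IBM Laptop 123456
The command instructs awk to find only rows where the first column ($1) matches (~) the regular expression /1/. Note that this is still a pattern match: in a larger file it would also select IDs such as 10 or 21, so for an exact match the comparison $1 == 1 is safer.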
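The difference between matching a pattern in a column and comparing the column to a value only shows up once the file has more IDs. The sketch below uses a hypothetical extra row with ID 10 (not part of input.txt) to illustrate it.

```shell
# Hypothetical rows with IDs 1 and 10; the second row is made up for this example.
rows=$(printf '1 IBM Laptop 123456\n10 Dell PC 555000\n')

# Regex match: selects every ID that contains the character "1".
regex_match=$(printf '%s\n' "$rows" | awk '$1 ~ /1/ {print $1}')

# Numeric comparison: selects only the row whose ID equals 1.
exact_match=$(printf '%s\n' "$rows" | awk '$1 == 1 {print $1}')

echo "regex: $regex_match"
echo "exact: $exact_match"
```

Here the regex match selects both IDs, while the comparison selects only ID 1.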
Specify a delimiter character.
In all our examples the columns are separated by whitespace (tabs or spaces), which awk uses as the delimiter by default. However, we may have to deal with files delimited by other characters, for example “:”.
In this case we should tell awk which delimiter to use:
awk -F: '{print $1}' /etc/passwd
The -F: part of the command tells awk to use “:” as the delimiter. Since no search pattern is given, the command outputs the first column (print $1) of every row in the /etc/passwd file.
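To try the delimiter option without depending on the contents of a real /etc/passwd, the sketch below feeds awk two made-up lines in the same colon-separated layout. The user names alice and bob are invented for the example. In this layout the seventh field is the login shell.

```shell
# Two made-up lines in the /etc/passwd layout (fields separated by ":").
users=$(printf 'alice:x:1000:1000::/home/alice:/bin/bash\nbob:x:1001:1001::/home/bob:/bin/sh\n' |
    awk -F: '{print $1, $7}')   # field 1 is the user name, field 7 the login shell
echo "$users"
```

The comma in print separates the two fields with a single space, awk's default output separator.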
Arithmetic operations in awk.
We can also use arithmetic and comparison operators in awk commands. Let’s say we need to find all items with IDs greater than 2 and less than 6.
The command is:
orkhans@matrix:~$ awk '$1>2 && $1<6 {print}' input.txt
3 HP PC 333777
4 Sony Laptop 225588
5 Toshiba PC 123654
We are not going to look at each arithmetic and boolean operator, but it is good to be aware that you can do this with awk.
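As one more illustration of arithmetic in awk, the sketch below counts rows with the ++ operator and prints the totals from an END block, which runs after all input has been read. It uses a shortened copy of the table on standard input; the variable names pcs, items and totals are this sketch's own.

```shell
# NR > 1 skips the header row; pcs counts PC rows, items counts all data rows.
totals=$(printf 'ID Brand Type S/N\n1 IBM Laptop 123456\n2 Apple PC 654321\n3 HP PC 333777\n' |
    awk 'NR > 1 && $3 == "PC" {pcs++}
         NR > 1 {items++}
         END {print pcs " of " items " items are PCs"}')
echo "$totals"
```

For these three data rows it reports that 2 of 3 items are PCs.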
Conclusion
This guide barely scratches the surface of awk’s capabilities. If you are interested in other features of this programming language, you can find a lot of great tutorials on the Internet.