Detailed explanation of how to use awk in Linux

Before learning awk, you have probably already met sed, grep, tr, cut, and similar commands. They all make text and data processing on Linux more convenient, but a single one of them often cannot do the whole job, so we end up chaining several of them together with pipes. This article introduces awk, a command that covers most text and data processing needs on its own and often solves the problem with a single command.

1. Introduction to awk command

Awk is known as one of the three musketeers of text processing. Its name comes from the first letters of the surnames of its creators, Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is in fact a language of its own, the AWK programming language, which its three creators describe as a "pattern scanning and processing language." It lets you write short programs that read input files, sort and process data, perform calculations on the input, and generate reports, among many other things.
awk is therefore a powerful text analysis tool. Where grep searches and sed edits, awk is strongest at analyzing data and generating reports. Simply put, awk reads its input line by line, splits each line into fields using whitespace (spaces and tabs) as the default delimiter, and then analyzes and processes those fields.
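
As a minimal sketch of this default splitting, the following feeds awk one made-up line whose fields are separated by several spaces and a tab. The sample text is an assumption for illustration only.

```shell
# Made-up input: "alice", "30" and "engineer" separated by spaces and a tab.
fields=$(printf 'alice   30\tengineer\n' | awk '{ print NF ": " $1 "|" $2 "|" $3 }')
echo "$fields"
```

Runs of spaces and tabs count as a single separator, so awk sees three fields here.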

2. awk command format and options

Syntax

awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file(s)

Common command options

-F fs  Specifies the input field separator fs, which can be a string or a regular expression, for example -F:
-v var=value  Assigns a user-defined variable, passing an external value into awk
-f scriptfile  Reads the awk commands from a script file
-m[fr] val  Sets an internal limit to val: -mf limits the maximum number of fields, and -mr limits the maximum record size. These options are extensions of the Bell Labs version of awk and do not apply to standard awk.
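
The two most common options can be sketched together as follows. The input data and the variable name label are assumptions made up for this illustration.

```shell
# Hypothetical colon-separated input, mimicking the /etc/passwd layout.
data='root:x:0:0
bin:x:1:1'

# -F: sets the input field separator.
# -v copies a shell value into the awk variable "label" before the program starts.
label="user"
out=$(printf '%s\n' "$data" | awk -F: -v label="$label" '{ print label "=" $1 }')
echo "$out"
```

Passing values with -v avoids splicing shell variables into the awk program text.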

3. How awk works

awk 'BEGIN{ commands } pattern{ commands } END{ commands }'

Step 1: Execute the statements in the BEGIN{ commands } statement block;
Step 2: Read a line from the file or standard input (stdin) and execute the pattern{ commands } statement block against it. This repeats line by line, from the first line to the last, until all input has been read.
Step 3: When reading to the end of the input stream, execute the END{ commands } statement block.
The BEGIN statement block is executed before awk starts reading lines from the input stream. This is an optional statement block. Statements such as variable initialization and printing output table headers can usually be written in the BEGIN statement block.

The END block is executed after awk has read all the lines from the input stream. For example, information summarization such as printing the analysis results of all lines is completed in the END block. It is also an optional block.

The pattern statement block is the most important part, and it is also optional. If a pattern is given without commands, { print } is executed by default, that is, each matching line is printed. If commands are given without a pattern, they are executed for every line read by awk.
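
The three kinds of block can be seen together in a small sketch. The fruit data is made up for illustration.

```shell
report=$(printf '%s\n' 'apple 3' 'pear 5' 'apple 4' | awk '
BEGIN   { print "fruit qty" }          # runs once, before any input is read
/apple/ { total += $2; print $1, $2 }  # runs for every line matching /apple/
END     { print "total", total }       # runs once, after the last line
')
echo "$report"
```

The header comes from BEGIN, the two apple lines from the pattern block, and the sum of 7 from END.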

4. Basic usage of awk

There are three ways to call awk

1. Command line method

awk [-F field-separator] 'commands' input-file(s)

Here, commands is the awk program itself, [-F field-separator] is optional, and input-file(s) are the files to be processed.
In awk, each item on a line that is delimited by the field separator is called a field. If you do not specify a field separator with -F, the default separator is whitespace, that is, any run of spaces or tabs.

2. Shell script method

awk 'BEGIN{ print "start" } pattern{ commands } END{ print "end" }' file

An awk script usually consists of three parts: a BEGIN statement block, a general statement block that can use pattern matching, and an END statement block. All three parts are optional, and any of them may be omitted. The script is usually enclosed in single quotes. Double quotes also work, but then the shell expands $ and other special characters before awk sees them, so field references such as $1 would have to be escaped. For example:

awk 'BEGIN{ i=0 } { i++ } END{ print i }' filename
awk "BEGIN{ i=0 } { i++ } END{ print i }" filename

3. Put all awk commands in a separate file and call it

awk -f awk-script-file input-file(s)

The -f option loads the awk script in awk-script-file, and input-file(s) is the same as the command line method above.
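
A minimal sketch of the script-file method follows. The temporary file created by mktemp and the three-line input are assumptions for illustration.

```shell
# Write a small awk program to a temporary file (mktemp chooses the path).
script=$(mktemp)
cat > "$script" <<'EOF'
# Count the input lines and print the total at the end.
{ n++ }
END { print n + 0 }
EOF

# Run the program from the file instead of from the command line.
count=$(printf 'a\nb\nc\n' | awk -f "$script")
echo "$count"
rm -f "$script"
```

Keeping the program in a file avoids shell quoting problems and makes longer scripts easier to maintain.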
Let's take a look at some simple examples to further understand the usage of awk:

[root@localhost ~]# awk '{print $0}' /etc/passwd 
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
.........................................................................
[root@localhost ~]# echo 123|awk '{print "hello,awk"}'
hello,awk

[root@localhost ~]# awk '{print "hi"}' /etc/passwd
hi
hi
hi
hi
hi
hi
hi
hi
hi
.........................................................................

We specify /etc/passwd as the input file. When awk runs, it executes the print command once for each line in /etc/passwd in turn.

The awk workflow is as follows: read one record, delimited by the '\n' newline character, then split the record into fields according to the field separator. $0 represents the whole record, $1 the first field, and $n the nth field. The default field separator is whitespace (spaces or tabs). For /etc/passwd we use -F: instead, so $1 is the user name, $3 is the UID, and so on. For example:

Print all usernames under /etc/passwd

[root@localhost ~]# awk -F: '{print $1}' /etc/passwd
root
bin
daemon
adm

........................................................................
Print all usernames and UIDs under /etc/passwd

[root@localhost ~]# awk -F: '{print $1,$3}' /etc/passwd
root 0
bin 1
daemon 2

........................................................................
Output in the format of username: XXX uid: XXX

[root@localhost ~]# awk -F: '{print "username: " $1 "\t\tuid: "$3}' /etc/passwd
username: root uid: 0
username: bin uid: 1
username: daemon uid: 2
........................................................................

5. awk built-in variables

Variable	Description
$n	The nth field of the current record, with fields separated by FS
$0	The complete input record
ARGC	The number of command line arguments
ARGIND	The index in ARGV of the file currently being processed (ARGV is indexed from 0; gawk only)
ARGV	An array containing the command line arguments
CONVFMT	Numeric conversion format (default is %.6g)
ENVIRON	Associative array of environment variables
ERRNO	Description of the last system error (gawk only)
FIELDWIDTHS	List of field widths, separated by spaces (gawk only)
FILENAME	Name of the current input file
FNR	Record number within the current file, counted separately for each file
FS	Input field separator (default is a single space, which means any run of spaces or tabs)
IGNORECASE	If true, matching is case-insensitive (gawk only)
NF	The number of fields in the current record
NR	The number of records read so far, that is, the line number, starting from 1
OFMT	Output format for numbers (default is %.6g)
OFS	Output field separator (default is a single space), placed between fields on output
ORS	Output record separator (default is a newline character)
RLENGTH	The length of the string matched by the match function
RS	Input record separator (default is a newline character)
RSTART	The starting position of the string matched by the match function
SUBSEP	Array subscript separator (default is \034)

Example

[root@localhost ~]# echo -e "line1 f2 f3\nline2 f4 f5\nline3 f6 f7" | awk '{print "Line No:"NR", No of fields:"NF, "$0="$0, "$1="$1, "$2="$2, "$3="$3}'
Line No:1, No of fields:3 $0=line1 f2 f3 $1=line1 $2=f2 $3=f3
Line No:2, No of fields:3 $0=line2 f4 f5 $1=line2 $2=f4 $3=f5
Line No:3, No of fields:3 $0=line3 f6 f7 $1=line3 $2=f6 $3=f7

Use print $NF to print the last field in a line, use $(NF-1) to print the second to last field, and so on:

[root@localhost ~]# echo -e "line1 f2 f3\n line2 f4 f5" | awk '{print $NF}'
f3
f5
[root@localhost ~]# echo -e "line1 f2 f3\n line2 f4 f5" | awk '{print $(NF-1)}'
f2
f4

Statistics of /etc/passwd: file name, line number, number of columns per line, and corresponding complete line content:

[root@localhost ~]# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:bin:x:1:1:bin:/bin:/sbin/nologin
filename:/etc/passwd,linenumber:3,columns:7,linecontent:daemon:x:2:2:daemon:/sbin:/sbin/nologin

Print the number of command line arguments ARGC, the per-file line number FNR, the field separator FS, the number of fields NF, and the number of records read NR (by default the line number) for the /etc/passwd file:

[root@localhost ~]# awk -F: 'BEGIN{printf "%4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR}' /etc/passwd
FILENAME ARGC FNR FS NF NR
---------------------------------------------
/etc/passwd 2 1 : 7 1
/etc/passwd 2 2 : 7 2
/etc/passwd 2 3 : 7 3
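
With a single input file, NR and FNR are always equal. The difference only shows with several files, as in this sketch. The two temporary files are made up for illustration, and OFS is set to show how the output field separator is applied.

```shell
# Two throwaway files created only for this illustration.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n'    > "$dir/two.txt"

# OFS is placed between the comma-separated arguments of print.
# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(awk 'BEGIN { OFS = "-" } { print NR, FNR, $0 }' "$dir/one.txt" "$dir/two.txt")
echo "$counts"
rm -rf "$dir"
```

The third output line reads 3-1-c: it is the third record overall but the first record of the second file.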

6. Advanced usage of awk

1. awk assignment operations

Assignment statement operators: = += -= *= /= %= ^= **=

For example: a+=5 is equivalent to a=a+5

[root@localhost ~]# awk 'BEGIN{a=5;a+=5;print a}'
10

2. awk regular expression matching

Output the lines containing root, printing the user name, the UID, and the original line content:

[root@localhost ~]# awk -F: '/root/ {print $1,$3,$0}' /etc/passwd
root 0 root:x:0:0:root:/root:/bin/bash
operator 11 operator:x:11:0:operator:/root:/sbin/nologin

We found two lines. If we want to find the line starting with root, we need to write it like this: awk -F: '/^root/' /etc/passwd
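
A pattern can also be tested against a single field with the ~ operator. The sketch below compares the three variants on two passwd-style lines copied from the output above.

```shell
# In the operator line, "root" appears only in the home directory field.
data='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin'

anywhere=$(printf '%s\n' "$data" | awk -F: '/root/ { print $1 }')        # match anywhere in the line
at_start=$(printf '%s\n' "$data" | awk -F: '/^root/ { print $1 }')       # line starts with root
field_eq=$(printf '%s\n' "$data" | awk -F: '$1 ~ /^root$/ { print $1 }') # first field is exactly root
echo "$anywhere"; echo "$at_start"; echo "$field_eq"
```

Only the first variant also reports operator. The field test is the strictest, because it ignores every field other than the user name.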

3. awk ternary operation

[root@localhost ~]# awk 'BEGIN{a="b";print a=="b"?"ok":"err"}'
ok
[root@localhost ~]# awk 'BEGIN{a="b";print a=="c"?"ok":"err"}'
err

The ternary operation is a conditional expression: if the condition is true, the value after ? is output; if it is false, the value after : is output.
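
The same operator is handy inside a per-line block. This sketch labels accounts by UID, using two made-up input lines and hypothetical labels.

```shell
# Parentheses around the ternary keep print from misparsing the expression.
labels=$(printf '%s\n' 'root:x:0' 'bin:x:1' |
    awk -F: '{ print $1, ($3 == 0 ? "superuser" : "regular") }')
echo "$labels"
```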

4. Conditionals and loops in awk

Use of the if statement

[root@localhost ~]# awk 'BEGIN{ test=100;if(test>90){ print "very good";} else{print "no pass";}}'
very good

Statements are separated by ;
The while loop calculates the sum of the numbers from 1 to 100

[root@localhost ~]# awk 'BEGIN{test=100;num=0;while(i<=test){num+=i; i++;}print num;}'
5050

Use of the for loop

[root@localhost ~]# awk 'BEGIN{test=0;for(i=0;i<=100;i++){test+=i;}print test;}'
5050

Use of the do loop

[root@localhost ~]# awk 'BEGIN{test=0;i=0;do{test+=i;i++}while(i<=100); print test;}'
5050
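
In day-to-day use, loops most often walk over the fields of each line. A minimal sketch with made-up numeric input:

```shell
sums=$(printf '%s\n' '1 2 3' '10 20' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++) sum += $i   # visit every field of the current line
    print sum
}')
echo "$sums"
```

Because NF is recomputed for every record, the loop adapts to lines with different numbers of fields.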

5. Array application of awk

Arrays are the soul of awk, and the most important part of its text processing. Because array indices (subscripts) can be numbers or strings, arrays in awk are called associative arrays. They do not need to be declared in advance, nor do they need a specified size. Array elements are initialized to 0 or the empty string, depending on the context. Generally speaking, arrays in awk are used to collect information from records, for example to calculate sums, count words, or track how many times a pattern is matched.
Display the accounts in /etc/passwd:

[root@localhost ~]# awk -F: 'BEGIN {count=0;} {name[count] = $1;count++;} END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
0 root
1 bin
2 daemon
3 adm
4 lp
5 sync
........................................................................
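
String subscripts are what make counting convenient. The sketch below counts how many accounts use each login shell, with three passwd-style lines taken from the earlier output. The for (s in count) loop visits keys in no guaranteed order, so the result is piped through sort.

```shell
data='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin'

shells=$(printf '%s\n' "$data" | awk -F: '
{ count[$7]++ }                              # string-indexed element, created on first use
END { for (s in count) print s, count[s] }   # iterate over all keys
' | LC_ALL=C sort)
echo "$shells"
```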

6. Application of awk string functions

Function names and descriptions

sub finds the leftmost, longest substring that matches the regular expression and replaces it with the replacement string. If no target string is specified, the entire record is used by default. Only the first match is replaced.
sub( regular expression, substitution string )
sub( regular expression, substitution string, target string )

Examples:

     awk '{ sub(/test/, "mytest"); print }' testfile
     awk '{ sub(/test/, "mytest", $1); print }' testfile

The first example matches against the entire record, and only the first match in each record is replaced. To replace every match in the record, use gsub.

The second example matches against the first field of each record, and only the first match is replaced.

gsub works like sub, but replaces every match in the target instead of only the first.
gsub( regular expression, substitution string )
gsub( regular expression, substitution string, target string )

Examples:

     awk '{ gsub(/test/, "mytest"); print }' testfile
     awk '{ gsub(/test/, "mytest", $1); print }' testfile

The first example matches test in the entire record, and all matches are replaced with mytest. Since awk processes every record, this replaces every occurrence in the file.

The second example matches within the first field of each record, and all matches in that field are replaced with mytest.
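
Both functions return the number of replacements made, which gives a quick way to see the difference. The input line in this sketch is made up.

```shell
# sub replaces only the first match; gsub replaces all of them.
first=$(echo 'test one test two' | awk '{ n = sub(/test/, "mytest");  print n, $0 }')
every=$(echo 'test one test two' | awk '{ n = gsub(/test/, "mytest"); print n, $0 }')
echo "$first"
echo "$every"
```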
index returns the position at which the substring first occurs in the string, counting from 1, or 0 if it does not occur.
index( string, substring )

Examples:

awk 'BEGIN{ print index("mytest", "test") }'

The example returns the position of test in mytest, and the result is 3.
substr returns the substring that begins at the given starting position, with positions counted from 1. If no length is given, or the length runs past the end of the string, everything up to the end of the string is returned.
substr( string, starting position )
substr( string, starting position, length of string )

Examples:

awk 'BEGIN{ print substr( "hello world", 7, 5 ) }'

The above example extracts the substring world.
split splits a string into an array according to the given delimiter. If the delimiter is not provided, the string is split according to the current FS value.
split( string, array, field separator )
split( string, array )

Examples:

awk 'BEGIN{ split( "20:18:00", time, ":" ); print time[2] }'

The above example splits the time at the colons into the time array and prints the second array element, 18.
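
split also returns the number of pieces, and the resulting array is indexed from 1. A minimal sketch using the same time string:

```shell
parts=$(awk 'BEGIN {
    n = split("20:18:00", time, ":")   # n is the number of elements stored in time
    print n, time[1], time[n]          # first and last element
}')
echo "$parts"
```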
length returns the number of characters in a string. Without an argument, it returns the length of the current record.
length( string )
length

Examples:

     awk 'BEGIN{ print length( "test" ) }'
     awk '{ print length }' testfile

The first example returns the length of the string test, which is 4.

The second example prints the number of characters in each record of testfile.
match returns the position in string where the regular expression matches, or 0 if there is no match. It also sets the built-in variable RSTART to the starting position of the matched substring and RLENGTH to its length (RLENGTH is -1 if there is no match). substr can use these variables to extract the matched text.

match( string, regular expression )

Examples:

     awk 'BEGIN{start=match("this is a test",/[a-z]+$/); print start}'
     awk 'BEGIN{start=match("this is a test",/[a-z]+$/); print start, RSTART, RLENGTH }'

The first example prints the starting position of the run of lowercase letters at the end of the string, which is 11 in this case.

The second example also prints the RSTART and RLENGTH variables, giving 11 (start), 11 (RSTART), and 4 (RLENGTH).
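
As a sketch of how RSTART and RLENGTH combine with substr, the following pulls the matched text out of the same sample string:

```shell
word=$(awk 'BEGIN {
    s = "this is a test"
    if (match(s, /[a-z]+$/))               # sets RSTART and RLENGTH on success
        print substr(s, RSTART, RLENGTH)   # extract exactly the matched part
}')
echo "$word"
```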
toupper and tolower convert a string to upper case and lower case respectively. Both are specified by POSIX, so they are available in gawk and in other modern awk implementations.

toupper( string )
tolower( string )

Examples:

awk 'BEGIN{ print toupper("test"), tolower("TEST") }'
