Detailed explanation of how to use awk in Linux

Before learning awk, you have probably already used sed, grep, tr, cut and similar commands. They all make text and data processing on Linux easier, but often none of them can do the whole job on its own, so we end up chaining several of them together with pipes. This article introduces awk, a command that covers most text and data processing needs by itself and can often replace a whole pipeline with a single command.

1. Introduction to awk command

awk is known as one of the "three musketeers" of text processing, together with grep and sed. Its name comes from the initials of the surnames of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is in fact a programming language in its own right, the AWK Programming Language, which its three creators formally described as a "pattern scanning and processing language." It lets you write short programs that read input files, sort data, process data, perform calculations on the input, generate reports, and much more.
awk is therefore a powerful text analysis tool. Compared with grep (searching) and sed (editing), awk is especially strong at analyzing data and generating reports. Put simply, awk reads its input line by line, splits each line into fields (by default on runs of spaces and tabs), and then processes those fields.

2. awk command format and options

Syntax

awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file(s)

Common command options

-F fs           Specify the input field separator fs; fs can be a string or a regular expression, e.g. -F:
-v var=value    Assign a value to a user-defined variable before the program runs; this is how an external (shell) value is passed into awk
-f scriptfile   Read the awk program from a script file
-m[fr] val      Set internal limits: -mf sets the maximum number of fields and -mr the maximum record size. These options come from the Bell Labs research version of awk and are not part of standard (POSIX) awk; gawk accepts them only for compatibility and ignores them.
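A short sketch of -v in action; the variable names name, limit and max are arbitrary examples. Note the difference from the var=value operand in the syntax above: such an operand is only assigned when awk reaches it in the argument list, so it is not yet visible in the BEGIN block, whereas a -v assignment is.

```shell
# -v assigns the variable before the program starts,
# so it is already visible in the BEGIN block.
awk -v name="world" 'BEGIN { print "hello, " name }'
# prints: hello, world

# Pass a shell variable into awk: keep only lines whose first field is <= max.
limit=2
printf '1\n2\n3\n' | awk -v max="$limit" '$1 <= max'
# prints 1 and 2, one per line
```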

3. How awk works

awk 'BEGIN{ commands } pattern{ commands } END{ commands }'

Step 1: Execute the statements in the BEGIN{ commands } statement block;
Step 2: Read a line from the file or standard input (stdin) and execute the pattern{ commands } block for it. Repeat this for every line, from the first to the last, until all input has been read.
Step 3: When reading to the end of the input stream, execute the END{ commands } statement block.
The BEGIN block runs before awk reads any line from the input stream. It is optional and is typically used for statements such as initializing variables and printing table headers.

The END block runs after awk has read all lines from the input stream. It is also optional and is typically used to print summaries, such as the results of analyzing all lines.

The pattern{ commands } block is the most important part, and it is also optional. It is executed for every line awk reads. If a pattern is given without an action, the default action { print } is used, so each matching line is printed; if an action is given without a pattern, it runs for every line.
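One way to see the order of execution is to feed two lines into a program that has all three blocks. The second command shows a pattern with no action, which falls back to the default { print }.

```shell
# BEGIN runs once before any input, the main block runs once per line,
# and END runs once after the last line has been read.
printf 'a\nb\n' | awk 'BEGIN { print "start" } { print "line:", $0 } END { print "end" }'
# start
# line: a
# line: b
# end

# A pattern with no action uses the default action { print }.
printf 'a\nb\n' | awk '/b/'
# prints: b
```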

4. Basic usage of awk

There are three ways to invoke awk:

1. Command line method

awk [-F field-separator] 'commands' input-file(s)

Here commands is the actual awk program, [-F field-separator] is optional, and input-file(s) are the files to be processed.
In awk, each piece of a line delimited by the field separator is called a field. If you do not specify a separator with -F, fields are split on runs of spaces and tabs, and leading and trailing blanks are ignored.
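The default splitting rule differs from a single-character separator such as a comma, as this small comparison shows:

```shell
# Default FS: leading blanks are skipped and runs of spaces/tabs count
# as one separator, so this line has exactly three fields.
printf '  a   b\tc\n' | awk '{ print NF, $1, $3 }'
# prints: 3 a c

# With -F, every single comma separates fields, so ",," yields an empty field.
printf 'a,,b\n' | awk -F, '{ print NF }'
# prints: 3
```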

2. Shell script method

awk 'BEGIN{ print "start" } pattern{ commands } END{ print "end" }' file

An awk script usually consists of three parts: a BEGIN block, a main block that can use pattern matching, and an END block. All three parts are optional, and any of them can be left out. The script is usually enclosed in single quotes; double quotes also work, but then the shell performs its own expansions (for example of $) inside the script before awk sees it. For example:

awk 'BEGIN{ i=0 } { i++ } END{ print i }' filename
awk "BEGIN{ i=0 } { i++ } END{ print i }" filename
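The quoting choice starts to matter as soon as the program refers to a field. Inside double quotes the shell would expand $2 itself before awk runs, so the dollar sign has to be escaped; single quotes avoid this, which is why they are the usual choice.

```shell
# Single quotes: the shell passes $2 to awk untouched.
echo "a b" | awk '{ print $2 }'
# prints: b

# Double quotes: \$ stops the shell from expanding $2 itself.
echo "a b" | awk "{ print \$2 }"
# prints: b
```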

3. Put all awk commands in a separate file and call it with -f

awk -f awk-script-file input-file(s)

The -f option loads the awk script in awk-script-file, and input-file(s) is the same as the command line method above.
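A minimal sketch of this method, reusing the line-counting program from the previous method. The file name count.awk is only an example.

```shell
# Write the awk program to a file (count.awk is a hypothetical name).
cat > count.awk <<'EOF'
BEGIN { i = 0 }   # runs once, before any input
{ i++ }           # runs for every input line
END { print i }   # runs once, after the last line
EOF

# Run the program file against three lines of input.
printf 'x\ny\nz\n' | awk -f count.awk
# prints: 3
```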
Let's look at some simple examples to get a better feel for how awk is used:

[root@localhost ~]# awk '{print $0}' /etc/passwd 
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
.........................................................................
[root@localhost ~]# echo 123|awk '{print "hello,awk"}'
hello,awk

[root@localhost ~]# awk '{print "hi"}' /etc/passwd
hi
hi
hi
hi
hi
hi
hi
hi
hi
.........................................................................


Here /etc/passwd is the input file: awk executes the print command once for each of its lines in turn.

awk works as follows: it reads one record (by default a line terminated by the newline character '\n'), splits the record into fields using the field separator, and fills the field variables: $0 is the whole record, $1 the first field, and $n the nth field. The default field separator is whitespace (spaces or tabs). Because /etc/passwd separates its fields with colons, the following examples use -F: so that $1 is the user name and $3 is the UID. For example:

Print all user names in /etc/passwd:

[root@localhost ~]# awk -F: '{print $1}' /etc/passwd
root
bin
daemon
adm

........................................................................
Print all user names and UIDs in /etc/passwd:

[root@localhost ~]# awk -F: '{print $1,$3}' /etc/passwd
root 0
bin 1
daemon 2

........................................................................
Print them in the format username: XXX uid: XXX:

[root@localhost ~]# awk -F: '{print "username: " $1 "\t\tuid: "$3}' /etc/passwd
username: root uid: 0
username: bin uid: 1
username: daemon uid: 2
........................................................................


5. awk built-in variables

Variable      Description
$n            The nth field of the current record, as split by FS
$0            The complete input record
ARGC          The number of command-line arguments (ARGV[0], usually "awk", is counted)
ARGIND        Index in ARGV of the file currently being processed (gawk only)
ARGV          Array containing the command-line arguments
CONVFMT       Number-to-string conversion format (default %.6g)
ENVIRON       Associative array of environment variables
ERRNO         Description of the last system error (gawk only)
FIELDWIDTHS   Space-separated list of fixed field widths (gawk only)
FILENAME      Name of the current input file
FNR           Record (line) number within the current file, restarting at 1 for each file
FS            Input field separator (default " ", meaning runs of spaces and tabs)
IGNORECASE    If nonzero, matching is case-insensitive (gawk only)
NF            Number of fields in the current record
NR            Number of records read so far across all input, i.e. the overall line number, starting at 1
OFMT          Output format for numbers printed by print (default %.6g)
OFS           Output field separator, placed between fields by print (default a single space)
ORS           Output record separator (default a newline)
RLENGTH       Length of the string matched by the match function
RS            Input record separator (default a newline)
RSTART        Start position of the string matched by the match function
SUBSEP        Separator for multi-dimensional array subscripts (default "\034")
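OFS and ORS are easiest to understand by changing them in a BEGIN block. One detail worth knowing: print $0 reuses the original line as read, so OFS only shows up in $0 after the record has been rebuilt, which assigning any field (here $1 = $1) forces.

```shell
# OFS goes between fields; $1 = $1 makes awk rebuild $0 with the new OFS.
echo "a b c" | awk 'BEGIN { OFS = "-" } { $1 = $1; print }'
# prints: a-b-c

# ORS replaces the newline written after each print;
# END adds a final newline so the shell prompt starts on a new line.
printf 'x\ny\n' | awk 'BEGIN { ORS = "," } { print } END { printf "\n" }'
# prints: x,y,
```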

Example

[root@localhost ~]# echo -e "line1 f2 f3\nline2 f4 f5\nline3 f6 f7" | awk '{print "Line No:"NR", No of fields:"NF, "$0="$0, "$1="$1, "$2="$2, "$3="$3}'
Line No:1, No of fields:3 $0=line1 f2 f3 $1=line1 $2=f2 $3=f3
Line No:2, No of fields:3 $0=line2 f4 f5 $1=line2 $2=f4 $3=f5
Line No:3, No of fields:3 $0=line3 f6 f7 $1=line3 $2=f6 $3=f7

Use print $NF to print the last field in a line, use $(NF-1) to print the second to last field, and so on:

[root@localhost ~]# echo -e "line1 f2 f3\n line2 f4 f5" | awk '{print $NF}'
f3
f5
[root@localhost ~]# echo -e "line1 f2 f3\n line2 f4 f5" | awk '{print $(NF-1)}'
f2
f4

For each line of /etc/passwd, print the file name, line number, number of fields, and the full line content:

[root@localhost ~]# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:bin:x:1:1:bin:/bin:/sbin/nologin
filename:/etc/passwd,linenumber:3,columns:7,linecontent:daemon:x:2:2:daemon:/sbin:/sbin/nologin

For each line of /etc/passwd, print the number of command-line arguments ARGC, the per-file line number FNR, the field separator FS, the number of fields in the record NF, and the number of records read so far NR:

[root@localhost ~]# awk -F: 'BEGIN{printf "%4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR}' /etc/passwd
FILENAME ARGC FNR FS NF NR
---------------------------------------------
/etc/passwd 2 1 : 7 1
/etc/passwd 2 2 : 7 2
/etc/passwd 2 3 : 7 3
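With a single input file FNR and NR are always equal; they only differ when awk reads several files. The sketch below uses two throwaway files, f1.txt and f2.txt (hypothetical names). The last command shows a common idiom built on that difference: NR == FNR is true only while the first file is being read, so it can load the first file into an array and then filter the second.

```shell
printf 'a\nb\n' > f1.txt
printf 'b\nc\n' > f2.txt

# FNR restarts at 1 for each file; NR keeps counting across files.
awk '{ print FILENAME, FNR, NR }' f1.txt f2.txt
# f1.txt 1 1
# f1.txt 2 2
# f2.txt 1 3
# f2.txt 2 4

# Print lines of f2.txt whose first field also appears in f1.txt.
awk 'NR == FNR { seen[$1]; next } $1 in seen' f1.txt f2.txt
# prints: b
```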


6. Advanced usage of awk

1. Assignment operators in awk

Assignment operators: = += -= *= /= %= ^= **= (^= is the portable form of exponent assignment; **= is not part of POSIX awk)

For example: a+=5 is equivalent to a=a+5

[root@localhost ~]# awk 'BEGIN{a=5;a+=5;print a}'
10
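The other compound operators follow the same pattern. This sketch chains several of them, using the portable ^= rather than **=:

```shell
# Each step updates a in place: 10-3=7, 7*2=14, 14/7=2, 2%3=2, 2^3=8
awk 'BEGIN { a = 10; a -= 3; a *= 2; a /= 7; a %= 3; a ^= 3; print a }'
# prints: 8
```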

2. Regular expression matching in awk: print the lines containing root, together with the user name, the UID, and the original line

[root@localhost ~]# awk -F: '/root/ {print $1,$3,$0}' /etc/passwd
root 0 root:x:0:0:root:/root:/bin/bash
operator 11 operator:x:11:0:operator:/root:/sbin/nologin

Two lines match, because /root/ matches root anywhere in the line. To find only the lines that start with root, anchor the pattern: awk -F: '/^root/' /etc/passwd
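A regular expression can also be tested against a single field with ~ (matches) and !~ (does not match), which is usually more precise than matching the whole line. For a self-contained sketch, the two sample lines from the output above are fed in with printf instead of reading the real /etc/passwd.

```shell
# Two sample lines in /etc/passwd format, copied from the output above.
sample='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin'

# The user name field must be exactly "root".
echo "$sample" | awk -F: '$1 ~ /^root$/ { print $1, $3 }'
# prints: root 0

# Users whose login shell (field 7) does not end in nologin.
echo "$sample" | awk -F: '$7 !~ /nologin$/ { print $1 }'
# prints: root
```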

3. The ternary operator in awk

[root@localhost ~]# awk 'BEGIN{a="b";print a=="b"?"ok":"err"}'
ok
[root@localhost ~]# awk 'BEGIN{a="b";print a=="c"?"ok":"err"}'
err

The ternary operator cond ? a : b is a compact if-else: when cond is true the expression yields the value after ?, otherwise the value after :.
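Applied to input data, the ternary operator is handy for labelling lines. Wrapping the whole expression in parentheses is a good habit here: inside print, an unparenthesized > is parsed as output redirection to a file rather than as a comparison. The pass mark of 60 is just an example value.

```shell
# Label each score as pass or fail; the parentheses keep the comparison
# from being mistaken for output redirection.
printf '95\n40\n' | awk '{ print $1, ($1 >= 60 ? "pass" : "fail") }'
# 95 pass
# 40 fail
```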

4. Conditionals and loops in awk

Using the if statement:

[root@localhost ~]# awk 'BEGIN{ test=100;if(test>90){ print "very good";} else{print "no pass";}}'
very good

Statements on the same line are separated by semicolons (a newline also works).
Use a while loop to compute the sum of the numbers from 1 to 100:

[root@localhost ~]# awk 'BEGIN{test=100;num=0;i=1;while(i<=test){num+=i;i++;}print num;}'
5050
Using a for loop:
[root@localhost ~]# awk 'BEGIN{test=0;for(i=0;i<=100;i++){test+=i;}print test;}'
5050
Using a do-while loop:
[root@localhost ~]# awk 'BEGIN{test=0;i=0;do{test+=i;i++}while(i<=100);print test;}'
5050
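Loops become more useful once they run over the fields of each line instead of a fixed range. Since NF holds the number of fields, a for loop from 1 to NF visits every field; this sketch sums the numbers on each input line.

```shell
# Reset s for each line, then add up $1 .. $NF.
printf '1 2 3\n4 5\n' | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print NR ": " s }'
# 1: 6
# 2: 9
```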

5. Arrays in awk

Arrays are the heart of awk, and much of its power in text processing comes from array handling. Because array indices (subscripts) can be numbers or strings, arrays in awk are called associative arrays. They do not need to be declared in advance, and their size does not need to be specified. An element that has never been assigned behaves as 0 or the empty string, depending on the context. Arrays are typically used to collect information from records: computing sums, counting words, tracking how many times a pattern was matched, and so on.
Store the account names from /etc/passwd in an array, then print them with their indices:

awk -F: 'BEGIN {count=0;} {name[count] = $1;count++;}; END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
0 root
1 bin
2 daemon
3 adm
4 lp
5 sync
........................................................................
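String subscripts make counting straightforward: use the value being counted as the index and increment it. This sketch counts accounts per login shell ($NF, the last field). It reads three sample lines so the output is predictable; on a real system you would pass /etc/passwd as the input file instead. The order in which for (k in array) visits keys is unspecified, hence the sort.

```shell
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:1:1:bin:/bin:/sbin/nologin' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' |
  awk -F: '{ count[$NF]++ } END { for (shell in count) print shell, count[shell] }' |
  sort
# /bin/bash 1
# /sbin/nologin 2
```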


6. String functions in awk

Function name and description
sub replaces the leftmost longest match of a regular expression in the target string with the replacement string and returns the number of substitutions made (0 or 1). If no target string is specified, the whole record $0 is used. Only the first match is replaced.
sub (regular expression, substitution string)
sub (regular expression, substitution string, target string)

Examples:

     awk '{ sub(/test/, "mytest"); print }' testfile
     awk '{ sub(/test/, "mytest", $1); print }' testfile

The first example works on the whole record, and only the first match on each line is replaced. To replace every match, use gsub.

The second example works only on the first field ($1), again replacing only the first match. Because $1 is modified, awk rebuilds $0 with OFS between the fields.
gsub works like sub, but replaces every match in the target string (global substitution) and returns the number of replacements made.
gsub (regular expression, substitution string)
gsub (regular expression, substitution string, target string)

Examples:

     awk '{ gsub(/test/, "mytest"); print }' testfile
     awk '{ gsub(/test/, "mytest", $1); print }' testfile

The first example replaces every occurrence of test in each record with mytest.

The second example replaces every occurrence of test within the first field of each record with mytest.
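The difference between sub and gsub, and their return values, is easiest to see on a line with two matches. In the replacement string, & stands for the matched text.

```shell
echo "test test" | awk '{ sub(/test/, "X"); print }'
# prints: X test
echo "test test" | awk '{ gsub(/test/, "X"); print }'
# prints: X X

# gsub returns how many replacements it made.
echo "test test" | awk '{ n = gsub(/test/, "X"); print n }'
# prints: 2

# & in the replacement is the text that matched.
echo "test test" | awk '{ sub(/test/, "[&]"); print }'
# prints: [test] test
```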
index returns the position where the substring first occurs in the string, counting from 1, or 0 if it does not occur.
index(string, substring)

Examples:

awk 'BEGIN { print index("mytest", "test") }'

The example returns the position of test in mytest, which is 3.
substr returns part of a string, starting at the given position (positions start at 1). If the length is omitted, or is larger than what remains of the string, the substring runs to the end of the string.
substr( string, starting position )
substr( string, starting position, length of string )

Examples:

awk 'BEGIN { print substr( "hello world", 7, 11 ) }'

The above example extracts the world substring.
split splits a string into an array using the given separator and returns the number of elements. If no separator is given, the current value of FS is used.
split( string, array, field separator )
split( string, array )

Examples:

awk 'BEGIN { split( "20:18:00", time, ":" ); print time[2] }'

The above example splits the time by colon into the time array and displays the second array element 18.
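Because split returns the element count and numbers the elements from 1, its result can drive a loop over all the pieces. The array name t is an arbitrary choice.

```shell
# n receives the number of pieces; t[1] .. t[n] hold them.
awk 'BEGIN { n = split("20:18:00", t, ":"); for (i = 1; i <= n; i++) print i, t[i] }'
# 1 20
# 2 18
# 3 00
```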
length returns the number of characters in a string; called without an argument, it returns the length of the current record ($0).
length( string )
length

Examples:

     awk 'BEGIN { print length( "test" ) }'
     awk '{ print length }' testfile


The first example prints the length of the string "test", which is 4.

The second example prints the number of characters in each line of testfile.
match returns the position (counting from 1) where the regular expression first matches in the string, or 0 if there is no match. It also sets the built-in variable RSTART to the start of the matched substring and RLENGTH to its length (-1 when there is no match), so substr can use them to extract the matched text.

match( string, regular expression )

Examples:

     awk 'BEGIN {start=match("this is a test",/[a-z]+$/); print start}'
     awk 'BEGIN {start=match("this is a test",/[a-z]+$/); print start, RSTART, RLENGTH }'


The first example prints the starting position of the run of lowercase letters at the end of the string, which is 11 here.

The second example also prints the RSTART and RLENGTH variables: 11 (start), 11 (RSTART), 4 (RLENGTH).
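Combining match with substr extracts the matched text itself rather than just its position. Checking the return value first avoids using RSTART and RLENGTH when nothing matched.

```shell
awk 'BEGIN {
  s = "this is a test"
  # match returns 0 when there is no match, so only extract on success.
  if (match(s, /[a-z]+$/))
    print substr(s, RSTART, RLENGTH)
}'
# prints: test
```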
toupper and tolower convert a string to upper case and lower case respectively. Both are standard POSIX functions, so they are available in gawk as well as other awk implementations.

toupper( string )
tolower( string )

Examples:

awk 'BEGIN { print toupper("test"), tolower("TEST") }'
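The case functions combine naturally with substr. This sketch capitalizes only the first letter of the first field on each line, leaving the rest unchanged.

```shell
# Upper-case the first character, then append the rest of the field as is.
printf 'alice\nbob\n' | awk '{ print toupper(substr($1, 1, 1)) substr($1, 2) }'
# Alice
# Bob
```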
