awk Study Notes
1、Awk Command Syntax
Basic Awk Syntax:
awk -F fs '/pattern/ {action}' input_file
In the above syntax:
- -F fs sets the field separator to fs. If you don't specify it, awk splits fields on runs of whitespace (spaces and tabs).
- The /pattern/ and the {action} should be enclosed inside single quotes.
- /pattern/ is optional. If you don't provide it, awk will process all the records from input_file. If you specify a pattern, it will process only those records from input_file that match the given pattern.
- {action} - These are the awk programming commands, which can be one or multiple awk commands. The whole action block (including all the awk commands together) should be enclosed between { and }.
- input_file - The input file that needs to be processed.
Example:
The contents of input_file:
101,John Doe,CEO
102,Jason Smith,IT Manager
103,Raj Reddy,Sysadmin
104,Anand Ram,Developer
105,Jane Miller,Sales Manager
Command:
awk -F"," '/CEO/ {print $2,$3}' input_file
Output:
John Doe CEO
2、Awk Program Structure (BEGIN, body, END block)
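awk workflow: the BEGIN block runs once before any input is read, the body block runs once for each input record, and the END block runs once after the last record has been processed.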

Example:
Save the awk commands in a script named demo:
#!/bin/awk -f
BEGIN {
    print "begin"
    FS=","
}
/CEO/ {print $2,$3}
END {
    print "end"
}
Run it:
awk -f demo test
Output:
begin
John Doe CEO
end
3、Print Command
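By default, print with no arguments outputs the entire record. Passing field numbers to print outputs only those fields, and adding a pattern restricts printing to the records that match it.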
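As a minimal sketch of the two forms of print described here, the following feeds three of the sample records to awk from a shell variable. The variable names records, whole, and names are illustrative only and do not come from the original notes.

```shell
# Three sample records, held in a shell variable for illustration.
records='101,John Doe,CEO
102,Jason Smith,IT Manager
103,Raj Reddy,Sysadmin'

# print with no arguments outputs the whole matching record ($0).
whole=$(printf '%s\n' "$records" | awk -F',' '/Smith/ {print}')

# print with a field number outputs only that field of every record.
names=$(printf '%s\n' "$records" | awk -F',' '{print $2}')

printf '%s\n' "$whole"
printf '%s\n' "$names"
```

The first command prints one full record, the second prints the name column for all three records.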
4、FS - Input Field Separator
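When awk processes a file, the default field separator is whitespace (runs of spaces and tabs). Use the -F option to specify a different separator:
awk -F ',' '{print $2, $3}' test
You can also set the separator with the built-in variable FS, which should be assigned in the BEGIN block:
awk 'BEGIN {FS=","} {print $2, $3}' test
Multiple field separators are also possible. For example, each record in the following file uses three different separators: a comma, a colon, and a percent sign:
101,John Doe:CEO%10000
102,Jason Smith:IT Manager%5000
103,Raj Reddy:Sysadmin%4500
104,Anand Ram:Developer%4500
105,Jane Miller:Sales Manager%3000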
You can specify MULTIPLE field separators using a regular expression. For example FS = "[,:%]" indicates that the field separator can be , or : or %
Script demo:
#!/bin/awk -f
BEGIN {
    FS="[,:%]"
}
{print $2,$3}
Output:
John Doe CEO
Jason Smith IT Manager
Raj Reddy Sysadmin
Anand Ram Developer
Jane Miller Sales Manager
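Note: simple separators can be handled by setting FS to a regular expression. For more complicated separators, it may be easier to preprocess the file (for example with Python) and normalize them to spaces.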
5、OFS - Output Field Separator
OFS is the output field separator: print places it between the comma-separated expressions it outputs. The default is a single space.
Script demo:
#!/bin/awk -f
BEGIN {
    FS="[,:%]"
    OFS="--"
}
{print $2,$3}
kl@ubuntu:~/scripts$ awk -f demo test
John Doe--CEO
Jason Smith--IT Manager
Raj Reddy--Sysadmin
Anand Ram--Developer
Jane Miller--Sales Manager
To print fields with no separator between them, concatenate them instead of separating them with commas:
Script demo:
#!/bin/awk -f
BEGIN {
    FS="[,:%]"
    OFS="--"
}
{print $2$3}    # or {print $2 $3}
kl@ubuntu:~/scripts$ awk -f demo test
John DoeCEO
Jason SmithIT Manager
Raj ReddySysadmin
Anand RamDeveloper
Jane MillerSales Manager
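One detail worth sketching: OFS is used when print joins comma-separated expressions, and also when awk rebuilds $0 after a field is assigned. Printing an untouched $0 outputs the record exactly as it was read. The variable names line, unchanged, and rebuilt below are illustrative assumptions.

```shell
line='101,John Doe,CEO'

# $0 is printed as read, so the commas stay and OFS is not applied.
unchanged=$(printf '%s\n' "$line" | awk 'BEGIN {FS=","; OFS="--"} {print $0}')

# Assigning to a field (even $1=$1) makes awk rebuild $0 using OFS.
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN {FS=","; OFS="--"} {$1=$1; print $0}')

printf '%s\n' "$unchanged"
printf '%s\n' "$rebuilt"
```

The $1=$1 assignment is a common way to convert a whole record from one separator to another.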
6、RS - Record Separator
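Suppose the following text, in which colons (instead of newlines) separate the records and commas separate the fields:
101,John Doe:102,Jason Smith:103,Raj Reddy:104,Anand Ram:105,Jane Miller
To extract the names, set the variable RS (the default is a newline):
kl@ubuntu:~/scripts$ awk -F"," 'BEGIN { RS=":" } {print $2}' test
John Doe
Jason Smith
Raj Reddy
Anand Ram
Jane Miller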
7、ORS - Output Record Separator
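By default, awk terminates each output record with a newline. Set the variable ORS to specify the output record separator explicitly:
kl@ubuntu:~/scripts$ awk -F"," 'BEGIN {RS=":";ORS="--\n" } {print $2}' test
John Doe--
Jason Smith--
Raj Reddy--
Anand Ram--
Jane Miller
--
The last name is followed by a line break before the separator because, with RS set to ":", the file's final newline belongs to the last record and therefore to its last field.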
8、NR - Number of Records
NR is very helpful. In the body block, it gives the number of the current record (the line number, with the default RS). In the END block, it gives the total number of records read.
The following example shows how NR works in the body block,and in the END block:
File test:
101,John Doe,CEO
102,Jason Smith,IT Manager
103,Raj Reddy,Sysadmin
104,Anand Ram,Developer
105,Jane Miller,Sales Manager
kl@ubuntu:~/scripts$ awk 'BEGIN{FS=","} {print "Id of record number",NR,"is",$1} END{print "Total number:",NR}' test
Id of record number 1 is 101
Id of record number 2 is 102
Id of record number 3 is 103
Id of record number 4 is 104
Id of record number 5 is 105
Total number: 5
9、FNR - File "Number of Record"
NR keeps growing between multiple files. When the body block starts processing the 2nd file, NR will not be reset to 1, instead it will continue from the last NR number value of the previous file.
FNR gives you the record number within the current file. So, when awk finishes executing the body block for the 1st file and starts the body block for the next file, FNR starts from 1 again.
The following example shows both NR and FNR:
kl@ubuntu:~/scripts$ awk -F"," '{printf "%s---FILENAME=%s NR=%s FNR=%s\n",$1,FILENAME,NR,FNR}' test1 test2
this is test1 line1---FILENAME=test1 NR=1 FNR=1
this is test1 line2---FILENAME=test1 NR=2 FNR=2
this is test2 line1---FILENAME=test2 NR=3 FNR=1
this is test2 line2---FILENAME=test2 NR=4 FNR=2
10、FILENAME – Current File Name
FILENAME is helpful when you are specifying multiple input-files to the awk program. This will give you the name of the file Awk is currently processing.
kl@ubuntu:~/scripts$ awk '{print $0,"---",FILENAME}' test1 test2
this is test1 --- test1
this is test2 --- test2
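11、ARGC, ARGV - Arguments
ARGC is an integer: the number of elements in ARGV, that is, the command name plus the command-line arguments that remain after awk's own options (such as -F, -v, and -f, together with their arguments) and the program text are removed. ARGV[] is an array of strings; ARGV[0] through ARGV[ARGC-1] hold those arguments, and ARGV[0] is the command name.
The loop below is placed in a BEGIN block so that it runs once rather than once per input record:
kl@ubuntu:~/scripts$ awk -F"," 'BEGIN{for(i=0;i<ARGC;i++) printf "ARGV[%d]=%s\n",i,ARGV[i]}' test
ARGV[0]=awk
ARGV[1]=test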
12、 Awk Variables and Operators
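You don't need to declare a variable to use it. If you wish to initialize an awk variable, it is better to do it in the BEGIN section, which is executed only once.
unary operators: plus (+), minus (-), increment (++), decrement (--)
arithmetic operators: addition (+), subtraction (-), multiplication (*), division (/), modulo (%)
string operator: concatenation (there is no explicit operator; write the operands next to each other, usually separated by a space)
comparison operators: greater than (>), greater than or equal (>=), less than (<), less than or equal (<=), equal (==), not equal (!=)
logical operators: and (&&), or (||)
regular expression operators: match (~), no match (!~)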
Example:
File test:
101,John Doe,CEO,10000
102,Jason Smith,IT Manager,5000
103,Raj Reddy,Sysadmin,4500
104,Anand Ram,Developer,4500
105,Jane Miller,Sales Manager,3000
String concatenation:
kl@ubuntu:~/scripts$ awk -F"," '{print $1 $1 $2}' test
101101John Doe
102102Jason Smith
103103Raj Reddy
104104Anand Ram
105105Jane Miller
Match and no match:
Print the records whose second field starts with J:
kl@ubuntu:~/scripts$ awk -F"," '$2~/^J/' test
101,John Doe,CEO,10000
102,Jason Smith,IT Manager,5000
105,Jane Miller,Sales Manager,3000
Print selected fields of the records whose second field does not start with J:
kl@ubuntu:~/scripts$ awk -F"," '$2!~/^J/ {print $1,$2}' test
103 Raj Reddy
104 Anand Ram
Match a whole field exactly:
kl@ubuntu:~/scripts$ awk -F"," '$2=="John Doe"' test
101,John Doe,CEO,10000
Combine comparisons:
kl@ubuntu:~/scripts$ awk -F"," '$4<4000 || $4>5000' test
101,John Doe,CEO,10000
105,Jane Miller,Sales Manager,3000
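The examples above cover string, regular expression, and comparison operators. As a small sketch of the arithmetic operators, the following totals and averages the fourth field of the same five records. The names records, summary, and total are illustrative assumptions, not part of the original notes.

```shell
records='101,John Doe,CEO,10000
102,Jason Smith,IT Manager,5000
103,Raj Reddy,Sysadmin,4500
104,Anand Ram,Developer,4500
105,Jane Miller,Sales Manager,3000'

# total starts out uninitialized (treated as 0) and accumulates field 4;
# in END, NR holds the number of records read, so total / NR is the mean.
summary=$(printf '%s\n' "$records" | awk -F',' '{total += $4} END {print total, total / NR}')

printf '%s\n' "$summary"
```

The variable total is never declared, which illustrates that awk variables come into existence on first use.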
13、Awk Conditional Statements and Loops
if statement:
Script demo:
#!/bin/awk -f
BEGIN {
    FS="[,]"
}
{
    if ($2=="John Doe")
        print "Hello CEO"
    else if ($1==104)
        print "Hello Developer"
    else
        print "Hello"
}
kl@ubuntu:~/scripts$ awk -f demo test
Hello CEO
Hello
Hello
Hello Developer
Hello
while / do-while loops:
#!/bin/awk -f
BEGIN {
    i = 0
    while (1) {
        print i;
        i++;
        if (i>3) break;
    }
}
kl@ubuntu:~/scripts$ awk -f demo
0
1
2
3
The same behavior with do-while:
#!/bin/awk -f
BEGIN {
    i = 0
    do {
        print i;
        i++;
    } while (i<4)
}
for loop:
The same behavior again with a for loop:
#!/bin/awk -f
BEGIN {
    for (i=0;i<4;i++) {
        print i;
    }
}
break/continue statements:
break is shown in the while example above. continue skips the rest of the current iteration:
#!/bin/awk -f
BEGIN {
    for (i=0;i<4;i++) {
        if (i==2) continue;
        print i;
    }
}
kl@ubuntu:~/scripts$ awk -f demo
0
1
3
exit statement:
exit stops the program, so the statements after it are not executed:
#!/bin/awk -f
BEGIN {
    for (i=0;i<4;i++) {
        if (i==2) exit;
        print i;
    }
}
kl@ubuntu:~/scripts$ awk -f demo
0
1
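One point the exit example does not show: when a program also has an END block, an exit in BEGIN skips the remaining BEGIN statements and the input, but the END block is still executed. A minimal sketch (the variable name out is illustrative):

```shell
# exit in BEGIN: "never printed" is skipped and no input is read,
# but the END action still runs before awk terminates.
out=$(awk 'BEGIN { print "start"; exit; print "never printed" }
           END   { print "END still runs" }' </dev/null)

printf '%s\n' "$out"
```

To stop without running END, call exit again inside the END block.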
14、Awk Associative Arrays
In Awk, arrays are associative, i.e. an array contains multiple index/value pairs. The index doesn't need to be a continuous set of numbers; in fact it can be a string or a number, and you don't need to specify the size of the array.
Syntax:
arrayname[string]=value
- arrayname is the name of the array.
- string is the index of an array.
- value is any value assigning to the element of the array.
The index of the array is always a string. Even when you pass a number for the index, awk will treat it as a string index. Both of the following refer to the same element:
#!/bin/awk -f
BEGIN {
    array[101]=3;
    print array["101"];
}
kl@ubuntu:~/scripts$ awk -f demo
3
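Reading an associative array:
{for (item in array) print array[item]}    # the iteration order is unspecified
{for(i=1;i<=len;i++) print array[i]}    # len is the number of elements; this works when the indices are 1..len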
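Because the order of a for (item in array) loop is unspecified, one portable approach is to let an external sort order the output. The sketch below counts how often each line occurs; the names lines, counts, and count are illustrative assumptions.

```shell
lines='cd
ab
cd
cad
cd
sun
ab'

# count[$0]++ creates the element on first use and increments it afterwards.
# The for-in loop visits the keys in no particular order, so sort fixes the order.
counts=$(printf '%s\n' "$lines" \
    | awk '{count[$0]++} END {for (k in count) print k, count[k]}' \
    | LC_ALL=C sort)

printf '%s\n' "$counts"
```

Sorting outside awk keeps the program portable to awk implementations that lack built-in sorting functions.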
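Multidimensional arrays use the form array[index1,index2,...].
SUBSEP is the array subscript separator and defaults to "\034". awk joins the indices with SUBSEP into a single string index. You can assign your own separator to SUBSEP:
kl@ubuntu:~/scripts$ awk 'BEGIN{SUBSEP=":";array["a","b"]=1;for(i in array) printf "array[%s]=%d\n",i,array[i]}'
array[a:b]=1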
To delete an array or an array element, use the delete statement:
delete array    # delete the whole array
delete array[item]    # delete one element
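A short sketch of deleting a single element and checking the result with the in operator. The array name a and the shell variable result are illustrative.

```shell
result=$(awk 'BEGIN {
    a["x"] = 1; a["y"] = 2
    delete a["x"]                      # remove only the element with index "x"
    if ("x" in a) print "x present"; else print "x deleted"
    if ("y" in a) print "y present"; else print "y deleted"
}')

printf '%s\n' "$result"
```

Testing with ("x" in a) is preferable to testing a["x"], because merely referencing a["x"] would create the element again.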
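Sorting functions (gawk extensions):
asort sorts an array by its values; afterwards the indices run from 1 to the length of the array. For example:
Sort the items in the following file test:
a
1
0
b
20
8
100
cc
Script demo:
#!/bin/awk -f
{a[$0]=$0}    # build array a with $0 as both the index and the value
END {
    len=asort(a)    # sort a by value; asort returns the number of elements
    for(i=1;i<=len;i++)
        print i "\t" a[i]
}
kl@ubuntu:~/scripts$ awk -f demo test
1	0
2	1
3	8
4	20
5	100
6	a
7	b
8	cc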
asorti sorts an array by its indices: after asorti(array), the array is indexed from 1 to its length, and its values are the original indices:
File test:
cd
ab
cd
cad
cd
sun
ab
kl@ubuntu:~/scripts$ awk '{a[$0]}END{l=asorti(a);for(i=1;i<=l;i++)print i,a[i]}' test
1 ab
2 cad
3 cd
4 sun
asorti accepts a second argument: asorti(array1,array2) leaves array1 (and its values) unchanged and stores the sorted indices of array1 as the values of array2:
kl@ubuntu:~/scripts$ awk '{a[$0]++}END{l=asorti(a,b);for(i=1;i<=l;i++)print i,b[i],a[b[i]]}' test
1 ab 2
2 cad 1
3 cd 3
4 sun 1
Two array-based ways to remove duplicate lines from test:
awk '!($0 in a){a[$0];print}' test
awk '!a[$0]++' test
15、Awk String Functions
File test:
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 Yellow 12 35 28
J.Troll 07/99 4842 Brown-3 12 26 26
L.Tansley 05/99 4712 Brown-2 12 30 28
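gsub(r,s): replaces every match of r in the current record ($0) with s and returns the number of replacements, so it can also serve as a pattern:
kl@ubuntu:~/scripts$ awk 'gsub(4842,4899) {print $0}' test
J.Troll 07/99 4899 Brown-3 12 26 26
gsub(r,s,t): the same, but the replacements are made in t:
kl@ubuntu:~/scripts$ awk 'gsub(9,6,$2) {print $0}' test
M.Tansley 05/66 48311 Green 8 40 44
J.Lulu 06/66 48317 green 9 24 26
P.Bunny 02/66 48 Yellow 12 35 28
J.Troll 07/66 4842 Brown-3 12 26 26
L.Tansley 05/66 4712 Brown-2 12 30 28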
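index(s,t): returns the position of the first occurrence of string t in s, or 0 if t does not occur:
kl@ubuntu:~/scripts$ awk '{print index($0,"n")}' test
5
37
5
37
5
The value 37 for the second and fourth records implies that the original file pads its columns with extra spaces. With the single-space separators shown above, the result for those records would be 24.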
length(s): returns the number of characters in s:
kl@ubuntu:~/scripts$ awk '{print length($1)}' test
9
6
7
7
9
match(s,r):
Returns 0 if the regular expression r does not match in s; otherwise returns the position in s where the match begins.
kl@ubuntu:~/scripts$ awk '$1=="J.Lulu" {print match($1,/u/)}' test
4
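Besides its return value, match also sets the built-in variables RSTART (start of the match) and RLENGTH (length of the match), which combine naturally with substr. A minimal sketch, using an illustrative string taken from the sample data and an illustrative shell variable pos:

```shell
pos=$(awk 'BEGIN {
    s = "Brown-3"
    # On success, RSTART and RLENGTH describe the matched substring.
    if (match(s, /[0-9]+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
}')

printf '%s\n' "$pos"
```

Here the digits start at position 7 and span one character, so substr extracts "3".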
split(s,a,fs): splits s into the array a using the separator fs and returns the number of elements:
kl@ubuntu:~/scripts$ awk 'BEGIN {print split("123#456#789",myarray,/#/);print myarray[1],myarray[2],myarray[3]}'
3
123 456 789
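sprintf(fmt,expr): formats expr according to fmt, as printf does, but returns the resulting string instead of printing it.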
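The notes give no example for this function, so here is a minimal sketch. The format string and values are chosen purely for illustration, and the variable names formatted and s are assumptions.

```shell
formatted=$(awk 'BEGIN {
    # %-8s left-justifies in 8 columns, %5.2f uses width 5 with 2 decimals,
    # %03d zero-pads an integer to 3 digits.
    s = sprintf("%-8s|%5.2f|%03d", "awk", 3.14159, 7)
    print s
}')

printf '%s\n' "$formatted"
```

Because sprintf returns a string, the result can be stored, concatenated, or used as an array index before anything is printed.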
sub(r,s):
The examples below use the three-argument form sub(r,s,t), which replaces only the first match of r in t:
kl@ubuntu:~/scripts$ awk '$1=="J.Troll" {sub(/26/,29,$0)} {print $0}' test
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 Yellow 12 35 28
J.Troll 07/99 4842 Brown-3 12 29 26
L.Tansley 05/99 4712 Brown-2 12 30 28
kl@ubuntu:~/scripts$ awk '$1=="J.Troll" {sub("26",29,$7)} {print $0}' test
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 Yellow 12 35 28
J.Troll 07/99 4842 Brown-3 12 26 29
L.Tansley 05/99 4712 Brown-2 12 30 28
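substr(s,p): returns the part of s that starts at position p:
kl@ubuntu:~/scripts$ awk '$1=="L.Tansley" {print substr($1,1)}' test
L.Tansley
substr(s,p,n): returns at most n characters of s, starting at position p:
kl@ubuntu:~/scripts$ awk '$1=="L.Tansley" {print substr($1,1,5)}' test
L.Tan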