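Awk Basics [2]: Awk Built-in Variables
1. FS - Input Field Separator
When awk processes a file, the default field separator is whitespace (any run of spaces or tabs). If your input file uses a different field separator, you can specify it with the -F option, as shown below: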
awk -F ',' '{print $2, $3}' employee.txt
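You can also set the separator with the built-in variable FS. It should be set in the BEGIN block so that it takes effect before the first record is read: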
awk 'BEGIN {FS=","} {print $2, $3}' employee.txt
You can also specify multiple field separators. For example, consider the following file, in which each record uses three different field separators: a comma, a colon, and a percent sign:
$ vi employee-multiple-fs.txt
101,John Doe:CEO%10000
102,Jason Smith:IT Manager%5000
103,Raj Reddy:Sysadmin%4500
104,Anand Ram:Developer%4500
105,Jane Miller:Sales Manager%3000
Multiple field separators are specified with a regular expression. For example, FS = "[,:%]" means that the field separator can be a comma, a colon, or a percent sign.
The following example therefore prints the name and the title from employee-multiple-fs.txt, even though the file mixes different field separators:
$ awk 'BEGIN {FS="[,:%]"} {print $2, $3}' employee-multiple-fs.txt
John Doe CEO
Jason Smith IT Manager
Raj Reddy Sysadmin
Anand Ram Developer
Jane Miller Sales Manager
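As a quick check of the regular-expression separator, the following minimal sketch feeds one record of the file above to awk on standard input and prints the field count and the last field. It uses the -F option instead of a BEGIN block, which should be equivalent for this purpose; the variable name out is only for illustration.

```shell
# One record with three different separators: comma, colon, percent sign.
# printf needs %% to emit a literal percent sign.
out=$(printf '101,John Doe:CEO%%10000\n' |
  awk -F '[,:%]' '{ print NF; print $4 }')

# Prints the number of fields (4) and then the salary field (10000).
echo "$out"
```

All three separators are treated alike, so the record splits into four fields.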
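2. FIELDWIDTHS
By default awk splits each record into fields at the character, string, or regular expression given by FS. Input can also be split into fixed-width columns by listing the width of each column in FIELDWIDTHS. This variable is a gawk extension, so awk must be gawk in the examples below:
$ echo abcdefghigk | awk 'BEGIN{FIELDWIDTHS="1 2"} {$1=$1;print $0}'
a bc
$ echo abcdefghigk | awk 'BEGIN{FIELDWIDTHS="1 2 3"} {$1=$1;print $0}'
a bc def
Reference: http://www.gnu.org/software/gawk/manual/html_node/Constant-Size.html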
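3. FPAT
Suppose we have the following CSV (comma-separated values) file:
$ cat addresses.csv
Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA
Note that the address field ("1234 A Pretty Street, NE") itself contains a comma. If we split the input with FS=",", the address is broken into two parts:
"1234 A Pretty Street
 NE"
This is not the result we want.
For this situation we can use the built-in variable FPAT, which is a gawk extension. The value of FPAT is a regular expression that describes the contents of each field, rather than what separates the fields.
In the CSV file above, each field is either a string that contains no comma, or a string enclosed in a pair of double quotes.
So we can solve the problem like this:
$ cat simple-csv.awk
BEGIN {
    FPAT = "([^,]+)|(\"[^\"]+\")"
}
{
    print "NF = ", NF
    for (i = 1; i <= NF; i++) {
        printf("$%d = <%s>\n", i, $i)
    }
}
$ gawk -f simple-csv.awk addresses.csv
NF =  7
$1 = <Robbins>
$2 = <Arnold>
$3 = <"1234 A Pretty Street, NE">
$4 = <MyTown>
$5 = <MyState>
$6 = <12345-6789>
$7 = <USA>
As you can see, the address is now treated as a single field.
Reference: http://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html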
4. OFS - Output Field Separator
OFS is the output field separator: it is placed between consecutive fields on output. The default is a single space.
When you use a single print statement to print two variables separated by a comma (as shown below), awk prints the values of those two variables separated by a space.
$ awk -F ',' '{print $2, $3}' employee.txt
John Doe CEO
Jason Smith IT Manager
Raj Reddy Sysadmin
Anand Ram Developer
Jane Miller Sales Manager
The following print statement also prints two variables ($2 and $3) separated by a comma; however, the output has a colon between them (instead of a space), because OFS is set to a colon.
$ awk -F ',' 'BEGIN { OFS=":" } { print $2, $3 }' employee.txt
John Doe:CEO
Jason Smith:IT Manager
Raj Reddy:Sysadmin
Anand Ram:Developer
Jane Miller:Sales Manager
When you specify a comma in the print statement between different print values, awk will use the OFS. In the following example, the default OFS is used, so you'll see a space between the values in the output.
$ awk 'BEGIN { print "test1","test2" }'
test1 test2
When you don't separate values with a comma in the print statement, awk will not use the OFS; instead it will print the values with nothing in between.
$ awk 'BEGIN { print "test1" "test2" }'
test1test2
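One detail the examples above do not show: for $0 itself, OFS only takes effect when awk rebuilds the record, so printing an unmodified $0 keeps the original separators. The following is a minimal sketch, with made-up input data and illustrative variable names, of the common idiom of assigning a field to itself to force that rebuild.

```shell
# Without any field assignment, $0 is printed exactly as it was read.
unchanged=$(printf 'a,b,c\n' |
  awk 'BEGIN { FS = ","; OFS = "|" } { print $0 }')

# Assigning to a field (even to itself) makes awk rebuild $0 using OFS.
rebuilt=$(printf 'a,b,c\n' |
  awk 'BEGIN { FS = ","; OFS = "|" } { $1 = $1; print $0 }')

echo "$unchanged"   # a,b,c
echo "$rebuilt"     # a|b|c
```

This is the same $1=$1 assignment that appears in the FIELDWIDTHS examples earlier in this article.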
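5. RS - Record Separator
Suppose we have the following data file:
$ vi employee-one-line.txt
101,John Doe:102,Jason Smith:103,Raj Reddy:104,Anand Ram:105,Jane Miller
In this file each record has two parts (an ID and a name). Records are separated by colons rather than newlines, and the two fields within each record are separated by a comma.
awk uses the newline as its default record separator, so if you try to print only the employee names, the following does not work:
$ awk -F, '{print $2}' employee-one-line.txt
John Doe:102
This is because awk treats the entire line as a single record with the comma as the field separator, so the second field is John Doe:102. To process the line as five records, you need to specify the record separator explicitly:
$ awk -F, 'BEGIN { RS=":" } { print $2 }' employee-one-line.txt
John Doe
Jason Smith
Raj Reddy
Anand Ram
Jane Miller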
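One caveat with a single-character RS: the newline at the end of the file is no longer a separator, so it remains part of the last record and shows up as an extra blank line in the output. The following is a minimal sketch, using a shortened inline copy of the data and illustrative variable names, that strips the trailing newline with sub() before printing.

```shell
data='101,John Doe:102,Jason Smith:103,Raj Reddy'

# Without cleanup the last field is "Raj Reddy\n", so print emits a blank line too.
raw_lines=$(printf '%s\n' "$data" |
  awk -F, 'BEGIN { RS = ":" } { print $2 }' |
  awk 'END { print NR }')

# Removing a trailing newline from each record avoids the stray blank line.
names=$(printf '%s\n' "$data" |
  awk -F, 'BEGIN { RS = ":" } { sub(/\n$/, ""); print $2 }')
clean_lines=$(printf '%s\n' "$names" | awk 'END { print NR }')

echo "$raw_lines $clean_lines"   # 4 3
echo "$names"
```

Modifying $0 with sub() causes awk to split the fields again, so $2 no longer carries the newline.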
6. ORS - Output Record Separator
By default, awk terminates each output record with a newline. You can set the ORS variable to specify the output record separator explicitly:
$ awk 'BEGIN { FS=","; ORS="\n---\n" } {print $2, $3}' employee.txt
John Doe CEO
---
Jason Smith IT Manager
---
Raj Reddy Sysadmin
---
Anand Ram Developer
---
Jane Miller Sales Manager
---
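ORS is written after every print statement, including the last one, which is why the output above ends with a separator line. The following is a minimal sketch with made-up input that makes the trailing separator visible by joining lines with a semicolon; the variable name joined is only for illustration.

```shell
# Each print is terminated by ORS, so the result ends with ";" rather than a newline.
joined=$(printf 'a\nb\nc\n' | awk 'BEGIN { ORS = ";" } { print }')

echo "$joined"   # a;b;c;
```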
7. NR - Number of Records
NR holds the number of records read so far. Inside the body block, it gives the number of the current record (the line number, with the default RS). In the END block, it gives the total number of records read.
The following example shows how NR works in the body block and in the END block:
$ awk 'BEGIN {FS=","} \
{print "Emp Id of record number",NR,"is",$1;} \
END {print "Total number of records:",NR}' employee.txt
Emp Id of record number 1 is 101
Emp Id of record number 2 is 102
Emp Id of record number 3 is 103
Emp Id of record number 4 is 104
Emp Id of record number 5 is 105
Total number of records: 5
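Because NR is an ordinary variable, it can also be used in a pattern to select records by position. The following is a minimal sketch, assuming a hypothetical input with a header line (not the employee.txt used above), that skips the header with NR > 1.

```shell
# The pattern NR > 1 is false only for the first record, so the header is skipped.
names=$(printf 'id,name\n101,John Doe\n102,Jason Smith\n' |
  awk -F, 'NR > 1 { print $2 }')

# Prints the two names, one per line, without the "name" header.
echo "$names"
```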
8. FILENAME - Current File Name
FILENAME is helpful when you pass multiple input files to an awk program. It gives you the name of the file that awk is currently processing.
$ awk '{ print FILENAME }' employee.txt employee-multiple-fs.txt
employee.txt
employee.txt
employee.txt
employee.txt
employee.txt
employee-multiple-fs.txt
employee-multiple-fs.txt
employee-multiple-fs.txt
employee-multiple-fs.txt
employee-multiple-fs.txt
9. FNR - File "Number of Record"
NR keeps growing across multiple files. When the body block starts processing the second file, NR is not reset to 1; instead it continues from its last value in the previous file.
FNR gives you the record number within the current file. So, when awk finishes executing the body block for the first file and starts the body block for the next file, FNR starts from 1 again.
The following example shows both NR and FNR:
$ vi fnr.awk
BEGIN {
    FS=","
}
{
    printf "FILENAME=%s NR=%s FNR=%s\n", FILENAME, NR, FNR;
}
END {
    printf "END Block: NR=%s FNR=%s\n", NR, FNR
}
$ awk -f fnr.awk employee.txt employee-multiple-fs.txt
FILENAME=employee.txt NR=1 FNR=1
FILENAME=employee.txt NR=2 FNR=2
FILENAME=employee.txt NR=3 FNR=3
FILENAME=employee.txt NR=4 FNR=4
FILENAME=employee.txt NR=5 FNR=5
FILENAME=employee-multiple-fs.txt NR=6 FNR=1
FILENAME=employee-multiple-fs.txt NR=7 FNR=2
FILENAME=employee-multiple-fs.txt NR=8 FNR=3
FILENAME=employee-multiple-fs.txt NR=9 FNR=4
FILENAME=employee-multiple-fs.txt NR=10 FNR=5
END Block: NR=10 FNR=5
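The difference between NR and FNR is the basis of a common two-file idiom: NR == FNR is true only while the first file is being read (provided that file is not empty). The following is a minimal sketch, using hypothetical temporary files ids.txt and emp.txt that are not part of this article's data, which loads the IDs from the first file and prints the matching names from the second.

```shell
# Create two small input files in a temporary directory (hypothetical data).
tmp=$(mktemp -d)
printf '101\n103\n' > "$tmp/ids.txt"
printf '101,John Doe\n102,Jason Smith\n103,Raj Reddy\n' > "$tmp/emp.txt"

# First file: NR == FNR holds, so remember each ID and skip to the next record.
# Second file: NR > FNR, so only the lookup rule runs.
matched=$(awk -F, '
  NR == FNR  { want[$1] = 1; next }
  $1 in want { print $2 }
' "$tmp/ids.txt" "$tmp/emp.txt")

echo "$matched"
rm -rf "$tmp"
```

Only employees 101 and 103 are listed in ids.txt, so the sketch prints John Doe and Raj Reddy.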