Awk Basics [6]: Additional Awk Commands 3

1、Argument Processing (ARGC, ARGV, ARGIND)


 

The built-in variables we discussed earlier (FS, NF, RS, NR, FILENAME, OFS, and ORS) are available on all versions of awk, including nawk and gawk.

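• The built-in variables discussed in this hack are not environment variables. ARGC and ARGV are available on nawk and gawk; ARGIND is available only on gawk.
• Use ARGC and ARGV to pass parameters to the awk script from the command line.
• ARGC contains the total number of arguments passed to the awk script, counting the command name itself.
• ARGV is an array that contains all the arguments passed to the awk script, indexed from 0 through ARGC-1.
• When you pass 5 arguments, ARGC will contain the value 6, because ARGV[0] is counted as well.
• ARGV[0] contains the name of the awk command (typically awk). Options such as -f and the script file name are not stored in ARGV.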

The following simple arguments.awk shows how ARGC and ARGV behave:

$ cat arguments.awk
BEGIN {
    print "ARGC=",ARGC
    for (i = 0; i < ARGC; i++)
        print ARGV[i]
}

$ awk -f arguments.awk arg1 arg2 arg3 arg4 arg5
ARGC= 6
awk
arg1
arg2
arg3
arg4
arg5
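
As a minimal sketch of passing a parameter through ARGV, the example below treats the first argument as a numeric threshold rather than a file name. The file numbers.txt and the variable name limit are made up for illustration. The argument is removed from ARGV in the BEGIN block so that awk does not try to open it as an input file.

```shell
# Hypothetical input file with one number per line.
printf '3\n8\n12\n' > numbers.txt

# ARGV[1] is "10" (the threshold), ARGV[2] is the input file.
# "+ 0" forces a numeric value; "delete ARGV[1]" stops awk from
# treating the threshold as a file to read.
awk 'BEGIN { limit = ARGV[1] + 0; delete ARGV[1] }
     $1 > limit { print }' 10 numbers.txt
# prints: 12
```

For simple values, awk's -v option is often more convenient. ARGV is useful when the script needs to inspect or rewrite the argument list itself.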

 

In gawk, the name of the file currently being processed is stored in the ARGV array, which can be accessed from the body block. ARGIND is the index into ARGV of the current file.
When an awk script processes only one file, ARGIND is 1, and ARGV[ARGIND] gives the name of that file.

The following example contains only the body block, which prints the value of ARGIND and the current file name from ARGV[ARGIND]:

 

$ cat argind.awk
{
    print "ARGIND:", ARGIND
    print "Current file:", ARGV[ARGIND]
}
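
The original does not show a run of this script. As a rough sketch, assuming gawk is installed and using two made-up one-line files, ARGIND advances as gawk moves from one input file to the next:

```shell
# Two hypothetical input files, one line each.
printf 'a\n' > first.txt
printf 'b\n' > second.txt

# ARGIND is gawk-only; it holds the ARGV index of the file being read.
gawk '{ print ARGIND, ARGV[ARGIND] }' first.txt second.txt
# prints:
# 1 first.txt
# 2 second.txt
```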

2、GAWK Built-in Environment Variables


 

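Of the built-in variables discussed in this section, IGNORECASE and ERRNO are available only in gawk. ENVIRON is also defined by POSIX, so it works in other modern awk implementations such as nawk.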

ENVIRON

ENVIRON is an array that contains all the environment values. The index to the ENVIRON array is the environment variable name.
For example, the array element ENVIRON["PATH"] will contain the value of the PATH environment variable.

$ cat environ.awk
BEGIN {
    OFS="="
    for(x in ENVIRON)
        print x,ENVIRON[x];
}        

Partial output is shown below.

$ awk -f environ.awk
SHELL=/bin/bash
PATH=/home/ramesh/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
HOME=/home/ramesh
TERM=xterm
USERNAME=ramesh
DISPLAY=:0.0
AWKPATH=.:/usr/share/awk
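
To look up a single variable instead of listing them all, index ENVIRON by name. In this small sketch, GREETING is a made-up variable set only for this one command:

```shell
# GREETING is a hypothetical variable, placed in awk's environment
# for this single invocation.
GREETING=hello awk 'BEGIN { print "GREETING is", ENVIRON["GREETING"] }'
# prints: GREETING is hello
```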

 

IGNORECASE

By default IGNORECASE is set to 0, so the awk program is case sensitive.
When you set IGNORECASE to a nonzero value such as 1, the awk program becomes case insensitive. This affects both regular expression matching and string comparisons.

The following will not print anything, as it is looking for "video" with lower case "v". But, the items.txt file contains only "Video" with upper case "V".

$ awk '/video/ {print}' items.txt

However when you set IGNORECASE to 1, and search for "video", it will print the line containing "Video", as it will not do a case sensitive pattern match.

$ awk 'BEGIN{IGNORECASE=1} /video/ {print}' items.txt
101,HD Camcorder,Video,210,10

As you see in the example below, this works for both string and regular expression comparisons.

$ cat ignorecase.awk
BEGIN {
    FS=",";
    IGNORECASE=1;
}
{
    if ($3 == "video") print $0;
    if ($2 ~ "TENNIS") print $0;
}


$ awk -f ignorecase.awk items.txt
101,HD Camcorder,Video,210,10
104,Tennis Racket,Sports,190,20
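
IGNORECASE is a gawk extension. On an awk that lacks it, a similar result can be approximated by lowercasing each value with tolower() before comparing. This is a different technique from IGNORECASE, shown here only as a portable sketch. The file items-sample.txt is made up and holds just the two records shown above.

```shell
# Hypothetical two-line sample based on the records shown above.
printf '101,HD Camcorder,Video,210,10\n104,Tennis Racket,Sports,190,20\n' > items-sample.txt

# Lowercase each field before comparing, so "Video" matches "video"
# and "Tennis" matches /tennis/ without relying on IGNORECASE.
awk -F, 'tolower($3) == "video" || tolower($2) ~ /tennis/' items-sample.txt
# prints:
# 101,HD Camcorder,Video,210,10
# 104,Tennis Racket,Sports,190,20
```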

 

ERRNO

When an I/O operation (for example, getline) fails, the ERRNO variable contains a string describing the error.
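
A minimal sketch of checking ERRNO, assuming gawk is installed. The path /no/such/file is a placeholder that is expected not to exist, so getline returns a negative value. The exact message text depends on the system and locale, so no expected output is shown.

```shell
# getline returns -1 when the file cannot be opened; gawk then
# stores a description of the failure in ERRNO.
gawk 'BEGIN {
    if ((getline line < "/no/such/file") < 0)
        print "getline failed:", ERRNO
}'
```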

3、Awk Profiler - pgawk


 

The pgawk program is used to create an execution profile of your awk program. Using pgawk you can see how many times each awk statement (and each user-defined function) was executed. In gawk 4.1 and later, pgawk is no longer a separate program; the same profile is produced by running gawk with the --profile option.

First, create a sample awk program that we'll run through pgawk to see what the profiler output looks like.

$ cat profiler.awk
BEGIN {
    FS=",";
    print "Report Generated On:" strftime("%a %b %d %H:%M:%S %Z %Y",systime());
}
{
    if ( $5 <= 5 )
        print "Buy More: Order", $2, "immediately!"
    else
        print "Sell More: Give discount on", $2, "immediately!"
}
END {
    print "----"
}

Next, execute the sample awk program using pgawk (instead of just calling awk).

$ pgawk -f profiler.awk items.txt
Report Generated On:Mon Jan 31 08:35:59 PST 2011
Sell More: Give discount on HD Camcorder immediately!
Buy More: Order Refrigerator immediately!
Sell More: Give discount on MP3 Player immediately!
Sell More: Give discount on Tennis Racket immediately!
Buy More: Order Laser Printer immediately!
----

By default pgawk creates a file called awkprof.out in the current directory. You can specify your own profiler output file name using the --profile option as shown below.

$ pgawk --profile=myprofiler.out -f profiler.awk items.txt

View the default awkprof.out to understand the execution counts of the individual awk statements.

$ cat awkprof.out
# gawk profile, created Mon Jan 31 08:35:59 2011
# BEGIN block(s)
        BEGIN {
    1       FS = ","
    1       print ("Report Generated On:" strftime("%a %b %d %H:%M:%S %Z %Y", systime()))
        }
# Rule(s)
    5   {
    5       if ($5 <= 5) { # 2
    2           print "Buy More: Order", $2, "immediately!"
    3       } else {
    3           print "Sell More: Give discount on", $2, "immediately!"
            }
        }
# END block(s)
        END {
    1       print "----"
        }

While reading the awkprof.out, please keep the following in mind:

• The column on the left contains a number. This indicates how many times that particular awk statement was executed. For example, the print statement in BEGIN executed only once (duh!). The body block executed 5 times, once for each record in items.txt.
• A condition check has two numbers: one on the left side, and another on the right side after the opening brace. The left side indicates how many times the condition was checked. The right side indicates how many times it was true. In the above example, the if was executed 5 times, but it was true only 2 times, as indicated by ( # 2 ) next to the if statement.
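
On systems where pgawk is not installed, a profile can be sketched with gawk itself, assuming a gawk release that accepts --profile directly. The file names nums.txt and myprofile.out are made up. The layout of the profile file varies between gawk versions, so it is not reproduced here.

```shell
# Hypothetical input: three numeric records.
printf '1\n2\n3\n' > nums.txt

# Run the program under the profiler; the counts are written to
# myprofile.out when gawk exits.
gawk --profile=myprofile.out '$1 > 1 { n++ } END { print n }' nums.txt
# prints: 2

# Inspect the per-statement execution counts.
cat myprofile.out
```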

 

Posted 2013-09-09 17:03 by 风*依旧