Awk Basics [6] Additional Awk Commands 2

1、Generic String Functions


Index Function

The index function can be used to get the index (location) of the given string (or character) in an input string.

You can also use index to check whether a given string (or character) is present in an input string. If the given string is not present, it will return the location as 0, which means the given string doesn't exist, as shown below.

$ cat index.awk
BEGIN {
    state="CA is California"
    print "String CA starts at location",index(state,"CA");
    print "String Cali starts at location",index(state,"Cali");
    if (index(state,"NY")==0)
        print "String NY is not found in:", state
}

$ awk -f index.awk
String CA starts at location 1
String Cali starts at location 7
String NY is not found in: CA is California
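
Because index returns a position, it combines naturally with substr. The following is a minimal sketch, assuming a made-up "key=value" string (not part of the original examples), that splits the string at the first "=" only:

```shell
# Hypothetical input: the value part itself contains another "=".
result=$(awk 'BEGIN {
    line = "path=/usr/bin=old"
    pos = index(line, "=")      # position of the first "=", or 0 if absent
    if (pos > 0)
        print substr(line, 1, pos - 1) " -> " substr(line, pos + 1)
}')
echo "$result"    # path -> /usr/bin=old
```

Checking pos > 0 first avoids slicing the string when the separator is missing.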

 

Length Function

The length function returns the length of a string. In the following example, we print the total number of characters in each record of the items.txt file.

$ awk '{print length($0)}' items.txt
29
32
27
31
30
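
length also accepts a single field, and its result can be compared like any other number. The sketch below is only illustrative: it feeds the first two items.txt records through printf (an assumption, so that it runs without the file) and tracks the longest record:

```shell
result=$(printf '%s\n' '101,HD Camcorder,Video,210,10' '102,Refrigerator,Appliance,850,2' |
awk -F, '{
    # remember the item number of the longest record seen so far
    if (length($0) > max) { max = length($0); longest = $1 }
    print $1, length($2)        # length of the 2nd field only
}
END { print "longest record:", longest, "(" max " characters)" }')
echo "$result"
```

This prints "101 12" and "102 12", followed by "longest record: 102 (32 characters)".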

 

Split Function

Syntax:

split(input-string,output-array,separator)

The split function splits a string into individual array elements. It takes the following three arguments.
• input-string: This is the input string that needs to be split into multiple strings.
• output-array: This array will contain the split strings as individual elements.
• separator: The separator that should be used to split the input-string.

For this example, the original items-sold.txt file is slightly changed to have different field delimiters, i.e. a colon to separate the item number and the quantity sold. Within quantity sold, the individual quantities are separated by comma.

So, to calculate the total number of items sold for a particular item, we take the 2nd field (all the quantities sold, delimited by commas), split it on the comma separator, store the substrings in an array, and then loop through the array to add up the quantities.

$ cat items-sold1.txt
101:2,10,5,8,10,12
102:0,1,4,3,0,2
103:10,6,11,20,5,13
104:2,3,4,0,6,5
105:10,2,5,7,12,6


$ cat split.awk
BEGIN {
    FS=":"
}
{
    split($2,quantity,",");
    total=0;
    for (x in quantity)
        total=total+quantity[x];
    print "Item", $1, ":", total, "quantities sold";
}

$ awk -f split.awk items-sold1.txt
Item 101 : 47 quantities sold
Item 102 : 10 quantities sold
Item 103 : 65 quantities sold
Item 104 : 20 quantities sold
Item 105 : 42 quantities sold
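
Two details are worth a sketch: split returns the number of elements it created, and a "for (x in array)" loop does not guarantee any particular order. When order matters, looping from 1 to the returned count is a common alternative. The example below reuses the quantities of item 101; the variable names n and i are illustrative:

```shell
result=$(awk 'BEGIN {
    n = split("2,10,5,8,10,12", quantity, ",")   # n is the element count
    for (i = 1; i <= n; i++)                     # indices 1..n, in input order
        total += quantity[i]
    print n " quantities, first " quantity[1] ", last " quantity[n] ", total " total
}')
echo "$result"    # 6 quantities, first 2, last 12, total 47
```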

 

Substr Function
Syntax:

substr(input-string, location, length)

The substr function extracts a portion of a given string. In the above syntax:
• input-string: The input string containing the substring.
• location: The starting location of the substring.
• length: The total number of characters to extract from the starting location. This parameter is optional. When you don't specify it, substr extracts the rest of the characters from the starting location.

Start from the 1st character (of the 2nd field) and print 5 characters:

$ awk -F"," '{print substr($2,1,5)}' items.txt
HD Ca
Refri
MP3 P
Tenni
Laser
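
The optional length argument can be sketched as follows, using the product name from the first record as a literal string (an assumption, so that the example runs without items.txt):

```shell
result=$(awk 'BEGIN {
    name = "HD Camcorder"
    print substr(name, 4)                   # no length: position 4 to the end
    print substr(name, length(name) - 2)    # the last 3 characters
}')
echo "$result"
```

This prints "Camcorder" and then "der".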

 

2、GAWK/NAWK String Functions


These string functions are not in the original awk. They are available in the GAWK and NAWK flavors, and they are also part of POSIX awk.

Sub Function
Syntax:

sub(original-string,replacement-string,string-variable)

• sub stands for substitution.
• original-string: This is the original string that needs to be replaced. This can also be a regular expression.
• replacement-string: This is the replacement string.
• string-variable: This acts as both input and output string variable. You have to be careful with this, as after the successful substitution, you lose the original value in this string-variable.

 

In the following example:

• original-string: This is the regular expression C[Aa], which matches either "CA" or "Ca"
• replacement-string: When the original-string is found, replace it with "KA"
• string-variable: Before executing the sub, the variable contains the input string. Once the replacement is done, the variable contains the output string.

Please note that sub replaces only the 1st occurrence of the match.

$ cat sub.awk
BEGIN {
    state="CA is California"
    sub("C[Aa]","KA",state);
    print state;
}


$ awk -f sub.awk
KA is California

The 3rd parameter string-variable is optional. When it is not specified, awk will use $0 (the current line), as shown below. This example replaces the first occurrence of "10" in each record with "20". Since every record starts with "10", the item number 101 becomes 201, 102 becomes 202, etc., and later occurrences (such as the "10" in "210") are left alone.

$ awk '{ sub("10","20"); print $0; }' items.txt
201,HD Camcorder,Video,210,10
202,Refrigerator,Appliance,850,2
203,MP3 Player,Audio,270,15
204,Tennis Racket,Sports,190,20
205,Laser Printer,Office,475,5

When a successful substitution happens, the sub function returns 1, otherwise it returns 0.

Print the record only when a successful substitution occurs:

$ awk '{ if (sub("HD","High-Def")) print $0; }' items.txt
101,High-Def Camcorder,Video,210,10
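
Since sub overwrites its third argument, one way to keep the original value is to substitute on a copy. The sketch below also uses "&" in the replacement, which stands for the text that was matched. The variable names copy and changed are illustrative:

```shell
result=$(awk 'BEGIN {
    state = "CA is California"
    copy = state                          # substitute on a copy, keep state intact
    changed = sub("C[Aa]", "[&]", copy)   # & is replaced by the matched text
    print changed, copy
    print state
}')
echo "$result"
```

This prints "1 [CA] is California" and then the untouched "CA is California".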

 

Gsub Function

gsub stands for global substitution. gsub is exactly the same as sub, except that all occurrences of original-string are changed to replacement-string.

In the following example, both "CA" and "Ca" are changed to "KA":

$ cat gsub.awk
BEGIN {
    state="CA is California"
    gsub("C[Aa]","KA",state);
    print state;
}

$ awk -f gsub.awk
KA is KAlifornia

As with sub, the 3rd parameter is optional. When it is not specified, awk will use $0, just as sub does.
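
gsub returns the number of substitutions it made, which makes it usable as a simple counter. A minimal sketch, where replacing each "i" with "&" (the matched text itself) leaves the string unchanged and only yields the count:

```shell
result=$(awk 'BEGIN {
    state = "CA is California"
    count = gsub("i", "&", state)    # state is unchanged; count is the number of "i"
    n = gsub("C[Aa]", "KA", state)   # n is the number of replacements made
    print count, n, state
}')
echo "$result"    # 3 2 KA is KAlifornia
```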

 

Match Function and the RSTART, RLENGTH Variables

The match function searches for a given string (or regular expression) in the input-string. It returns the position where the match starts, or 0 when there is no match.

Syntax:

match(input-string,search-string)

• input-string: This is the input-string that needs to be searched.
• search-string: This is the search-string that needs to be searched for in the input-string. This can also be a regular expression.

The following example searches for the string "Cali" in the state string variable. If present, it prints a successful message.

$ cat match.awk
BEGIN {
    state="CA is California"
    if (match(state,"Cali")) {
        print substr(state,RSTART,RLENGTH),"is present in:", state;
    }
}


$ awk -f match.awk
Cali is present in: CA is California

Match sets the following two special variables. The above example uses them in the substr function call to print the matched text in the success message.
• RSTART - The starting location of the matched text (0 when there is no match)
• RLENGTH - The length of the matched text (-1 when there is no match)

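Note: when subStr is a plain string with no regular expression metacharacters, index(string1, subStr) and match(string1, subStr) return the same position.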
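
With a regular expression, RLENGTH is not known in advance, and that is where match differs from index. The sketch below uses a made-up sentence (not from items.txt) to extract the first run of digits, then shows the values left behind by a failed match:

```shell
result=$(awk 'BEGIN {
    text = "HD Camcorder costs 210 dollars"
    if (match(text, "[0-9]+"))                    # first run of digits
        print RSTART, RLENGTH, substr(text, RSTART, RLENGTH)
    found = match("no digits here", "[0-9]+")     # nothing to match
    print found, RSTART, RLENGTH
}')
echo "$result"
```

This prints "20 3 210" for the successful match and "0 0 -1" for the failed one.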

 

3、GAWK String Functions


 

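tolower and toupper are not in the original awk. They are available in Gawk, and POSIX awk specifies them as well, so most current awk implementations provide them. As the names suggest, these functions convert the given string to lower case or upper case, as shown below.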

$ awk '{print tolower($0)}' items.txt
101,hd camcorder,video,210,10
102,refrigerator,appliance,850,2
103,mp3 player,audio,270,15
104,tennis racket,sports,190,20
105,laser printer,office,475,5


$ awk '{print toupper($0)}' items.txt
101,HD CAMCORDER,VIDEO,210,10
102,REFRIGERATOR,APPLIANCE,850,2
103,MP3 PLAYER,AUDIO,270,15
104,TENNIS RACKET,SPORTS,190,20
105,LASER PRINTER,OFFICE,475,5
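
A common use of these functions is case-insensitive comparison. Both return a converted copy, so the field itself is not modified. The sketch below pipes two items.txt records through printf (an assumption, so that it runs without the file) and selects the records whose 3rd field is "video" in any letter case:

```shell
result=$(printf '%s\n' '101,HD Camcorder,Video,210,10' '103,MP3 Player,Audio,270,15' |
awk -F, 'tolower($3) == "video" { print toupper($2) }')
echo "$result"    # HD CAMCORDER
```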

 

 

 

 

posted @ 2013-09-06 15:56  风*依旧