awk arrays

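Introduction:

Arrays

Array: a collection of elements arranged in a definite order.
     A finite set of variables of the same type is given one shared name, and the individual variables are told apart by number; the shared name is the array name and the number is the array subscript (index).
     The variables that make up an array are called its components or elements, and are sometimes called subscripted variables.

In awk, arrays are associative arrays, because a subscript can be either a number or a string. awk arrays do not need to be declared in advance, and their size does not need to be declared either. An element that has not been assigned yet evaluates to 0 or to the empty string, depending on the context in which it is used.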
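The "0 or empty string" behaviour is easy to check. The following is a minimal sketch; the function name unset_element_demo and the array names count and name are made up for the illustration.

```shell
# Sketch: an element that was never assigned acts as 0 or "" depending on context.
# unset_element_demo, count and name are hypothetical names for this example.
unset_element_demo() {
    awk 'BEGIN {
        n = count["missing"] + 1        # numeric context: unset element counts as 0
        s = "[" name["missing"] "]"     # string context: unset element is ""
        print n, s
    }'
}
unset_element_demo    # prints: 1 []
```

Note that merely referencing an element this way creates it, which is why the key in array test is the safer way to ask whether a key exists.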

I. Defining arrays
1. Using numbers as array indices (subscripts)

array[1]="Tom"
array[2]="Jerry"

2. Using strings as array indices (subscripts)

array_test["one"]="Hello World"
array_test["two"]="This is the cat"

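Usage:

Note that ${array_name[@]} and ${array_name[*]} (all elements) and ${#array_name[@]} and ${#array_name[*]} (element count) are bash array syntax; they do not work inside an awk program.

In awk, walk the array with for (key in array) and count the elements in that loop; gawk and several other implementations also accept length(array) for the count.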
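Inside an awk program, listing and counting elements is done with a for (key in array) loop. A minimal sketch, reusing the Tom/Jerry array from above (the function name array_size_demo is hypothetical):

```shell
# Sketch: count the elements of an awk array and test for a key.
# array_size_demo is a made-up wrapper name.
array_size_demo() {
    awk 'BEGIN {
        array[1] = "Tom"
        array[2] = "Jerry"
        n = 0
        for (key in array) n++          # count elements by walking the keys
        print "size:", n
        if (2 in array)                 # membership test, does not create the element
            print "key 2 exists:", array[2]
    }'
}
array_size_demo
# prints:
# size: 2
# key 2 exists: Jerry
```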

3. Printing an array

[root@host ~]# awk 'BEGIN{array[1]="Tom";array[2]="Jerry";for(key in array) print key,array[key]}'
1 Tom
2 Jerry


[root@host test]# awk 'BEGIN{arr[1]="tom";arr[2]="jerry";} END{for(key in arr) print key,arr[key]}' /etc/hosts
1 tom
2 jerry

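The difference between BEGIN and END:

BEGIN gives the program its initial state, and END performs clean-up work after the main processing has finished.
Any action listed after BEGIN (inside {}) runs before awk starts scanning the input, and the action listed
after END runs once all of the input has been scanned. BEGIN is typically used to print header text and to preset (initialise) variables,
and END to print the final result.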
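The order of execution can be seen with three lines of input. This is only a sketch; the function name begin_end_demo and the sample input are made up.

```shell
# Sketch: BEGIN runs before input, the main rule once per line, END after input.
# begin_end_demo is a hypothetical wrapper name.
begin_end_demo() {
    printf 'a\nb\nc\n' | awk '
        BEGIN { print "start" }         # runs once, before any input is read
        { n++ }                         # runs for every input line
        END   { print "lines:", n }     # runs once, after the last line
    '
}
begin_end_demo
# prints:
# start
# lines: 3
```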


4. awk script

[root@host test]# cat t.awk
#!/bin/awk -f
BEGIN{
    arr[1]="tom"
    arr[2]="jerry"
    for(key in arr)
        print key,arr[key]
}
[root@host test]# awk -f t.awk
1 tom
2 jerry

As a command-line one-liner:
[root@host test]# awk 'BEGIN{arr[1]="tom";arr[2]="jerry";for(key in arr) print key,arr[key]}'
1 tom
2 jerry


5. Use the first column of a file as the subscript k and the second column as the value S[k], store them in the array S[], then print it

a.
[root@host test]# cat /etc/passwd |awk 'BEGIN{arr[1]="tom";arr[2]="jerry";} END{for(key in arr) print key,arr[key]}' > t.log
[root@host test]# cat t.log
1 tom
2 jerry

[root@host test]# awk '{S[$1]=$2}END{for(k in S) print k,S[k]}' t.log
1 tom
2 jerry


b.
[root@host test]# cat t1.log
1 ABC
2 QWE
[root@host test]# awk '{S[$1]=$2}END{for(k in S) print k,S[k]}' t1.log
1 ABC
2 QWE
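One consequence of S[$1]=$2 worth knowing: if the same key appears on more than one line, the later value overwrites the earlier one. A small sketch with made-up input (the function name last_value_wins is hypothetical); the output is piped through sort because the order of for (k in S) is not specified by awk.

```shell
# Sketch: a repeated key keeps only its last value.
# last_value_wins and the three input lines are made up for the illustration.
last_value_wins() {
    printf '1 ABC\n2 QWE\n1 XYZ\n' |
        awk '{ S[$1] = $2 } END { for (k in S) print k, S[k] }' |
        LC_ALL=C sort
}
last_value_wins
# prints:
# 1 XYZ
# 2 QWE
```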


Example 1: Process a file, extract the domain names, then count them and sort by domain

dom.log
http://www.tomcat.org/index.html
http://www.tomcat.org/1.html
http://post.tomcat.org/index.html
http://mp3.tomcat.org/index.html
http://www.tomcat.org/3.html
http://post.tomcat.org/2.html

Method 1:
[root@host test]# cut -d / -f3 dom.log |sort -r |uniq -c
      3 www.tomcat.org
      2 post.tomcat.org
      1 mp3.tomcat.org

Method 2:
[root@host test]# awk -F / '{print $3}' dom.log |sort -r |uniq -c
      3 www.tomcat.org
      2 post.tomcat.org
      1 mp3.tomcat.org
      
     
Method 3:
[root@host test]# awk -F "/" '{S[$3]++} END{for(k in S) print k,S[k]}' dom.log |sort -rn -k2 |head   # preferred
www.tomcat.org 3
post.tomcat.org 2
mp3.tomcat.org 1
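Because awk does not guarantee the order in which for (k in S) visits keys, the final sort is what fixes the order. The sketch below repeats the count on the same six URLs and sorts by count (descending) and then by name, so that domains with equal counts also come out in a stable order. The function name count_domains and the tie-break key are additions for illustration, not part of the original commands.

```shell
# Sketch: count domains, then sort by count (descending) with the name as tie-break.
# count_domains is a hypothetical wrapper; the data is the dom.log sample above.
count_domains() {
    awk -F/ '{ S[$3]++ } END { for (k in S) print k, S[k] }' <<'EOF' | LC_ALL=C sort -k2,2nr -k1,1
http://www.tomcat.org/index.html
http://www.tomcat.org/1.html
http://post.tomcat.org/index.html
http://mp3.tomcat.org/index.html
http://www.tomcat.org/3.html
http://post.tomcat.org/2.html
EOF
}
count_domains
# prints:
# www.tomcat.org 3
# post.tomcat.org 2
# mp3.tomcat.org 1
```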


Example 2: Tally a weight report

[root@host test]# cat test.txt
001 name Tom 100kg
002 name Jerry 80kg
003 name Hon 111kg
004 name Dii 80kg
005 name Rain 80kg
006 name Yom 100kg

Method 1
[root@host test]# awk '{S[$4]++} END{for(a in S) print a,S[a]}' test.txt
100kg 2
111kg 1
80kg 3


Method 2
[root@host test]# awk '/^00/ {++S[$NF]} END {for (a in S) print a,S[a]}' test.txt
100kg 2
80kg 3
111kg 1


# Note:
key=100kg S[100kg]=2
key=80kg S[80kg]=3
key=111kg S[111kg]=1
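The same key can hold something other than a counter. As a variation (not one of the original methods), the sketch below appends each name to the element for its weight, so that each key ends up with the list of people at that weight. The function name names_by_weight is hypothetical; the data is the test.txt sample above.

```shell
# Sketch: group names by weight instead of counting them.
# names_by_weight is a made-up wrapper; $3 is the name, $NF the weight.
names_by_weight() {
    awk '/^00/ { names[$NF] = names[$NF] " " $3 }
         END   { for (w in names) print w ":" names[w] }' <<'EOF' | LC_ALL=C sort
001 name Tom 100kg
002 name Jerry 80kg
003 name Hon 111kg
004 name Dii 80kg
005 name Rain 80kg
006 name Yom 100kg
EOF
}
names_by_weight
# prints:
# 100kg: Tom Yom
# 111kg: Hon
# 80kg: Jerry Dii Rain
```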


Example 3: Rank client IPs in an Apache log by number of requests

[root@host test]# cat a.log 
10.0.0.41 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.43 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.42 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.46 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.42 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.47 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.41 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.47 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.41 - - [03/Dec/2010:23:27:03 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.46 - - [03/Dec/2010:23:27:03 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -


Method 1:
[root@host test]# awk '{++S[$1]} END{for(key in S) print key,S[key]}' a.log |sort -rn -k2
10.0.0.41 3
10.0.0.47 2
10.0.0.46 2
10.0.0.42 2
10.0.0.43 1

Note: $1 is the content of the first field. -k2 sorts on the second field, i.e. on the count.


Method 2:
[root@host test]# awk '{print $1}' a.log|sort|uniq -c|sort -rn -k1
      3 10.0.0.41
      2 10.0.0.47
      2 10.0.0.46
      2 10.0.0.42
      1 10.0.0.43
      
      
Note: simple and easy to use.



Method 3:
[root@host test]# sed 's/- -.*$//g' a.log|sort|uniq -c|sort -rn -k1
      3 10.0.0.41 
      2 10.0.0.47 
      2 10.0.0.46 
      2 10.0.0.42 
      1 10.0.0.43 
      
Note: the first sort after the sed stage puts identical IPs next to each other, because uniq -c only collapses and counts adjacent identical lines.
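The effect of leaving out that first sort is easy to reproduce on three made-up lines. In this sketch the function names count_unsorted and count_sorted are hypothetical, and the trailing awk only strips the padding that uniq -c puts in front of each count.

```shell
# Sketch: uniq -c counts adjacent duplicates only, so unsorted input is miscounted.
# count_unsorted / count_sorted are made-up names; the input a, b, a is sample data.
count_unsorted() { printf 'a\nb\na\n' | uniq -c | awk '{ print $1, $2 }'; }
count_sorted()   { printf 'a\nb\na\n' | LC_ALL=C sort | uniq -c | awk '{ print $1, $2 }'; }

count_unsorted
# prints:
# 1 a
# 1 b
# 1 a
count_sorted
# prints:
# 2 a
# 1 b
```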


Example 4: Count connections by state

[root@master ~]# netstat -n |awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'


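Command breakdown:

/^tcp/     selects the lines that begin with tcp

S[]         the array S. awk arrays are associative, so the subscript here is the connection-state string rather than a number (numeric indices produced by functions such as split() start at 1, not 0).

NF          the number of fields in the current record; fields are separated by whitespace by default. For a tcp line of netstat -n output, NF is 6

$NF        the value of the last field of the line. For such a line $NF is $6, the value of the sixth field, i.e. SYN_RECV, TIME_WAIT and so on.

S[$NF]     the value of the array element; for a TIME_WAIT line this is S["TIME_WAIT"], the number of connections in that state

++S[$NF]   increments that element by one; for a TIME_WAIT line it adds 1 to the count held in S["TIME_WAIT"]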
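To try the command without a live netstat, the same awk program can be fed a few lines shaped like netstat -n output. The addresses, ports and states below are invented sample data, and count_states is a hypothetical wrapper name; the header line and the udp line are there to show that /^tcp/ skips them.

```shell
# Sketch: run the state-counting program on made-up netstat-style lines.
# count_states is a hypothetical name; every address and port below is sample data.
count_states() {
    awk '/^tcp/ { ++S[$NF] } END { for (a in S) print a, S[a] }' <<'EOF' | LC_ALL=C sort
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.0.0.1:80 10.0.0.9:51000 TIME_WAIT
tcp 0 0 10.0.0.1:80 10.0.0.9:51001 ESTABLISHED
tcp 0 0 10.0.0.1:80 10.0.0.9:51002 TIME_WAIT
udp 0 0 0.0.0.0:68 0.0.0.0:*
EOF
}
count_states
# prints:
# ESTABLISHED 1
# TIME_WAIT 2
```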


More awk examples:
https://blog.51cto.com/oldboy/1184177

posted @ 2018-08-01 17:36 shadow3
