Counting category values on Linux

Scenario:

Count how many times each value appears in the category field of the data below.

Data source

es_ip10000.json

{"_index":"order","_type":"service","_id":"107.151.83.180:22","_score":1,"_source":{"ip":"107.151.83.180","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
{"_index":"order","_type":"service","_id":"107.151.84.167:22","_score":1,"_source":{"ip":"107.151.84.167","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
{"_index":"order","_type":"service","_id":"107.151.84.177:22","_score":1,"_source":{"ip":"107.151.84.177","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
{"_index":"order","_type":"service","_id":"107.152.188.252:1723","_score":1,"_source":{"ip":"107.152.188.252","parent_category":["网络产品"],"category":["路由器"]}}
{"_index":"order","_type":"service","_id":"107.151.89.125:1025","_score":1,"_source":{"ip":"107.151.89.125"}}
{"_index":"order","_type":"service","_id":"107.152.58.217:22","_score":1,"_source":{"ip":"107.152.58.217","parent_category":["支撑系统"],"category":["服务"]}}
{"_index":"order","_type":"subdomain","_id":"107.15.221.83:443","_score":1,"_source":{"ip":"107.15.221.83","parent_category":["办公外设","系统软件"],"category":["打印机","操作系统"]}}

Extract the category field under _source

jq '._source.category' es_ip10000.json > category.txt

Output

[
  "其他支撑系统"
]
[
  "其他支撑系统"
]
[
  "其他支撑系统"
]
[
  "路由器"
]
null
[
  "服务"
]
[
  "打印机",
  "操作系统"
]

Using a text editor, remove the commas and the square brackets ([ and ]).

Result after processing


  "其他支撑系统"


  "其他支撑系统"


  "其他支撑系统"


  "路由器"

null

  "服务"


  "打印机"
  "操作系统"

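
The manual editing step can also be scripted. The following is a minimal sketch using sed; the function name clean_categories and the output file name category_clean.txt are hypothetical and not part of the original workflow. Unlike the editor approach, it deletes the bracket lines instead of leaving them blank, so the later count will not contain a row for empty lines.

```shell
# Strip JSON array punctuation from jq's pretty-printed output.
# Reads from stdin and writes the cleaned lines to stdout.
# (clean_categories is a hypothetical helper name.)
clean_categories() {
  # 1st expression: delete lines that contain only "[" or "]"
  # 2nd expression: remove a trailing comma
  # 3rd expression: trim leading whitespace
  sed -e '/^[][]$/d' -e 's/,$//' -e 's/^[[:space:]]*//'
}
```

Usage, with the file names from this article: clean_categories < category.txt > category_clean.txt
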
Sort -> deduplicate -> count -> sort again

sort category.txt | uniq -c | sort -n > category_count.txt

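Notes:

uniq -c # collapse adjacent duplicate lines and prefix each with its count

sort -n # sort numerically, in ascending order

sort -r # sort in reverse (descending) order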
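
Because uniq only merges lines that are next to each other, the first sort in the pipeline is what makes the counts correct. The sketch below illustrates this on a three-line sample and shows how -n and -r can be combined to list the most frequent value first; the function name count_desc is hypothetical.

```shell
# Without sorting, the two "b" lines are not adjacent, so uniq -c
# reports "b" twice, each time with a count of 1.
printf 'b\na\nb\n' | uniq -c

# After sorting, the duplicates are adjacent and "b" gets a count of 2.
printf 'b\na\nb\n' | sort | uniq -c

# Count lines read from stdin and list the most frequent first.
# (count_desc is a hypothetical helper name.)
count_desc() {
  sort | uniq -c | sort -nr
}
```
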

Output:

      1 null
      1   "操作系统"
      1   "打印机"
      1   "服务"
      1   "路由器"
      3   "其他支撑系统"
     12 
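
The steps above can also be collapsed into a single pipeline. This is a sketch, not the method used in this article: it relies on jq's -r option (raw output, no quotes) and on the // alternative operator to replace a missing category with an empty array, so records without a category produce no line at all. The function name count_categories is hypothetical. The resulting counts contain no null row and no blank-line row.

```shell
# Print every category value on its own line, then count occurrences.
# $1 is the path to a file with one JSON document per line.
# (count_categories is a hypothetical helper name.)
count_categories() {
  # ._source.category // []  -> use an empty array when the field is missing
  # .[]                      -> emit each array element as a separate line
  jq -r '._source.category // [] | .[]' "$1" | sort | uniq -c | sort -nr
}
```

Usage, with the file names from this article: count_categories es_ip10000.json > category_count.txt
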
[Haima's blog] http://www.cnblogs.com/haima/
posted @ 2021-08-09 15:21  HaimaBlog