Three Scripts

These three examples are great; it finally feels like I can build something useful.

Calculating the disk space used by log files

Example

import os
# Use a list comprehension to collect the names of files in /var/log/mongodb that start with mongod.log
mongod_logs = [item for item in os.listdir('/var/log/mongodb') if item.startswith('mongod.log')]
# Sum the sizes (in bytes) of those files
sum_size = sum(os.path.getsize(os.path.join('/var/log/mongodb', item)) for item in mongod_logs)
# Convert bytes to kilobytes and keep two decimal places
print('sum_size: %.2f kb' % (sum_size / 1024))
#out: sum_size: 5.81 kb
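The same idea can be wrapped in a small reusable function. This is only a sketch: the function name total_size and the throwaway demo directory are my own assumptions, used so the code runs on a machine without MongoDB logs. It also skips subdirectories, which the one-liner above does not check for.

```python
import os
import tempfile


def total_size(directory, prefix):
    """Return the combined size in bytes of regular files in `directory`
    whose names start with `prefix` (hypothetical helper)."""
    total = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Skip subdirectories and anything that does not match the prefix
        if name.startswith(prefix) and os.path.isfile(path):
            total += os.path.getsize(path)
    return total


# Demo on a temporary directory with made-up files (not real MongoDB logs)
with tempfile.TemporaryDirectory() as tmp:
    demo_files = [('mongod.log', b'a' * 1024),
                  ('mongod.log.1', b'b' * 512),
                  ('other.txt', b'c' * 100)]
    for name, payload in demo_files:
        with open(os.path.join(tmp, name), 'wb') as fh:
            fh.write(payload)
    demo_total = total_size(tmp, 'mongod.log')

print('sum_size: %.2f KB' % (demo_total / 1024.0))
```

On a real server the call would presumably be total_size('/var/log/mongodb', 'mongod.log').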

Analyzing the Apache access log

Reference: Installing Apache

Remember to stop the firewall and turn off SELinux enforcement:

$ systemctl stop firewalld
$ setenforce 0

Apache logs are stored under /etc/httpd/logs

The default log format looks like this:

$ tail -1 /etc/httpd/logs/access_log 
10.154.0.2 - - [25/Dec/2020:14:44:51 +0800] "GET / HTTP/1.1" 403 4897 "-" "Mozilla/5.0 
(Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0"

'''
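A log line in Apache's default format contains 12 columns:
1) Client IP
2) Remote logname
3) Authenticated remote user
4) Time of the request
5) UTC offset
6) HTTP method of the request
7) Requested resource
8) HTTP protocol
9) HTTP status code
10) Number of bytes sent by the server
11) Referer (where the visit came from)
12) Client browser information (user agent)
The browser information itself contains spaces, so line.split() returns more
than 12 items; for a typical line, columns 1 to 11 map to indexes 0 to 10.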
'''
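To make the column numbers concrete, here is a small sketch that splits the sample line shown above and picks out a few fields by index. It assumes the sample is a single line in the real log (it is only wrapped on this page) and that no field before the user agent contains a space. The variable names are mine.

```python
# The sample access_log line from above, joined back into one line
sample = ('10.154.0.2 - - [25/Dec/2020:14:44:51 +0800] "GET / HTTP/1.1" 403 4897 "-" '
          '"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0"')

fields = sample.split()

client_ip = fields[0]                 # column 1
timestamp = fields[3].lstrip('[')     # column 4, without the leading bracket
method = fields[5].lstrip('"')        # column 6, without the leading quote
resource = fields[6]                  # column 7
status = fields[8]                    # column 9
bytes_sent = fields[9]                # column 10
# Column 12 was broken up by split(), so join the remaining items again
user_agent = ' '.join(fields[11:]).strip('"')

print(client_ip, method, resource, status, bytes_sent)
print(user_agent)
```

This is why the scripts below use index 0 for the client IP, index 6 for the requested resource, and index 8 for the status code.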

Example

import os
# Run a Linux command and read its output
line=os.popen('tail /etc/httpd/logs/access_log').read()
print(line.split()[0])
'''
out:
$ python test.py
10.154.0.2
'''
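If you would rather not shell out to tail, the last lines of a file can also be read in pure Python. The sketch below is an alternative I am suggesting, not what the example above does: it uses collections.deque with maxlen, and the helper name tail and the made-up demo file are assumptions for illustration.

```python
import os
import tempfile
from collections import deque


def tail(path, n=10):
    """Return the last `n` lines of the file at `path` (hypothetical helper)."""
    with open(path) as f:
        # A deque with maxlen keeps only the most recent n lines while iterating
        return list(deque(f, maxlen=n))


# Demo with a temporary file standing in for the access log (made-up content)
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as tmp:
    tmp.write(''.join('10.0.0.%d - - request %d\n' % (i, i) for i in range(1, 21)))
demo_path = tmp.name

last_lines = tail(demo_path, 3)
os.remove(demo_path)

# First field of the first of the last three lines
print(last_lines[0].split()[0])
```

The whole file is still read once, but only n lines are kept in memory at a time.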

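# Compute the site's PV (page views), i.e. the number of requests
ips = []

with open('/etc/httpd/logs/access_log') as f:
    for line in f:
        # Collect the client IP of every request
        ips.append(line.split()[0])
    # The length of the list is the number of requests
    print('PV is {0}'.format(len(ips)))
    # Deduplicate with a set to get UV, the number of unique visitors
    print('UV is {0}'.format(len(set(ips))))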

'''
out:
$ python test.py
PV is 11
UV is 2
'''
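One thing the script above does not handle is an empty line in the log: line.split()[0] raises IndexError on it. A possible variant, sketched here with a hypothetical function name pv_uv and made-up log lines, skips blank lines and keeps only a set of IPs instead of a full list.

```python
def pv_uv(lines):
    """Return (page views, unique visitors) for an iterable of access-log lines
    (hypothetical helper)."""
    pv = 0
    visitors = set()
    for line in lines:
        fields = line.split()
        if not fields:
            # Blank line: nothing to count
            continue
        pv += 1
        visitors.add(fields[0])
    return pv, len(visitors)


# Made-up lines for illustration only, not taken from a real log
demo_lines = [
    '10.154.0.2 - - [25/Dec/2020:14:44:51 +0800] "GET / HTTP/1.1" 403 4897\n',
    '\n',
    '10.154.0.3 - - [25/Dec/2020:14:45:02 +0800] "GET / HTTP/1.1" 200 4897\n',
    '10.154.0.2 - - [25/Dec/2020:14:45:10 +0800] "GET / HTTP/1.1" 200 4897\n',
]

pv, uv = pv_uv(demo_lines)
print('PV is {0}'.format(pv))
print('UV is {0}'.format(uv))
```

Because an open file is itself an iterable of lines, the same function should work with `with open('/etc/httpd/logs/access_log') as f: pv_uv(f)`.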

Measuring resource popularity

Use the Counter class for counting; Counter is a subclass of dict.

from collections import Counter
c = Counter('abcba')
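# Counts how many times each letter occurs
print(c)
# Increment the count for the letter a
c['a'] += 1
# Add a letter that was not present before; a missing key starts from 0
c['d'] += 1
print(c)
# most_common(2) returns the two most frequent letters with their counts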
print(c.most_common(2))
'''
out:
Counter({'a': 2, 'b': 2, 'c': 1})
Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
[('a', 3), ('b', 2)]
'''

Finding the ten most popular resources on the site

from collections import Counter

c = Counter()
with open('/etc/httpd/logs/access_log') as f:
    for line in f:
        c[line.split()[6]] += 1
    # most_common(10) returns the ten most requested resources, sorted by hit count
    print('Popular resources : {0}'.format(c.most_common(10)))
'''
out:
$ python test.py
Popular resources : [('/', 10), ('/noindex/css/bootstrap.min.css', 11)]
'''
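It is easy to mix up "the top N resources" with "resources requested at least N times". The short sketch below shows both on made-up counts (the paths and numbers are invented for illustration): most_common(n) limits how many entries come back, while a threshold needs an explicit filter.

```python
from collections import Counter

# Made-up hit counts, for illustration only
hits = Counter({'/': 12, '/index.html': 10, '/favicon.ico': 3, '/about': 1})

# The two most requested resources, highest count first
top_two = hits.most_common(2)

# Every resource requested at least 3 times; most_common() with no
# argument returns all entries sorted by count
at_least_three = [(res, n) for res, n in hits.most_common() if n >= 3]

print(top_two)
print(at_least_three)
```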

Calculating the site's error rate

Example

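d = {}
sum_requests = 0
error_requests = 0
with open('/etc/httpd/logs/access_log') as f:
    for line in f:
        # The status code is the ninth whitespace-separated field
        key = line.split()[8]
        # Start the count at 0 the first time a status code is seen
        d.setdefault(key, 0)
        # Add 1 for every request with this status code
        d[key] += 1
# At this point d is {'403': 2, '200': 4, '404': 5}
# Note: Python 3 replaced iteritems() with items()
for key, val in d.items():
    if int(key) >= 400:
        # Add up the counts for status codes of 400 and above
        error_requests += val
    # Add up the counts for all status codes
    sum_requests += val
# Now error_requests is 7 and sum_requests is 11
# Compute the error rate as a percentage with two decimal places
print('error rate: {0:.2f}%'.format(error_requests * 100.0 / sum_requests))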

'''
out:
$ python test.py
error rate: 63.64%
'''
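Since Counter was introduced earlier, the same calculation can be sketched with it instead of setdefault. The function name error_rate is my own, and the demo lines are synthetic: they are generated from the status counts mentioned in the comments above ({'403': 2, '200': 4, '404': 5}) rather than read from the real log. The sketch also returns 0.0 for an empty log to avoid dividing by zero.

```python
from collections import Counter


def error_rate(lines):
    """Return the percentage of requests with status >= 400 (hypothetical helper)."""
    statuses = Counter()
    for line in lines:
        fields = line.split()
        # Ignore lines that are too short or have a non-numeric status field
        if len(fields) > 8 and fields[8].isdigit():
            statuses[fields[8]] += 1
    total = sum(statuses.values())
    if total == 0:
        return 0.0
    errors = sum(n for code, n in statuses.items() if int(code) >= 400)
    return errors * 100.0 / total


# Synthetic lines built from the status counts quoted in the example above
template = '10.154.0.2 - - [25/Dec/2020:14:44:51 +0800] "GET / HTTP/1.1" {0} 4897'
demo_lines = ([template.format(403)] * 2 +
              [template.format(200)] * 4 +
              [template.format(404)] * 5)

rate = error_rate(demo_lines)
print('error rate: {0:.2f}%'.format(rate))
```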

Sources: Chapter 4 of 《python linux系统管理与自动化运维》; "Running Linux commands with Python"; "Usage of items() vs. iteritems()"

posted @ 2020-12-25 16:57 努力吧阿团
