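4.1 String Constants (Python)

Study notes on the book 《Python Linux系统管理与自动化运维》 (Python Linux System Administration and Automated Operations):

1. Introduction to strings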

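Strings are defined with single quotes or double quotes.
Escape character: the backslash '\'.
Raw strings: the 'r' prefix suppresses escaping.
For long or complex strings, use triple quotes, ''' ''' or """ """. Inside triple quotes, quote characters, newlines, and tabs can be typed literally and are kept as ordinary characters. A multi-line string is also exempt from the block indentation rules, because its content is not code but plain string data.
Two adjacent string literals are automatically joined into one new string: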

In [1]: s = 'hello' 'world'
In [3]: s
Out[3]: 'helloworld'
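
The quoting styles described above can be compared side by side. This is a minimal sketch; the variable names and sample values are made up for illustration.

```python
# All names and values below are illustrative, not from the book.
single = 'it is "quoted"'      # double quotes need no escaping inside single quotes
escaped = 'tab:\there'         # \t is interpreted as one tab character
raw = r'C:\temp\new'           # r prefix: backslashes are kept literally
multi = '''line one
line two'''                    # triple quotes may span several lines
joined = 'hello' 'world'       # adjacent literals are concatenated

print(len(escaped))            # the tab counts as a single character
print(len(raw))                # each backslash counts as a character
print(multi.splitlines())
print(joined)
```

Comparing len(escaped) with len(raw) makes the effect of the r prefix visible: in the raw string, \t and \n stay as two characters each.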

A string is immutable; it is an ordered sequence of characters.
Strings support access by index and slicing.
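
A short sketch of indexing, slicing, and immutability, using an illustrative string. Because a string cannot be modified in place, the usual approach is to build a new one.

```python
s = 'hello,world'

first = s[0]       # index from the left
last = s[-1]       # negative index counts from the right
head = s[:5]       # slice: characters 0 through 4
tail = s[6:]       # slice: from index 6 to the end

# Item assignment is not allowed on a string and raises TypeError.
error = None
try:
    s[0] = 'H'
except TypeError as exc:
    error = exc

# Build a new string instead of modifying the old one.
capitalized = 'H' + s[1:]
print(first, last, head, tail, capitalized)
```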

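Reversing a sequence (this also works on strings): s[::-1]
The built-in function reversed(seq)
reversed() returns an iterator, so its result has to be consumed with a loop (or with join):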

In [5]: s
Out[5]: 'hello,world'
In [8]: ''.join(reversed(s))
Out[8]: 'dlrow,olleh'
In [15]: for i in reversed(s):
   ....:     print(i)
   ....:     
d
l
r
o
w
,
o
l
l
e
h

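a.sort() sorts the list a in place, and it is only available on lists.
sorted(a) can sort strings, lists, and tuples; it returns a new sorted list and leaves the original untouched.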
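
The difference between the two can be sketched as follows; the sample values are illustrative.

```python
a = [3, 1, 2]
result = a.sort()                 # sorts in place and returns None

letters = sorted('hello')         # a string goes in, a list comes out
numbers = sorted((3, 1, 2))       # a tuple goes in, a list comes out

# To get a sorted string back, join the list of characters.
sorted_text = ''.join(sorted('hello'))
print(a, result, letters, numbers, sorted_text)
```

A common mistake is writing a = a.sort(), which rebinds a to None.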

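2. String functions

General operations
Get the length of a string: len(x)
Test whether an element is contained in a collection: 'x' in s
Both also work on other ordered collections such as tuples and lists.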

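Methods related to case:
upper converts the string to uppercase
lower converts the string to lowercase
isupper tests whether all cased characters in the string are uppercase
islower tests whether all cased characters in the string are lowercase
swapcase converts uppercase to lowercase and lowercase to uppercase
capitalize converts the first character to uppercase and the rest to lowercase
istitle tests whether the string is title-cased (every word starts with an uppercase letter followed by lowercase letters)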
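
A quick sketch of these methods on one sample string. The string is illustrative; title() is not in the list above and is included only to show what istitle checks for.

```python
s = 'hello World'

upper = s.upper()
lower = s.lower()
swapped = s.swapcase()
capitalized = s.capitalize()     # also lowercases the 'W'
titled = s.title()               # str.title(), added here for comparison

print(upper, lower, swapped, capitalized, titled)
print(s.istitle(), titled.istitle(), upper.isupper(), lower.islower())
```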

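Predicate methods
s.isalpha contains only letters, and is non-empty
s.isalnum contains only letters and digits, and is non-empty
s.isspace contains only whitespace (spaces, tabs, newlines), and is non-empty
s.isdecimal contains only decimal digits, and is non-empty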
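
The "non-empty" condition is easy to overlook, so the sketch below includes the empty string. It assumes Python 3, where isdecimal is available on ordinary str objects; the sample values are illustrative.

```python
checks = {
    'abc.isalpha': 'abc'.isalpha(),
    'abc123.isalpha': 'abc123'.isalpha(),
    'abc123.isalnum': 'abc123'.isalnum(),
    'whitespace.isspace': ' \t\n'.isspace(),
    '123.isdecimal': '123'.isdecimal(),
    '12.5.isdecimal': '12.5'.isdecimal(),   # the dot is not a digit
    'empty.isalpha': ''.isalpha(),          # empty strings always fail
    'empty.isspace': ''.isspace(),
}

for name in sorted(checks):
    print(name, checks[name])
```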

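String methods
Test whether the argument is a prefix or a suffix of the string:
startswith
endswith
Example (this assumes import os has already been run and that the current directory is /root, since the file sizes are looked up under /root):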

[item  for item in os.listdir('.') if item.startswith('index')]

In [28]: index = [item  for item in os.listdir('.') if item.startswith('index')]
In [29]: size = [os.path.getsize(os.path.join('/root', item)) for item in index]
In [30]: print(size)
[20810, 20810, 2381, 20810, 20810, 20810, 20810, 2381, 20810]
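
The same filtering can be tried without touching the file system. In this sketch the file names are hypothetical; it also shows that startswith and endswith accept a tuple of candidates.

```python
# Hypothetical file names, standing in for the result of os.listdir('.').
names = ['index.html', 'index.php', 'readme.txt', 'logo.png', 'photo.jpg']

index_files = [n for n in names if n.startswith('index')]

# Passing a tuple tests several suffixes in one call.
images = [n for n in names if n.endswith(('.png', '.jpg'))]

print(index_files)
print(images)
```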

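Search functions
find returns the position of a substring within the string; returns -1 if the search fails
index is like find, but raises a ValueError exception if the search fails
rfind is like find, except that it searches from the end
rindex is like index, except that it searches from the end
Example: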

In [31]: s = 'Return the lower index in S where substring sub is found'
In [32]: s.find('in')
Out[32]: 17
A search range can be given, for example starting from index 18:
In [33]: s.find('in', 18)
Out[33]: 23
In [34]: s.find('not exist')
Out[34]: -1

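To test whether one string is a substring of another, the correct tools are in and not in; find and index are for locating the position.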
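
The behaviours of find, rfind, index, and in can be contrasted on a single value. The path below is a made-up example.

```python
path = '/var/log/httpd/access.log'   # illustrative value

first_slash = path.find('/')         # leftmost match
last_slash = path.rfind('/')         # rightmost match
basename = path[last_slash + 1:]     # everything after the last slash

missing = path.find('nginx')         # find signals failure with -1

# index signals failure with an exception instead.
index_failed = False
try:
    path.index('nginx')
except ValueError:
    index_failed = True

# For a plain yes/no answer, use in.
has_httpd = 'httpd' in path
print(first_slash, last_slash, basename, missing, index_failed, has_httpd)
```

Note that -1 is a valid index in Python, so using the result of find as a subscript without checking it first can silently give wrong answers.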

String manipulation methods
join accepts any iterable of strings, not just lists
Example:

In [38]: with open('/etc/passwd') as fd:
   ....:     print('###'.join(fd))
   ....:     
root:x:0:0:root:/root:/bin/bash
###bin:x:1:1:bin:/bin:/sbin/nologin
###daemon:x:2:2:daemon:/sbin:/sbin/nologin
###adm:x:3:4:adm:/var/adm:/sbin/nologin

String concatenation:

>>> print('root', '/root', 100, sep=':')
root:/root:100
# Works in Python 3 (in Python 2, only after from __future__ import print_function)
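
Unlike print with sep, join does not convert its items to strings. A minimal sketch with illustrative values:

```python
fields = ['root', '/root', 100]

# join only accepts strings, so the integer 100 causes a TypeError.
join_failed = False
try:
    ':'.join(fields)
except TypeError:
    join_failed = True

# Converting each item first makes it work; a generator is a valid iterable.
line = ':'.join(str(f) for f in fields)

# Any iterable of strings is accepted, for example a tuple or a dict (its keys).
from_tuple = ','.join(('a', 'b', 'c'))
from_dict = '|'.join({'x': 1})

print(join_failed, line, from_tuple, from_dict)
```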

The splitting function split() splits on whitespace (spaces, newlines, tabs) by default.
Trimming functions: strip(), rstrip(), lstrip()
Example:

In [4]: s = 'root:x:0:0:root:/root:/bin/bash'
In [5]: s.split(':')
Out[5]: ['root', 'x', '0', '0', 'root', '/root', '/bin/bash']
In [7]: s = 'a b c d'
In [8]: s.split()
Out[8]: ['a', 'b', 'c', 'd']

In [9]: s = ' \thello, \tworld \n'
In [12]: s.strip()
Out[12]: 'hello, \tworld'
In [13]: s.rstrip()
Out[13]: ' \thello, \tworld'
In [14]: s.lstrip()
Out[14]: 'hello, \tworld \n'

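strip can take an argument: the set of characters to trim. The order of the characters in the argument does not matter, and repeating a character has no additional effect.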

In [15]: s = '##hello, world##'
In [16]: s.strip('#')
Out[16]: 'hello, world'
In [17]: s.strip('###')
Out[17]: 'hello, world'
In [18]: s.strip('h#d')
Out[18]: 'ello, worl'
In [19]: s.strip('dh#')
Out[19]: 'ello, worl'
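
Two details of split and strip are easy to get wrong; the sketch below illustrates them with made-up values. split() with no argument collapses runs of whitespace, while split(' ') does not, and strip treats its argument as a set of characters rather than as a prefix or suffix.

```python
spaced = 'a  b'                              # two spaces between a and b

default_split = spaced.split()               # runs of whitespace collapse
explicit_split = spaced.split(' ')           # an empty string appears between the spaces

# The optional second argument limits the number of splits.
key, value = 'PATH=/bin:/usr/bin=x'.split('=', 1)

# strip removes any of the given characters from both ends,
# so it is not a way to remove a fixed prefix or suffix.
stripped = 'www.example.com'.strip('w.com')

print(default_split, explicit_split, key, value, stripped)
```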

3. Examples
Analyzing an Apache access log with Python
(1) Counting PV (page views) and UV (unique visitors)

#!/usr/bin/python
#-*- coding: UTF-8 -*-
from __future__ import print_function

ips = []
with open('access.log') as f:
    for line in f:
        ips.append(line.split()[0])

print('PV is {0}'.format(len(ips)))
print('UV is {0}'.format(len(set(ips))))
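
The same counting logic can be tried without a log file. The sketch below runs on a few hypothetical lines in Apache's common log format (invented for illustration, not real data) and skips blank lines, which would otherwise make line.split()[0] raise IndexError.

```python
# Hypothetical sample lines; a real run would read access.log instead.
sample_log = """\
10.0.0.1 - - [08/Nov/2017:15:30:00 +0800] "GET /index.html HTTP/1.1" 200 1024
10.0.0.2 - - [08/Nov/2017:15:30:01 +0800] "GET /about.html HTTP/1.1" 200 512

10.0.0.1 - - [08/Nov/2017:15:30:02 +0800] "GET /missing HTTP/1.1" 404 0
"""

ips = []
for line in sample_log.splitlines():
    fields = line.split()
    if not fields:           # skip blank lines
        continue
    ips.append(fields[0])    # the first field is the client address

pv = len(ips)                # one page view per request line
uv = len(set(ips))           # distinct client addresses
print('PV is {0}, UV is {1}'.format(pv, uv))
```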

(2) Finding the most popular resources
Use collections.Counter. It is used much like a dictionary, and for plain counting it is more convenient than a dict.

In [26]: from collections import Counter

In [27]: c = Counter('abcba')
In [28]: c
Out[28]: Counter({'a': 2, 'b': 2, 'c': 1})
In [29]: c['a'] += 1
In [30]: c
Out[30]: Counter({'a': 3, 'b': 2, 'c': 1})
In [31]: c['a'] += 1
In [32]: c
Out[32]: Counter({'a': 4, 'b': 2, 'c': 1})
In [33]: c
Out[33]: Counter({'a': 4, 'b': 2, 'c': 1})
In [34]: c.most_common(2)
Out[34]: [('a', 4), ('b', 2)]
In [35]: c['d'] += 1
In [36]: c
Out[36]: Counter({'a': 4, 'b': 2, 'c': 1, 'd': 1})
In [37]: c.most_common(3)
Out[37]: [('a', 4), ('b', 2), ('c', 1)]

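If a key does not exist in the counter, operating on it directly does not raise an error; the key is simply added.
most_common returns the elements with the highest counts in the Counter.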
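
The missing-key behaviour is the main practical difference from a plain dict. A minimal sketch with illustrative keys:

```python
from collections import Counter

c = Counter()
c['x'] += 1                  # a missing key is treated as 0, then incremented

d = {}
dict_failed = False
try:
    d['x'] += 1              # a plain dict raises KeyError here
except KeyError:
    dict_failed = True

# Reading a missing key returns 0 without adding the key.
lookup = c['missing']
still_absent = 'missing' not in c

print(c, dict_failed, lookup, still_absent)
```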

#!/usr/bin/python
#-*- coding: UTF-8 -*-
from __future__ import print_function
from collections import Counter

c = Counter()
with open('access.log') as f:
    for line in f:
        c[line.split()[6]] += 1
 
print('Popular resources : {0}'.format(c.most_common(10)))

(3) Counting failed requests

#!/usr/bin/python
#-*- coding: UTF-8 -*-
from __future__ import print_function

d = {}
with open('access.log') as f:
    for line in f:
        key = line.split()[8]
        d.setdefault(key, 0)
        d[key] += 1
sum_requests = 0
error_requests = 0

for key, val in d.items():  # items() works in Python 2 and 3; iteritems() exists only in Python 2
    if int(key) >= 400:
        error_requests += val
    sum_requests += val
 
print(error_requests, sum_requests)
print('error rate : {0:.2f}%'.format(error_requests * 100.0 / sum_requests))
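
As an alternative to setdefault, the status codes can be tallied with the Counter introduced earlier. This is only a sketch on hypothetical log lines (invented for illustration); it also skips lines that have no numeric status field, which the script above would fail on.

```python
from collections import Counter

# Hypothetical sample lines; the last one is deliberately malformed.
sample_log = [
    '10.0.0.1 - - [08/Nov/2017:15:30:00 +0800] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [08/Nov/2017:15:30:01 +0800] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [08/Nov/2017:15:30:02 +0800] "GET /missing HTTP/1.1" 404 0',
    '10.0.0.3 - - [08/Nov/2017:15:30:03 +0800] "GET /crash HTTP/1.1" 500 0',
    'garbage line',
]

statuses = Counter()
for line in sample_log:
    fields = line.split()
    # Field 8 is the status code; ignore lines where it is absent or not numeric.
    if len(fields) > 8 and fields[8].isdigit():
        statuses[fields[8]] += 1

total = sum(statuses.values())
errors = sum(n for code, n in statuses.items() if int(code) >= 400)
rate = errors * 100.0 / total if total else 0.0   # avoid dividing by zero on an empty log

print('error rate : {0:.2f}%'.format(rate))
```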

4. String formatting with format
(1) Access by placeholder or by index

In [6]: '{} is apple'.format('apple')
Out[6]: 'apple is apple'

In [7]: '{0} is apple'.format('apple')
Out[7]: 'apple is apple'

(2) Access by keyword argument

In [2]: dic1 = {'a':1, 'b':2, 'c':3}
In [5]: '{a} is 1, {b} is 2, {c} is 3, {a} little {c}'.format(**dic1)
Out[5]: '1 is 1, 2 is 2, 3 is 3, 1 little 3'

(3) Attributes of an object can be accessed directly
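
The notes give no example for this item, so here is a minimal sketch. The User type and its field values are hypothetical; a namedtuple is used only because it is a short way to get an object with attributes.

```python
from collections import namedtuple

# Hypothetical record type, for illustration only.
User = namedtuple('User', ['name', 'home'])
u = User('root', '/root')

# Attribute access on a positional argument.
by_index = '{0.name} has home {0.home}'.format(u)

# Attribute access on a keyword argument.
by_keyword = '{user.name}:{user.home}'.format(user=u)

# Item access with [] works in a similar way.
by_item = '{0[0]} and {0[1]}'.format(['a', 'b'])

print(by_index)
print(by_keyword)
print(by_item)
```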

(4) Features of format

Precision:
In [8]: '{:.2f}'.format(3.1415926)
Out[8]: '3.14'
Showing the sign on positive numbers:
In [9]: '{:+.2f}'.format(3.1415926)
Out[9]: '+3.14'
Width:
In [10]: '{:10.2f}'.format(3.1415926)
Out[10]: '      3.14'
Alignment:
In [11]: '{:^10.2f}'.format(3.1415926)
Out[11]: '   3.14   '
Fill character:
In [12]: '{:_^10.2f}'.format(3.1415926)
Out[12]: '___3.14___'
Thousands separator:
In [13]: '{:,}'.format(31415926)
Out[13]: '31,415,926'
All combined:
In [14]: '{:_^+20,.2f}'.format(31415926)
Out[14]: '___+31,415,926.00___'
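
These format specifiers are handy for lining up columns in a report. The sketch below uses hypothetical file names and sizes, plus the % presentation type, which is not covered above but belongs to the same format mini-language.

```python
# Hypothetical (name, size) rows, for illustration only.
rows = [('index.html', 20810), ('index.php', 2381)]

lines = []
for name, size in rows:
    # < left-aligns the name in 12 columns; > right-aligns the size in 10,
    # and the comma adds a thousands separator.
    lines.append('{:<12}{:>10,}'.format(name, size))

# The % type multiplies by 100 and appends a percent sign.
share = '{:.1%}'.format(0.256)

for line in lines:
    print(line)
print(share)
```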

posted @ 2017-11-08 15:30 男孩别哭
