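Python Strings

1.1 Strings

The string type is str: an ordered sequence of characters whose indexes start at 0.

1.2 Ways to write strings

1.2.1 Ordinary strings

A sequence of characters enclosed in single or double quotes.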

a = "hello"
b = 'hello'

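1.2.2 Raw strings

Prefix an ordinary string literal with r and the backslashes in it are taken literally instead of starting an escape sequence.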

>>> a = "c:\abc\fbc\nbc"
>>> print(a)
c:bc
    bc
bc
>>> b = r"c:\abc\fbc\nbc"
>>> print(b)
c:\abc\fbc\nbc

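As the output shows, \a, \f and \n have special meanings and were interpreted as escape sequences in the ordinary string (how \a and \f appear on screen depends on the terminal), whereas the raw string keeps them unchanged.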
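Comparing lengths is a quick way to see the difference. The following is a minimal sketch that reuses the path from the example above; the variable names are only illustrative.

```python
# The same literal with and without the r prefix.
normal = "c:\abc\fbc\nbc"   # \a, \f and \n each collapse into one control character
raw = r"c:\abc\fbc\nbc"     # every backslash is kept literally

print(len(normal))  # 11
print(len(raw))     # 14

# Doubling each backslash in an ordinary literal gives the same text as the raw string.
same_text = raw == "c:\\abc\\fbc\\nbc"
print(same_text)    # True
```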

1.2.3 Long (multi-line) strings

Written with triple quotes; line breaks inside the quotes become part of the string.

s = """hello
world """

y = '''hello
world '''

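1.2.4 Unicode strings

Written like an ordinary string with a u prefix. In Python 3 every str is already Unicode, so the prefix is optional and is accepted mainly for compatibility with Python 2 code.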

>>> a = u'Hello\nworld!'
>>> print(a)
Hello
world!

>>> a = u"你好\u4e2d\u56fd"
>>> print(a)
你好中国

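1.2.5 Escape sequences

Sequence	Meaning
\t	Horizontal tab (the Tab key)
\n	Line feed (newline)
\r	Carriage return
\"	Double quote
\'	Single quote
\\	Backslash
\a	Bell
\b	Backspace
\f	Form feed
\v	Vertical tab
\ooo	Character with the octal code ooo (up to three digits), e.g. print("\101") prints 'A'
\xhh	Character with the two-digit hexadecimal code hh, e.g. print("\x41") prints 'A'
\uxxxx	Character with the four-digit hexadecimal Unicode code point xxxx, e.g. print("\u0041") prints 'A'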
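The three numeric escape forms in the table are different spellings of the same character. The short sketch below checks this for 'A' and shows how ord(), oct() and hex() give the numbers used in the escapes; the variable names are made up for the example.

```python
# Octal, hexadecimal and Unicode escapes can all name the same character.
assert "\101" == "\x41" == "\u0041" == "A"

# ord() returns the code point; oct() and hex() show it in the bases used by the escapes.
code = ord("A")          # 65
octal_text = oct(code)   # '0o101'
hex_text = hex(code)     # '0x41'
print(code, octal_text, hex_text)

# len() counts an escaped character once, not the characters used to write it.
lengths = (len("\t"), len("\\"), len("\u4e2d"))
print(lengths)           # (1, 1, 1)
```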

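1.3 String formatting (part 1)

The basic idea is to insert values into a string.

"text ... %s ... %d" % (value1, value2)

Here %s and %d are conversion specifiers; the values on the right are inserted into the string in order.

Specifier	Description
%c	A single character (accepts an integer code point or a one-character string)
%s	A string (the value is converted with str())
%d	A decimal integer
%u	Obsolete; identical to %d
%o	An integer in octal
%x	An integer in hexadecimal (lowercase)
%X	An integer in hexadecimal (uppercase)
%f	A floating-point number; the precision after the decimal point can be specified
%e	A floating-point number in scientific notation
%E	Same as %e, but with an uppercase E
%g	Uses %f or %e, chosen automatically from the magnitude of the value
%G	Uses %f or %E, chosen automatically from the magnitude of the value
%p	A C printf conversion for addresses; Python's % operator does not support it and raises ValueError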

char_name = 65
age = 10
score = 153

s = "Name: %c, age: %d, score: %E." % (char_name, age, score)
print(s)  # Name: A, age: 10, score: 1.530000E+02.

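Conversion flags and modifiers:

Symbol	Effect
-	Left-align within the field (right-aligned by default)
+	Show a plus sign ( + ) in front of positive numbers
(space)	Leave a blank in front of positive numbers
#	Prefix octal numbers with '0o' and hexadecimal numbers with '0x' or '0X' (depending on whether 'x' or 'X' is used)
0	Pad numbers with '0' instead of the default spaces
%	'%%' outputs a single literal '%'
(var)	Mapping key (the right-hand side is a dictionary)
m.n	m is the minimum total width; n is the number of digits after the decimal point (where applicable)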

>>> a = "%+010.2f"%(98.7)
>>> a
'+000098.70'

>>> print("% 5d"%(98))
   98
>>> print("%#010x"%(98))
0x00000062

>>> a = {'age':10, 'name':'张三'}
>>> print("Name: %(name)-5s, age: %(age)3d" % (a))
Name: 张三   , age:  10

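1.3 String formatting (part 2)

Again the idea is to insert values into a string, this time with str.format().

"{}...{}".format(value1, value2, ...)

Each {} is a placeholder. It may carry an index such as {0} or {1} that refers to the corresponding argument of format().

1.3.1 Inserting in order: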

>>> "{},{},{}".format(1,2,3)
'1,2,3'

1.3.2 Inserting in a specified order:

>>> "{2},{1},{0}".format(1,2,3)
'3,2,1'

1.3.3 Inserting by keyword argument name:

>>> info = "{title}\n{url}".format(url='https://www.cnblogs.com/three-sheep', title='三只小羊')
>>> print(info)
三只小羊
https://www.cnblogs.com/three-sheep

1.3.4 Inserting from a dictionary:

Similar to the previous method; ** unpacks the dictionary into keyword arguments.

>>> p = {'title':'三只小羊', 'url':'https://www.cnblogs.com/three-sheep'}
>>> print("{title}\n{url}".format(**p))
三只小羊
https://www.cnblogs.com/three-sheep

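1.3.5 Inserting from a list:

>>> p_list = ['三只小羊', 'https://www.cnblogs.com/three-sheep']
>>> print("{0[0]}\n{0[1]}".format(p_list))
三只小羊
https://www.cnblogs.com/three-sheep

The 0 before the brackets refers to the first argument of format() (p_list); 1 would refer to a second argument, and so on. The number inside the brackets is the index within that list.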

1.3.6 Inserting from an object:

class Info:
    def __init__(self) -> None:
        self.title = '三只小羊'
        self.url = 'https://www.cnblogs.com/three-sheep'


myweb = Info()
print("{0.title}\n{0.url}".format(myweb))

Here 0 refers to the first argument of format() (myweb); 1 would refer to a second argument, and so on.

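1.3.7 Formatting numbers

{:format_spec}

The symbols d, o, x, X, #, e, E, f, 0, + and m.n have the same meaning as in "String formatting (part 1)".

b, d, o and x output the number in binary, decimal, octal and hexadecimal respectively.

Format spec (N is the field width)	Description
{:^N}	Center
{:<N}	Left-align
{:>N}	Right-align
{:+}	Show the sign of positive as well as negative numbers
{:.2f}	Keep 2 decimal places
{:0>5d}	Right-align, pad with zeros on the left, width 5
{:x<N}	Left-align, pad with x on the right (x can be any other character)
{:,}	Insert a comma every three digits, e.g. "{:,}".format(123456)
{:.2%}	Percentage with 2 decimal places
{:e}	Scientific notation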
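One way to try out the options described in the table is to apply each of them to a sample value. This is only a sketch: the sample values, the width of 10 and the dictionary keys are arbitrary choices for illustration.

```python
value = 3.14159
count = 42

examples = {
    "centered": "{:^10d}".format(count),      # '    42    '
    "left": "{:<10d}".format(count),          # '42        '
    "right": "{:>10d}".format(count),         # '        42'
    "signed": "{:+d}".format(count),          # '+42'
    "two_decimals": "{:.2f}".format(value),   # '3.14'
    "zero_padded": "{:0>5d}".format(count),   # '00042'
    "x_padded": "{:x<5d}".format(count),      # '42xxx'
    "thousands": "{:,}".format(123456),       # '123,456'
    "percent": "{:.2%}".format(0.256),        # '25.60%'
    "scientific": "{:.2e}".format(123456),    # '1.23e+05'
    "binary": "{:b}".format(count),           # '101010'
}

# Print each result with repr() so the padding is visible.
for name, text in examples.items():
    print(f"{name:>12}: {text!r}")
```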

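1.4 String formatting (part 3)

f-strings build on the format() syntax with a few changes and support most of its features.

Variables, expressions and function calls can be placed directly inside {}.

# Variable v inside {}, centered in a field of width 10
v = "hello"
a = f"{v:^10}"
print(a)

# An expression inside {}
b = f"{1 * 2 * 3}"
print(b)


# A function call inside {}
def score_sum(chinese: float, math: float, english: float) -> float:
    return chinese + math + english

c = f"{score_sum(98.5,78,100):.2f}"
print(c)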
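To make the relationship with format() concrete, the sketch below builds the same text both ways and also shows a nested field (the width taken from a variable) and the !r conversion. The variable names are hypothetical and chosen only for this example.

```python
name = "hello"
width = 10

# An f-string and the equivalent str.format() call produce the same text.
from_fstring = f"{name:^{width}}"            # the width itself comes from a variable
from_format = "{:^{}}".format(name, width)
print(repr(from_fstring))                    # '  hello   '

# Expressions are evaluated in place; !r applies repr() before formatting.
total = f"{98.5 + 78 + 100:.2f}"
quoted = f"{name!r}"
print(total, quoted)                         # 276.50 'hello'
```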

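1.5 String operations

1.5.1 Conversion

# Convert to a string
>>> str(100)
'100'
>>> str([1,2,3])
'[1, 2, 3]'

# Convert a string to an integer or a float
>>> int("35")
35
>>> float("35.3")
35.3

# Convert a code point to a character
>>> chr(20013)
'中'

# Convert a character to its Unicode code point
>>> ord('中')
20013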

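1.5.2 Concatenation

>>> "a" + "b"
'ab'

1.5.3 Repetition

>>> "a"*5
'aaaaa'

1.5.4 Getting a character by index

The value inside [index] is the position. Positions run from 0 to length-1. Negative positions are also allowed: -1 is the last character, -2 the second to last.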

>>> a = "12345"
>>> a[0]
'1'
>>> a[-1]
'5'
>>> a[4]
'5'

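1.5.5 Slicing

[start:stop:step]

The last character taken is the one at stop-1; the stop position itself is not included.

The step defaults to 1.

By default slicing runs from left to right (forward direction):

# Take indexes 0 to 2
>>> a = "123456789"
>>> a[0:3]
'123'

# From index 2 to the end (stop omitted)
>>> a[2:]
'3456789'

# From the beginning up to and including index 2
>>> a[:3]
'123'

Indexes can be positive: 0 ... n-1

Indexes can be negative: -n ... -1

>>> a = "123456789"
>>> a[-5:-1]  # the stop position is excluded, so there is no 9
'5678'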
'5678'

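Chained slicing:

>>> a = "123456789"
>>> a[1:5][0:2]  # first take 2345, then take 23 from 2345
'23'

Slicing with a step:

>>> a = "123456789"
>>> a[::2]  # take one character, move two positions forward, take the next, and so on
'13579'

Slicing in reverse (right to left):

# Just give a negative step
>>> a = "123456789"
>>> a[-1:-5:-1]  # start at -1, stop at -5 (the stop position is again excluded)
'9876'

>>> a[::-1]  # reverse the string
'987654321'

>>> a[5:0:-1] # positive indexes work with a negative step too
'65432'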
'65432'
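Two details of slicing are easy to miss: out-of-range bounds are clamped rather than rejected, and a slice can be stored in a slice object for reuse. The sketch below illustrates both; is_palindrome is a hypothetical helper added only to show a common use of [::-1].

```python
text = "123456789"

# Slices never raise IndexError: out-of-range bounds are clamped.
tail = text[5:100]     # '6789'
empty = text[100:]     # ''
print(repr(tail), repr(empty))

# Indexing a single position out of range does raise IndexError.
try:
    text[100]
except IndexError as exc:
    print("IndexError:", exc)

# A slice object stores start/stop/step so the same cut can be reused.
last_four_reversed = slice(-1, -5, -1)
print(text[last_four_reversed])   # 9876


def is_palindrome(s: str) -> bool:
    """Return True if s reads the same in both directions."""
    return s == s[::-1]


print(is_palindrome("12321"), is_palindrome(text))   # True False
```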

1.5.6 Testing whether a string contains another string

# Membership operators
# x in y      True if x occurs in y
# x not in y  True if x does not occur in y

>>> names = "Jim, Lily, Kate, Join"
>>> "Jim" in names
True
>>> "Kevin" not in names
True
>>> "Lily" not in names
False

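1.6 String methods

Capitalize the first letter (and lowercase the rest): capitalize

>>> "hello world".capitalize()
'Hello world'

Convert to lowercase for caseless comparison; not limited to a-z and more aggressive than lower: casefold

>>> "HELLO".casefold()
'hello'

Center the string, padding both sides with the given character: center(width, fillchar=' ')

>>> "hello".center(30,'*')
'************hello*************'

Count the non-overlapping occurrences of a substring: count(sub[, start[, end]]) -> int

>>> a = "1234567811,2345"
>>> a.count("45")
2
>>> a.count("45",10)  # start searching at index 10
1

Encode with the given character encoding: encode(encoding='utf-8', errors='strict')

Encodes the string with the codec named by encoding. By default a failure raises UnicodeEncodeError (a subclass of ValueError), unless errors is set to 'ignore' or 'replace'.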

>>> a = "中国"
>>> a.encode(encoding='gbk')
b'\xd6\xd0\xb9\xfa'
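The effect of the errors argument is easiest to see with a codec that cannot represent the text. The following sketch encodes a mixed string to ASCII under the three modes mentioned above; the sample string is arbitrary.

```python
text = "中国abc"

# strict (the default) raises UnicodeEncodeError, a subclass of ValueError.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)

ignored = text.encode("ascii", errors="ignore")     # b'abc'
replaced = text.encode("ascii", errors="replace")   # b'??abc'
print(ignored, replaced)

# decode() reverses encode() when the same codec is used.
round_trip = text.encode("gbk").decode("gbk")
print(round_trip == text)   # True
```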

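Test whether the string ends with a suffix: S.endswith(suffix[, start[, end]]) -> bool

>>> s = "hello world"
>>> s.endswith("rld")
True

Replace tab characters with spaces: expandtabs(tabsize=8)

>>> a = "a\tb\tc\td\te"
>>> print(a)
a       b       c       d       e
>>> a.expandtabs(tabsize=2)
'a b c d e'

Find the position of the first occurrence of a substring: S.find(sub[, start[, end]]) -> int

The search can be limited to a range; returns -1 if the substring is not found.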

>>> a = "abcdefg"
>>> a.find("de")
3

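Format a string: string.format(...)

See the formatting sections above.

>>> print("{:*>10}".format("hello"))
*****hello

Format a string from a mapping: string.format_map(...)

Same as format, but the dictionary is passed directly instead of being unpacked with **.

>>> info = {"title":"三只小羊", "url":"https://www.cnblogs.com/three-sheep/"}
>>> "My blog is called {title}, and its address is {url}".format_map(info)
'My blog is called 三只小羊, and its address is https://www.cnblogs.com/three-sheep/'

Find the position of the first occurrence of a substring: S.index(sub[, start[, end]]) -> int

Same as find, except that it raises ValueError when the substring is not found.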

>>> "hello".index("e")
1
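The practical difference between find and index shows up when the substring is missing. A minimal sketch, with an arbitrary sample string:

```python
text = "abcdefg"

# find() signals "not found" with -1 and never raises.
position = text.find("xyz")
print(position)   # -1

# index() raises ValueError instead.
try:
    text.index("xyz")
except ValueError as exc:
    print("index() raised:", exc)

# The optional start argument restricts the search; "de" begins at index 3,
# so a search starting at index 4 does not find it.
after_start = text.find("de", 4)
print(after_start)   # -1

# Pitfall: -1 is truthy and 0 is falsy, so compare the result explicitly.
if text.find("abc") != -1:
    print("found at", text.find("abc"))   # found at 0
```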

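Test whether all characters are letters or digits: string.isalnum()

>>> "abc123".isalnum()
True

Test whether all characters are letters: string.isalpha()

>>> "abcADF?".isalpha()
False

Test whether all characters are ASCII: string.isascii()

>>> "a".isascii()
True
>>> "中".isascii()
False

Test whether all characters are decimal digits (full-width digits included): string.isdecimal()

>>> "12".isdecimal()
True
>>> "12.5".isdecimal()
False
>>> "１２".isdecimal()
True

Test whether all characters are digits (full-width digits included): string.isdigit()

>>> "12".isdigit()
True
>>> b"123423".isdigit()  # bytes objects have isdigit() but no isdecimal()
True

Test whether the string is a valid identifier: string.isidentifier()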

>>> "a".isidentifier()
True
>>> "3a".isidentifier()
False
>>> "if".isidentifier()
True
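As the last call shows, isidentifier() also returns True for reserved words such as "if", because it only checks the spelling rules. To rule out keywords, the standard keyword module can be combined with it. The helper name is_usable_name below is hypothetical.

```python
import keyword


def is_usable_name(name: str) -> bool:
    """Hypothetical helper: a valid identifier that is not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Non-ASCII letters are allowed in identifiers, so the last candidate passes too.
for candidate in ["a", "3a", "if", "my_var", "变量"]:
    print(candidate, is_usable_name(candidate))
```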

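Test whether the string is lowercase: string.islower()

>>> "aT".islower()
False
>>> "at".islower()
True
>>> "at中国".islower()  # all cased characters are lowercase and there is at least one
True
>>> "中".islower()
False

Test whether all characters are numeric (including Chinese and Roman numerals): string.isnumeric()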

>>> "123一伍五".isnumeric()
True
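isdecimal(), isdigit() and isnumeric() accept progressively larger sets of characters, which is easy to confuse. The sketch below runs all three on a handful of samples picked for illustration (full-width digits, a superscript, a Chinese numeral, a Roman numeral character).

```python
samples = ["12", "１２", "²", "一", "Ⅷ", "12.5", "-3"]

# For each sample: (isdecimal, isdigit, isnumeric)
results = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}
for s, flags in results.items():
    print(repr(s), flags)

# int() accepts full-width decimal digits as well as ASCII ones.
full_width_value = int("１２")
print(full_width_value)   # 12
```

None of the three methods accepts a sign or a decimal point, so "12.5" and "-3" fail all of them.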

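Test whether all characters are printable: isprintable()

>>> "asdf\t".isprintable()   # contains a tab
False
>>> "asdf sdf".isprintable()
True

Test whether the string consists only of whitespace (spaces, carriage returns, tabs, etc.): string.isspace()

>>> "\n  \t \v \f".isspace()
True

Test whether the string is titlecased (each word starts with an uppercase letter followed by lowercase letters): string.istitle()

>>> "Hello World!".istitle()
True

Test whether the string is uppercase: string.isupper()

>>> "HELLO".isupper()
True

Join the elements of an iterable (which must all be strings) into one string, using this string as the separator: string.join(iterable)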

>>> "*".join("hello")
'h*e*l*l*o'
>>> "*".join(["1","2","3","4","5"])
'1*2*3*4*5'
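Because join() only accepts strings, other element types have to be converted first. A short sketch with made-up data:

```python
numbers = [1, 2, 3]

# join() raises TypeError if any element is not a string.
try:
    ",".join(numbers)
except TypeError as exc:
    print("TypeError:", exc)

# Convert each element with str() before joining.
joined = ",".join(str(n) for n in numbers)
print(joined)   # 1,2,3

# Collect pieces in a list and join once at the end.
lines = [f"row {i}" for i in range(3)]
report = "\n".join(lines)
print(report)
```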

Left-align, padding on the right: string.ljust(width, fillchar=' ')

>>> "hello".ljust(10,"*")
'hello*****'

Convert to lowercase: string.lower()

>>> "ABCD".lower()
'abcd'

Remove leading whitespace (or any of the given characters): string.lstrip(chars=None)

>>> "     aaa   ".lstrip("\t\n *")
'aaa   '
>>> "**  aaa   ".lstrip("\t\n *")
'aaa   '

Split into three parts around a separator: string.partition(sep)

>>> a = "name = 张三"
>>> a.partition("=")
('name ', '=', ' 张三')
>>> a.partition(":")
('name = 张三', '', '')   # separator not found, so the last two items are empty
>>> "a=张三=李四".partition("=")   # splits at the first occurrence
('a', '=', '张三=李四')
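A typical use of partition() is reading "key = value" lines, since the separator element tells whether the split succeeded. The function parse_setting below is a hypothetical helper written for this illustration.

```python
def parse_setting(line: str) -> tuple:
    """Hypothetical helper: split 'key = value' at the first '='."""
    key, sep, value = line.partition("=")
    if not sep:
        # partition() returns an empty separator when '=' is missing.
        raise ValueError(f"no '=' in {line!r}")
    return key.strip(), value.strip()


print(parse_setting("name = 张三"))   # ('name', '张三')
print(parse_setting("url=a=b"))       # ('url', 'a=b')
```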

Remove a prefix (Python 3.9+): string.removeprefix(prefix)

>>> "hello world".removeprefix("hell")
'o world'

Remove a suffix (Python 3.9+): string.removesuffix(suffix)

>>> "hello world".removesuffix("rld")
'hello wo'
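These two methods are often confused with lstrip() and rstrip(), which treat their argument as a set of characters rather than as a whole prefix or suffix. The sketch below contrasts them (it needs Python 3.9 or later); the file names are invented for the example.

```python
filename = "test_tests.py"

# lstrip()/rstrip() strip any of the listed characters, not a literal prefix.
stripped_left = filename.lstrip("test_")          # '.py'
without_prefix = filename.removeprefix("test_")   # 'tests.py'

stripped_right = "happy.py".rstrip(".py")         # 'ha'
without_suffix = "happy.py".removesuffix(".py")   # 'happy'

# If the prefix is absent, removeprefix() returns the string unchanged.
unchanged = filename.removeprefix("spam_")
print(stripped_left, without_prefix, stripped_right, without_suffix, unchanged)
```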

Replace: string.replace(old, new, count=-1)

>>> "hello world".replace('o','*')  # replace every occurrence
'hell* w*rld'
>>> "hello world".replace('o','*',1)  # replace only the first occurrence
'hell* world'

Find the last occurrence (searching from the end); returns -1 if not found: S.rfind(sub[, start[, end]]) -> int

>>> "hello world".rfind("o")
7
>>> "hello world".find("o")  # find searches from the other direction
4

Find the last occurrence; raises ValueError if not found: S.rindex(sub[, start[, end]]) -> int

>>> "hello world".rindex("o")
7

Right-align, padding on the left: rjust(width, fillchar=' ')

>>> "hello".rjust(10,"*")
'*****hello'

Same as partition, but splits at the last occurrence: rpartition(sep)

>>> "a=张三=李四".rpartition("=")
('a=张三', '=', '李四')

Split the string, working from right to left: rsplit(sep=None, maxsplit=-1)

>>> "aa,bb,cc,dd".rsplit(",")
['aa', 'bb', 'cc', 'dd']
>>> "aa,bb,cc,dd".rsplit(",",2)  # at most 2 splits
['aa,bb', 'cc', 'dd']

Remove trailing whitespace (or any of the given characters): string.rstrip(chars=None)

>>> " aaaaa    ".rstrip()
' aaaaa'

Split the string, working from left to right: split(sep=None, maxsplit=-1)

>>> "aa,bb,cc".split(",")
['aa', 'bb', 'cc']
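split() behaves differently depending on whether a separator is given, and maxsplit counts from opposite ends in split and rsplit. A small sketch with arbitrary sample strings:

```python
text = "  aa  bb   cc "

# With no separator, runs of whitespace are one delimiter and no empty strings appear.
by_whitespace = text.split()
print(by_whitespace)       # ['aa', 'bb', 'cc']

# With an explicit separator, consecutive separators yield empty strings.
by_space = text.split(" ")
print(by_space)

empty_field = "aa,,cc".split(",")
print(empty_field)         # ['aa', '', 'cc']

# maxsplit counts from the left for split() and from the right for rsplit().
from_left = "aa,bb,cc,dd".split(",", 2)
from_right = "aa,bb,cc,dd".rsplit(",", 2)
print(from_left)           # ['aa', 'bb', 'cc,dd']
print(from_right)          # ['aa,bb', 'cc', 'dd']
```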

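Split at line boundaries: splitlines(keepends=False)

>>> a = """a
... b
... c
... d
... e"""
>>> a.splitlines()
['a', 'b', 'c', 'd', 'e']
>>> a.splitlines(True)  # keep the line endings (\n, \r\n, \r)
['a\n', 'b\n', 'c\n', 'd\n', 'e']

Test whether the string starts with a prefix: S.startswith(prefix[, start[, end]]) -> bool

>>> "abcdef".startswith("abc")
True

Remove leading and trailing whitespace (or any of the given characters): string.strip(chars=None)

>>> "   aa   ".strip()
'aa'

Swap upper and lower case: string.swapcase()

>>> "AbcDEfg".swapcase()
'aBCdeFG'

Capitalize the first letter of each word and lowercase the rest: string.title()

>>> "aaa,bbb".title()
'Aaa,Bbb'
>>> "aAA,bB".title()
'Aaa,Bb'

Translate characters: string.translate(table)

Each character is replaced according to the translation table.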

f = "1324576"
r = "iloveyu"

table = str.maketrans(f,r)  # build the translation table (a static method of str)

a = "1 3245 726"
print(a.translate(table))   # i love you
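str.maketrans also has two other forms that may be useful: a third argument listing characters to delete, and a single dictionary that can map one character to a longer string. The sketch below shows both; the sample strings and table names are illustrative only.

```python
# maketrans(x, y, z): characters in z are mapped to None, i.e. deleted.
table = str.maketrans("abc", "xyz", "-_")
cleaned = "a-b_c".translate(table)
print(cleaned)    # xyz

# A dict may map a single character to a longer replacement string.
escapes = str.maketrans({"<": "&lt;", ">": "&gt;", "&": "&amp;"})
escaped = "a < b && c > d".translate(escapes)
print(escaped)    # a &lt; b &amp;&amp; c &gt; d

# The table itself is a plain dict keyed by code point.
print(table[ord("a")] == ord("x"))   # True
```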

Convert to uppercase: string.upper()

>>> "abc".upper()
'ABC'

Pad with zeros on the left: string.zfill(width)

>>> "5".zfill(2)
'05'

Posted 2022-11-28 20:52 by 叁只小羊
