Learning Python Together: An Overview of Regular Expressions

Working with the re module

When you need to match strings against regular expressions in Python, use the standard library module named re.

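1. Basic workflow of the re module

    #coding=utf-8

    # Import the re module
    import re

    # Use re.match() to attempt a match; pattern (the regular expression)
    # and string (the text to match against) are placeholders
    result = re.match(pattern, string)

    # If the previous step found a match, use group() to extract the matched text
    result.group()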

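2. re module example (matching a string that starts with itcast)

    #coding=utf-8

    import re

    result = re.match("itcast","itcast.cn")

    # print() is needed to show the result when this runs as a script
    print(result.group())

Output:

itcast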

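3. Note

re.match() only matches at the beginning of the string: it succeeds when the string starts with text that fits the pattern, and returns None otherwise.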
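To make the "start of the string" rule concrete, here is a minimal sketch that compares re.match() with two other standard re functions, re.search() and re.fullmatch(). The sample text "www.itcast.cn" is an assumption chosen for illustration; it is not one of the article's examples.

```python
import re

text = "www.itcast.cn"

# re.match() only tries position 0, so "itcast" is not found here
at_start = re.match("itcast", text)

# re.search() scans forward and returns the first occurrence anywhere
anywhere = re.search("itcast", text)

# re.fullmatch() requires the pattern to cover the entire string
whole = re.fullmatch("itcast", text)

print(at_start)          # None
print(anywhere.group())  # itcast
print(anywhere.span())   # (4, 10)
print(whole)             # None
```

If the text may contain the pattern in the middle, re.search() is usually the function to reach for; re.match() suits checks of the form "does the string begin with ...".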

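Matching a single character

The previous section showed that the re module can match strings against regular expressions.

This section covers the patterns that match a single character.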

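Character	Function
.	Matches any single character except \n
[ ]	Matches any one of the characters listed inside the brackets
\d	Matches a digit, i.e. 0-9
\D	Matches a non-digit
\s	Matches whitespace, e.g. space, tab, newline
\S	Matches non-whitespace
\w	Matches a word character, i.e. a-z, A-Z, 0-9, _ (in Python 3 str patterns, also Unicode letters and digits unless re.ASCII is set)
\W	Matches a non-word character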
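The classes in the table can be tried one at a time. The following is a small sketch, with sample strings chosen only for illustration, that also shows how the re.ASCII flag narrows \w in Python 3.

```python
import re

# \d matches one digit; \D matches one character that is not a digit
digit = re.match(r"\d", "7 days")
non_digit = re.match(r"\D", "7 days")
print(digit.group(), non_digit)          # 7 None

# \s matches whitespace such as a space, a tab or a newline
tab = re.match(r"a\sb", "a\tb")
print(tab is not None)                   # True

# In Python 3, \w in a str pattern also accepts Unicode letters ...
unicode_word = re.match(r"\w", "嫦娥")
# ... unless the re.ASCII flag limits it to a-z, A-Z, 0-9 and _
ascii_word = re.match(r"\w", "嫦娥", re.ASCII)
print(unicode_word.group(), ascii_word)  # 嫦 None
```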

Example 1:

#coding=utf-8
import re

ret = re.match(".","M")
print(ret.group())

ret = re.match("t.o","too")
print(ret.group())

ret = re.match("t.o","two")
print(ret.group())

Output:

M
too
two

Example 2:

#coding=utf-8
import re

# If hello starts with a lowercase letter, the pattern needs a lowercase h
ret = re.match("h","hello Python")
print(ret.group())

# If Hello starts with an uppercase letter, the pattern needs an uppercase H
ret = re.match("H","Hello Python")
print(ret.group())

# Accept either case of h
ret = re.match("[hH]","hello Python")
print(ret.group())
ret = re.match("[hH]","Hello Python")
print(ret.group())
ret = re.match("[hH]ello Python","Hello Python")
print(ret.group())

# Matching 0 to 9, first form
ret = re.match("[0123456789]Hello Python","7Hello Python")
print(ret.group())

# Matching 0 to 9, second form
ret = re.match("[0-9]Hello Python","7Hello Python")
print(ret.group())

# Two ranges in one class: 0-3 and 5-9
ret = re.match("[0-35-9]Hello Python","7Hello Python")
print(ret.group())

# This pattern cannot match the digit 4, so ret is None
ret = re.match("[0-35-9]Hello Python","4Hello Python")
# print(ret.group())

Output:

h
H
h
H
Hello Python
7Hello Python
7Hello Python
7Hello Python

Example 3:

#coding=utf-8
import re

# Plain literal matching ("嫦娥1号发射成功" means "Chang'e 1 launched successfully")
ret = re.match("嫦娥1号","嫦娥1号发射成功")
print(ret.group())

ret = re.match("嫦娥2号","嫦娥2号发射成功")
print(ret.group())

ret = re.match("嫦娥3号","嫦娥3号发射成功")
print(ret.group())

# Matching with \d (a raw string keeps the backslash intact)
ret = re.match(r"嫦娥\d号","嫦娥1号发射成功")
print(ret.group())

ret = re.match(r"嫦娥\d号","嫦娥2号发射成功")
print(ret.group())

ret = re.match(r"嫦娥\d号","嫦娥3号发射成功")
print(ret.group())

Output:

嫦娥1号
嫦娥2号
嫦娥3号
嫦娥1号
嫦娥2号
嫦娥3号

Note

The remaining metacharacters are covered in the following sections.

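Matching multiple characters

Quantifiers for matching multiple characters

Character	Function
*	Matches the preceding character zero or more times, i.e. it is optional
+	Matches the preceding character one or more times, i.e. it must appear at least once
?	Matches the preceding character zero or one time, i.e. it appears once or not at all
{m}	Matches the preceding character exactly m times
{m,n}	Matches the preceding character from m to n times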
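As a quick way to compare the quantifiers side by side, the sketch below applies each one to the letter a and lists which sample strings it matches in full. The helper name full_matches and the sample list are assumptions made for this illustration.

```python
import re

samples = ["", "a", "aa", "aaa", "aaaa"]

def full_matches(pattern):
    """Return the samples that the pattern matches in their entirety."""
    return [s for s in samples if re.fullmatch(pattern, s)]

for pattern in [r"a*", r"a+", r"a?", r"a{2}", r"a{2,3}"]:
    print(pattern, full_matches(pattern))

# a* ['', 'a', 'aa', 'aaa', 'aaaa']
# a+ ['a', 'aa', 'aaa', 'aaaa']
# a? ['', 'a']
# a{2} ['aa']
# a{2,3} ['aa', 'aaa']
```

re.fullmatch() is used here so that each result reflects the quantifier alone; with re.match() every pattern would also accept longer strings by matching just a prefix.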

Example 1:

Requirement: match a string whose first character is an uppercase letter, followed by lowercase letters that may or may not be present

#coding=utf-8
import re

ret = re.match("[A-Z][a-z]*","M")
print(ret.group())

ret = re.match("[A-Z][a-z]*","MnnM")
print(ret.group())

ret = re.match("[A-Z][a-z]*","Aabcdef")
print(ret.group())

Output:

M
Mnn
Aabcdef

Example 2:

Requirement: check whether a variable name is valid

#coding=utf-8
import re

names = ["name1", "_name", "2_name", "__name__"]

for name in names:
    ret = re.match(r"[a-zA-Z_]+[\w]*", name)
    if ret:
        print("Variable name %s is valid" % ret.group())
    else:
        print("Variable name %s is invalid" % name)

Output:

Variable name name1 is valid
Variable name _name is valid
Variable name 2_name is invalid
Variable name __name__ is valid
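One caveat about the variable-name check: re.match() only looks at the start of the string, so a name such as "name-1" would still pass because its prefix "name" matches. A minimal sketch of a stricter check, assuming ASCII-only names and using re.fullmatch(), might look like this. The helper name is_valid_name is hypothetical, and the check ignores Python keywords.

```python
import re

def is_valid_name(name):
    """Hypothetical helper: True if name looks like an ASCII identifier."""
    # First character: a letter or underscore; the rest: word characters.
    # re.ASCII keeps \w limited to a-z, A-Z, 0-9 and _.
    return re.fullmatch(r"[a-zA-Z_]\w*", name, re.ASCII) is not None

for name in ["name1", "_name", "2_name", "__name__", "name-1"]:
    print(name, is_valid_name(name))

# re.match() alone accepts "name-1" because the prefix "name" matches
prefix_only = re.match(r"[a-zA-Z_]+[\w]*", "name-1")
print(prefix_only.group())  # name
```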

Example 3:

Requirement: match numbers from 0 to 99

#coding=utf-8
import re

ret = re.match(r"[1-9]?[0-9]","7")
print(ret.group())
ret = re.match(r"[1-9]?\d","33")
print(ret.group())
ret = re.match(r"[1-9]?\d","09")
print(ret.group())

Output:

7
33
0  # not the desired result; the $ anchor is needed to fix this

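Example 4:

Requirement: match a password of 8 to 20 characters made up of uppercase and lowercase English letters, digits, and underscores

#coding=utf-8
import re

# {6} matches exactly 6 characters
ret = re.match("[a-zA-Z0-9_]{6}","12a3g45678")
print(ret.group())

# {8,20} matches between 8 and 20 characters
ret = re.match("[a-zA-Z0-9_]{8,20}","1ad12f23s34455ff66")
print(ret.group())

Output:

12a3g4
1ad12f23s34455ff66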

Matching the beginning and end

Character	Function
^	Matches the beginning of the string
$	Matches the end of the string

Example 1:

Requirement: match 163.com email addresses

#coding=utf-8
import re

email_list = ["xiaoWang@163.com", "xiaoWang@163.comheihei", ".com.xiaowang@qq.com"]

for email in email_list:
    ret = re.match(r"[\w]{4,20}@163\.com", email)
    if ret:
        print("%s is a valid email address; matched text: %s" % (email, ret.group()))
    else:
        print("%s does not meet the requirements" % email)

Output:

xiaoWang@163.com is a valid email address; matched text: xiaoWang@163.com
xiaoWang@163.comheihei is a valid email address; matched text: xiaoWang@163.com
.com.xiaowang@qq.com does not meet the requirements

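Improved version

import re

email_list = ["xiaoWang@163.com", "xiaoWang@163.comheihei", ".com.xiaowang@qq.com"]

for email in email_list:
    # The trailing $ rejects anything after ".com"
    ret = re.match(r"[\w]{4,20}@163\.com$", email)
    if ret:
        print("%s is a valid email address; matched text: %s" % (email, ret.group()))
    else:
        print("%s does not meet the requirements" % email)

Output:

xiaoWang@163.com is a valid email address; matched text: xiaoWang@163.com
xiaoWang@163.comheihei does not meet the requirements
.com.xiaowang@qq.com does not meet the requirements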
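One detail of $ is easy to miss: in Python it also matches just before a newline at the very end of the string. The sketch below, which uses an address with a trailing newline as an assumed test input, compares $ with the stricter \Z anchor and with re.fullmatch().

```python
import re

address = "xiaoWang@163.com\n"  # note the trailing newline

# $ also matches just before a newline at the very end of the string
with_dollar = re.match(r"\w{4,20}@163\.com$", address)

# \Z matches only at the true end of the string
with_z = re.match(r"\w{4,20}@163\.com\Z", address)

# re.fullmatch() requires the whole string to match, newline included
with_fullmatch = re.fullmatch(r"\w{4,20}@163\.com", address)

print(with_dollar is not None)     # True
print(with_z is not None)          # False
print(with_fullmatch is not None)  # False
```

When input comes from a file or a form field, stripping whitespace first or using re.fullmatch() avoids accepting a value that ends with a stray newline.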

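Grouping

Character	Function
|	Matches either the expression on its left or the one on its right
(ab)	Treats the characters in parentheses as a group
\num	Refers back to the string matched by group num
(?P<name>)	Gives the group a name
(?P=name)	Refers back to the string matched by the group called name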

Example 1:

Requirement: match numbers from 0 to 100


#coding=utf-8

import re

ret = re.match(r"[1-9]?\d","8")
print(ret.group())  # 8

ret = re.match(r"[1-9]?\d","78")
print(ret.group())  # 78

# An incorrect result
ret = re.match(r"[1-9]?\d","08")
print(ret.group())  # 0

# After the fix
ret = re.match(r"[1-9]?\d$","08")
if ret:
    print(ret.group())
else:
    print("not between 0 and 100")

# Add | (alternation)
ret = re.match(r"[1-9]?\d$|100","8")
print(ret.group())  # 8

ret = re.match(r"[1-9]?\d$|100","78")
print(ret.group())  # 78

ret = re.match(r"[1-9]?\d$|100","08")
# print(ret.group())  # ret is None: not between 0 and 100

ret = re.match(r"[1-9]?\d$|100","100")
print(ret.group())  # 100
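Because | has the lowest precedence, the $ in the pattern above belongs only to the left branch, so the 100 branch is not anchored at the end. The following sketch compares that pattern with a grouped variant; the names loose and strict and the extra input "1000" are assumptions added for illustration.

```python
import re

loose = r"[1-9]?\d$|100"        # $ applies only to the left branch
strict = r"(?:[1-9]?\d|100)$"   # grouped, so $ applies to both branches

results = {}
for text in ["8", "78", "100", "1000", "08"]:
    results[text] = (bool(re.match(loose, text)), bool(re.match(strict, text)))
    print(text, results[text])

# 8 (True, True)
# 78 (True, True)
# 100 (True, True)
# 1000 (True, False)
# 08 (False, False)
```

The (?: ... ) form groups without creating a numbered group; plain parentheses would work as well if the captured value is wanted.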

Example 2:

Requirement: match 163, 126, and qq email addresses


#coding=utf-8

import re

ret = re.match(r"\w{4,20}@163\.com", "test@163.com")
print(ret.group())  # test@163.com

ret = re.match(r"\w{4,20}@(163|126|qq)\.com", "test@126.com")
print(ret.group())  # test@126.com

ret = re.match(r"\w{4,20}@(163|126|qq)\.com", "test@qq.com")
print(ret.group())  # test@qq.com

ret = re.match(r"\w{4,20}@(163|126|qq)\.com", "test@gmail.com")
if ret:
    print(ret.group())
else:
    print("not a 163, 126, or qq email address")  # this branch runs

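Mobile numbers (11 digits) that do not end in 4 or 7


import re

tels = ["13100001234", "18912344321", "10086", "18800007777"]

for tel in tels:
    # 1, then nine digits, then a final digit that is neither 4 nor 7
    ret = re.match(r"1\d{9}[0-35-68-9]", tel)
    if ret:
        print(ret.group())
    else:
        print("%s is not a wanted mobile number" % tel)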

Extracting the area code and the phone number


>>> import re
>>> ret = re.match(r"([^-]*)-(\d+)","010-12345678")
>>> ret.group()
'010-12345678'
>>> ret.group(1)
'010'
>>> ret.group(2)
'12345678'

Example 3:

Requirement: match <html>hh</html>


#coding=utf-8

import re

# This matches a well-formed string correctly
ret = re.match(r"<[a-zA-Z]*>\w*</[a-zA-Z]*>", "<html>hh</html>")
print(ret.group())

# With a malformed HTML string the pattern still matches, which is wrong
ret = re.match(r"<[a-zA-Z]*>\w*</[a-zA-Z]*>", "<html>hh</htmlbalabala>")
print(ret.group())

# The right way to think about it: whatever appears in the first pair of <>
# should appear again in the closing pair

# Refer back to the text captured by the group; note that the pattern
# must be a raw string, i.e. written as r""
ret = re.match(r"<([a-zA-Z]*)>\w*</\1>", "<html>hh</html>")
print(ret.group())

# The two pairs of <> contain different text, so nothing is matched
test_label = "<html>hh</htmlbalabala>"
ret = re.match(r"<([a-zA-Z]*)>\w*</\1>", test_label)
if ret:
    print(ret.group())
else:
    print("%s is not a correctly paired tag" % test_label)

Output:


<html>hh</html>
<html>hh</htmlbalabala>
<html>hh</html>
<html>hh</htmlbalabala> is not a correctly paired tag
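The reason the backreference pattern needs a raw string can be shown directly. In an ordinary string literal Python itself interprets "\1" as an octal escape before the regex engine ever sees it. The sketch below is an illustration under that observation; the variable names are chosen here and do not come from the article.

```python
import re

# Without the r prefix, Python turns "\1" into the control character chr(1)
plain_len = len("\1")
raw_len = len(r"\1")
print(plain_len, raw_len)            # 1 2

tag = "<html>hh</html>"

# Raw string: the regex engine receives \1 and treats it as a backreference
raw = re.match(r"<([a-zA-Z]*)>\w*</\1>", tag)

# Escaping each backslash by hand produces the same pattern
escaped = re.match("<([a-zA-Z]*)>\\w*</\\1>", tag)

print(raw.group(), raw.group(1))     # <html>hh</html> html
print(escaped.group() == raw.group())  # True
```

Writing every pattern as a raw string is a simple habit that avoids this class of problem, including the warnings newer Python versions emit for escapes such as "\d" in ordinary strings.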

Example 4:

Requirement: match <html><h1>www.itcast.cn</h1></html>


#coding=utf-8

import re

labels = ["<html><h1>www.itcast.cn</h1></html>", "<html><h1>www.itcast.cn</h2></html>"]

for label in labels:
    # \2 must repeat the inner tag name and \1 the outer tag name
    ret = re.match(r"<(\w*)><(\w*)>.*</\2></\1>", label)
    if ret:
        print("%s is a valid tag" % ret.group())
    else:
        print("%s does not meet the requirements" % label)

Output:


<html><h1>www.itcast.cn</h1></html> is a valid tag
<html><h1>www.itcast.cn</h2></html> does not meet the requirements

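Example 5:

Requirement: match <html><h1>www.itcast.cn</h1></html> (using named groups)


#coding=utf-8

import re

pattern = r"<(?P<name1>\w*)><(?P<name2>\w*)>.*</(?P=name2)></(?P=name1)>"

ret = re.match(pattern, "<html><h1>www.itcast.cn</h1></html>")
print(ret.group())

# The closing </h2> does not repeat name2, so re.match() returns None;
# check the result before calling group()
ret = re.match(pattern, "<html><h1>www.itcast.cn</h2></html>")
if ret:
    print(ret.group())
else:
    print("no match")

Note: the letter P in `(?P<name>)` and `(?P=name)` must be uppercase

Output:

<html><h1>www.itcast.cn</h1></html>
no match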
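Named groups can also be read back by name after a match. The sketch below reuses the pattern from this example and calls Match.group() with a name and Match.groupdict(), both of which are standard methods of match objects.

```python
import re

ret = re.match(
    r"<(?P<name1>\w*)><(?P<name2>\w*)>.*</(?P=name2)></(?P=name1)>",
    "<html><h1>www.itcast.cn</h1></html>",
)

# A named group can be read by its name or by its position
outer = ret.group("name1")
inner = ret.group(2)
print(outer, inner)       # html h1

# groupdict() returns all named groups as a dictionary
names = ret.groupdict()
print(names)              # {'name1': 'html', 'name2': 'h1'}
```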

Original source: Python开发者交流平台 (Python Developer Exchange Platform), https://www.jianshu.com/u/05f416aefbe1

posted @ 2018-02-02 10:50 前端视听
