Using this site as an example:
from bs4 import BeautifulSoup
import re
from urllib.request import urlopen

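html = urlopen("http://www.pythonscraping.com/pages/page3.html")
# Name the parser explicitly so the result does not depend on which parsers are installed
bsobj = BeautifulSoup(html, "html.parser")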
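1) Print only the product image tags, excluding the logo and any other hidden image files, by matching the src attribute against a regular expression
tag_list = bsobj.find_all("img", {"src": re.compile(r"\.\./img/gifts/img.*\.jpg")})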
print(tag_list)
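To see what the pattern accepts and rejects without fetching the page, here is a minimal sketch that runs the same regular expression over a handful of src strings. The strings in sample_srcs are made up for illustration and are not taken from the page; the sketch also assumes the pattern is applied with search(), which is how BeautifulSoup is understood to test attribute values against a compiled pattern.

```python
import re

# Hypothetical src values, modeled loosely on the paths used in the example above.
sample_srcs = [
    "../img/gifts/logo.jpg",   # logo: no "img" right after "gifts/"
    "../img/gifts/img1.jpg",
    "../img/gifts/img2.jpg",
    "../img/hidden/img3.jpg",  # different directory
    "../img/gifts/img4.png",   # wrong extension
]

# A raw string keeps the backslashes intact for the regex engine.
pattern = re.compile(r"\.\./img/gifts/img.*\.jpg")

# Keep only the paths in which the pattern is found.
matched = [src for src in sample_srcs if pattern.search(src)]
print(matched)
```

Only the two gifts/img*.jpg paths survive the filter, which is the same selection the find_all call makes on the real page.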
2) Print the text of the tag immediately before the parent tag of the img1 image
namelist = bsobj.find("img", {"src": "../img/gifts/img1.jpg"}).parent.previous_sibling.get_text()
print(namelist)
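The parent/previous_sibling chain can be tried on a small inline document. This is only a sketch: sample_html is a made-up one-row table, not the real page, and its cells are deliberately written with no whitespace between them so that previous_sibling lands on a tag rather than a text node.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table; adjacent <td> tags have no whitespace between them.
sample_html = (
    '<table id="giftList"><tr>'
    '<td>Sample gift</td><td>$15.00</td>'
    '<td><img src="../img/gifts/img1.jpg"></td>'
    '</tr></table>'
)
soup = BeautifulSoup(sample_html, "html.parser")

img = soup.find("img", {"src": "../img/gifts/img1.jpg"})
cell = img.parent                    # the <td> that wraps the image
price_cell = cell.previous_sibling   # the node directly before that <td>
price = price_cell.get_text()
print(price)

# find_previous_sibling("td") only considers <td> tags, so it still works
# when whitespace text nodes sit between the cells.
same_price = cell.find_previous_sibling("td").get_text()
print(same_price)
```

If the markup had line breaks between the cells, previous_sibling would return the whitespace string instead of the cell, which is why the find_previous_sibling variant is often the safer choice.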
3) Print the source of every row in the page's table except the header row
for sibling in bsobj.find("table", {"id": "giftList"}).tr.next_siblings:
    print(sibling)
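One detail worth knowing about next_siblings is that it yields every following node, including the whitespace strings between rows. The sketch below shows this on a made-up table (sample_html and the gift1/gift2 ids are illustrative, not copied from the page) and two ways to keep only the row tags.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical table with line breaks between rows, as hand-written HTML often has.
sample_html = """
<table id="giftList">
<tr><th>Item</th><th>Cost</th></tr>
<tr id="gift1"><td>Gift one</td><td>$1.00</td></tr>
<tr id="gift2"><td>Gift two</td><td>$2.00</td></tr>
</table>
"""
soup = BeautifulSoup(sample_html, "html.parser")
header_row = soup.find("table", {"id": "giftList"}).tr

# next_siblings yields text nodes (the newlines) as well as the <tr> tags.
all_siblings = list(header_row.next_siblings)

# Option 1: keep only Tag objects.
row_ids = [node["id"] for node in all_siblings if isinstance(node, Tag)]

# Option 2: let find_next_siblings do the filtering by tag name.
same_rows = [row["id"] for row in header_row.find_next_siblings("tr")]

print(len(all_siblings), row_ids, same_rows)
```

Either option skips the header row, because iteration starts from the siblings that follow the first tr rather than from the table's children.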