5.14 个人作业2

公告

Posted on 2021-05-14 17:01 ***Pepsi*** 阅读(25) 评论(0) 编辑收藏举报

源代码：

#unicoding=utf-8
import re
import urllib

def gethtml(url):
html=urllib.urlopen(url)
page=html.read()
return page
def img(page):
reg=r'src="(.+?\jpg)" alt'
imgre=re.compile(reg)
imglist=re.findall(imgre,page)
x=0
for imgurl in imglist:
urllib.urlretrieve(imgurl,'%s.jpg' % x)
x+=1
#page=gethtml("http://www.51tietu.net/tp/")
page=gethtml("http://mm.51tietu.net/qingchun/90/")
img(page)

这样执行的话，会出现IOError 大致意思时文件操作时，出现错误

在这里可以看到IOError后跟着你抓取到的jip文件的路径，但是这个路径不是整个url的路径，所以才会在urlretrieve调用imgurl的时候报错。

去网站查看整个URL

因此可以根据图中的url进行修改代码，想法：可以在urlretrieve中把url补完整，之后代码如下

#unicoding=utf-8
import re
import urllib

def gethtml(url):
html=urllib.urlopen(url)
page=html.read()
return page
def img(page):
reg=r'src="(.+?\jpg)" alt'
imgre=re.compile(reg)
imglist=re.findall(imgre,page)
x=0
for imgurl in imglist:
urllib.urlretrieve('http://mm.51tietu.net'+imgurl,'%s.jpg' % x)
x+=1
#page=gethtml("http://www.51tietu.net/tp/")
page=gethtml("http://mm.51tietu.net/qingchun/90/")
img(page)

会员力量，点亮园子希望

刷新页面返回顶部

我的语言系统被粉碎了

公告

5.14 个人作业2