Web Crawler Assignment
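(2) Use the get() function of the requests library to access one of the websites below 20 times. Print the returned status code and the content of the text attribute, and compute the length of the page content returned by the text attribute and by the content attribute. (Students pick one of the pages below according to their student ID; this task is required for a passing grade.)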
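    import requests

    for i in range(20):
        print("Visit number", i + 1)
        # A timeout keeps one stalled request from blocking the whole loop
        r = requests.get("https://cn.bing.com/", timeout=10)
        r.encoding = 'utf-8'
        print("Status code:", r.status_code)
        print(r.text)
        # text is a decoded str (length in characters); content is raw bytes (length in bytes)
        print("Length of text attribute:", len(r.text))
        print("Length of content attribute:", len(r.content))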
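The two lengths printed by the script usually differ, because text is a decoded string measured in characters while content is the raw response body measured in bytes. The following offline sketch illustrates the difference without any network access; the variable names and the sample string are illustrative stand-ins for r.text and r.content, not values taken from a real response.

```python
# Illustrative only: the sample string stands in for a page body, no request is made.
sample_text = "<h1>欢迎你的加入123</h1>"       # plays the role of r.text (str)
sample_content = sample_text.encode("utf-8")  # plays the role of r.content (bytes)

text_length = len(sample_text)        # counts characters
content_length = len(sample_content)  # counts bytes

print("Length of text:", text_length)
print("Length of content:", content_length)
```

Each of the six Chinese characters occupies three bytes in UTF-8, so the byte count is larger than the character count; for a page made up only of ASCII characters the two lengths would be equal.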

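(3) The following is a simple HTML page. Save it as a string, then complete the calculation requirements that follow. (Required for a grade of "good".)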
    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <title>菜鸟教程(runoob.com)</title>
    </head>
    <body>
    <h1>欢迎你的加入123</h1>
    <p>有你想不到的意外哦!</p>
    </body>
    <table border="1">
    <tr> <td>班级</td> <td>17信计</td> </tr>
    <tr> <td>学号</td> <td>20</td> </tr>
    </table>
    </html>
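The calculation requirements for this task are not reproduced in this post, so the sketch below covers only the first step, holding the page in a Python string, plus one hypothetical computation chosen for illustration: reading the table cells into a dictionary. The class name CellCollector and the choice of the standard-library html.parser module are assumptions of this sketch, not part of the assignment.

```python
from html.parser import HTMLParser

# The page from the task, stored as a plain Python string.
page = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>菜鸟教程(runoob.com)</title>
</head>
<body>
<h1>欢迎你的加入123</h1>
<p>有你想不到的意外哦!</p>
</body>
<table border="1">
<tr> <td>班级</td> <td>17信计</td> </tr>
<tr> <td>学号</td> <td>20</td> </tr>
</table>
</html>"""


class CellCollector(HTMLParser):
    """Hypothetical helper: collects the text of every <td> cell in order."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Only text that appears inside a <td> element is kept.
        if self.in_cell:
            self.cells[-1] += data


collector = CellCollector()
collector.feed(page)
cells = collector.cells

# Each table row holds a label cell followed by a value cell.
table = dict(zip(cells[0::2], cells[1::2]))
print(table)
```

Because the page is an ordinary str, built-in operations such as len(page) or page.count("<td>") also work on it directly.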

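(4) Crawl the content of the Chinese university ranking website, http://www.zuihaodaxue.com/zuihaodaxuepaiming2018.html
Requirements:
Crawl the university ranking for the year that matches the last digit of your student ID:
    last digit 1 or 2: year 2020
    last digit 3 or 4: year 2016
    last digit 5 or 6: year 2017
    last digit 7 or 8: year 2018
    last digit 9 or 0: year 2019
Save the crawled data as a CSV file.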
    import csv

    import requests
    from lxml import etree

    url = 'https://www.shanghairanking.cn/rankings/bcur/201911'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3823.400 QQBrowser/10.7.4307.400'
    }

    req = requests.get(url=url, headers=headers)
    req.encoding = 'utf-8'
    # print(req.text)
    html = etree.HTML(req.text)

    # University names are the link text inside the left-aligned table cells
    names = html.xpath("//td[@class='align-left']/a/text()")

    # utf-8-sig writes a byte order mark so that Excel detects the encoding
    with open(r'C:\Program Files\Python38\test.csv', 'w', newline='', encoding='utf-8-sig') as f:
        csv_write = csv.writer(f, dialect='excel')
        csv_write.writerow(['rank', 'name'])
        # Assumes the page lists universities in rank order, so position equals rank
        for r, name in enumerate(names, start=1):
            item = [r, name.strip()]
            print(item)
            csv_write.writerow(item)
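The CSV-writing step can be tried without network access or write permission on the Python install directory. The sketch below is a minimal illustration that writes to an in-memory buffer instead of a file; the function name write_rank_csv and the placeholder university names are hypothetical and are not real ranking data.

```python
import csv
import io


def write_rank_csv(names, fileobj):
    """Write a header plus (rank, name) rows; rank is the 1-based position in names."""
    writer = csv.writer(fileobj, dialect="excel")
    writer.writerow(["rank", "name"])
    for rank, name in enumerate(names, start=1):
        # Scraped text often carries stray whitespace, so it is stripped here.
        writer.writerow([rank, name.strip()])


# Placeholder names, not real ranking data.
sample_names = ["University A", " University B\n", "University C"]

buffer = io.StringIO(newline="")
write_rank_csv(sample_names, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, the same function could be called with a file opened via open(path, 'w', newline='', encoding='utf-8-sig'), where path is any location the current user is allowed to write to.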

