Web Crawler Assignment

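(2) Use the get() function of the requests library to visit one of the following websites 20 times. For each visit, print the response status and the contents of the text attribute, and compute the length of the page content returned by the text attribute and by the content attribute. (The page to visit depends on your student ID; this task is required for a passing grade.)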

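import requests

for i in range(20):
    print(f"Visit {i + 1}")
    r = requests.get("https://cn.bing.com/", timeout=10)
    r.encoding = 'utf-8'
    print("Status code:", r.status_code)
    print(r.text)
    # r.text is a decoded str and r.content is the raw bytes, so the two lengths can differ
    print("Length of text:", len(r.text))
    print("Length of content:", len(r.content))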
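The two lengths usually differ because text is a decoded string (counted in characters) while content is the raw response body (counted in bytes). A minimal offline sketch of that difference, using the heading string from the HTML page in task (3) as a stand-in for a response body (the variable names are illustrative only):

```python
# Stand-in for a response body; no network access is needed.
sample = "欢迎你的加入123"

as_text = sample                     # what r.text would hold (str)
as_bytes = sample.encode("utf-8")    # what r.content would hold (bytes)

text_len = len(as_text)       # counts characters
content_len = len(as_bytes)   # counts bytes; each Chinese character takes 3 bytes in UTF-8

print("text length:", text_len)
print("content length:", content_len)
```

Six Chinese characters plus three ASCII digits give 9 characters but 21 bytes, which is why a page with Chinese text reports a larger content length than text length.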

 

(3) Here is a simple HTML page. Save it as a string and complete the computations that follow. (Required for a "good" grade.)

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"> 
<title>菜鸟教程(runoob.com)</title> 
</head>
<body>
    <h1>欢迎你的加入123</h1>
    <p>有你想不到的意外哦!</p>
</body>
        <table border="1">
    <tr>
        <td>班级</td>
        <td>17信计</td>
    </tr>
    <tr>
        <td>学号</td>
        <td>20</td>
    </tr>
</table>
</html>
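The specific computations are not listed here, so the following is only a sketch under assumptions: it stores the page as a string and pulls out the title, heading, and table cells with the standard library's html.parser. TextCollector and PAGE are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser

# The page from above, stored as a single string.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>菜鸟教程(runoob.com)</title>
</head>
<body>
    <h1>欢迎你的加入123</h1>
    <p>有你想不到的意外哦!</p>
</body>
        <table border="1">
    <tr>
        <td>班级</td>
        <td>17信计</td>
    </tr>
    <tr>
        <td>学号</td>
        <td>20</td>
    </tr>
</table>
</html>"""


class TextCollector(HTMLParser):
    """Collect the stripped text found directly inside each tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags that are currently open
        self.texts = {}   # tag name -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # <meta> is a void element and never gets a closing tag
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.texts.setdefault(self.stack[-1], []).append(text)


parser = TextCollector()
parser.feed(PAGE)
parser.close()

cells = parser.texts["td"]
# Pair each label cell with the value cell that follows it.
table = dict(zip(cells[0::2], cells[1::2]))

print(parser.texts["title"][0])
print(parser.texts["h1"][0])
print(table)
```

Once the text is collected per tag, counts such as the number of digits in the heading or the length of a paragraph can be computed with ordinary string operations.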

 

(4) Scrape the content of the Chinese university ranking website, http://www.zuihaodaxue.com/zuihaodaxuepaiming2018.html

Requirements:

Scrape the university ranking for the year assigned to the last digit of your student ID: last digit 1 or 2, year 2020; last digit 3 or 4, year 2016; last digit 5 or 6, year 2017; last digit 7 or 8, year 2018; last digit 9 or 0, year 2019.

Save the scraped data as a CSV file.
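The year assignment in the requirements can be written as a small lookup. This is only a sketch; ranking_year and YEAR_BY_LAST_DIGIT are hypothetical names, and how a year maps onto a page URL is not covered here.

```python
# Mapping taken from the requirement: last digit of the student ID -> ranking year.
YEAR_BY_LAST_DIGIT = {
    1: 2020, 2: 2020,
    3: 2016, 4: 2016,
    5: 2017, 6: 2017,
    7: 2018, 8: 2018,
    9: 2019, 0: 2019,
}


def ranking_year(student_id: str) -> int:
    """Return the ranking year assigned to a student ID (hypothetical helper)."""
    last_digit = int(student_id.strip()[-1])
    return YEAR_BY_LAST_DIGIT[last_digit]


# Student ID 20 (the one in the HTML table of task (3)) ends in 0.
print(ranking_year("20"))
```

For a student ID ending in 0 the lookup gives 2019, which matches the 2019 ranking page used in the script.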

 

import csv

import requests
from lxml import etree

url = 'https://www.shanghairanking.cn/rankings/bcur/201911'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3823.400 QQBrowser/10.7.4307.400'
}
req = requests.get(url=url, headers=headers)
req.encoding = 'utf-8'
# print(req.text)
html = etree.HTML(req.text)
# Each match is the text of a university-name link in the ranking table
names = html.xpath("//td[@class='align-left']/a/text()")

# utf-8-sig lets Excel recognize the Chinese names as UTF-8
with open(r'C:\Program Files\Python38\test.csv', 'w', newline='', encoding='utf-8-sig') as f:
    csv_write = csv.writer(f, dialect='excel')
    csv_write.writerow(['rank', 'name'])
    # Rows follow the page order, so the running index serves as the rank
    for r, name in enumerate(names, start=1):
        item = [r, name.strip()]
        print(item)
        csv_write.writerow(item)
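To check the CSV step without touching the network, the file can be written and read back. The sketch below makes two assumptions: the rows are placeholders rather than real ranking data, and the file goes to a temporary directory instead of the path used in the script.

```python
import csv
import os
import tempfile

# Placeholder rows standing in for scraped results; not real ranking data.
rows = [[1, "示例大学A"], [2, "示例大学B"]]

# Hypothetical output location: a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.csv")

# utf-8-sig prepends a byte order mark, which Excel uses to detect UTF-8.
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f, dialect="excel")
    writer.writerow(["rank", "name"])
    writer.writerows(rows)

# Read the file back to confirm the header and rows round-trip intact.
with open(path, newline="", encoding="utf-8-sig") as f:
    read_back = list(csv.reader(f))

print(read_back)
```

Note that csv.reader returns every field as a string, so the rank values come back as "1" and "2" rather than integers.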

 

 

posted @ 2020-12-13 17:01  fangxiaolog