Scraping the 2015 Chinese University Rankings and Writing a Simple HTML Page

I. Scraping the Chinese university rankings

1. Site to scrape: http://www.zuihaodaxue.com/zuihaodaxuepaiming2015_0.html

2.

 

3. The code is as follows:

import requests
from bs4 import BeautifulSoup
import bs4

info = []  # holds the scraped rows
url = "http://www.zuihaodaxue.com/zuihaodaxuepaiming2015_0.html"
# chr(12288) is the full-width space; as the fill character of the name
# column ({1:{7}^10}) it keeps Chinese school names aligned
row_format = "{0:^10}\t{1:{7}^10}\t{2:^10}\t{3:^10}\t{4:^10}\t{5:^10}\t{6:^2}"
try:
    r = requests.get(url, timeout=100)
    r.raise_for_status()  # raises requests.HTTPError on a 4xx or 5xx status
    r.encoding = r.apparent_encoding
    # r.text is the page source as a string; parse it with "html.parser"
    soup = BeautifulSoup(r.text, "html.parser")
    for tr in soup.find("tbody").children:  # iterate over the children of <tbody>
        if isinstance(tr, bs4.element.Tag):  # skip the text nodes between rows
            tds = tr.find_all("td")  # the <td> cells of this row
            # Write tds[0].string, not tds[0].String: the latter returns None
            info.append([td.string for td in tds[:7]])
    print(row_format.format("Rank", "University", "Province/City", "Total score",
                            "Talent training score", "Research score",
                            "Social service score", chr(12288)))
    for row in info[:50]:  # print the top 50
        print(row_format.format(*row, chr(12288)))
except Exception as e:  # catch network and parsing errors for later debugging
    print(e)
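The row loop above keeps only `bs4.element.Tag` children and reads each cell through `.string`. A minimal offline sketch of why both details matter is given here; the table contents are made-up placeholders rather than data from the site, and `parse_rows` is a hypothetical helper name.

```python
import bs4
from bs4 import BeautifulSoup

# Placeholder table standing in for the real ranking page (not real data).
SAMPLE_HTML = """
<table><tbody>
<tr><td>1</td><td>Example University A</td><td>Province X</td></tr>
<tr><td>2</td><td>Example University B</td><td>Province Y</td></tr>
</tbody></table>
"""


def parse_rows(html):
    """Return one list of cell strings per <tr> inside the first <tbody>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for child in soup.find("tbody").children:
        # .children also yields the newline text between rows
        # (NavigableString objects), so keep only real tags.
        if isinstance(child, bs4.element.Tag):
            rows.append([td.string for td in child.find_all("td")])
    return rows


rows = parse_rows(SAMPLE_HTML)
print(rows)

first_td = BeautifulSoup(SAMPLE_HTML, "html.parser").find("td")
print(first_td.string)  # the text of the cell
print(first_td.String)  # treated as a search for a child tag named "String"
```

Unknown attributes on a tag are interpreted as a lookup of a child tag with that name, which is why the capitalised `.String` quietly gives `None` instead of raising an error.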

The scraper prints the following result:

 

 

 There was too much output to capture in a single screenshot.
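The `print` calls pass `chr(12288)`, the full-width space (U+3000). One common use for it, sketched here as an assumption about the intent, is as the fill character of a format field, so that padding has the same width as the Chinese characters around it. `center_cjk` is a hypothetical helper, and the names are only sample strings.

```python
FULLWIDTH_SPACE = chr(12288)  # U+3000 IDEOGRAPHIC SPACE


def center_cjk(text, width):
    """Center text in `width` characters, padding with full-width spaces."""
    # The nested fields build the format spec "<fill>^<width>".
    return "{0:{1}^{2}}".format(text, FULLWIDTH_SPACE, width)


for name in ["清华大学", "中国科学技术大学"]:
    print("|" + center_cjk(name, 10) + "|")
```

With the default ASCII space as padding, a monospace font usually renders the padding narrower than the Chinese text, so columns drift; the full-width fill avoids that for columns that hold only CJK text.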

4. Saving to a CSV file; complete code 👇

import csv
import os

import requests
from bs4 import BeautifulSoup
import bs4

info = []  # holds the scraped rows
url = "http://www.zuihaodaxue.com/zuihaodaxuepaiming2015_0.html"
title = ["Rank", "University", "Province/City", "Total score",
         "Talent training score", "Research score", "Social service score"]
# chr(12288) is the full-width space; as the fill character of the name
# column ({1:{7}^10}) it keeps Chinese school names aligned
row_format = "{0:^10}\t{1:{7}^10}\t{2:^10}\t{3:^10}\t{4:^10}\t{5:^10}\t{6:^2}"
try:
    r = requests.get(url, timeout=100)
    r.raise_for_status()  # raises requests.HTTPError on a 4xx or 5xx status
    r.encoding = r.apparent_encoding
    # r.text is the page source as a string; parse it with "html.parser"
    soup = BeautifulSoup(r.text, "html.parser")
    for tr in soup.find("tbody").children:  # iterate over the children of <tbody>
        if isinstance(tr, bs4.element.Tag):  # skip the text nodes between rows
            tds = tr.find_all("td")  # the <td> cells of this row
            # Write tds[0].string, not tds[0].String: the latter returns None
            info.append([td.string for td in tds[:7]])
    print(row_format.format(*title, chr(12288)))
    for row in info[:50]:  # print the top 50
        print(row_format.format(*row, chr(12288)))
except Exception as e:  # catch network and parsing errors for later debugging
    print(e)

# The author's local output path; adjust it for your machine
save_road = "C:\\pyhton编程\\中国最好大学排名.csv"
# Write the header only when the file does not exist yet; mode "a" creates it
is_new_file = not os.path.isfile(save_road)
# newline="" avoids blank rows on Windows; utf-8-sig lets Excel detect UTF-8
with open(save_road, "a", newline="", encoding="utf-8-sig") as f:
    csv_write = csv.writer(f, dialect="excel")
    if is_new_file:
        csv_write.writerow(title)
    csv_write.writerows(info[:50])  # slicing is safe even if scraping failed
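The saving step distinguishes a new file (header plus rows) from an existing one (rows only). A small self-contained sketch of that pattern, run against a temporary file so it can be tried without the scraper, might look like this. `append_rows`, the column names, and the row values are all hypothetical placeholders.

```python
import csv
import os
import tempfile

HEADER = ["rank", "name", "score"]  # placeholder column names


def append_rows(path, rows):
    """Append rows to a CSV file, writing the header only when the file is new."""
    is_new_file = not os.path.isfile(path)
    # newline="" stops the csv module's "\r\n" line endings from being
    # translated a second time, which would show up as blank rows on Windows.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new_file:
            writer.writerow(HEADER)
        writer.writerows(rows)


# Two calls against the same temporary file: one header, two data rows.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    append_rows(demo_path, [["1", "Example University A", "90.0"]])
    append_rows(demo_path, [["2", "Example University B", "80.0"]])
    with open(demo_path, newline="", encoding="utf-8") as f:
        saved = list(csv.reader(f))

print(saved)
```

Checking for the file before opening it in append mode keeps a single write path, since `"a"` creates the file when it is missing.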

Result of the CSV export:

 

II. Writing a simple HTML page

Code:

 

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>菜鸟教程(runoob.com)</title>
</head>
<body>

    <h1>Student No. 11, work hard at learning Python!</h1>
    <p>Hard work pays off!</p>
    <table border="1">
        <tr>
            <td>Class</td>
            <td>Class 1</td>
        </tr>
    </table>
</body>
</html>
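Since the rest of this post is in Python, here is a rough sketch of producing the same kind of page from a script, with the table placed inside `<body>` and cell text escaped. `render_page` and its arguments are hypothetical names introduced only for illustration.

```python
from html import escape


def render_page(heading, rows):
    """Return a minimal HTML page with a heading and a bordered table."""
    body_rows = "\n".join(
        "        <tr>"
        + "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        + "</tr>"
        for row in rows
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(heading)}</title>\n"
        "</head>\n<body>\n"
        f"    <h1>{escape(heading)}</h1>\n"
        '    <table border="1">\n'
        f"{body_rows}\n"
        "    </table>\n"
        "</body>\n</html>\n"
    )


page = render_page("Demo", [["Class", "Class 1"], ["Tag", "<b>"]])
print(page)
```

Escaping each cell means that text such as `<b>` is displayed literally instead of being interpreted as markup.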

 

The page renders as follows 👇

 

 

posted @ 2020-05-16 21:16  slayer~