Web Crawler Homework

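(2) Use the get() function of the requests library to access one of the following websites 20 times. Print the returned status and the text content, and compute the length of the page content returned by the text attribute and by the content attribute. (Different student IDs are assigned different pages; completing this required task earns a pass.)

d: 360 Search homepage (for student IDs ending in 7 or 8)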

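from requests import get
from requests.exceptions import RequestException

try:
    for i in range(20):
        r = get("https://www.so.com/", timeout=5)
        r.raise_for_status()   # raise an exception on 4xx/5xx responses
        r.encoding = 'utf-8'
        print(r)               # prints the response object, e.g. <Response [200]>
    # Lengths of the last response: text is decoded characters, content is raw bytes
    print(len(r.text))
    print(len(r.content))
except RequestException:
    print("Error")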
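
The script above only measures the last of the 20 responses. As a variant, here is a minimal sketch that records the status code and both lengths for every request. The helper name `fetch_summary` and its `fetch` parameter are made up for illustration (the parameter only exists so the loop can be tried without network access); the recorded output that follows comes from the script above, not from this sketch.

```python
def fetch_summary(url, times=20, timeout=5, fetch=None):
    """Request url `times` times; return (status, len(text), len(content)) per request."""
    if fetch is None:
        import requests  # imported lazily so the helper can be exercised offline
        fetch = requests.get
    rows = []
    for _ in range(times):
        r = fetch(url, timeout=timeout)
        r.raise_for_status()      # stop on 4xx/5xx responses
        r.encoding = "utf-8"      # decode r.text as UTF-8
        rows.append((r.status_code, len(r.text), len(r.content)))
    return rows

# Example call (needs network access):
# for status, n_chars, n_bytes in fetch_summary("https://www.so.com/"):
#     print(status, n_chars, n_bytes)
```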

Result:

runfile('C:/Users/燃/untitled0.py', wdir='C:/Users/燃')
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
4986
5294
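
The two numbers differ because `r.text` is a `str` of decoded characters, while `r.content` is the raw `bytes` of the response; in UTF-8, common Chinese characters take three bytes each. A minimal sketch with a made-up sample string (not the actual 360 page) shows the effect; the helper name `text_and_content_lengths` is hypothetical.

```python
def text_and_content_lengths(text, encoding="utf-8"):
    """Return (number of characters, number of encoded bytes) for text."""
    return len(text), len(text.encode(encoding))

# Made-up sample: 18 ASCII characters plus 2 Chinese characters
sample = "<title>360搜索</title>"
n_chars, n_bytes = text_and_content_lengths(sample)
print(n_chars, n_bytes)  # 20 24
```

In the run above the gap is 5294 - 4986 = 308 bytes, which would correspond to about 154 three-byte characters, assuming every non-ASCII character on the page is of that kind.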

(3) This is a simple HTML page. Save it as a string and complete the computations below. (Good grade)

Problem:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>菜鸟教程（runoob.com)</title> 
</head>
<body>
         <h1>我的第一个标题</h1>
         <p id="first">我的第一个段落。</p> 
</body>
                  <table border="1">
          <tr>
                  <td>row 1, cell 1</td> 
                  <td>row 1, cell 2</td> 
         </tr>
         <tr>
                  <td>row 2, cell 1</td>
                  <td>row 2, cell 2</td>
         </tr>
</table>
</html>

a. Print the contents of the head tag and the last two digits of your student ID

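# -*- encoding:utf-8 -*-
from bs4 import BeautifulSoup
from requests import get


def getText(url):
    """Fetch url and return its body decoded as UTF-8, or '' on failure."""
    try:
        r = get(url, timeout=5)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except Exception as e:
        print("Error:", e)
        return ''


url = "http://www.runoob.com/"
html = getText(url)
# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(html, "html.parser")

# len() of a tag counts its direct children, not its characters
print("head:", soup.head)
print("head:", len(soup.head))
print("Last two digits of student ID: 24=7")

print("body:", soup.body)
print("body:", len(soup.body))

print("title:", soup.title)

print("title_string:", soup.title.string)

# Look up a single element by its id attribute
print("special_id", soup.find(id='cd-login'))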
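
The task asks for the page to be kept as a string, whereas the code above downloads the live runoob.com homepage. A minimal sketch of the string-based variant follows, assuming the page from the problem statement with the heading written as `<h1>` and the last table row closed with `</tr>`; the variable name `page` is made up for illustration.

```python
from bs4 import BeautifulSoup

# The page from the problem statement, stored as a string
# (assumption: <h1> and the final </tr> written as presumably intended)
page = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>菜鸟教程（runoob.com)</title>
</head>
<body>
<h1>我的第一个标题</h1>
<p id="first">我的第一个段落。</p>
</body>
<table border="1">
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>
</html>"""

soup = BeautifulSoup(page, "html.parser")

print("head:", soup.head)                     # the whole <head> element
print("title_string:", soup.title.string)     # text inside <title>
print("first:", soup.find(id="first").get_text())  # paragraph with id="first"
print("cells:", [td.get_text() for td in soup.find_all("td")])
```

Parsing a local string avoids the network entirely, so the results do not change when the live site does.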

posted @ 2020-12-13 23:20 whispe
