Python Exercise 60: Analyze a Web Page and Extract Its Body Text and Links

Task: analyze a web page (here, the Baidu home page) and extract its body text and its links.

The code fetches the page with urllib and parses it with BeautifulSoup (using the lxml parser):

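from urllib import request
from bs4 import BeautifulSoup

# Fetch the page; call the result "response" so it does not shadow the request module
response = request.urlopen('https://www.baidu.com/')
html = response.read().decode('utf-8')
soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package
# print(soup.prettify())
links = soup.find_all('a')   # all <a> tags (find_all replaces the old findAll alias)
contents1 = soup.contents    # all direct child nodes of the document
href1 = []    # links
string1 = []  # body text
for a in links:
    href = a.get('href')  # a['href'] would raise KeyError on <a> tags without an href
    if href:
        href1.append(href)
for string in soup.stripped_strings:  # every text node, with surrounding whitespace removed
    string1.append(repr(string))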
print(href1)
print('-----------------------------')
print(contents1)
print('-----------------------------')
print(string1)
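The script above needs network access plus the third-party bs4 and lxml packages. As an offline, dependency-free sketch of the same idea, the parser below swaps in the standard-library `html.parser.HTMLParser` in place of BeautifulSoup and runs on an inline HTML string. The class name `LinkTextParser`, the helper `extract_links_and_text`, and the choice to skip text inside `<script>` and `<style>` (treating it as non-body text) are all assumptions made for illustration, not part of the original exercise.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Hypothetical helper: collect <a href> values and non-blank text nodes.

    Uses the standard-library HTMLParser instead of BeautifulSoup.
    Text inside <script> and <style> is skipped (an assumption about
    what counts as "body text").
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.links = []       # href values, in document order
        self.texts = []       # stripped, non-empty text nodes
        self._skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # ignore <a> tags with a missing or empty href
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.texts.append(text)


def extract_links_and_text(html):
    """Return (links, texts) for an HTML string."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.links, parser.texts


if __name__ == "__main__":
    sample = """<html><head><title>Demo</title>
    <script>var x = 1;</script></head>
    <body><p>Hello <a href="/about">About us</a></p>
    <a name="anchor">no link</a>
    <a href="https://example.com/">Example</a></body></html>"""
    links, texts = extract_links_and_text(sample)
    print(links)  # ['/about', 'https://example.com/']
    print(texts)  # ['Demo', 'Hello', 'About us', 'no link', 'Example']
```

HTMLParser is event-driven rather than building a tree, so it cannot offer `soup.contents`; it only suits simple "collect links and text" jobs like this one.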

Output omitted.
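The `href` values collected in `href1` may mix absolute URLs with relative ones (such as `/about` or `//cdn.example.com/x`). A possible follow-up step, sketched below with the standard-library `urllib.parse.urljoin` and `urldefrag`, resolves them against the page URL, drops fragments, and removes duplicates. The helper name `normalize_links` and its filtering rules (keep only http/https, skip `javascript:` and `mailto:` links) are hypothetical choices for illustration.

```python
from urllib.parse import urldefrag, urljoin


def normalize_links(base_url, hrefs):
    """Hypothetical helper: resolve hrefs against base_url and deduplicate.

    - relative paths and scheme-relative URLs become absolute
    - "#fragment" parts are removed
    - non-http(s) links (javascript:, mailto:, ...) are dropped
    """
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # skip None or empty href values
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        if not absolute.startswith(("http://", "https://")):
            continue
        if absolute not in seen:  # keep first occurrence, preserve order
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    print(normalize_links("https://www.example.com/",
                          ["/duty/", "//news.example.com", "javascript:;", "/duty/#x"]))
    # ['https://www.example.com/duty/', 'https://news.example.com']
```

In the original script this would be applied as `normalize_links('https://www.baidu.com/', href1)`.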

For more detail on web page analysis, see: https://www.cnblogs.com/pinpin/p/10260405.html

Posted 2019-01-21 10:46 by 阳光宝贝-沐沐