A Simple Python Web Scraping Experiment

Install pip and configure the environment variables.

Install the requests library with pip (pip install requests).

Install beautifulsoup4; in code it is imported as bs4.

Install the lxml parser.
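Before moving on, it can help to confirm that the three packages are importable. The following is a minimal sketch and not part of the original steps; the helper name missing_packages is hypothetical, and it relies only on the standard library's importlib.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # These are import names, not pip package names: beautifulsoup4 is imported as bs4.
    missing = missing_packages(["requests", "bs4", "lxml"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All packages are importable.")
```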

Use the browser's developer tools to copy the CSS path of the content you want.
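The path copied from the browser is usually longer than it needs to be. As an illustration, the sketch below runs select() on a small hand-written HTML fragment (invented for this example, not the real page) and shows that a shorter selector can match the same links as the full path. It uses Python's built-in html.parser so that it runs even without lxml.

```python
from bs4 import BeautifulSoup

# Hand-written fragment that only imitates the structure of a news list.
SAMPLE_HTML = """
<html><body>
<div id="main">
  <div class="leftBox">
    <ul class="news">
      <li><a href="/news/101.html">First headline</a></li>
      <li><a href="/news/102.html">Second headline</a></li>
    </ul>
  </div>
  <ul class="other"><li><a href="/about.html">About</a></li></ul>
</div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Full path, in the style produced by the browser's "copy CSS path" command.
full_path = soup.select("html body div#main div.leftBox ul.news li a")
# Shorter selector that still pins down the same list.
short_path = soup.select("ul.news li a")

titles = [a.get_text() for a in short_path]
print(titles)
```

A shorter selector tends to survive small layout changes on the page better than the full path, at the cost of possibly matching more elements than intended.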

Use enumerate() to iterate over the selected data.
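enumerate() pairs each item with a running index, which is what lets a loop print a counter next to every record. A minimal illustration with made-up titles:

```python
titles = ["First headline", "Second headline", "Third headline"]  # made-up sample data

# enumerate() yields (index, item) pairs; counting starts at 0 by default.
for i, title in enumerate(titles):
    print(i, title)

# The optional start argument changes the first index.
numbered = list(enumerate(titles, start=1))
print(numbered[0])
```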

Code:

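import re

import requests
from bs4 import BeautifulSoup

url = 'http://www.cntour.cn/'
# Fetch the page; the timeout keeps the script from hanging on a slow server
strhtml = requests.get(url, timeout=10)
# Parse the returned HTML with the lxml parser
soup = BeautifulSoup(strhtml.text, 'lxml')
# CSS path copied from the browser's developer tools
data = soup.select('html body div#main div.wrapper div.mtop.firstMod.clearfix div.leftBox div.ui-tabs-panel ul.news li a')
res = []
for i, item in enumerate(data):
    href = item.get('href') or ''  # guard against <a> tags without an href
    rest = {
        'title': item.get_text(),
        'id': href,
        'ID': re.findall(r'\d+', href),  # all digit runs in the link
    }
    res.append(rest)  # collect every record
    print(i, rest)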

This gives a first working version that scrapes the data.
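One possible next step is to separate parsing from fetching, so that the extraction logic can be tried on any HTML string. The sketch below is an assumption about how that might look, not the author's code: the function name extract_links, the default selector, the dictionary keys and the sample HTML are all hypothetical, and html.parser is used so that lxml is not required.

```python
import re

from bs4 import BeautifulSoup


def extract_links(html, selector="ul.news li a"):
    """Return one dict per matched <a> tag: title, href and the digit runs in the href."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select(selector):
        href = item.get("href") or ""  # tolerate <a> tags without an href
        records.append({
            "title": item.get_text(strip=True),
            "href": href,
            "ids": re.findall(r"\d+", href),
        })
    return records


if __name__ == "__main__":
    # Invented sample markup, used only to exercise the function offline.
    sample = '<ul class="news"><li><a href="/news/2020/345.html"> Sample </a></li></ul>'
    for i, record in enumerate(extract_links(sample)):
        print(i, record)
```

With a live page, the same function could be called as extract_links(requests.get(url, timeout=10).text).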

posted @ 2020-06-12 16:12 why_set
