2018-12-03 Python Full-Stack Development - Day 91 - Web Scraping Basics
1. Web scraping
Fetch pages from a website and filter out the data you want from them.
2. Basic workflow
Download the page (requests) --> filter it (BeautifulSoup)
import requests

data1 = requests.get(url='http://www.baidu.com')
# Guess the encoding from the response body so Chinese text decodes correctly
data1.encoding = data1.apparent_encoding
print(data1.text)

A basic example of downloading a page from a URL.
import requests
from bs4 import BeautifulSoup
import uuid

data1 = requests.get(url='http://www.autohome.com.cn/news/')
data1.encoding = data1.apparent_encoding
soup = BeautifulSoup(data1.text, features='html.parser')
tag = soup.find(id='auto-channel-lazyload-article')  # the container that holds the news items
li_list = tag.find_all('li')  # find every li tag inside it; returns a list
for i in li_list:
    # each element of the loop is a Tag object
    a = i.find('a')
    if a:
        print(a.attrs.get('href'))  # print the link held by the a tag
        try:
            text1 = a.find('h3').text  # find the title
            img_url = a.find('img').attrs.get('src')
            if img_url.startswith('//'):  # some src values are protocol-relative
                img_url = 'http:' + img_url
            img_file = requests.get(url=img_url)
            file_name = str(uuid.uuid4())  # save the image under a random file name
            with open(file_name, 'wb') as f:
                f.write(img_file.content)
        except Exception:
            pass

To be continued.
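The find/find_all pattern above can be tried without hitting the network. This is a minimal sketch of the same idea run against an inline HTML string; the markup below is made up for illustration and only mimics the shape of the real autohome page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure the scraper above expects
html = """
<div id='auto-channel-lazyload-article'>
  <ul>
    <li><a href='/news/1.html'><h3>First article</h3></a></li>
    <li><a href='/news/2.html'><h3>Second article</h3></a></li>
    <li><p>no link here</p></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, features='html.parser')
tag = soup.find(id='auto-channel-lazyload-article')  # narrow the search to the container
titles = []
for li in tag.find_all('li'):  # every li inside the container
    a = li.find('a')
    if not a:  # skip list items that carry no link
        continue
    h3 = a.find('h3')
    if h3:
        titles.append(h3.text)

print(titles)  # -> ['First article', 'Second article']
```

Scoping `find_all` to `tag` rather than `soup` is the point of the earlier `find(id=...)` call: it keeps list items from other parts of the page out of the results.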