Summary:
```python
import requests
from bs4 import BeautifulSoup
import re
from mysql_control import MySQL

# The three steps of a web crawler
# 1. Send the request
def get_html(url):
    response = requests.get(url)
    return response

# 2. Parse the data
def parse_da…
```
posted @ 2020-01-02 19:10 chanyuli Reads (166) Comments (0) Recommendations (0)
Summary:
```python
import requests
import re
import uuid
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(50)

# The three steps of a web crawler
# 1. Send the request
def get_html(url):
    print(f'start: {url}...')
    response = …
```
posted @ 2020-01-02 19:09 chanyuli Reads (208) Comments (0) Recommendations (0)
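The excerpt above is cut off, so the following is only a minimal sketch of how the first two crawler steps (send the request, parse the data) might be combined with a `ThreadPoolExecutor`. It is not the post's actual code. The `FAKE_PAGES` table, the `example.com` URLs, the `parse_data` and `crawl` names, and the `href` regular expression are all assumptions made for illustration; the request step is stubbed so that the example runs without network access.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical canned pages standing in for real HTTP responses.
FAKE_PAGES = {
    "https://example.com/page/1": '<a href="/video/1.mp4">one</a>',
    "https://example.com/page/2": (
        '<a href="/video/2.mp4">two</a><a href="/video/3.mp4">three</a>'
    ),
}


def get_html(url):
    """Step 1: send the request (stubbed here; a real crawler
    might return requests.get(url).text instead)."""
    return FAKE_PAGES[url]


def parse_data(html):
    """Step 2: parse the data, here by pulling href values out with a regex."""
    return re.findall(r'href="(.*?)"', html)


def crawl(urls, max_workers=4):
    """Fetch all pages concurrently, then parse them in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        pages = list(pool.map(get_html, urls))
    return [link for page in pages for link in parse_data(page)]


links = crawl(list(FAKE_PAGES))
print(links)
```

Using `pool.map` keeps the results in the same order as the input URLs, which is convenient when later steps depend on that order. Submitting tasks individually with `pool.submit` and attaching callbacks is another common choice when order does not matter.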
Summary:
```python
import requests
import re

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
}

# The three steps of a web crawler
# 1. Send the request
def get…
```
posted @ 2020-01-02 19:06 chanyuli Reads (263) Comments (2) Recommendations (0)
Summary: Additional notes on requests; attributes of Response; the five kinds of bs4 filters
posted @ 2020-01-02 19:05 chanyuli Reads (232) Comments (0) Recommendations (0)
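The post body is not shown here, so as a rough illustration of the "five kinds of bs4 filters" named in the summary, the sketch below follows the five kinds listed in the Beautiful Soup documentation: a string, a regular expression, a list, `True`, and a function. The sample HTML and the `has_id` helper are assumptions made for this example and may not match what the post itself covers.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document used only for this illustration.
html = '<html><body><p id="a">one</p><b>two</b><a href="/x">three</a></body></html>'
soup = BeautifulSoup(html, "html.parser")

# 1. String filter: match tags whose name equals the string.
by_string = [tag.name for tag in soup.find_all("p")]

# 2. Regular expression filter: match tag names the pattern finds a match in.
by_regex = [tag.name for tag in soup.find_all(re.compile("^b"))]

# 3. List filter: match tags whose name is any item in the list.
by_list = [tag.name for tag in soup.find_all(["a", "b"])]

# 4. True filter: match every tag (but not text strings).
by_true = [tag.name for tag in soup.find_all(True)]


# 5. Function filter: match tags for which the function returns True.
def has_id(tag):
    return tag.has_attr("id")


by_function = [tag.name for tag in soup.find_all(has_id)]

print(by_string, by_regex, by_list, by_true, by_function)
```

Each filter is passed to `find_all` in the same position, so switching between them changes only the argument, not the call.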