Post category - Python crawlers
Summary: 1. Configuration 3. Spider 4. Middleware 5. Pipeline (storing to MongoDB)
                
Summary:
import re
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
fr...
                
Summary:
from pyquery import PyQuery as pq

# Initialize from a URL
# html = ''
# doc = pq(html)
url = 'https://www.baidu.com'
doc = pq(url=url)
print(doc('hea
                
Summary:
import requests
import re
import json
from requests.exceptions import RequestException
from multiprocessing import Pool

# Fetch a page
def get_one_page(url):
    headers = {
        ...
                
Summary: 1. Implementation with re
import requests
from requests.exceptions import RequestException
import re, json
import xlwt, xlrd

# Data
DATA = []
KEYWORD = 'pyth
                
posted @ 2018-07-27 02:24 Ray_chen
                
            
Summary: 1. Implementation with re
import re, os
import requests
from requests.exceptions import RequestException

MAX_PAGE = 10  # maximum number of pages
KEYWORD = 'python'
headers = {
                
 
                    
                     
                    
                 
                    
                
 