Post category - Web Crawlers and Scrapy
Summary:
import requests, json
from bs4 import BeautifulSoup
import pandas
aa=['''http://map.baidu.com/?newmap=1&reqflag=pcmap&biz=1&from=webmap&da_par=direct&pcevaname=pc4.1&qt=con&from=webmap&c=131&wd=%E5%8...
Summary:
import requests
from bs4 import BeautifulSoup
import pymongo
from multiprocessing.dummy import Pool as ThreadPool
headers = {'User-Agent':'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) Apple...
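The excerpt above imports `multiprocessing.dummy.Pool` as `ThreadPool` but is cut off before the pool is used. As a rough sketch only, here is one common way such a thread pool is applied to a list of URLs. The function `fetch_length`, the URL list, and the pool size of 4 are all hypothetical stand-ins, not taken from the original post; a real crawler would download and parse each page inside the worker function.

```python
from multiprocessing.dummy import Pool as ThreadPool


def fetch_length(url: str) -> int:
    """Hypothetical worker: stands in for a real download-and-parse step."""
    # A real crawler would call requests.get(url) here; this sketch stays offline.
    return len(url)


# Hypothetical URLs, used only to show how work is distributed.
urls = [
    "https://example.com/page/1",
    "https://example.com/page/22",
    "https://example.com/page/333",
]

# map() runs fetch_length on each URL across 4 threads and
# returns the results in the same order as the input list.
with ThreadPool(4) as pool:
    results = pool.map(fetch_length, urls)
```

Threads suit this kind of work because crawling is mostly waiting on network I/O, and `map` keeps the results aligned with the input order.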
Summary: http://cuiqingcai.com/4020.html
Summary:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import gevent
from gevent import monkey;monkey.patch_all()
import time
import re
import random
UA_list = [ 'Mozilla/5.0 (Windows NT ...
Summary:
import scrapy
import urllib.request
from scrapy.http import Request,FormRequest
class LoginspdSpider(scrapy.Spider):
    name = "loginspd"
    allowed_domains = ["douban.com"]
    start_urls = ['htt...
Summary: item.py, pipeline.py, spd.py
Summary:
rules = [
    Rule(SgmlLinkExtractor(allow=('/u012150179/article/details'), restrict_xpaths=('//li[@class="next_article"]')), callback='parse_ite...
Summary: Place it in the project directory and configure setting.py
Summary:
class CsvspiderSpider(CSVFeedSpider):
    name = 'csvspider'
    allowed_domains = ['iqianyue.com']
    start_urls = ['http://iqianyue.com/feed.csv']
    headers = ['id', 'name', 'description', 'imag...
Summary:
from scrapy.spiders import XMLFeedSpider
from myxml.items import MyxmlItem
class XmlspiderSpider(XMLFeedSpider):
    name = 'xmlspider'
    allowed_domains = ['sina.com.cn']
    start_urls = ['http:...
Summary:
import scrapy
from Autopjt.items import myItem
from scrapy.http import Request
class AutospdSpider(scrapy.Spider):
    name = "fulong_spider"
    start_urls =
Summary: Session operations
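The summary above names session operations without showing any code. As a hedged illustration, assuming the post refers to `requests.Session` (the listing does not confirm this), the sketch below shows how a session carries headers and cookies across requests. The URL, cookie name, cookie value, and the helper `build_session` are hypothetical; the request is only prepared, not sent, so the example runs offline.

```python
import requests


def build_session(user_agent: str) -> requests.Session:
    """Hypothetical helper: create a session with a default User-Agent."""
    session = requests.Session()
    # Headers set on the session are merged into every request it prepares.
    session.headers.update({"User-Agent": user_agent})
    return session


session = build_session("Mozilla/5.0")

# Hypothetical cookie, standing in for one a login response would set.
session.cookies.set("sessionid", "abc123", domain="example.com", path="/")

# Prepare (but do not send) a request to see what the session attaches to it.
request = requests.Request("GET", "https://example.com/profile")
prepared = session.prepare_request(request)

print(prepared.headers["User-Agent"])
print(prepared.headers.get("Cookie"))
```

In a real crawler the cookie would normally arrive through `session.post(...)` against a login form, after which later `session.get(...)` calls reuse it automatically.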
Summary: setting.py, main.py, items.py, dbbook.py
Summary:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup
import pandas
def gethousedetail(url):
    info ={}
    res = requests.get(url)
    ...
Summary:
import requests
import json
import time
url = 'http://weibo.com/aj/message/add?ajwvr=6'
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, ...
Summary:
from selenium import webdriver
import time

driver = webdriver.PhantomJS(executable_path="D:/phantomjs/bin/phantomjs.exe")
driver.get("http://study.163.com/course/courseMain.htm?course...
Summary:
from email.header import Header
from email.mime.text import MIMEText
from email.utils import parseaddr,formataddr
import smtplib
from email.mime.multipart import MIMEMultipart
from e...