Crawlers and Scrapy - Post Category (Page 2) - Erick-LONG

Baidu Maps merchant crawler

Summary: import requests,json from bs4 import BeautifulSoup import pandas aa=['''http://map.baidu.com/?newmap=1&reqflag=pcmap&biz=1&from=webmap&da_par=direct&pcevaname=pc4.1&qt=con&from=webmap&c=131&wd=%E5%8...
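The excerpt writes its Baidu Maps search URL as one long literal string. The sketch below builds the same query with urllib.parse.urlencode instead, which percent-encodes a Chinese keyword in the wd parameter as UTF-8. Every parameter name and value comes from the visible part of the excerpt, and the duplicated from=webmap is sent only once. The keyword and the build_search_url helper are hypothetical. The sketch says nothing about how Baidu's endpoint responds.

```python
from urllib.parse import urlencode

BASE_URL = "http://map.baidu.com/"


def build_search_url(keyword, city_code=131):
    """Hypothetical helper: assemble the search URL seen in the excerpt.

    Parameter names/values are copied from the excerpt; `keyword` fills `wd`
    and urlencode percent-encodes it as UTF-8.
    """
    params = {
        "newmap": 1,
        "reqflag": "pcmap",
        "biz": 1,
        "from": "webmap",  # appears twice in the excerpt, sent once here
        "da_par": "direct",
        "pcevaname": "pc4.1",
        "qt": "con",
        "c": city_code,
        "wd": keyword,
    }
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("咖啡"))
```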

posted @ 2017-08-04 12:39 Erick-LONG Views (2661) Comments (0) Recommendations (0)

Scraping 160,000 records from Shunqi (顺企网) and saving them to MongoDB

Summary: import requests from bs4 import BeautifulSoup import pymongo from multiprocessing.dummy import Pool as ThreadPool headers = {'User-Agent':'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) Apple...
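The excerpt imports multiprocessing.dummy.Pool as ThreadPool. This class has the same API as multiprocessing.Pool, but its workers are threads, which suits I/O-bound page downloads. The sketch below shows only the fan-out step. It uses a hypothetical fetch_listing(page_no) worker that returns a list of record dicts. This stub returns fake records instead of calling requests and BeautifulSoup, so it runs offline.

```python
from multiprocessing.dummy import Pool as ThreadPool


def fetch_listing(page_no):
    """Hypothetical worker standing in for requests.get + BeautifulSoup parsing.

    Returns fake records so the sketch runs offline.
    """
    return [{"page": page_no, "rank": rank} for rank in range(3)]


def crawl(pages, workers=8):
    """Fetch pages on a thread pool and flatten the per-page record lists."""
    pool = ThreadPool(workers)
    try:
        # map blocks until every page is done and keeps the input order
        batches = pool.map(fetch_listing, pages)
    finally:
        pool.close()
        pool.join()
    return [record for batch in batches for record in batch]


records = crawl(range(1, 4), workers=4)
print(len(records))
```

pool.map returns results in input order. The flattened records can then go to MongoDB in one round trip with pymongo's collection.insert_many(records). Skip that call when records is empty, because insert_many rejects an empty list.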

posted @ 2017-06-16 16:13 Erick-LONG Views (734) Comments (0) Recommendations (0)

Distributed crawling with Scrapy

Summary: http://cuiqingcai.com/4020.html

posted @ 2017-06-12 18:09 Erick-LONG Views (167) Comments (0) Recommendations (0)

HC360 (慧聪网) crawler

Summary: import requests from bs4 import BeautifulSoup import pandas as pd import gevent from gevent import monkey;monkey.patch_all() import time import re import random UA_list = [ 'Mozilla/5.0 (Windows NT ...

posted @ 2017-06-05 15:01 Erick-LONG Views (693) Comments (0) Recommendations (0)

Simulated login with Scrapy

Summary: import scrapy import urllib.request from scrapy.http import Request,FormRequest class LoginspdSpider(scrapy.Spider): name = "loginspd" allowed_domains = ["douban.com"] start_urls = ['htt...
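In Scrapy, a login form is usually submitted with FormRequest.from_response(response, formdata={...}, callback=...). This call reads a form on the fetched page (the first one by default) and keeps its pre-filled fields, such as hidden tokens. It then overrides any fields named in formdata. The stdlib sketch below reproduces only that field-merging step, so it can run without Scrapy. FormFieldParser, login_payload and the field names in the test are all hypothetical. Real sites may also add captchas or JavaScript that this sketch does not handle.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name -> value for <input> tags inside the first <form> only."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._in_form = True
            self._seen_form = True
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Attributes without a value come back as None; treat them as ""
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def login_payload(html, credentials):
    """Hypothetical helper: pre-filled form fields, overridden by credentials."""
    parser = FormFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload.update(credentials)
    return payload
```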

posted @ 2017-05-11 16:10 Erick-LONG Views (293) Comments (1) Recommendations (0)

Crawling a blog with Scrapy

Summary: item.py, pipeline.py, spd.py

posted @ 2017-05-11 15:13 Erick-LONG Views (231) Comments (0) Recommendations (0)

Storing Scrapy data in MySQL

Summary: pipeline, item
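An item pipeline receives every item through process_item. Scrapy calls open_spider and close_spider around the crawl, so those hooks are where a database connection is opened and closed. The sketch below assumes a hypothetical table articles(title, link) and items with title and link fields. To stay runnable without a MySQL server, the connection is injected as a zero-argument factory. from_crawler builds that factory around pymysql.connect, using custom MYSQL_* settings whose names are illustrative only. The test below passes sqlite3 with its ? placeholder instead of pymysql's %s.

```python
import functools


class SQLPipeline:
    """Scrapy item pipeline that inserts each item into an `articles` table.

    `connect` is a zero-argument callable returning a DB-API connection;
    `placeholder` is the driver's parameter marker ("%s" for pymysql).
    """

    def __init__(self, connect, placeholder="%s"):
        self.connect = connect
        self.placeholder = placeholder
        self.conn = None

    @classmethod
    def from_crawler(cls, crawler):
        # Imported here so the class can be used (and tested) without pymysql
        import pymysql

        s = crawler.settings
        # MYSQL_* are hypothetical custom settings defined in the project's settings file
        connect = functools.partial(
            pymysql.connect,
            host=s.get("MYSQL_HOST", "localhost"),
            user=s.get("MYSQL_USER"),
            password=s.get("MYSQL_PASSWORD"),
            database=s.get("MYSQL_DB"),
            charset="utf8mb4",
        )
        return cls(connect, placeholder="%s")

    def open_spider(self, spider):
        self.conn = self.connect()
        cur = self.conn.cursor()
        cur.execute("CREATE TABLE IF NOT EXISTS articles (title TEXT, link TEXT)")
        self.conn.commit()

    def process_item(self, item, spider):
        sql = "INSERT INTO articles (title, link) VALUES ({0}, {0})".format(self.placeholder)
        cur = self.conn.cursor()
        cur.execute(sql, (item["title"], item["link"]))
        self.conn.commit()
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        if self.conn is not None:
            self.conn.close()
```

To enable it, register the class with ITEM_PIPELINES = {"myproject.pipelines.SQLPipeline": 300}. The module path here is hypothetical.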

posted @ 2017-05-10 17:29 Erick-LONG Views (1732) Comments (0) Recommendations (0)

Configuring scrapy crawl rules

Summary: rules = [ Rule(SgmlLinkExtractor(allow=('/u012150179/article/details'), restrict_xpaths=('//li[@class="next_article"]')), callback='parse_ite...
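SgmlLinkExtractor belongs to old Scrapy releases and no longer exists in current ones. Its replacement, scrapy.linkextractors.LinkExtractor, takes the same allow and restrict_xpaths arguments. The allow patterns are regular expressions searched for (not fully matched) in each absolute link URL. restrict_xpaths limits extraction to links inside the matching page regions. The sketch below mimics only the allow filter with the re module, so the behaviour can be checked offline. The example URLs are made up. The commented Rule shows the current-Scrapy spelling; its callback name is hypothetical because the excerpt cuts the real one off, and follow=True is an assumption.

```python
import re

# Current-Scrapy spelling of the excerpt's rule (not executed here):
#   from scrapy.linkextractors import LinkExtractor
#   from scrapy.spiders import Rule
#   rules = [
#       Rule(LinkExtractor(allow=(r"/u012150179/article/details",),
#                          restrict_xpaths=('//li[@class="next_article"]',)),
#            callback="parse_article",  # hypothetical; truncated in the excerpt
#            follow=True),              # assumed: keep following next-article links
#   ]

ALLOW = [re.compile(r"/u012150179/article/details")]


def allowed_links(urls, allow=ALLOW):
    """Keep URLs in which any allow pattern is found (re.search, not a full match)."""
    return [url for url in urls if any(p.search(url) for p in allow)]


candidates = [
    "http://example.com/u012150179/article/details/12345",
    "http://example.com/u012150179/article/list/2",
]
print(allowed_links(candidates))
```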

posted @ 2017-05-10 16:05 Erick-LONG Views (786) Comments (0) Recommendations (0)

Avoiding bans in Scrapy

Summary: a User-Agent (UA) pool
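A UA pool is typically wired into Scrapy as a downloader middleware. Its process_request method sets a random User-Agent header on each outgoing request. The sketch below uses two shortened placeholder UA strings, and the module path myproject.middlewares in the settings comment is hypothetical. The settings comment also disables Scrapy's built-in UserAgentMiddleware, a common precaution so that only one component manages the header.

```python
import random

# Placeholder UA strings; use complete, current browser User-Agents in practice
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]


class RandomUserAgentMiddleware:
    """Downloader middleware that assigns a random User-Agent to each request."""

    def __init__(self, user_agents=None, rng=None):
        self.user_agents = user_agents or USER_AGENTS
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        return None  # None lets Scrapy continue handling the request normally


# settings.py (the "myproject.middlewares" path is hypothetical):
# DOWNLOADER_MIDDLEWARES = {
#     "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
#     "myproject.middlewares.RandomUserAgentMiddleware": 400,
# }
```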

posted @ 2017-05-10 15:05 Erick-LONG Views (524) Comments (0) Recommendations (0)

Modifying the scrapy crawl source to run multiple spiders at once

Summary: put the modified file in the project directory and configure setting.py.

posted @ 2017-05-10 14:19 Erick-LONG Views (660) Comments (0) Recommendations (0)

scrapy csvfeed spider

Summary: class CsvspiderSpider(CSVFeedSpider): name = 'csvspider' allowed_domains = ['iqianyue.com'] start_urls = ['http://iqianyue.com/feed.csv'] headers = ['id', 'name', 'description', 'imag...
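CSVFeedSpider splits the downloaded CSV using its delimiter attribute (default ',') and its quotechar attribute (default '"'). It calls parse_row(response, row) once per line, where row is a dict keyed by the names in headers. The stdlib csv.DictReader performs the same row-to-dict mapping, so the sketch below uses it to show what parse_row would receive. The excerpt shows only the first three header names, so the list stops there, and the sample data is invented.

```python
import csv
import io

# First three column names from the excerpt; the rest of its list is cut off
HEADERS = ["id", "name", "description"]


def iter_rows(csv_text, headers=HEADERS, delimiter=",", quotechar='"'):
    """Yield one dict per CSV line, keyed by `headers`, like parse_row's `row`."""
    reader = csv.DictReader(
        io.StringIO(csv_text),
        fieldnames=headers,
        delimiter=delimiter,
        quotechar=quotechar,
    )
    for row in reader:
        yield dict(row)


sample = '1,apple,red fruit\n2,pear,"green, sweet"\n'
for row in iter_rows(sample):
    print(row["id"], row["name"], row["description"])
```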

posted @ 2017-05-10 13:51 Erick-LONG Views (320) Comments (0) Recommendations (0)

scrapy crawl xmlfeed spider

Summary: from scrapy.spiders import XMLFeedSpider from myxml.items import MyxmlItem class XmlspiderSpider(XMLFeedSpider): name = 'xmlspider' allowed_domains = ['sina.com.cn'] start_urls = ['http:...
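By default, XMLFeedSpider uses iterator = 'iternodes' and itertag = 'item'. It walks every <item> node in the feed and calls parse_node(response, node) for each one, usually pulling out child values with XPath. The sketch below shows that per-node loop with xml.etree.ElementTree from the standard library. The sample RSS text and the title/link fields are invented, not taken from the Sina feed.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; a real one would be the response body
SAMPLE_FEED = """<rss><channel>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""


def iter_nodes(xml_text, itertag="item"):
    """Yield a dict per <itertag> element, like one parse_node call per node."""
    root = ET.fromstring(xml_text)
    for node in root.iter(itertag):
        yield {"title": node.findtext("title"), "link": node.findtext("link")}


for entry in iter_nodes(SAMPLE_FEED):
    print(entry["title"], entry["link"])
```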

posted @ 2017-05-10 13:35 Erick-LONG Views (217) Comments (0) Recommendations (0)

Changing where a Scrapy crawl starts (start URLs)

Summary: import scrapy from Autopjt.items import myItem from scrapy.http import Request class AutospdSpider(scrapy.Spider): name = "fulong_spider" start_urls = ...
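One common way to control where a spider starts is to generate its start URLs from a page number instead of hard-coding start_urls. The sketch below uses a hypothetical URL template, because the excerpt cuts off before its own start_urls. The commented start_requests shows how a Scrapy spider could read the start page from a spider argument, passed as scrapy crawl fulong_spider -a start_page=5. Scrapy stores -a arguments as spider attributes, and as strings, which is why the comment calls int().

```python
# Hypothetical listing-URL template; the excerpt's own start_urls are cut off
URL_TEMPLATE = "http://example.com/list?page={}"


def page_urls(start_page, end_page, template=URL_TEMPLATE):
    """Yield listing URLs from start_page to end_page, inclusive."""
    for page in range(start_page, end_page + 1):
        yield template.format(page)


# Inside a scrapy.Spider subclass this could drive start_requests:
#   def start_requests(self):
#       start = int(getattr(self, "start_page", 1))  # from -a start_page=N
#       for url in page_urls(start, start + 9):
#           yield scrapy.Request(url, callback=self.parse)

print(list(page_urls(3, 5)))
```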

posted @ 2017-05-10 13:15 Erick-LONG Views (1692) Comments (0) Recommendations (0)

Crawling Dangdang (当当网) product categories with Scrapy

Summary: the pipeline part and the item part

posted @ 2017-05-10 13:01 Erick-LONG Views (558) Comments (0) Recommendations (0)

Sending POST requests in Python 3 with cookies remembered automatically

Summary: working with a Session
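With the requests library, requests.Session() stores any cookies a response sets, for example after a login POST. It then sends them on later requests made through the same session. The standard library provides the same mechanism through http.cookiejar.CookieJar combined with urllib.request.HTTPCookieProcessor. The sketch below uses that pairing, so it runs without third-party packages. make_session and post_form are hypothetical helper names.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar): cookies set by responses are stored in jar and resent."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def post_form(opener, url, fields, timeout=10):
    """POST form-encoded fields through the cookie-aware opener; return the body text."""
    data = urllib.parse.urlencode(fields).encode("utf-8")  # passing data makes it a POST
    with opener.open(url, data=data, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A login would then be a post_form(opener, login_url, {...}) call followed by opener.open(protected_url). The URLs and form field names depend on the target site.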

posted @ 2017-05-03 17:55 Erick-LONG Views (2500) Comments (0) Recommendations (0)

Douban (豆瓣) crawler

Summary: setting.py, main.py, items.py, dbbook.py

posted @ 2017-04-20 17:26 Erick-LONG Views (266) Comments (0) Recommendations (0)

Fang.com (房天下) crawler

Summary: #!/usr/bin/env python # -*- coding:utf-8 -*- import requests from bs4 import BeautifulSoup import pandas def gethousedetail(url): info = {} res = requests.get(url) ...

posted @ 2017-04-19 21:46 Erick-LONG Views (1033) Comments (0) Recommendations (0)

Sending bulk private messages on Weibo

Summary: import requests import json import time url = 'http://weibo.com/aj/message/add?ajwvr=6' headers = { 'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, ...

posted @ 2017-04-19 13:52 Erick-LONG Views (2447) Comments (0) Recommendations (0)

Selenium crawler

Summary: from selenium import webdriver import time driver = webdriver.PhantomJS(executable_path="D:/phantomjs/bin/phantomjs.exe") driver.get("http://study.163.com/course/courseMain.htm?course...

posted @ 2017-04-19 13:46 Erick-LONG Views (139) Comments (0) Recommendations (0)

Sending email automatically with Python

Summary: from email.header import Header from email.mime.text import MIMEText from email.utils import parseaddr,formataddr import smtplib from email.mime.multipart import MIMEMultipart from e...
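The imports in the excerpt (Header, MIMEText, MIMEMultipart, parseaddr/formataddr and smtplib) are the standard-library pieces for sending email with non-ASCII names and subjects. The sketch below builds such a message and sends it over SMTP with SSL. The addresses, host, port and credentials are placeholders, and build_message and send_via_ssl are hypothetical helpers. The test only builds the message and never connects to a server.

```python
import smtplib
from email.header import Header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formataddr


def build_message(sender_name, sender_addr, to_addr, subject, body):
    """Build a multipart message whose display name and subject may be non-ASCII."""
    msg = MIMEMultipart()
    # formataddr RFC 2047-encodes a non-ASCII display name (utf-8 by default)
    msg["From"] = formataddr((sender_name, sender_addr))
    msg["To"] = to_addr
    # Header(...).encode() turns the subject into an encoded word like =?utf-8?b?...?=
    msg["Subject"] = Header(subject, "utf-8").encode()
    msg.attach(MIMEText(body, "plain", "utf-8"))
    return msg


def send_via_ssl(msg, host, port, user, password):
    """Log in and send over SMTP-over-SSL (port 465 is a common choice)."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        server.send_message(msg)


message = build_message("张三", "sender@example.com", "receiver@example.com", "测试邮件", "你好")
print(message["Subject"])
```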

posted @ 2017-04-19 13:44 Erick-LONG Views (233) Comments (0) Recommendations (0)

Erick - LONG

Be Patient! Be Positive! Be Persistent!

