Crawlers - Post Category - 糖饼好吃

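Scrapy (1)

Summary: Installing scrapy on python3: http://www.cnblogs.com/Wananbo/p/6093969.html When running scrapy, a win32 error pops up. Running pip install pywin32 does not fix it; the package that is actually missing is pypiwin32.

posted @ 2017-03-21 11:40 糖饼好吃 Views(142) Comments(0) Recommendations(0)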

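Crawler notes

Summary: 1. In the Chrome browser, press F12, inspect the element you want, and choose Copy selector. The result looks roughly like this: #page_list > ul > li:nth-child(1) > a > img To use it with BeautifulSoup: imgs = soup.select('#page_list > ul > li:nt...

posted @ 2017-01-20 16:34 糖饼好吃 Views(135) Comments(0) Recommendations(0)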
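
As a rough sketch of the copy-selector workflow from the notes above: the HTML fragment and the helper name select_image_urls below are made up for illustration, and it assumes a reasonably recent bs4 (older releases did not support nth-child in select()). Only the selector string itself comes from the post.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, shaped so that the copied selector matches it.
HTML = """
<div id="page_list">
  <ul>
    <li><a href="/room/1"><img src="http://example.com/1.jpg"></a></li>
    <li><a href="/room/2"><img src="http://example.com/2.jpg"></a></li>
  </ul>
</div>
"""


def select_image_urls(html, selector):
    """Return the src attribute of every <img> matched by a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [img.get("src") for img in soup.select(selector)]


# The selector exactly as Chrome copies it: only the first <li> matches.
first_only = select_image_urls(HTML, "#page_list > ul > li:nth-child(1) > a > img")

# Dropping :nth-child(1) widens the match to every list item.
all_images = select_image_urls(HTML, "#page_list > ul > li > a > img")
```

The copied selector pins one element, so in practice the :nth-child(...) part usually has to be removed before the selector is useful for collecting a whole list.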

Crawler practice

Summary: 3. 4.

posted @ 2017-01-19 16:02 糖饼好吃 Views(322) Comments(0) Recommendations(0)

Crawler videos

Summary: Link: http://pan.baidu.com/s/1geHABar Password: 3tc6 Extraction password: 2cifang

posted @ 2017-01-17 14:42 糖饼好吃 Views(89) Comments(0) Recommendations(0)

Multithreading example:

Summary: # -*- coding: utf-8 -*- import requests import urllib import os import threading import datetime gImageList = [] gCondition = threading.Condition() cl...

posted @ 2017-01-17 14:41 糖饼好吃 Views(93) Comments(0) Recommendations(0)
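
The excerpt above stops right after creating gImageList and gCondition, so the following is only a guess at the general pattern: a producer/consumer pair sharing a list guarded by threading.Condition. The names image_queue, producer, consumer, run and the DONE sentinel are invented for this sketch, and the "download" is a stand-in string rather than a real network request.

```python
import threading

image_queue = []                    # shared list, playing the role of gImageList
condition = threading.Condition()   # guards image_queue, like gCondition
results = []
DONE = None                         # sentinel that tells the consumer to stop


def producer(urls):
    """Push each URL onto the shared list and wake the consumer."""
    for url in urls:
        with condition:
            image_queue.append(url)
            condition.notify()
    with condition:
        image_queue.append(DONE)
        condition.notify()


def consumer():
    """Wait until the list is non-empty, then take one item at a time."""
    while True:
        with condition:
            while not image_queue:
                condition.wait()
            item = image_queue.pop(0)
        if item is DONE:
            break
        results.append("downloaded " + item)  # stand-in for a real download


def run(urls):
    """Run one producer and one consumer thread and return what was processed."""
    results.clear()
    image_queue.clear()
    worker = threading.Thread(target=consumer)
    feeder = threading.Thread(target=producer, args=(urls,))
    worker.start()
    feeder.start()
    feeder.join()
    worker.join()
    return list(results)
```

The inner while loop around condition.wait() matters: a thread should re-check the list after waking up instead of assuming it is non-empty.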

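A simple crawler in Python (downloading the images on one page)

Summary: Reference: http://www.cnblogs.com/fnng/p/3576154.html 1. Fetch the whole page. First we fetch the full content of the page whose images we want to download. getjpg.py #coding=utf-8 import urllib def getHtml(url): page = urllib.u...

posted @ 2017-01-17 14:34 糖饼好吃 Views(407) Comments(0) Recommendations(0)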
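
The getHtml excerpt above is Python 2 (urllib.urlopen) and is cut off, so here is a minimal Python 3 sketch of the same idea. The regular expression, the function get_image_urls and the sample HTML are assumptions added for illustration; the post's own extraction step is not visible in the summary.

```python
import re
import urllib.request


def get_html(url):
    """Fetch a page and return it as text (Python 3 counterpart of urllib.urlopen)."""
    with urllib.request.urlopen(url) as page:
        return page.read().decode("utf-8", errors="replace")


# Assumed pattern: any src="..." attribute whose value ends in .jpg.
IMG_PATTERN = re.compile(r'src="([^"]+?\.jpg)"')


def get_image_urls(html):
    """Return every .jpg URL that appears in a src attribute."""
    return IMG_PATTERN.findall(html)


# Hypothetical HTML, so the extraction step can be tried without a network call.
sample = (
    '<img src="http://example.com/a.jpg"> '
    '<img src="http://example.com/b.png"> '
    '<img class="x" src="http://example.com/c.jpg">'
)
image_urls = get_image_urls(sample)
```

On a real page, get_image_urls(get_html(url)) would chain the two steps, and urllib.request.urlretrieve could then save each image.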

Crawling Taobao MM photos with pyspider

Summary: #!/usr/bin/env python # -*- encoding: utf-8 -*- # Created on 2016-12-09 15:24:54 # Project: taobaomm from pyspider.libs.base_handler import * PAGE_START = 1 PAGE_END = 30 DIR_PATH = 'D:\mzitu\...

posted @ 2016-12-09 16:37 糖饼好吃 Views(463) Comments(0) Recommendations(0)

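Simulating a Douban login with a crawler

Summary: Method 1: fetch the page, have it return the captcha, and type the captcha in yourself to complete the login (effectively a manual simulated login). Method 2: log in once beforehand, grab your login cookie, and paste it into the script (the cookie expires and only lasts a short while).

posted @ 2016-12-07 11:48 糖饼好吃 Views(505) Comments(0) Recommendations(0)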
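
Method 2 above (pasting a cookie copied from a logged-in browser) might look roughly like this with only the standard library. The cookie names and values, the URL and both helper functions are placeholders invented for the sketch; the post's actual code is not shown in the summary.

```python
from http.cookies import SimpleCookie
import urllib.request


def parse_cookie_string(raw):
    """Split a 'name=value; name2=value2' string copied from the browser into a dict."""
    cookie = SimpleCookie()
    cookie.load(raw)
    return {name: morsel.value for name, morsel in cookie.items()}


def build_logged_in_request(url, raw_cookie):
    """Build (but do not send) a request that carries the pasted login cookie."""
    return urllib.request.Request(
        url,
        headers={"Cookie": raw_cookie, "User-Agent": "Mozilla/5.0"},
    )


# Placeholder cookie string; a real one comes from the browser's developer tools.
RAW_COOKIE = "bid=abc123; ck=Qw3r"

cookies = parse_cookie_string(RAW_COOKIE)
request = build_logged_in_request("https://www.douban.com/", RAW_COOKIE)
```

Passing the request to urllib.request.urlopen would send it. Since the cookie expires, as the post notes, a failed request usually just means the string needs to be copied again.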

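pyspider example: crawling Douban movies

Summary: Copying the example from the official site as-is produces a 599 error. After a long search on Baidu, I found the cause is a certificate problem. Adding this line skips certificate validation: validate_cert = False The code is as follows: #!/usr/bin/e...

posted @ 2016-12-06 11:51 糖饼好吃 Views(425) Comments(0) Recommendations(0)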

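Improving the Qiubai (糗百) joke crawler

Summary: In the end, the first page downloads but the second page still never comes out. The frustrating part is that the first page works, which means the call to download() did execute successfully. My guess is that Qiubai has some additional anti-crawler measure; I'll come back and improve this later. Crawling the first 50 pages of jokes from "百思不得其姐":

posted @ 2016-11-24 17:06 糖饼好吃 Views(190) Comments(0) Recommendations(0)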

Rotating proxies and HTTP headers for your crawler

Summary: import requests import re import random import time class download(): def __init__(self): self.iplist = [] ## initialize a list to hold the IPs we fetch html = requests.get("h...

posted @ 2016-11-24 16:56 糖饼好吃 Views(473) Comments(0) Recommendations(0)
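
The download class above is cut off after it starts filling self.iplist, so this is only a sketch of the rotation idea, not the post's code. The User-Agent strings, the proxy addresses and the function build_request_kwargs are all placeholders. It deliberately stops short of calling requests, so it runs without a network.

```python
import random

# Placeholder values; a real crawler would load these from a maintained list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/602.1",
]
PROXY_IPS = ["10.0.0.1:8080", "10.0.0.2:3128"]


def build_request_kwargs(rng=random):
    """Pick a random User-Agent and proxy and return keyword arguments for requests.get."""
    user_agent = rng.choice(USER_AGENTS)
    proxy_ip = rng.choice(PROXY_IPS)
    return {
        "headers": {"User-Agent": user_agent},
        "proxies": {"http": "http://" + proxy_ip},
        "timeout": 3,
    }


kwargs = build_request_kwargs(random.Random(0))
```

With requests installed, requests.get(url, **build_request_kwargs()) would send each request with a freshly chosen header and proxy.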

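My first crawler (scraping jokes from Qiubai)

Summary: Pitfalls I ran into while writing something this simple: os.chdir("D:\mzitu") f = open("111.txt", 'a') At first the line below read D:\mzitu.111.txt; I kept saving over and over and never saw any data, which drove me up the wall. Also, printing the text content directly came out garbled; calling type() on it showed that it was unicode,...

posted @ 2016-11-24 15:33 糖饼好吃 Views(261) Comments(0) Recommendations(0)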
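
Both pitfalls above (a mistyped file path and garbled text) can be sidestepped in Python 3 by building the path with os.path.join and opening the file with an explicit encoding. This is a sketch under those assumptions, not the post's fix: append_text is a made-up helper, and a temporary directory stands in for D:\mzitu so the example runs anywhere.

```python
import os
import tempfile


def append_text(directory, filename, text):
    """Append one line of text to directory/filename as UTF-8 and return the full path."""
    path = os.path.join(directory, filename)  # avoids typos like 'D:\\mzitu.111.txt'
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
    return path


# A temporary directory stands in for the post's D:\mzitu folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = append_text(tmp_dir, "111.txt", "糗百段子")
    with open(saved_path, encoding="utf-8") as f:
        saved_text = f.read()
```

For a literal Windows path, a raw string such as r"D:\mzitu" is the safer spelling, since backslashes in ordinary strings can be read as escape sequences.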
