Web Scraping - Post Category - 夜晚的潜水艇

Summary: 1. Add MongoDB's bin directory to the PATH environment variable. 2. Open cmd and run mongod to start the MongoDB server. 3. Run mongo to start the MongoDB client.
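As a quick check on step 1, the following minimal sketch uses only the Python standard library to see whether the `mongod` and `mongo` executables named in the summary can be found on PATH. The helper name `find_mongo_tools` is hypothetical and not part of the original post. It only looks the executables up and does not start the server or the client.

```python
import shutil


def find_mongo_tools(names=("mongod", "mongo")):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_mongo_tools().items():
        # A missing entry usually means MongoDB's bin directory is not on PATH yet.
        status = path if path else "not found on PATH (add MongoDB's bin directory)"
        print(f"{name}: {status}")
```

If both names resolve to a path, steps 2 and 3 can be run from any cmd window.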

posted @ 2019-06-03 08:32 夜晚的潜水艇 Views(3032) Comments(0) Recommendations(0)

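Summary: selenium: an automated testing tool that can also be used for web scraping. It drives a browser and carries out predefined tasks. It can execute JS code, but it runs slowly and is inefficient, so it is generally used for login authentication. Basic selectors: find_element_by_id() # find an element by id find_element_by_class_name()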
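The `find_element_by_*` helpers listed above come from older Selenium releases. They were deprecated in Selenium 4 and later removed in favor of `find_element(By.ID, ...)` style calls. The sketch below shows that newer form. The function name `locate_elements` and its arguments are hypothetical, and the fallback strings exist only so the sketch can be loaded without Selenium installed.

```python
try:
    from selenium.webdriver.common.by import By
    BY_ID, BY_CLASS_NAME = By.ID, By.CLASS_NAME
except ImportError:
    # Selenium is not installed; these are the plain locator strings that
    # By.ID and By.CLASS_NAME stand for.
    BY_ID, BY_CLASS_NAME = "id", "class name"


def locate_elements(driver, element_id, class_name):
    """Look up one element by id and one by class name on the current page."""
    by_id = driver.find_element(BY_ID, element_id)             # was find_element_by_id()
    by_class = driver.find_element(BY_CLASS_NAME, class_name)  # was find_element_by_class_name()
    return by_id, by_class
```

With a real browser session, `driver` would be something like `webdriver.Chrome()` after a call to `driver.get(...)`. The id and class name to pass depend on the page being scraped.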

posted @ 2019-06-01 21:00 夜晚的潜水艇 Views(189) Comments(0) Recommendations(0)

Using beautifulsoup4

Summary: Beautiful: - Basic usage from bs4 import BeautifulSoup Parsing library: BeautifulSoup4 - Installation: - Install the parsing library pip3 install beautifulsoup4 - Install the parser pip3 install lxml - Basic usage - Import the module

posted @ 2019-06-01 20:58 夜晚的潜水艇 Views(201) Comments(0) Recommendations(0)

A Brief Look at Installing and Using the Scrapy Framework

Summary: Scrapy notes: 1. Installation: pip3 install wheel pip3 install lxml pip3 install pyopenssl pip3 install -i https://mirrors.aliyun.com/pypi/simple/ pypiwin32 Download the file (twi

posted @ 2019-06-01 20:57 夜晚的潜水艇 Views(173) Comments(0) Recommendations(0)

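Automatic Login, Upvoting, and Commenting on 抽屉网 (Chouti)

Summary: import requests from bs4 import BeautifulSoup # First send a request to the home page to get the ids of the users to upvote headers = { "user-agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537

posted @ 2019-05-30 16:28 夜晚的潜水艇 Views(324) Comments(0) Recommendations(0)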

Cracking the 博客园 (cnblogs) Slider CAPTCHA

Summary: from selenium import webdriver from selenium.webdriver import ChromeOptions from selenium.webdriver import ActionChains from selenium.webdriver.common.keys import Keys import random from PIL import I...

posted @ 2019-05-29 20:50 夜晚的潜水艇 Views(403) Comments(2) Recommendations(2)

Scraping Product Information from 某东 (a Certain E-commerce Site)

Summary: from selenium import webdriver from selenium.webdriver import ChromeOptions from selenium.webdriver import ActionChains from selenium.webdriver.common

posted @ 2019-05-29 18:29 夜晚的潜水艇 Views(171) Comments(0) Recommendations(0)

Scraping Videos from 校花网 (xiaohuar.com) with an Asynchronous Thread Pool

Summary: import re import requests response = requests.get("http://www.xiaohuar.com/v/") url_s = re.findall('.*?href="(.*?)"',response.text,re.S) for url in url_s: res = requests.get(url) result =...
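The truncated snippet above fetches each link one at a time. As a rough illustration of the thread-pool approach named in the title, the sketch below extracts `href` values with a regular expression and hands each URL to a `concurrent.futures.ThreadPoolExecutor`, collecting results through done-callbacks. This is a sketch under stated assumptions, not the author's code: `extract_links`, `fetch_all`, and the `fetch` parameter are hypothetical names, and `fetch` stands in for something like `lambda url: requests.get(url).text`.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Non-greedy capture of every href value, in the spirit of the pattern in the summary.
HREF_PATTERN = re.compile(r'href="(.*?)"', re.S)


def extract_links(html):
    """Return every href value found in the HTML text, in document order."""
    return HREF_PATTERN.findall(html)


def fetch_all(urls, fetch, max_workers=4):
    """Submit one fetch task per URL; callbacks store each result as it completes."""
    results = {}

    def make_callback(url):
        def on_done(future):
            # Runs once the task for this URL has finished.
            results[url] = future.result()
        return on_done

    # Leaving the with-block waits for all submitted tasks to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url in urls:
            pool.submit(fetch, url).add_done_callback(make_callback(url))
    return results
```

Passing the fetch function in as a parameter keeps the pool logic separate from the network call, so the same skeleton can be tried out on an inline HTML string before pointing it at a live site.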

posted @ 2019-05-27 20:22 夜晚的潜水艇 Views(178) Comments(0) Recommendations(0)

Web Scraping: Crawling a Small Job-Search Site

Summary: import requests from bs4 import BeautifulSoup

posted @ 2019-05-07 11:12 夜晚的潜水艇 Views(522) Comments(0) Recommendations(0)
