Post Category - Web Crawlers
Abstract: ① cd into the directory where the project should be created ② scrapy startproject qsbk (project name); create the spider with scrapy genspider -t crawl wxapp_spider (spider name) "http://www.wxapp-union.com/" (domain) ③ open the project in PyCharm
Abstract: # 2 - Quick Start ## Installation and documentation: 1. Install: just run `pip install scrapy`. 2. Official Scrapy documentation: http://doc.scrapy.org/en/latest 3. Scrapy documentation in Chinese: http://scrapy-chs.readthedocs.io/zh_C
Abstract: 1. Install the scrapy framework: pip install scrapy 2. Create the project from a cmd window: ① cd into the directory where the project should be created ② scrapy startproject qsbk (project name); create the spider with scrapy genspider qsbk_sqider ③ open the newly created project in PyCharm ④ edit set
Abstract: 1.tesseract import pytesseract from PIL import Image pytesseract.pytesseract.tesseract_cmd=r"H:\Python\Tesseract_dev20170510\Tesseract-OCR\tesseract.e
Abstract: 1. Scraping Lagou's ajax data with cookies, the normal way import requests from lxml import etree import time import re headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) A
Abstract: 1. Getting cookie info from selenium import webdriver driver=webdriver.Firefox() driver.get("https://www.baidu.com") for cookie in driver.get_cookies(): print(c
Abstract: from selenium import webdriver from selenium.webdriver.common.by import By # put the downloaded driver in the Firefox root directory # once the environment variable is set it can be referenced driver=webdriver.Firefox() driver.get("htt
Abstract: import threading import random import time gMoney = 1000 gLock = threading.Lock() gTotalTimes = 10 gTimes = 0 class Producer(threading.Thread): def ru
Abstract: import requests from lxml import etree from urllib import request import os from queue import Queue import threading class Procuder(threading.Thread):
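This post pairs producer and consumer threads through queue.Queue for downloading images. The same wiring can be shown without the network part, using a sentinel value to stop the consumers — the function names and items here are made up for the sketch:

```python
import threading
from queue import Queue

SENTINEL = None  # marker telling consumers to stop


def producer(task_queue, items):
    """Put work items (stand-ins for image URLs) on the queue."""
    for item in items:
        task_queue.put(item)


def consumer(task_queue, results, lock):
    """Take items off the queue until the sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is SENTINEL:
            break
        with lock:                    # the results list is shared
            results.append(item.upper())


def run(items, workers=3):
    q = Queue()
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=consumer, args=(q, results, lock))
               for _ in range(workers)]
    for t in threads:
        t.start()
    producer(q, items)
    for _ in threads:                 # one sentinel per consumer
        q.put(SENTINEL)
    for t in threads:
        t.join()
    return results
```

queue.Queue handles its own locking, so the explicit gLock from the previous demo is only needed for the shared results list.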
Abstract: 1.dump import json persons=[ { 'username':"wangchenyang", 'age':14, 'country':"china" }, { 'username':"王晨阳", 'age':14, 'country':"china" } ] # json_st
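The abstract demonstrates json.dumps/json.dump on a list with a Chinese username. One detail worth calling out is ensure_ascii=False, without which non-ASCII characters are written as \u escapes:

```python
import json

persons = [
    {"username": "wangchenyang", "age": 14, "country": "china"},
    {"username": "王晨阳", "age": 14, "country": "china"},
]

# dumps -> str; ensure_ascii=False keeps non-ASCII characters readable
json_str = json.dumps(persons, ensure_ascii=False)

# loads reverses the conversion back into Python objects
restored = json.loads(json_str)
```

For writing to a file, json.dump(persons, fp, ensure_ascii=False) works the same way, provided the file is opened with a UTF-8 encoding.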
Abstract: 1. Reading import csv def read_csv_demo1(): with open('stock.csv','r') as fp: # reader is an iterator reader=csv.reader(fp) next(reader) for x in reader: name=x[3] v
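The snippet reads stock.csv with csv.reader and skips the header row with next(). The same idea can be exercised against an in-memory file; the column layout below is invented, since the real stock.csv is not shown:

```python
import csv
import io

# Stand-in for stock.csv: a header row plus data rows (columns are assumed).
data = "code,name,price\n600000,BankA,10.5\n600001,BankB,8.2\n"


def read_names(fp):
    reader = csv.reader(fp)   # reader is an iterator over rows (lists of str)
    next(reader)              # skip the header row
    return [row[1] for row in reader]


names = read_names(io.StringIO(data))
```

csv.DictReader avoids the next() call and the numeric indexing by keying each row on the header names instead.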
Abstract: import requests import re from lxml import etree headers={ "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) C
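The abstract is cut off before the actual extraction, so here is a sketch of the regex half of such a requests+re scraper, run against a static snippet so no network is needed — the HTML and the pattern are invented for illustration:

```python
import re

# Stand-in for a downloaded page (what resp.text from requests would hold).
html = ('<div class="title"><a href="/m/1">Movie One</a></div>'
        '<div class="title"><a href="/m/2">Movie Two</a></div>')

# Non-greedy groups capture the href and the link text;
# re.DOTALL lets . match newlines in real multi-line pages.
titles = re.findall(r'<a href="(.*?)">(.*?)</a>', html, re.DOTALL)
```

With two capture groups, re.findall returns a list of (href, text) tuples, one per link.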
Abstract: from bs4 import BeautifulSoup import requests from pyecharts import Bar headers={ "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.3
Abstract: from bs4 import BeautifulSoup html=""" <html> <head> <title>Table tag tutorial</title> <meta charset="UTF-8"/> <pre> Table tag tutorial: table: declares a table tr: declares a row, sets the row height and the height of every cell in that row
Abstract: from bs4 import BeautifulSoup html=""" <html> <head> <title>Table tag tutorial</title> <meta charset="UTF-8"/> <pre> Table tag tutorial: table: declares a table tr: declares a row, sets the row height and the height of every cell in that row
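The two abstracts above both feed a table-tutorial HTML string to BeautifulSoup. The usual calls look like this; the markup is a trimmed stand-in for the post's longer document:

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>Table tag tutorial</title></head>
<body>
<table>
  <tr><th>name</th><th>price</th></tr>
  <tr><td>apple</td><td>3</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")  # stdlib parser; "lxml" also works
title = soup.title.string                  # text of the <title> tag
cells = [td.get_text() for td in soup.find_all("td")]  # all data cells
```

find_all returns every matching tag, while find returns only the first; both accept attrs/class_ filters for narrower queries.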
Abstract: import requests from lxml import etree url_domain="https://www.dytt8.net" headers={ "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537
Abstract: import requests from lxml import etree url="https://www.piaohua.com/" headers={ "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36
Abstract: from lxml import etree parser=etree.HTMLParser(encoding="utf-8") html=etree.parse("test.html",parser=parser) html2=etree.parse("lagou.html",parser=par
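etree.parse above reads from a file on disk; etree.HTML parses a string directly, after which xpath() does the querying. A minimal sketch with invented markup:

```python
from lxml import etree

text = "<html><body><ul><li>one</li><li>two</li></ul></body></html>"

html = etree.HTML(text)            # parse from a string instead of a file
items = html.xpath("//li/text()")  # xpath always returns a list
```

The text() step extracts the text nodes; dropping it (//li) would return Element objects whose .text attribute holds the same strings.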
Abstract: from lxml import etree text=""" <html> <head> <title>Table tag tutorial</title> <meta charset="UTF-8"/> <pre> Table tag tutorial: table: declares a table tr: declares a row, sets the row height and the height of every cell in that row. th: declares
Abstract: import requests # 1. Get cookies resp=requests.get("http://www.baidu.com") print(resp.cookies.get_dict()) # 2. session dapeng_url="http://www.renren.com/88015124
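The code above covers two ideas: reading resp.cookies off a single response, and reusing a requests.Session so cookies persist across requests (the post's renren login). A no-network sketch of the session setup — the header and cookie values are placeholders:

```python
import requests

# A Session keeps cookies and default headers across all requests made on it.
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0"})

# Cookies set by any response through this session accumulate in its jar;
# they can also be set manually, as done here for the sketch.
session.cookies.set("token", "abc123")
```

After a real session.post(login_url, data=...) the server's session cookie lands in session.cookies automatically, so subsequent session.get calls are authenticated.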
