Python script learning - Post category - 行之间

Python crawler study notes 3

Summary: Crawler notes 3. Setting up logging: import logging # set the log output format logging.basicConfig(level=logging.DEBUG,format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message Read more

posted @ 2021-03-27 21:38 行之间 Views(716) Comments(0) Recommendations(0)
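
A minimal sketch of the logging setup that summary describes. The format string in the summary is cut off after `%(message`, so the version below assumes it simply ends with `%(message)s`. The logger name `crawler_demo`, the helper `build_logger`, and the in-memory stream are hypothetical and exist only so the output can be inspected.

```python
import io
import logging

# Assumption: the truncated format string ends with "%(message)s".
LOG_FORMAT = "%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s"


def build_logger(stream):
    """Return a DEBUG-level logger that writes LOG_FORMAT records to `stream`."""
    logger = logging.getLogger("crawler_demo")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep demo records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.debug("request sent")
log.warning("retrying")
output = buffer.getvalue()
print(output, end="")
```

In a plain script, the one-line `logging.basicConfig(level=logging.DEBUG, format=LOG_FORMAT)` from the summary configures the root logger with the same level and format.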

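Python crawler Scrapy framework study notes 2

Summary: Scrapy framework course outline: 1. Basic concepts of Scrapy 2. Scrapy's workflow 3. Getting started with Scrapy 4. Scrapy in depth 5. Using CrawlSpider. Why learn Scrapy? requests + selenium can cover 90% of needs; Scrapy cannot cover the remaining 10%, but it Read more

posted @ 2021-03-12 18:42 行之间 Views(586) Comments(0) Recommendations(0)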

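Python crawler study notes 1

Summary: Crawler concept: simulate a browser, send requests, and get the responses. Uses: data collection, software testing, ticket grabbing, voting on websites, network security (vulnerability scanning). Classification: by the number of sites crawled: general crawlers (search engines) and focused crawlers (which target one site or one class of sites); by whether the goal is to obtain data: functional crawlers (voting, liking) and incremental-data crawlers (for example, collecting job postings); incremental-data crawlers can in turn be divided into Read more

posted @ 2021-03-10 15:31 行之间 Views(1098) Comments(0) Recommendations(0)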

Python string concatenation methods, from fastest to slowest

Summary: s1 = 'Hello' s2 = 'Python' f'{s1} {s2}' # fast, f-string s1 + ' ' + s2 ' '.join([s1, s2]) '%s %s' % (s1, s2) '{} {}'.format(s1, s2) Template('$s1 $s2').sub Read more

posted @ 2021-03-09 22:11 行之间 Views(174) Comments(0) Recommendations(0)
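
One way to check the ordering claimed in that title on your own machine is to time each method with `timeit`. This is only a sketch: the summary's last call is cut off at `Template('$s1 $s2').sub`, and the code assumes it was `substitute(...)`; the dictionary keys and the repetition count are arbitrary choices, and the measured order may differ between Python versions.

```python
import timeit
from string import Template

s1 = "Hello"
s2 = "Python"

# Each entry builds the same "Hello Python" string in a different way.
methods = {
    "f-string": lambda: f"{s1} {s2}",
    "plus": lambda: s1 + " " + s2,
    "join": lambda: " ".join([s1, s2]),  # join takes a single iterable
    "percent": lambda: "%s %s" % (s1, s2),
    "format": lambda: "{} {}".format(s1, s2),
    # Assumption: the truncated call in the summary is substitute().
    "Template": lambda: Template("$s1 $s2").substitute(s1=s1, s2=s2),
}

# Sanity check first: every method must produce the same string.
results = {name: fn() for name, fn in methods.items()}

# Time 20,000 calls of each method and print them fastest first.
timings = {name: timeit.timeit(fn, number=20_000) for name, fn in methods.items()}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:10s} {seconds:.4f}s")
```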

Simulating a GitHub login with a POST request in Scrapy

Summary: class GithubSpider(scrapy.Spider): name = 'github' allowed_domains = ['github.com'] start_urls = ['https://github.com/login'] def parse(self, response Read more

posted @ 2021-03-09 21:20 行之间 Views(124) Comments(0) Recommendations(0)

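The MD5 hash algorithm

Summary: import hashlib A hash algorithm in effect gives a specified string a practically unique identifier. data = 'python38' Create the hash object: md5 = hashlib.md5() Add the string to be hashed to the hash object: md5.update(data.encode()) Get the string's hash value: r Read more

posted @ 2021-03-08 20:56 行之间 Views(467) Comments(0) Recommendations(0)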
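
The steps listed in that summary can be sketched as a small helper. The summary is cut off right after `r`, so reading the result with `hexdigest()` is an assumption about what the post does next, and `md5_hex` is a hypothetical name used only here.

```python
import hashlib


def md5_hex(text, encoding="utf-8"):
    """Return the MD5 digest of `text` as a 32-character hex string."""
    md5 = hashlib.md5()                # create the hash object
    md5.update(text.encode(encoding))  # feed it the bytes to hash
    return md5.hexdigest()             # assumption: the post reads the hex digest


digest = md5_hex("python38")
print(digest)
```

The same input always yields the same digest, which is what makes it usable as an identifier. MD5 collisions are known, though, so it should not be relied on where security matters.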

Common User-Agent request headers for Python crawlers

Summary: USER_AGENTS = ['Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:21.0) Gecko/20130331 Firefox/21.0', 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML Read more

posted @ 2021-03-08 19:43 行之间 Views(603) Comments(0) Recommendations(0)
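
A list like that one is typically used by picking one entry at random for each request. The sketch below assumes that usage. Only the first User-Agent string comes from the summary; the second is a made-up placeholder, because the remaining entries are truncated, and `random_headers` is a hypothetical helper name.

```python
import random

USER_AGENTS = [
    # Taken from the post's summary.
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:21.0) Gecko/20130331 Firefox/21.0",
    # Hypothetical placeholder: the other entries in the post are truncated.
    "ExampleCrawler/1.0 (placeholder user agent)",
]


def random_headers(rng=random):
    """Build a headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


headers = random_headers()
print(headers)
```

With the `requests` library, the resulting dict can be passed as `requests.get(url, headers=headers)`.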

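Installing the selenium package in Python from the Tsinghua mirror to work around slow plain pip install

Summary: pip install --index https://mirrors.ustc.edu.cn/pypi/web/simple/ selenium # labeled as the Tsinghua mirror in the original, but this URL is the USTC mirror Read more

posted @ 2021-03-06 13:01 行之间 Views(9962) Comments(0) Recommendations(0)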

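Extracting high-frequency words for data analysis in Python, and building and saving a word cloud with pyecharts

Summary: import pandas as pd import jieba import jieba.analyse filename = "E:\\数据处理\\隐患类型.txt" # load the data df_data = pd.read_csv(filename, header=0, encoding='gbk', Read more

posted @ 2020-06-11 10:40 行之间 Views(3974) Comments(0) Recommendations(0)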

Drawing maps with Map in pyecharts V1.x: changing the theme, background color, and more

Summary: # -*- coding: utf-8 -*- """ @author: Dell Created on Mon Feb 3 11:22:25 2020 """ from pyecharts.charts import Map from pyecharts import options as opts from pyecharts.globals import ThemeType # theme # ... Read more

posted @ 2020-02-03 12:42 行之间 Views(7742) Comments(0) Recommendations(0)

Setting random request headers and using a proxy

Summary: # -*- coding: utf-8 -*- """ URL listing all USER_AGENTS request headers: http://www.useragentstring.com/pages/useragentstring.php?name=All """ import json import random import requests USER_AGENTS = [ 'Mozilla/5.0 ( Read more

posted @ 2020-01-04 17:01 行之间 Views(1186) Comments(0) Recommendations(0)

Downloading BMW 5 Series images by category, creating folders automatically and saving

Summary: import os import requests from lxml import etree from urllib import request headers = { 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en', Read more

posted @ 2020-01-04 13:42 行之间 Views(233) Comments(0) Recommendations(0)

Working with video in OpenCV using Python

Summary: # -*- coding: utf-8 -*- """ @author: Dell Created on Fri Jan 3 13:00:41 2020 Installing opencv-python: pip install --default-timeout=1000 opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple/ Installing openc... Read more

posted @ 2020-01-03 17:03 行之间 Views(275) Comments(0) Recommendations(0)

Crawling Shanghai High Court hearing data with a coroutine framework and a headless browser

Summary: # -*- coding: utf-8 -*- """ @author: Dell Created on Thu Jan 2 11:16:08 2020 """ import gevent from gevent import monkey monkey.patch_all() from lxml import etree from selenium import webdri Read more

posted @ 2020-01-02 13:02 行之间 Views(201) Comments(0) Recommendations(0)

Coroutine framework

Summary: import requests import gevent from gevent import monkey monkey.patch_all() headers = { 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en', ' Read more

posted @ 2020-01-01 23:03 行之间 Views(289) Comments(0) Recommendations(0)

Multithreaded email address scraping

Summary: # -*- coding: utf-8 -*- """ @author: Dell Created on Sun Dec 29 17:26:43 2019 """ import re import time import queue import threading import requests def getpagesource(url): """Fetch the page source""" try: ... Read more

posted @ 2019-12-29 21:56 行之间 Views(284) Comments(0) Recommendations(0)
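
The post's code is truncated after `getpagesource`, so the sketch below is not the author's implementation. It only illustrates the general pattern suggested by the imports (`re`, `queue`, `threading`): worker threads pull page sources from a queue and collect email addresses with a regular expression. The pages are hypothetical in-memory strings standing in for downloaded HTML, and `EMAIL_RE`, `worker`, and `extract_emails` are illustrative names.

```python
import queue
import re
import threading

# A deliberately simple email pattern; real-world addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical page sources standing in for downloaded HTML.
PAGES = [
    "<p>Contact: alice@example.com</p>",
    "<p>bob@example.org or alice@example.com</p>",
    "<p>no address here</p>",
]


def worker(tasks, found, lock):
    """Pull pages off the queue until it is empty and record any emails."""
    while True:
        try:
            page = tasks.get_nowait()
        except queue.Empty:
            return
        emails = EMAIL_RE.findall(page)
        with lock:  # the result set is shared between threads
            found.update(emails)


def extract_emails(pages, num_threads=3):
    """Extract the unique email addresses from `pages` using worker threads."""
    tasks = queue.Queue()
    for page in pages:
        tasks.put(page)
    found, lock = set(), threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, found, lock))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(found)


emails = extract_emails(PAGES)
print(emails)
```

In a real crawler the queue would hold URLs rather than page text, and each worker would download the page before applying the pattern.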

Handling dropdowns and page alerts with selenium

Summary: import time from selenium import webdriver from selenium.webdriver.support.select import Select # handle dropdowns from selenium.webdriver.support.ui import WebDriverWait # wait for an element to finish loading from selenium.webdriv Read more

posted @ 2019-12-24 20:35 行之间 Views(373) Comments(0) Recommendations(0)

Summary: # -*- coding: utf-8 -*- """ @author: Dell Created on Tue Dec 24 12:33:56 2019 """ import time from selenium import webdriver from selenium.webdriver.support.ui import WebDriverWait # wait for an element to finish loading Read more

posted @ 2019-12-24 12:53 行之间 Views(1301) Comments(0) Recommendations(0)

Scraping Python job postings from Tencent Careers

Summary: # -*- coding: utf-8 -*- """ @author: Dell Created on Mon Dec 23 17:55:06 2019 """ import re import time import requests from lxml import etree from se Read more

posted @ 2019-12-23 20:11 行之间 Views(404) Comments(0) Recommendations(0)

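Crawler study notes, compilation 1

Summary: Tips: whichever page you crawl, you can add request header information. Using a proxy with requests: import requests url = "http://httpbin.org/ip" # visiting this address returns the visitor's IP address proxies = {'http':'119.39.68.252:8118'} resp = r Read more

posted @ 2019-12-20 22:05 行之间 Views(500) Comments(0) Recommendations(0)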
