Post archive for April 5, 2019 - 就俗人一个

April 5, 2019

Summary: import re import json import requests from multiprocessing import Pool from requests.exceptions import RequestException def get_one_page(url): """ Fetch a single ...

posted @ 2019-04-05 14:19 就俗人一个

Web Scraping: The Selenium Library

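Summary: Documentation: https://selenium-python.readthedocs.io/ Selenium is a browser automation and testing tool that supports multiple browsers. In web scraping it is mainly used to handle pages rendered by JavaScript. 1. Getting started. Basic usage: from selenium import webdriver from selenium ...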

posted @ 2019-04-05 09:17 就俗人一个
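
To illustrate the Selenium entry above (driving a real browser so that the HTML is read after JavaScript has run), here is a minimal sketch. It is not taken from the post: the names `fetch_rendered_html` and `main` and the example URL are assumptions, and `main` assumes Chrome and a matching driver are available on the local machine.

```python
def fetch_rendered_html(driver, url):
    """Load `url` in an already-started WebDriver and return the rendered HTML."""
    driver.get(url)            # navigate and wait for the page load event
    return driver.page_source  # DOM serialized after scripts have run


def main():
    # Imported here so the helper above can be reused (or tested) without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are installed
    try:
        html = fetch_rendered_html(driver, "https://www.example.com")
        print(html[:200])
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `main()` opens a browser window, prints the first 200 characters of the rendered page, and then closes the browser.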

Web Scraping: The pyquery Library

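Summary: Official documentation: https://pyquery.readthedocs.io/en/latest/ PyQuery is a powerful and flexible HTML parsing library. If you find regular expressions too tedious to write and BeautifulSoup's syntax too hard to remember, and you already know jQuery syntax, PyQuery is an excellent choice. 1. Getting started. Initializing from a string: URL ...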

posted @ 2019-04-05 07:53 就俗人一个

Web Scraping: The BeautifulSoup Library

Summary: Documentation: https://beautifulsoup.readthedocs.io/zh_CN/latest/ 1. Getting started. Basic usage of the parsing library: html = """ <html> <head> <title>The Dormouse's story</title> </head> <body> <p cla ...

posted @ 2019-04-05 06:15 就俗人一个
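
For the BeautifulSoup entry above, basic usage might look like the sketch below. The summary's own HTML sample is cut off, so the document here is a separate, made-up sample that only borrows the title text. It uses the standard-library `html.parser` backend so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the post's own snippet is truncated and is not reproduced here.
html = """
<html>
  <head><title>The Dormouse's story</title></head>
  <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <a href="http://example.com/elsie" id="link1">Elsie</a>
    <a href="http://example.com/lacie" id="link2">Lacie</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.title.string                               # text inside <title>
heading = soup.find("p", class_="title").get_text()     # first <p class="title">
links = [a["href"] for a in soup.find_all("a")]         # every href in document order

print(title)
print(heading)
print(links)
```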

Web Scraping: The Requests Library

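Summary: Official documentation: http://cn.python-requests.org/zh_CN/latest/ 1. Introduction: the various request methods. 2. Requests. GET requests: basic usage, GET with parameters, parsing JSON, fetching binary data, adding request headers. POST requests: basic usage, adding request headers. 3. Responses: response attributes, checking the status code ...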

posted @ 2019-04-05 00:18 就俗人一个
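
Several topics listed in the Requests entry above (GET with parameters, adding request headers, checking the status code, parsing JSON) can be sketched as follows. The URL, parameter values, User-Agent string, and the helper name `fetch_json` are assumptions chosen for illustration. The first part only prepares a request and sends nothing over the network.

```python
import requests

# Build a GET request with query parameters and a custom header, without sending it.
request = requests.Request(
    "GET",
    "https://httpbin.org/get",                 # assumed example endpoint
    params={"name": "alice", "age": 22},
    headers={"User-Agent": "my-crawler/0.1"},  # hypothetical User-Agent
)
prepared = request.prepare()
print(prepared.url)                    # query string is encoded into the URL
print(prepared.headers["User-Agent"])


def fetch_json(url, **kwargs):
    """Send a GET request; return parsed JSON, or None on a non-200 status."""
    response = requests.get(url, timeout=10, **kwargs)
    if response.status_code != requests.codes.ok:  # requests.codes.ok == 200
        return None
    return response.json()
```

A call such as `fetch_json("https://httpbin.org/get", params={"name": "alice"})` does perform a real request, so it needs network access. Binary content (for example an image) is available from `response.content` rather than `response.json()`.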
