Python Crawler - Post Category - RomanticChopin

Python crawler: the BeautifulSoup find_all() and find() methods explained

Summary: The official definitions of find() and findAll() are as follows: findAll(tag, attributes, recursive, text, limit, keywords) find(tag, attributes, recursive, text, keywords) The only difference: *find() returns the… Read more

posted @ 2019-08-08 12:57 RomanticChopin Views (5533) Comments (0) Recommendations (0)
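
As a minimal sketch of the difference this entry describes: find() returns a single element (or None), while find_all() returns a list-like collection and accepts a limit. The HTML snippet and variable names here are made up for illustration. Note that current Beautiful Soup 4 documents the parameters as find_all(name, attrs, recursive, string, limit, **kwargs), which differs slightly from the names quoted in the summary.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original post.
html = """
<ul>
  <li class="item">first</li>
  <li class="item">second</li>
  <li class="other">third</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
first_item = soup.find("li", class_="item")
missing = soup.find("table")

# find_all() returns every match as a list-like ResultSet.
all_items = soup.find_all("li", class_="item")

# limit stops the search after the given number of matches.
limited = soup.find_all("li", limit=1)

print(first_item.get_text())
print([li.get_text() for li in all_items])
print(len(limited), missing)
```

In this sketch, find_all(..., limit=1) yields a one-element list holding the same tag that find() returns directly.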

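Python crawler in practice: scraping stock information for all stocks listed on the Shanghai and Shenzhen stock exchanges

Summary: Two websites are needed: 1. A page listing the names of all stocks (here, those on the Shanghai and Shenzhen exchanges): https://www.banban.cn/gupiao/list_sz.html 2. A page with the details of a single stock: https://gupiao.baidu.com/stock/<stock name>.html ''' Two websites are needed: 1 … Read more

posted @ 2019-08-07 20:41 RomanticChopin Views (1415) Comments (0) Recommendations (0)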
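
The two-step flow in this entry (collect identifiers from a list page, then build one detail URL per stock) might be sketched as follows. This is only an illustration under assumptions: the summary is truncated, so the sample markup, the sh/sz code pattern, and the function names are hypothetical and not the author's code. No network request is made.

```python
import re

# Detail-page pattern from the summary, with the placeholder filled by a code.
# Using a code such as "sz000001" as the identifier is an assumption.
DETAIL_URL_TEMPLATE = "https://gupiao.baidu.com/stock/{code}.html"

# Hypothetical list-page fragment; the real page layout is not shown in the post.
SAMPLE_LIST_HTML = (
    '<a href="/gupiao/sz000001/">Stock A</a>'
    '<a href="/gupiao/sh600000/">Stock B</a>'
    '<a href="/gupiao/sz000001/">Stock A again</a>'
)


def extract_stock_codes(list_html):
    """Return unique sh/sz codes in order of first appearance."""
    codes = re.findall(r"s[hz]\d{6}", list_html)
    return list(dict.fromkeys(codes))  # dict keys keep insertion order


def build_detail_urls(codes):
    """Build one detail-page URL per stock code."""
    return [DETAIL_URL_TEMPLATE.format(code=code) for code in codes]


codes = extract_stock_codes(SAMPLE_LIST_HTML)
for url in build_detail_urls(codes):
    print(url)
```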

Python crawler in practice: scraping the Xici proxy site to collect free proxy IPs

Summary: Site scraped: the Xici website import requests import chardet import random import time from bs4 import BeautifulSoup from telnetlib import Telnet import progressbar us… Read more

posted @ 2019-08-04 15:08 RomanticChopin Views (414) Comments (0) Recommendations (0)
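
The summary imports telnetlib.Telnet, presumably to check whether a scraped proxy accepts connections (the summary is truncated, so this is a guess). Since telnetlib was removed from the standard library in Python 3.13, the sketch below uses socket.create_connection as a stand-in. The function name and default timeout are hypothetical.

```python
import socket


def proxy_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only checks that something is listening; it does not verify that
    the endpoint actually works as an HTTP proxy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, DNS failures
        return False
```

A caller could filter a scraped list with something like `[p for p in proxies if proxy_is_reachable(p[0], p[1])]`, where `proxies` is a list of (host, port) pairs.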

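Python crawler: basic functions and framework of the BeautifulSoup library

Summary: Installation: On Windows, open cmd with "Run as administrator" and run pip install beautifulsoup4. Understanding the Beautiful Soup library: Beautiful Soup parsers: Basic elements of the Beautiful Soup library: Methods for traversing HTML content with bs4: Downward traversal: soup… Read more

posted @ 2019-08-03 19:41 RomanticChopin Views (1934) Comments (0) Recommendations (1)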
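
For the downward traversal mentioned in this entry, a small sketch using the .contents and .descendants attributes of a tag may help. The markup is invented for illustration, and the built-in "html.parser" is used so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample document.
html = "<html><body><p>Hello <b>world</b></p><a href='/x'>link</a></body></html>"
soup = BeautifulSoup(html, "html.parser")
body = soup.body

# .contents is a list of direct children (tags and text nodes).
direct_children = [child.name for child in body.contents if isinstance(child, Tag)]

# .descendants walks every nested node; keep only tags, skipping text nodes.
descendant_tags = [node.name for node in body.descendants if isinstance(node, Tag)]

print(direct_children)
print(descendant_tags)
```

The isinstance check matters because traversal yields text nodes as well as tags.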

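Python crawler: basic methods of the requests library and a usage framework

Summary: Installation: On Windows, open cmd with "Run as administrator" and run pip install requests. Quick test: >>> import requests >>> r = requests.get("http://www.baidu.com") >>> print(r.status_code) 200 >>> r.te… Read more

posted @ 2019-08-03 17:08 RomanticChopin Views (274) Comments (0) Recommendations (0)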
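
One common shape for the kind of "usage framework" this entry's title refers to is a fetch helper that handles errors in one place. The post's own framework is not visible in the truncated summary, so the function name, timeout, and empty-string fallback below are assumptions.

```python
import requests


def get_html_text(url, timeout=30):
    """Return the page text, or an empty string if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raise for 4xx/5xx status codes
        # Guess the encoding from the body rather than trusting the headers.
        response.encoding = response.apparent_encoding
        return response.text
    except requests.RequestException:
        return ""
```

Returning an empty string keeps callers simple, at the cost of hiding why a request failed; logging the exception would be a reasonable extension.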

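Python crawler in practice: scraping the 2019 Chinese university rankings from the Chinese university ranking website

Summary: Page scraped: ShanghaiRanking's Best Chinese Universities Ranking 2019 # implemented with bs4 (the Beautiful Soup library) import requests from bs4 import BeautifulSoup import bs4 # get the page content def getHTMLText(url): try: r = requests.ge… Read more

posted @ 2019-08-03 16:21 RomanticChopin Views (383) Comments (0) Recommendations (0)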
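
The parsing step that would follow getHTMLText is not visible in the truncated summary. As a rough sketch of how ranking rows could be pulled out of an HTML table with bs4, assuming a tbody whose rows hold rank, name, and score in the first three cells (the sample markup and column order are hypothetical):

```python
import bs4
from bs4 import BeautifulSoup

# Hypothetical table; the real page layout is not shown in the post.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td>1</td><td>University A</td><td>95.0</td></tr>
    <tr><td>2</td><td>University B</td><td>90.5</td></tr>
  </tbody>
</table>
"""


def parse_ranking(html):
    """Return a list of (rank, name, score) string tuples from the first tbody."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody")
    rows = []
    if tbody is None:
        return rows
    for tr in tbody.children:
        # Children include whitespace text nodes; keep only real tags.
        if isinstance(tr, bs4.element.Tag):
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if len(cells) >= 3:
                rows.append((cells[0], cells[1], cells[2]))
    return rows


for rank, name, score in parse_ranking(SAMPLE_HTML):
    print(rank, name, score)
```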

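Python crawler: measures to avoid IP bans (spoofing the User-Agent, adding a random delay between repeated requests, spoofing cookies, using proxies)

Summary: Reposted from: source link. Spoofing the User-Agent: set the User-Agent request header to a browser's User-Agent so that the request looks like it comes from a browser. For example: send_headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleW… Read more

posted @ 2019-08-02 11:26 RomanticChopin Views (600) Comments (0) Recommendations (0)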
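
Two of the measures named in this entry, a spoofed User-Agent header and a random pause between requests, can be sketched with the standard library alone. The User-Agent strings below are deliberate placeholders (the one in the summary is truncated), and the function names and delay bounds are hypothetical. Cookies and proxies are not covered here.

```python
import random
import time
import urllib.request

# Placeholder values; substitute real browser User-Agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (example-browser-A)",
    "Mozilla/5.0 (example-browser-B)",
]


def build_request(url, user_agents=USER_AGENTS):
    """Create a Request whose User-Agent is picked at random from the pool."""
    headers = {"User-Agent": random.choice(user_agents)}
    return urllib.request.Request(url, headers=headers)


def polite_delay(min_seconds=1.0, max_seconds=3.0):
    """Sleep for a random interval and return the number of seconds slept."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

A crawl loop would call polite_delay() between iterations and pass each build_request(url) result to urllib.request.urlopen.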

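Python 3 crawler: source code for increasing CSDN page views (ready to use as-is)

Summary: # You need PyCharm with the required libraries installed to run this code; to find out which libraries to install, check the errors you get when running it import re import time import random import requests import urllib.request from bs4 import BeautifulSoup … Read more

posted @ 2019-02-12 13:03 RomanticChopin Views (175) Comments (0) Recommendations (0)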

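Python 3: accessing a website with the urllib package

Summary: import urllib.request # import the package url="http://127.0.0.1:5000" # the site you want to query; replace the URL inside the quotes as needed html=urllib.request.urlopen(url) # open the URL html=html.read() # read the… Read more

posted @ 2019-02-05 16:10 RomanticChopin Views (257) Comments (0) Recommendations (0)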
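
The steps in this entry's summary (open a URL, then read the response) can be wrapped in a small helper. This is a sketch rather than the post's code: the function name, the timeout, and the UTF-8 default are assumptions, and the with block is added so the connection is closed after reading.

```python
import urllib.request


def fetch_text(url, timeout=10.0, encoding="utf-8"):
    """Open url, read the whole body, and return it decoded as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes
    # Replace undecodable bytes instead of raising on a wrong encoding guess.
    return raw.decode(encoding, errors="replace")
```

With a local server running at the address from the summary, usage would look like `print(fetch_text("http://127.0.0.1:5000"))`.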
