Python Crawlers - Post Category - 疯陈演义

Scraping Pictures of Beautiful Women

Summary:

    import urllib.request
    import re,os

    url = 'http://pic.yesky.com/'
    html = urllib.request.urlopen(url).read()
    html = html.decode('gbk')
    pattern = re.compile(r'shtml"><img src="(.*?)" alt=\'(.*?)\' onc...

posted @ 2017-05-22 08:32 疯陈演义 Views (595) Comments (0) Recommendations (0)
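The excerpt above is cut off partway through its regular expression, so the full pattern is not known. As a rough sketch of the same idea (pulling an image URL and its alt text out of page markup with one regex), the following uses a simplified pattern and made-up sample markup. `IMG_PATTERN`, `extract_images`, and `sample_html` are illustrative names, not the author's code, and the real page layout may differ.

```python
import re

# Simplified, hypothetical pattern: captures the src and alt attributes of <img> tags.
IMG_PATTERN = re.compile(r'<img\s+src="(.*?)"\s+alt=[\'"](.*?)[\'"]')


def extract_images(html):
    """Return a list of (image_url, title) pairs found in the HTML text."""
    return IMG_PATTERN.findall(html)


# Made-up markup for illustration only.
sample_html = (
    '<a href="/1.shtml"><img src="http://example.com/a.jpg" alt=\'First\' onclick="x()"></a>'
    '<a href="/2.shtml"><img src="http://example.com/b.jpg" alt=\'Second\' onclick="y()"></a>'
)

images = extract_images(sample_html)
for image_url, title in images:
    print(title, image_url)
```

Each captured URL could then be downloaded and saved under a file name derived from the title, which is presumably why the excerpt also imports `os`.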

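Python re Module Regular Expressions

Summary: Regular expression patterns. A pattern string uses a special syntax to describe a regular expression. Letters and digits stand for themselves: a letter or digit in a pattern matches the same character in the text. Most letters and digits take on a different meaning when preceded by a backslash. Punctuation characters that act as metacharacters match themselves only when escaped; otherwise they carry their special meaning. The backslash itself must be escaped with a backslash. Since regular expressions usually ...

posted @ 2017-05-19 10:00 疯陈演义 Views (470) Comments (0) Recommendations (0)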
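The rules in this summary can be checked directly with the standard `re` module. The snippet below is a small sketch; the sample strings and variable names are made up for illustration. Patterns are written as raw strings (`r'...'`) so that Python does not interpret the backslashes before `re` sees them.

```python
import re

# Letters and digits match themselves.
literal = re.search(r'abc1', 'xxabc1yy').group()

# A backslash gives the letter 'd' a different meaning: any decimal digit.
digits = re.findall(r'\d+', 'a1b22c333')

# An unescaped '.' is a metacharacter that matches any character ...
any_char = re.findall(r'a.c', 'abc a.c')

# ... while an escaped '.' matches only a literal dot.
dot_only = re.findall(r'a\.c', 'abc a.c')

# A literal backslash must itself be escaped.
backslashes = re.findall(r'\\', r'C:\temp\file')

# re.escape adds whatever escapes are needed to match a string literally.
escaped = re.escape('1+1=2?')

print(literal, digits, any_char, dot_only, len(backslashes))
```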

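Regex Groups and Assertions Explained

Summary: Assertions in regular expressions count as an advanced topic, not because they are hard but because the concept is abstract and not easy to grasp, so today let this beginner explain them in plain terms. Without assertions, the expressions we have used so far can only capture strings that follow a regular pattern, not strings that have no pattern of their own. For example, suppose the HTML source contains a <title>xxx</title> tag; with what we knew before, ...

posted @ 2017-04-10 12:58 疯陈演义 Views (252) Comments (0) Recommendations (0)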
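The `<title>` example in the summary can be sketched in two ways: with a capturing group, and with lookaround assertions. This is only an illustration of the general technique, not the article's own code. The sample `html` string and variable names are assumptions.

```python
import re

html = '<html><head><title>xxx</title></head></html>'

# Capturing group: the tags are part of the match, and group(1) picks out the text.
with_group = re.search(r'<title>(.*?)</title>', html).group(1)

# Lookaround: (?<=...) and (?=...) only assert what surrounds the match,
# so the match itself is just the text between the tags.
with_assertion = re.search(r'(?<=<title>).*?(?=</title>)', html).group()

print(with_group, with_assertion)
```

Both forms return the same text here. The difference is that the assertions consume no characters, which matters when the surrounding text must stay available for a later match.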

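Sina Stock Data Interface

Summary: There are currently two ways to obtain stock data: 1. an http/javascript interface; 2. a web-service interface. 1. Fetching data through the http/javascript interface. 1.1 The Sina stock data interface: taking Daqin Railway (stock code: 601006) as an example, to get its latest quote you only need to visit Sina's stock data interface: http://h ...

posted @ 2017-03-31 12:09 疯陈演义 Views (4941) Comments (0) Recommendations (1)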

Detecting a Web Page's Encoding

Summary:

    import urllib.request
    import chardet

    TestData = urllib.request.urlopen('http://www.baidu.com/').read()
    print(chardet.detect(TestData))

posted @ 2017-03-31 12:01 疯陈演义 Views (303) Comments (0) Recommendations (0)
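The excerpt relies on the third-party `chardet` package, which guesses the encoding statistically from the raw bytes. A different, standard-library-only approach is to read the charset that the page declares about itself. The sketch below shows that alternative, not the author's method. `sniff_charset` is a hypothetical helper, and it assumes the declaration appears in the `Content-Type` header or within the first 2048 bytes of the page.

```python
import re


def sniff_charset(raw, content_type=None):
    """Return the charset declared in the Content-Type header or a <meta> tag, else None."""
    if content_type:
        match = re.search(r'charset=([\w-]+)', content_type, re.I)
        if match:
            return match.group(1).lower()
    # Covers both <meta charset="..."> and the older http-equiv form.
    match = re.search(rb'<meta[^>]*charset=["\']?([\w-]+)', raw[:2048], re.I)
    if match:
        return match.group(1).decode('ascii').lower()
    return None


page = b'<html><head><meta charset="utf-8"></head><body></body></html>'
print(sniff_charset(page))
print(sniff_charset(b'', 'text/html; charset=GBK'))
```

A declared charset can be wrong or missing, which is when a detector such as `chardet` becomes useful as a fallback.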

Scraping Live Stock Data from Sohu

Summary: #http://q.stock.sohu.com/cn/000078/lshq.shtml

posted @ 2017-03-31 10:36 疯陈演义 Views (790) Comments (0) Recommendations (0)

Getting All Stock Data via NetEase

Summary:

    import urllib.request
    import re

    ##def downback(a,b,c):
    ##    '''
    ##    a: data blocks downloaded so far
    ##    b: size of each data block
    ##    c: size of the remote file
    ##    '''
    ##    per = 100.0 * a * b / c
    ##    if per > 100 :
    ##        per = 100
    ##    ...

posted @ 2017-01-01 12:14 疯陈演义 Views (11148) Comments (0) Recommendations (0)
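The commented-out `downback(a, b, c)` in the excerpt has the three-argument shape of a `reporthook` callback for `urllib.request.urlretrieve` (block count, block size, total size). The excerpt is truncated, so the following is only a sketch of how such a callback might be completed. `download_percent` and `report_progress` are illustrative names, and the guard for an unknown total size is an added assumption.

```python
import urllib.request


def download_percent(block_count, block_size, total_size):
    """Return download progress as a percentage, capped at 100."""
    if total_size <= 0:  # the server did not report a size
        return 0.0
    percent = 100.0 * block_count * block_size / total_size
    return min(percent, 100.0)


def report_progress(block_count, block_size, total_size):
    """Callback with the three-argument signature that urlretrieve's reporthook expects."""
    print('%.2f%%' % download_percent(block_count, block_size, total_size))


# Usage (a real download, so it is left commented out; url and filename are placeholders):
# urllib.request.urlretrieve(url, filename, reporthook=report_progress)
report_progress(5, 1024, 10240)
```

The cap at 100 is needed because the last block is usually only partly filled, so `block_count * block_size` can exceed the file size.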

Getting Stock Codes from Eastmoney

Summary:

    import urllib.request
    import re

    stock_CodeUrl = 'http://quote.eastmoney.com/stocklist.html'

    # Get the list of stock codes
    def urlTolist(url):
        allCodeList = []
        html = urllib.request.urlopen(url).read()
        html...

posted @ 2016-12-25 13:41 疯陈演义 Views (4892) Comments (7) Recommendations (0)
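The body of `urlTolist` is cut off in the excerpt, so how the author extracts the codes is not known. One plausible approach, sketched here under loud assumptions, is to collect every six-digit number that appears in parentheses in the page text. The markup in `sample_html` is invented for illustration and may not resemble the real stock list page. `extract_codes` is a hypothetical helper.

```python
import re


def extract_codes(html):
    """Return the unique six-digit codes that appear in parentheses, in page order."""
    seen = []
    for code in re.findall(r'\((\d{6})\)', html):
        if code not in seen:
            seen.append(code)
    return seen


# Invented markup; the real page layout is an assumption.
sample_html = (
    '<li><a href="#">Example Co A(600000)</a></li>'
    '<li><a href="#">Example Co B(000001)</a></li>'
    '<li><a href="#">Example Co A(600000)</a></li>'
)

codes = extract_codes(sample_html)
print(codes)
```

Codes are kept as strings because leading zeros are significant.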

Double Color Ball Lottery Helper

Summary: 500w.py pyCyp.py

posted @ 2016-11-11 08:51 疯陈演义 Views (3338) Comments (0) Recommendations (0)

Crawling Images

Summary:

    import re
    import urllib.request as ur
    import time
    import os
    import threading
    from urllib.error import URLError, HTTPError

    folerpath = '169mm'

    def gethtml(url):
        try:
            req = ur.Request(u...

posted @ 2016-09-08 21:22 posted by 疯陈演义 Views (787) Comments (0) Recommendations (0)
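The excerpt stops inside `gethtml`, right after the `try:`. As a sketch of what a fetch helper with this error handling might look like (not the author's actual body), the version below returns the response bytes or `None` on failure. The `User-Agent` header, the `timeout` parameter, and the printed messages are assumptions.

```python
import urllib.request as ur
from urllib.error import URLError, HTTPError


def gethtml(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        req = ur.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        with ur.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except HTTPError as err:  # subclass of URLError, so it must be caught first
        print('HTTP error', err.code, url)
    except URLError as err:
        print('URL error', err.reason, url)
    return None
```

Returning `None` instead of raising lets a worker thread skip a bad URL and carry on, which fits the `threading` import in the excerpt.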

A Small Python Web-Scraping Example

Summary:

    import urllib.request
    import re
    from tkinter import *

    win = Tk()
    win.geometry('500x300+400+300')
    t = Text(win)
    t.pack()
    url = 'http://stock.sohu.com/news/'
    html = urllib.request.urlopen(url).read...

posted @ 2016-09-01 08:33 疯陈演义 Views (376) Comments (0) Recommendations (0)

Python Proxy IP

Summary:

    import urllib.request

    url = 'http://www.whatismyip.com.tw/'
    proxy_support = urllib.request.ProxyHandler({'HTTP':'180.104.62.22:9000'})
    opener = urllib.request.build_opener(proxy_support)
    opener.ad...

posted @ 2016-05-15 13:58 疯陈演义 Views (240) Comments (0) Recommendations (0)
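The excerpt breaks off at `opener.ad...`, so the rest is unknown. The sketch below assumes the next step sets a `User-Agent` through `opener.addheaders`, and wraps the setup in a hypothetical helper, `make_proxy_opener`. It uses a lowercase `'http'` scheme key, which is the conventional spelling. The proxy address is copied from the excerpt and may no longer be reachable, so the actual request is left commented out.

```python
import urllib.request


def make_proxy_opener(proxy_address, user_agent='Mozilla/5.0'):
    """Build an opener that routes plain-HTTP requests through the given proxy."""
    proxy_support = urllib.request.ProxyHandler({'http': proxy_address})
    opener = urllib.request.build_opener(proxy_support)
    # Assumed continuation of the truncated line: send a browser-like User-Agent.
    opener.addheaders = [('User-Agent', user_agent)]
    return opener


opener = make_proxy_opener('180.104.62.22:9000')
# Real network call through the proxy, left commented out:
# html = opener.open('http://www.whatismyip.com.tw/', timeout=10).read()
```

Calling `urllib.request.install_opener(opener)` would make later `urllib.request.urlopen` calls use the proxy as well.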

Youdao Translate Crawler

Summary:

    import urllib.request
    import urllib.parse
    import json

    content = input('请输入要翻译的内容:')  # prompt: "Enter the text to translate:"
    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=http://www....

posted @ 2016-05-15 13:27 疯陈演义 Views (291) Comments (0) Recommendations (0)
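The imports in the excerpt (`urllib.parse` and `json`) suggest the usual pattern of posting URL-encoded form data and decoding a JSON reply. The excerpt is truncated before the form fields appear, so the sketch below shows only that general pattern. The field names (`text`, `format`) and the reply shape (`result`) are placeholders, not the service's real parameters.

```python
import json
import urllib.parse


def build_form_body(fields):
    """URL-encode a dict of form fields into the bytes that a POST request expects."""
    return urllib.parse.urlencode(fields).encode('utf-8')


# Placeholder field names, not the service's real parameters.
body = build_form_body({'text': 'hello world', 'format': 'json'})

# Made-up reply, used only to show the decoding step.
raw_reply = '{"result": "你好"}'
reply = json.loads(raw_reply)

print(body)
print(reply['result'])
```

Passing `body` as the `data` argument of `urllib.request.urlopen(url, data=body)` turns the request into a POST.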

Fetching a Specified Page with Python

Summary:

    #encoding:UTF-8
    import urllib.request

    url = "http://www.baidu.com"
    data = urllib.request.urlopen(url).read()
    data = data.decode('UTF-8')
    print(data)

posted @ 2016-05-06 13:38 疯陈演义 Views (241) Comments (0) Recommendations (0)
