python html process - Post Category - 怒杀神

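Cmder, a command-line replacement for Windows

Summary: A brief summary of basic Cmder usage. Cmder is a clean, attractive, and easy-to-use replacement for cmd on Windows, and it supports most Linux commands. Download the zip package from the official site, extract it, and run Cmder.exe in the root directory. Two problems appear at this point: first, the ls command does not handle Chinese; second, Chinese prompts are rendered with overlapping glyphs. 1. Fixing garbled Chinese ...

posted @ 2016-08-25 12:50 怒杀神 Views(766) Comments(0) Recommended(0)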

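Installing Python on Windows

Summary: 1. Download and install. Download from http://www.python.org/download/ and run the installer. 2. Configure the environment variables. Add Python to the environment variables so that plugins can be installed later: D:\Program Files\Python27;D:\Program Files\Python27\Scripts 3. Configure the environment variable for pip and install the virtual...

posted @ 2015-03-01 08:29 怒杀神 Views(241) Comments(0) Recommended(0)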

[php/html/CSS] Installing the Emmet plugin for Aptana 3

Summary: Installing the Zen Coding (Emmet) plugin in Aptana Studio 3. Zen Coding has been renamed Emmet. Emmet's Google Code page: http://code.google.com/p/zen-coding/ Official Emmet plugin for Aptana: https://github.com/sergech...

posted @ 2015-01-17 11:13 怒杀神 Views(274) Comments(0) Recommended(0)

[div+css] Vertical menu

Summary: Menu: Home, Music MP3, Photo Album, My Blog, My Space

posted @ 2014-12-23 13:20 怒杀神 Views(3934) Comments(0) Recommended(0)

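[Repost] Emmet user manual

Summary: Reposted from: http://www.w3cplus.com/tools/emmet-cheat-sheet.html Introduction: Emmet (formerly Zen Coding) is a tool that can greatly speed up front-end development. Most text editors let you store and reuse blocks of code, which we call "snippets"...

posted @ 2014-12-22 08:51 怒杀神 Views(462) Comments(0) Recommended(0)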

[LAMP] Installing LAMP on Ubuntu and its derivatives

Summary: Installing LAMP on Ubuntu. This method has worked reliably on Linux Mint 13/14/15/16/17, Ubuntu 12.10 (Quantal Quetzal), and Ubuntu 13.04 (Raring Ringtail).
    sudo apt-get install lamp-server^
Testing Apach...

posted @ 2014-05-29 14:02 怒杀神 Views(278) Comments(0) Recommended(0)

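[sublime] Manually installing Sublime Text 2 from the terminal

Summary: Sublime Text 2 download page: http://www.sublimetext.com/download
step.1 Extract the downloaded archive:
    tar xf Sublime\ Text\ 2.0.2.tar.bz2  # at first I thought this was a typo; "\ " is an escaped space
step.2 Move the extracted contents to a suitable location, here /usr...

posted @ 2014-04-25 10:33 怒杀神 Views(545) Comments(0) Recommended(0)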

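[bs4] Installing BeautifulSoup

Summary: Debian/Ubuntu install:
    $ apt-get install python-bs4
easy_install/pip:
    $ easy_install beautifulsoup4
    $ pip install beautifulsoup4
Installing third-party parsers: at the time of writing, bs4 shipped only Python 2 code, so installing it under Python 3 was troublesome. bs4 supports the built-in HTML parser and can also use third-party parsers.
lxml:
    $ apt-get install python-lxml
    $ easy_install lxml
    $ pip install lxml
html5lib:
    $ apt-get install python-html5lib
    $ easy_...

posted @ 2014-01-13 21:22 怒杀神 Views(6693) Comments(0) Recommended(0)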
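After running the install commands in the entry above, it can help to confirm which packages Python actually sees. The following is a minimal sketch, assuming Python 3; the helper name `installed_packages` is made up for illustration, and the import names `bs4`, `lxml`, and `html5lib` are the top-level modules those packages provide.

```python
from importlib.util import find_spec

# Import names of Beautiful Soup and the two third-party parsers named above.
PACKAGES = ("bs4", "lxml", "html5lib")


def installed_packages(names=PACKAGES):
    """Map each top-level package name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in installed_packages().items():
        print(name, "installed" if present else "missing")
```

The check only locates the packages; it does not import them, so it is safe to run even when some of them are absent.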

[py web page analysis] Possibly useful: using re to strip clutter from web pages

Summary:
    def remove_js_css (content):
        """ remove the javascript and the stylesheet and the comment content ( and ) """
        r = re.compile(r'''''',re.I|re.M|re.S)
        s = r.sub ('',content)
        r = re.compile(r'''''',re.I|re.M|re.S)
        ...

posted @ 2014-01-12 21:21 怒杀神 Views(411) Comments(0) Recommended(0)
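The regular expressions in the entry above did not survive in the summary, so the following is only a sketch of the same idea, assuming Python 3. The three patterns are hypothetical stand-ins, not the author's originals.

```python
import re

FLAGS = re.I | re.S

# Hypothetical patterns; the originals are not visible in the summary.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", FLAGS)
STYLE_RE = re.compile(r"<style\b[^>]*>.*?</style\s*>", FLAGS)
COMMENT_RE = re.compile(r"<!--.*?-->", re.S)


def remove_js_css(content):
    """Strip <script> blocks, <style> blocks, and HTML comments from a string."""
    for pattern in (SCRIPT_RE, STYLE_RE, COMMENT_RE):
        content = pattern.sub("", content)
    return content


if __name__ == "__main__":
    page = "<p>a</p><script>var x = 1;</script><!-- note --><p>b</p>"
    print(remove_js_css(page))
```

Regular expressions are a quick way to clean a page for rough text extraction, but they can be fooled by unusual markup; an HTML parser is the safer choice when accuracy matters.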

[pyQuery] Scraping the Startup News front page

Summary:
    #! /usr/bin/python
    # coding: utf-8
    from pyquery import PyQuery
    c=PyQuery('http://news.dbanotes.net/')
    titles=c.find('.title')
    for t in titles:
        title=c(t).find('a')
        t1=title('a').text()
        h1=title('a').attr('href')
        if t1!=None:
            print t1,'\n\t',h1

posted @ 2014-01-12 20:45 怒杀神 Views(216) Comments(0) Recommended(0)

[pyQuery forum analysis] 精英乒乓论坛 (Elite Table Tennis Forum)

Summary:
    In [25]: t= h('table')
    In [26]:
    In [26]: t('.mainbox').text()
    Out[26]: u'\u72b6\u6001 \u4e3b\u9898 \u4f5c\u8005 \u56de\u590d / \u4eba\u6c14 \u6700\u540e\u66f4\u65b0 \u663e\u793a\u56fa\u9876\u4e3b\u9898\u5f00\u59cb \u633a\u62d4\u76f4\u901a\u5fb7\u56fd\u9009\u62d4\u8d5b \u7530\u603b\...

posted @ 2014-01-12 19:49 怒杀神 Views(2217) Comments(0) Recommended(0)

[pyQuery analysis example] Analyzing Champions League match results on a sports site

Summary: Target URL: http://www.espncricinfo.com/champions-league-twenty20-2012/engine/match/574265.html
    liz@nb-liz:~$ script pyquery.log2
    Script started, file is pyquery.log2
    liz@nb-liz:~$ ipython
    Python 2.7.3 (default, Jan 2 2013, 16:53:07)
    Type "copyright", "credits" or "license" for mo...

posted @ 2014-01-12 17:26 怒杀神 Views(417) Comments(0) Recommended(0)

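[PyQuery] PyQuery summary

Summary: The pyquery library is a Python implementation of jQuery that can be used to parse the contents of HTML pages. The official documentation is at http://packages.python.org/pyquery/. 2. Usage:
    from pyquery import PyQuery as pq
It can load an HTML string, an HTML file, or a URL, for example:
    d=pq("hello")
    d=pq(filename=path_to_html_file)
    d=pq(url='http://www.baidu.com')
Note: the URL here apparently has to be written out in full. html() and text(): get the corresponding HTML block or...

posted @ 2014-01-12 16:37 怒杀神 Views(621) Comments(0) Recommended(0)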

[py analysis]

Summary: pyQuery. pyQuery is an implementation of jQuery in Python that lets you manipulate and parse HTML documents with jQuery syntax, which is very convenient. It must be installed before use: easy_install pyquery is enough, or on Ubuntu sudo apt-get install python-pyquery. An example:
    from pyquery import PyQuery as pyq
    doc=pyq(url=r'http://list.taobao.com/browse/cat-0.htm')
    cts=doc('.market-cat')
    for i in cts:
        print ...

posted @ 2014-01-12 15:33 怒杀神 Views(474) Comments(0) Recommended(0)

[py web page] sitecopy code

Summary:
    #coding:utf-8
    import re,os,shutil,sys
    import urllib2,socket,cookielib
    from threading import Thread,stack_size,Lock
    from Queue import Queue
    import time
    from gzip import GzipFile
    from StringIO import StringIO

    class ContentEncodingProcessor(urllib2.BaseHandler):
        " ...

posted @ 2014-01-08 23:22 怒杀神 Views(560) Comments(0) Recommended(0)

[py analysis] Parsing Taobao HTML with SGMLParser

Summary: SGMLParser. Python ships with HTMLParser, SGMLParser, and other parsers. The former is really hard to use, so I wrote a sample program with SGMLParser:
    import urllib2
    from sgmllib import SGMLParser

    class ListName(SGMLParser):
        def __init__(self):
            SGMLParser.__init__(self)
            self.is_h4 = ""
            self.name = []
        def start_h4(self, attrs):
            self.is_h4 = 1
        def end_h4(s...

posted @ 2014-01-08 23:08 怒杀神 Views(2461) Comments(0) Recommended(0)
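The sgmllib module used in the entry above exists only in Python 2; it was removed in Python 3. As a rough Python 3 counterpart, the sketch below swaps in the standard library's `html.parser.HTMLParser` to collect the text of every `<h4>` element. The class keeps the name `ListName` from the summary, but its body and the helper `h4_texts` are assumptions, since the original program is cut off.

```python
from html.parser import HTMLParser


class ListName(HTMLParser):
    """Collect the text found inside each <h4> element."""

    def __init__(self):
        super().__init__()
        self.in_h4 = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self.in_h4 = True
            self.names.append("")  # start a new entry for this heading

    def handle_endtag(self, tag):
        if tag == "h4":
            self.in_h4 = False

    def handle_data(self, data):
        if self.in_h4:
            self.names[-1] += data


def h4_texts(html):
    """Return the stripped text of every <h4> in the given HTML string."""
    parser = ListName()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.names]


if __name__ == "__main__":
    print(h4_texts("<div><h4>Books</h4><p>skip</p><h4>Music</h4></div>"))
```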

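[py tips] Using reload to re-import a modified package or module

Summary:
    # imported with import
    import my_module
    my_module.something()  # out - original
    # modify the output here - changed
    reload(my_module)
    my_module.something()  # out - changed
    # imported with from ... import
    import my_module  # this is required, otherwise reload is not possible
    from my_module import something
    something()  # out - original
    # change the output to changed
    ## note that reload alone does not work here; what to do?
    ## you need to add import my_mod... before line 3

posted @ 2014-01-08 23:03 怒杀神 Views(494) Comments(0) Recommended(0)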
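In Python 3 the built-in `reload` used in the entry above lives in `importlib`. The sketch below is one self-contained way to observe the behaviour described: it writes a throwaway module named `my_module_demo` (a hypothetical name) to a temporary directory, edits it, reloads it, and shows that a name bound with `from ... import` stays stale until the import is repeated.

```python
import importlib
import pathlib
import sys
import tempfile

# Skip writing .pyc files so the demo always recompiles the edited source.
sys.dont_write_bytecode = True


def demo_reload():
    """Return the results of calling something() before and after a reload."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / "my_module_demo.py"
        source.write_text("def something():\n    return 'original'\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()  # make the new file visible to import
            import my_module_demo
            from my_module_demo import something
            before = something()

            # Edit the module on disk, then reload it.
            source.write_text("def something():\n    return 'changed'\n")
            importlib.reload(my_module_demo)
            stale = something()  # still the old function object

            # Repeating the from-import rebinds the name to the new function.
            from my_module_demo import something
            fresh = something()
            return before, stale, fresh
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("my_module_demo", None)


if __name__ == "__main__":
    print(demo_reload())
```

The reload replaces the attributes on the module object, which is why `my_module_demo.something()` would see the change immediately while the separately imported name does not.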

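[ipython tips] Using shell commands

Summary: In an IPython session you may occasionally need a shell command for some quick processing. Prefix the shell command with ! (an exclamation mark). For example, on Win7, to create a new .py file with Sublime from IPython: !subl.exe .py

posted @ 2014-01-08 22:38 怒杀神 Views(533) Comments(0) Recommended(1)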
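The `!` prefix in the entry above is IPython syntax and is not available in a plain Python script. A rough equivalent there is the standard `subprocess` module. The sketch below assumes Python 3.7 or later; the helper name `run_command` is made up for illustration, and it launches the Python interpreter itself so that it does not depend on any particular shell program being installed.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of arguments and return its stdout."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    # Start a child Python process as a stand-in for an arbitrary command.
    print(run_command([sys.executable, "-c", "print('hello from a child process')"]))
```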

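[py web page] Remote download with urllib.urlretrieve

Summary: Next, let's look at the urlretrieve() function provided by the urllib module. urlretrieve() downloads remote data directly to a local file.
    >>> help(urllib.urlretrieve)
    Help on function urlretrieve in module urllib:

    urlretrieve(url, filename=None, reporthook=None, data=None)
The filename parameter specifies the local path to save to (if it is omitted, urllib creates a temporary file to hold the data). The reporthook parameter is a callback function that is triggered when the connection to the server is established and each time a data block has been transferred...

posted @ 2014-01-08 21:59 怒杀神 Views(3218) Comments(0) Recommended(1)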
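In Python 3 the function described in the entry above is `urllib.request.urlretrieve`. The sketch below shows the `filename` and `reporthook` parameters in use; the wrapper `download_with_progress` is a hypothetical helper, and the demo copies a local temporary file through a `file://` URL so that it runs without network access.

```python
import pathlib
import tempfile
from urllib.request import urlretrieve


def download_with_progress(url, destination):
    """Save url to destination and record every reporthook call."""
    calls = []

    def report(block_number, block_size, total_size):
        # Called once when the transfer starts and again after each block.
        calls.append((block_number, block_size, total_size))

    filename, _headers = urlretrieve(url, filename=destination, reporthook=report)
    return filename, calls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / "source.txt"
        source.write_text("hello world")
        target = pathlib.Path(tmp) / "copy.txt"
        saved, calls = download_with_progress(source.as_uri(), str(target))
        print(saved, calls)
```

For a real download, the `file://` URL would be replaced by an `http://` or `https://` address, and the callback could print a percentage computed from its three arguments.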

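[py web page] A fuller supplement on urlopen

Summary: urllib is an interface bundled with Python for fetching web page data. Its main method is urlopen(), which is modeled on Python's open(). The main points:
    urllib.urlopen('URL')
The argument passed to urlopen() has specific requirements: it must follow a network protocol such as http or ftp. In other words, the URL must begin with a scheme such as http://, for example urllib.urlopen('http://www.baidu.com'). Otherwise it must be a local file, which requires the file keyword, for example urllib.urlopen('file:nowamagic.py ...

posted @ 2014-01-08 21:19 怒杀神 Views(875) Comments(0) Recommended(0)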
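The scheme requirement described in the entry above can be tried out without network access. The sketch below assumes Python 3, where `urlopen` is found in `urllib.request`; the helper `read_url` is a hypothetical name. It reads a local temporary file through a `file://` URL and then shows that an address with no scheme is rejected with a `ValueError`.

```python
import pathlib
import tempfile
from urllib.request import urlopen


def read_url(url):
    """Return the raw bytes behind an http://, ftp://, or file:// URL."""
    with urlopen(url) as response:
        return response.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = pathlib.Path(tmp) / "page.html"
        page.write_text("<p>local file</p>")
        print(read_url(page.as_uri()))

    try:
        read_url("www.baidu.com")  # no scheme, so urlopen refuses it
    except ValueError as error:
        print("rejected:", error)
```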

怒杀神殿

This is my home
