Summary: # -*- coding: utf-8 -*- from bs4 import BeautifulSoup import re import os import urllib2 import urllib def download_img(urls,k): #urls = "http://tieba.baidu.com/p/4807867791" page = urllib2...
posted @ 2016-11-30 15:01 brady-wang Views (442) Comments (0) Recommended (0)
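The post's `download_img` body is cut off, so what follows is only a minimal Python 3 sketch of the same idea (the post itself uses Python 2's `urllib2`). It collects every `<img src>` on a page and saves each image under a numbered file name. It swaps in the standard library's `html.parser` for BeautifulSoup so it runs without third-party packages. `extract_img_urls`, `download_images` and the `img_<n>` naming scheme are hypothetical.

```python
import os
import posixpath
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (tag/attr names are lowercased by HTMLParser)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def extract_img_urls(html):
    """Hypothetical helper: return image URLs in document order."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs


def download_images(img_urls, out_dir, prefix="img"):
    """Hypothetical helper: save each image as <prefix>_<n><ext> inside out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for n, url in enumerate(img_urls):
        # Take the extension from the URL path only, so a ?query string is ignored.
        ext = posixpath.splitext(urlsplit(url).path)[1] or ".jpg"
        path = os.path.join(out_dir, "%s_%d%s" % (prefix, n, ext))
        urllib.request.urlretrieve(url, path)  # network access happens here
        paths.append(path)
    return paths
```

With a fetched page as `html`, `download_images(extract_img_urls(html), "pics")` would write `pics/img_0.jpg`, `pics/img_1.png`, and so on.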
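Summary: This article uses an example to show how to extract the file name from a URL in Python, shared here for reference. Details: given a URL such as http://www.jb51.net/images/logo.gif, extracting logo.gif takes just one line of code: import osurl = 'http://www.jb
posted @ 2016-11-30 14:06 brady-wang Views (858) Comments (0) Recommended (0)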
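The one-liner the summary refers to is presumably `os.path.basename`, though the excerpt is cut off. The sketch below shows that call and a slightly more defensive variant. The variant splits the URL first, so a `?query` or `#fragment` is not glued onto the name, and it decodes `%20`-style escapes. `filename_from_url` is a hypothetical helper, and `example.com` is a placeholder.

```python
import os.path
import posixpath
from urllib.parse import urlsplit, unquote

url = "http://www.jb51.net/images/logo.gif"
print(os.path.basename(url))  # logo.gif


def filename_from_url(url):
    """Hypothetical helper: last path segment of a URL, ignoring query and fragment."""
    path = urlsplit(url).path            # "/a/b/photo%201.jpg"
    return unquote(posixpath.basename(path))  # URL paths always use "/" separators


print(filename_from_url("http://example.com/a/b/photo%201.jpg?size=large#top"))  # photo 1.jpg
```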
Summary: # -*- coding: utf-8 -*- import re import urllib import os.path def getHtml(url): page = urllib.urlopen(url) html = page.read() return html def getImg(html,p): reg = r'<img src="(htt...
posted @ 2016-11-29 23:13 brady-wang Views (352) Comments (0) Recommended (0)
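In Python 3, `urllib.urlopen` from the excerpt lives at `urllib.request.urlopen` and returns bytes, so the page has to be decoded before a text regex is applied. The post's actual pattern is truncated. `IMG_RE` below is an assumed stand-in that matches absolute http(s) image URLs ending in a common extension, and `get_img_urls` is a hypothetical name. This is a sketch, not the author's code.

```python
import re
import urllib.request


def get_html(url):
    """Python 3 counterpart of getHtml: fetch a page and decode it to str."""
    with urllib.request.urlopen(url) as page:
        charset = page.headers.get_content_charset() or "utf-8"  # assume UTF-8 if the server is silent
        return page.read().decode(charset, errors="replace")


# Assumed pattern: an <img ...> tag whose src is an absolute http(s) URL of an image file.
IMG_RE = re.compile(
    r'<img[^>]*?\ssrc="(https?://[^"]+?\.(?:jpg|jpeg|png|gif))"',
    re.IGNORECASE,
)


def get_img_urls(html):
    """Return every captured image URL; findall yields the single group's text."""
    return IMG_RE.findall(html)
```

Relative `src` values such as `/rel.png` are deliberately skipped here. Resolving them would need `urllib.parse.urljoin` against the page URL.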
Summary: #coding:utf8 import urllib2 __author__ = 'wang' class HtmlDownloader(object): def download(self, url): if url is None: return None response = urllib2.urlopen(url) ...
posted @ 2016-11-29 22:46 brady-wang Views (964) Comments (0) Recommended (0)
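The following is a Python 3 sketch of the downloader in the excerpt, with `urllib2.urlopen` replaced by `urllib.request.urlopen`. The rest of the post's method is cut off. The status-code check and the `timeout` parameter are assumptions added here. In practice `urlopen` already raises `urllib.error.HTTPError` for 4xx/5xx responses, so the check only rejects unusual non-200 successes.

```python
import urllib.request


class HtmlDownloader(object):
    def download(self, url, timeout=10):
        """Return the raw page bytes, or None when there is nothing to fetch."""
        if url is None:
            return None
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Assumed policy: only a plain 200 counts as a successful download.
            if response.getcode() != 200:
                return None
            return response.read()
```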
Summary: #coding:utf8 __author__ = 'wang' class HtmlOutputer(object): def __init__(self): self.datas = []; def collect_data(self, data): if data is None: return ...
posted @ 2016-11-29 22:45 brady-wang Views (471) Comments (0) Recommended (0)
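The outputer in the excerpt buffers scraped records in `self.datas` and skips `None`. Below is a minimal sketch of how such a class could finish the job by writing the records as an HTML table. The field names `url`, `title` and `summary` and the `render_html` method are hypothetical. Values are passed through `html.escape` so scraped text cannot break the markup.

```python
import html


class HtmlOutputer(object):
    def __init__(self):
        self.datas = []  # one dict per crawled page

    def collect_data(self, data):
        if data is None:
            return
        self.datas.append(data)

    def render_html(self, fields=("url", "title", "summary")):
        """Hypothetical helper: build an HTML table with one row per collected record."""
        rows = []
        for data in self.datas:
            cells = "".join(
                "<td>%s</td>" % html.escape(str(data.get(field, ""))) for field in fields
            )
            rows.append("<tr>%s</tr>" % cells)
        return "<html><body><table>\n%s\n</table></body></html>" % "\n".join(rows)

    def output_html(self, path="output.html"):
        # Write UTF-8 explicitly so non-ASCII (e.g. Chinese) titles survive.
        with open(path, "w", encoding="utf-8") as fout:
            fout.write(self.render_html())
```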
Summary: #coding:utf8 import urlparse from bs4 import BeautifulSoup import re __author__ = 'wang' class HtmlParser(object): def parse(self, page_url, html_cont): if page_url is None or html_con...
posted @ 2016-11-29 22:44 brady-wang Views (695) Comments (0) Recommended (0)
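Judging from the excerpt, the parser takes a page URL plus its HTML and returns new links to crawl along with the page's data. In Python 3, `urlparse.urljoin` is `urllib.parse.urljoin`. This is a sketch only: it swaps in the standard library's `html.parser` for BeautifulSoup so it runs without extra packages. The `link_pattern` filter, the "first `<h1>` is the title" rule and the UTF-8 decoding are assumptions, not the post's logic.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkTitleCollector(HTMLParser):
    """Collects href values of <a> tags and the text of the first <h1>."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.title = None
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)
        elif tag == "h1" and self.title is None:
            self._in_h1 = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.title += data


class HtmlParser(object):
    def __init__(self, link_pattern=r".*"):
        # Hypothetical filter deciding which hrefs count as new crawl targets.
        self.link_re = re.compile(link_pattern)

    def parse(self, page_url, html_cont):
        if page_url is None or html_cont is None:
            return None, None
        if isinstance(html_cont, bytes):  # urlopen().read() returns bytes in Python 3
            html_cont = html_cont.decode("utf-8", errors="replace")  # assumed encoding
        collector = _LinkTitleCollector()
        collector.feed(html_cont)
        collector.close()
        # Resolve relative links against the page they were found on.
        new_urls = {urljoin(page_url, h) for h in collector.hrefs if self.link_re.search(h)}
        new_data = {"url": page_url, "title": (collector.title or "").strip()}
        return new_urls, new_data
```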
Summary: spider_main.py
posted @ 2016-11-29 22:42 brady-wang Views (715) Comments (0) Recommended (0)
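The excerpt shows only the file name `spider_main.py`. Given the URL manager, downloader, parser and outputer posted alongside it, the main script presumably wires them into a crawl loop. The loop below is a guess at that shape, not the author's file. `crawl`, `fetch` and `parse` are hypothetical, and the two callables are injected so the loop can be tried without network access. It does a breadth-first crawl, never queues a URL twice, and stops after `max_pages` records.

```python
from collections import deque


def crawl(root_url, fetch, parse, max_pages=100):
    """Hypothetical main loop.

    fetch(url) -> page content, or None if nothing was fetched
    parse(url, content) -> (iterable of new URLs, data record)
    """
    pending = deque([root_url])  # URLs waiting to be crawled (FIFO = breadth-first)
    seen = {root_url}            # every URL ever queued, so none is queued twice
    results = []
    while pending and len(results) < max_pages:
        url = pending.popleft()
        try:
            content = fetch(url)
        except Exception as exc:  # one bad page should not stop the whole crawl
            print("crawl failed: %s (%s)" % (url, exc))
            continue
        if content is None:
            continue
        links, data = parse(url, content)
        results.append(data)
        for link in links:
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return results
```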
Summary: #coding:utf8 class UrlManager(object): def __init__(self): self.new_urls = set() self.old_urls = set() def add_new_url(self, url): if url is None: return...
posted @ 2016-11-29 22:42 brady-wang Views (904) Comments (0) Recommended (0)
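The excerpt shows the core idea: two sets, one for URLs still to crawl and one for URLs already handed out, and `add_new_url` ignores `None`. The remaining methods are cut off. The sketch below fills them in with hypothetical `add_new_urls`, `has_new_url` and `get_new_url` so the manager is usable on its own. Using sets makes the "seen before?" check O(1) on average.

```python
class UrlManager(object):
    def __init__(self):
        self.new_urls = set()  # discovered but not yet crawled
        self.old_urls = set()  # already handed out for crawling

    def add_new_url(self, url):
        if url is None:
            return
        # Skip URLs that are already queued or already crawled.
        if url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_new_urls(self, urls):
        """Hypothetical: add a batch of URLs, tolerating None or an empty batch."""
        if not urls:
            return
        for url in urls:
            self.add_new_url(url)

    def has_new_url(self):
        """Hypothetical: True while there is still something to crawl."""
        return len(self.new_urls) != 0

    def get_new_url(self):
        """Hypothetical: pop an arbitrary pending URL and mark it as crawled."""
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url
```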
Summary: import re from bs4 import BeautifulSoup html_doc = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title"><b>The Dormouse'
posted @ 2016-11-29 22:20 brady-wang Views (430) Comments (0) Recommended (0)
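The post's `html_doc` is cut off, so the sample below is a trimmed document in the same style (an assumption, not the post's full text). It shows the basic BeautifulSoup moves the excerpt's imports suggest: reading the `<title>`, iterating over all `<a>` tags, and filtering an attribute with a compiled regular expression from `re`. It needs `beautifulsoup4` installed and uses Python's built-in `"html.parser"` backend.

```python
import re
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

print(soup.title.string)  # The Dormouse's story

# find_all returns every matching tag; tag["attr"] reads an attribute.
for link in soup.find_all("a"):
    print(link["id"], link["href"], link.get_text())

# Attribute values can be matched with a compiled regex.
lacie = soup.find("a", href=re.compile(r"lacie"))
print(lacie["id"])  # link2

# class_ (with underscore) because "class" is a Python keyword.
print(soup.find("p", class_="title").get_text())  # The Dormouse's story
```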
Summary: From Python's Scripts directory, run pip install beautifulsoup4
posted @ 2016-11-29 22:00 brady-wang Views (216) Comments (0) Recommended (0)
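Running pip through the interpreter, as in `python -m pip install beautifulsoup4`, installs into that exact Python and avoids having to `cd` into the Scripts folder first. The package installs under the name `beautifulsoup4` but is imported as `bs4`. A small sketch for confirming it is importable afterwards (`bs4_installed` is a hypothetical helper):

```python
import importlib.util


def bs4_installed():
    """True if the bs4 package (installed via pip as beautifulsoup4) can be imported."""
    return importlib.util.find_spec("bs4") is not None


if bs4_installed():
    import bs4
    print("beautifulsoup4", bs4.__version__)
else:
    print("bs4 not found; run: python -m pip install beautifulsoup4")
```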