林川的日志 (Lin Chuan's Journal) - 博客园 (cnblogs)

January 27, 2016

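Summary: On the SSS forum I came across a Python script someone wrote to scrape the WooYun (乌云) vendor list. I wanted some practice, so I rewrote it myself, following the original. Original post: http://bbs.sssie.com/thread-965-1-1.html #coding:utf-8 import urllib2 from bs4 import BeautifulSoup url = Read more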

posted @ 2016-01-27 15:52 林川的日志 Views(504) Comments(0) Recommended(0)

January 22, 2016

Daily Python Practice (3): Scraping Images from Baidu Tieba

Summary: import requests,re # first prepare the URL to request and the headers url = 'http://tieba.baidu.com/p/2166231880' head = { 'Accept': '*/*', 'Accept-Encoding':'gzip,deflate,sd... Read more

posted @ 2016-01-22 15:18 林川的日志 Views(244) Comments(0) Recommended(0)
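The entry above is cut off before the parsing step, so here is a minimal sketch of how image URLs might be pulled out of a fetched page with a regular expression. It is not the author's code: it is written for Python 3, uses the standard library's urllib.request rather than requests, and the names extract_image_urls, fetch_html, and the sample HTML are all assumptions made for illustration. The pattern simply looks for img tags whose src ends in a common image extension; the real Tieba markup may need a narrower filter.

```python
import re
import urllib.request

# Assumed pattern: any <img ... src="..."> whose URL ends in a common image extension.
IMG_SRC = re.compile(
    r'<img\b[^>]*?\ssrc\s*=\s*["\']([^"\']+\.(?:jpg|jpeg|png|gif))["\']',
    re.IGNORECASE,
)

def extract_image_urls(html):
    """Return the src value of every matching <img> tag, in document order."""
    return IMG_SRC.findall(html)

def fetch_html(url, user_agent="Mozilla/5.0"):
    """Download a page as text (hypothetical helper, not called below)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        # errors="replace" keeps undecodable bytes from raising.
        return resp.read().decode("utf-8", errors="replace")

# Made-up sample markup so the sketch runs without network access.
sample = '<img class="x" src="http://img.example.com/a.jpg"><img src="/icons/b.svg">'
print(extract_image_urls(sample))  # ['http://img.example.com/a.jpg']
```

To try it against a live page, one would pass the result of fetch_html(url) to extract_image_urls and then save each URL to disk.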

January 20, 2016

Daily Python Practice (2): Finding All Links in an HTML File (XPath and Regex Versions)

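Summary: To find specific content in an HTML file, first look at what that content is and where it sits; only then can you extract it. Assume the HTML file is named "1.html" and every href attribute is inside an a tag. Regex version: #coding:utf-8 import re with open('1.html','r') as f: data = f.... Read more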

posted @ 2016-01-20 11:29 林川的日志 Views(1856) Comments(0) Recommended(1)
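The regex approach from the entry above can be sketched roughly as follows. The original code is truncated, so this is a Python 3 reconstruction of the idea under the entry's stated assumption that every href lives inside an a tag. The function name extract_links and the sample markup are hypothetical. The XPath version would need a third-party parser such as lxml and is not shown here.

```python
import re

# Match <a ...> tags and capture the quoted href value.
# \s before href avoids matching attributes such as data-href.
A_HREF = re.compile(
    r'<a\b[^>]*?\shref\s*=\s*["\']([^"\']*)["\']',
    re.IGNORECASE,
)

def extract_links(html):
    """Return every href found inside an <a> tag, in document order."""
    return A_HREF.findall(html)

# Made-up sample standing in for the contents of "1.html".
data = '<a href="http://a.com">A</a><p>x</p><A class="c" HREF=\'/b\'>B</A>'
for link in extract_links(data):
    print(link)
```

A regex is fine for a quick exercise like this one, but it does not handle unquoted attribute values or markup inside comments, which is one reason a real parser is usually preferred for untrusted pages.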

January 19, 2016

Daily Python Practice (1): Finding the Most Frequent Word in Each Article in a Folder

Summary: #coding:utf-8 import os,re path = 'test' files = os.listdir(path) def count_word(words): dic = {} max = 0 marked_key = '' # count how many times each word occurs for ... Read more

posted @ 2016-01-19 23:26 林川的日志 Views(2422) Comments(0) Recommended(0)
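As a rough illustration of the exercise in the entry above, the sketch below counts words per file with collections.Counter instead of the hand-rolled dictionary the truncated summary starts to build. It is written for Python 3, and the names most_common_word and most_common_per_file, along with the simple definition of a "word" (runs of letters and apostrophes), are assumptions rather than the author's choices.

```python
import os
import re
from collections import Counter

def most_common_word(text):
    """Return (word, count) for the most frequent word, or None if there are no words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return None
    return Counter(words).most_common(1)[0]

def most_common_per_file(folder):
    """Map each regular file in `folder` to its most frequent word."""
    results = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8", errors="replace") as f:
                results[name] = most_common_word(f.read())
    return results

print(most_common_word("The cat and the hat"))  # ('the', 2)
```

Calling most_common_per_file('test') would mirror the path = 'test' setup in the summary, assuming that folder exists and holds plain-text articles.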

水系cmos日志

Network security enthusiast