Summary: from lxml import etree html = ''' <li class="tag_1">needed content 1 <a>needed content 2</a> </li> ''' selector = etree.HTML(html) contents = selector.xpath('//li[@class Read more
posted @ 2020-02-11 23:19 Caper123 Views(1082) Comments(0) Recommended(0)
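A minimal runnable sketch of the idea this summary begins: extracting both the direct text of an `<li>` and the text of a nested `<a>` with lxml XPath. The HTML fragment and class name are illustrative stand-ins for the post's Chinese placeholders.

```python
from lxml import etree

html = '''
<li class="tag_1">content one
    <a>content two</a>
</li>
'''
selector = etree.HTML(html)
# text() on the <li> itself returns only its direct text nodes,
# not the text inside the nested <a>
direct = selector.xpath('//li[@class="tag_1"]/text()')
# the nested <a>'s text is reached with its own step
nested = selector.xpath('//li[@class="tag_1"]/a/text()')
print(direct[0].strip())  # direct text of the <li>
print(nested[0])          # text of the <a>
```

The distinction matters because `//li/text()` silently drops anything wrapped in child elements, a common surprise when scraping mixed content.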
Summary: import requests from lxml import etree import time, json, requests import pymysql header = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Read more
posted @ 2020-02-11 23:13 Caper123 Views(331) Comments(0) Recommended(0)
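The imports above suggest a fetch → parse → store pipeline (requests + lxml + pymysql). A hedged sketch of that shape is below; the URL handling, XPath expressions, table and column names, and connection parameters are all placeholders, not the author's. Only the parse step is exercised here, on an in-memory sample, so the sketch runs without a network or database.

```python
import requests
from lxml import etree

HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

def fetch(url):
    """Download a page, pretending to be a browser."""
    return requests.get(url, headers=HEADERS, timeout=10).text

def parse(html):
    """Extract (title, link) pairs from a listing page (illustrative XPath)."""
    selector = etree.HTML(html)
    titles = selector.xpath('//li/a/text()')
    links = selector.xpath('//li/a/@href')
    return list(zip(titles, links))

def store(rows):
    """Insert parsed rows with a parameterized query (avoids SQL injection)."""
    import pymysql  # only needed for the storage step
    conn = pymysql.connect(host='localhost', user='root', password='secret',
                           database='spider', charset='utf8mb4')
    with conn.cursor() as cur:
        cur.executemany('INSERT INTO items (title, link) VALUES (%s, %s)', rows)
    conn.commit()
    conn.close()

sample = '<ul><li><a href="/a">first</a></li><li><a href="/b">second</a></li></ul>'
print(parse(sample))
```

Using `executemany` with `%s` placeholders lets the driver escape values, which matters as soon as scraped text contains quotes.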
Summary: Splitting with split f=open("output.txt","r", encoding = 'utf-8',errors='ignore') for line in f: print(line.split(' ')[0]) Original file: Result: Read more
posted @ 2020-02-11 19:50 Caper123 Views(5701) Comments(0) Recommended(0)
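A self-contained version of the split-based first-field extraction above, writing a small stand-in for `output.txt` so the sketch runs anywhere; the file contents are invented.

```python
import tempfile, os

# Create a throwaway sample file in place of output.txt
path = os.path.join(tempfile.gettempdir(), "output_demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("alice 90\nbob 75\n")

# Same pattern as the post: split each line on a space, keep field 0
firsts = []
with open(path, "r", encoding="utf-8", errors="ignore") as f:
    for line in f:
        firsts.append(line.split(' ')[0])
print(firsts)
```

Note that `split(' ')` treats consecutive spaces as empty fields; plain `split()` collapses any run of whitespace, which is often the safer default.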
Summary: import jieba from collections import Counter if __name__ == '__main__': filehandle = open("boke.txt", "r", encoding='utf-8',errors='ignore'); mystr = Read more
posted @ 2020-02-10 23:43 Caper123 Views(505) Comments(0) Recommended(0)
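The summary pairs jieba with `collections.Counter`. The counting half can be sketched on its own: in the original, `jieba.lcut(mystr)` would supply the token list from the Chinese text of boke.txt; a pre-segmented stand-in list is used here so the sketch runs without jieba installed.

```python
from collections import Counter

# tokens stands in for jieba.lcut(mystr)
tokens = ['python', 'data', 'python', 'code', 'data', 'python']
counts = Counter(tokens)
# most_common(n) returns the n highest-frequency (word, count) pairs
print(counts.most_common(2))
```

`Counter` replaces the manual `dict.get(word, 0) + 1` loop with one line and gives sorted output via `most_common` for free.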
Summary: First install the tools: at the prompt, run pip install jieba wordcloud matplotlib. The code is as follows: import matplotlib.pyplot as plt import jieba from wordcloud import WordCloud #1. read the lyrics tex Read more
posted @ 2020-02-09 21:34 Caper123 Views(325) Comments(0) Recommended(0)
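A hedged sketch of the lyrics → word cloud pipeline the summary describes. The rendering step needs the third-party `wordcloud` and `matplotlib` packages (installed with the pip command above), so it is wrapped in a function and not executed here; the frequency preparation is plain Python. The font path is an assumption: rendering Chinese text needs a CJK-capable font.

```python
def build_frequencies(tokens):
    """Count tokens, dropping single-character ones (particles, punctuation)."""
    freqs = {}
    for t in tokens:
        if len(t) > 1:
            freqs[t] = freqs.get(t, 0) + 1
    return freqs

def render(freqs, font_path='simhei.ttf'):
    """Render a word cloud from a {word: count} dict (not run in this sketch)."""
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt
    wc = WordCloud(font_path=font_path, background_color='white')
    wc.generate_from_frequencies(freqs)
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.show()

tokens = ['hello', 'world', 'hello', 'a']
print(build_frequencies(tokens))
```

`generate_from_frequencies` takes the dict directly, which avoids re-joining segmented Chinese tokens into a space-separated string for `generate`.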
Summary: Worked example: prepare a txt file import jieba txt = open("三国演义.txt","r", encoding = 'gbk',errors='ignore').read() # read the saved txt file words = jieba.lcut(txt) # segment into words counts = { Read more
posted @ 2020-02-09 16:00 Caper123 Views(133) Comments(0) Recommended(0)
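The summary cuts off at `counts = {`, the start of the classic manual dict-counting loop. A sketch of how that loop typically continues is below; `words` stands in for the `jieba.lcut(txt)` output on the novel's text so it runs without jieba or the txt file, and the token list is invented.

```python
# words stands in for jieba.lcut(txt) on the novel's text
words = ['曹操', '孔明', '曹操', '之', '孔明', '曹操']

counts = {}
for w in words:
    if len(w) == 1:        # skip single characters (particles, punctuation)
        continue
    counts[w] = counts.get(w, 0) + 1

# sort by frequency, highest first
items = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(items)
```

`dict.get(w, 0) + 1` handles first-seen words without a membership test, and sorting the `.items()` pairs by value yields the ranking usually printed at the end of such scripts.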
Summary: import time, json, requests import pymysql url='https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5&&callback=&_=%d'%int(time.time()*1000) data = Read more
posted @ 2020-02-09 11:50 Caper123 Views(1030) Comments(0) Recommended(0)
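Two pieces of the snippet above can be sketched without hitting the network: the cache-busting URL construction (`_=%d` is the current time in milliseconds), and the double JSON decode such endpoints often require, an assumption here, where the response's `data` field is itself a JSON-encoded string. The payload below is a made-up stand-in, not real API output.

```python
import time, json

# Cache-busting query parameter: current time in milliseconds
url = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5&&callback=&_=%d' % int(time.time() * 1000)
print(url)

# Stand-in for requests.get(url).text -- NOT real API output
raw = '{"ret": 0, "data": "{\\"lastUpdateTime\\": \\"2020-02-09\\"}"}'
outer = json.loads(raw)          # first decode: the envelope
inner = json.loads(outer['data'])  # second decode: the string-valued payload
print(inner['lastUpdateTime'])
```

Forgetting the second `json.loads` is a frequent bug with string-wrapped payloads: `outer['data']` looks like a dict when printed but is still a `str`.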
Summary: Write a scheduled-crawler program main.py in the scrapy project; put it directly in the directory holding the scrapy code and it will run the crawl repeatedly on a timer. import time import os while True: os.system("scrapy crawl News") time.sleep(86400) # once a day Read more
posted @ 2020-02-07 23:25 Caper123 Views(1643) Comments(0) Recommended(0)
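The scheduler above can be restated as a small function whose run count and command runner are injectable, so the same loop is testable without scrapy installed or an infinite `while True`. The parameterization is mine; the default command and interval are the post's.

```python
import time, os

def schedule(cmd='scrapy crawl News', interval=86400, runs=None, runner=os.system):
    """Run `cmd` every `interval` seconds, `runs` times (forever if runs is None)."""
    n = 0
    while runs is None or n < runs:
        runner(cmd)
        n += 1
        if runs is None or n < runs:
            time.sleep(interval)   # 86400 s = once a day
    return n

# Exercise the loop with a stub runner and no delay
calls = []
executed = schedule(runs=3, interval=0, runner=calls.append)
print(executed, calls)
```

In production the post's two-liner is equivalent to `schedule()` with the defaults; cron or a systemd timer is a more robust alternative since this loop dies with its terminal session.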
Summary: import requests from lxml import etree ### URL url="https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6" ### mimic a browser header={'User-Agent':'Mozilla Read more
posted @ 2020-02-06 13:23 Caper123 Views(340) Comments(0) Recommended(0)
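The parsing half of the Weibo hot-search scraper can be sketched offline. A real run would fetch the `s.weibo.com` URL above with the User-Agent header; the HTML fragment and the `td-02` XPath below are illustrative guesses at a ranked-list layout, not a claim about the page's actual markup.

```python
from lxml import etree

# Stand-in for the fetched hot-search page -- illustrative markup only
sample = '''
<table><tbody>
  <tr><td class="td-02"><a>topic one</a></td></tr>
  <tr><td class="td-02"><a>topic two</a></td></tr>
</tbody></table>
'''
selector = etree.HTML(sample)
# Pull the anchor text of every ranked cell, in document order
topics = selector.xpath('//td[@class="td-02"]/a/text()')
print(topics)
```

Separating fetch from parse like this lets the XPath be verified against a saved copy of the page before any live requests are made.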
Summary: Data preparation: II. Visualization Read more
posted @ 2020-02-05 19:43 Caper123 Views(133) Comments(0) Recommended(0)