Summary: from lxml import etree html = ''' <li class="tag_1">needed content 1 <a>needed content 2</a> </li> ''' selector = etree.HTML(html) contents = selector.xpath('//li[@class Read more
posted @ 2020-02-11 23:19 Caper123 Views(1082) Comments(0) Recommended(0)
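A minimal runnable sketch of the idea this summary begins: extracting both the direct text of an `<li>` and the text of a nested `<a>` with lxml XPath. The HTML fragment and class name are illustrative stand-ins for the post's Chinese placeholders.

```python
from lxml import etree

html = '''
<li class="tag_1">content one
    <a>content two</a>
</li>
'''
selector = etree.HTML(html)
# text() on the <li> itself returns only its direct text nodes,
# not the text inside the nested <a>
direct = selector.xpath('//li[@class="tag_1"]/text()')
# the nested <a>'s text is reached with its own step
nested = selector.xpath('//li[@class="tag_1"]/a/text()')
print(direct[0].strip())  # direct text of the <li>
print(nested[0])          # text of the <a>
```

The distinction matters because `//li/text()` silently drops anything wrapped in child elements, a common surprise when scraping mixed content.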
Summary: import requests from lxml import etree import time, json, requests import pymysql header = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Read more
posted @ 2020-02-11 23:13 Caper123 Views(331) Comments(0) Recommended(0)
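The imports above suggest a fetch → parse → store pipeline (requests + lxml + pymysql). A hedged sketch of that shape is below; the URL handling, XPath expressions, table and column names, and connection parameters are all placeholders, not the author's. Only the parse step is exercised here, on an in-memory sample, so the sketch runs without a network or database.

```python
import requests
from lxml import etree

HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

def fetch(url):
    """Download a page, pretending to be a browser."""
    return requests.get(url, headers=HEADERS, timeout=10).text

def parse(html):
    """Extract (title, link) pairs from a listing page (illustrative XPath)."""
    selector = etree.HTML(html)
    titles = selector.xpath('//li/a/text()')
    links = selector.xpath('//li/a/@href')
    return list(zip(titles, links))

def store(rows):
    """Insert parsed rows with a parameterized query (avoids SQL injection)."""
    import pymysql  # only needed for the storage step
    conn = pymysql.connect(host='localhost', user='root', password='secret',
                           database='spider', charset='utf8mb4')
    with conn.cursor() as cur:
        cur.executemany('INSERT INTO items (title, link) VALUES (%s, %s)', rows)
    conn.commit()
    conn.close()

sample = '<ul><li><a href="/a">first</a></li><li><a href="/b">second</a></li></ul>'
print(parse(sample))
```

Using `executemany` with `%s` placeholders lets the driver escape values, which matters as soon as scraped text contains quotes.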
Summary: Splitting with split f=open("output.txt","r", encoding = 'utf-8',errors='ignore') for line in f: print(line.split(' ')[0]) Original file: Result: Read more
posted @ 2020-02-11 19:50 Caper123 Views(5701) Comments(0) Recommended(0)
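A self-contained version of the split-based first-field extraction above, writing a small stand-in for `output.txt` so the sketch runs anywhere; the file contents are invented.

```python
import tempfile, os

# Create a throwaway sample file in place of output.txt
path = os.path.join(tempfile.gettempdir(), "output_demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("alice 90\nbob 75\n")

# Same pattern as the post: split each line on a space, keep field 0
firsts = []
with open(path, "r", encoding="utf-8", errors="ignore") as f:
    for line in f:
        firsts.append(line.split(' ')[0])
print(firsts)
```

Note that `split(' ')` treats consecutive spaces as empty fields; plain `split()` collapses any run of whitespace, which is often the safer default.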
Summary: import jieba from collections import Counter if __name__ == '__main__': filehandle = open("boke.txt", "r", encoding='utf-8',errors='ignore'); mystr = Read more
posted @ 2020-02-10 23:43 Caper123 Views(505) Comments(0) Recommended(0)
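The summary pairs jieba with `collections.Counter`. The counting half can be sketched on its own: in the original, `jieba.lcut(mystr)` would supply the token list from the Chinese text of boke.txt; a pre-segmented stand-in list is used here so the sketch runs without jieba installed.

```python
from collections import Counter

# tokens stands in for jieba.lcut(mystr)
tokens = ['python', 'data', 'python', 'code', 'data', 'python']
counts = Counter(tokens)
# most_common(n) returns the n highest-frequency (word, count) pairs
print(counts.most_common(2))
```

`Counter` replaces the manual `dict.get(word, 0) + 1` loop with one line and gives sorted output via `most_common` for free.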
Summary: First install the tools: at the prompt, run pip install jieba wordcloud matplotlib. The code is as follows: import matplotlib.pyplot as plt import jieba from wordcloud import WordCloud #1. read the lyrics tex Read more
posted @ 2020-02-09 21:34 Caper123 Views(325) Comments(0) Recommended(0)
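A hedged sketch of the lyrics → word cloud pipeline the summary describes. The rendering step needs the third-party `wordcloud` and `matplotlib` packages (installed with the pip command above), so it is wrapped in a function and not executed here; the frequency preparation is plain Python. The font path is an assumption: rendering Chinese text needs a CJK-capable font.

```python
def build_frequencies(tokens):
    """Count tokens, dropping single-character ones (particles, punctuation)."""
    freqs = {}
    for t in tokens:
        if len(t) > 1:
            freqs[t] = freqs.get(t, 0) + 1
    return freqs

def render(freqs, font_path='simhei.ttf'):
    """Render a word cloud from a {word: count} dict (not run in this sketch)."""
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt
    wc = WordCloud(font_path=font_path, background_color='white')
    wc.generate_from_frequencies(freqs)
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.show()

tokens = ['hello', 'world', 'hello', 'a']
print(build_frequencies(tokens))
```

`generate_from_frequencies` takes the dict directly, which avoids re-joining segmented Chinese tokens into a space-separated string for `generate`.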
Summary: Worked example: prepare a txt file import jieba txt = open("三国演义.txt","r", encoding = 'gbk',errors='ignore').read() # read the saved txt file words = jieba.lcut(txt) # segment into words counts = { Read more
posted @ 2020-02-09 16:00 Caper123 Views(133) Comments(0) Recommended(0)
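The summary cuts off at `counts = {`, the start of the classic manual dict-counting loop. A sketch of how that loop typically continues is below; `words` stands in for the `jieba.lcut(txt)` output on the novel's text so it runs without jieba or the txt file, and the token list is invented.

```python
# words stands in for jieba.lcut(txt) on the novel's text
words = ['曹操', '孔明', '曹操', '之', '孔明', '曹操']

counts = {}
for w in words:
    if len(w) == 1:        # skip single characters (particles, punctuation)
        continue
    counts[w] = counts.get(w, 0) + 1

# sort by frequency, highest first
items = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(items)
```

`dict.get(w, 0) + 1` handles first-seen words without a membership test, and sorting the `.items()` pairs by value yields the ranking usually printed at the end of such scripts.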
Summary: import time, json, requests import pymysql url='https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5&&callback=&_=%d'%int(time.time()*1000) data = Read more
posted @ 2020-02-09 11:50 Caper123 Views(1030) Comments(0) Recommended(0)
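Two pieces of the snippet above can be sketched without hitting the network: the cache-busting URL construction (`_=%d` is the current time in milliseconds), and the double JSON decode such endpoints often require, an assumption here, where the response's `data` field is itself a JSON-encoded string. The payload below is a made-up stand-in, not real API output.

```python
import time, json

# Cache-busting query parameter: current time in milliseconds
url = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5&&callback=&_=%d' % int(time.time() * 1000)
print(url)

# Stand-in for requests.get(url).text -- NOT real API output
raw = '{"ret": 0, "data": "{\\"lastUpdateTime\\": \\"2020-02-09\\"}"}'
outer = json.loads(raw)          # first decode: the envelope
inner = json.loads(outer['data'])  # second decode: the string-valued payload
print(inner['lastUpdateTime'])
```

Forgetting the second `json.loads` is a frequent bug with string-wrapped payloads: `outer['data']` looks like a dict when printed but is still a `str`.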
Summary: Write a scheduled-crawler program main.py in the scrapy project; put it directly in the directory holding the scrapy code and it will run the crawl repeatedly on a timer. import time import os while True: os.system("scrapy crawl News") time.sleep(86400) # once a day Read more
posted @ 2020-02-07 23:25 Caper123 Views(1643) Comments(0) Recommended(0)
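The scheduler above can be restated as a small function whose run count and command runner are injectable, so the same loop is testable without scrapy installed or an infinite `while True`. The parameterization is mine; the default command and interval are the post's.

```python
import time, os

def schedule(cmd='scrapy crawl News', interval=86400, runs=None, runner=os.system):
    """Run `cmd` every `interval` seconds, `runs` times (forever if runs is None)."""
    n = 0
    while runs is None or n < runs:
        runner(cmd)
        n += 1
        if runs is None or n < runs:
            time.sleep(interval)   # 86400 s = once a day
    return n

# Exercise the loop with a stub runner and no delay
calls = []
executed = schedule(runs=3, interval=0, runner=calls.append)
print(executed, calls)
```

In production the post's two-liner is equivalent to `schedule()` with the defaults; cron or a systemd timer is a more robust alternative since this loop dies with its terminal session.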
Summary: import requests from lxml import etree ### URL url="https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6" ### mimic a browser header={'User-Agent':'Mozilla Read more
posted @ 2020-02-06 13:23 Caper123 Views(340) Comments(0) Recommended(0)
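The parsing half of the Weibo hot-search scraper can be sketched offline. A real run would fetch the `s.weibo.com` URL above with the User-Agent header; the HTML fragment and the `td-02` XPath below are illustrative guesses at a ranked-list layout, not a claim about the page's actual markup.

```python
from lxml import etree

# Stand-in for the fetched hot-search page -- illustrative markup only
sample = '''
<table><tbody>
  <tr><td class="td-02"><a>topic one</a></td></tr>
  <tr><td class="td-02"><a>topic two</a></td></tr>
</tbody></table>
'''
selector = etree.HTML(sample)
# Pull the anchor text of every ranked cell, in document order
topics = selector.xpath('//td[@class="td-02"]/a/text()')
print(topics)
```

Separating fetch from parse like this lets the XPath be verified against a saved copy of the page before any live requests are made.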
Summary: Data preparation: II. Visualization Read more
posted @ 2020-02-05 19:43 Caper123 Views(133) Comments(0) Recommended(0)