155林俊彪 - 博客园 (cnblogs)

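Hadoop Comprehensive Final Assignment & Two Make-up Assignments

Summary: Hadoop comprehensive final assignment: 1. Use Hive to run a word-frequency count on the text file produced by the web crawler final assignment (or on the long English novel downloaded for the English word-frequency exercise). Upload the file to HDFS, start Hive, load the data into the study table, create an analysis table for the counts, and view the results. 2. Use Hive to analyze the CSV file produced by the crawler final assignment, and write a blog post describing your analysis ...

posted @ 2018-05-25 19:43 155林俊彪 Views (281) Comments (0) Recommendations (0)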

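Processing a Weather Dataset with MapReduce

Summary: Write a program that finds the daily maximum and minimum temperatures, and the maximum and minimum temperatures over a date range.

posted @ 2018-05-09 21:57 155林俊彪 Views (140) Comments (0) Recommendations (0)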
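
The task in the summary above can be sketched in plain Python that imitates the map and reduce phases. This is only an illustration under stated assumptions: the record layout (a date string plus a temperature), the sample readings, and the function names are all hypothetical, and the post's actual MapReduce job is not shown here.

```python
from collections import defaultdict

# Hypothetical readings: (date as YYYYMMDD, temperature in degrees Celsius).
readings = [
    ("20180501", 21.5),
    ("20180501", 27.0),
    ("20180502", 19.0),
    ("20180502", 24.5),
    ("20180503", 30.0),
]


def map_phase(records):
    """Emit (date, temperature) pairs, as a mapper would."""
    for date, temp in records:
        yield date, temp


def reduce_phase(pairs):
    """Group temperatures by date and return {date: (max, min)}."""
    grouped = defaultdict(list)
    for date, temp in pairs:
        grouped[date].append(temp)
    return {date: (max(temps), min(temps)) for date, temps in grouped.items()}


def range_extremes(daily, start, end):
    """Return (max, min) over all days with start <= date <= end."""
    selected = [extremes for date, extremes in daily.items() if start <= date <= end]
    return max(hi for hi, _ in selected), min(lo for _, lo in selected)


daily = reduce_phase(map_phase(readings))
overall = range_extremes(daily, "20180501", "20180503")
```

Because the dates are fixed-width `YYYYMMDD` strings, comparing them as strings gives the same order as comparing them as dates, which keeps the range filter simple.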

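Getting Familiar with Common HBase Operations and Writing MapReduce Jobs

Summary: 1. Convert the following relational database table and its data into a table suited to HBase storage, then insert the data: student number (S_No), name (S_Name), sex (S_Sex), age (S_Age), course (course) 2015001 Zhangsan male 23 2015003 Mary female 22 2015003 ...

posted @ 2018-05-08 20:53 155林俊彪 Views (136) Comments (0) Recommendations (0)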
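
One way to picture the conversion described in the summary above is to model an HBase table as nested dictionaries: a row key that maps to `family:qualifier` cells. The sketch below is an assumption-laden illustration, not the post's solution. The column family name `info` and the helper `to_hbase_cells` are hypothetical, only the two rows visible in the summary are used, and the truncated course data is left out.

```python
# The two complete rows visible in the summary; values are kept as strings
# because HBase stores cell values as bytes.
rows = [
    {"S_No": "2015001", "S_Name": "Zhangsan", "S_Sex": "male", "S_Age": "23"},
    {"S_No": "2015003", "S_Name": "Mary", "S_Sex": "female", "S_Age": "22"},
]


def to_hbase_cells(row, key_field="S_No", family="info"):
    """Turn one relational row into (row_key, {"family:qualifier": value}).

    The key field becomes the row key; every other column becomes a cell
    in the (hypothetical) column family.
    """
    row_key = row[key_field]
    cells = {
        f"{family}:{name}": value
        for name, value in row.items()
        if name != key_field
    }
    return row_key, cells


# Model of the resulting table: row key -> cells.
table = dict(to_hbase_cells(row) for row in rows)
```

Using the student number as the row key mirrors its role as the primary key in the relational table.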

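Structuring and Saving Data

Summary: 1. Save the body text of each news item to a text file. 2. Structure the news data as a list of dictionaries: the details of a single news item --> dictionary news; all news items on one list page --> list newsls.append(news); all news items across all list pages --> combined list newstotal.extend(newsls). 3. Install pandas, and use pand ...

posted @ 2018-04-12 20:48 155林俊彪 Views (94) Comments (0) Recommendations (0)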
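
The dictionary and list structure from the summary above might look like the following sketch. The names `news`, `newsls`, and `newstotal` come from the summary. The sample pages, the `title` and `url` fields, and the helper `build_news` are hypothetical. The summary mentions pandas, but its use is cut off, so this sketch swaps in the standard-library `csv` module for the saving step.

```python
import csv
import io


def build_news(title, url):
    """Return one news item as a dictionary (the fields are hypothetical)."""
    return {"title": title, "url": url}


# Hypothetical list pages, each holding (title, url) pairs.
pages = [
    [("Title A", "http://example.com/a"), ("Title B", "http://example.com/b")],
    [("Title C", "http://example.com/c")],
]

newstotal = []
for page in pages:
    newsls = []
    for title, url in page:
        news = build_news(title, url)
        newsls.append(news)       # one list page -> list of dictionaries
    newstotal.extend(newsls)      # all list pages -> one flat list

# Write the structured data as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "url"])
writer.writeheader()
writer.writerows(newstotal)
csv_text = buffer.getvalue()
```

`extend` keeps `newstotal` flat, whereas `append` would nest one list per page inside it.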

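Fetching All Campus News

Summary: 1. Wrap the retrieval of all news items on one list page in a function. 2. Get the total number of news items and compute the total number of pages. 3. Fetch the details of every news item on every list page. 4. Pick a topic that interests you, crawl data for it, and run a word-segmentation analysis. It must not duplicate another student's topic.

posted @ 2018-04-11 22:03 155林俊彪 Views (119) Comments (0) Recommendations (0)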
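
Step 2 of the summary above, turning a total item count into a page count, is a ceiling division. A minimal sketch follows. The function name and the page size of 10 in the usage line are assumptions, since the summary does not state how many items a list page holds.

```python
def total_pages(news_count, per_page):
    """Return the number of list pages needed to show news_count items."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Integer ceiling division: a partially filled last page still counts.
    return (news_count + per_page - 1) // per_page


# Hypothetical figures: 95 items at 10 per page.
pages_needed = total_pages(95, 10)
```

Integer arithmetic avoids the floating-point rounding that `math.ceil(news_count / per_page)` can hit for very large counts.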

Crawling the News on the Campus News Front Page

Summary: import requests from bs4 import BeautifulSoup from datetime import datetime import re res = requests.get('http://news.gzcc.cn/html/xiaoyuanxinwen/') res.encoding = 'utf-8' soup = BeautifulSoup(res.te...

posted @ 2018-04-09 20:27 155林俊彪 Views (95) Comments (0) Recommendations (0)

Basic Web Crawler Exercises

Summary: import requests from bs4 import BeautifulSoup res = requests.get('https://www.cnblogs.com/') res.encoding = 'UTF-8' soup = BeautifulSoup(res.text, 'html.parser') # extract the text of the h1 tags for h1 in soup.find_all(...

posted @ 2018-03-29 20:53 155林俊彪 Views (91) Comments (0) Recommendations (0)

Chinese Word Frequency Count

Summary: import jieba file=open('pingfan','r',encoding = 'utf-8') wordList=list(jieba.cut(file.read())) wordDict={} for word in wordList: if(len(word)==1): continue wordDict[word]= wordList....

posted @ 2018-03-28 21:55 155林俊彪 Views (103) Comments (0) Recommendations (0)

English Word Frequency Count

Summary: song = ''' Can't believe its over That you're leaving Weren't we meant to be? Should've sensed the danger Read the warnings Right there in front of me Just stop Lets start it over Couldn't I get one ...

posted @ 2018-03-26 20:41 155林俊彪 Views (128) Comments (0) Recommendations (0)
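
The post's own counting code is cut off in the summary above, so the following is a separate minimal sketch of an English word-frequency count, not a reconstruction of it. The sample text and the function name are hypothetical.

```python
import string
from collections import Counter

# Hypothetical sample text standing in for the lyrics used in the post.
text = "The quick fox saw the lazy dog. The dog saw the fox!"


def word_frequencies(raw):
    """Lower-case the text, replace punctuation with spaces, count words."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = raw.lower().translate(table).split()
    return Counter(words)


freq = word_frequencies(text)
top = freq.most_common(1)
```

Replacing every punctuation mark with a space is a simplification: a contraction such as "can't" is split into "can" and "t", so a real count over lyrics may want to keep apostrophes.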

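String Exercises

Summary: I. String exercises: 1. http://news.gzcc.cn/html/2017/xiaoyuanxinwen_1027/8443.html Extract the ID of the campus news item. 2. https://docs.python.org/3/library/turtle.html Generate the URLs of the Python documentation. 3. http: ...

posted @ 2018-03-21 17:51 155林俊彪 Views (159) Comments (0) Recommendations (0)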
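
The first two exercises in the summary above can be sketched with basic string methods. Two assumptions are made here: that the news ID is the final path segment without its `.html` extension, and that the documentation URLs differ only in the module name. The helper names are hypothetical.

```python
news_url = "http://news.gzcc.cn/html/2017/xiaoyuanxinwen_1027/8443.html"


def news_id_from_url(url):
    """Return the last path segment without its file extension."""
    filename = url.rsplit("/", 1)[-1]   # "8443.html"
    return filename.split(".")[0]       # "8443"


def python_doc_url(module):
    """Build the library documentation URL for a standard module name."""
    return "https://docs.python.org/3/library/{}.html".format(module)


news_id = news_id_from_url(news_url)
doc_url = python_doc_url("turtle")
```

`str.rstrip(".html")` is avoided on purpose: it strips any trailing run of those characters rather than the literal suffix.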

Just smile
