370蔡轩

Hadoop Comprehensive Final Project

Summary: Start Hadoop and check the running daemons with jps; create a file and move it; start Hive; create a database; view the results.

posted @ 2018-05-28 21:18 370蔡轩

Understanding MapReduce

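Summary: 1. Write a WordCount program in Python and submit it as a job. Program: WordCount. Input: a text file containing a large number of words. Output: every word in the file with its number of occurrences (frequency), sorted alphabetically by word, with one word and its frequency per line, separated by whitespace.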

posted @ 2018-05-10 21:51 370蔡轩
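The following is a minimal local sketch of the WordCount logic. It assumes the usual Hadoop Streaming split into a map step and a reduce step. Hadoop sorts mapper output by key before passing it to the reducer. In this sketch, `sorted()` stands in for that shuffle so the demo runs without a cluster. The function names are hypothetical. Python's default string ordering compares code points, so uppercase words sort before lowercase ones. That may differ from a strict dictionary order.

```python
from itertools import groupby

def mapper(lines):
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Reduce step: expects pairs sorted by word, as Hadoop's shuffle delivers them."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

def word_count(lines):
    """Local stand-in for map -> shuffle/sort -> reduce."""
    return list(reducer(sorted(mapper(lines))))

if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog"]
    for word, count in word_count(sample):
        print(f"{word}\t{count}")  # one word and its frequency per line
```

In a real streaming job, the mapper and reducer would typically be separate scripts. Each would read `sys.stdin` and print tab-separated lines, and the job would be submitted through the hadoop-streaming JAR.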

Web Crawler Final Project

Summary: Get the URLs; get the news content; get the full news list.

posted @ 2018-05-03 21:42 370蔡轩
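The summary above only names the steps. For the "full news list" step, the standard library's `html.parser` can stand in for BeautifulSoup to collect each link's text and absolute URL from a list page. This technique is swapped in and is not the original author's code. The class name, sample HTML, and URLs are hypothetical. A real page would also need a narrower filter, for example keeping only links inside the news list container.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect (title, absolute_url) for every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            # Resolve relative links against the list page's URL
            self.links.append((title, urljoin(self.base_url, self._href)))
            self._href = None

def news_list(html, base_url):
    """Return every (title, url) pair found in the HTML of a list page."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

if __name__ == "__main__":
    page = '<ul><li><a href="/html/2018/a_1/1.html">First</a></li></ul>'
    print(news_list(page, "http://example.com/list/"))
```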

Structuring and Saving Data

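Summary: 1. Save the body text of each news article to a text file. 2. Structure the news data as a list of dictionaries, with the details of a single article stored in the dictionary news:
news = {}
news['title'] = soupd.select('.show-title')[0].text
# c = soupd.select('#content')[0 …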

posted @ 2018-04-16 12:01 370蔡轩
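The sketch below covers the two steps from the summary using only the standard library. First, each article's body is appended to a UTF-8 text file. Second, each article becomes one dict in a list, and the list is written out as JSON. The keys `title`/`url`/`content`, the helper names, and the file names are hypothetical. In the original, the values come from BeautifulSoup selectors such as `soupd.select('.show-title')`.

```python
import json
import os
import tempfile

def save_content(text, path):
    """Step 1: append one article's body text to a UTF-8 text file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")

def make_news(title, url, content):
    """Step 2: the details of one article become one dict (hypothetical keys)."""
    return {"title": title, "url": url, "content": content}

def save_news_list(news_list, path):
    """Write the list of dicts as JSON; ensure_ascii=False keeps Chinese text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(news_list, f, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    news_list = []
    for title, url, body in [("Title A", "http://example.com/a.html", "Body A")]:
        save_content(body, os.path.join(out_dir, "content.txt"))
        news_list.append(make_news(title, url, body))
    save_news_list(news_list, os.path.join(out_dir, "news.json"))
    print(news_list)
```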

Using Regular Expressions to Get the Click Count, and Extracting Functions

Summary:
def getClickCount(newsUrl):
    newsId = re.search('\_(.*).html', newsUrl).group(1).split('/')[-1]
    clickUrl = 'http://oa.gzcc.cn/api. …

posted @ 2018-04-11 21:51 370蔡轩
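The excerpt above cuts off before the API URL, so the sketch below leaves that URL as a parameter. It keeps the original idea: a regular expression pulls the news ID out of the article URL. It then adds a second regex that reads the count from the API response. Several pieces are assumptions rather than anything the excerpt shows:

- the response format `.html('123')`;
- the `api_template` URL;
- the `fetch` hook, which in practice might be something like `requests.get(url).text`.

The hook lets the logic be tested without the network.

```python
import re

def extract_news_id(news_url):
    """Pull the ID out of a URL like .../xxx_0404/9183.html (same idea as the original regex)."""
    match = re.search(r"_(.*)\.html", news_url)
    if match is None:
        raise ValueError(f"no news id found in {news_url!r}")
    return match.group(1).split("/")[-1]

def parse_click_count(body):
    """Read the count from a response assumed (hypothetically) to contain .html('123')."""
    match = re.search(r"\.html\('(\d+)'\)", body)
    if match is None:
        raise ValueError("click count not found in response")
    return int(match.group(1))

def get_click_count(news_url, fetch, api_template):
    """fetch(url) -> response text; api_template is a hypothetical URL containing {news_id}."""
    news_id = extract_news_id(news_url)
    return parse_click_count(fetch(api_template.format(news_id=news_id)))

def fake_fetch(url):
    """Offline stand-in for an HTTP GET, returning the assumed response format."""
    return "$('#hits').html('42');"

if __name__ == "__main__":
    print(get_click_count("http://news.example.com/html/2018/news_0404/9183.html",
                          fake_fetch, "http://api.example.com/count?id={news_id}"))
```

This version writes the pattern as a raw string with an escaped dot. In the original, `'\_'` triggers an invalid-escape warning in newer Python versions, and the unescaped `.` can match any character.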

Crawling the News on the Campus News Homepage

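Summary: 1. Use the requests and BeautifulSoup libraries to crawl the title, link, and body text of each news item on the campus news homepage. 2. Parse the info string to get each article's publication time, author, source, photographer, and similar details. 3. Convert the publication time from str to datetime.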

posted @ 2018-04-02 11:58 370蔡轩
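The sketch below covers steps 2 and 3 of the summary. It pulls labelled fields out of an article's info line with regular expressions and converts the publication time with `datetime.strptime`. The sample line is an assumption about the page layout, and so are its labels:

- 发布时间 (publication time)
- 作者 (author)
- 来源 (source)
- 摄影 (photography)

The names in the sample are placeholders. Adjust the patterns to the real markup.

```python
import re
from datetime import datetime

# Assumed labels on the info line (hypothetical layout): label, then ： or :, then the value
FIELD_LABELS = {"author": "作者", "source": "来源", "photo": "摄影"}

def parse_info(info):
    """Return the publication time (datetime) plus author/source/photographer strings."""
    result = {}
    m = re.search(r"发布时间[:：]\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", info)
    # Step 3: convert the time from str to datetime
    result["time"] = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None
    for key, label in FIELD_LABELS.items():
        m = re.search(label + r"[:：]\s*(\S+)", info)  # value runs up to the next whitespace
        result[key] = m.group(1) if m else None
    return result

if __name__ == "__main__":
    sample = "发布时间：2018-04-01 11:57:00  作者：张三  来源：学生处  摄影：李四"
    print(parse_info(sample))
```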

Web Crawler Practice

Summary: Web page practice.

posted @ 2018-03-30 21:02 370蔡轩

Summary:
# -*- coding: UTF-8 -*-
str = '''Gotta Have You （The Weepies） Gray, quiet and tired and mean Picking at a worried seam I try to make you mad at me over the phone Red eyes and fire and signs I'm tak...

posted @ 2018-03-26 11:44 370蔡轩

Compound Data Types Practice

Summary:
>>> classmate=['Mi','Bo','Tra','李三','Tra',56]
>>> print(classmate)
['Mi', 'Bo', 'Tra', '李三', 'Tra', 56]
>>> f=['1','2','3']
>>> f
['1', '2', '3 …

posted @ 2018-03-22 11:54 370蔡轩
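The sketch below continues from the `classmate` list in the excerpt above. It shows common list operations and conversions to the other compound types. The extra element `'New'` and the `scores` values are hypothetical, added only for illustration.

```python
classmate = ['Mi', 'Bo', 'Tra', '李三', 'Tra', 56]

tra_count = classmate.count('Tra')   # how many times 'Tra' appears
first_tra = classmate.index('Tra')   # position of the first 'Tra'

names = classmate + ['New']          # '+' builds a new list; classmate is unchanged
names.remove('Tra')                  # remove() deletes only the first match

as_tuple = tuple(classmate)          # immutable copy of the same elements
unique = set(classmate)              # a set keeps one copy of each element

# Hypothetical scores, just to show building a dict from two parallel lists
scores = dict(zip(['Mi', 'Bo'], [90, 85]))

print(tra_count, first_tra, names, as_tuple, len(unique), scores)
```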

Python

Summary:
# -*- coding:UTF-8 -*-
import turtle

def newgoto(x, y):
    turtle.up()
    turtle.goto(x, y)
    turtle.down()

def draw(x):
    turtle.begin_fill()
    for i in range(5):
        turtle.forward(...

posted @ 2018-03-19 10:05 370蔡轩
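The excerpt above breaks off inside `draw(x)`, so the loop body is not shown. Suppose it traces a five-pointed star, with each pass calling `turtle.forward(x)` and then `turtle.right(144)`. The geometry can then be checked without opening a turtle window. The sketch below replays those moves with `math` and returns the visited points. The function name `star_path` and the 144° default are assumptions; `turn=72` would trace a regular pentagon instead.

```python
import math

def star_path(x, y, size, turn=144.0):
    """Replay newgoto(x, y), then 5 x (forward(size), right(turn)); return the visited points."""
    points = [(float(x), float(y))]
    heading = 0.0  # assumes the turtle starts facing east (0 degrees, standard mode default)
    for _ in range(5):
        px, py = points[-1]
        rad = math.radians(heading)
        points.append((px + size * math.cos(rad), py + size * math.sin(rad)))
        heading -= turn  # right() turns clockwise, i.e. decreases the heading
    return points

if __name__ == "__main__":
    for px, py in star_path(-100, 0, 200):
        print(f"({px:.1f}, {py:.1f})")
```

After five sides the total turn is 5 x 144° = 720°, which is two full rotations. The path therefore ends where it started, which is what lets the filled shape close.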