bioamin

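April 13, 2018

Summary: sort -k1 -n sorts by the first column numerically in ascending order; sort -k1 -rn sorts by the first column numerically in descending order. Read more

posted @ 2018-04-13 20:58 bioamin Views (275) Comments (0) Recommendations (0)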
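For readers who want the same behavior outside the shell, the two sort invocations summarized above can be approximated in Python. This is only a sketch: the function name and the sample rows are made up for illustration, and it assumes every line starts with a number (sort -n is more forgiving and treats a non-numeric field as 0).

```python
def sort_by_first_column(lines, descending=False):
    """Sort whitespace-separated lines by the numeric value of column 1.

    Hypothetical helper, roughly mirroring `sort -k1 -n` (ascending)
    and `sort -k1 -rn` (descending). Assumes column 1 is always numeric.
    """
    def first_column(line):
        return float(line.split()[0])

    return sorted(lines, key=first_column, reverse=descending)


# Illustrative data, not taken from the original post.
rows = ["10 spark", "2 hadoop", "33 hive"]
print(sort_by_first_column(rows))                   # ascending by column 1
print(sort_by_first_column(rows, descending=True))  # descending by column 1
```

A plain string sort would put "10 spark" before "2 hadoop"; converting the first field to a number is what the -n flag does in the shell version.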

Summary: The files a.txt and b.txt are as follows: a.txt 1 hadoop 3 hadoop 5 hadoop 7 hadoop 9 hadoop 11 hadoop 13 hadoop 15 hadoop 17 hadoop 19 hadoop 21 hadoop 23 hadoop 25 hadoop 27 Read more

posted @ 2018-04-13 20:54 bioamin Views (2291) Comments (0) Recommendations (0)

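Big data: joining two tables with MapReduce, implemented in Python

Summary: Secondary sort. By default, Hadoop sorts records by key. What if you also need them sorted by value, that is, for a given key, the value list received by the reduce function should arrive ordered by value? This requirement is common in join operations, for example when the values from the small table should come first among records that share a key. There are two ways to do a secondary sort Read more

posted @ 2018-04-13 18:27 bioamin Views (1082) Comments (0) Recommendations (0)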
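The summary above is cut off before it names the two methods, so the following is not the author's implementation. It is a minimal local sketch of one common idea, under stated assumptions: each record is tagged with 0 (small table) or 1 (large table), and sorting on (key, tag) stands in for Hadoop's shuffle with a secondary sort. The function names, the tag values, and the sample rows are all hypothetical.

```python
from itertools import groupby


def tag_records(small, large):
    """Emit (key, tag, value); tag 0 marks the small table so it sorts first."""
    for key, value in small:
        yield (key, 0, value)
    for key, value in large:
        yield (key, 1, value)


def reduce_join(records):
    """Join the two tables after ordering records by (key, tag).

    The sorted() call imitates a secondary sort: within one key, every
    small-table value is seen before any large-table value.
    """
    joined = []
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    for key, group in groupby(ordered, key=lambda r: r[0]):
        small_values = []
        for _, tag, value in group:
            if tag == 0:
                small_values.append(value)  # buffer only the small side
            else:
                for s in small_values:
                    joined.append((key, s, value))
    return joined


# Illustrative data, not taken from the original post.
small = [("1", "hadoop"), ("2", "hive")]
large = [("2", "x"), ("1", "y"), ("1", "z")]
print(reduce_join(tag_records(small, large)))
```

Because the small-table values arrive first, the reducer only has to buffer the small side and can stream through the large side.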

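Showing the full directory path in Linux

Summary: Then save and exit, and either run source ~/.bashrc or shut down and restart for the change to take effect. Read more

posted @ 2018-04-13 16:40 bioamin Views (568) Comments (0) Recommendations (0)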

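Linux compression and extraction (continuously updated)

Summary: Compress: tar cvzf w.tar.gz xxx1 xxx2. The corresponding extraction: tar xvzf w.tar.gz Read more

posted @ 2018-04-13 16:11 bioamin Views (222) Comments (0) Recommendations (0)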
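The tar commands in the summary above have a rough counterpart in Python's standard tarfile module. The sketch below is an illustration only: the helper names are invented, each entry is stored under its base name (a simplification compared with tar, which keeps the path as given), and the round trip runs in a temporary directory so nothing is written to the working directory.

```python
import os
import tarfile
import tempfile


def compress(archive_path, paths):
    """Rough equivalent of: tar czf archive_path paths..."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            # Simplification: store each entry under its base name.
            tar.add(path, arcname=os.path.basename(path))


def extract(archive_path, dest):
    """Rough equivalent of: tar xzf archive_path -C dest

    Only extract archives you trust; entries can carry unexpected paths.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)


def round_trip_demo():
    """Compress one small file, extract it elsewhere, return its contents."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "xxx1")
        with open(src, "w") as f:
            f.write("hello")
        archive = os.path.join(tmp, "w.tar.gz")
        compress(archive, [src])
        out = os.path.join(tmp, "out")
        os.mkdir(out)
        extract(archive, out)
        with open(os.path.join(out, "xxx1")) as f:
            return f.read()
```

The "w:gz" and "r:gz" modes play the role of the z flag, while the c and x flags map to opening the archive for writing or reading.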

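Counting occurrences of a word with grep

Summary: cat file | grep -c 'xxx' counts the number of lines in file that contain xxx; cat file | grep -o 'xxx' | wc -l counts the number of occurrences of xxx in file (grep -o alone only prints each match on its own line). Read more

posted @ 2018-04-13 16:05 bioamin Views (3934) Comments (0) Recommendations (0)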
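The difference between counting matching lines and counting individual matches is easy to miss, so here is a small Python sketch of both. The function names and sample text are hypothetical, and the pattern is treated as a fixed string rather than a regular expression, which is closer to grep -F than to plain grep.

```python
def count_matching_lines(text, word):
    """Number of lines containing word at least once (like grep -c)."""
    return sum(1 for line in text.splitlines() if word in line)


def count_occurrences(text, word):
    """Total non-overlapping matches on all lines (like grep -o | wc -l)."""
    return sum(line.count(word) for line in text.splitlines())


# Illustrative text, not taken from the original post.
sample = "xxx and xxx\nnothing here\nxxx\n"
print(count_matching_lines(sample, "xxx"))  # lines that contain the word
print(count_occurrences(sample, "xxx"))     # every individual match
```

On the sample text the first count is 2 and the second is 3, because the first line contains the word twice.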

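April 12, 2018

Quick search in vi

Summary: In vim, to highlight a word and search for it, move the cursor onto the word, then: 1: press "*" to search forward and highlight the matches; 2: press "#" to search backward and highlight the matches; 3: press "g" then "d" to highlight the word under the cursor, and "n" to jump to the next match. Read more

posted @ 2018-04-12 16:19 bioamin Views (1400) Comments (0) Recommendations (0)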

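Basic Python exercises

Summary: 1. Generating a random integer (0-99, since random.random() is always below 1): import random; print(int(random.random()*100)) 2. Commonly used functions in the numpy package: shape returns the length of each dimension of an array (usage: a.shape, a.shape[0], a.shape[1]); tile() is mainly used to repeat an array to build a new one Read more

posted @ 2018-04-12 15:52 bioamin Views (193) Comments (0) Recommendations (0)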
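One detail of the random-number exercise above is worth spelling out: scaling random.random() by 100 and truncating never produces 100, whereas random.randint(0, 100) includes both ends. The sketch below contrasts the two; the function names are hypothetical and a seeded random.Random instance is used only to make the run repeatable.

```python
import random


def scaled_random(rng):
    """int(random() * 100): random() is in [0.0, 1.0), so the result is 0..99."""
    return int(rng.random() * 100)


def inclusive_random(rng):
    """randint(0, 100) includes both endpoints, so the result is 0..100."""
    return rng.randint(0, 100)


rng = random.Random(0)  # fixed seed, only so the demo is repeatable
samples = [scaled_random(rng) for _ in range(1000)]
print(min(samples) >= 0, max(samples) <= 99)
```

If 100 must be a possible result, randint(0, 100) is the simpler choice.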

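April 3, 2018

Big data: Python word count with HDFS distribution via -cacheArchive

Summary: -cacheArchive also distributes files from HDFS, but the distributed file is an archive, which may contain multiple directory levels and multiple files. 1. The file The_Man_of_Property.txt is as follows (upload it to HDFS): hadoop fs -put The_Man_of_Property.txt /mapreduce Pr Read more

posted @ 2018-04-03 23:24 bioamin Views (977) Comments (0) Recommendations (0)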

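Common hadoop commands explained in detail

Summary: hadoop commands have two levels. Typing hadoop at the Linux command line prints the usage rules: Usage: hadoop [--config confdir] COMMAND where COMMAND is one of: namenode -format format the DFS filesystem # This Read more

posted @ 2018-04-03 19:51 bioamin Views (6472) Comments (0) Recommendations (0)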

Pursuing the dream of starting a business
