Post category - Hadoop
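Summary: The "five horizontals" are essentially five layers divided bottom-up along the direction of data flow. This closely resembles a traditional data warehouse, since data systems share the same underlying concepts. The layers are the data collection layer, data processing layer, data analysis layer, data access layer, and application layer. A big data platform architecture differs from a traditional data warehouse in one respect: within a single layer, more technical components are used to serve different scenarios, a "let a hundred flowers bloom" character. This is a
Read more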
Summary: combineByKey-->>aggregateByKey-->>foldByKey-->>reduceByKey-->>groupByKey-->>countByKey 0> combineByKey(createCombiner, mergeValue, mergeCombiners, num
Read more
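As a rough illustration of the three functions named in the `combineByKey(createCombiner, mergeValue, mergeCombiners, ...)` signature above, here is a minimal plain-Python model of the semantics. It is a sketch, not Spark code: `combine_by_key`, the list-of-lists `partitions` layout, and the sample data are all hypothetical names chosen for this example, and the real operator runs distributed with a shuffle between the two phases.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, Any]


def combine_by_key(
    partitions: Iterable[Iterable[Pair]],
    create_combiner: Callable[[Any], Any],
    merge_value: Callable[[Any, Any], Any],
    merge_combiners: Callable[[Any, Any], Any],
) -> Dict[Hashable, Any]:
    """Plain-Python model of combineByKey (hypothetical helper, not a Spark API)."""
    # Phase 1: combine values inside each partition independently.
    per_partition: List[Dict[Hashable, Any]] = []
    for part in partitions:
        acc: Dict[Hashable, Any] = {}
        for key, value in part:
            if key in acc:
                # Key already seen in this partition: fold the value in.
                acc[key] = merge_value(acc[key], value)
            else:
                # First occurrence in this partition: start a combiner.
                acc[key] = create_combiner(value)
        per_partition.append(acc)

    # Phase 2: merge the per-partition combiners for each key.
    result: Dict[Hashable, Any] = {}
    for acc in per_partition:
        for key, combiner in acc.items():
            if key in result:
                result[key] = merge_combiners(result[key], combiner)
            else:
                result[key] = combiner
    return result


# Sample data (assumed for illustration): two partitions of (key, value) pairs.
partitions = [[("a", 1), ("b", 4), ("a", 3)], [("a", 5), ("b", 6)]]

# Per-key average via a (sum, count) combiner.
sums_counts = combine_by_key(
    partitions,
    create_combiner=lambda v: (v, 1),
    merge_value=lambda c, v: (c[0] + v, c[1] + 1),
    merge_combiners=lambda x, y: (x[0] + y[0], x[1] + y[1]),
)
averages = {k: s / n for k, (s, n) in sums_counts.items()}
```

The (sum, count) combiner shows why the value type and the combiner type may differ, which is the main thing `combineByKey` allows that `reduceByKey` does not.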
Summary: Glossary ▪ Operations are eager when they are executed as soon as the statement is reached in the code; eager execution: the code runs as soon as it is reached; ▪ Operations are lazy when the exe
Read more
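The eager definition above can be sketched in plain Python, with a generator standing in for a deferred operation. This is only an analogy under stated assumptions: `double`, `eager_map`, `lazy_map`, and the `calls` log are hypothetical names for this example, and nothing here uses Spark.

```python
from typing import Iterable, Iterator, List

calls: List[int] = []  # records every element that actually gets processed


def double(x: int) -> int:
    calls.append(x)
    return 2 * x


def eager_map(xs: Iterable[int]) -> List[int]:
    # Eager: the work is done as soon as this statement runs.
    return [double(x) for x in xs]


def lazy_map(xs: Iterable[int]) -> Iterator[int]:
    # Deferred: returns a generator; double() is not called until it is consumed.
    return (double(x) for x in xs)


eager_result = eager_map([1, 2, 3])
calls_after_eager = list(calls)  # all three elements were processed already

calls.clear()
lazy_result = lazy_map([1, 2, 3])
calls_after_lazy_definition = list(calls)  # still empty: nothing has run yet

materialized = list(lazy_result)  # consuming the generator triggers the work
calls_after_consumption = list(calls)
```

Consuming the generator plays the role that requesting a result plays for a deferred operation: only at that point is any element processed.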
Summary: Glossary CDH #(Cloudera’s Distribution including Apache Hadoop) ecosystem projects #projects in the ecosystem Subscription #subscription Volume #capacity Velocity #speed Variety #diversity ETL #Extr
Read more
Summary: Privileges Required for Hive Operations Codes Y: Privilege required. Y + G: Privilege "WITH GRANT OPTION" required. Action Select Insert Update Delete
Read more
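Summary: The installation is based on CentOS 7. The system is not a minimal install: some Server services and the Development Tools group are selected. The root user is used throughout, because operating-system permissions and security settings make startup behave differently under other users. Step 1: Download. On hadoop.apache.org, choose the recommended download mirror; https://hadoop.apache.org
Read more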
Summary: Failed redirect for container_1544160578687_0003_01_000001 ResourceManager RM Home NodeManager Tools Failed while trying to construct the redirect url t
Read more
Summary: [root@master hadoop-3.1.1]# bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar An example program must be given as the first argu
Read more
Summary: [root@master hadoop-3.1.1]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar An example program must be given as the first ar
Read more
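Summary: Spark installation. Setting up a fully distributed Spark 2.1.0 environment. MASTER node: 1. Download the file: wget -O "spark.tgz" "http://d3kbcqa49mib13.cloudfront.net/spark.tgz" 2. Extract it and move it to the appropriate folder; tar -xvf spark.tgz mv sp
Read more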
Summary: Explain the difference between narrow transformations and wide transformations; master map, flatMap, filter, and coalesce; list two wide transformations; list 4 common actions in a Spark pipeline. Transformations narrow tr
Read more
Summary: Install Hadoop: Setting up a Single Node Hadoop Cluster There are two ways to install Hadoop, i.e. Single node and Multi node. Single node cluster mea
Read more