[CDH] Cloudera's Distribution including Apache Hadoop
You may choose to install Spark, YARN, Hive, etc. one by one: [Spark] 00 - Install Hadoop & Spark
Here, however, we show how to install and configure a big data environment in an automated way. Along the way you will also see why CDH exists.
Resources
PySpark installation on CDH: https://blog.csdn.net/weixin_43215250/article/details/89186733
Background
Cluster
Ubuntu 18.04
node00, 192.168.56.1
CentOS 7.7
node01, 192.168.56.100
node02, 192.168.56.110
node03, 192.168.56.120
Packages
(base) [hadoop@node01 soft]$ ll
total 3850204
-rwxrwxrwx 1 hadoop hadoop  541906131 Nov 22 18:33 Anaconda3-2019.07-Linux-x86_64.sh
-rw-r--r-- 1 hadoop hadoop 2108071134 Nov 22 18:50 CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel
-rwxrwxr-x 1 hadoop hadoop         41 Nov 23 16:17 CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha
-rw-r--r-- 1 hadoop hadoop         41 Nov 22 18:50 CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1
-rw-r--r-- 1 hadoop hadoop  832469335 Nov 22 18:51 cloudera-manager-centos7-cm5.14.2_x86_64.tar.gz
-rw-rw-r-- 1 hadoop hadoop  194042837 Nov 22 18:09 jdk-8u202-linux-x64.tar.gz
-rw-r--r-- 1 hadoop hadoop  179439263 Nov 22 18:51 jdk-8u211-linux-x64.rpm
drwxr-xr-x 2 hadoop hadoop        120 Nov 22 18:51 kafka    <-- Kafka parcel packages
-rw-r--r-- 1 hadoop hadoop      74072 Nov 22 18:51 manifest.json
-rw-r--r-- 1 hadoop hadoop   50970604 Nov 22 18:51 maxwell-1.22.1.tar.gz
-rw-r--r-- 1 root   root        25548 Apr  7  2017 mysql57-community-release-el7-10.noarch.rpm
-rw-r--r-- 1 hadoop hadoop     848399 Nov 22 18:51 mysql-connector-java.jar
-rw-rw-r-- 1 hadoop hadoop   34731946 Nov 22 18:25 zookeeper-3.4.5-cdh5.14.2.tar.gz
Data Pipeline
I. Motivating Problems
QPS: Queries Per Second
When a single table grows to around 4-5 million rows, performance degrades noticeably, so we start splitting databases and tables ("sharding"). But doesn't that create far too many tables?
- Secondary indexes: only effective locally, within a single shard.
- Database splitting: JOIN no longer works across shards; neither do COUNT, ORDER BY, or GROUP BY.
- Expansion strategy: scaling out requires another round of horizontal splitting and data migration.
Read/Write Splitting
Ref: What is database read/write splitting?
Ref: MySQL master-slave replication for read/write splitting
Most internet workloads are read-heavy, so reads become the database bottleneck first. If we want to scale read performance roughly linearly, and to reduce read/write lock contention so that write performance improves too, we can use a "grouped" architecture, i.e. read/write splitting.
In one sentence: read/write splitting exists to relieve the database's read-performance bottleneck.
It also introduces some problems:
- The connection pools must be split into read pools and write pools, which increases development complexity;
- For high availability, the read pool must support automatic failover;
- Master-slave consistency has to be considered.
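The read-pool/write-pool split above can be sketched as a tiny connection router. This is only an illustration of the routing idea, not a real driver; the node addresses are stand-ins for actual database connections.

```python
import random

class ReadWriteRouter:
    """Route writes to the primary and reads to a replica pool.

    A minimal sketch of read/write splitting; in production the pool
    would hold real connections and support replica failover.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)  # the read pool

    def route(self, sql):
        # Anything that modifies data must go to the primary.
        verb = sql.strip().split()[0].upper()
        if verb in ("INSERT", "UPDATE", "DELETE", "REPLACE"):
            return self.primary
        return random.choice(self.replicas)  # spread reads across replicas

router = ReadWriteRouter("node02:3306", ["node01:3306", "node03:3306"])
print(router.route("UPDATE user SET name='a' WHERE id=1"))  # always the primary
print(router.route("SELECT * FROM user WHERE id=1"))        # one of the replicas
```

Note that this sketch ignores the consistency caveat from the list above: a read routed to a replica may not yet see the latest write.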
Caching Strategy
If you have to choose between caching and read/write splitting, consider caching first.
- A cache costs far less than extra replica databases;
- Caching is easy to develop: most reads can hit the cache first and only fall through ("penetrate") to the database on a miss.
- Of course, if caching is already in place and reads are still the bottleneck, then choose read/write splitting. In short, read/write splitting can be seen as the fallback for when even caching is not enough.
- Caching is not without drawbacks: high availability must be addressed, because if the cache goes down, all traffic hits the database at once and the database will certainly fall over.
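The "read the cache first, fall through to the database on a miss" pattern from the list above is the classic cache-aside read. A minimal sketch, where a dict stands in for Redis and another for MySQL:

```python
cache = {}                        # stands in for Redis
database = {"user:1": "alice"}    # stands in for the MySQL table

def get(key):
    """Cache-aside read: try the cache, fall through to the DB on a miss."""
    if key in cache:
        return cache[key]         # cache hit: the DB is never touched
    value = database.get(key)     # the "penetrating" read mentioned above
    if value is not None:
        cache[key] = value        # populate the cache for later reads
    return value

print(get("user:1"))  # miss: reads the DB and fills the cache
print(get("user:1"))  # hit: served from the cache
```

The availability caveat in the list shows up here too: if `cache` is wiped, every `get` falls through to `database` at once.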
Horizontal Splitting
More often, the real bottleneck is data volume. An orders table, for example, only ever grows, and its historical data must be kept, so it easily becomes a performance bottleneck.
Horizontal splitting is another common database architecture: an algorithm partitions the data across databases. Each database in a horizontally split cluster is usually called a "shard". The shards are disjoint, and the union of all shards is the complete data set.
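A hash (modulo) split is the simplest such algorithm. The sketch below (shard count 4 is an arbitrary example) shows the two properties just stated: every row lands in exactly one shard, and the shards together cover all rows.

```python
SHARDS = 4

def shard_of(order_id: int) -> int:
    """Map a row key to exactly one shard."""
    return order_id % SHARDS

# Distribute ten example order IDs across the shards.
shards = {i: [] for i in range(SHARDS)}
for order_id in range(10):
    shards[shard_of(order_id)].append(order_id)

print(shards)  # disjoint shards; their union is all ten rows
```

Modulo hashing is simple but illustrates the "expansion strategy" caveat from earlier: changing SHARDS remaps most keys, forcing a data migration. Range-based splits or consistent hashing are the usual ways to soften that.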
II. Practical Details of Data Engineering
Why binlog matters
Data analysis must not impact the primary database.
primary DB --> binlog files --> Maxwell --> Kafka cluster --> HBase master
Pulling the analytics workload out of the primary database relieves its read/write pressure.
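In the pipeline above, Maxwell serialises each binlog row change as a JSON message and publishes it to Kafka. A sketch of what a downstream consumer does with one such message; the event body here is a made-up example in Maxwell's database/table/type/data shape, not output captured from a real cluster:

```python
import json

# A Maxwell-style row event: one JSON document per binlog row change.
raw = '{"database": "shop", "table": "orders", "type": "insert", "data": {"id": 42, "amount": 9.5}}'

def handle_event(message: str) -> str:
    """Turn a binlog change event into an HBase-style row key,
    as the primary -> binlog -> Maxwell -> Kafka -> HBase pipeline would."""
    event = json.loads(message)
    return f"{event['database']}:{event['table']}:{event['data']['id']}"

print(handle_event(raw))  # shop:orders:42
```

The key point is that the consumer only ever reads from Kafka; the primary database never sees the analytics traffic.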
HBase usage issues
When Spark integrates with HBase, the default behaviour is a full table scan, which puts pressure on memory.
Kafka usage issues
With Kafka's offset management, (1) a message may be processed more than once, because (2) delivery is at-least-once: every message is consumed at least one time.
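The usual answer to at-least-once delivery is to make processing idempotent, so a redelivered message changes nothing. A sketch with an in-memory seen-ID set; a real consumer would keep the processed IDs (or offsets) in a durable store:

```python
processed_ids = set()   # in production: a durable store, not process memory
total = 0

def process(message_id: int, amount: int) -> None:
    """Idempotent handler: a redelivered message has no extra effect."""
    global total
    if message_id in processed_ids:
        return              # duplicate delivery, skip it
    processed_ids.add(message_id)
    total += amount

# Kafka redelivers message 1 after a consumer restart:
for msg_id, amount in [(1, 10), (2, 5), (1, 10)]:
    process(msg_id, amount)

print(total)  # 15, not 25 -- the duplicate was ignored
```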
Front-end/back-end separation
The REST architectural style.
Goto: Understanding the true REST architectural style
Cloudera Cluster Setup
I. Cloudera Manager
Architecture: Server --> Management Service --> Database, plus an Agent on every node (Agent 1, Agent 2, ...).
II. Preparation
Run the following commands on all three machines.
# Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# Disable the SELinux security subsystem
vim /etc/selinux/config
    SELINUX=disabled
# Set the time zone (Asia > China > Beijing)
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
# Sync the time. If the sync below fails, first set the clock manually, e.g.: date -s '2019-11-11 11:55:55'
# Run the following on all three machines to sync against the cloud time server every minute:
yum -y install ntpdate
crontab -e
    */1 * * * * /usr/sbin/ntpdate time1.aliyun.com
On Ubuntu, enable the root user; set up passwordless SSH login.
(base) hadoop@unsw-ThinkPad-T490:/kkb/soft$ su root
Password:
su: Authentication failure
(base) hadoop@unsw-ThinkPad-T490:/soft$ sudo passwd root
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
(base) hadoop@unsw-ThinkPad-T490:/soft$ su
Password:
root@unsw-ThinkPad-T490:/soft#
Prepare the Cloudera installation package, the JDK packages, and the dependency packages (below):
yum -y install chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap fuse-libs redhat-lsb
III. Configure MySQL's scm User
MySQL only needs to be installed on node02; it can sit on a different node from the CM server.
[root@node02 ~]# yum -y install mysql57-community-release-el7-10.noarch.rpm
[root@node02 ~]# yum -y install mysql-community-server
# mariadb can no longer be found; it has been replaced.
[root@node02 ~]# rpm -qa|grep mariadb
You have new mail in /var/spool/mail/root
[Service] MySQL setup.
[root@node02 hadoop]# systemctl start mysqld.service
[root@node02 hadoop]# systemctl status mysqld.service
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2019-11-24 10:00:18 AEDT; 2h 58min ago
     Docs: man:mysqld(8)
           http://dev.mysql.com/doc/refman/en/using-systemd.html
 Main PID: 1311 (mysqld)
   CGroup: /system.slice/mysqld.service
           └─1311 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
Nov 24 10:00:17 node02.kaikeba.com systemd[1]: Starting MySQL Server...
Nov 24 10:00:18 node02.kaikeba.com systemd[1]: Started MySQL Server.
Grant the root user remote access to the database.
[root@node02 ~]# mysql -u root -p
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| maxwell |
| mysql |
| #mysql50#mysql-bin |
| performance_schema |
| scm |
| sys |
+--------------------+
7 rows in set (0.09 sec)
mysql> update mysql.user set Grant_priv='Y',Super_priv='Y' where user = 'root' and host = '%';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> quit
[root@node02 ~]# systemctl restart mysqld.service
[root@node01 ~]# cp mysql-connector-java.jar /opt/cm-5.14.2/share/cmf/lib/
Manually create this folder on all nodes.
[root@node01 ~]# mkdir /opt/cm-5.14.2/run/cloudera-scm-agent
Create the cloudera-scm user on all nodes.
useradd --system --home=/opt/cm-5.14.2/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
Create the scm user remotely. If an scm database already exists, drop it first, then create it as follows.
[root@node01 ~]# /opt/cm-5.14.2/share/cmf/schema/scm_prepare_database.sh mysql -h node02 -uroot -p'<pwd>' --scm-host node01 scm scm '<pwd>'
On all nodes, point the agent at the Cloudera Manager master.
[root@node01 ~]# vi /opt/cm-5.14.2/etc/cloudera-scm-agent/config.ini
[General]
# Change this to node01
server_host=node01
On all nodes, grant ownership to cloudera-scm.
[root@node01 ~]# chown -R cloudera-scm:cloudera-scm /opt/cloudera
[root@node01 ~]# chown -R cloudera-scm:cloudera-scm /opt/cm-5.14.2
IV. Start the Cloudera Web UI
[Service] Start the server on node01.
[root@node01 ~]# /opt/cm-5.14.2/etc/init.d/cloudera-scm-server start
Starting cloudera-scm-server: [ OK ]
Wait until port 7180 is listening.
[root@node01 opt]# ps -ef | grep scm-server
root 20411 1 77 15:17 pts/0 00:02:46 /kkb/install/jdk1.8.0_202/bin/java -cp .:lib/*:/usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar -server -Dlog4j.configuration=file:/opt/cm-5.14.2/etc/cloudera-scm-server/log4j.properties -Dfile.encoding=UTF-8 -Dcmf.root.logger=INFO,LOGFILE -Dcmf.log.dir=/opt/cm-5.14.2/log/cloudera-scm-server -Dcmf.log.file=cloudera-scm-server.log -Dcmf.jetty.threshhold=WARN -Dcmf.schema.dir=/opt/cm-5.14.2/share/cmf/schema -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dpython.home=/opt/cm-5.14.2/share/cmf/python -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+HeapDumpOnOutOfMemoryError -Xmx2G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -XX:OnOutOfMemoryError=kill -9 %p com.cloudera.server.cmf.Main
root 21270 9383 0 15:20 pts/0 00:00:00 grep --color=auto scm-server
[root@node01 opt]# netstat -anpl | grep 20411
tcp 0 0 0.0.0.0:7180 0.0.0.0:* LISTEN 20411/java
tcp 0 0 0.0.0.0:7182 0.0.0.0:* LISTEN 20411/java
tcp 0 0 192.168.56.100:7182 192.168.56.110:55332 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47974 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:7182 192.168.56.100:42990 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47230 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47228 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47972 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47976 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47234 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:7182 192.168.56.120:53120 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47232 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47226 192.168.56.110:3306 ESTABLISHED 20411/java
unix 2 [ ] STREAM CONNECTED 81363 20411/java
unix 2 [ ] STREAM CONNECTED 82057 20411/java
[Service] Start the agent on every node.
[root@node01 ~]# /opt/cm-5.14.2/etc/init.d/cloudera-scm-agent start
Starting cloudera-scm-agent: [ OK ]
Log in to the management UI as admin. Common problems follow.
Check the folder permissions; delete cm_guid and restart the agent service:
rm -rf /opt/cm-5.14.2/lib/cloudera-scm-agent/cm_guid
/opt/cm-5.14.2/etc/init.d/cloudera-scm-agent restart
If the installation is interrupted partway through: reboot, disable the firewall, drop and recreate the scm database in MySQL, delete the previously distributed data, then start the server & agent again.

Verify the cluster with the bundled wordcount example:
[root@node01 ~]# hadoop jar /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar wordcount /test/words /test/output3
19/11/27 11:41:51 INFO client.RMProxy: Connecting to ResourceManager at node03.kaikeba.com/192.168.56.120:8032
19/11/27 11:41:52 INFO input.FileInputFormat: Total input paths to process : 1
19/11/27 11:41:52 INFO mapreduce.JobSubmitter: number of splits:1
19/11/27 11:41:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1574760305839_0002
19/11/27 11:41:52 INFO impl.YarnClientImpl: Submitted application application_1574760305839_0002
19/11/27 11:41:52 INFO mapreduce.Job: The url to track the job: http://node03.kaikeba.com:8088/proxy/application_1574760305839_0002/
19/11/27 11:41:52 INFO mapreduce.Job: Running job: job_1574760305839_0002
19/11/27 11:41:58 INFO mapreduce.Job: Job job_1574760305839_0002 running in uber mode : false
19/11/27 11:41:58 INFO mapreduce.Job: map 0% reduce 0%
19/11/27 11:42:03 INFO mapreduce.Job: map 100% reduce 0%
19/11/27 11:42:07 INFO mapreduce.Job: map 100% reduce 100%
19/11/27 11:42:08 INFO mapreduce.Job: Job job_1574760305839_0002 completed successfully
19/11/27 11:42:08 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=45
        FILE: Number of bytes written=298317
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=119
        HDFS: Number of bytes written=17
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2744
        Total time spent by all reduces in occupied slots (ms)=2409
        Total time spent by all map tasks (ms)=2744
        Total time spent by all reduce tasks (ms)=2409
        Total vcore-milliseconds taken by all map tasks=2744
        Total vcore-milliseconds taken by all reduce tasks=2409
        Total megabyte-milliseconds taken by all map tasks=2809856
        Total megabyte-milliseconds taken by all reduce tasks=2466816
    Map-Reduce Framework
        Map input records=1
        Map output records=2
        Map output bytes=21
        Map output materialized bytes=41
        Input split bytes=106
        Combine input records=2
        Combine output records=2
        Reduce input groups=2
        Reduce shuffle bytes=41
        Reduce input records=2
        Reduce output records=2
        Spilled Records=4
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=121
        CPU time spent (ms)=1290
        Physical memory (bytes) snapshot=622190592
        Virtual memory (bytes) snapshot=5582626816
        Total committed heap usage (bytes)=505413632
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=13
    File Output Format Counters
        Bytes Written=17
V. Installing Spark Separately
The Spark version bundled with CDH may be too old.
[root@node01 hadoop]# ls
manifest.json
SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel
SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel.sha1
SPARK2_ON_YARN-2.2.0.cloudera1.jar
[root@node01 hadoop]# ls /opt/cloudera/csd/
[root@node01 hadoop]# cp SPARK2_ON_YARN-2.2.0.cloudera1.jar /opt/cloudera/csd/
[root@node01 hadoop]# chown cloudera-scm:cloudera-scm /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
[root@node01 hadoop]# ls /opt/cloudera/csd/
SPARK2_ON_YARN-2.2.0.cloudera1.jar
[root@node01 hadoop]# ls /opt/cloudera/parcel-repo/
bap_manifest.json KAFKA-3.1.0-1.3.1.0.p0.35-el7.parcel
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel KAFKA-3.1.0-1.3.1.0.p0.35-el7.parcel.sha
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha KAFKA-3.1.0-1.3.1.0.p0.35-el7.parcel.torrent
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.torrent manifest.json
[root@node01 hadoop]# mv /opt/cloudera/parcel-repo/manifest.json /opt/cloudera/parcel-repo/manifest.json.bak
[root@node01 hadoop]# ls /opt/cloudera/parcel-repo/
bap_manifest.json KAFKA-3.1.0-1.3.1.0.p0.35-el7.parcel
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel KAFKA-3.1.0-1.3.1.0.p0.35-el7.parcel.sha
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha KAFKA-3.1.0-1.3.1.0.p0.35-el7.parcel.torrent
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.torrent manifest.json.bak
[root@node01 hadoop]# cp SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel manifest.json /opt/cloudera/parcel-repo/
[root@node01 hadoop]#
Stop the cluster first, run the commands below, then start the cluster again. The new Spark2 package will then appear under Cluster -> Parcels, where it can be downloaded, distributed, and activated.
[root@node01 hadoop]# /opt/cm-5.14.2/etc/init.d/cloudera-scm-agent restart
Stopping cloudera-scm-agent: [ OK ]
Starting cloudera-scm-agent: [ OK ]
[root@node01 hadoop]# /opt/cm-5.14.2/etc/init.d/cloudera-scm-server restart
Stopping cloudera-scm-server: [ OK ]
Starting cloudera-scm-server: [ OK ]
Ref: Detailed steps for installing Spark 2.2.0 offline on CDH 5.14.4
Further configuration is still needed; Spark2 depends on the Hadoop configuration.
# Copy the config files
cp /opt/cloudera/parcels/CDH/etc/spark/conf.dist/* /opt/cloudera/parcels/SPARK2/etc/spark2/conf.dist/
# Edit spark-env.sh
vim /opt/cloudera/parcels/SPARK2/etc/spark2/conf.dist/spark-env.sh
Add to /opt/cloudera/parcels/SPARK2/etc/spark2/conf.dist/spark-env.sh:
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:$(hadoop classpath)"
HADOOP_CONF_DIR=/etc/hadoop/conf
Test whether the Spark installation succeeded.

[root@node01 ~]# spark2-submit --deploy-mode client --conf spark.ui.port=4041 --class org.apache.spark.examples.SparkPi /opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/examples/jars/spark-examples_2.11-2.1.0.cloudera1.jar 10
19/11/27 13:56:58 INFO spark.SparkContext: Running Spark version 2.1.0.cloudera1
19/11/27 13:56:58 INFO spark.SecurityManager: Changing view acls to: root,hdfs
19/11/27 13:56:58 INFO spark.SecurityManager: Changing modify acls to: root,hdfs
19/11/27 13:56:58 INFO spark.SecurityManager: Changing view acls groups to:
19/11/27 13:56:58 INFO spark.SecurityManager: Changing modify acls groups to:
19/11/27 13:56:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, hdfs); groups with view permissions: Set(); users with modify permissions: Set(root, hdfs); groups with modify permissions: Set()
19/11/27 13:56:59 INFO util.Utils: Successfully started service 'sparkDriver' on port 42324.
19/11/27 13:56:59 INFO spark.SparkEnv: Registering MapOutputTracker 19/11/27 13:56:59 INFO spark.SparkEnv: Registering BlockManagerMaster 19/11/27 13:56:59 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 19/11/27 13:56:59 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 19/11/27 13:56:59 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-006c0a44-497e-4cc6-b3bd-91c133c9fb83 19/11/27 13:56:59 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB 19/11/27 13:56:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator 19/11/27 13:56:59 INFO util.log: Logging initialized @2260ms 19/11/27 13:56:59 INFO server.Server: jetty-9.2.z-SNAPSHOT 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@791cbf87{/jobs,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a7e2d9d{/jobs/json,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@754777cd{/jobs/job,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2b52c0d6{/jobs/job/json,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@372ea2bc{/stages,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4cc76301{/stages/json,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f08c4b{/stages/stage,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f19b8b3{/stages/stage/json,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7de0c6ae{/stages/pool,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started 
o.s.j.s.ServletContextHandler@a486d78{/stages/pool/json,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@cdc3aae{/storage,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7ef2d7a6{/storage/json,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5dcbb60{/storage/rdd,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4c36250e{/storage/rdd/json,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21526f6c{/environment,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49f5c307{/environment/json,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@299266e2{/executors,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5471388b{/executors/json,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66ea1466{/executors/threadDump,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1601e47{/executors/threadDump/json,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3bffddff{/static,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66971f6b{/,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@50687efb{/api,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@517bd097{/jobs/job/kill,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@142eef62{/stages/stage/kill,null,AVAILABLE} 19/11/27 13:56:59 INFO server.ServerConnector: Started 
ServerConnector@e6516e{HTTP/1.1}{0.0.0.0:4041} 19/11/27 13:56:59 INFO server.Server: Started @2384ms 19/11/27 13:56:59 INFO util.Utils: Successfully started service 'SparkUI' on port 4041. 19/11/27 13:56:59 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.56.100:4041 19/11/27 13:56:59 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/examples/jars/spark-examples_2.11-2.1.0.cloudera1.jar at spark://192.168.56.100:42324/jars/spark-examples_2.11-2.1.0.cloudera1.jar with timestamp 1574823419519 19/11/27 13:56:59 INFO executor.Executor: Starting executor ID driver on host localhost 19/11/27 13:56:59 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33556. 19/11/27 13:56:59 INFO netty.NettyBlockTransferService: Server created on 192.168.56.100:33556 19/11/27 13:56:59 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 19/11/27 13:56:59 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.56.100, 33556, None) 19/11/27 13:56:59 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.56.100:33556 with 366.3 MB RAM, BlockManagerId(driver, 192.168.56.100, 33556, None) 19/11/27 13:56:59 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.56.100, 33556, None) 19/11/27 13:56:59 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.56.100, 33556, None) 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@24e8de5c{/metrics/json,null,AVAILABLE} 19/11/27 13:56:59 INFO internal.SharedState: Warehouse path is 'file:/root/spark-warehouse/'. 
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f6bcf87{/SQL,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@78c7f9b3{/SQL/json,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e93f3d5{/SQL/execution,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7a26928a{/SQL/execution/json,null,AVAILABLE} 19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73eb8672{/static/sql,null,AVAILABLE} 19/11/27 13:57:00 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38 19/11/27 13:57:00 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions 19/11/27 13:57:00 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38) 19/11/27 13:57:00 INFO scheduler.DAGScheduler: Parents of final stage: List() 19/11/27 13:57:00 INFO scheduler.DAGScheduler: Missing parents: List() 19/11/27 13:57:00 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents 19/11/27 13:57:00 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 366.3 MB) 19/11/27 13:57:00 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1172.0 B, free 366.3 MB) 19/11/27 13:57:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.56.100:33556 (size: 1172.0 B, free: 366.3 MB) 19/11/27 13:57:00 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996 19/11/27 13:57:00 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) 19/11/27 13:57:00 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 10 tasks 19/11/27 13:57:01 INFO scheduler.TaskSetManager: 
Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 6036 bytes) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 6036 bytes) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 6036 bytes) 19/11/27 13:57:01 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0) 19/11/27 13:57:01 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1) 19/11/27 13:57:01 INFO executor.Executor: Fetching spark://192.168.56.100:42324/jars/spark-examples_2.11-2.1.0.cloudera1.jar with timestamp 1574823419519 19/11/27 13:57:01 INFO executor.Executor: Running task 2.0 in stage 0.0 (TID 2) 19/11/27 13:57:01 INFO client.TransportClientFactory: Successfully created connection to /192.168.56.100:42324 after 57 ms (0 ms spent in bootstraps) 19/11/27 13:57:01 INFO util.Utils: Fetching spark://192.168.56.100:42324/jars/spark-examples_2.11-2.1.0.cloudera1.jar to /tmp/spark-185ac05b-8163-4178-b316-b1081fa7cf76/userFiles-bf97f19c-518c-4bb9-aefe-2b67ab152467/fetchFileTemp69449570899073492.tmp 19/11/27 13:57:01 INFO executor.Executor: Adding file:/tmp/spark-185ac05b-8163-4178-b316-b1081fa7cf76/userFiles-bf97f19c-518c-4bb9-aefe-2b67ab152467/spark-examples_2.11-2.1.0.cloudera1.jar to class loader 19/11/27 13:57:01 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 1041 bytes result sent to driver 19/11/27 13:57:01 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1041 bytes result sent to driver 19/11/27 13:57:01 INFO executor.Executor: Finished task 2.0 in stage 0.0 (TID 2). 
1041 bytes result sent to driver 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 6036 bytes) 19/11/27 13:57:01 INFO executor.Executor: Running task 3.0 in stage 0.0 (TID 3) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 6036 bytes) 19/11/27 13:57:01 INFO executor.Executor: Running task 4.0 in stage 0.0 (TID 4) 19/11/27 13:57:01 INFO executor.Executor: Finished task 4.0 in stage 0.0 (TID 4). 1041 bytes result sent to driver 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 6036 bytes) 19/11/27 13:57:01 INFO executor.Executor: Finished task 3.0 in stage 0.0 (TID 3). 1041 bytes result sent to driver 19/11/27 13:57:01 INFO executor.Executor: Running task 5.0 in stage 0.0 (TID 5) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 6036 bytes) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 6036 bytes) 19/11/27 13:57:01 INFO executor.Executor: Running task 7.0 in stage 0.0 (TID 7) 19/11/27 13:57:01 INFO executor.Executor: Finished task 5.0 in stage 0.0 (TID 5). 1041 bytes result sent to driver 19/11/27 13:57:01 INFO executor.Executor: Running task 6.0 in stage 0.0 (TID 6) 19/11/27 13:57:01 INFO executor.Executor: Finished task 6.0 in stage 0.0 (TID 6). 1041 bytes result sent to driver 19/11/27 13:57:01 INFO executor.Executor: Finished task 7.0 in stage 0.0 (TID 7). 
1041 bytes result sent to driver 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 629 ms on localhost (executor driver) (1/10) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, localhost, executor driver, partition 8, PROCESS_LOCAL, 6036 bytes) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, localhost, executor driver, partition 9, PROCESS_LOCAL, 6036 bytes) 19/11/27 13:57:01 INFO executor.Executor: Running task 8.0 in stage 0.0 (TID 8) 19/11/27 13:57:01 INFO executor.Executor: Running task 9.0 in stage 0.0 (TID 9) 19/11/27 13:57:01 INFO executor.Executor: Finished task 9.0 in stage 0.0 (TID 9). 1041 bytes result sent to driver 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 759 ms on localhost (executor driver) (2/10) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 129 ms on localhost (executor driver) (3/10) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 105 ms on localhost (executor driver) (4/10) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 104 ms on localhost (executor driver) (5/10) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 52 ms on localhost (executor driver) (6/10) 19/11/27 13:57:01 INFO executor.Executor: Finished task 8.0 in stage 0.0 (TID 8). 
1041 bytes result sent to driver 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 721 ms on localhost (executor driver) (7/10) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 149 ms on localhost (executor driver) (8/10) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 168 ms on localhost (executor driver) (9/10) 19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 59 ms on localhost (executor driver) (10/10) 19/11/27 13:57:01 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.801 s 19/11/27 13:57:01 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 19/11/27 13:57:01 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.254717 s Pi is roughly 3.1406751406751408 19/11/27 13:57:01 INFO server.ServerConnector: Stopped ServerConnector@e6516e{HTTP/1.1}{0.0.0.0:4041} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@142eef62{/stages/stage/kill,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@517bd097{/jobs/job/kill,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@50687efb{/api,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66971f6b{/,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3bffddff{/static,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1601e47{/executors/threadDump/json,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66ea1466{/executors/threadDump,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped 
o.s.j.s.ServletContextHandler@5471388b{/executors/json,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@299266e2{/executors,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@49f5c307{/environment/json,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@21526f6c{/environment,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4c36250e{/storage/rdd/json,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5dcbb60{/storage/rdd,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7ef2d7a6{/storage/json,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@cdc3aae{/storage,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a486d78{/stages/pool/json,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7de0c6ae{/stages/pool,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3f19b8b3{/stages/stage/json,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2f08c4b{/stages/stage,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4cc76301{/stages/json,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@372ea2bc{/stages,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2b52c0d6{/jobs/job/json,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@754777cd{/jobs/job,null,UNAVAILABLE} 19/11/27 13:57:01 INFO 
handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a7e2d9d{/jobs/json,null,UNAVAILABLE} 19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@791cbf87{/jobs,null,UNAVAILABLE} 19/11/27 13:57:01 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.56.100:4041 19/11/27 13:57:01 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 19/11/27 13:57:01 INFO memory.MemoryStore: MemoryStore cleared 19/11/27 13:57:01 INFO storage.BlockManager: BlockManager stopped 19/11/27 13:57:02 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 19/11/27 13:57:02 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 19/11/27 13:57:02 INFO spark.SparkContext: Successfully stopped SparkContext 19/11/27 13:57:02 INFO util.ShutdownHookManager: Shutdown hook called 19/11/27 13:57:02 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-185ac05b-8163-4178-b316-b1081fa7cf76
VI. Graphical Management: Kafka Manager
Download: https://github.com/yahoo/kafka-manager/archive/1.3.1.6.zip
Goto: Installing and configuring kafka-manager
After it starts, log in at http://192.168.56.100:8080/, set the cluster name and Cluster Zookeeper Hosts, and save. You can then manage the Kafka configuration graphically.
[root@node01 kafka-manager-1.3.1.6]# ls
bin conf lib README.md share
[root@node01 kafka-manager-1.3.1.6]# nohup bin/kafka-manager -Dconfig.file=conf/application.conf -Dhttp.port=8080 &
[1] 4115
[root@node01 kafka-manager-1.3.1.6]# nohup: ignoring input and appending output to ‘nohup.out’
[root@node01 kafka-manager-1.3.1.6]# netstat -ano|grep 8080
tcp6 0 0 :::8080 :::* LISTEN off (0.00/0/0)
VII. Distributed Cache: Redis
Ref: Both are distributed caches, so why does Redis come out ahead?
Ref: Why do distributed systems need Redis?
Build Redis first, then copy the build outputs into /usr/local:
make
make test
mkdir -p /usr/local/redis/bin
mkdir -p /usr/local/redis/etc
cd ./src
cp redis-cli redis-server mkreleasehdr.sh redis-check-aof redis-check-dump redis-benchmark /usr/local/redis/bin
cp ../redis.conf /usr/local/redis/etc
Edit the redis.conf file:
Change daemonize from no to yes so Redis can run in the background.
Comment out bind 127.0.0.1 and change it to bind 0.0.0.0; change protected-mode yes to protected-mode no (protected-mode only exists in version 3.2 and later).
[root@node01 bin]# pwd
/usr/local/redis/bin
[root@node01 bin]# ./redis-server ../etc/redis.conf
31444:C 27 Nov 10:48:10.708 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
31444:C 27 Nov 10:48:10.708 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=31444, just started
31444:C 27 Nov 10:48:10.708 # Configuration loaded
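Once the server is up, any client talks to it over RESP, Redis's wire protocol, in which a command is an array of bulk strings. A sketch of the encoding only (no live connection is assumed; sending these bytes to port 6379 over a socket and reading the reply is essentially what redis-cli does):

```python
def encode_resp(*args: str) -> bytes:
    """Encode a Redis command in RESP: an array (*N) of bulk strings ($len)."""
    out = [f"*{len(args)}\r\n".encode()]
    for a in args:
        data = a.encode()
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)

print(encode_resp("PING"))              # b'*1\r\n$4\r\nPING\r\n'
print(encode_resp("SET", "key", "42"))  # array of three bulk strings
```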
If you still cannot connect, the server's security-group / firewall settings are probably not letting port 6379 through:
systemctl stop firewalld.service
systemctl stop iptables.service
End.