Scala + Spark + HBase

  • Spark features and use cases

  • Spark is a general-purpose parallel computing framework. It performs distributed computation following the MapReduce model, but its intermediate results can be kept in memory, so they no longer need to be written to and re-read from HDFS between steps.
  • Features:
  • Simple and convenient to use, via the Scala language (which pairs naturally with RDDs).
  • Fast computation, since intermediate results are cached in memory.
  • High fault tolerance.
  • A rich set of operations.
  • Broadcast variables: each node can keep a read-only copy of a small dataset (see the spark-shell sketch after these lists).
  • Core abstraction: RDD (Resilient Distributed Dataset).
  • Use cases:
  • Iterative algorithms: iterative machine learning and graph algorithms, including PageRank, K-means clustering, and logistic regression.
  • Interactive data mining: running multiple ad-hoc queries over the same subset of the data.
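
To make the RDD, cache, and broadcast points concrete, here is a minimal spark-shell sketch. It is only an illustration: the data and the names nums, doubled, and lookup are made up, and sc is the SparkContext that spark-shell provides.

val nums = sc.parallelize(1 to 1000000)                  // an RDD partitioned across the cluster
val doubled = nums.map(_ * 2).cache()                    // intermediate result kept in memory
doubled.sum()                                            // first action computes and fills the cache
doubled.count()                                          // reuses the cached data, no recomputation
val lookup = sc.broadcast(Map(0 -> "zero", 1 -> "one"))  // small read-only dataset, one copy per node
nums.filter(n => lookup.value.contains(n % 3)).count()   // workers read lookup.value locally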

  • In the previous post we set up a ZooKeeper + Hadoop cluster; next we'll add Scala, Spark, and HBase to round it out.
  • rz -E  # upload the Scala, Spark, and HBase packages
  • tar -zxvf scala-2.11.8.tgz
  • tar -zxf spark-2.0.1-bin-hadoop2.7.tgz
  • Spark installation guide

  • Install the JDK and Scala

  • Install a JDK. The rest of this walkthrough runs on Java 8 (jdk1.8.0_101), so on Ubuntu install a Java 8 package, e.g. sudo apt-get install openjdk-8-jre-headless, rather than the OpenJDK 7 one.
  • Download Scala: http://www.scala-lang.org/
  • Extract it: tar -zxvf scala-2.11.8.tgz
  • Open sudo vim /etc/profile and append the paths below (or put them in vi ~/.bashrc):
  • export SCALA_HOME=/data/app/scala-2.11.8
  • export SPARK_HOME=/data/app/spark-2.0.1-bin-hadoop2.7
  • export HBASE_HOME=/data/app/hbase-1.2.3
  • export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$ZOOKEEPER/bin:$HADOOP/bin:$HADOOP/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$HBASE_HOME/bin
  • Apply the changes: source /etc/profile
  • Test it by typing scala at the command line:
  • scala
  • Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101).
  • Type in expressions for evaluation. Or try :help.
  • scala>
  • Seeing this prompt confirms that Scala installed successfully.
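  • To double-check, evaluate an expression in the REPL:
  • scala> 1 + 1
  • res0: Int = 2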

 

  • Install Spark

  • Download Spark: http://spark.apache.org/downloads.html
  • Extract it: tar -zxf spark-2.0.1-bin-hadoop2.7.tgz
  • Add the paths to /etc/profile as before (already done above, so nothing more to add here).
  • vi /home/soft/app/spark-2.0.1-bin-hadoop2.7/conf/spark-env.sh
  • export JAVA_HOME=/home/soft/app/jdk1.8.0_101
    export SCALA_HOME=/home/soft/app/scala-2.11.8
    export HADOOP_HOME=/home/soft/app/hadoop-2.7.3
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    # Standalone-master HA: recover master state from ZooKeeper. Keep this on ONE line.
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node1:2181,node2:2181,node3:2181,node4:2181,node5:2181 -Dspark.deploy.zookeeper.dir=/spark"
    export SPARK_EXECUTOR_MEMORY=5g
    export SPARK_WORKER_MEMORY=7g
    export SPARK_LOG_DIR=/data/logs/spark_logs/

  • mkdir -pv /data/logs/spark_logs/

vi /home/soft/app/spark-2.0.1-bin-hadoop2.7/conf/slaves

node1
node2
node3
node4
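
With spark-env.sh and slaves distributed to every node, the standalone cluster can be started from the master using the scripts that ship in Spark's sbin directory:

$SPARK_HOME/sbin/start-all.sh

Since SPARK_DAEMON_JAVA_OPTS enables ZooKeeper-based recovery, a standby master can also be started on a second node with sbin/start-master.sh; it takes over automatically if the active master dies.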

media@node1:~$ spark-shell
/data/app/spark-2.0.1-bin-hadoop2.7/conf/spark-env.sh: line 72: unexpected EOF while looking for matching `"'
/data/app/spark-2.0.1-bin-hadoop2.7/conf/spark-env.sh: line 76: syntax error: unexpected end of file
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/07/11 11:41:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/11 11:41:47 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.31.81.41:4040
Spark context available as 'sc' (master = local[*], app id = local-1499744506831).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Seeing this banner means Spark installed successfully! (The spark-env.sh syntax errors in the output above were caused by the SPARK_DAEMON_JAVA_OPTS line having been wrapped across two lines in the file; keep it on a single line as shown earlier and they disappear.)
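
As a quick sanity check, run a tiny job in the shell (a trivial example; the res numbering may differ):

scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050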

Install HBase

cd /home/soft/app/hbase-1.2.3/conf

cat regionservers
node1
node2
node3
node4

cat backup-masters
node2

 

cat hbase-site.xml 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
  <property>
    <name>dfs.ha.namenodes.ns</name>
    <value>node1,node2</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ns/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>16000</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>16010</value>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>16020</value>
  </property>
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>16030</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node1,node2,node3,node4,node5</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/data/zk_data</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>180000</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>9000</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/data/hbase/tmp</value>
  </property>
</configuration>

mkdir -pv /data/hbase/tmp

vi hbase-env.sh

export JAVA_HOME=/home/media/app/jdk1.8
# Point HBase at the Hadoop conf directory so it can resolve the "ns" HA nameservice in hbase.rootdir
export HBASE_CLASSPATH=/home/media/app/hadoop-2.7.3/etc/hadoop
# We use the external ZooKeeper ensemble from hbase.zookeeper.quorum, so HBase must not manage its own
export HBASE_MANAGES_ZK=false
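
Once the conf directory is synced to all nodes, HBase can be started from node1 with the script bundled in its bin directory and verified from the HBase shell (a minimal smoke test; the table name test is arbitrary):

start-hbase.sh
hbase shell
hbase(main):001:0> status
hbase(main):002:0> create 'test', 'cf'
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):004:0> scan 'test'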

 
