Testing Spark 1.5 in standalone mode on Ubuntu

Step 1: create the user uspark

root@hadoop1:~# adduser uspark
Adding user `uspark' ...
Adding new group `uspark' (1002) ...
Adding new user `uspark' (1002) with group `uspark' ...
Creating home directory `/home/uspark' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for uspark
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
root@hadoop1:~#
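When the setup is scripted, the interactive prompts above can be skipped. The following is a minimal sketch. It assumes Debian/Ubuntu's `adduser`, whose `--disabled-password` and `--gecos` options suppress the password and user-information prompts, and it assumes root privileges. `ensure_user` is a hypothetical helper name.

```shell
#!/bin/sh
# ensure_user NAME: create NAME non-interactively unless it already exists.
# Hypothetical helper; adduser itself must run as root.
ensure_user() {
  if id -u "$1" >/dev/null 2>&1; then
    echo "user $1 already exists"
  else
    # --gecos "" skips the Full Name / Room Number / ... questions;
    # --disabled-password skips the password prompt (set one later with passwd).
    adduser --disabled-password --gecos "" "$1"
  fi
}
```

As root, `ensure_user uspark && passwd uspark` creates the user and then sets the password, as in the transcript above.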

Step 2: configure the Java environment variables

uspark@hadoop:~$ java -version
The program 'java' can be found in the following packages:
* default-jre
* gcj-4.8-jre-headless
* openjdk-7-jre-headless
* gcj-4.6-jre-headless
* openjdk-6-jre-headless
Ask your administrator to install one of them
uspark@hadoop:~$ vi .bashrc

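Java is not installed yet. With the JDK (here jdk1.8.0_60) unpacked into /home/uspark, append the following to the end of .bashrc: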

#set Java Environment
export JAVA_HOME=/home/uspark/jdk1.8.0_60
export CLASSPATH=".:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/tools.jar:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$PATH"
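Editing .bashrc by hand makes it easy to add the same lines twice. The following is a minimal sketch of an idempotent alternative. `append_line` and `setup_java_env` are hypothetical helper names. The sketch deliberately writes only `JAVA_HOME` and `PATH`, the two variables that `java -version` depends on, and leaves out `CLASSPATH`.

```shell
#!/bin/sh
# append_line FILE LINE: append LINE to FILE only if that exact line is not there yet.
append_line() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# setup_java_env FILE JDK_DIR: write the Java exports into FILE (e.g. ~/.bashrc).
# Hypothetical helper; safe to run more than once.
setup_java_env() {
  append_line "$1" '#set Java Environment'
  append_line "$1" "export JAVA_HOME=$2"
  # Single quotes keep $JAVA_HOME and $PATH literal so they expand at login time.
  append_line "$1" 'export PATH="$JAVA_HOME/bin:$PATH"'
}
```

Usage: `setup_java_env "$HOME/.bashrc" /home/uspark/jdk1.8.0_60 && . "$HOME/.bashrc"`.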

uspark@hadoop:~$ source .bashrc
uspark@hadoop:~$ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) Server VM (build 25.60-b23, mixed mode)
uspark@hadoop:~$
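Spark 1.5's documentation lists Java 7 or later as a requirement. The following minimal sketch turns the `java -version` output above into a major version number that a script can test. `java_version` and `java_major` are hypothetical helpers. They assume the output contains a line with `version "…"`, as both Oracle and OpenJDK builds print.

```shell
#!/bin/sh
# java_version: read `java -version` output on stdin, print the quoted version (e.g. 1.8.0_60).
java_version() {
  awk -F'"' '/version/ { print $2; exit }'
}

# java_major: print the major version; "1.8.0_60" -> 8, "11.0.2" -> 11.
java_major() {
  java_version | awk -F. '{ print ($1 == 1 ? $2 : $1) }'
}
```

`java -version` writes to stderr, so redirect it first: `[ "$(java -version 2>&1 | java_major)" -ge 7 ] && echo "Java is new enough for Spark 1.5"`.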

Step 3: download Spark

Open http://spark.apache.org/downloads.html, copy the download link for the package pre-built for Hadoop 2.6, and fetch it with wget:

uspark@hadoop:~/backup$ wget http://mirrors.cnnic.cn/apache/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz

Once the download finishes, unpack the archive:

tar xf spark-1.5.1-bin-hadoop2.6.tgz
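Mirror links like the cnnic one above tend to disappear once a release is superseded, but older releases remain in the Apache archive under archive.apache.org/dist/spark/. The following minimal sketch builds such an archive URL from the version and the Hadoop build. `spark_tgz_url` is a hypothetical helper. The URL layout is assumed to follow the archive's `spark-<version>/spark-<version>-bin-<hadoop>.tgz` pattern.

```shell
#!/bin/sh
# spark_tgz_url VERSION HADOOP_BUILD: print the Apache archive URL of a pre-built package.
# Hypothetical helper; the directory layout is an assumption based on archive.apache.org.
spark_tgz_url() {
  printf 'https://archive.apache.org/dist/spark/spark-%s/spark-%s-bin-%s.tgz\n' "$1" "$1" "$2"
}
```

Usage: `wget "$(spark_tgz_url 1.5.1 hadoop2.6)" && tar xzf spark-1.5.1-bin-hadoop2.6.tgz`.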

uspark@liuhy:~$ cd spark-1.5.1-bin-hadoop2.6/
uspark@liuhy:~/spark-1.5.1-bin-hadoop2.6$ ll
total 1068
drwxr-xr-x 2 uspark uspark   4096 Oct  8 05:13 bin/
-rw-r--r-- 1 uspark uspark 960539 Oct  8 05:13 CHANGES.txt
drwxr-xr-x 2 uspark uspark   4096 Oct  8 05:13 conf/
drwxr-xr-x 3 uspark uspark   4096 Oct  8 05:12 data/
-rw-rw-r-- 1 uspark uspark    747 Oct  8 05:23 derby.log
drwxr-xr-x 3 uspark uspark   4096 Oct  8 05:12 ec2/
drwxr-xr-x 3 uspark uspark   4096 Oct  8 05:13 examples/
drwxr-xr-x 2 uspark uspark   4096 Oct  8 05:12 lib/
-rw-r--r-- 1 uspark uspark  50972 Oct  8 05:12 LICENSE
drwxrwxr-x 5 uspark uspark   4096 Oct  8 05:23 metastore_db/
-rw-r--r-- 1 uspark uspark  22559 Oct  8 05:12 NOTICE
drwxr-xr-x 6 uspark uspark   4096 Oct  8 05:12 python/
drwxr-xr-x 3 uspark uspark   4096 Oct  8 05:12 R/
-rw-r--r-- 1 uspark uspark   3593 Oct  8 05:12 README.md
-rw-r--r-- 1 uspark uspark    120 Oct  8 05:12 RELEASE
drwxr-xr-x 2 uspark uspark   4096 Oct  8 05:12 sbin/
uspark@liuhy:~/spark-1.5.1-bin-hadoop2.6$
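The shell session below starts spark-shell without a `--master` option, so it runs in local mode. To exercise standalone mode on this single machine, you can start a master and a worker with the scripts in sbin/. The following is a minimal sketch. It assumes the default master port 7077, the same port used in the README's `MASTER=spark://host:7077` example. It also assumes the output of `hostname` resolves to an address that the worker can reach. `spark_master_url` is a hypothetical helper.

```shell
#!/bin/sh
# spark_master_url HOST [PORT]: print a standalone master URL (port defaults to 7077).
# Hypothetical helper.
spark_master_url() {
  printf 'spark://%s:%s\n' "$1" "${2:-7077}"
}
```

Run the following from spark-1.5.1-bin-hadoop2.6:

1. `sbin/start-master.sh` starts the master. Its web UI listens on port 8080 by default.
2. `sbin/start-slave.sh "$(spark_master_url "$(hostname)")"` registers a worker with the master.
3. `bin/spark-shell --master "$(spark_master_url "$(hostname)")"` runs the shell against the cluster.

Stop the cluster again with `sbin/stop-slave.sh` and `sbin/stop-master.sh`.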

Interactive Analysis with the Spark Shell

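Reference: http://spark.apache.org/docs/latest/quick-start.html. In the session below, the lines that list method names such as count and countApprox are the shell's Tab-completion output, not results.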

uspark@liuhy:~/spark-1.5.1-bin-hadoop2.6$ bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) Client VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
15/10/08 05:23:26 WARN Utils: Your hostname, liuhy resolves to a loopback address: 127.0.1.1; using 192.168.1.112 instead (on interface eth0)
15/10/08 05:23:26 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/10/08 05:23:30 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
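The two Utils warnings above mean that the hostname liuhy maps to the loopback address 127.0.1.1, a common Debian/Ubuntu /etc/hosts default, so Spark picked the eth0 address on its own. To pin the address explicitly, set SPARK_LOCAL_IP in conf/spark-env.sh, which the launch scripts source. The distribution ships conf/spark-env.sh.template as a starting point. The following is a minimal sketch. `set_spark_env` is a hypothetical helper, and 192.168.1.112 is simply the address from the warning.

```shell
#!/bin/sh
# set_spark_env SPARK_HOME VAR VALUE: add "export VAR=VALUE" to
# $SPARK_HOME/conf/spark-env.sh unless VAR is already exported there.
# Hypothetical helper; spark-env.sh is created from the shipped template if missing.
set_spark_env() {
  env_file="$1/conf/spark-env.sh"
  if [ ! -f "$env_file" ]; then
    cp "$1/conf/spark-env.sh.template" "$env_file" || return 1
  fi
  grep -q "^export $2=" "$env_file" || printf 'export %s=%s\n' "$2" "$3" >> "$env_file"
}
```

Usage: `set_spark_env ~/spark-1.5.1-bin-hadoop2.6 SPARK_LOCAL_IP 192.168.1.112`, then restart bin/spark-shell.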

scala> val tf = sc.textFile("README.md")
tf: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[3] at textFile at <console>:21

scala> tf.count
count                 countApprox           countApproxDistinct   countByValue
countByValueApprox

scala> tf.count
res2: Long = 98

scala>

scala> val lineWithSpark = tf.filter(_.contains("Spark"))
lineWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[4] at filter at <console>:23

scala> lineWithSpark.first
res5: String = # Apache Spark

scala> lineWithSpark.count
count                 countApprox           countApproxDistinct   countByValue
countByValueApprox

scala> lineWithSpark.count
res6: Long = 18

scala> lineWithSpark.foreach
foreach            foreachPartition   foreachWith

scala> lineWithSpark.foreach(println)
# Apache Spark
Spark is a fast and general cluster computing system for Big Data. It provides
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
and Spark Streaming for stream processing.
You can find the latest Spark documentation, including a programming
## Building Spark
Spark is built using [Apache Maven](http://maven.apache.org/).
To build Spark and its example programs, run:
["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
The easiest way to start using Spark is through the Scala shell:
Spark also comes with several sample programs in the `examples` directory.
    ./bin/run-example SparkPi
    MASTER=spark://host:7077 ./bin/run-example SparkPi
Testing first requires [building Spark](#building-spark). Once Spark is built, tests
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
Hadoop, you must build Spark against the same version that your cluster runs.
for guidance on building a Spark application that works with a particular
in the online documentation for an overview on how to configure Spark.

scala>
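`textFile` yields one element per line, so `tf.count` is the file's line count. `filter(_.contains("Spark"))` keeps the lines that contain the case-sensitive substring Spark. You can cross-check both numbers without Spark by using standard tools. The following is a minimal sketch with hypothetical helper names. It uses `awk`'s NR instead of `wc -l` because NR also counts a final line that lacks a trailing newline.

```shell
#!/bin/sh
# count_lines FILE: number of lines, including a final line without a trailing newline.
count_lines() {
  awk 'END { print NR }' "$1"
}

# count_lines_containing FILE TEXT: number of lines containing TEXT as a fixed,
# case-sensitive string, analogous to rdd.filter(_.contains(TEXT)).count.
count_lines_containing() {
  grep -cF -- "$2" "$1"
}
```

Run from spark-1.5.1-bin-hadoop2.6, `count_lines README.md` and `count_lines_containing README.md Spark` should report the same values as res2 and res6 above, because both commands read the same file.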

Done.

Posted on 2015-10-08 00:26 by develooop
