Hadoop 3.0.0 single node installation

Original article: http://chennaihug.org/knowledgebase/hadoop-3-0-0-alpha-single-node-installation/

Hadoop 3.0.0 alpha single node installation

Hi Hadoop observers,

This article walks you through a simple single-node installation of hadoop-3.0.0-alpha1. Install Hadoop and start your analysis. Follow the steps carefully to get Hadoop properly installed on your machine. For more, see http://hadoop.apache.org/docs/r3.0.0-alpha1/index.html


Hadoop download link
1. Download the Hadoop tarball from the link below. You can also take the link from the Apache Hadoop site. The link below is for hadoop-3.0.0-alpha1, the current version at the time of writing; please check the Hadoop site for the latest release.
wget http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-3.0.0-alpha1/hadoop-3.0.0-alpha1.tar.gz

I actually downloaded from http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz

2. Download Java from Oracle. If the link below is broken, please check the Oracle website for the current version.
wget http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html/jdk-8u102-linux-x64.tar.gz

3. Unpack the tarballs
$tar -zxvf hadoop-3.0.0-alpha1.tar.gz
$tar -zxf jdk-8u102-linux-x64.tar.gz

I unpacked them into the /opt folder.

4. Set the Java path in the Linux environment. Edit .bashrc and add the two lines below:
$vi .bashrc
export JAVA_HOME=/home/username/jdk1.8.0_102
export PATH=$HOME/bin:$JAVA_HOME/bin:$PATH

Source the .bashrc file so the changes take effect immediately in the current ssh session:
$source .bashrc
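Before moving on, it is worth confirming that JAVA_HOME really points at the unpacked JDK. A minimal sanity check, assuming the .bashrc edits above (check_java_home is a hypothetical helper, not part of Hadoop):

```shell
#!/bin/sh
# Verify that JAVA_HOME is set and points at a directory
# containing an executable bin/java.
check_java_home() {
    if [ -n "$JAVA_HOME" ] && [ -x "$JAVA_HOME/bin/java" ]; then
        echo "JAVA_HOME OK: $JAVA_HOME"
    else
        echo "JAVA_HOME is unset or does not contain bin/java" >&2
        return 1
    fi
}
```

Run check_java_home, then java -version, to confirm the shell picks up the JDK you just unpacked.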

5. Core Hadoop is made up of five daemons: NameNode (NN), SecondaryNameNode (SNN), DataNode (DN), ResourceManager (RM), and NodeManager (NM). We need to modify the config files in the etc/hadoop folder. Below are the files relevant to each daemon.

NameNode            core-site.xml
ResourceManager     mapred-site.xml
SecondaryNameNode   -
DataNode            slaves
NodeManager         slaves & yarn-site.xml

Ports used by Hadoop Daemons
Remote Procedure Call (RPC) is a protocol that one program can use to request a service from a program located in another computer in a network without having to understand network details.
WEB – UI in the table below denotes the web UI port number.

Hadoop Daemon       RPC Port   WEB – UI
NameNode            9000       9870
SecondaryNameNode   -          50090
DataNode            50010      50075
ResourceManager     8030       8088
NodeManager         8040       8042
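The table above can be restated as a tiny lookup, handy when you forget which web UI lives where (web_ui_url is a hypothetical helper; localhost matches this single-node setup):

```shell
#!/bin/sh
# Print the web UI URL for a Hadoop 3 daemon,
# using the daemon-to-port table above.
web_ui_url() {
    case "$1" in
        namenode)          port=9870  ;;
        secondarynamenode) port=50090 ;;
        datanode)          port=50075 ;;
        resourcemanager)   port=8088  ;;
        nodemanager)       port=8042  ;;
        *) echo "unknown daemon: $1" >&2; return 1 ;;
    esac
    echo "http://localhost:$port"
}
```

For example, web_ui_url namenode prints http://localhost:9870, the address you will open later to check HDFS.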

$cd hadoop-3.0.0-alpha1
Inside the hadoop-3.0.0-alpha1 directory, we now need to edit the following configuration files. Apply the given properties to each file.

$ vi etc/hadoop/core-site.xml
<!-- This conf denotes the filesystem, and which IP & port the NN binds to -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>

$ vi etc/hadoop/yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

When running the MapReduce example on Hadoop 3.0 (the command is: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep input output 'dfs[a-z.]+'), I hit the error message "Container is running beyond memory limits". Adding the configuration below to yarn-site.xml solved the issue.

<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
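To see why raising the ratio quiets the error: YARN computes each container's virtual-memory ceiling by multiplying its physical-memory allocation by yarn.nodemanager.vmem-pmem-ratio (the stock default is 2.1). A minimal sketch of that arithmetic, using a hypothetical helper name and integer-only shell math (so it assumes an integer ratio like the 4 configured above):

```shell
#!/bin/sh
# Virtual-memory ceiling YARN enforces for a container:
#   limit = physical allocation (MB) * yarn.nodemanager.vmem-pmem-ratio
vmem_limit_mb() {
    pmem_mb=$1
    ratio=$2
    echo $((pmem_mb * ratio))
}
```

So a container granted 1024 MB of physical memory may use up to vmem_limit_mb 1024 4 = 4096 MB of virtual memory before the NodeManager kills it; setting vmem-check-enabled to false disables the check entirely.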

$ vi etc/hadoop/hdfs-site.xml
<!-- Directory to store NameNode metadata -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/username/hadoop3-dir/namenode-dir</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/username/hadoop3-dir/datanode-dir</value>
</property>

I created the directories namenode1 and datanode1 under /opt/hadoop-3.0.0 and configured them as below:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop-3.0.0/namenode1</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop-3.0.0/datanode1</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>localhost:9870</value>
</property>

$ vi etc/hadoop/mapred-site.xml
<!-- run MapReduce under YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

When running the MapReduce example on Hadoop 3.0 (the command is: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep input output 'dfs[a-z.]+'), I hit an error message asking to check whether yarn.app.mapreduce.am.env, mapreduce.map.env, and mapreduce.reduce.env are configured in mapred-site.xml. Adding the configuration below to mapred-site.xml solved the issue.

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/hadoop-3.0.0</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/hadoop-3.0.0</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/hadoop-3.0.0</value>
</property>

 

$ vi etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/username/jdk1.8.0_102

When running sbin/start-dfs.sh, I got an error saying HDFS_NAMENODE_USER is not configured. To fix this, I added the lines below (hengheng is the user I run Hadoop as):

export HDFS_NAMENODE_USER=hengheng
export HDFS_DATANODE_USER=hengheng
export HDFS_JOURNALNODE_USER=hengheng
export YARN_RESOURCEMANAGER_USER=hengheng
export YARN_NODEMANAGER_USER=hengheng
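If you script these edits, appending the same export lines on every run leaves duplicates in hadoop-env.sh. A small idempotent helper can guard against that (append_once is a hypothetical name, not part of Hadoop):

```shell
#!/bin/sh
# Append a line to a file only if the file does not
# already contain that exact line.
append_once() {
    line=$1
    file=$2
    grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}
```

For example, append_once "export HDFS_NAMENODE_USER=hengheng" etc/hadoop/hadoop-env.sh can be run repeatedly without piling up duplicate exports.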

$ vi etc/hadoop/mapred-env.sh
export JAVA_HOME=/home/username/jdk1.8.0_102

$ vi etc/hadoop/yarn-env.sh
export JAVA_HOME=/home/username/jdk1.8.0_102

$ vi etc/hadoop/slaves
localhost
--------------------------------------------------------------------------------

Passwordless authentication. The default scripts such as start-all.sh, stop-all.sh, and their kin log in (via ssh) to the other machines from the machine where you run them; typically we run them from the NN machine. On each login, every machine asks for a password, so on a 10-node cluster you would have to enter a password at least 10 times. To avoid this, we set up passwordless authentication: generate an ssh key and copy the public key into the authorized keys of each destination machine.

Below are the steps for the same

Install the Openssh-server

$ sudo apt-get install openssh-server

Generate the ssh key

(manages and converts authentication keys)

$ cd
$ ssh-keygen -t rsa
$ cd .ssh
$ cat id_rsa.pub >> authorized_keys

Setup passwordless ssh to localhost and to slaves

$ ssh localhost (should no longer ask for a password)

If ssh localhost still asks for a password, run the commands below and try again:

ssh-agent bash

ssh-add id_rsa
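A quick way to confirm the public key actually landed in authorized_keys before retrying ssh (key_authorized is a hypothetical helper; the default paths are the ones ssh-keygen uses):

```shell
#!/bin/sh
# Check that the public key appears verbatim as a line
# of authorized_keys.
key_authorized() {
    pubkey=${1:-"$HOME/.ssh/id_rsa.pub"}
    auth=${2:-"$HOME/.ssh/authorized_keys"}
    grep -qxF "$(cat "$pubkey")" "$auth" 2>/dev/null
}
```

Run key_authorized && echo "key installed"; if it fails, repeat the cat id_rsa.pub >> authorized_keys step and also check that ~/.ssh and authorized_keys are not group- or world-writable, since sshd rejects keys with loose permissions.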
--------------------------------------------------------------------------------

Format Hadoop NameNode

$ cd hadoop-3.0.0-alpha1
$ bin/hdfs namenode -format    (your Hadoop filesystem is now ready)           (Hadoop 2: bin/hadoop namenode -format)

Start All Hadoop Related Services

$ sbin/start-dfs.sh (starts the daemons for NN, DN, and SNN)

$ sbin/start-yarn.sh        (starts the daemons for RM and NM)                       (Hadoop 2: sbin/start-all.sh)
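Once both scripts return, running jps should list all five daemons. That check can be sketched as a small filter over jps output (check_daemons is a hypothetical helper, not a Hadoop script):

```shell
#!/bin/sh
# Read `jps` output from stdin and report any of the five
# expected single-node daemons that are not running.
check_daemons() {
    out=$(cat)
    status=0
    for d in NameNode SecondaryNameNode DataNode ResourceManager NodeManager; do
        if ! printf '%s\n' "$out" | grep -qw "$d"; then
            echo "missing: $d"
            status=1
        fi
    done
    return $status
}
```

Use it as jps | check_daemons; silence means all five daemons are up, otherwise each missing daemon is named so you know which log under logs/ to inspect.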

 

Create the HDFS directories required to run MapReduce jobs

 

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/hduser



Copy the input files into the distributed filesystem

$ bin/hdfs dfs -mkdir /user/hduser/input
$ bin/hdfs dfs -put etc/hadoop/*.xml /user/hduser/input

Run the provided example program

 

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep /user/hduser/input output 'dfs[a-z.]+'
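The grep example applies the regex 'dfs[a-z.]+' to every input file and counts the matches. You can preview locally what the job will find (preview_matches is just an illustrative wrapper around plain grep, not part of Hadoop):

```shell
#!/bin/sh
# Preview what the grep example job matches: every maximal run
# starting with "dfs" followed by lowercase letters or dots,
# counted per distinct match, most frequent first.
preview_matches() {
    grep -ohE 'dfs[a-z.]+' "$@" | sort | uniq -c | sort -rn
}
```

Running preview_matches etc/hadoop/*.xml gives roughly the per-string counts that the MapReduce job writes into the output directory.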

Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and inspect them:

$ bin/hdfs dfs -get output output
$ cat output/*

 

posted @ 2017-12-29 17:47 by starheng