How to install Hadoop 2.7.3 as a single-node cluster on Ubuntu 16.04

In this post, we install Hadoop 2.7.3 on Ubuntu 16.04. The following is a step-by-step process to install Hadoop 2.7.3 as a single-node cluster.

Before installing or downloading anything, it is always better to update the package index using the following command:

$ sudo apt-get update

Step 1: Install Java

$ sudo apt-get install default-jdk
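On Ubuntu 16.04, default-jdk pulls in OpenJDK 8, which is the version the rest of this guide assumes. If you prefer to pin the version explicitly, you can install it directly instead (assuming the stock Ubuntu repositories):

$ sudo apt-get install openjdk-8-jdk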

We can check whether Java is installed properly using the following command:

$ java -version

Step 2: Add a dedicated Hadoop user

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

NOTE: Don't enter a password or any other details here; just press 'Y' when it asks "Is the information correct? [Y/n]".

Add 'hduser' to the sudo group so that it can run administrative commands:

$ sudo adduser hduser sudo

Step 3: Install SSH

$ sudo apt-get install ssh

Step 4: Passwordless login to localhost using SSH

$ su hduser

Now we are logged in as 'hduser'.

$ ssh-keygen -t rsa
NOTE: Leave the file name and passphrase blank (just press Enter).
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost

Run the above command and make sure the login is passwordless. Once we are logged in to localhost, exit from this session using the following command:

$ exit

Step 5: Install Hadoop 2.7.3

$ wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
$ tar xvzf hadoop-2.7.3.tar.gz
$ sudo mkdir -p /usr/local/hadoop
$ cd hadoop-2.7.3/
$ sudo mv * /usr/local/hadoop
$ sudo chown -R hduser:hadoop /usr/local/hadoop
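As a quick sanity check (optional), we can ask the freshly installed distribution for its version:

$ /usr/local/hadoop/bin/hadoop version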

Step 6: Setup Configuration Files

The following files need to be modified to complete the Hadoop setup:

6.1 ~/.bashrc

6.2 hadoop-env.sh

6.3 core-site.xml

6.4 mapred-site.xml

6.5 hdfs-site.xml

6.6 yarn-site.xml

6.1 ~/.bashrc

First, we need to find the path where Java is installed on our system:

$ update-alternatives --config java

Now we append the Hadoop environment variables at the end of ~/.bashrc.

It is possible that vi will not work properly; if so, install vim:

$ sudo apt-get install vim

Open the .bashrc file using the following command:

$ vim ~/.bashrc

Append the following at the end. (Follow this process: press the 'INSERT' or 'i' key to enter insert mode, append the content below at the end of the file, press 'Esc', then type ':wq' and press the 'Enter' key to save and quit.)

#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
#HADOOP VARIABLES END

Reload the .bashrc file to apply the changes:

$ source ~/.bashrc
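To confirm that the variables took effect, we can echo one of them and invoke Hadoop from the updated PATH (a quick optional check):

$ echo $HADOOP_HOME
$ hadoop version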

6.2 hadoop-env.sh

We need to set the JAVA_HOME path in hadoop-env.sh to ensure that the value of the JAVA_HOME variable is available to Hadoop whenever it starts up.

$ vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Search for the JAVA_HOME variable in the file; it may be the first variable defined. Change it to the following:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
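If you would rather not hard-code the path, a common idiom (an alternative, not from the original post) derives it from the java binary on the PATH; note that with OpenJDK this resolves to the jre subdirectory, which also works:

export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:/bin/java::")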

6.3 core-site.xml

The core-site.xml file contains configuration properties that Hadoop requires when it starts up. First, create a directory for Hadoop's temporary files and give hduser ownership of it:

$ sudo mkdir -p /app/hadoop/tmp

$ sudo chown hduser:hadoop /app/hadoop/tmp

Open the file and enter the following between the <configuration> and </configuration> tags:

$ vim /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
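Note: fs.default.name is the old name for this property; Hadoop 2.x still honors it but flags it as deprecated. The modern equivalent, which works the same way here, is:

<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:54310</value>
</property>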

6.4 mapred-site.xml

By default, the /usr/local/hadoop/etc/hadoop/ folder contains a mapred-site.xml.template file, which has to be copied to (or renamed as) mapred-site.xml:

$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

The /usr/local/hadoop/etc/hadoop/mapred-site.xml file is used to specify which framework is being used for MapReduce.

We need to enter the following content between the <configuration> and </configuration> tags:

$ vim /usr/local/hadoop/etc/hadoop/mapred-site.xml

<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
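Because mapreduce.framework.name is set to yarn, jobs are submitted to YARN; the mapred.job.tracker property belongs to the old MRv1 JobTracker and is effectively ignored in this mode, though it is harmless to keep.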

6.5 hdfs-site.xml

We need to configure hdfs-site.xml for each host in the cluster. It specifies two directories:

  1. Name node
  2. Data node

These directories can be created, and given the right ownership, using the following commands:

$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
$ sudo chown -R hduser:hadoop /usr/local/hadoop_store

Open the hdfs-site.xml file and enter the following content between the <configuration> and </configuration> tags:

$ vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>  
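A quick word on these values: dfs.replication is 1 because a single-node cluster has only one datanode (the default of 3 would leave every block under-replicated), and the two dir properties tell the namenode and datanode to keep their metadata and block data in the folders we created above.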

6.6 yarn-site.xml

Open the yarn-site.xml file and enter the following content between the <configuration> and </configuration> tags:

$ vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
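The mapreduce_shuffle auxiliary service lets the NodeManager serve map outputs to reducers; without it, MapReduce jobs on YARN fail during the shuffle phase.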

Step 7: Format the Hadoop file system

$ hadoop namenode -format
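NOTE: As several readers point out in the comments below, this command is deprecated in Hadoop 2.x; the preferred equivalent is:

$ hdfs namenode -format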

Step 8: Start Hadoop Daemons

$ cd /usr/local/hadoop/sbin

$ start-all.sh
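start-all.sh is itself deprecated in Hadoop 2.x (it still works, but prints a warning); the preferred way is to start HDFS and YARN separately:

$ start-dfs.sh
$ start-yarn.sh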

We can check that all daemons started properly using the following command:

$ jps
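If everything started correctly, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (plus Jps itself). If any of these is missing, check the log files under /usr/local/hadoop/logs.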

Step 9: Stop Hadoop Daemons

$ stop-all.sh

Congratulations..!! We have installed Hadoop successfully.. 🙂

Hadoop has web interfaces too. (Copy and paste the following links into your browser.)

NameNode daemon: http://localhost:50070/

NodeManager: http://localhost:8042/

SecondaryNameNode: http://localhost:50090/status.html

Resource Manager: http://localhost:8088/

Now, we run a MapReduce job on our newly created Hadoop single-node cluster:

hduser@parthgoel:/usr/local/hadoop$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 5
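Here 'pi 2 5' asks for 2 map tasks with 5 samples each; larger numbers give a better estimate of pi at the cost of a longer run. As a further smoke test (optional; the HDFS paths below are illustrative), you can exercise HDFS directly and run the bundled wordcount example:

$ hdfs dfs -mkdir -p /user/hduser/input
$ hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml /user/hduser/input
$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/hduser/input /user/hduser/output
$ hdfs dfs -cat /user/hduser/output/part-r-00000 | head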

You may like this post:

How to Create Word Count MapReduce Application using Eclipse

NOTE: Whenever we log in to Ubuntu, make sure you are working as 'hduser'.

If you are not, use the command below to switch to 'hduser':

$ su hduser

Now you can use all Hadoop commands here.

Enjoy..!! Happy Hadooping..!! 🙂

NOTE/error: If you run a MapReduce program, then format the namenode and start all the services again, it is possible that the datanode will not start.

Solution:

Your datanode is not starting because you formatted the namenode again, which cleared the metadata from the namenode. The blocks stored for earlier MapReduce jobs are still on the datanode, but since the namenode was formatted, the datanode no longer knows where to send its block reports, so it will not start. Remove the stale datanode data and format again:

$ sudo rm -r /usr/local/hadoop_store/hdfs/datanode/current
$ hadoop namenode -format
$ start-all.sh
$ jps
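If the datanode still refuses to start after this, a common cause (an extra hint, not from the original post) is a clusterID mismatch between the namenode and datanode storage directories; you can compare the two VERSION files to confirm:

$ cat /usr/local/hadoop_store/hdfs/namenode/current/VERSION
$ cat /usr/local/hadoop_store/hdfs/datanode/current/VERSION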

Congratulations again…!!! Now we can run any Hadoop example on this single-node cluster. Just make sure you are logged in as 'hduser', because our Hadoop setup lives under this dedicated user.

36 thoughts on “How to install Hadoop 2.7.3 as a single-node cluster on Ubuntu 16.04”

    Nikhil Yadav said:
    January 4, 2017 at 3:11 pm

    Very nice step by step tutorial.
    Thanks

    hassan said:
    February 9, 2017 at 12:40 pm

    i done all steps till finished setup done thanks
    but http://localhost:50070/
    not working can tell me why? and how it to work

      parthgoelblog responded:
      February 9, 2017 at 1:59 pm

      Make sure-Namenode is working properly.

      Gangadhar SIraveni said:
      June 8, 2017 at 9:33 am

      Even I too got the same issue. What is the problem?

      Anonymous said:
      February 1, 2018 at 10:36 am

      You should not stop the daemons. Make sure that after the start-all.sh command executes, go to the browser and type http://localhost:50070/ and then you will get it.

    Rakesh Kumar said:
    February 14, 2017 at 8:47 am

    Hi, I am installing Hadoop 2.7.3. I followed the exact steps mentioned, but my secondary namenode is not getting started. When I run the jps command, it doesn't show the secondary namenode.
    Please help me regarding this.

    Anonymous said:
    March 14, 2017 at 6:52 pm

    Excellent! very detailed and clear instructions. Thanks.

    Shekhar Reddy said:
    March 17, 2017 at 8:01 am

    Hello Parth, is there anything to be added to this to get it done on Ubuntu 14.04??

      parthgoelblog responded:
      March 17, 2017 at 8:10 am

      Hi Shekhar, I have not tried on Ubuntu 14.04. We use all general commands related to hadoop, so i don’t think so OS version will be an issue.

    randhir65 said:
    March 20, 2017 at 2:16 pm

    hi shekhar,I have done all the steps mentioned above but my namenode is not running and i unable to perform hadoop fs or dfs -ls command ,its showing error that it is unable to load class file.

    thanks.

      parthgoelblog responded:
      March 20, 2017 at 2:33 pm

      Make sure to create the namenode folder, change its ownership to hduser, and provide the proper path of the namenode in hdfs-site.xml

        Abhijit Dey said:
        April 11, 2017 at 4:37 pm

        Namenode folder is created and also has ownership for hduser. Path of namenode is also provided in hdfs-site.xml but namenode is still not starting when I execute start-all.sh

        Abhijit Dey said:
        April 13, 2017 at 2:53 pm

        It worked very well…namenode is now starting…problem fixed….thanks.

    btcwp2017 said:
    March 29, 2017 at 11:46 am

    Awesome!

      parthgoelblog responded:
      March 29, 2017 at 2:19 pm

      Thanks

        Rob said:
        April 13, 2017 at 10:35 am

        This is really good but I find that all the folders in hadoop are like this now there is no simple hadoop/bin file anymore does this mean I have to update all the paths in the .bashrc file and go to each folder like hdfs and common to execute the various commands now ?

        Any help v appreciated can’t find any references on stack or anywhere else about this

        Thanks

        Rob

        hduser@GGH-D-HADAPP-L2:/usr/local/hadoop$ ls -la
        total 204
        drwxr-xr-x 16 hduser hadoop 4096 Mar 2 17:47 .
        drwxrwxrwx 12 root root 4096 Apr 11 16:31 ..
        -rw-r--r-- 1 hduser hadoop 13050 Aug 12 2016 BUILDING.txt
        drwxr-xr-x 3 hduser hadoop 4096 Mar 1 16:59 dev-support
        drwxr-xr-x 3 hduser hadoop 4096 Mar 1 16:59 hadoop-assemblies
        drwxr-xr-x 2 hduser hadoop 4096 Mar 1 16:59 hadoop-build-tools
        drwxr-xr-x 2 hduser hadoop 4096 Mar 1 16:59 hadoop-client
        drwxr-xr-x 10 hduser hadoop 4096 Mar 1 16:59 hadoop-common-project
        drwxr-xr-x 2 hduser hadoop 4096 Mar 1 16:59 hadoop-dist
        drwxr-xr-x 6 hduser hadoop 4096 Mar 1 16:59 hadoop-hdfs-project
        drwxr-xr-x 9 hduser hadoop 4096 Mar 1 16:59 hadoop-mapreduce-project
        drwxr-xr-x 3 hduser hadoop 4096 Mar 1 16:59 hadoop-maven-plugins
        drwxr-xr-x 2 hduser hadoop 4096 Mar 1 16:59 hadoop-minicluster
        drwxr-xr-x 3 hduser hadoop 4096 Mar 1 16:59 hadoop-project
        drwxr-xr-x 2 hduser hadoop 4096 Mar 1 16:59 hadoop-project-dist
        drwxr-xr-x 16 hduser hadoop 4096 Mar 1 16:59 hadoop-tools
        drwxr-xr-x 3 hduser hadoop 4096 Mar 1 16:59 hadoop-yarn-project
        -rw-r--r-- 1 hduser hadoop 84854 Aug 12 2016 LICENSE.txt
        -rw-r--r-- 1 hduser hadoop 14978 Aug 12 2016 NOTICE.txt
        -rw-r--r-- 1 hduser hadoop 18993 Aug 12 2016 pom.xml
        -rw-r--r-- 1 hduser hadoop 1366 Aug 12 2016 README.txt
        hduser@GGH-D-HADAPP-L2:/usr/local/hadoop$

        Rob said:
        April 13, 2017 at 2:32 pm

        Sorry ignore me being daft I forgot I had to do the build off the source this is a great tutorial

    NILESH BHOSALE said:
    April 10, 2017 at 9:33 am

    For hadoop 2.7.3 the command "hadoop namenode -format" is deprecated, and we should use "hdfs namenode -format" instead. thanks …great work…..

      Anonymous said:
      July 29, 2017 at 6:19 am

      ok

    Srikanta Patra said:
    May 16, 2017 at 12:24 pm

    Thanks for the excellent step by step guide. It works great.

    Anonymous said:
    June 16, 2017 at 8:26 am

    Thanks very much for the post. Really useful for novice person.

    Sheshant Shazzy said:
    June 17, 2017 at 10:50 am

    namenode not started
    when I run jps, this is my output

    hduser@sheshant-Inspiron-3521:/usr/local/hadoop/sbin$ jps
    18842 ResourceManager
    18467 DataNode
    18688 SecondaryNameNode
    19076 Jps
    18988 NodeManager
    hduser@sheshant-Inspiron-3521:/usr/local/hadoop/sbin$

    Anonymous said:
    August 2, 2017 at 6:44 am

    yeah it is very useful for the installation process

    ram said:
    August 2, 2017 at 6:45 am

    it is very useful for the installlation process…

    alanldawkins said:
    August 31, 2017 at 7:52 am

    IMPORTANT:
    “NILESH BHOSALE said:”
    For hadoop 2.7.3 the command "hadoop namenode -format" is deprecated, and we should use "hdfs namenode -format" instead. thanks …great work…..

    Anonymous said:
    September 20, 2017 at 8:35 am

    very nice…

    etov said:
    September 25, 2017 at 7:58 am

    Some hints and variations I had to apply to make things work / work better for me:
    - after installing java, had to open a new session for it to take effect (maybe could have reloaded .bashrc or something)
    - I did set a password for hduser, and having root access, I think it may be a good idea
    - after installing ssh, I had to restart it to make it work: sudo service ssh --full-restart
    - downloading the hadoop .tar.gz: the provided server seems to host only the latest version, so I downloaded http://www-us.apache.org/dist/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz
    - when extracting, use tar xzf (without the v) to suppress extensive output
    - for formatting, use hdfs namenode -format
    - in order to get the secondary name-node running, I had to run format again after start-all, then rerun the start-all script.

    Rajesh Kumar said:
    October 9, 2017 at 1:28 pm

    Hi. I am unable to run the command - hadoop namenode -format and getting the below error:

    raj@raj-VirtualBox:~/hadoop-3.0.0-beta1/bin$ ./hadoop namenode -format
    WARNING: Use of this script to execute namenode is deprecated.
    WARNING: Attempting to execute replacement “hdfs namenode” instead.

    WARNING: /usr/local/hadoop/logs does not exist. Creating.
    mkdir: cannot create directory ‘/usr/local/hadoop/logs’: Permission denied
    ERROR: Unable to create /usr/local/hadoop/logs. Aborting.

    Please assist

    saivignesh said:
    October 21, 2017 at 9:43 am

    thank you so much sir. you helped me a lot.

    Anonymous said:
    November 18, 2017 at 2:34 pm

    A big thank you…!!!

    Raj said:
    January 3, 2018 at 4:06 am

    I am not expert this is the first time I am using Hadoop. Following worked for me…

    Found a Solution for secondarynamenodes not starting. This worked like a charm for me.

    Problem is hdfs getconf -secondaryNameNodes is not able to get the node result as 0.0.0.0.

    Here is the output that I see.
    /usr/local/hadoop/bin$ . hdfs getconf -secondaryNameNodes
    2018-01-02 21:20:11,029 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    0.0.0.0

    Instead of geting 0.0.0.0 this function is returning whole error message as string. so not able to start secondarynamenodes properly.

    Here is the solution that I found.

    Made changes to .bashrc in hduser home directory.

    Old entry:
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/"

    New Entry:
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

    This helped to fix util.nativecodeloader error.

    I still have http://localhost:50070/ and http://localhost:50090/status.html not working.

    sutti said:
    January 8, 2018 at 7:06 pm

    WARNING: DEFAULT_LIBEXEC_DIR ignored. It has been replaced by HADOOP_DEFAULT_LIBEXEC_DIR.
    ERROR: Invalid HADOOP_YARN_HOME
    im getting this error when i am doing the "hadoop namenode -format" command.
    what should i do???
