How to Create Word Count MapReduce Application using Eclipse

In this post, you will create a WordCount application using the MapReduce programming model. The workflow diagram of the WordCount application is given below.

[Figure: workflow of the WordCount application]
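To make the workflow concrete, here is a minimal plain-Java sketch of the three phases a WordCount job goes through: map, shuffle and reduce. It does not use Hadoop at all. The class name WordCountFlowSketch and the sample lines are made up for illustration, and the sketch assumes Java 8 or above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration of the WordCount data flow (hypothetical class, no Hadoop involved).
class WordCountFlowSketch {

    // Map phase: split one line into words and emit a (word, 1) pair for each word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by word, with the words kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: add up the values collected for one word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in order over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines are an assumption for this sketch.
        System.out.println(run(Arrays.asList("hello world", "hello hadoop")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```

In a real job, Hadoop does the shuffle for you; you only write the map and reduce parts, as in the steps that follow.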

Steps to run WordCount Application in Eclipse

Step-1

Download Eclipse if you don’t already have it. It is available for both 64-bit and 32-bit Linux.

Step-2

Open Eclipse and create a Java project.

In Eclipse, click on File menu -> New -> Java Project. Enter your project name; here it is WordCount. Make sure the Java version is 1.6 or above. Click on Finish.

Step-3

Create a Java class file and write the code.

Click on the WordCount project. There will be a ‘src’ folder. Right click on the ‘src’ folder -> New -> Class. Enter the class name; here it is Wordcount. Click on Finish.

Copy and paste the code below into Wordcount.java and save it.

You will get lots of errors, but don’t panic. They appear because the external Hadoop libraries, which are required to run a MapReduce program, have not been added to the project yet.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Wordcount {

    // Mapper: emits (word, 1) for every token found in an input line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts received for one word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        // The class literal must match the class name exactly (Wordcount, not WordCount).
        job.setJarByClass(Wordcount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] is the input path, args[1] is the output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
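The main method above sets IntSumReducer as the combiner as well as the reducer. That is safe for WordCount because addition gives the same total whether you sum all the 1s in one go or sum partial sums first. Below is a small plain-Java sketch of that idea. The class CombinerSketch and the two map tasks are hypothetical, and it is not Hadoop code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of why a summing reducer can double as a combiner.
class CombinerSketch {

    // Same logic as the reduce step: add up all the values for one word.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // No combiner: every 1 emitted by every map task goes straight to the reducer.
    static int withoutCombiner(List<List<Integer>> valuesPerMapTask) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> taskValues : valuesPerMapTask) {
            all.addAll(taskValues);
        }
        return sum(all);
    }

    // With combiner: each map task pre-sums its own values, the reducer sums the partial sums.
    static int withCombiner(List<List<Integer>> valuesPerMapTask) {
        List<Integer> partialSums = new ArrayList<>();
        for (List<Integer> taskValues : valuesPerMapTask) {
            partialSums.add(sum(taskValues));
        }
        return sum(partialSums);
    }

    public static void main(String[] args) {
        // Assumed example: the word "bus" was seen twice by one map task and once by another.
        List<List<Integer>> busValues = Arrays.asList(Arrays.asList(1, 1), Arrays.asList(1));
        System.out.println(withoutCombiner(busValues)); // prints 3
        System.out.println(withCombiner(busValues));    // prints 3
    }
}
```

Both paths give the same count; the combiner only reduces how much data has to move between the map and reduce steps. Hadoop may run the combiner zero or more times, so a combiner should never change the final result.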

Step-4

Add the external libraries from Hadoop.

Right click on the WordCount project -> Build Path -> Configure Build Path -> click on Libraries -> click on the ‘Add External JARs…’ button.

Select the jar files listed below from your Hadoop folder.

In my case, it is /usr/local/hadoop/share/hadoop

4.1 Add the jar files from the /usr/local/hadoop/share/hadoop/common folder.

4.2 Add the jar files from the /usr/local/hadoop/share/hadoop/common/lib folder.

4.3 Add the jar files from the /usr/local/hadoop/share/hadoop/mapreduce folder (you don’t need to add hadoop-mapreduce-examples-2.7.3.jar).

4.4 Add the jar files from the /usr/local/hadoop/share/hadoop/yarn folder.

Click on OK. Now you can see that all the errors in the code are gone. 🙂
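If you want to double check the build path in another way, a small sketch like the one below can tell you whether a few of the Hadoop classes imported in Wordcount.java can be loaded. The class name HadoopClasspathCheck is hypothetical, and the three class names checked are simply taken from the import list above. It assumes you run it from inside the same WordCount project and uses Java 7 or above.

```java
// Hypothetical helper: reports whether some Hadoop classes are visible on the classpath.
class HadoopClasspathCheck {

    // Returns true if the named class can be found, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names copied from the imports in Wordcount.java.
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.fs.Path",
            "org.apache.hadoop.mapreduce.Job"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If any line says MISSING, go back over 4.1 to 4.4 and check that the jars from that folder were added.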

Step-5

Run the MapReduce code.

5.1 Create an input file for the WordCount project.

Right click on the WordCount project -> New -> File. Enter a file name (use input, so that it matches the program arguments in 5.3) and click on OK. You can copy and paste the contents below into your input file.

car bus bike
bike bus aeroplane
truck car bus


5.2 Right click on the WordCount project -> click on Run As -> click on Run Configurations…

Create a new configuration by clicking on ‘New launch configuration’. Set the configuration name, project name and class name.

5.3 Click on the Arguments tab and enter the input and output paths.

If you have saved the above file, write the paths below into Program Arguments on the Arguments tab and click on Run.

input out
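One thing to watch out for: Hadoop refuses to start a job if the output directory already exists, so running the same configuration a second time fails with an error saying the output directory already exists. You can delete the ‘out’ folder by hand in Eclipse, or use a small helper like the sketch below. The class name OutputDirCleaner is hypothetical, it assumes Java 8 or above, and it assumes the program runs with the project folder as its working directory so that "out" points at the folder created by the job.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: removes a stale output directory before the job is run again.
class OutputDirCleaner {

    // Deletes dir and everything inside it. Returns false if dir did not exist.
    static boolean deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return false;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order puts files and sub-folders before the folders that contain them.
            List<Path> deepestFirst = paths.sorted(Comparator.reverseOrder())
                                           .collect(Collectors.toList());
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        // "out" is the output path used in the program arguments above.
        boolean removed = deleteRecursively(Paths.get("out"));
        System.out.println(removed ? "Old output removed" : "Nothing to remove");
    }
}
```

Be careful with the path you pass in, because the helper deletes everything under it without asking.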


While the WordCount application runs, its output logs appear in the console.

Refresh the WordCount project: right click on the project -> click on Refresh. You will find an ‘out’ directory in the project explorer. Open the ‘out’ directory. There will be a ‘part-r-00000’ file. Double click to open it.
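To check the result, you can compare the file with what you expect from the three sample input lines. The sketch below works that out in plain Java, without Hadoop. The class name ExpectedOutputSketch is hypothetical. It assumes the default text output, which writes one word per line followed by a tab and the count, with the words in sorted order, and it needs Java 8 or above.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper: computes what part-r-00000 should contain for a small input.
class ExpectedOutputSketch {

    static String expectedOutput(String input) {
        // TreeMap keeps the words sorted, like the keys arriving at a single reducer.
        Map<String, Integer> counts = new TreeMap<>();
        // StringTokenizer splits on spaces and line breaks by default.
        StringTokenizer itr = new StringTokenizer(input);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The sample input from step 5.1.
        String input = "car bus bike\nbike bus aeroplane\ntruck car bus\n";
        System.out.print(expectedOutput(input));
        // prints (word, tab, count):
        // aeroplane 1
        // bike      2
        // bus       3
        // car       2
        // truck     1
    }
}
```

If your ‘part-r-00000’ file shows the same five words and counts, the job ran correctly.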


NOTE: In this post, you have learned how to build and run a WordCount application in Eclipse using MapReduce. In the next post, I will show you how to export a jar file from Eclipse and run it on Hadoop. (If you don’t know how to configure Hadoop, click here.)
