Monday, 4 May 2015

Hadoop Word Count Using C Language - Hadoop Streaming


Prerequisites
1. Hadoop (Example based on cloudera distribution cdh5)
2. gcc compiler

Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer. 

We need 2 programs mapper.c and reducer.c. You can find the code in GitHub.

1. Compile mapper.c and reducer.c
hadoop@namenode2:~/hadoopstreaming$ gcc -o mapper.out  mapper.c
hadoop@namenode2:~/hadoopstreaming$ gcc -o reducer.out  reducer.c
hadoop@namenode2:~/hadoopstreaming$ ls
mapper.c  mapper.out  reducer.c  reducer.out
Here you can see 2 executables mapper.out and reducer.out.

2. Place your wordcount input file in HDFS
hadoop@namenode2:~$ hadoop fs -put /home/hadoop/wc /
hadoop@namenode2:~$ hadoop fs -ls /
drwxr-xr-x   - hadoop hadoop         0 2015-05-04 15:50 /wc

3. Now we will run our C program in HDFS with the help of  Hadoop Streaming jar.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.1.jar 
-files hadoopstreaming/mapper.out -mapper hadoopstreaming/mapper.out 
-files hadoopstreaming/reducer.out -reducer hadoopstreaming/reducer.out 
-input /wc -output /wordcount-out

For Apache Hadoop
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar 
-files hadoopstreaming/mapper.out -mapper hadoopstreaming/mapper.out 
-files hadoopstreaming/reducer.out -reducer hadoopstreaming/reducer.out 
-input /wc -output /wordcount-out

Run the Job
hadoop@namenode2:~$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.1.jar -files hadoopstreamingtrail/mapper.out -mapper hadoopstreamingtrail/mapper.out -files hadoopstreaming/reducer.out -reducer hadoopstreaming/reducer.out -input /wc -output /wordcount-out
packageJobJar: [hadoopstreaming/mapper.out, hadoopstreaming/reducer.out] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.1.jar] /tmp/streamjob7616955264406618684.jar tmpDir=null
15/05/04 15:50:28 INFO mapred.FileInputFormat: Total input paths to process : 2
15/05/04 15:50:28 INFO mapreduce.JobSubmitter: number of splits:3
15/05/04 15:50:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1426753134244_0119
15/05/04 15:50:29 INFO impl.YarnClientImpl: Submitted application application_1426753134244_0119
15/05/04 15:50:29 INFO mapreduce.Job: Running job: job_1426753134244_0119
15/05/04 15:50:37 INFO mapreduce.Job:  map 0% reduce 0%
15/05/04 15:50:46 INFO mapreduce.Job:  map 67% reduce 0%
15/05/04 15:50:47 INFO mapreduce.Job:  map 100% reduce 0%
15/05/04 15:50:53 INFO mapreduce.Job:  map 100% reduce 100%
15/05/04 15:50:55 INFO mapreduce.Job: Job job_1426753134244_0119 completed successfully

4. Lets see the results
hadoop@namenode2:~$ hadoop fs -ls  /wordcount-out
Found 2 items
-rw-r--r--   3 hadoop hadoop          0 2015-05-04 15:50 /wordcount-out/_SUCCESS
-rw-r--r--   3 hadoop hadoop      11685 2015-05-04 15:50 /wordcount-out/part-00000


Happy Hadooping

6 comments:

  1. There are lots of information about latest technology and how to get trained in them, like Hadoop Training Chennai have spread around the web, but this is a unique one according to me. The strategy you have updated here will make me to get trained in future technologies(Hadoop Training in Chennai). By the way you are running a great blog. Thanks for sharing this.

    Hadoop training institutes in chennai | Hadoop Training Chennai

    ReplyDelete
  2. For latest and updated Cloudera certification dumps in PDF format contact us at completeexamcollection@gmail.com.
    Refer our blog for more details http://completeexamcollection.blogspot.in/2015/04/cloudera-hadoop-certification-dumps.html

    ReplyDelete
  3. Thanks for sharing this article.. You may also refer http://www.s4techno.com/blog/2016/07/11/hadoop-administrator-interview-questions/..

    ReplyDelete
  4. Thanks for providing this informative information…..
    You may also refer-
    http://www.s4techno.com/blog/category/hadoop/

    ReplyDelete
  5. Thanks for sharing Valuable information. Greatful Info about hadoop. Really helpful. Keep sharing........... If it possible share some more tutorials.........

    ReplyDelete
  6. Excellent .. Amazing .. I will bookmark your blog and take the feeds additionally? I’m satisfied to find so many helpful information here within the put up, we want work out extra strategies in this regard, thanks for sharing..

    Hadoop Training in Chennai

    Java Training in Chennai

    ReplyDelete