Monday, 4 May 2015

Hadoop Word Count Using C Language - Hadoop Streaming

1. Hadoop (Example based on cloudera distribution cdh5)
2. gcc compiler

Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer. 

We need 2 programs mapper.c and reducer.c. You can find the code in GitHub.

1. Compile mapper.c and reducer.c
hadoop@namenode2:~/hadoopstreaming$ gcc -o mapper.out  mapper.c
hadoop@namenode2:~/hadoopstreaming$ gcc -o reducer.out  reducer.c
hadoop@namenode2:~/hadoopstreaming$ ls
mapper.c  mapper.out  reducer.c  reducer.out
Here you can see 2 executables mapper.out and reducer.out.

2. Place your wordcount input file in HDFS
hadoop@namenode2:~$ hadoop fs -put /home/hadoop/wc /
hadoop@namenode2:~$ hadoop fs -ls /
drwxr-xr-x   - hadoop hadoop         0 2015-05-04 15:50 /wc

3. Now we will run our C program in HDFS with the help of  Hadoop Streaming jar.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.1.jar 
-files hadoopstreaming/mapper.out -mapper hadoopstreaming/mapper.out 
-files hadoopstreaming/reducer.out -reducer hadoopstreaming/reducer.out 
-input /wc -output /wordcount-out

For Apache Hadoop
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar 
-files hadoopstreaming/mapper.out -mapper hadoopstreaming/mapper.out 
-files hadoopstreaming/reducer.out -reducer hadoopstreaming/reducer.out 
-input /wc -output /wordcount-out

Run the Job
hadoop@namenode2:~$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.1.jar -files hadoopstreamingtrail/mapper.out -mapper hadoopstreamingtrail/mapper.out -files hadoopstreaming/reducer.out -reducer hadoopstreaming/reducer.out -input /wc -output /wordcount-out
packageJobJar: [hadoopstreaming/mapper.out, hadoopstreaming/reducer.out] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.1.jar] /tmp/streamjob7616955264406618684.jar tmpDir=null
15/05/04 15:50:28 INFO mapred.FileInputFormat: Total input paths to process : 2
15/05/04 15:50:28 INFO mapreduce.JobSubmitter: number of splits:3
15/05/04 15:50:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1426753134244_0119
15/05/04 15:50:29 INFO impl.YarnClientImpl: Submitted application application_1426753134244_0119
15/05/04 15:50:29 INFO mapreduce.Job: Running job: job_1426753134244_0119
15/05/04 15:50:37 INFO mapreduce.Job:  map 0% reduce 0%
15/05/04 15:50:46 INFO mapreduce.Job:  map 67% reduce 0%
15/05/04 15:50:47 INFO mapreduce.Job:  map 100% reduce 0%
15/05/04 15:50:53 INFO mapreduce.Job:  map 100% reduce 100%
15/05/04 15:50:55 INFO mapreduce.Job: Job job_1426753134244_0119 completed successfully

4. Lets see the results
hadoop@namenode2:~$ hadoop fs -ls  /wordcount-out
Found 2 items
-rw-r--r--   3 hadoop hadoop          0 2015-05-04 15:50 /wordcount-out/_SUCCESS
-rw-r--r--   3 hadoop hadoop      11685 2015-05-04 15:50 /wordcount-out/part-00000

Happy Hadooping


  1. There are lots of information about latest technology and how to get trained in them, like Hadoop Training Chennai have spread around the web, but this is a unique one according to me. The strategy you have updated here will make me to get trained in future technologies(Hadoop Training in Chennai). By the way you are running a great blog. Thanks for sharing this.

    Hadoop training institutes in chennai | Hadoop Training Chennai

  2. For latest and updated Cloudera certification dumps in PDF format contact us at
    Refer our blog for more details

  3. Thanks for sharing this article.. You may also refer

  4. Thanks for providing this informative information…..
    You may also refer-

  5. Thanks for sharing Valuable information. Greatful Info about hadoop. Really helpful. Keep sharing........... If it possible share some more tutorials.........

  6. Excellent .. Amazing .. I will bookmark your blog and take the feeds additionally? I’m satisfied to find so many helpful information here within the put up, we want work out extra strategies in this regard, thanks for sharing..

    Hadoop Training in Chennai

    Java Training in Chennai

  7. Thanks for your marvelous posting! It is very useful and good. Come on. I want to introduce an get app installs, I try it and I feel it is so good to rank app to top in app store search results, have you ever heard it?

  8. Thanks for sharing this blog. This very important and informative blog
    Learned a lot of new things from your post! Good creation and HATS OFF to the creativity of your mind.
    Very interesting and useful blog!

    best Hadoop training in gurgaon

  9. awesome post presented by you..your writing style is fabulous and keep update with your blogs
    Big data hadoop online Course

  10. AWS Training in Bangalore - Live Online & Classroom
    myTectra Amazon Web Services (AWS) certification training helps you to gain real time hands on experience on AWS. myTectra offers AWS training in Bangalore using classroom and AWS Online Training globally. AWS Training at myTectra delivered by the experienced professional who has atleast 4 years of relavent AWS experince and overall 8-15 years of IT experience. myTectra Offers AWS Training since 2013 and retained the positions of Top AWS Training Company in Bangalore and India.

    IOT Training in Bangalore - Live Online & Classroom
    IOT Training course observes iot as the platform for networking of different devices on the internet and their inter related communication. Reading data through the sensors and processing it with applications sitting in the cloud and thereafter passing the processed data to generate different kind of output is the motive of the complete curricula. Students are made to understand the type of input devices and communications among the devices in a wireless media.