Monday, 4 May 2015

Hadoop Word Count Using C Language - Hadoop Streaming


Prerequisites
1. Hadoop (this example is based on the Cloudera distribution, CDH5)
2. gcc compiler

Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer. 

We need two programs, mapper.c and reducer.c. You can find the code on GitHub.

1. Compile mapper.c and reducer.c
hadoop@namenode2:~/hadoopstreaming$ gcc -o mapper.out  mapper.c
hadoop@namenode2:~/hadoopstreaming$ gcc -o reducer.out  reducer.c
hadoop@namenode2:~/hadoopstreaming$ ls
mapper.c  mapper.out  reducer.c  reducer.out
Here you can see the two executables, mapper.out and reducer.out.

2. Place your wordcount input in HDFS (here /home/hadoop/wc is a directory of input files)
hadoop@namenode2:~$ hadoop fs -put /home/hadoop/wc /
hadoop@namenode2:~$ hadoop fs -ls /
drwxr-xr-x   - hadoop hadoop         0 2015-05-04 15:50 /wc

3. Now we will run our C programs as a MapReduce job on the cluster with the help of the Hadoop Streaming jar.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.1.jar \
-files hadoopstreaming/mapper.out -mapper hadoopstreaming/mapper.out \
-files hadoopstreaming/reducer.out -reducer hadoopstreaming/reducer.out \
-input /wc -output /wordcount-out

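For Apache Hadoop (the jar path below is the Hadoop 1.x layout; Hadoop 2.x ships the streaming jar under $HADOOP_HOME/share/hadoop/tools/lib/)
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
-files hadoopstreaming/mapper.out -mapper hadoopstreaming/mapper.out \
-files hadoopstreaming/reducer.out -reducer hadoopstreaming/reducer.out \
-input /wc -output /wordcount-out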

Run the Job
hadoop@namenode2:~$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.1.jar -files hadoopstreaming/mapper.out -mapper hadoopstreaming/mapper.out -files hadoopstreaming/reducer.out -reducer hadoopstreaming/reducer.out -input /wc -output /wordcount-out
packageJobJar: [hadoopstreaming/mapper.out, hadoopstreaming/reducer.out] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.1.jar] /tmp/streamjob7616955264406618684.jar tmpDir=null
15/05/04 15:50:28 INFO mapred.FileInputFormat: Total input paths to process : 2
15/05/04 15:50:28 INFO mapreduce.JobSubmitter: number of splits:3
15/05/04 15:50:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1426753134244_0119
15/05/04 15:50:29 INFO impl.YarnClientImpl: Submitted application application_1426753134244_0119
15/05/04 15:50:29 INFO mapreduce.Job: Running job: job_1426753134244_0119
15/05/04 15:50:37 INFO mapreduce.Job:  map 0% reduce 0%
15/05/04 15:50:46 INFO mapreduce.Job:  map 67% reduce 0%
15/05/04 15:50:47 INFO mapreduce.Job:  map 100% reduce 0%
15/05/04 15:50:53 INFO mapreduce.Job:  map 100% reduce 100%
15/05/04 15:50:55 INFO mapreduce.Job: Job job_1426753134244_0119 completed successfully

4. Let's see the results
hadoop@namenode2:~$ hadoop fs -ls  /wordcount-out
Found 2 items
-rw-r--r--   3 hadoop hadoop          0 2015-05-04 15:50 /wordcount-out/_SUCCESS
-rw-r--r--   3 hadoop hadoop      11685 2015-05-04 15:50 /wordcount-out/part-00000


Happy Hadooping
