Sunday, 4 May 2014

Map-Only Jobs In Hadoop


Some jobs need only a map phase, with no Reducer to execute. In such a Map-Only job, each Mapper does all the work on its InputSplit and writes its output directly, so there is nothing left for a Reducer to do. This is achieved by calling job.setNumReduceTasks(0) on the Job in the driver.

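// Imports needed at the top of the driver file (new MapReduce API):
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Body of run(String[] args); MaponlyDriver is assumed to extend Configured and implement Tool.
// Job.getInstance replaces the Job constructor, which is deprecated in Hadoop 2.x.
Job job = Job.getInstance(getConf(), "Map-Only Job");
job.setJarByClass(MaponlyDriver.class);

// With TextInputFormat, the mapper receives (LongWritable offset, Text line).
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);

// With no reducer, the map output is the job output, so declare the same types here.
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);

// Set the number of reduce tasks to 0 to make this a map-only job.
job.setNumReduceTasks(0);

// The base Mapper class passes every record through unchanged;
// replace it with your own Mapper subclass for real work.
job.setMapperClass(Mapper.class);

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

boolean success = job.waitForCompletion(true);
return success ? 0 : 1;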

The following call sets the number of reduce tasks to 0 and turns off the reduce phase:

job.setNumReduceTasks(0);

So the number of output files equals the number of map tasks, and the files are named part-m-00000, part-m-00001, and so on.
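
The naming pattern can be sketched in plain Java. This is only an illustration, not Hadoop code: the class PartFileNames and its method are hypothetical, and the sketch assumes a file-based output format, one output file per map task, and a task index zero-padded to five digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: models how the output files of a map-only job are named.
class PartFileNames {

    // Returns the expected file names for the given number of map tasks,
    // assuming each map task writes exactly one file.
    static List<String> mapOnlyOutputFiles(int numMapTasks) {
        List<String> names = new ArrayList<>();
        for (int task = 0; task < numMapTasks; task++) {
            // "m" marks map output; the task index is zero-padded to 5 digits.
            names.add(String.format(Locale.ROOT, "part-m-%05d", task));
        }
        return names;
    }

    public static void main(String[] args) {
        // Three map tasks give three output files.
        System.out.println(mapOnlyOutputFiles(3));
        // prints [part-m-00000, part-m-00001, part-m-00002]
    }
}
```

A job whose input is divided into three InputSplits would therefore be expected to leave three part-m files in the output directory.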

Once the number of reduce tasks is set to zero, the output is also unsorted, because the shuffle and sort step that normally runs between map and reduce is skipped.

If this property is not set, the job runs one reduce task by default. Because no Reducer class is specified, the default (identity) Reducer is executed: it simply emits each incoming key with its values unchanged, and the output file is named part-r-00000.
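
The difference between the two cases can be modeled in plain Java. This sketch is not Hadoop code: the class MapOnlyVsReduce, its methods, and the sample records are made up for illustration, and the in-memory sort merely stands in for the shuffle and sort step in front of a single reduce task.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical model of a map-only job versus a job with the identity reducer.
class MapOnlyVsReduce {

    // Convenience factory for a (key, value) record.
    static Map.Entry<String, String> record(String key, String value) {
        return new SimpleEntry<>(key, value);
    }

    // Map-only: records are written in the order the mapper emitted them.
    static List<Map.Entry<String, String>> mapOnly(List<Map.Entry<String, String>> input) {
        return new ArrayList<>(input);
    }

    // Identity reducer: records are sorted by key first (stand-in for
    // shuffle and sort), then emitted unchanged.
    static List<Map.Entry<String, String>> withIdentityReducer(List<Map.Entry<String, String>> input) {
        List<Map.Entry<String, String>> sorted = new ArrayList<>(input);
        sorted.sort(Map.Entry.comparingByKey());
        return sorted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> records =
                Arrays.asList(record("c", "3"), record("a", "1"), record("b", "2"));
        System.out.println("map-only:         " + mapOnly(records));
        // prints map-only:         [c=3, a=1, b=2]
        System.out.println("identity reducer: " + withIdentityReducer(records));
        // prints identity reducer: [a=1, b=2, c=3]
    }
}
```

The values are identical in both cases; only the order (and, in a real job, the part-m versus part-r file name) differs.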



Happy Hadooping ...

23 comments:

  1. In what types of situations might it be ideal not to run any reducers?

    Replies
    1. For image processing and similar tasks we don't need to run a reducer. Also, in some preprocessing techniques in data mining, when chaining different jobs, there are situations where we don't need a key (a NullWritable key) but only values. In that situation, if we don't want the output in a single file, we can run the job without reducers.

  2. Hi, this is Vignesh. I have 3 years of experience as an Android developer and I am certified. I know OOP concepts in Android, but not in depth. After learning Hadoop, will that be enough to get a good career in IT with a good package? I came across a Hadoop training in Chennai website; could someone please help me identify whether the syllabus covers everything?

    Thanks,
    vignesh

    Replies
    1. Hi Vignesh, I have gone through the training link. It covers almost everything. When did you appear for the training? If you are confident, you can register for the Cloudera certification exam. It is quite easy but tricky, and you will get a good career too. Please have a look at this: "http://unmeshasreeveni.blogspot.in/2015/01/cloudera-certified-hadoop-developer-ccd.html". Let me know if you have further queries.