On this site you can see one way to count the number of lines in a file.
That version emits a count of one for every record in every map task. So if one map task holds 10,000 lines, 10,000 values are passed to the reducer. With more than one mapper, the intermediate reads and writes grow by that many for each task.
Let's reduce the intermediate writes.
Below is an optimized way to count the number of lines in a file or directory.
The changes are in:
1. Mapper
Instead of emitting 'one' for each record, we increment a line count in map() and emit it once in the cleanup() phase.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LineCntMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final Text keyEmit = new Text("Total Lines");
    private final IntWritable valEmit = new IntWritable();
    private int partialSum = 0;

    @Override
    public void map(LongWritable key, Text value, Context context) {
        // Count the line locally; nothing is emitted per record.
        partialSum++;
    }

    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        // Emit this task's total once, after all its records are processed.
        // Exceptions propagate so the framework marks the task as failed
        // instead of the JVM exiting with status 0.
        valEmit.set(partialSum);
        context.write(keyEmit, valEmit);
    }
}

So if we have 5 map tasks, we emit only 5 intermediate key-value pairs.
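To make the saving concrete, here is a small plain-Java sketch (no Hadoop needed) that compares the number of intermediate pairs under the two approaches. The class name, method names, and the split sizes are made up for illustration; only the 5 map tasks and the 10,000 lines per task come from the text above.

```java
import java.util.Arrays;
import java.util.List;

class IntermediatePairCount {
    // Pairs written when every record emits its own ("Total Lines", 1) pair.
    static long perRecordPairs(List<Integer> linesPerTask) {
        long pairs = 0;
        for (int lines : linesPerTask) {
            pairs += lines;
        }
        return pairs;
    }

    // Pairs written when each map task emits one partial sum in cleanup().
    static long perTaskPairs(List<Integer> linesPerTask) {
        return linesPerTask.size();
    }

    public static void main(String[] args) {
        // Hypothetical input: 5 map tasks, 10,000 lines each.
        List<Integer> linesPerTask = Arrays.asList(10000, 10000, 10000, 10000, 10000);
        System.out.println("per-record emit: " + perRecordPairs(linesPerTask) + " pairs");
        System.out.println("cleanup() emit:  " + perTaskPairs(linesPerTask) + " pairs");
    }
}
```

With these numbers the per-record version writes 50,000 pairs, while the cleanup() version writes 5.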
2. Driver
In the driver we also include a combiner:

job.setMapperClass(LineCntMapper.class);
job.setCombinerClass(LineCntReducer.class);
job.setReducerClass(LineCntReducer.class);

The combiner does nothing more than the reducer does, so we can use the reducer class itself as the combiner.
The reducer doesn't need any change.
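Reusing the reducer as the combiner is safe only because the reduce step is a sum, and summing partial sums gives the same total as summing everything at once. The reducer is not shown in this post, so the sketch below assumes it simply adds up the values for the key. It is plain Java, and the class name and the partial sums are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class CombinerSumCheck {
    // Assumed reducer logic: add up all values received for one key.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical partial sums emitted by 5 map tasks.
        List<Integer> partialSums = Arrays.asList(10000, 8500, 12000, 9000, 500);

        // Reducer only: sum everything in one pass.
        int direct = sum(partialSums);

        // Combiner first: sum two groups separately, then reduce the results.
        int groupA = sum(partialSums.subList(0, 2));
        int groupB = sum(partialSums.subList(2, 5));
        int combined = sum(Arrays.asList(groupA, groupB));

        System.out.println("direct=" + direct + " combined=" + combined);
    }
}
```

Both paths give 40,000 here. A reduce function that is not associative and commutative (an average, for example) could not be reused as a combiner this way.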
If you run this code, you should get the results faster than with the previously mentioned code on this site.
Working code is here
Happy Hadooping........