Monday, 29 September 2014

Comments On CCD-410 Sample Dumps


What do you think of these three questions from a CCD-410 practice exam dump site that advertises "100% Pass-Guaranteed or Your Money Back!!!"?


QUESTION: 3

What happens in a MapReduce job when you set the number of reducers to one?

A. A single reducer gathers and processes all the output from all the mappers. The output is written to as many separate files as there are mappers.
B. A single reducer gathers and processes all the output from all the mappers. The output is written to a single file in HDFS.
C. Setting the number of reducers to one creates a processing bottleneck, and since the number of reducers as specified by the programmer is used as a reference value only, the MapReduce runtime provides a default setting for the number of reducers.
D. Setting the number of reducers to one is invalid, and an exception is thrown.

Answer: A

QUESTION: 4

In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?

A. Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster.
B. Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run.
C. Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk.
D. Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.

Answer: A

QUESTION: 5

You have user profile records in your OLTP database that you want to join with weblogs you have already ingested into HDFS. How will you obtain these user records?

A. HDFS commands
B. Pig load
C. Sqoop import
D. Hive

Answer: B


Correct Answers

QUESTION 3: Answer B
QUESTION 4: Answer D
QUESTION 5: Answer C

See the comments for reviews of the correct answers.

Can we change the default key-value input separator in Hadoop MapReduce?



Yes, we can change it by setting the "key.value.separator.in.input.line" property in the Driver class.

There may be cases where we need to split each input line on a specific delimiter.


eg:
one	first line
two	second line
In order to read a file like this we use KeyValueTextInputFormat.class, since it splits each line on TAB as the default separator.

So for each line passed to map(), the key will be "one" and the value will be "first line".


What if we need a different delimiter instead of the TAB delimiter?


eg:
one,first line
two,second line

Here also we need to get the key as "one" and the value as "first line".

This is possible by adding an extra configuration property along with KeyValueTextInputFormat to change the default separator.

//New API
Configuration conf = new Configuration();
//change the key-value separator from the default TAB to a comma,
//before the Job object is created from this Configuration
conf.set("key.value.separator.in.input.line", ",");
Job job = new Job(conf);
job.setInputFormatClass(KeyValueTextInputFormat.class);
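For completeness, here is a minimal sketch of a map() that consumes such input (the class name is illustrative, not from the original post). With KeyValueTextInputFormat both the key and the value arrive as Text. Note also that in Hadoop 2.x this property was renamed to "mapreduce.input.keyvaluelinerecordreader.key.value.separator".

//A minimal illustrative Mapper, assuming KeyValueTextInputFormat and the comma separator set above
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class KeyValueMapper extends Mapper<Text, Text, Text, Text> {
 @Override
 protected void map(Text key, Text value, Context context)
     throws IOException, InterruptedException {
  //for the input "one,first line" the key is "one" and the value is "first line"
  context.write(key, value);
 }
}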

Monday, 22 September 2014

Matrix Multiplication in Hadoop MapReduce

Matrix multiplication is a common and important algebraic operation. In the MapReduce paradigm, if we feed two matrices in for multiplication, map() only ever sees one record of one input split at a time, and we cannot tell which row or column that record belongs to. So the better approach is to tag every element of the input data with its row and column index.


Matrix multiplication is applied to a file of the format MatrixName,row,col,element. A sketch of the map and reduce logic follows the sample data below.

A,0,1,1.0
A,0,2,2.0
A,0,3,3.0
A,0,4,4.0
B,3,1,10.0
B,3,2,11.0
A,1,0,5.0
A,1,1,6.0
A,1,2,7.0
A,1,3,8.0
A,1,4,9.0
B,0,1,1.0
B,0,2,2.0
B,1,0,3.0
B,1,1,4.0
B,1,2,5.0
B,2,0,6.0
B,2,1,7.0
B,2,2,8.0
B,3,0,9.0
B,4,0,12.0
B,4,1,13.0
B,4,2,14.0
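
As a rough sketch of the one-step approach (an illustration only, not the exact code behind the GitHub link below; the class names and the "m"/"p" dimension properties are assumptions, passed in via the Configuration), the mapper tags each element of A and B with every output cell it contributes to, and the reducer pairs the elements on the shared index and sums the products:

//Illustrative one-step matrix multiplication of A (m x n) and B (n x p)
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixMultiply {

 //For "A,i,j,v" emit ((i,k), "A,j,v") for every column k of B;
 //for "B,j,k,v" emit ((i,k), "B,j,v") for every row i of A.
 public static class MatrixMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
   Configuration conf = context.getConfiguration();
   int m = conf.getInt("m", 0); //rows of A (assumed to be set in the driver)
   int p = conf.getInt("p", 0); //columns of B (assumed to be set in the driver)
   String[] rec = line.toString().split(","); //MatrixName,row,col,element
   if (rec[0].equals("A")) {
    for (int k = 0; k < p; k++) {
     context.write(new Text(rec[1] + "," + k), new Text("A," + rec[2] + "," + rec[3]));
    }
   } else {
    for (int i = 0; i < m; i++) {
     context.write(new Text(i + "," + rec[2]), new Text("B," + rec[1] + "," + rec[3]));
    }
   }
  }
 }

 //For each output cell (i,k), multiply A(i,j) with B(j,k) on the shared index j and sum.
 public static class MatrixReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
   Map<Integer, Double> a = new HashMap<Integer, Double>();
   Map<Integer, Double> b = new HashMap<Integer, Double>();
   for (Text val : values) {
    String[] rec = val.toString().split(",");
    if (rec[0].equals("A")) {
     a.put(Integer.parseInt(rec[1]), Double.parseDouble(rec[2]));
    } else {
     b.put(Integer.parseInt(rec[1]), Double.parseDouble(rec[2]));
    }
   }
   double sum = 0.0;
   for (Map.Entry<Integer, Double> e : a.entrySet()) {
    Double bValue = b.get(e.getKey());
    if (bValue != null) {
     sum += e.getValue() * bValue;
    }
   }
   context.write(key, new Text(Double.toString(sum)));
  }
 }
}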

Find code : GitHub




Monday, 8 September 2014

How To Set Counters In Hadoop MapReduce

Counters are a useful channel for gathering statistics about a job, whether for quality control or for application-level statistics. Let's see an example where a counter counts the number of keys processed in the reducer.


3 key steps to set it up

1. Define counter in Driver class

public class CounterDriver extends Configured implements Tool{
 long c = 0;
 //define the counter as an enum in the Driver class
 static enum UpdateCount{
  CNT
 }
 public static void main(String[] args) throws Exception{
     Configuration conf = new Configuration();
     int res = ToolRunner.run(conf, new CounterDriver(), args);
     System.exit(res);
  }
 public int run(String[] args) throws Exception {

2. Increment or set counter in Reducer

public class CntReducer extends Reducer<IntWritable, Text, IntWritable, Text>{
 public void reduce(IntWritable key,Iterable<Text> values,Context context)  {
      //do something
      //increment the counter defined in the Driver class once per key
      context.getCounter(CounterDriver.UpdateCount.CNT).increment(1);
 }
}

3. Get counter in Driver class 

public class CounterDriver extends Configured implements Tool{
 long c = 0;
 static enum UpdateCount{
  CNT
 }
 public static void main(String[] args) throws Exception{
     Configuration conf = new Configuration();
     int res = ToolRunner.run(conf, new CounterDriver(), args);
     System.exit(res);
  }
 public int run(String[] args) throws Exception {
 .
 .
 .
 job.setInputFormatClass(TextInputFormat.class);
 job.setOutputFormatClass(TextOutputFormat.class);
 FileInputFormat.setInputPaths(job,in );
 FileOutputFormat.setOutputPath(job, out);
 job.waitForCompletion(true);
 //read the counter value back in the Driver once the job has finished
 c = job.getCounters().findCounter(UpdateCount.CNT).getValue();
 return 0;
 }
}
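
Once the job has finished, the value can be used right there in run(), for example to log it (a trivial illustration):

 //e.g. inside run(), after the counter has been read into c
 System.out.println("Number of keys processed by reducer: " + c);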

Full code :  GitHub

You will also be able to see the counters in the console output when the job finishes.