Prerequisites
1. Hadoop (this example is based on the Cloudera distribution, CDH5)
2. gcc compiler
Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.
We need two programs, mapper.c and reducer.c. You can find the code on GitHub.
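The two programs follow the usual streaming word-count contract: the mapper reads lines from stdin and emits one "word<TAB>1" record per token, and the reducer receives the mapper output already sorted by key, so it only has to sum consecutive counts. Below is a minimal sketch of what they might look like; it is illustrative only, and the actual code is in the GitHub repository.

/* mapper.c (sketch): read lines from stdin, emit "word<TAB>1" for each token. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[4096];
    while (fgets(line, sizeof(line), stdin) != NULL) {
        char *token = strtok(line, " \t\n");
        while (token != NULL) {
            printf("%s\t1\n", token);
            token = strtok(NULL, " \t\n");
        }
    }
    return 0;
}

/* reducer.c (sketch): the framework sorts the mapper output by key, so
 * identical words arrive on consecutive lines; sum the counts and print
 * a total whenever the word changes. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char line[4096], current[4096] = "";
    long sum = 0;
    while (fgets(line, sizeof(line), stdin) != NULL) {
        char *word  = strtok(line, "\t\n");
        char *count = strtok(NULL, "\t\n");
        if (word == NULL || count == NULL)
            continue;
        if (strcmp(word, current) != 0) {
            if (current[0] != '\0')
                printf("%s\t%ld\n", current, sum);
            strncpy(current, word, sizeof(current) - 1);
            current[sizeof(current) - 1] = '\0';
            sum = 0;
        }
        sum += atol(count);
    }
    if (current[0] != '\0')
        printf("%s\t%ld\n", current, sum);
    return 0;
}

Because streaming programs communicate purely over stdin/stdout, you can test the compiled binaries locally (with any text file) before submitting anything to the cluster: cat sample.txt | ./mapper.out | sort | ./reducer.out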
1. Compile mapper.c and reducer.c
hadoop@namenode2:~/hadoopstreaming$ gcc -o mapper.out mapper.c
hadoop@namenode2:~/hadoopstreaming$ gcc -o reducer.out reducer.c
hadoop@namenode2:~/hadoopstreaming$ ls
mapper.c  mapper.out  reducer.c  reducer.out
Here you can see the two executables, mapper.out and reducer.out.
2. Place your word count input files in HDFS
hadoop@namenode2:~$ hadoop fs -put /home/hadoop/wc /
hadoop@namenode2:~$ hadoop fs -ls /
drwxr-xr-x   - hadoop hadoop          0 2015-05-04 15:50 /wc
3. Now we will run the C programs as a MapReduce job with the help of the Hadoop Streaming jar. The -files option ships the compiled executables to the cluster nodes so the map and reduce tasks can run them.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.1.jar \
  -files hadoopstreaming/mapper.out -mapper hadoopstreaming/mapper.out \
  -files hadoopstreaming/reducer.out -reducer hadoopstreaming/reducer.out \
  -input /wc -output /wordcount-out
For Apache Hadoop, use the streaming jar that ships with the distribution (the path below is typical for older Apache releases and varies by version):
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
  -files hadoopstreaming/mapper.out -mapper hadoopstreaming/mapper.out \
  -files hadoopstreaming/reducer.out -reducer hadoopstreaming/reducer.out \
  -input /wc -output /wordcount-out
Here is the job run and its console output:
hadoop@namenode2:~$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.1.jar -files hadoopstreaming/mapper.out -mapper hadoopstreaming/mapper.out -files hadoopstreaming/reducer.out -reducer hadoopstreaming/reducer.out -input /wc -output /wordcount-out
packageJobJar: [hadoopstreaming/mapper.out, hadoopstreaming/reducer.out] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.1.jar] /tmp/streamjob7616955264406618684.jar tmpDir=null
15/05/04 15:50:28 INFO mapred.FileInputFormat: Total input paths to process : 2
15/05/04 15:50:28 INFO mapreduce.JobSubmitter: number of splits:3
15/05/04 15:50:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1426753134244_0119
15/05/04 15:50:29 INFO impl.YarnClientImpl: Submitted application application_1426753134244_0119
15/05/04 15:50:29 INFO mapreduce.Job: Running job: job_1426753134244_0119
15/05/04 15:50:37 INFO mapreduce.Job: map 0% reduce 0%
15/05/04 15:50:46 INFO mapreduce.Job: map 67% reduce 0%
15/05/04 15:50:47 INFO mapreduce.Job: map 100% reduce 0%
15/05/04 15:50:53 INFO mapreduce.Job: map 100% reduce 100%
15/05/04 15:50:55 INFO mapreduce.Job: Job job_1426753134244_0119 completed successfully
4. Let's see the results
hadoop@namenode2:~$ hadoop fs -ls /wordcount-out
Found 2 items
-rw-r--r--   3 hadoop hadoop          0 2015-05-04 15:50 /wordcount-out/_SUCCESS
-rw-r--r--   3 hadoop hadoop      11685 2015-05-04 15:50 /wordcount-out/part-00000
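To look at the word counts themselves, you can print the part file shown in the listing above; each line of the reducer output is a word, a tab, and its count:

hadoop@namenode2:~$ hadoop fs -cat /wordcount-out/part-00000 | head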