Saturday, 17 May 2014

Count Frequency Of Values In A Column Using Apache Pig


Sometimes you need to count how often each value occurs in a field.
Let this be the sample input.


user_id   course_name user_name
1           Social      Anju
2           Maths       Malu
1           English     Anju
1           Maths       Anju

Say we need to count the number of occurrences of each user_name. The expected result is:
Anju 3
Malu 1

To achieve this, the built-in COUNT function can be used.


COUNT Function in Apache Pig


The COUNT function computes the number of elements in a bag.
It requires a preceding GROUP BY statement for per-group counts and a GROUP ALL statement for global counts.

The basic idea for the above example is to group by user_name and count the tuples in each group's bag.


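--count.pig

 -- Load the input; the default loader (PigStorage) expects tab-separated fields.
 userAlias = LOAD '/home/sreeveni/myfiles/pig/count.txt' AS
             (user_id:long, course_name:chararray, user_name:chararray);
 -- Collect all tuples that share a user_name into one bag per user.
 groupedByUser = GROUP userAlias BY user_name;
 -- Count the tuples in each bag.
 counted = FOREACH groupedByUser GENERATE group AS user_name, COUNT(userAlias) AS cnt;
 result = FOREACH counted GENERATE user_name, cnt;
 STORE result INTO '/home/sreeveni/myfiles/pig/OUT/count';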
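
For a global count, the same pattern applies with GROUP ALL in place of GROUP BY. The following is a minimal sketch, not part of the original script: it reuses the input path and schema from count.pig, while the alias names allUsers and total are made up for illustration. It assumes the file holds only the four data rows (no header line) in tab-separated form.

```pig
-- Sketch: count every tuple in the relation with GROUP ALL.
userAlias = LOAD '/home/sreeveni/myfiles/pig/count.txt' AS
            (user_id:long, course_name:chararray, user_name:chararray);
-- GROUP ALL puts every tuple into a single bag (hypothetical alias name).
allUsers = GROUP userAlias ALL;
-- COUNT then returns the size of that one bag.
total = FOREACH allUsers GENERATE COUNT(userAlias) AS total_rows;
DUMP total;
```

Under those assumptions, the four sample rows should give a single tuple, (4).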

The COUNT function ignores NULLs: a tuple in the bag is not counted if its first field is NULL.
COUNT_STAR can be used instead to count all tuples, including those with NULL values.
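
To see the difference side by side, both functions can be applied to the same group. This is a rough sketch that reuses the input from count.pig; the alias names byCourse and counts and the field names named_rows and all_rows are assumptions made for illustration. Projecting user_name before calling COUNT makes user_name the first (and only) field that COUNT checks for NULL.

```pig
-- Sketch: compare COUNT and COUNT_STAR within each course_name group.
userAlias = LOAD '/home/sreeveni/myfiles/pig/count.txt' AS
            (user_id:long, course_name:chararray, user_name:chararray);
byCourse = GROUP userAlias BY course_name;
counts = FOREACH byCourse GENERATE
             group AS course_name,
             -- skips tuples whose user_name is NULL
             COUNT(userAlias.user_name) AS named_rows,
             -- counts every tuple in the bag, NULLs included
             COUNT_STAR(userAlias) AS all_rows;
DUMP counts;
```

The two columns differ only for groups that contain a NULL user_name; on the sample input, which has no NULLs, they should match.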
