Saturday, 17 May 2014

Count Frequency Of Values In A Column Using Apache Pig

There may be situations to count the occurence of a value in a field.
Let this be the sample input bag.

user_id   course_name user_name
1           Social      Anju
2           Maths       Malu
1           English     Anju
1           Maths       Anju

Say we need to calculate no of occurence of each user_name.
Anju 3
Malu 1

Inorder to achieve this - COUNT Built In Function can be used.

COUNT Function in Apache Pig

COUNT function  compute the number of elements in a bag.
To group count a preceding GROUP BY statement and for global counts GROUP ALL statement is required.

The basic idea to do the above example is to group by user_name and count the tuples in the bag.


 userAlias = LOAD '/home/sreeveni/myfiles/pig/count.txt' as 
 groupedByUser = group userAlias by user_name;
 counted = FOREACH groupedByUser GENERATE group as user_name,COUNT(userAlias) as cnt;
 result = FOREACH counted GENERATE user_name, cnt;
 store result into '/home/sreeveni/myfiles/pig/OUT/count';

The COUNT function ignores NULLs, that is tuple in the bag will not be counted if the first field in this tuple is NULL.
COUNT_STAR can be used to count fields including NULL values.


  1. For latest and updated Cloudera certification dumps in PDF format contact us at
    Refer our blog for more details


  2. Worthful Hadoop tutorial. Appreciate a lot for taking up the pain to write such a quality content on Hadoop tutorial. Just now I watched this similar Hadoop tutorial and I think this will enhance the knowledge of other visitors for sure. Thanks anyway.:

  3. Top Trending Technologies of 2019. Watch here:

  4. Very Impressive Big Data Hadoop tutorial. The content seems to be pretty exhaustive and excellent and will definitely help in learning Big Data Hadoop course. I'm also a learner taken up Big Data Hadoop Tutorial and I think your content has cleared some concepts of mine. While browsing for Hadoop tutorials on YouTube i found this fantastic video on Big Data Hadoop Tutorial.Do check it out if you are interested to know more.