Sometimes you need to count how many times each value occurs in a field.
Let this be the sample input bag.
user_id course_name user_name
1 Social Anju
2 Maths Malu
1 English Anju
1 Maths Anju
Say we need to count the number of occurrences of each user_name, which should give:
Anju 3
Malu 1
To achieve this, the built-in COUNT function can be used.
COUNT Function in Apache Pig
The COUNT function computes the number of elements in a bag.
COUNT requires a preceding GROUP BY statement for group counts and a GROUP ALL statement for global counts.
The basic idea for the example above is to group by user_name and then count the tuples in each group's bag.
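-- count.pig
-- Load the input; with no USING clause, the default PigStorage loader expects tab-separated fields.
userAlias = LOAD '/home/sreeveni/myfiles/pig/count.txt' AS
(user_id:long, course_name:chararray, user_name:chararray);
-- Group the tuples by user_name; each group carries a bag named userAlias.
groupedByUser = GROUP userAlias BY user_name;
-- Count the tuples in each group's bag.
counted = FOREACH groupedByUser GENERATE group AS user_name, COUNT(userAlias) AS cnt;
result = FOREACH counted GENERATE user_name, cnt;
STORE result INTO '/home/sreeveni/myfiles/pig/OUT/count';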
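The global count case mentioned earlier can be sketched in the same way, using GROUP ALL instead of GROUP BY. This is only a minimal illustration: it assumes the same tab-separated input file and schema as the script above, and the alias names allRows and total and the field name total_cnt are made up for this example.

```pig
-- global_count.pig (hypothetical file name)
-- Assumes the same input file and schema as count.pig.
userAlias = LOAD '/home/sreeveni/myfiles/pig/count.txt' AS
    (user_id:long, course_name:chararray, user_name:chararray);
-- GROUP ALL puts every tuple into a single group, so there is one bag to count.
allRows = GROUP userAlias ALL;
-- COUNT the tuples in that single bag to get the total number of rows.
total = FOREACH allRows GENERATE COUNT(userAlias) AS total_cnt;
-- Print the result to the console instead of storing it.
DUMP total;
```

For the four-row sample input, this should print a single tuple containing 4. Note that COUNT skips tuples whose first field is null; if such rows must be counted too, COUNT_STAR can be used instead.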