Tuesday, 8 April 2014

Custom Parameters To Pig Script


There may be scenarios where a Pig script needs to accept arguments at run time instead of having its values hard-coded.

Below is a sample of such a custom Pig script.

Sample Pig Script

The "customparam.pig" script loads an input file whose path is passed as a parameter, generates a single field from the input bag into a new bag, and stores the new bag to HDFS or the local file system.


Here the input path, the delimiter of the input file, the output path and the field to separate out are all passed to the Pig script as custom parameters.
-- customparam.pig
-- load data from HDFS or the local fs, splitting each line on the given delimiter
original = load '$input' using PigStorage('$delimiter');
-- project the requested field (for example $1) into another bag
filtered = foreach original generate $split;
-- store the projected data into HDFS or the local fs
store filtered into '$output';
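
Pig replaces the $name placeholders textually before it parses the script. The sketch below imitates that step with sed, using the values from this post's sample command line, to show what the script looks like once the parameters are filled in. It is only an approximation: sed is not Pig's own preprocessor, and the temporary directory is an assumption made for the demo.

```shell
# Work in a throwaway directory (an assumption for this demo).
workdir=$(mktemp -d)

# Recreate the script; the quoted 'EOF' stops the shell from expanding $input etc.
cat > "$workdir/customparam.pig" <<'EOF'
original = load '$input' using PigStorage('$delimiter');
filtered = foreach original generate $split;
store filtered into '$output';
EOF

# Replace each placeholder with the value used on the sample command line.
substituted=$(sed -e 's|\$input|Pig.csv|g' \
                  -e 's|\$delimiter|,|g' \
                  -e 's|\$split|$1|g' \
                  -e 's|\$output|OUT/pig|g' "$workdir/customparam.pig")

printf '%s\n' "$substituted"
rm -rf "$workdir"
```

Pig's own -dryrun command-line option is intended for this kind of check: it performs the substitution without executing the script.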

Pig scripts can be run in local mode or in MapReduce mode.


Local Mode

pig -x local -f customparam.pig -param input=Pig.csv -param output=OUT/pig -param delimiter="," -param split='$1'
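
Note the single quotes around $1 in the command above. Inside double quotes, or with no quotes at all, the shell would expand $1 to its own first positional argument before Pig ever sees it. A minimal sketch of the difference, where the value shell_arg is made up for the demo:

```shell
# Give the shell a positional parameter of its own (hypothetical value).
set -- shell_arg

# Single quotes: the text reaches the program untouched.
single='$1'
# Double quotes: the shell expands $1 first.
double="$1"

printf 'single-quoted: split=%s\n' "$single"
printf 'double-quoted: split=%s\n' "$double"
```

Only the single-quoted form passes the literal text $1 through to Pig.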

This is the sample "Pig.csv" file, which is passed as the input parameter on the command line. The delimiter parameter is ",".

Pig1,23.5,Matched
Pig2,6.88,Not Matched
Pig3,6.1,Not Matched

The command separates the 2nd column of the original bag into a new bag. Fields in Pig are numbered by position starting from zero ($0, $1, $2, ...), so to generate the 2nd column the split parameter should be "$1".


After executing the above command, the part file in the output directory (OUT/pig) will contain:

23.5
6.88
6.1
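
As a quick cross-check that does not need Pig, the same column can be pulled out of the sample file with cut. This is only a sanity check on the expected output, not part of the Pig workflow, and the temporary directory is an assumption for the demo.

```shell
# Recreate the sample input in a throwaway directory.
workdir=$(mktemp -d)
cat > "$workdir/Pig.csv" <<'EOF'
Pig1,23.5,Matched
Pig2,6.88,Not Matched
Pig3,6.1,Not Matched
EOF

# cut numbers fields from 1, so -f2 corresponds to Pig's $1.
second_column=$(cut -d',' -f2 "$workdir/Pig.csv")
printf '%s\n' "$second_column"
rm -rf "$workdir"
```

The off-by-one between the two tools is worth keeping in mind: the column that cut calls field 2 is $1 in Pig.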

If the command is run in MapReduce mode, the part file is stored in HDFS.
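
The only change needed on the command line is the value given to -x. The sketch below uses a hypothetical helper, pig_command, that just prints the command for a given mode so the two can be compared. It does not run Pig, and it assumes the same parameter values as the local example.

```shell
# Hypothetical helper: prints the pig command line for the given execution mode.
pig_command() {
    mode=$1
    printf "pig -x %s -f customparam.pig -param input=Pig.csv -param output=OUT/pig -param delimiter=\",\" -param split='\$1'\n" "$mode"
}

mapreduce_cmd=$(pig_command mapreduce)
printf '%s\n' "$mapreduce_cmd"
```

In MapReduce mode, relative paths such as Pig.csv and OUT/pig are resolved against the user's HDFS home directory, so the input file has to be copied to HDFS before the script is run.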


2 comments:

  1. The above example is not working on my cluster. I'm using the pig-0.8.1-cdh3u5.tar.gz version of Pig. Does the example depend on a particular Pig version? Please advise.

    1. Will let you know, Anil. I will check with the same version of Pig.