Tuesday, 2 December 2014

Hive Bucketed Tables

In previous post we had seen how  to create partition tables in Hive.

Lets see how to create buckets in Hive table

The main difference between Hive partitioning and Bucketing is ,when we do partitioning, we create a partition for each unique value of the column. But there may be situation where we need to create lot of tiny partitions. But if you use bucketing, you can limit it to a number which you choose and decompose your data into those buckets. In hive a partition is a directory but a bucket is a file.

In hive, bucketing does not work by default. You will have to set following variable to enable bucketing. set hive.enforce.bucketing=true;

1. Creating a staging table to store your data

create external table stagingtbl (EmployeeID Int,FirstName String,Designation String,Salary Int,Department String) row format delimited fields terminated by "," location '/user/aibladmin/Hive'; 

2. Create bucketed table

create table emp_bucket (EmployeeID Int,FirstName String,Designation String,Salary Int,Department String) clustered by (department) into 3 buckets row format delimited fields terminated by ",";

3. Load data from stagingtbl to bucketed table

from stagingtbl insert into table emp_bucket 
       select employeeid,firstname,designation,salary,department;

4. Check how many data file have created in Hive metastore.

Lets check the table content in Hive warehouse

We can find 3 files in warehouse directory for department A,B and C.Each bucket contains unique values.


  1. Hi Sreeveni,
    Did you use bucket map join. can you explain usecase for bucket map join.explain with simple example


  2. For latest and updated Cloudera certification dumps in PDF format contact us at completeexamcollection@gmail.com.
    Refer our blog for more details http://completeexamcollection.blogspot.in/2015/04/cloudera-hadoop-certification-dumps.html

  3. When i tried this i don't see all the 3buckets created only one is created with all data can you please explain me whether do i need to set anything other than what was mentioned here.

  4. Hi sreeveni, Nice explanation,
    I need a small info when would we use exactly this bucketing concepts? real time scenarios can you explain pls?!

  5. This comment has been removed by the author.

  6. Thank you. Very helpful explanation for hive bucketing. you can also see the full details about hive partition and bucketing as well as the hadoop ecosystems in-depth with clear examples in the below link http://www.geoinsyssoft.com/hive-partition-bucketing/

  7. This comment has been removed by the author.

  8. This comment has been removed by the author.

  9. very nice explanation..thanks for sharing..and visit our site for more on hadoop..

  10. its a very good explanation for hive bucketing..easy to learn..


  11. Good Explanation..One question
    How the data distributed among buckets and
    How hive decides which department will move to which bucket ?

  12. Excellent blog on Hive Bucketed Tables. Thank you sharing you knowledge with us.
    Devops Training in Bangalore

  13. Is there a way to re-create the same bucketed table again after droping it.
    like in partitioning we use msck repair to get our partitioned data.