Tuesday 2 December 2014

Hive Bucketed Tables


In the previous post we saw how to create partitioned tables in Hive.

Let's see how to create a bucketed table in Hive.


The main difference between partitioning and bucketing in Hive is that partitioning creates a separate partition for each unique value of the partition column, so a column with many distinct values can produce a lot of tiny partitions. With bucketing, you choose a fixed number of buckets, and Hive decomposes the data into those buckets by hashing the bucketing column. In Hive, a partition is a directory, whereas a bucket is a file.
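To make the difference concrete, here is a small Python sketch (not Hive code) that counts how many directories partitioning by an integer column would create, versus how many buckets bucketing would fill. It assumes Hive's original (version 1) bucketing hash, under which a non-negative INT value hashes to itself, so the bucket number is simply `value % num_buckets`. The function names and employee IDs are made up for illustration.

```python
def partition_dirs(values):
    """Partitioning: one directory per distinct value of the partition column."""
    return sorted(set(values))


def used_buckets(values, num_buckets):
    """Bucketing: bucket numbers that receive at least one row.

    Assumes Hive's original (version 1) hash, under which an INT hashes to
    its own value, so bucket = (value & 0x7FFFFFFF) % num_buckets.
    """
    return sorted({(v & 0x7FFFFFFF) % num_buckets for v in values})


# Hypothetical employee IDs, for illustration only.
employee_ids = list(range(1001, 1101))  # 100 distinct IDs

print(len(partition_dirs(employee_ids)))   # 100 partitions, one per ID
print(len(used_buckets(employee_ids, 3)))  # 3 buckets, never more than 3
```

Adding more employees would keep adding partitions, while the bucketed layout stays at the number of buckets declared in the table definition.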



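In Hive 0.x and 1.x, bucketing is not enforced by default when you insert data. You have to set the following property to enable it (from Hive 2.0 onward the property was removed and bucketing is always enforced):

set hive.enforce.bucketing=true;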


1. Create a staging table to store your data

create external table stagingtbl (
  EmployeeID Int,
  FirstName String,
  Designation String,
  Salary Int,
  Department String
)
row format delimited fields terminated by ","
location '/user/aibladmin/Hive';
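`row format delimited fields terminated by ","` tells Hive to split each line of the files under `/user/aibladmin/Hive` on commas and map the pieces, in order, to the table's columns. The Python sketch below approximates that for the five columns of `stagingtbl`. It assumes the default text SerDe behaves as follows: a field that cannot be parsed as INT becomes NULL (`None` here), missing trailing fields become NULL, `\N` is read as NULL, and extra fields are ignored. Edge cases such as INT overflow are not modelled, and the sample lines are made up.

```python
import re

# Column names and types from the stagingtbl DDL.
COLUMNS = [
    ("EmployeeID", int),
    ("FirstName", str),
    ("Designation", str),
    ("Salary", int),
    ("Department", str),
]

INT_PATTERN = re.compile(r"[+-]?\d+")  # assumption: plain decimal integers only
NULL_MARKER = "\\N"  # default text representation of NULL


def parse_row(line, delimiter=","):
    """Approximate how a delimited text line maps onto stagingtbl's columns."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for i, (name, col_type) in enumerate(COLUMNS):
        raw = fields[i] if i < len(fields) else None  # missing field -> NULL
        if raw is None or raw == NULL_MARKER:
            row[name] = None
        elif col_type is int:
            # An unparsable INT becomes NULL instead of failing the query.
            row[name] = int(raw) if INT_PATTERN.fullmatch(raw) else None
        else:
            row[name] = raw
    return row  # any fields beyond the fifth are ignored


print(parse_row("1,Anne,Manager,50000,A"))
# {'EmployeeID': 1, 'FirstName': 'Anne', 'Designation': 'Manager', 'Salary': 50000, 'Department': 'A'}
print(parse_row("2,Bala,Clerk,n/a"))
# {'EmployeeID': 2, 'FirstName': 'Bala', 'Designation': 'Clerk', 'Salary': None, 'Department': None}
```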

2. Create the bucketed table

create table emp_bucket (
  EmployeeID Int,
  FirstName String,
  Designation String,
  Salary Int,
  Department String
)
clustered by (department) into 3 buckets
row format delimited fields terminated by ",";
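`clustered by (department) into 3 buckets` means each row goes to bucket number hash(department) mod 3. The Python sketch below assumes the original (version 1) bucketing hash of Hive 1.x for STRING columns: a Java-style `r = r * 31 + byte` over the UTF-8 bytes, with bytes treated as signed and 32-bit overflow, followed by `(hash & Integer.MAX_VALUE) % num_buckets`. Newer Hive releases (bucketing version 2 in Hive 3) use Murmur3 instead, so the exact bucket numbers differ there. The function names are our own.

```python
def hive_v1_string_hash(value: str) -> int:
    """Java-style 31x hash over UTF-8 bytes, emulating 32-bit int overflow.

    Assumption: mirrors Hive's original (version 1) hash for STRING values.
    The result is returned as an unsigned 32-bit pattern.
    """
    r = 0
    for b in value.encode("utf-8"):
        signed_b = b - 256 if b > 127 else b  # Java bytes are signed
        r = (r * 31 + signed_b) & 0xFFFFFFFF  # wrap like a Java int
    return r


def bucket_for(value: str, num_buckets: int) -> int:
    """Bucket number = (hash & Integer.MAX_VALUE) % num_buckets."""
    return (hive_v1_string_hash(value) & 0x7FFFFFFF) % num_buckets


for dept in ["A", "B", "C", "D"]:
    print(dept, bucket_for(dept, 3))
# A 2
# B 0
# C 1
# D 2   <- "D" would share bucket 2 with "A"
```

Every row with the same department always lands in the same bucket, but different departments can share a bucket, as "A" and "D" do here.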

3. Load data from stagingtbl into the bucketed table

from stagingtbl insert into table emp_bucket 
       select employeeid,firstname,designation,salary,department;
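Conceptually, with `hive.enforce.bucketing=true` the insert above runs one reduce task per bucket, and each task writes one file under the table's directory. The Python sketch below imitates that by grouping staging rows by bucket number and naming each group after the file Hive typically writes (`000000_0`, `000001_0`, ...). The file-name pattern, the helper names and the sample rows are assumptions for illustration. For a one-letter ASCII department such as A, B or C, the bucket is just the character code modulo the bucket count, because Hive's original (version 1) string hash of a single ASCII byte is that byte's value.

```python
from collections import defaultdict


def single_letter_bucket(dept, num_buckets):
    """Bucket for a one-character ASCII value under Hive's version-1 hash.

    For a single ASCII byte that hash is just the character code, so the
    bucket is ord(dept) % num_buckets. Not valid for longer strings.
    """
    if len(dept) != 1 or not dept.isascii():
        raise ValueError("shortcut only holds for a single ASCII character")
    return ord(dept) % num_buckets


def write_buckets(rows, num_buckets):
    """Group rows into per-bucket 'files'; names mimic 000000_0, 000001_0, ..."""
    files = defaultdict(list)
    for row in rows:
        bucket = single_letter_bucket(row["Department"], num_buckets)
        files[f"{bucket:06d}_0"].append(row)
    return dict(sorted(files.items()))


# Made-up staging rows (only the columns needed here).
staging = [
    {"EmployeeID": 1, "FirstName": "Anne", "Department": "A"},
    {"EmployeeID": 2, "FirstName": "Bala", "Department": "B"},
    {"EmployeeID": 3, "FirstName": "Chen", "Department": "C"},
    {"EmployeeID": 4, "FirstName": "Devi", "Department": "A"},
]

for name, rows in write_buckets(staging, 3).items():
    print(name, [r["EmployeeID"] for r in rows])
# 000000_0 [2]
# 000001_0 [3]
# 000002_0 [1, 4]
```

Unlike this sketch, Hive still writes a file for a bucket that receives no rows, so the number of files matches the number of buckets.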


4. Check how many data files have been created in the Hive warehouse.


Let's check the table's contents in the Hive warehouse:




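We can find 3 files in the warehouse directory, one for each of departments A, B and C. Here each bucket holds exactly one department's rows because the three values happen to hash to different buckets. In general, all rows with the same department value always go to the same bucket, but a single bucket can hold several department values when their hashes collide modulo the number of buckets.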

31 comments:

  1. Very nice explanation.

    Thanks

    Replies
    1. Article is nice.
      hive bucketing

      It is a way of dividing a table into related parts based on the values of partition columns, for example date, city and department.
      It is helpful when the table has one or more partition keys.
      A Hive partition is a sub-directory in the table directory.
      It is a basic unit of data storage in Apache Hive.

  2. Hi Sreeveni,
    Did you use a bucket map join? Can you explain a use case for bucket map joins, with a simple example?


    Thanks
    Hareesh

  3. When I tried this, I didn't see all 3 buckets created; only one was created, containing all the data. Can you please tell me whether I need to set anything other than what was mentioned here?

  4. Hi Sreeveni, nice explanation.
    I need a little information: when exactly would we use this bucketing concept? Can you explain with real-time scenarios, please?
    Thanks
    Venu
    http://www.apachespark.in

    Replies
    1. hive bucketing
      Advantages
      It reduces query latency by scanning only the relevant partitioned data instead of the whole data set, i.e. there is no need to search the entire table for a single record.
      It distributes the execution load horizontally.
      Hive partitioning reduces query processing time on the partitioned data set.
      It is suitable for low volumes of data. For example, calculating the population of “Bangalore city” is much faster than searching the entire population of the world.

  6. Thank you. Very helpful explanation of Hive bucketing. You can also see full details about Hive partitioning and bucketing, as well as the Hadoop ecosystem, in depth with clear examples at the link below: http://www.geoinsyssoft.com/hive-partition-bucketing/

  12. Is there a way to re-create the same bucketed table after dropping it,
    like we use msck repair to recover our partitioned data?
