In statistics and probability theory, the median is the numerical value separating the higher half of a data sample, a population, or a probability distribution, from the lower half.
The median is the central point of a data set.
Consider the following data points: 1, 4, 5, 6, 7
The median is 5.
Let's see how to find the median in Hive.
Consider a "test" table.
------------------
|Name   Age      |
------------------
|A      23       |
|B      23       |
|C      20       |
------------------
hive> select * from test;
OK
A 23
B 23
C 20
Time taken: 4.219 seconds, Fetched: 3 row(s)
Let's say we want to find the median of the age column in the "test" table.
Sorted, the ages are 20, 23, 23, so the expected median is the middle value, 23.
The PERCENTILE(BIGINT col, 0.5) function computes the median in Hive, because the 50th percentile is the median.
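As a rough sketch of the call shape, here are the two forms of PERCENTILE. The table `t` and its BIGINT column `x` are hypothetical names used only for illustration; they are not part of this post's data.

```sql
-- Hypothetical table `t` with a BIGINT column `x` (illustrative names only).

-- Median: the 50th percentile of x.
select percentile(x, 0.5) from t;

-- Several percentiles in one pass: pass an array instead of a single value.
-- The result is an array with one entry per requested percentile.
select percentile(x, array(0.25, 0.5, 0.75)) from t;
```

The percentile argument must be between 0 and 1.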
Structure of the "test" table:
hive> desc test;
OK
firstname string
age int
Time taken: 0.32 seconds, Fetched: 2 row(s)
Here we can see that the column whose median we want is of type INT. Since PERCENTILE is declared with a BIGINT argument, we cast the column to BIGINT.
Let's try out the query:
select percentile(cast(age as BIGINT), 0.5) from test;
Here we cast the age column to BIGINT.
hive> select percentile(cast(age as BIGINT), 0.5) from test1;
Query ID = aibladmin_20141211140606_c61cb042-ed14-4048-8270-4cea1eece1c7
Total jobs = 1
Launching Job 1 out of 1
.
.
OK
23.0
Time taken: 27.659 seconds, Fetched: 1 row(s)
The result, 23.0, is the expected median of [23, 23, 20].
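The same pattern extends to two common variations, sketched below. The first query reuses the "test" table from this post. The second assumes a hypothetical table `employees` with a DOUBLE column `salary`, which is not part of this post. PERCENTILE is documented as working on integral columns only, so Hive's PERCENTILE_APPROX is the usual substitute for floating-point data and returns an approximate value.

```sql
-- Median age per name, using the "test" table from this post.
select firstname, percentile(cast(age as BIGINT), 0.5) as median_age
from test
group by firstname;

-- Approximate median of a floating-point column.
-- `employees` and `salary` are hypothetical names used for illustration.
select percentile_approx(salary, 0.5) as approx_median_salary
from employees;
```

PERCENTILE_APPROX also accepts an optional third argument that trades memory for accuracy; the default is usually adequate for small tables.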