In statistics and probability theory, the median is the numerical value separating the higher half of a data sample, a population, or a probability distribution, from the lower half.

The median is the central point of a data set.

Consider the following data points: 1,4,5,6,7

The Median is "5".

Lets see how we will find median in Hive.

Consider a "test" table.

-------------------|Name Age|

------------------

|A 23 |

|B 23 |

|C 20 |

------------------

```
hive> select * from test;
OK
A 23
B 23
C 20
Time taken: 4.219 seconds, Fetched: 3 row(s)
```

Lets say we are going to find the median for Age column in "test" table.

Our expected median is "23".

**PERCENTILE(BIGINT col,0.5)**function helps to compute median in hive.The 50th percentile would be the median.

Structure of "test" table

```
hive> desc test;
OK
firstname string
age int
Time taken: 0.32 seconds, Fetched: 2 row(s)
```

Here we can see the column we are going to find median is in

**INT**. We need to convert the column into**BIGINT**.
Lets try out the query

**select percentile(cast(age as BIGINT), 0.5) from test;**
Here we casted age column into BIGINT.

hive> select percentile(cast(age as BIGINT), 0.5) from test1; Query ID = aibladmin_20141211140606_c61cb042-ed14-4048-8270-4cea1eece1c7 Total jobs = 1 Launching Job 1 out of 1 . . OK23.0Time taken: 27.659 seconds, Fetched: 1 row(s)

**23.0**is the expected result which is the median for

**[23,23,20]**.