How to pivot data: turning multiple rows in one column into multiple columns.
spark-shell --queue=<your-queue>
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
Spark context available as sc
SQL context available as sqlContext.
scala> val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
sqlcontext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4f9a8d71
scala> val BazarDF = Seq(
| ("Veg", "tomato", 1.99),
| ("Veg", "potato", 0.45),
| ("Fruit", "apple", 0.99),
| ("Fruit", "pineapple", 2.59),
| ("Fruit", "apple", 1.99)
| ).toDF("Type", "Item", "Price")
BazarDF: org.apache.spark.sql.DataFrame = [Type: string, Item: string, Price: double]
scala> BazarDF.show()
+-----+---------+-----+
| Type| Item|Price|
+-----+---------+-----+
| Veg| tomato| 1.99|
| Veg| potato| 0.45|
|Fruit| apple| 0.99|
|Fruit|pineapple| 2.59|
|Fruit| apple| 1.99|
+-----+---------+-----+
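Note that `toDF` on a local `Seq` works in spark-shell because the SQLContext implicits are imported for you. If you adapt this transcript into a standalone application, you need to bring them into scope yourself — a minimal sketch, assuming an existing `sc`:

```scala
import org.apache.spark.sql.SQLContext

val sqlcontext = new SQLContext(sc)
// Brings toDF (and the $"col" syntax) into scope for local collections.
import sqlcontext.implicits._

val BazarDF = Seq(
  ("Veg", "tomato", 1.99)
).toDF("Type", "Item", "Price")
```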
A pivot can be thought of as translating rows into columns while applying one or more aggregations.
Let's see how we can achieve this using the above DataFrame. We will pivot the data on the "Item" column.
scala> BazarDF.groupBy("Type").pivot("Item").agg(min("Price")).show()
+-----+-----+---------+------+------+
| Type|apple|pineapple|potato|tomato|
+-----+-----+---------+------+------+
| Veg| null| null| 0.45| 1.99|
|Fruit| 0.99| 2.59| null| null|
+-----+-----+---------+------+------+
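Two refinements worth knowing. First, `pivot` has an overload that takes the list of values explicitly; this skips the extra pass Spark otherwise makes over the data to discover the distinct values, and fixes the column order. Second, you can apply more than one aggregation, producing one output column per (value, aggregation) pair. A sketch, assuming the same `BazarDF` as above:

```scala
// In spark-shell these are already in scope; in an application, import them.
import org.apache.spark.sql.functions.{min, max}

// Explicit pivot values: no distinct-scan, deterministic column order.
BazarDF.groupBy("Type")
  .pivot("Item", Seq("apple", "pineapple", "potato", "tomato"))
  .agg(min("Price"))
  .show()

// Multiple aggregations: columns like apple_min(Price), apple_max(Price), ...
BazarDF.groupBy("Type")
  .pivot("Item")
  .agg(min("Price"), max("Price"))
  .show()
```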
You're done!