Let's see how to select multiple columns from a Spark DataFrame.
Create Example DataFrame
spark-shell --queue= *;
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
Spark context available as sc
SQL context available as sqlContext.
scala> val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
sqlcontext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4f9a8d71
scala> val BazarDF = Seq(
| ("Veg", "tomato", 1.99),
| ("Veg", "potato", 0.45),
| ("Fruit", "apple", 0.99),
| ("Fruit", "pineapple", 2.59),
| ("Fruit", "apple", 1.99)
| ).toDF("Type", "Item", "Price")
BazarDF: org.apache.spark.sql.DataFrame = [Type: string, Item: string, Price: double]
scala> BazarDF.show()
+-----+---------+-----+
| Type|     Item|Price|
+-----+---------+-----+
|  Veg|   tomato| 1.99|
|  Veg|   potato| 0.45|
|Fruit|    apple| 0.99|
|Fruit|pineapple| 2.59|
|Fruit|    apple| 1.99|
+-----+---------+-----+
Now our example dataframe is ready.
Create a List[String] with column names.
scala> var selectExpr : List[String] = List("Type","Item","Price")
selectExpr: List[String] = List(Type, Item, Price)
Now our list of column names is also created.
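Let's select these columns from our DataFrame.
select takes a first column name followed by a variable number of further names, so pass selectExpr.head as the first argument and expand selectExpr.tail with : _* to cover every column named in the List.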
scala> var dfNew = BazarDF.select(selectExpr.head,selectExpr.tail: _*)
dfNew: org.apache.spark.sql.DataFrame = [Type: string, Item: string, Price: double]
scala> dfNew.show()
+-----+---------+-----+
| Type|     Item|Price|
+-----+---------+-----+
|  Veg|   tomato| 1.99|
|  Veg|   potato| 0.45|
|Fruit|    apple| 0.99|
|Fruit|pineapple| 2.59|
|Fruit|    apple| 1.99|
+-----+---------+-----+
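To see why .head and .tail are needed, here is a rough sketch in plain Scala that runs without Spark. The names VarargsSketch, selectNames and selectFromList are made up for illustration; selectNames only imitates the shape of select(col: String, cols: String*) and is not part of Spark. The sketch also shows one possible way to guard against an empty list, where .head would throw.

```scala
object VarargsSketch {
  // Hypothetical stand-in with the same parameter shape as
  // DataFrame.select(col: String, cols: String*):
  // one required name, then zero or more further names.
  def selectNames(col: String, cols: String*): List[String] =
    col :: cols.toList

  // Splits the list into its first element and the rest, then expands
  // the rest into individual varargs arguments with `: _*`.
  // Returns None for an empty list instead of failing on .head.
  def selectFromList(names: List[String]): Option[List[String]] =
    names match {
      case first :: rest => Some(selectNames(first, rest: _*))
      case Nil           => None
    }

  def main(args: Array[String]): Unit = {
    val selectExpr = List("Type", "Item", "Price")
    println(selectFromList(selectExpr))
    println(selectFromList(Nil))
  }
}
```

With a real DataFrame the same pattern match could wrap the select call from the transcript, so that an empty column list is handled explicitly rather than raising an exception.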
In the next post I will also explain how to select multiple columns from a
Spark DataFrame using List[Column].