Let us create an example DataFrame to explain how to select a list of columns of type Column from a DataFrame.
spark-shell --queue=*
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/
Spark context available as sc.
SQL context available as sqlContext.
scala> val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
sqlcontext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4f9a8d71
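Note that toDF works in this transcript because spark-shell pre-imports the SQLContext implicits. If you paste this into a standalone application instead, you would need to bring them into scope yourself, roughly like this (a minimal sketch, assuming the sqlcontext val defined above):

// Not needed in spark-shell, which imports the implicits automatically
import sqlcontext.implicits._   // enables Seq(...).toDF(...)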
scala> val BazarDF = Seq(
| ("Veg", "tomato", 1.99),
| ("Veg", "potato", 0.45),
| ("Fruit", "apple", 0.99),
| ("Fruit", "pineapple", 2.59),
| ("Fruit", "apple", 1.99)
| ).toDF("Type", "Item", "Price")
BazarDF: org.apache.spark.sql.DataFrame = [Type: string, Item: string, Price: double]
scala> BazarDF.show()
+-----+---------+-----+
| Type| Item|Price|
+-----+---------+-----+
| Veg| tomato| 1.99|
| Veg| potato| 0.45|
|Fruit| apple| 0.99|
|Fruit|pineapple| 2.59|
|Fruit| apple| 1.99|
+-----+---------+-----+
Create a List[Column] with column names.
scala> var selectExpr : List[Column] = List("Type","Item","Price")
<console>:25: error: not found: type Column
var selectExpr : List[Column] = List("Type","Item","Price")
^
If you are getting the same error, please take a look at this page.
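For reference, here is a minimal sketch of how the List[Column] can be built once the Column type is in scope, using the standard org.apache.spark.sql.functions.col helper and the column names from the example above:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col

// Build a List[Column] from the column names
val selectExpr: List[Column] = List(col("Type"), col("Item"), col("Price"))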
Using the : _* annotation, select the columns from the DataFrame. The : _* ascription expands the List[Column] into the varargs that select expects.
scala> var dfNew = BazarDF.select(selectExpr: _*)
dfNew: org.apache.spark.sql.DataFrame = [Type: string, Item: string, Price: double]
scala> dfNew.show()
+-----+---------+-----+
| Type| Item|Price|
+-----+---------+-----+
| Veg| tomato| 1.99|
| Veg| potato| 0.45|
|Fruit| apple| 0.99|
|Fruit|pineapple| 2.59|
|Fruit| apple| 1.99|
+-----+---------+-----+
You are done!
I have also explained how to select multiple columns from a Spark DataFrame using List[String]; a short sketch of that approach follows below.
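For comparison, a minimal sketch of the List[String] approach, assuming the same BazarDF as above:

// select(col: String, cols: String*) expects at least one name,
// so pass the head and expand the tail as varargs
val columnNames: List[String] = List("Type", "Item", "Price")
val dfByName = BazarDF.select(columnNames.head, columnNames.tail: _*)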