In order to explain, Lets create a
dataframe with 3 columns
spark-shell --queue= *;
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
Spark context available as sc
SQL context available as sqlContext.
scala> val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
sqlcontext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4f9a8d71
scala> import org.apache.spark.sql.Column
scala> val BazarDF = Seq(
("Veg", "tomato", 1.99),
("Veg", "potato", 0.45),
("Fruit", "apple", 0.99),
("Fruit", "pineapple", 2.59)
).toDF("Type", "Item", "Price")
BazarDF: org.apache.spark.sql.DataFrame = [Type: string, Item: string, Price: double]
Lets see how to add 3 new columns into
this dataframe with dummy values.
One way of doing this is
scala> var BazarWithColumnDF = BazarDF.withColumn("Retailer",lit("null").as("StringType"))
.withColumn("Quantity",lit(0.0).as("DoubleType"))
BazarWithColumnDF: org.apache.spark.sql.DataFrame =
[Type: string, Item: string, Price: double, Retailer: string, Quantity: double]
We can use the same method using
foldLeft aswell
Now we need to define a list with new columnName to do this.
scala> var ColNameWithDatatype: List[(String, Column)] = List()
ColNameWithDatatype: List[(String, org.apache.spark.sql.Column)] = List()
scala> ColNameWithDatatype = List(("Retailer", lit("null").as("StringType")),
("Quantity", lit(0.0).as("DoubleType")))
ColNameWithDatatype: List[(String, org.apache.spark.sql.Column)] =
List((Retailer,null AS StringType#91),
(Quantity,0.0 AS DoubleType#92))
scala> var BazarWithColumnDF1 = ColNameWithDatatype.foldLeft(BazarDF)
{ (tempDF, colName) =>
| tempDF.withColumn(colName._1, colName._2)
| }
BazarWithColumnDF1: org.apache.spark.sql.DataFrame =
[Type: string, Item: string, Price: double,
Retailer: string, Quantity: double]
scala> BazarWithColumnDF1.show()
+-----+---------+-----+--------+--------+
| Type| Item|Price|Retailer|Quantity|
+-----+---------+-----+--------+--------+
| Veg| tomato| 1.99| null| 0.0|
| Veg| potato| 0.45| null| 0.0|
|Fruit| apple| 0.99| null| 0.0|
|Fruit|pineapple| 2.59| null| 0.0|
+-----+---------+-----+--------+--------+