Tuesday 12 February 2019

How to select multiple columns from a spark data frame using List[String]


Let's see how to select multiple columns from a Spark DataFrame.
Create Example DataFrame
spark-shell --queue= *;

To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
Spark context available as sc 
SQL context available as sqlContext.

scala>  val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
sqlcontext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4f9a8d71  

scala> val BazarDF = Seq(
     | ("Veg", "tomato", 1.99),
     | ("Veg", "potato", 0.45),
     | ("Fruit", "apple", 0.99),
     | ("Fruit", "pineapple", 2.59),
     | ("Fruit", "apple", 1.99)
     | ).toDF("Type", "Item", "Price")
BazarDF: org.apache.spark.sql.DataFrame = [Type: string, Item: string, Price: double]

scala> BazarDF.show()
+-----+---------+-----+
| Type|     Item|Price|
+-----+---------+-----+
|  Veg|   tomato| 1.99|
|  Veg|   potato| 0.45|
|Fruit|    apple| 0.99|
|Fruit|pineapple| 2.59|
|Fruit|    apple| 1.99|
+-----+---------+-----+

Now our example dataframe is ready.
Create a List[String] with column names.
scala> var selectExpr : List[String] = List("Type","Item","Price")
selectExpr: List[String] = List(Type, Item, Price)

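Now our list of column names is also created.
Let's select these columns from our DataFrame.
DataFrame.select takes one column name followed by a varargs of further names, so pass selectExpr.head as the first argument and expand selectExpr.tail with : _* to select every column named in the list.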

scala> var dfNew = BazarDF.select(selectExpr.head,selectExpr.tail: _*)
dfNew: org.apache.spark.sql.DataFrame = [Type: string, Item: string, Price: double]

scala> dfNew.show()
+-----+---------+-----+
| Type|     Item|Price|
+-----+---------+-----+
|  Veg|   tomato| 1.99|
|  Veg|   potato| 0.45|
|Fruit|    apple| 0.99|
|Fruit|pineapple| 2.59|
|Fruit|    apple| 1.99|
+-----+---------+-----+
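The head/tail call above relies on Scala's varargs expansion (`: _*`). The sketch below shows the mechanics in plain Scala, without Spark. The `VarargsSketch` object and its `select` and `selectAll` functions are hypothetical stand-ins that only mimic the parameter shape `(col: String, cols: String*)`; they are not part of the Spark API. It also shows one possible way to handle an empty list, where calling `.head` directly would throw.

```scala
// Hypothetical stand-in with the same parameter shape as
// DataFrame.select(col: String, cols: String*); this is not Spark code.
object VarargsSketch {
  // Takes one required name plus zero or more additional names.
  def select(col: String, cols: String*): Seq[String] = col +: cols

  // Splits the list into its first element and the rest. An empty list
  // yields None instead of the NoSuchElementException that .head would throw.
  def selectAll(names: List[String]): Option[Seq[String]] = names match {
    case first :: rest => Some(select(first, rest: _*)) // rest expanded as varargs
    case Nil           => None
  }

  def main(args: Array[String]): Unit = {
    val selectExpr = List("Type", "Item", "Price")
    println(selectAll(selectExpr)) // all three names reach select
    println(selectAll(Nil))        // empty list is handled without throwing
  }
}
```

The same pattern match could wrap the real `BazarDF.select(...)` call if the list of column names might be empty, for example when it is built from user input.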

I will also explain how to select multiple columns from a Spark DataFrame using List[Column] in the next post.
