Saturday, 3 March 2018

Conditional Join in Spark Using DataFrames

Let's see how we can add conditions to a DataFrame join in Spark.

Say we have two DataFrames, dataFrame1 and dataFrame2:

val dataFrame1 = hc.sql("select * from tbl1") //id,name,code
val dataFrame2 = hc.sql("select * from tbl2") //id,name,code

We need to join these two DataFrames on different columns depending on a condition.

A decision flag comes in with a true/false value. If the flag is true, the join condition uses the id and code columns; otherwise it uses only the id column.

So how can we achieve this in Scala?

val decision: Boolean = false

Let's build the join expression:

// Join on id and code when the flag is true; otherwise on id only.
val exprs =
  if (decision)
    dataFrame1.col("id").equalTo(dataFrame2.col("id"))
      .and(dataFrame1.col("code").equalTo(dataFrame2.col("code")))
  else
    dataFrame1.col("id").equalTo(dataFrame2.col("id"))

and then join:

dataFrame1.join(dataFrame2, exprs).show()
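
To try the idea end to end without Hive tables, here is a minimal sketch. It assumes Spark 2.x or later with a local SparkSession in place of the `hc` context used above. The object name, the `joinCondition` helper and the toy rows are invented for illustration and are not part of the original setup.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}

object ConditionalJoinSketch {
  // Hypothetical helper: id and code when the flag is true, id only otherwise.
  def joinCondition(left: DataFrame, right: DataFrame, decision: Boolean): Column = {
    val idMatch = left.col("id") === right.col("id")
    if (decision) idMatch && left.col("code") === right.col("code") else idMatch
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session stands in for the Hive context.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("conditional-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data (invented): the rows with id 2 differ in their code value.
    val dataFrame1 = Seq((1, "a", "X"), (2, "b", "Y")).toDF("id", "name", "code")
    val dataFrame2 = Seq((1, "a", "X"), (2, "b", "Z")).toDF("id", "name", "code")

    // Flag false: join on id only, so both ids find a match.
    dataFrame1.join(dataFrame2, joinCondition(dataFrame1, dataFrame2, decision = false)).show()

    // Flag true: join on id and code, so only id 1 finds a match.
    dataFrame1.join(dataFrame2, joinCondition(dataFrame1, dataFrame2, decision = true)).show()

    spark.stop()
  }
}
```

Keeping the condition in a small function means the same logic can be reused for other pairs of DataFrames that share the id and code columns.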

 
This is how you join two DataFrames with a join expression chosen at runtime.
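
If the set of key columns grows beyond id and code, the same condition can be built from a list of column names. The sketch below is one possible way to do that; `JoinKeys`, `keysFor` and `conditionOn` are hypothetical names, not anything Spark provides.

```scala
import org.apache.spark.sql.{Column, DataFrame}

object JoinKeys {
  // Hypothetical helper: picks the key columns for a given flag value.
  def keysFor(decision: Boolean): Seq[String] =
    if (decision) Seq("id", "code") else Seq("id")

  // Builds one equality test per key column and ANDs them together.
  // Needs at least one key; reduce throws on an empty list.
  def conditionOn(left: DataFrame, right: DataFrame, keys: Seq[String]): Column =
    keys.map(k => left.col(k) === right.col(k)).reduce(_ && _)
}
```

With the DataFrames from this post, the call would look like `dataFrame1.join(dataFrame2, JoinKeys.conditionOn(dataFrame1, dataFrame2, JoinKeys.keysFor(decision)))`. When the key columns have the same names on both sides, `dataFrame1.join(dataFrame2, JoinKeys.keysFor(decision))` is an alternative: that overload takes the column names directly and keeps a single copy of each key column in the result.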
