Tuesday, 14 June 2022

Join through an expression variable as the on condition in Databricks using PySpark

 Let's see how to join two tables with a parameterized on condition in PySpark.

Eg: I have two dataframes A and B, and I want to join them on id, invc_no, item and subItem.


onExpr = [(A.id == B.id) &
          (A.invc_no == B.invc_no) &
          (A.item == B.item) &
          (A.subItem == B.subItem)]

dailySaleDF = A.join(B, onExpr, 'left').select([c for c in A.columns])
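If the join columns themselves arrive as a parameter (say, from a config table), the on expression can also be built dynamically instead of being written out by hand. A minimal sketch, assuming the column names are passed in as a plain Python list (`build_on_expr` and `join_cols` are illustrative names, not from the original post):

```python
from functools import reduce
from operator import and_

def build_on_expr(A, B, join_cols):
    """AND together one equality condition per join column.

    Each A[c] == B[c] yields a PySpark Column condition; reduce with the
    & operator combines them into a single on condition for DataFrame.join.
    """
    return reduce(and_, [A[c] == B[c] for c in join_cols])
```

Usage would then be `A.join(B, build_on_expr(A, B, ['id', 'invc_no', 'item', 'subItem']), 'left')`.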



Save dataframe to table and ADLS path in one go

 Let's see how to save a dataframe as a table while writing the underlying data to an ADLS path in the same call.


df.write.format('delta') \
    .mode('overwrite') \
    .option('overwriteSchema', 'true') \
    .saveAsTable('{database_name}.{tbl}'.format(database_name = database, tbl = table_name),
                 path = '{base_dir}/{tbl}/'.format(base_dir = location, tbl = table_name))
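Since both the qualified table name and the ADLS path are built from the same parameters, it can help to compute them in one place before the write. A small sketch, assuming `database`, `table_name` and `location` are supplied by the caller (the helper name `delta_targets` is illustrative):

```python
def delta_targets(database, table_name, location):
    """Build the qualified table name and the external path for saveAsTable.

    The path ends with a trailing slash so the table directory is explicit.
    """
    full_table = '{database_name}.{tbl}'.format(database_name=database, tbl=table_name)
    path = '{base_dir}/{tbl}/'.format(base_dir=location, tbl=table_name)
    return full_table, path
```

For example, `delta_targets('sales_db', 'daily_sale', '/mnt/curated')` returns `('sales_db.daily_sale', '/mnt/curated/daily_sale/')`, which can be passed straight to `saveAsTable`.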

How to get Azure Key Vault values into Azure Databricks Notebook

It is always a best practice to store secrets in Azure Key Vault. To access them in Databricks, a secret scope must first be defined; using the scope and key, you can then retrieve the secrets.


In the example below, the scope is "myScopeDEV" and the key is "mySecretKey"; env_name is passed as "DEV" so the scope name can be built per environment.
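The lookup itself uses `dbutils.secrets.get(scope=..., key=...)`, which is the Databricks secrets API. A minimal sketch that derives the scope name from the environment; the wrapper `get_secret` is an illustrative helper, and `dbutils` is passed in because it only exists inside a Databricks notebook:

```python
def get_secret(dbutils, env_name, key):
    """Fetch a secret, deriving the scope name from the environment.

    env_name='DEV' gives the scope 'myScopeDEV' (scope naming is an
    assumption matching the example above).
    """
    scope = 'myScope{env}'.format(env=env_name)
    return dbutils.secrets.get(scope=scope, key=key)
```

Inside a notebook this would be called as `get_secret(dbutils, 'DEV', 'mySecretKey')`.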