Friday 25 November 2022

Extracting JSON object within a spark dataframe

 Let's see how we can extract a Json object from a spark dataframe column

This is an example data frame

import numpy as np
import pandas as pd
from pyspark.sql.functions import *
from pyspark.sql.types import *
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled',False)

d = { 'result': [{"A": {"B": [{"key":{"dateAtString": "1990-10-12T10:45:11.691274Z", "dateLastString": "1990-10-12T10:46:45.372113Z", "valueRes": {"C": "AZZ", "test": "2"}, "result": "True"}},{"key":{"dateAtString": "1990-10-12T10:45:11.691274Z", "dateLastString": "1990-10-12T10:46:45.372113Z", "valueRes": {"C": "AW", "test": "2"}, "result": "true"}}]}}]}

df = pd.DataFrame(d)
sparkDF = spark.createDataFrame(df)



1. Let's extract value of  'A'
sparkDF = sparkDF.select(explode(sparkDF.result.A).alias('col1','col2'))
As we are exploding till A Json object, this will help to bring key (B) and value (array) into 2 different columns.





2. Now let's drill down further
    explode array element in col2

sparkDF = sparkDF.select(explode(sparkDF.result.A.B).alias('result'))



3. Expanding result column

sparkDF = sparkDF.select(explode(sparkDF.result.A.B).alias('result')).select('result.key')


I am just selecting data from result column and that too I want to expend the key to get all the needed values.


4. Extract value of result
     sparkDF = sparkDF.select('key.result')
  













5. Extract valueRes.test
    sparkDF = sparkDF.select('key.valueRes')














 You can also put columnname.* to bring all the values into column format.

No comments:

Post a Comment