Let's see how we can extract a JSON object from a Spark DataFrame column.
Here is an example DataFrame:
import numpy as np
import pandas as pd
from pyspark.sql.functions import *
from pyspark.sql.types import *
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled',False)
d = { 'result': [{"A": {"B": [{"key":{"dateAtString": "1990-10-12T10:45:11.691274Z", "dateLastString": "1990-10-12T10:46:45.372113Z", "valueRes": {"C": "AZZ", "test": "2"}, "result": "True"}},{"key":{"dateAtString": "1990-10-12T10:45:11.691274Z", "dateLastString": "1990-10-12T10:46:45.372113Z", "valueRes": {"C": "AW", "test": "2"}, "result": "true"}}]}}]}
df = pd.DataFrame(d)
sparkDF = spark.createDataFrame(df)
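Before exploding anything, it helps to see the shape we are drilling into. A quick plain-Python check (no Spark needed) of the same nested payload:

```python
import json

# Same nested payload as in the DataFrame above
d = {'result': [{"A": {"B": [
    {"key": {"dateAtString": "1990-10-12T10:45:11.691274Z",
             "dateLastString": "1990-10-12T10:46:45.372113Z",
             "valueRes": {"C": "AZZ", "test": "2"},
             "result": "True"}},
    {"key": {"dateAtString": "1990-10-12T10:45:11.691274Z",
             "dateLastString": "1990-10-12T10:46:45.372113Z",
             "valueRes": {"C": "AW", "test": "2"},
             "result": "true"}}]}}]}

# result -> A holds a single key "B", whose value is a two-element array
print(json.dumps(list(d['result'][0]['A'].keys())))  # ["B"]
print(len(d['result'][0]['A']['B']))                 # 2
```

So the path we will walk in Spark is result -> A -> B -> key.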
1. Let's extract the value of 'A'
sparkDF = sparkDF.select(explode(sparkDF.result.A).alias('col1','col2'))
Because we explode at the level of the 'A' JSON object, its key (B) and its value (the array) land in two separate columns.
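Conceptually, explode on a map column emits one row per entry, with the key and value as two columns, much like dict.items() in plain Python. A minimal sketch of what step 1 produces (the dict here is a trimmed-down stand-in for the real 'A' object):

```python
# What explode does to the map at the 'result.A' level:
# each (key, value) pair becomes a row with two columns (col1, col2).
a = {"B": [{"key": {"result": "True"}}, {"key": {"result": "true"}}]}

rows = [(k, v) for k, v in a.items()]  # like explode(...).alias('col1', 'col2')
print(rows[0][0])  # col1 -> 'B'
print(rows[0][1])  # col2 -> the array value
```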
2. Now let's drill down further
Explode the array element under result.A.B
sparkDF = sparkDF.select(explode(sparkDF.result.A.B).alias('result'))
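Exploding an array produces one row per element; step 2's explode over result.A.B behaves like iterating the list in plain Python (again using a trimmed-down stand-in for the real array):

```python
# Stand-in for the two-element array at result.A.B
b = [{"key": {"valueRes": {"C": "AZZ", "test": "2"}, "result": "True"}},
     {"key": {"valueRes": {"C": "AW", "test": "2"}, "result": "true"}}]

# explode(result.A.B) -> one row per array element, each aliased 'result'
rows = [elem for elem in b]
print(len(rows))  # 2 rows, one per element
print(rows[0]["key"]["result"])
```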
3. Expanding the result column
sparkDF = sparkDF.select(explode(sparkDF.result.A.B).alias('result')).select('result.key')
Here I select only the data from the result column, and then expand key to get at all the needed values.
4. Extract key.result
sparkDF = sparkDF.select('key.result')
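The dotted path in select('key.result') walks the struct field by field; in plain Python the same path is just nested dict access (sample rows stand in for the output of step 3):

```python
# Stand-in for the 'key' structs produced by step 3
keys = [{"valueRes": {"C": "AZZ", "test": "2"}, "result": "True"},
        {"valueRes": {"C": "AW", "test": "2"}, "result": "true"}]

# select('key.result') keeps only the nested 'result' field from each row
results = [k["result"] for k in keys]
print(results)  # ['True', 'true']
```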
5. Extract valueRes.test
sparkDF = sparkDF.select('key.valueRes.test')
Note that this runs on the output of step 3, since step 4 kept only key.result.
You can also select columnname.* to expand all of a struct's fields into separate columns.
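The columnname.* expansion surfaces every field of a struct as its own top-level column; conceptually it is dict unpacking (the struct below is a stand-in for key.valueRes):

```python
# Stand-in for the 'valueRes' struct
value_res = {"C": "AZZ", "test": "2"}

# select('key.valueRes.*') would surface each struct field as its own column
columns = {name: value for name, value in value_res.items()}
print(list(columns))  # ['C', 'test']
```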