JSON files with Apache Spark

Single JSON as Input:

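File: families.json

Note: sqlContext.jsonFile reads one JSON object per line, so each record below must sit on a single line in the actual file.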

{
  "_comment": "First record",
  "first_name": "Arvind",
  "last_name": "Gudiseva",
  "address": {
    "street": "Kharkhana",
    "city": "Secunderabad",
    "state": "AP",
    "zip": "500009"
  }
}

{
  "_comment": "Second record",
  "first_name": "Dhyuti",
  "last_name": "Gudiseva",
  "address": {
    "street": "Circlepet",
    "city": "Machilipatnam",
    "state": "AP",
    "zip": "521001"
  }
}

{
  "_comment": "Third record",
  "first_name": "Haritha",
  "last_name": "Murari",
  "address": {
    "street": "Whitefield",
    "city": "Bangalore",
    "state": "KA",
    "zip": "56066"
  }
}

Spark Scala Console:

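// jsonFile parses each line as one JSON record (deprecated since Spark 1.4 in favor of sqlContext.read.json)
scala> val familyJson = sqlContext.jsonFile("families.json")
// Register the DataFrame as a temporary table so it can be queried with SQL
scala> familyJson.registerTempTable("families")
// Nested fields are addressed with dot notation
scala> val familyDetails = sqlContext.sql("SELECT first_name, address.city, address.state FROM families")
scala> familyDetails.collect.foreach(println)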
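
The same projection can also be written with the DataFrame API instead of SQL. The following is a minimal sketch, assuming Spark 1.4 or later (where sqlContext.read is available) running in local mode. The object name FamiliesByLine, the helper cityAndState, and the in-memory records value are hypothetical and stand in for families.json so the example runs without a file; each string holds one complete record, mirroring the one-record-per-line layout.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical example object; not part of the original post.
object FamiliesByLine {

  // One complete JSON object per element, standing in for one record per line of families.json.
  val records: Seq[String] = Seq(
    """{"first_name": "Arvind", "last_name": "Gudiseva", "address": {"street": "Kharkhana", "city": "Secunderabad", "state": "AP", "zip": "500009"}}""",
    """{"first_name": "Dhyuti", "last_name": "Gudiseva", "address": {"street": "Circlepet", "city": "Machilipatnam", "state": "AP", "zip": "521001"}}""",
    """{"first_name": "Haritha", "last_name": "Murari", "address": {"street": "Whitefield", "city": "Bangalore", "state": "KA", "zip": "56066"}}"""
  )

  // Same columns as the SQL query, selected through the DataFrame API.
  def cityAndState(sqlContext: SQLContext): DataFrame = {
    val rdd = sqlContext.sparkContext.parallelize(records)
    sqlContext.read.json(rdd).select("first_name", "address.city", "address.state")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val conf = new SparkConf().setAppName("FamiliesByLine").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val df = cityAndState(new SQLContext(sc))
      df.printSchema()            // shows the three selected columns
      df.collect.foreach(println) // one Row per record
    } finally {
      sc.stop()
    }
  }
}
```

In the Spark shell, where sc and sqlContext already exist, only the body of cityAndState is needed.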

Valid JSON File as Input:

File: family_names.json

[{
  "first_name": "Arvind",
  "last_name": "Gudiseva",
  "address": {
    "street": "Kharkhana",
    "city": "Secunderabad",
    "state": "AP",
    "zip": "500009"
  }
}, {
  "first_name": "Dhyuti",
  "last_name": "Gudiseva",
  "address": {
    "street": "Circlepet",
    "city": "Machilipatnam",
    "state": "AP",
    "zip": "521001"
  }
}, {
  "first_name": "Haritha",
  "last_name": "Murari",
  "address": {
    "street": "Whitefield",
    "city": "Bangalore",
    "state": "KA",
    "zip": "56066"
  }
}]

Spark Scala Console:

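// wholeTextFiles returns (path, content) pairs; keep only the content so the whole array is parsed as one string
scala> val namesRDD = sc.wholeTextFiles("family_names.json").map(x => x._2)
// read.json accepts an RDD[String] and infers the schema from it
scala> val namesJson = sqlContext.read.json(namesRDD)
scala> namesJson.registerTempTable("names")
scala> sqlContext.sql("SELECT * FROM names").collect.foreach(println)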
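
To see why the whole file has to reach the parser as a single string, the sketch below feeds a multi-line JSON array to read.json as one RDD element, which is what wholeTextFiles followed by the map produces. It assumes Spark 1.4 or later in local mode; the object name NamesFromArray, the helper load, and the inline document value are hypothetical and replace family_names.json so the example runs without a file. Spark treats each element of a top-level array as its own row.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical example object; not part of the original post.
object NamesFromArray {

  // The entire JSON document as one string, line breaks included,
  // mirroring the content half of a wholeTextFiles pair.
  val document: String =
    """[{
      "first_name": "Arvind",
      "last_name": "Gudiseva",
      "address": {"street": "Kharkhana", "city": "Secunderabad", "state": "AP", "zip": "500009"}
    }, {
      "first_name": "Dhyuti",
      "last_name": "Gudiseva",
      "address": {"street": "Circlepet", "city": "Machilipatnam", "state": "AP", "zip": "521001"}
    }, {
      "first_name": "Haritha",
      "last_name": "Murari",
      "address": {"street": "Whitefield", "city": "Bangalore", "state": "KA", "zip": "56066"}
    }]"""

  // A single-element RDD: the parser sees the complete array at once.
  def load(sqlContext: SQLContext): DataFrame =
    sqlContext.read.json(sqlContext.sparkContext.parallelize(Seq(document)))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val conf = new SparkConf().setAppName("NamesFromArray").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      val names = load(sqlContext)
      names.registerTempTable("names")
      // Nested fields can be queried just as in the line-delimited case.
      sqlContext.sql("SELECT first_name, address.city FROM names").collect.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because wholeTextFiles loads each file fully into memory as one string, this approach suits small to moderately sized files.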
