PySpark SQL Recipes
Spark(PySpark)
Spark started at UC Berkeley's AMPLab in 2009 and was open sourced in early 2010.
Spark brings multiple processors to the data: it is a parallel processing framework, so data is processed at a number of places at the same time.
Spark uses a data structure known as the RDD (Resilient Distributed Dataset). RDDs are resilient in the sense that they can be re-created at any point of time during the execution process; transformations produce new RDDs, and the original RDDs remain unaltered.
Spark supports in-memory computation. It can be used with various data sources such as HBase, Cassandra, Amazon S3, HDFS, etc.
Spark Core is responsible for task management, I/O operations, fault tolerance, memory management, etc.