PySpark SQL Recipes
Spark(PySpark)
Spark started at UC Berkeley's AMPLab in 2009 and was open sourced in early 2010.
Spark brings multiple processors to the data: it is a parallel processing framework, so data is processed at a number of places at the same time.
Spark uses a data structure known as the RDD (Resilient Distributed Dataset). RDDs are resilient in the sense that they can be re-created at any point of time during the execution process; transformations produce new RDDs, and the original RDDs remain unaltered.
Spark supports in-memory computation. It can be used with various data sources such as HBase, Cassandra, Amazon S3, HDFS, etc.
Spark Core is responsible for task management, I/O operations, fault tolerance, memory management, etc.