Learning PySpark (Part 1)

There are plenty of blog tutorials online for learning PySpark; I have read a few and found them all very polished.

This post is still rather rough in both formatting and content. I will improve it as I learn how to format posts and gain a deeper understanding of Spark.  --12.3

Notes from "Machine Learning with PySpark: With Natural Language Processing and Recommender Systems"

Spark Core enables the in-memory computations that drive the parallel and distributed processing of data. Spark Core is responsible for managing tasks, I/O operations, fault tolerance, memory management, and so on.
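As a minimal sketch of my own (not from the book), assuming PySpark is installed locally, this shows the entry point behind which Spark Core does its scheduling and memory management; the app name, sample data, and column names are made up for illustration:

```python
from pyspark.sql import SparkSession

# Start a local SparkSession; Spark Core handles task scheduling,
# memory management, and fault tolerance behind this entry point.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("spark-core-demo")
    .getOrCreate()
)

# A tiny DataFrame, purely for illustration.
df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "label"])

# cache() asks Spark to keep the data in memory across actions,
# which is where the in-memory speedup comes from.
df.cache()
print(df.count())

spark.stop()
```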

The new approach to big data computing: bringing multiple processors to the data.

Apache Spark started as a research project at the UC Berkeley AMPLab in 2009 and was open sourced in early 2010. It is built around a different data structure known as the RDD (resilient distributed dataset).
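Below is a small RDD sketch of my own (not taken from the book) to make the idea concrete; the numbers and app name are arbitrary:

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-demo")

# An RDD is an immutable, partitioned collection spread across executors.
rdd = sc.parallelize(range(1, 6), numSlices=2)

# Transformations like map() are lazy; the reduce() action
# triggers the actual distributed computation.
squares = rdd.map(lambda x: x * x)
print(squares.reduce(lambda a, b: a + b))  # 1 + 4 + 9 + 16 + 25 = 55

sc.stop()
```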

 
