Spark RDD

RDD: Resilient Distributed Dataset

1. Spark RDD is immutable

Since an RDD is immutable, a large RDD can be split into smaller ones, distributed to
various worker nodes for processing, and the partial results finally compiled to produce
the overall result, all without worrying about the underlying data being changed.
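
A minimal sketch of this idea, assuming a local Spark setup (the object name and master URL are illustrative): each transformation such as map returns a new RDD and leaves the original untouched, so both lineages can be computed safely.

    import org.apache.spark.{SparkConf, SparkContext}

    object ImmutabilityDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("immutability").setMaster("local[*]"))

        val numbers = sc.parallelize(1 to 10) // original RDD
        val doubled = numbers.map(_ * 2)      // transformation returns a NEW RDD

        // `numbers` is untouched; both results are derived independently
        println(numbers.sum()) // 55.0
        println(doubled.sum()) // 110.0

        sc.stop()
      }
    }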

 

2. Spark RDD is distributable
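
An RDD is divided into partitions, and the partitions are spread across the worker nodes of a cluster so that each one can be processed in parallel. A minimal sketch, assuming a local setup (local[4] merely simulates four workers with threads; the object name is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    object PartitionDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("partitions").setMaster("local[4]"))

        // Explicitly split the data into 4 partitions; on a real cluster each
        // partition could live on a different worker node
        val rdd = sc.parallelize(1 to 100, numSlices = 4)
        println(rdd.getNumPartitions) // 4

        // Each partition is processed independently
        val sizes = rdd.mapPartitions(it => Iterator(it.size)).collect()
        println(sizes.mkString(", ")) // 25, 25, 25, 25

        sc.stop()
      }
    }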

 

3. Spark RDD lives in memory

Spark keeps RDDs in memory as much as it can. Only in rare situations, when Spark is
running out of memory or the data grows beyond the available capacity, are RDDs written
to disk. Most of the processing on an RDD happens in memory, and that is the reason why
Spark can process data at lightning-fast speed.
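
A small sketch of controlling this behavior (the input path data/app.log is hypothetical): persist with StorageLevel.MEMORY_AND_DISK asks Spark to keep the RDD in memory and spill partitions to disk only when they no longer fit.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object CachingDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("caching").setMaster("local[*]"))

        val logs = sc.textFile("data/app.log") // hypothetical input file

        // Keep the filtered RDD around; spill to disk only under memory pressure
        val errors = logs.filter(_.contains("ERROR"))
          .persist(StorageLevel.MEMORY_AND_DISK)

        // Both actions reuse the cached copy instead of re-reading the file
        println(errors.count())
        errors.take(5).foreach(println)

        sc.stop()
      }
    }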

 

4. Spark RDD is strongly typed

A Spark RDD can be created with any supported data type: Scala/Java intrinsic types or
custom types such as your own classes. The biggest advantage of this design decision is
freedom from type-related runtime errors: if a job is going to break because of a data
type issue, it breaks at compile time.
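
A minimal sketch (the Employee case class is illustrative): because the element type of the RDD is known at compile time, referring to a field that does not exist fails compilation instead of failing at runtime.

    import org.apache.spark.{SparkConf, SparkContext}

    // A custom data type; the RDD carries its full type information
    case class Employee(name: String, salary: Double)

    object TypedRddDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("typed").setMaster("local[*]"))

        val employees = sc.parallelize(Seq(
          Employee("Ann", 90000), Employee("Bob", 75000)))

        // The compiler knows each element is an Employee, so field access is checked
        val payroll = employees.map(_.salary).sum()
        println(payroll) // 165000.0

        // employees.map(_.age) // would NOT compile: Employee has no field `age`

        sc.stop()
      }
    }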

 

posted @ 2017-04-09 11:35  ordi