Post category - Big Data
spark,scala
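Summary: Vertices: VertexRDD. Edges: EdgeRDD, Edge, EdgeDirection. Triplets: EdgeTriplet. Storage: PartitionStrategy. Graphs are usually stored in one of two ways: edge-cut or vertex-cut. GraphX uses vertex-cut and offers four partition strategies: EdgePartition2D, EdgePartition1D, Ra...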
Summary: 1 Overview. GraphX is a new component in Spark for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a...
Summary: 1 Overview. Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data ...
Summary: 1 Overview. Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as dist...
Summary: 1. RDD is short for Resilient Distributed Dataset. It is the basic abstract class in Spark and contains the basic operations available on all RDDs: map, filter, persist. Terms: immutable; implicit conversion; propagat...
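Summary: Over the past few decades, computing models have moved from the terminal-host model (T-S) of the mainframe era, through the client-server model (C-S) of the personal computer era and the browser-server model (B-S) of the Internet era, to today's flourishing grid computing and cloud computing. Grid computing, however, lacks commercial implementations and is based on middleware, so users must build the underlying architecture themselves through programming or installation and configuration,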
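Summary: I. Introduction to HBase. HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system that makes it possible to build large-scale structured storage clusters on inexpensive PC servers. HBase uses Hadoop HDFS as its file storage system and Hadoop MapReduce to process HB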
Summary: /** * Created by root on 9/8/15. */ import org.apache.spark._ import org.apache.spark.graphx._ import org.apache.spark.rdd.RDD object SparkGraphXTest { d...
Summary: /** * Created by root on 9/7/15. */ import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.sql.SQLContext object S...
Summary: /** * Created by root on 9/8/15. */ import org.apache.spark._ import org.apache.spark.rdd.RDD import org.apache.spark.streaming._ import org.apache.spark....
Summary: /** * Created by root on 9/7/15. */ import org.apache.spark.SparkContext import org.apache.spark.SparkConf object RDDTest { def main(args: Array[String]...
Summary: /** * Created by root on 9/6/15. */ import org.apache.spark.SparkContext import org.apache.spark.SparkConf object HelloSpark { def main(args: Array[Stri...
