Post category - Big Data

spark, scala
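Summary: Vertices: VertexRDD. Edges: EdgeRDD, Edge, EdgeDirection. Triplets: EdgeTriplet. Storage: PartitionStrategy. There are generally two ways to store (partition) a graph: cutting edges or cutting vertices. GraphX uses the vertex-cut approach and offers four storage strategies: EdgePartition2D, EdgePartition1D, Ra...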
posted @ 2015-11-26 14:33 sunflower627
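The following plain-Scala sketch makes the vertex-cut idea concrete without using Spark. It assumes a hash-based scheme similar in spirit to GraphX's EdgePartition1D and EdgePartition2D. The names `VertexCutSketch`, `edgePartition1D`, `edgePartition2D`, `replicas`, and the mixing constant are illustrative assumptions, not GraphX's API.

```scala
// Hypothetical sketch of vertex-cut edge partitioning (not GraphX source code).
// Edges are assigned whole to partitions; vertices get replicated instead.
object VertexCutSketch {
  final case class Edge(src: Long, dst: Long)

  // Assumed large prime used to scramble vertex ids before taking a modulus.
  private val MixingPrime = 1125899906842597L

  private def mix(id: Long, buckets: Int): Int =
    (math.abs(id * MixingPrime) % buckets).toInt

  // 1D: the partition depends only on the source vertex,
  // so all out-edges of a vertex land in the same partition.
  def edgePartition1D(e: Edge, numParts: Int): Int = mix(e.src, numParts)

  // 2D: treat partitions as a side x side grid (side = ceil(sqrt(numParts)));
  // the source picks the column and the destination picks the row.
  def edgePartition2D(e: Edge, numParts: Int): Int = {
    val side = math.ceil(math.sqrt(numParts.toDouble)).toInt
    val col = mix(e.src, side)
    val row = mix(e.dst, side)
    (col * side + row) % numParts
  }

  // For each vertex, count the distinct partitions holding at least one of its edges,
  // i.e. how many copies of the vertex a vertex-cut needs.
  def replicas(edges: Seq[Edge], numParts: Int, strategy: (Edge, Int) => Int): Map[Long, Int] =
    edges
      .flatMap { e => val p = strategy(e, numParts); Seq(e.src -> p, e.dst -> p) }
      .groupBy(_._1)
      .map { case (v, pairs) => v -> pairs.map(_._2).distinct.size }
}
```

Under the 1D strategy, a high-degree source vertex keeps all its out-edges together. Under the 2D grid, any vertex is replicated into at most about 2 * sqrt(numParts) partitions.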
Summary: 1 Overview. GraphX is a new component in Spark for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a...
posted @ 2015-11-26 14:32 sunflower627
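GraphX's central abstraction is a property graph: a directed multigraph with user-defined properties attached to vertices and edges, plus a triplet view that joins each edge with its endpoints' properties. The following in-memory sketch shows that shape in plain Scala. `PropertyGraphSketch`, its `Edge`, `Triplet`, and `PropertyGraph` types are hypothetical stand-ins, not the Spark classes.

```scala
// Hypothetical plain-Scala model of a property graph (not the GraphX API).
object PropertyGraphSketch {
  final case class Edge[E](srcId: Long, dstId: Long, attr: E)
  final case class Triplet[V, E](srcId: Long, srcAttr: V, dstId: Long, dstAttr: V, attr: E)

  // Assumes every edge endpoint has an entry in `vertices`.
  final case class PropertyGraph[V, E](vertices: Map[Long, V], edges: Seq[Edge[E]]) {
    // A triplet joins an edge with the properties of its source and destination.
    def triplets: Seq[Triplet[V, E]] =
      edges.map(e => Triplet(e.srcId, vertices(e.srcId), e.dstId, vertices(e.dstId), e.attr))

    // Out-degree per vertex; parallel edges count separately (multigraph).
    def outDegrees: Map[Long, Int] =
      edges.groupBy(_.srcId).map { case (id, es) => id -> es.size }
  }
}
```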
Summary: 1 Overview. Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data ...
posted @ 2015-11-26 14:31 sunflower627
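Spark Streaming processes a live stream as a sequence of small batches (a DStream is a sequence of RDDs, one per batch interval). The following Spark-free sketch illustrates that micro-batch model. It groups timestamped records into fixed intervals and runs an ordinary word count on each batch. `MicroBatchSketch`, `Record`, `toBatches`, and `wordCount` are hypothetical names.

```scala
// Hypothetical illustration of micro-batching (not the Spark Streaming API).
object MicroBatchSketch {
  final case class Record(timeMs: Long, line: String)

  // Bucket records into consecutive intervals of `batchMs` milliseconds,
  // returning (batch start time, lines in that batch) in time order.
  def toBatches(records: Seq[Record], batchMs: Long): Seq[(Long, Seq[String])] =
    records
      .groupBy(r => r.timeMs / batchMs * batchMs)
      .toSeq
      .sortBy(_._1)
      .map { case (start, rs) => start -> rs.map(_.line) }

  // Plain batch word count, applied independently to each micro-batch.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (w, ws) => w -> ws.size }
}
```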
Summary: 1 Overview. Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as dist...
posted @ 2015-11-26 14:31 sunflower627
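Summary: 1. RDD is short for Resilient Distributed Dataset. It is the basic abstraction in Spark, and the RDD class contains the basic operations available on all RDDs: map, filter, persist. Terms: immutable (cannot be modified); implicit conversion (automatic type conversion); propagat...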
posted @ 2015-11-26 14:30 sunflower627
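Two of the terms above, immutability and implicit conversion, explain how RDDs behave. Transformations such as map and filter return a new dataset instead of changing the old one. Spark also uses an implicit conversion (to PairRDDFunctions) so that key-value operations like reduceByKey appear only on RDDs of pairs. The following in-memory sketch imitates both ideas. `MiniRddSketch`, `MiniRDD`, and `PairOps` are hypothetical names, not Spark classes.

```scala
// Hypothetical in-memory imitation of RDD behaviour (not Spark code).
object MiniRddSketch {
  // Immutable: every transformation builds a new MiniRDD.
  final class MiniRDD[T](val data: Vector[T]) {
    def map[U](f: T => U): MiniRDD[U] = new MiniRDD(data.map(f))
    def filter(p: T => Boolean): MiniRDD[T] = new MiniRDD(data.filter(p))
    def collect(): Vector[T] = data
  }

  // Implicit conversion: reduceByKey is only available when elements are (K, V) pairs,
  // similar in spirit to Spark's implicit conversion to PairRDDFunctions.
  implicit class PairOps[K, V](self: MiniRDD[(K, V)]) {
    def reduceByKey(f: (V, V) => V): MiniRDD[(K, V)] =
      new MiniRDD(
        self.data.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }.toVector)
  }
}
```

Usage: after `import MiniRddSketch._`, `new MiniRDD(Vector("a" -> 1, "a" -> 1)).reduceByKey(_ + _)` compiles. Calling `reduceByKey` on a `MiniRDD[Int]` does not compile.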
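Summary: Over the past few decades, computing models have evolved from the terminal-host model (T-S) of the mainframe era, to the client-server model (C-S) of the PC era, to the browser-server model (B-S) of the Internet era, and on to today's flourishing grid computing and cloud computing. However, grid computing lacks commercial implementations and is built on middleware technology, which requires users to set up the underlying infrastructure themselves through programming or installation and configuration, ...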
posted @ 2015-11-03 15:41 sunflower627
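Summary: 1. Introduction to HBase. HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. HBase uses Hadoop HDFS as its file storage system and Hadoop MapReduce to process HB...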
posted @ 2015-09-16 16:09 sunflower627
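Logically, an HBase table is a sorted map. A row key maps to columns named "family:qualifier", each column holds timestamped versions, and rows are kept sorted by row key. The following immutable in-memory sketch models that layout. `HBaseModelSketch`, `put`, `get`, and `scan` are hypothetical names that only mirror the concepts. They are not the HBase client API.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical model of HBase's logical data layout (not the HBase client API):
// row key -> "family:qualifier" -> timestamp -> value, all sorted.
object HBaseModelSketch {
  type Versions = TreeMap[Long, String]
  type Row = TreeMap[String, Versions]
  type Table = TreeMap[String, Row]

  // Write one cell version, returning a new table.
  def put(t: Table, row: String, column: String, ts: Long, value: String): Table = {
    val r = t.getOrElse(row, TreeMap.empty[String, Versions])
    val v = r.getOrElse(column, TreeMap.empty[Long, String])
    t.updated(row, r.updated(column, v.updated(ts, value)))
  }

  // Read the newest version of a cell (highest timestamp), if present.
  def get(t: Table, row: String, column: String): Option[String] =
    t.get(row).flatMap(_.get(column)).map(_.last._2)

  // Row keys in [from, until), in sorted order, as a range scan would return them.
  def scan(t: Table, from: String, until: String): Seq[String] =
    t.range(from, until).keys.toSeq
}
```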
Summary:
/**
 * Created by root on 9/8/15.
 */
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
object SparkGraphXTest { d...
posted @ 2015-09-09 17:32 sunflower627
Summary:
/**
 * Created by root on 9/7/15.
 */
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
object S...
posted @ 2015-09-09 14:46 sunflower627
Summary:
/**
 * Created by root on 9/8/15.
 */
import org.apache.spark._
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming._
import org.apache.spark....
posted @ 2015-09-09 14:45 sunflower627
Summary:
/**
 * Created by root on 9/7/15.
 */
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
object RDDTest { def main(args: Array[String]...
posted @ 2015-09-09 14:44 sunflower627
Summary:
/**
 * Created by root on 9/6/15.
 */
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
object HelloSpark { def main(args: Array[Stri...
posted @ 2015-09-09 14:43 sunflower627