Post category - Big data

Copycat - command
Summary: client.submit(new PutCommand("foo", "Hello world!")); ServerContext connection.handler(CommandRequest.class, request -> state.command(request)); State.command Starting from ReserveState, the command is forwarded to the leade...

posted @ 2017-03-01 17:18 fxjwind Views (277) Comments (0) Recommendations (0)

Copycat - CopycatServer
Summary: A server can be started in two ways. Address address = new Address("123.456.789.0", 5000); CopycatServer.Builder builder = CopycatServer.builder(address); builder.withStateMachine(MapStateMachine::new); Bootstrapping a cluster on its own, ...

posted @ 2017-02-24 16:53 fxjwind Views (504) Comments (0) Recommendations (0)

Copycat - Overview
Summary: Copycat’s primary role is as a framework for building highly consistent, fault-tolerant replicated state machines. Copycat servers receive state machi

posted @ 2017-02-23 14:34 fxjwind Views (668) Comments (0) Recommendations (0)

Copycat - MemberShip
Summary: https://github.com/atomix/copycat http://atomix.io/copycat/docs/membership/ For ease of implementation, Copycat divides members into three types: active, passive, and reserve members, each of which plays some role in supporting rapid ...

posted @ 2017-02-20 18:00 fxjwind Views (610) Comments (0) Recommendations (0)

Online, Asynchronous Schema Change in F1
Summary: F1: A Distributed SQL Database That Scales http://disksing.com/understanding-f1-schema-change mark

posted @ 2016-12-27 17:15 fxjwind Views (563) Comments (0) Recommendations (0)

Apache Apex
Summary: http://apex.apache.org/docs.html https://apex.apache.org/docs/apex/application_development/

posted @ 2016-07-07 17:09 fxjwind Views (344) Comments (0) Recommendations (0)

Why Apache Beam? A data Artisans perspective
Summary: https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison https://github.com/apache/incubator-beam https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 https://www.ore...

posted @ 2016-06-30 18:20 fxjwind Views (761) Comments (0) Recommendations (0)

HybridTime - Accessible Global Consistency with High Clock Uncertainty
Summary: Amazon’s Dynamo [9] and Facebook’s Cassandra [13], relax the consistency model, and offer only eventual consistency. Others such as HBase [1] and BigTable [4] offer strong consistency only for operat...

posted @ 2016-05-09 20:54 fxjwind Views (954) Comments (0) Recommendations (0)

kudu
Summary: Kudu White Paper http://www.cloudera.com/documentation/betas/kudu/0-5-0/topics/kudu_resources.html http://getkudu.io/overview.html Kudu is a new stora

posted @ 2016-04-26 11:35 fxjwind Views (2573) Comments (0) Recommendations (1)

Raft
Summary: http://thesecretlivesofdata.com/raft/ https://github.com/coreos/etcd 1 Introduction Consensus algorithms allow a collection of machines to work as a coherent group that can survive the failure...

posted @ 2016-03-29 20:01 fxjwind Views (977) Comments (0) Recommendations (0)

The world beyond batch: Streaming 101
Summary: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102 The first thing this article sets out to do is give 'Streaming' a proper definition. What is streaming? The crux...

posted @ 2016-02-23 19:49 fxjwind Views (958) Comments (0) Recommendations (0)

MillWheel: Fault-Tolerant Stream Processing at Internet Scale
Summary: http://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/41378.pdf Why build MillWheel? Because the other existing streaming systems cannot satisfy the requirements of fault tolerance, versatility, and scalability at the same time. Spark Streaming...

posted @ 2016-02-22 19:48 fxjwind Views (1402) Comments (0) Recommendations (0)

The Dataflow Model paper
Summary: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. The paper's subtitle is long and makes several points: 1. The main work of this paper is Balancing Correctness, Latency, and Cost, so it still cannot...

posted @ 2016-01-12 14:02 fxjwind Views (3854) Comments (8) Recommendations (3)

Giving Storm the Wings of CEP - Siddhi Research and Integration
Summary: What is Siddhi? Siddhi is a lightweight, easy-to-use, open source CEP (Complex Event Processing) engine developed by WSO2 (http://wso2.com/about/). Like the vast majority of CEP systems, Siddhi ...

posted @ 2015-12-15 16:03 fxjwind Views (9533) Comments (0) Recommendations (0)

Notes on Consistency in Stream Processing
Summary: References: http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/ http://www.confluent.io/blog/real-time-stream-processing-the-next-step-for-apache-flink...

posted @ 2015-11-18 17:59 fxjwind Views (3741) Comments (0) Recommendations (0)

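How to Guarantee Data Consistency in Stream Processing
Summary: Background. Compared with traditional batch analytics platforms such as Hadoop, the advantage of stream analytics is timeliness: analysis results are available with latency on the order of seconds. The drawback is that strong consistency, i.e. Exactly-Once semantics, is hard to guarantee (with massive data volumes, transaction-like strongly consistent schemes cannot be used if throughput is to be preserved). Stream analytics platforms generally promise a weaker consistency, i.e. At-Least-Once semantics, which guarantees that no data is lost but allows duplicates. But this is only in the normal...

posted @ 2015-07-30 15:55 fxjwind Views (1844) Comments (0) Recommendations (0)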

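Big Data Resource Roundup
Summary: A cleaned-up and refreshed collection of the Big Data papers and blogs I have read and taken notes on. Streaming & Spark: In-Stream Big Data Processing; Discretized Streams (discretized stream data processing); Spark - A Fault-Tolerant Abstraction for In-Memory Cluster Computing; Mesos: A ...

posted @ 2014-01-27 17:12 fxjwind Views (700) Comments (0) Recommendations (0)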

Apache Samza - Reliable Stream Processing atop Apache Kafka and Hadoop YARN
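Summary: http://engineering.linkedin.com/data-streams/apache-samza-linkedins-real-time-stream-processing-framework http://samza.incubator.apache.org/ I have been using Kafka for the past two years. Although Kafka has always been described as usable for online analysis, in practice many problems come up, such as deployment, scheduling, and failover, and we did some work of our own to address them. Samza fills in these gaps and makes online analysis on Kafka simpler, so it feels familiar. 1 Background First, regarding me

posted @ 2014-01-14 13:58 fxjwind Views (1555) Comments (0) Recommendations (0)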

Sparrow - Distributed, Low Latency Scheduling
Summary: http://www.cs.berkeley.edu/~matei/papers/2013/sosp_sparrow.pdf http://www.eecs.berkeley.edu/~keo/talks/sparrow-sosp-talk.pdf The problem addressed: existing scheduler designs are all master-based, because scheduling has to know the state of every slave before it can decide...

posted @ 2014-01-14 13:41 fxjwind Views (1370) Comments (0) Recommendations (0)

The Log: What every software engineer should know about real-time data's unifying abstraction
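Summary: http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying The main idea: any system can be viewed as two parts, the actual data log system and the various query engines. All consistency is guaranteed by the log system; the query engines need not worry about consistency or safety, and only have to keep syncing data from the log system. If data is lost or an engine crashes, it can recover by replaying from the log system. From this one can see that the Kafka system at linke...

posted @ 2013-12-18 10:41 fxjwind Views (1455) Comments (2) Recommendations (0)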
