Summary: client.submit(new PutCommand("foo", "Hello world!")); ServerContext connection.handler(CommandRequest.class, request -> state.command(request)); State.command: starting from ReserveState, the command is forwarded to the leade...
Summary: A server can be started in two ways. Address address = new Address("123.456.789.0", 5000); CopycatServer.Builder builder = CopycatServer.builder(address); builder.withStateMachine(MapStateMachine::new); Bootstrap a cluster yourself, ...
Summary: Copycat’s primary role is as a framework for building highly consistent, fault-tolerant replicated state machines. Copycat servers receive state machi
Summary: https://github.com/atomix/copycat http://atomix.io/copycat/docs/membership/ To simplify implementation, Copycat divides members into three types: active, passive, and reserve members, each of which plays some role in supporting rapid ...
Summary: F1: A Distributed SQL Database That Scales http://disksing.com/understanding-f1-schema-change mark
Summary: http://apex.apache.org/docs.html https://apex.apache.org/docs/apex/application_development/
Summary: https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison https://github.com/apache/incubator-beam https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 https://www.ore...
Summary: Amazon’s Dynamo [9] and Facebook’s Cassandra [13], relax the consistency model, and offer only eventual consistency. Others such as HBase [1] and BigTable [4] offer strong consistency only for operat...
Summary: Kudu White Paper http://www.cloudera.com/documentation/betas/kudu/0-5-0/topics/kudu_resources.html http://getkudu.io/overview.html Kudu is a new stora
Summary: http://thesecretlivesofdata.com/raft/ https://github.com/coreos/etcd 1 Introduction Consensus algorithms allow a collection of machines to work as a coherent group that can survive the failure...
Summary: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102 The first question this article has to settle is what 'Streaming' properly means. What is streaming? The crux...
Summary: http://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/41378.pdf Why build MillWheel? Because the other streaming systems available today cannot meet the requirements of fault tolerance, versatility, and scalability at the same time. Spark Streaming...
Summary: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. This paper's subtitle is long, and it tells us several things: 1. The main work of the paper is Balancing Correctness, Latency, and Cost, so it still cannot...
Summary: What is Siddhi? Siddhi is a lightweight, easy-to-use, open source CEP (Complex Event Processing) engine developed by WSO2 (http://wso2.com/about/). Like the vast majority of CEP systems, Siddhi ...
Summary: References: http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/ http://www.confluent.io/blog/real-time-stream-processing-the-next-step-for-apache-flink...
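Summary: Background. Compared with traditional batch analysis platforms such as Hadoop, the advantage of stream analysis is timeliness: results are available with latency on the order of seconds. The drawback is that strong consistency, i.e. Exactly-Once semantics, is hard to guarantee (at massive data volumes, transaction-like strong-consistency schemes cannot be used if throughput is to be preserved). Stream analysis platforms generally promise a weaker consistency, At-Least-Once semantics, which guarantees that no data is lost but allows duplicates. But this only holds under normal...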
Summary: A cleanup and refresh of the Big Data papers and blog posts I have read and taken notes on. Streaming & Spark: In-Stream Big Data Processing; Discretized Streams (discretized stream data processing); Spark - A Fault-Tolerant Abstraction for In-Memory Cluster Computing; Mesos: A ...
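Summary: http://engineering.linkedin.com/data-streams/apache-samza-linkedins-real-time-stream-processing-framework http://samza.incubator.apache.org/ I have been using Kafka for the past couple of years. Kafka has always claimed to be usable for online analysis, but in practice many problems come up, such as deployment, scheduling, and failover, and we did some work of our own on them. Samza fills these gaps and makes online analysis on Kafka simpler, so it feels familiar. 1 Background First, regarding me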
Summary: http://www.cs.berkeley.edu/~matei/papers/2013/sosp_sparrow.pdf http://www.eecs.berkeley.edu/~keo/talks/sparrow-sosp-talk.pdf The problem addressed: existing scheduler designs are all master-based, because scheduling needs to know the state of every slave before it can decide...
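Summary: http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying The main idea: every system can be viewed as two parts, the log system that holds the real data and the various query engines built on top of it. All consistency is guaranteed by the log system. The query engines do not need to worry about consistency or safety; they only need to keep syncing data from the log system, and if data is lost or an engine crashes, it can recover by replaying from the log system. This shows that the kafka system at linke...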