Big data - Post category (page 2) - fxjwind

Copycat - command

Summary: client.submit(new PutCommand("foo", "Hello world!")); ServerContext: connection.handler(CommandRequest.class, request -> state.command(request)); State.command: starting from ReserveState, the command is forwarded to the leade...

posted @ 2017-03-01 17:18 fxjwind Views(277) Comments(0) Recommended(0)
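The forwarding path in the summary above can be pictured with a small self-contained model. This is a hypothetical sketch, not Copycat's code: `Server`, `Role`, and `MapStateMachine` are illustrative names. It only shows the routing idea: a non-leader state hands the command to the leader, and the leader applies it. Raft replication and commit are left out, so the leader applies the command immediately and followers never apply it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the command forwarding path; none of these types are Copycat's.
final class CommandForwardingSketch {

    enum Role { RESERVE, PASSIVE, FOLLOWER, LEADER }

    // Loosely modeled on the PutCommand("foo", "Hello world!") from the summary.
    record PutCommand(String key, String value) {}

    // Map-backed state machine: applying a put returns the previous value (or null).
    static final class MapStateMachine {
        private final Map<String, String> map = new HashMap<>();

        String apply(PutCommand c) {
            return map.put(c.key(), c.value());
        }

        String get(String key) {
            return map.get(key);
        }
    }

    static final class Server {
        final Role role;
        final Server leader; // null when this server is itself the leader
        final MapStateMachine stateMachine = new MapStateMachine();

        Server(Role role, Server leader) {
            this.role = role;
            this.leader = leader;
        }

        // Plays the part of State.command: non-leaders forward, the leader applies.
        // Replication and commit are omitted. In a real Raft system followers would
        // also apply the entry once it is committed.
        String command(PutCommand c) {
            if (role != Role.LEADER) {
                return leader.command(c);
            }
            return stateMachine.apply(c);
        }
    }

    public static void main(String[] args) {
        Server leader = new Server(Role.LEADER, null);
        Server reserve = new Server(Role.RESERVE, leader);
        reserve.command(new PutCommand("foo", "Hello world!"));
        System.out.println(leader.stateMachine.get("foo")); // Hello world!
    }
}
```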

Copycat - CopycatServer

Summary: There are two ways to start a Server: Address address = new Address("123.456.789.0", 5000); CopycatServer.Builder builder = CopycatServer.builder(address); builder.withStateMachine(MapStateMachine::new); Bootstrapping a new cluster on its own, ...

posted @ 2017-02-24 16:53 fxjwind Views(504) Comments(0) Recommended(0)

Copycat - Overview

Summary: Copycat’s primary role is as a framework for building highly consistent, fault-tolerant replicated state machines. Copycat servers receive state machi...

posted @ 2017-02-23 14:34 fxjwind Views(668) Comments(0) Recommended(0)

Copycat - MemberShip

Summary: https://github.com/atomix/copycat http://atomix.io/copycat/docs/membership/ To simplify the implementation, Copycat divides members into three types: active, passive, and reserve members, each of which plays some role in supporting rapid ...

posted @ 2017-02-20 18:00 fxjwind Views(610) Comments(0) Recommended(0)
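Here is a rough model of the three member types. It is based on our reading of the Copycat membership docs linked above and is an assumption, not Copycat's API. In this model, active members vote and store the log, passive members store replicated state without voting, and reserve members are stateless standbys that can be promoted. The point it illustrates is that adding passive or reserve members does not change the quorum, because only voting members count.

```java
import java.util.List;

// Hypothetical model of Copycat's member types; not Copycat's actual classes.
final class MembershipSketch {

    enum MemberType {
        ACTIVE(true, true),    // full Raft voter that stores the log
        PASSIVE(false, true),  // stores replicated state but does not vote
        RESERVE(false, false); // stateless standby, candidate for promotion

        final boolean votes;
        final boolean storesLog;

        MemberType(boolean votes, boolean storesLog) {
            this.votes = votes;
            this.storesLog = storesLog;
        }

        // One promotion step in this model: RESERVE -> PASSIVE -> ACTIVE.
        MemberType promote() {
            return switch (this) {
                case RESERVE -> PASSIVE;
                case PASSIVE, ACTIVE -> ACTIVE;
            };
        }
    }

    // Only voting members count toward the Raft majority.
    static int quorumSize(List<MemberType> members) {
        long voters = members.stream().filter(m -> m.votes).count();
        return (int) (voters / 2 + 1);
    }

    public static void main(String[] args) {
        List<MemberType> cluster = List.of(
                MemberType.ACTIVE, MemberType.ACTIVE, MemberType.ACTIVE,
                MemberType.PASSIVE, MemberType.RESERVE);
        System.out.println(quorumSize(cluster));          // 2
        System.out.println(MemberType.RESERVE.promote()); // PASSIVE
    }
}
```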

Online, Asynchronous Schema Change in F1

Summary: F1: A Distributed SQL Database That Scales http://disksing.com/understanding-f1-schema-change (bookmarked)

posted @ 2016-12-27 17:15 fxjwind Views(563) Comments(0) Recommended(0)

Apache Apex

Summary: http://apex.apache.org/docs.html https://apex.apache.org/docs/apex/application_development/

posted @ 2016-07-07 17:09 fxjwind Views(344) Comments(0) Recommended(0)

Why Apache Beam? A data Artisans perspective

Summary: https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison https://github.com/apache/incubator-beam https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 https://www.ore...

posted @ 2016-06-30 18:20 fxjwind Views(761) Comments(0) Recommended(0)

HybridTime - Accessible Global Consistency with High Clock Uncertainty

Summary: Amazon’s Dynamo [9] and Facebook’s Cassandra [13] relax the consistency model and offer only eventual consistency. Others, such as HBase [1] and BigTable [4], offer strong consistency only for operat...

posted @ 2016-05-09 20:54 fxjwind Views(954) Comments(0) Recommended(0)

kudu

Summary: Kudu White Paper http://www.cloudera.com/documentation/betas/kudu/0-5-0/topics/kudu_resources.html http://getkudu.io/overview.html Kudu is a new stora...

posted @ 2016-04-26 11:35 fxjwind Views(2573) Comments(0) Recommended(1)

Raft

Summary: http://thesecretlivesofdata.com/raft/ https://github.com/coreos/etcd 1 Introduction: Consensus algorithms allow a collection of machines to work as a coherent group that can survive the failure...

posted @ 2016-03-29 20:01 fxjwind Views(977) Comments(0) Recommended(0)
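In Raft, the ability to "survive the failure" of some machines comes from majority quorums. The sketch below uses hypothetical helper names and computes three things. A cluster of n voters needs floor(n/2)+1 votes. It tolerates floor((n-1)/2) failures. A leader can treat the highest log index stored on a majority of servers as replicated. Raft adds a rule that a leader commits entries this way only if they come from its current term; that rule is omitted here.

```java
import java.util.Arrays;

// Hypothetical helpers illustrating Raft's majority arithmetic.
final class RaftQuorumSketch {

    // Votes needed out of n voting servers: a strict majority.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // Failures a cluster of n servers can survive while a majority stays up.
    static int toleratedFailures(int n) {
        return (n - 1) / 2;
    }

    // matchIndex[i] is the highest log index known to be stored on server i (leader included).
    // The largest index stored on at least a majority is the quorum-th largest value.
    // Raft also requires that entry to belong to the leader's current term before
    // committing it; that check is omitted here.
    static long majorityReplicatedIndex(long[] matchIndex) {
        long[] sorted = matchIndex.clone();
        Arrays.sort(sorted); // ascending
        return sorted[sorted.length - quorum(sorted.length)];
    }

    public static void main(String[] args) {
        System.out.println(quorum(5));            // 3
        System.out.println(toleratedFailures(5)); // 2
        System.out.println(majorityReplicatedIndex(new long[] {7, 5, 9, 5, 3})); // 5
    }
}
```

With five servers, index 5 is stored on four of them, but index 7 is stored on only two. So 5 is the highest index a majority holds.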

The world beyond batch: Streaming 101

Summary: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102 The first question this article has to settle is what "Streaming" should properly mean. What is streaming? The crux...

posted @ 2016-02-23 19:49 fxjwind Views(958) Comments(0) Recommended(0)

MillWheel: Fault-Tolerant Stream Processing at Internet Scale

Summary: http://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/41378.pdf Why build MillWheel? Because none of the existing stream processing systems could simultaneously meet the requirements of fault tolerance, versatility, and scalability. Spark Streaming...

posted @ 2016-02-22 19:48 fxjwind Views(1402) Comments(0) Recommended(0)

The Dataflow Model paper

Summary: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. The paper's subtitle is long, and it tells us several things: 1. The main contribution of the paper is Balancing Correctness, Latency, and Cost, so it still cannot...

posted @ 2016-01-12 14:02 fxjwind Views(3854) Comments(8) Recommended(3)

Giving Storm the Wings of CEP - Evaluating and Integrating Siddhi

Summary: What is Siddhi? Siddhi is a lightweight, easy-to-use, open source CEP (Complex Event Processing) engine developed by wso2 (http://wso2.com/about/). Like most CEP systems, Siddhi supp...

posted @ 2015-12-15 16:03 fxjwind Views(9533) Comments(0) Recommended(0)

On Consistency in Stream Processing

Summary: References: http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/ http://www.confluent.io/blog/real-time-stream-processing-the-next-step-for-apache-flink...

posted @ 2015-11-18 17:59 fxjwind Views(3741) Comments(0) Recommended(0)

How to Guarantee Data Consistency in Stream Processing

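Summary: Background: Compared with traditional batch analytics platforms such as Hadoop, the advantage of stream analytics is timeliness: results are available with latency on the order of seconds. The drawback is that strong consistency, i.e. Exactly-Once semantics, is hard to guarantee. With massive data volumes, transaction-like strong-consistency schemes cannot be used without sacrificing throughput. Stream analytics platforms therefore usually promise a weaker guarantee, At-Least-Once semantics, which ensures no data is lost but allows duplicates. But this only holds under normal...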

posted @ 2015-07-30 15:55 fxjwind Views(1844) Comments(0) Recommended(0)
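A common way to get exactly-once effects on top of at-least-once delivery is to make processing idempotent, for example by deduplicating on a unique event id. The sketch below assumes each event carries such an id; the `Event` record and `DedupingCounter` class are hypothetical. In a real system, the set of seen ids would have to be persisted atomically with the state it protects and kept bounded in size. This in-memory version ignores both requirements.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: deduplication turns at-least-once delivery into exactly-once effects.
final class DedupSketch {

    // Assumed event shape: the id is unique per logical event, so redeliveries share it.
    record Event(String id, long amount) {}

    // Sums amounts, ignoring redelivered copies of events it has already seen.
    static final class DedupingCounter {
        private final Set<String> seen = new HashSet<>();
        private long total;

        void process(Event e) {
            // Set.add returns false when the id is already present, i.e. a duplicate.
            if (seen.add(e.id())) {
                total += e.amount();
            }
        }

        long total() {
            return total;
        }
    }

    public static void main(String[] args) {
        // "a" is delivered twice, as can happen after a retry under at-least-once delivery.
        List<Event> delivered = List.of(new Event("a", 10), new Event("b", 5), new Event("a", 10));
        DedupingCounter counter = new DedupingCounter();
        delivered.forEach(counter::process);
        System.out.println(counter.total()); // 15, not 25
    }
}
```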

Big Data Reading List

Summary: A cleaned-up and refreshed list of the Big Data papers and blog posts I have read and taken notes on. Streaming & Spark: In-Stream Big Data Processing; Discretized Streams (discretized stream processing); Spark - A Fault-Tolerant Abstraction for In-Memory Cluster Computing; Mesos: A ...

posted @ 2014-01-27 17:12 fxjwind Views(700) Comments(0) Recommended(0)

Apache Samza - Reliable Stream Processing atop Apache Kafka and Hadoop YARN

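Summary: http://engineering.linkedin.com/data-streams/apache-samza-linkedins-real-time-stream-processing-framework http://samza.incubator.apache.org/ We have been using Kafka for the past two years. Kafka has always claimed to be usable for online analysis, but in practice we ran into many problems, such as deployment, scheduling, and failover, and we did some work of our own to address them. Samza essentially fills in these gaps and makes online analysis on top of Kafka much simpler, so it feels quite familiar to us. 1 Background: First, regarding me...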

posted @ 2014-01-14 13:58 fxjwind Views(1555) Comments(0) Recommended(0)

Sparrow - Distributed, Low Latency Scheduling

Summary: http://www.cs.berkeley.edu/~matei/papers/2013/sosp_sparrow.pdf http://www.eecs.berkeley.edu/~keo/talks/sparrow-sosp-talk.pdf The problem addressed: existing scheduler designs are all master-based, because a scheduler must know the state of every slave before it can decide...

posted @ 2014-01-14 13:41 fxjwind Views(1370) Comments(0) Recommended(0)
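The Sparrow paper answers the master bottleneck with decentralized randomized sampling. Instead of tracking every worker, a scheduler probes a few random workers for each job. The sketch below shows batch sampling: for a job of m tasks, probe d·m workers and place the tasks on the least loaded ones. The paper uses d = 2. The method name `place` and the use of queue length as the load signal are assumptions. Sparrow itself refines this with late binding, where probes enqueue reservations and workers fetch tasks only when ready; that refinement is not modeled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of batch sampling; not Sparrow's code.
final class BatchSamplingSketch {

    // Returns the worker indices chosen for a job of m tasks, one task per worker.
    static List<Integer> place(int[] queueLengths, int m, int d, Random rng) {
        int n = queueLengths.length;
        // Probe d*m distinct random workers (or all of them if the cluster is small).
        int probes = Math.min(n, d * m);
        List<Integer> workers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            workers.add(i);
        }
        Collections.shuffle(workers, rng);
        List<Integer> probed = new ArrayList<>(workers.subList(0, probes));
        // Least loaded probed workers first.
        probed.sort(Comparator.comparingInt(w -> queueLengths[w]));
        // If m exceeds the number of workers, only n workers are returned.
        return new ArrayList<>(probed.subList(0, Math.min(m, probes)));
    }

    public static void main(String[] args) {
        // With 4 workers, m = 2 and d = 2, every worker is probed, so the two shortest queues win.
        System.out.println(place(new int[] {5, 0, 3, 1}, 2, 2, new Random(42))); // [1, 3]
    }
}
```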

The Log: What every software engineer should know about real-time data's unifying abstraction

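Summary: http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying The main idea: every system can be viewed as two parts, the log system that holds the real data and a variety of query engines. All consistency is guaranteed by the log system. The query engines do not need to worry about consistency or safety; they just keep syncing data from the log, and if data is lost or an engine crashes, it can recover by replaying from the log. One can see that Kafka at Linke...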

posted @ 2013-12-18 10:41 fxjwind Views(1455) Comments(2) Recommended(0)
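The log-centric idea above fits in a few lines of code. An append-only log is the single source of truth. A query engine (here a hypothetical key-value view) records the offset it has consumed. It can then catch up incrementally or, after a crash, rebuild its state by replaying from offset 0. All class names are illustrative; this is not Kafka's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a log as the source of truth, with a view rebuilt by replay.
final class LogReplaySketch {

    record Entry(String key, String value) {}

    // Append-only log; an entry's offset is its position in the list.
    static final class Log {
        private final List<Entry> entries = new ArrayList<>();

        long append(Entry e) {
            entries.add(e);
            return entries.size() - 1;
        }

        List<Entry> readFrom(long offset) {
            return entries.subList((int) offset, entries.size());
        }
    }

    // A "query engine" that materializes a key-value view by consuming the log in order.
    static final class KeyValueView {
        private final Map<String, String> view = new HashMap<>();
        private long nextOffset = 0;

        void catchUp(Log log) {
            for (Entry e : log.readFrom(nextOffset)) {
                view.put(e.key(), e.value());
                nextOffset++;
            }
        }

        String get(String key) {
            return view.get(key);
        }
    }

    public static void main(String[] args) {
        Log log = new Log();
        log.append(new Entry("a", "1"));
        log.append(new Entry("a", "2"));
        KeyValueView view = new KeyValueView();
        view.catchUp(log);
        // Simulate a crash: a fresh view replays from offset 0 and reaches the same state.
        KeyValueView rebuilt = new KeyValueView();
        rebuilt.catchUp(log);
        System.out.println(view.get("a") + " " + rebuilt.get("a")); // 2 2
    }
}
```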
