Summary: http://tez.incubator.apache.org/ http://dongxicheng.org/mapreduce-nextgen/apache-tez/ http://dongxicheng.org/mapreduce-nextgen/apache-tez-newest-progress/ Tez aims to be a general purpose execut...
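Summary: http://www.socc2013.org/home/program http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-yarn/ Problems with Hadoop V1.0: Hadoop was invented to index massive web crawls, so it fits that scenario well, but Hadoop is now treated as a general-purpose computing platform, and this has already...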
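Summary: Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters. Current stream processing systems, such as Yahoo!'s S4 and Twitter's Storm, all use the traditional "record-at-a-time" processing model: when a record arrives, the system either updates state or produces new...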
Summary: http://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/ Overview In recent years, this idea got a lot of traction and a whole bunch of solutions like Twitter’s Storm, Yahoo’...
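Summary: Probabilistic Data Structures for Web Analytics and Data Mining. Big data workloads often require the following kinds of queries and statistics. Cardinality Estimation: the number of distinct elements in a set, for example unique visitors (Unique Vi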
Summary: Paper: Megastore: Providing Scalable, Highly Available Storage for Interactive Services http://blog.sciencenet.cn/blog-449420-444736.html 1. INTRODUCTION Interactive online services are forcing ...
Summary: The Chubby lock service for loosely-coupled distributed systems http://research.google.com/archive/chubby-osdi06.pdf http://blog.sina.com.cn/s/blog_5eb8ebcb0101dkvj.html http://blog.csdn.net/histor...
Summary: A brief history of Consensus_ 2PC and Transaction Commit (translated): a good survey of the consistency problem. Time, Clocks and the Ordering of Events in a Distributed System (translated), by Leslie Lamport: partial order and total order. Lamport's “Time, Clocks and th...
Summary: The Part-Time Parliament, Lamport, 1998, ACM Transactions on Computer Systems. The obscure original paper: http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf Paxos Made Simple http://www.cs.utexas...
Summary: http://horicky.blogspot.com/2009/11/nosql-patterns.html A brief history of Consensus_ 2PC and Transaction Commit (translated): a good survey of the consistency problem. 2 Phase Commit (translated). Master Slave (or Single Master) Model Under t...
Summary: http://spark-project.org/ project home page http://shark.cs.berkeley.edu/ Shark project home page. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Why is Spark needed? There are already quite a few compu...
Summary: http://incubator.apache.org/mesos/research.html, Mesos. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. Why is Mesos needed? There are more and more compute frameworks, and each framework has its own suitable scenarios, strengths, and weaknesses...
Summary: http://the-paper-trail.org/blog/columnar-storage/ You’re going to hear a lot about columnar storage formats in the next few months, as a variety of distributed execution engines are beginning to c...
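Summary: http://kafka.apache.org/07/design.html Chinese version of the design document: http://www.oschina.net/translate/kafka-design Overview. Use cases for activity stream and operational data. "News feed" features: broadcasting your friends' various activities to you. Relevance and ranking: using count ratings, votes, or click-through to determine which item in a given set is most relevant. Security: sites need to block misbehaving web crawlers (crawl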
Summary: Kafka. References: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf http://incubator.apache.org/kafka http://prezi.com/sj433kkfzckd/kafka-bringing-reliable-stream-processing-to-a-cold-dark-world/ http://sna-projects.com/blog/2011/08/kafka/ http://sna-proj
Summary: Why? Relational databases are still the main choice for the primary data store. Relational Databases have been around for a long time and have become a trusted storage medium for all of a company's data. The traditional data warehouse approach of ETL and OLAP: Data is pulled off this primary data store, transformed, and then stored in a secondary data store, such as a...
Summary: http://highscalability.com/blog/2010/8/4/dremel-interactive-analysis-of-web-scale-datasets-data-as-a.html http://www.yankay.com/google-dremel-rational
Summary: http://esper.codehaus.org/tutorials/tutorial/tutorial.html http://esper.codehaus.org/esper-4.6.0/doc/reference/en-US/html/index.html http://www.slideshare.net/hemapani/siddhi-a-second-look-at-complex-event-processing-implementations Esper Reference Version 4.6.0 1.1. Introduction to CEP and event stream
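Summary: The Google File System http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.125.789&rep=rep1&type=pdf http://www.dbthink.com/?p=501 (Chinese translation). Google is a place full of brilliant engineers, yet when designing systems it is very pragmatic and does not adopt complicated or fashionable algorithms and mechanisms ...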
Summary: Bigtable: A Distributed Storage System for Structured Data http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/bigtable-osdi06.pdf http://www.dbthink....