Post category - Big data

Apache Tez Design
Summary: http://tez.incubator.apache.org/ http://dongxicheng.org/mapreduce-nextgen/apache-tez/ http://dongxicheng.org/mapreduce-nextgen/apache-tez-newest-progress/ Tez aims to be a general purpose execut...

posted @ 2013-10-19 11:45 fxjwind Views (1826) Comments (0) Recommendations (0)

YARN - Yet Another Resource Negotiator
Summary: http://www.socc2013.org/home/program http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-yarn/ Problems with Hadoop V1.0: Hadoop was invented to index massive web crawls, so it suits that scenario well, but Hadoop is now used as a general-purpose compute platform, which already...

posted @ 2013-10-18 11:11 fxjwind Views (941) Comments (0) Recommendations (1)

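Discretized Streams: Discretized Stream Data Processing
Summary: Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters. Current stream processing systems, such as Yahoo!'s S4 and Twitter's Storm, use the traditional "record-at-a-time" processing model: on receiving a record, they either update state or produce new...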

posted @ 2013-09-22 15:42 fxjwind Views (1725) Comments (0) Recommendations (1)

In-Stream Big Data Processing
Summary: http://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/ Overview In recent years, this idea got a lot of traction and a whole bunch of solutions like Twitter’s Storm, Yahoo’...

posted @ 2013-08-30 17:58 fxjwind Views (1749) Comments (0) Recommendations (0)

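Probabilistic Data Structures in Big Data Processing
Summary: Probabilistic Data Structures for Web Analytics and Data Mining. Big data workloads often need the following queries and statistics. Cardinality estimation: the number of distinct elements in a set, for example unique visitors (Unique Vi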

posted @ 2013-08-29 15:21 fxjwind Views (10956) Comments (4) Recommendations (1)

Megastore - Providing Scalable, Highly Available Storage for Interactive Services
Summary: Paper: Megastore: Providing Scalable, Highly Available Storage for Interactive Services http://blog.sciencenet.cn/blog-449420-444736.html 1. INTRODUCTION Interactive online services are forcing ...

posted @ 2013-04-27 16:31 fxjwind Views (900) Comments (0) Recommendations (0)

Chubby - lock service for loosely-coupled distributed systems
Summary: The Chubby lock service for loosely-coupled distributed systems http://research.google.com/archive/chubby-osdi06.pdf http://blog.sina.com.cn/s/blog_5eb8ebcb0101dkvj.html http://blog.csdn.net/histor...

posted @ 2013-04-27 14:09 fxjwind Views (2215) Comments (0) Recommendations (0)

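Total Order, the Essence of Distributed Consistency
Summary: A brief history of Consensus_ 2PC and Transaction Commit (translation): a very good survey of the consistency problem. Time Clocks and the Ordering of Events in a Distributed System (translation), by Leslie Lamport. Partial order and total order: Lamport's "Time, Clocks and th...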

posted @ 2013-04-13 11:24 fxjwind Views (3462) Comments (2) Recommendations (1)

Paxos Made Simple
Summary: The Part-Time Parliament, Lamport, 1998, ACM Transactions on Computer Systems. The obscure original paper: http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf Paxos Made Simple http://www.cs.utexas...

posted @ 2013-04-11 15:45 fxjwind Views (1164) Comments (0) Recommendations (0)

Strong Consistency: An Overview of Strong Consistency Techniques
Summary: http://horicky.blogspot.com/2009/11/nosql-patterns.html A brief history of Consensus_ 2PC and Transaction Commit (translation): a very good survey of the consistency problem. 2 Phase Commit (translation). Master Slave (or Single Master) Model: Under t...

posted @ 2013-04-03 17:36 fxjwind Views (1769) Comments (0) Recommendations (0)

Spark - A Fault-Tolerant Abstraction for In-Memory Cluster Computing
Summary: http://spark-project.org/ (project home page) http://shark.cs.berkeley.edu/ (Shark project home page) Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Why do we need Spark? There are already quite a few compu...

posted @ 2013-03-30 14:46 fxjwind Views (2121) Comments (1) Recommendations (0)

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
Summary: http://incubator.apache.org/mesos/research.html, Mesos. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. Why do we need Mesos? There are more and more compute frameworks, and each framework has its own suitable scenarios, strengths, and weaknesses...

posted @ 2013-03-27 16:55 fxjwind Views (1034) Comments (0) Recommendations (0)

Columnar Storage
Summary: http://the-paper-trail.org/blog/columnar-storage/ You’re going to hear a lot about columnar storage formats in the next few months, as a variety of distributed execution engines are beginning to c...

posted @ 2013-03-26 17:31 fxjwind Views (1162) Comments (0) Recommendations (0)

Linkedin Kafka Design
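Summary: http://kafka.apache.org/07/design.html Chinese version of the design document: http://www.oschina.net/translate/kafka-design Overview. Use cases for activity stream and operational data. "News feed" features: broadcasting your friends' various activities to you. Relevance and ranking: using count ratings, votes, or click-through to determine which item in a given set is most relevant. Security: sites need to block abusive web crawlers (crawl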

posted @ 2013-03-22 14:45 fxjwind Views (1602) Comments (0) Recommendations (0)

Kafka: a Distributed Messaging System for Log Processing
Summary: Kafka. References: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf http://incubator.apache.org/kafka http://prezi.com/sj433kkfzckd/kafka-bringing-reliable-stream-processing-to-a-cold-dark-world/ http://sna-projects.com/blog/2011/08/kafka/ http://sna-proj

posted @ 2013-03-19 17:45 fxjwind Views (2114) Comments (0) Recommendations (0)

Linkedin Databus
Summary: Why? Relational databases remain the main choice for the primary data store. Relational Databases have been around for a long time and have become a trusted storage medium for all of a company's data. The traditional data warehouse approach of ETL and OLAP: Data is pulled off this primary data store, transformed, and then stored in a secondary data store, such as a...

posted @ 2013-03-05 18:17 fxjwind Views (2750) Comments (0) Recommendations (1)

Dremel - Interactive Analysis of WebScale Datasets
Summary: http://highscalability.com/blog/2010/8/4/dremel-interactive-analysis-of-web-scale-datasets-data-as-a.html http://www.yankay.com/google-dremel-rational

posted @ 2012-11-21 17:33 fxjwind Views (1170) Comments (4) Recommendations (0)

Esper Storm S4
Summary: http://esper.codehaus.org/tutorials/tutorial/tutorial.html http://esper.codehaus.org/esper-4.6.0/doc/reference/en-US/html/index.html http://www.slideshare.net/hemapani/siddhi-a-second-look-at-complex-event-processing-implementations Esper Reference Version 4.6.0 1.1. Introduction to CEP and event stream

posted @ 2012-08-04 16:50 fxjwind Views (1995) Comments (0) Recommendations (0)

GFS - The Google File System
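Summary: The Google File System http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.125.789&rep=rep1&type=pdf http://www.dbthink.com/?p=501 (Chinese translation). Google is full of brilliant engineers, yet its system design is very pragmatic and avoids complex or fashionable algorithms and mechanisms ...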

posted @ 2012-07-17 17:00 fxjwind Views (9375) Comments (0) Recommendations (0)

bigtable: A Distributed Storage System for Structured Data
Summary: bigtable: A Distributed Storage System for Structured Data http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/bigtable-osdi06.pdf http://www.dbthink....

posted @ 2012-07-07 17:46 fxjwind Views (2600) Comments (0) Recommendations (0)
