Storm vs. Yahoo! S4: A Comparison
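Hadoop keeps growing in popularity, but it was designed for static data and batch jobs, so stream processing is awkward to implement on top of it. The most widely used distributed stream processing frameworks at present are Storm and S4. Each has its own strengths. The comparison below, collected from online sources, is meant to help choose the right framework for a given task or requirement.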
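To make the batch versus stream distinction concrete, here is a minimal sketch in plain Python. It does not use the Hadoop, Storm, or S4 APIs; the function names `batch_word_count` and `streaming_word_count` are made up for illustration. The batch version needs the complete data set before it produces a result, while the streaming version emits an updated result after every incoming record.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def batch_word_count(lines: List[str]) -> Dict[str, int]:
    """Batch style: the whole data set is available before processing starts."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    # A single result is produced only after all input has been read.
    return dict(counts)


def streaming_word_count(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Stream style: the input may never end, so results are emitted continuously."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
        # Emit a snapshot of the running counts after each record.
        yield dict(counts)
```

With a bounded input both functions end up with the same final counts; the difference is that the streaming version has a usable answer after every record, which is the property that frameworks such as Storm and S4 provide in a distributed setting.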
1. Major open-source big data solutions today
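Solution | Developer | Type | Description |
---|---|---|---|
Storm | Twitter | Streaming | Twitter's new streaming big data analytics solution |
S4 | Yahoo! | Streaming | Distributed stream computing platform from Yahoo! |
Hadoop | Apache | Batch | First open-source implementation of the MapReduce paradigm |
Spark | UC Berkeley AMPLab | Batch | Recent analytics platform with support for in-memory data sets and resiliency |
Disco | Nokia | Batch | Nokia's distributed MapReduce framework |
HPCC | LexisNexis | Batch | HPC cluster for big data |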
PS: An overview of current big data processing frameworks: http://pan.baidu.com/share/link?shareid=828559866&uk=2248644272
An introduction to Spark: http://blog.csdn.net/dellme99/article/details/17076045
2. Main differences
Summary.
There are many other differences, but for the sake of brevity I just present a short summary of the pros of each platform that the other one lacks.
S4 pros:
- Clean programming model.
- State recovery.
- Inter-app communication.
- Classpath isolation.
- Tools for packaging and deployment.
- Apache incubation.
Storm pros:
- Pull model.
- Guaranteed processing.
- More mature, more traction, larger community.
- High performance.
- Thread programming support.
- Advanced features (transactional topologies, Trident).
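The "pull model" and "guaranteed processing" items in the Storm list can be illustrated with a small sketch. This is plain Python, not the Storm API: the class `ReplayingSource` and the function `run` are hypothetical, and the method names `next_tuple`, `ack`, and `fail` are only loosely modeled on the callbacks of a Storm spout. The idea shown is at-least-once delivery: the consumer pulls a tuple when it is ready, and the source keeps every tuple until it is acknowledged, replaying it on failure.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple


class ReplayingSource:
    """Hypothetical source that remembers every tuple until it is acknowledged."""

    def __init__(self, items: List[str]) -> None:
        self._items = items
        self._queue: Deque[int] = deque(range(len(items)))  # message ids to emit
        self._pending: Dict[int, str] = {}  # emitted but not yet acked

    def next_tuple(self) -> Optional[Tuple[int, str]]:
        """Pull model: the consumer asks for the next tuple when it is ready."""
        if not self._queue:
            return None
        msg_id = self._queue.popleft()
        self._pending[msg_id] = self._items[msg_id]
        return msg_id, self._items[msg_id]

    def ack(self, msg_id: int) -> None:
        """The tuple was fully processed, so it can be forgotten."""
        self._pending.pop(msg_id)

    def fail(self, msg_id: int) -> None:
        """Processing failed, so the tuple is queued again for replay."""
        self._pending.pop(msg_id)
        self._queue.append(msg_id)


def run(source: ReplayingSource, process: Callable[[str], None]) -> None:
    """Drive the source until every tuple has been acknowledged.

    Note: a tuple that always fails would be replayed forever in this sketch.
    """
    while True:
        pulled = source.next_tuple()
        if pulled is None:
            break
        msg_id, value = pulled
        try:
            process(value)
            source.ack(msg_id)
        except Exception:
            source.fail(msg_id)
```

Because a failed tuple is replayed rather than dropped, every tuple is processed at least once, at the cost of possible reordering and duplicate side effects if `process` fails partway through.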
3. Storm is just awesome, with a well-chosen blend of open source technologies in its architecture. Writing high-performance real-time distributed applications is much easier on Storm than on S4.
References:
[1]:http://www.ibm.com/developerworks/cn/opensource/os-twitterstorm/
[2]:http://gdfm.me/2013/01/02/distributed-stream-processing-showdown-s4-vs-storm/