feisky

Notes on cloud computing, virtualization, and Linux

Amazon EBS: Features, Failure, and Design

Posted on 2012-03-29 10:30  feisky
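Amazon EBS stands for Elastic Block Store. It provides reliable block-device storage for EC2. To the user, an EBS volume is simply a disk on which databases, file systems, and other applications can be installed. Amazon RDS is one major service built on EBS.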

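EBS features
  • EBS is a reliable storage device whose data is replicated within a single Availability Zone (AZ). Its AFR (annual failure rate) is 0.1% to 0.5%, lower than the roughly 4% of an ordinary disk.
  • Point-in-time snapshots can be created online. Snapshots are implemented with copy-on-write, so only blocks modified after the snapshot actually consume space, and snapshot data is stored compressed in S3. A volume can be created from a snapshot with lazy loading: a block is loaded from S3 only when it is actually accessed.
  • The biggest drawback of EBS is that it offers no performance SLA. EBS uses 1 Gbps Ethernet and low-end storage hardware, and many users share the available IOPS, so they inevitably interfere with one another. Observed EBS performance:
    • Response time is generally below 100 ms (see the svctm column of iostat).
    • Read throughput averages about 40 MB/s but fluctuates widely; anything from 20 MB/s to 80 MB/s is possible.
    • IOPS ranges from about 200 to 7000; a forum reply said that even 78 counts as within the normal range.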
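To put the failure-rate figures in perspective, here is a small worked example. It assumes volume failures are independent, which is a simplification, and the fleet size of 1000 and the function names are illustrative only.

```python
def expected_failures(n_volumes: int, afr: float) -> float:
    """Expected number of volume failures in one year."""
    return n_volumes * afr


def prob_at_least_one_failure(n_volumes: int, afr: float) -> float:
    """Probability that at least one of n independent volumes fails in a year."""
    return 1.0 - (1.0 - afr) ** n_volumes


if __name__ == "__main__":
    fleet = 1000  # hypothetical fleet size, not a figure from the article
    for label, afr in [("EBS, low end", 0.001),
                       ("EBS, high end", 0.005),
                       ("ordinary disk", 0.04)]:
        print(f"{label}: {expected_failures(fleet, afr):.1f} expected failures/year, "
              f"P(at least one) = {prob_at_least_one_failure(fleet, afr):.4f}")
```

With 1000 devices, an AFR of 4% means about 40 expected failures a year, against 1 to 5 at the quoted EBS rates.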
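In April 2011 EBS suffered a large-scale outage. Amazon published a summary of the event afterwards [1], and the report reveals a little about how EBS is designed:
  • EBS consists of two parts. The first is a set of EBS clusters made up of peer nodes; a cluster stores volumes and serves reads and writes, and each cluster runs within a single AZ. The second is a set of control servers that manage the EBS clusters.
  • EBS nodes are connected by two networks. The primary network has higher throughput and carries data traffic; the other is a secondary network that keeps communication between nodes reliable.
  • EBS does not use high-end storage from vendors such as EMC or NetApp. It uses commodity servers and 1 Gbps networking, and relies on redundancy for data reliability. Replication is primary/replica: to guarantee strong consistency, the primary serves both reads and writes, while a replica accepts writes only and serves no reads.
  • When an EBS node cannot reach another node, it considers that node failed and starts the re-mirror process. For data safety, the space on the failed node can be reused only after all replicas have finished re-mirroring. Because all reads and writes go through the primary, data access is blocked when the primary copy fails, and resumes only after the control servers have elected a new primary.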
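The replication and re-mirror rules described above can be sketched as a toy model. This is only an illustration built from the description in this post, not Amazon's implementation; `Node`, `Volume`, and `re_mirror` are hypothetical names, and node capacity is counted in whole replicas for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int                                  # replicas this node can hold
    replicas: dict = field(default_factory=dict)   # volume id -> {offset: data}
    alive: bool = True

    def has_space(self) -> bool:
        return len(self.replicas) < self.capacity


class Volume:
    def __init__(self, vol_id: str, primary: Node, replica: Node):
        self.vol_id = vol_id
        self.primary = primary
        self.replica = replica
        primary.replicas[vol_id] = {}
        replica.replicas[vol_id] = {}

    def write(self, offset: int, data: bytes) -> None:
        # A write is applied to the primary and to the replica.
        self.primary.replicas[self.vol_id][offset] = data
        self.replica.replicas[self.vol_id][offset] = data

    def read(self, offset: int) -> bytes:
        # Reads are served by the primary only.
        return self.primary.replicas[self.vol_id][offset]

    def re_mirror(self, candidates) -> bool:
        """Copy the volume to a live node with free space after the replica fails."""
        for node in candidates:
            usable = node.alive and node.has_space()
            if usable and node is not self.primary and node is not self.replica:
                node.replicas[self.vol_id] = dict(self.primary.replicas[self.vol_id])
                failed, self.replica = self.replica, node
                # Only now may the failed node's space be reclaimed.
                failed.replicas.pop(self.vol_id, None)
                return True
        return False  # no free space anywhere: the caller has to retry later
```

Note that `re_mirror` returns `False` when no node has room, and the old replica's space stays reserved in that case. That is the situation the outage turned into a storm.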
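Reconstructing the failure
  • While upgrading the network of one AZ, Amazon operations staff mistakenly shifted primary-network traffic onto the secondary network. The secondary network lacked the bandwidth and collapsed, so every EBS node lost contact with the other nodes. When the network recovered, these nodes all started re-mirroring at the same time and the cluster's free space was quickly exhausted. With no space left to allocate, re-mirroring failed; large numbers of nodes kept retrying and became stuck in a re-mirror loop, which ended in a re-mirror storm. About 13% of the EBS volumes in the AZ became unavailable.
  • With free space exhausted, new volumes could not be created. Because the timeout for volume creation is long, create-volume requests piled up in the control cluster and its responses slowed down. The slowdown also affected the other AZs in the region (it appears the AZs shared the control cluster).
  • In addition, a concurrency bug in the EBS nodes caused more nodes to fail, which made the re-mirror storm worse.
  • To guarantee consistency, all data access must go through the primary copy. When a node fails, EC2, the EBS nodes, and the EBS control servers (which act as arbiter) have to renegotiate which copy is primary. Re-mirroring generated a flood of negotiation requests, which put heavy load on the control servers until they almost stopped responding.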
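Why a long timeout makes requests pile up can be shown with Little's law: the average number of requests in a system equals the arrival rate times the average time each request stays in it. The sketch below uses made-up numbers (10 requests per second, a pool of 100 workers, a 60-second timeout); none of them come from Amazon's report.

```python
def in_flight_requests(arrival_rate_per_s: float, time_in_system_s: float) -> float:
    """Little's law: average number of requests inside the system."""
    return arrival_rate_per_s * time_in_system_s


def pool_saturated(arrival_rate_per_s: float, time_in_system_s: float,
                   pool_size: int) -> bool:
    """True if the in-flight requests would occupy every worker in the pool."""
    return in_flight_requests(arrival_rate_per_s, time_in_system_s) >= pool_size


if __name__ == "__main__":
    # Healthy case: each create-volume request completes in 0.5 s.
    print(in_flight_requests(10, 0.5), pool_saturated(10, 0.5, 100))
    # Degraded case: each request hangs until a 60 s timeout expires.
    print(in_flight_requests(10, 60), pool_saturated(10, 60, 100))
```

At the same arrival rate, requests that hang until the timeout occupy far more workers than requests that complete quickly, so a shared control cluster can run out of capacity for every AZ it serves.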
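Resolving the failure
  • To stop the problem from spreading, Amazon staff cut off the network of the affected AZ, after which the other AZs in the region returned to normal.
  • The implementation was changed to prevent the re-mirror storm. After this step, 13% of the volumes in the AZ were still inaccessible.
  • Disk capacity was added so that the failed volumes could finish re-mirroring. For safety, the space on a failed node can be reused only after all replicas have finished re-mirroring; this design protects data, but it was also a major cause of the space shortage. The failed volumes could be recovered only by adding new disks. After this step, 2.2% of the volumes still could not be recovered.
  • Volumes were restored from snapshots that staff created manually, but 1.04% of the volumes remained inaccessible (no snapshot could be created for them).
  • The remaining volumes were recovered by hand, but in the end 0.07% of the volumes still had consistency problems.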
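Lessons learned
  • Automate operations to reduce human error.
  • Reserve more free space to avoid a similar re-mirror storm.
  • When space allocation fails during re-mirroring, retry with exponential backoff to avoid a re-mirror storm.
  • Fix the concurrency bug.
  • Reduce dependencies between AZs.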
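The exponential-backoff lesson can be sketched as follows. This is a generic retry helper, not Amazon's code. The random jitter is my own addition (the post only mentions exponential backoff), and the parameter names and defaults are assumptions.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=6, base=1.0, cap=60.0,
                       sleep=time.sleep, rng=None):
    """Call operation() until it returns True.

    Returns the number of attempts used, or None if every attempt failed.
    """
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        if operation():
            return attempt + 1
        if attempt < max_attempts - 1:
            # The ceiling doubles after each failure: base, 2*base, 4*base, ...
            ceiling = min(cap, base * (2 ** attempt))
            # Jitter (an assumption, not from the post): wait a random time in
            # [0, ceiling] so that many nodes do not all retry at the same moment.
            sleep(rng.uniform(0, ceiling))
    return None
```

A node that fails to allocate space for re-mirroring would wrap the allocation in `retry_with_backoff` instead of retrying in a tight loop, so thousands of nodes failing together spread their retries over time.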
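Amazon has not published how EBS is implemented, and there is little material about it online. In my view, an EBS design has to solve at least the following problems:
  • Data sharding and location: an EBS volume is between 1 GB and 1 TB in size. How is a volume split into pieces and mapped onto the EBS cluster? How is the mapping table maintained and queried? Is it possible to store only the blocks that hold valid data?
  • Data storage: how does an EBS node store and modify volume data, by updating in place or by a non-in-place scheme similar to LFS (log-structured file system)? How are online snapshots implemented?
  • Cluster membership: maintaining the membership of an EBS cluster involves detecting node failure, handling nodes that hang without actually dying, handling network partitions, and handling node joins and failures automatically.
  • Replication and consistency: EBS uses primary/replica replication. When the primary, a replica, or both go down, the copies may diverge. How are inconsistencies detected and repaired? How is the primary elected?
  • Performance SLA: EBS is built on Ethernet and is multi-tenant. Ethernet is comparatively unstable and its response time fluctuates widely, and multi-tenancy makes I/O isolation hard, so guaranteeing response time and IOPS is difficult.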
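As one possible answer to the sharding question, here is a sketch of a sparse chunk map: the volume is cut into fixed-size chunks and only chunks that have been written get a mapping entry. How EBS really does this is not public. The 4 MiB chunk size, the round-robin placement, and all names here are assumptions for illustration.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical chunk size (4 MiB)


class SparseVolumeMap:
    def __init__(self, size_bytes: int, nodes):
        self.size_bytes = size_bytes
        self.nodes = list(nodes)
        self.chunk_to_node = {}  # chunk index -> node name, written chunks only

    def locate(self, offset: int):
        """Translate a byte offset into (chunk index, offset inside the chunk)."""
        if not 0 <= offset < self.size_bytes:
            raise ValueError("offset outside volume")
        return offset // CHUNK_SIZE, offset % CHUNK_SIZE

    def node_for_write(self, offset: int) -> str:
        chunk, _ = self.locate(offset)
        if chunk not in self.chunk_to_node:
            # Placeholder placement policy: round-robin by chunk index.
            self.chunk_to_node[chunk] = self.nodes[chunk % len(self.nodes)]
        return self.chunk_to_node[chunk]

    def node_for_read(self, offset: int):
        chunk, _ = self.locate(offset)
        # None means the chunk was never written and can be read as zeros.
        return self.chunk_to_node.get(chunk)

    def allocated_bytes(self) -> int:
        return len(self.chunk_to_node) * CHUNK_SIZE
```

With this layout a freshly created 1 TB volume costs nothing until data is written, which is one way to "store only the blocks that hold valid data".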
References
[1] Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region. http://aws.amazon.com/message/65648/

The above is reposted from: http://peterylh.blog.163.com/blog/static/12033201221011205642/
 

Below is the AWS introduction to EBS snapshots:

Amazon EBS Snapshots

Amazon EBS provides the ability to back up point-in-time snapshots of your data to Amazon S3 for durable recovery. Amazon EBS snapshots are incremental backups, meaning that only the blocks on the device that have changed since your last snapshot will be saved. If you have a device with 100 GBs of data, but only 5 GBs of data has changed since your last snapshot, only the 5 additional GBs of snapshot data will be stored back to Amazon S3. Even though the snapshots are saved incrementally, when you delete a snapshot, only the data not needed for any other snapshot is removed. So regardless of which prior snapshots have been deleted, all active snapshots will contain all the information needed to restore the volume. In addition, the time to restore the volume is the same for all snapshots, offering the restore time of full backups with the space savings of incremental.
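The behaviour described above (incremental saves, yet any remaining snapshot can restore the whole volume after older ones are deleted) can be modelled with reference-counted blocks. This is a toy model for illustration, not the way AWS implements it; `SnapshotStore` and its methods are hypothetical names.

```python
class SnapshotStore:
    def __init__(self):
        self.blobs = {}      # blob id -> block data (stands in for S3 objects)
        self.refcount = {}   # blob id -> number of snapshots referencing it
        self.snapshots = {}  # snapshot id -> {block index: blob id}
        self._next_blob = 0

    def create(self, snap_id, volume, parent=None) -> int:
        """Snapshot `volume` ({block index: bytes}); return blocks newly stored."""
        parent_map = self.snapshots.get(parent, {})
        manifest, stored = {}, 0
        for idx, data in volume.items():
            blob = parent_map.get(idx)
            if blob is None or self.blobs[blob] != data:
                # Block is new or changed since the parent: store a fresh copy.
                blob = self._next_blob
                self._next_blob += 1
                self.blobs[blob] = data
                self.refcount[blob] = 0
                stored += 1
            self.refcount[blob] += 1
            manifest[idx] = blob
        self.snapshots[snap_id] = manifest
        return stored

    def delete(self, snap_id) -> None:
        # Remove only the blocks that no other snapshot still needs.
        for blob in self.snapshots.pop(snap_id).values():
            self.refcount[blob] -= 1
            if self.refcount[blob] == 0:
                del self.refcount[blob]
                del self.blobs[blob]

    def restore(self, snap_id) -> dict:
        return {idx: self.blobs[b] for idx, b in self.snapshots[snap_id].items()}
```

Every snapshot keeps a complete manifest, so a restore never has to walk a chain of earlier snapshots. That matches the statement that restore time is the same for all snapshots.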

Snapshots can also be used to instantiate multiple new volumes, expand the size of a volume or move volumes across Availability Zones. When a new volume is created, there is the option to create it based on an existing Amazon S3 snapshot. In that scenario, the new volume begins as an exact replica of the original volume. By optionally specifying a different volume size or a different Availability Zone, this functionality can be used as a way to increase the size of an existing volume or to create duplicate volumes in new Availability Zones. If you choose to use snapshots to resize your volume, you need to be sure your file system or application supports resizing a device.

New volumes created from existing Amazon S3 snapshots load lazily in the background. This means that once a volume is created from a snapshot, there is no need to wait for all of the data to transfer from Amazon S3 to your Amazon EBS volume before your attached instance can start accessing the volume and all of its data. If your instance accesses a piece of data which hasn’t yet been loaded, the volume will immediately download the requested data from Amazon S3, and then will continue loading the rest of the volume’s data in the background.
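Lazy loading as described in this paragraph can be sketched like this. The dictionary passed in stands in for the snapshot in S3, and `LazyVolume` is a hypothetical name. The sketch assumes whole-block reads and writes and is not the actual EBS mechanism.

```python
class LazyVolume:
    def __init__(self, snapshot_blocks: dict):
        self._remote = snapshot_blocks  # stands in for the snapshot stored in S3
        self._local = {}                # blocks already present on the volume
        self.fetches = []               # order in which blocks were downloaded

    def _fetch(self, idx) -> None:
        self.fetches.append(idx)
        self._local[idx] = self._remote[idx]

    def read(self, idx) -> bytes:
        # A block that has not been loaded yet is downloaded immediately.
        if idx not in self._local:
            self._fetch(idx)
        return self._local[idx]

    def write(self, idx, data: bytes) -> None:
        # A whole-block write supersedes the snapshot content for that block.
        self._local[idx] = data

    def background_load_step(self) -> bool:
        """Load one block that is still missing; return False when none are left."""
        for idx in self._remote:
            if idx not in self._local:
                self._fetch(idx)
                return True
        return False
```

The instance can call `read` and `write` straight away, while a background task keeps calling `background_load_step` until it returns `False`.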

Amazon EBS shared snapshots allows you to share these snapshots, making it easy for you to share this data with your co-workers or others in the AWS community. With this feature, users that you have authorized can quickly use your Amazon EBS shared snapshots as the basis for creating their own Amazon EBS volumes. If you choose, you can also make your data available publicly to all AWS users. Users to whom you have granted access can create their own EBS volumes based on your snapshot; your original snapshot will remain intact. This is a great way for developers to easily share data with the rest of the Amazon EC2 community, and makes it easy for new customers to create Amazon EBS volumes from an existing snapshot. Because all the data is stored in the Amazon cloud, users don’t have to wait for time consuming downloads, and can access it within minutes. 

Snapshot storage is based on the amount of space your data consumes in Amazon S3. Because data is compressed before being saved to Amazon S3, and Amazon EBS does not save empty blocks, it is likely that the size of a snapshot will be considerably less than the size of your volume. For the first snapshot of a volume, Amazon EBS will save a full copy of your data to Amazon S3. However for each incremental snapshot, only the part of your Amazon EBS volume that has been changed will be saved to Amazon S3. 
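To see why a snapshot can be much smaller than the volume, the sketch below skips all-zero blocks and compresses the rest with `zlib`. The 4 KiB block size and the use of zlib are assumptions. The excerpt does not name the compression format or block size that EBS uses.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size


def snapshot_size(volume: bytes) -> int:
    """Bytes needed to store `volume`, skipping empty blocks and compressing the rest."""
    total = 0
    for start in range(0, len(volume), BLOCK_SIZE):
        block = volume[start:start + BLOCK_SIZE]
        if block == bytes(len(block)):
            continue  # empty (all-zero) block: nothing is saved
        total += len(zlib.compress(block))
    return total


if __name__ == "__main__":
    # One block of repetitive text followed by nine empty blocks.
    data = (b"hello world " * 400)[:BLOCK_SIZE] + bytes(BLOCK_SIZE * 9)
    print(len(data), snapshot_size(data))
```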
