Multipath Faulty Devices in OpenStack: Causes and Fixes (part 2)

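| Copyright: This article is jointly owned by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be kept and a link to the original article must be placed prominently on the page. For questions, email: wangxu198709@gmail.com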

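Introduction

In the previous article, Multipath Faulty Devices in OpenStack: Causes and Fixes (part 1), I explained in detail how faulty devices come about. This article focuses on how os-brick avoids creating faulty devices under concurrency, and on the specific implementation that makes this possible.

Before getting to the implementation, it is worth covering a few details of how SCSI block devices are addressed on Linux.

The Linux kernel locates a specific LUN through the following hierarchy: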

SCSI adapter number [host]
channel number [bus]
id number [target]
lun [lun]

See [SCSI Addressing] for more details. In other words, a LUN can be identified by the tuple [host-bus(channel)-target-lun].
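To make the addressing scheme concrete, here is a minimal Python sketch that models such an address as a named tuple. The names HCTL and parse_hctl are invented for illustration; they are not part of os-brick or of any kernel interface.

```python
from collections import namedtuple

# Hypothetical helper type: one field per level of the hierarchy above.
HCTL = namedtuple('HCTL', ['host', 'channel', 'target', 'lun'])


def parse_hctl(text):
    """Parse a 'host:channel:target:lun' string such as '3:0:0:1'."""
    parts = text.split(':')
    if len(parts) != 4:
        raise ValueError('expected host:channel:target:lun, got %r' % text)
    return HCTL(*(int(p) for p in parts))


addr = parse_hctl('3:0:0:1')
print(addr)
```

Each field narrows the search one level further: the host selects the adapter, the channel and target select the remote endpoint, and the lun selects a single logical unit on it.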

Each time Linux logs in to an iSCSI target, the kernel creates a matching directory tree under /sys/class/iscsi_host/host*/device/session (followed by the session id) to represent the SCSI device.

$ ls -l /sys/class/iscsi_host/host3/device/session1/
total 0
drwxr-xr-x 4 root root    0 Apr 21 21:54 connection1:0
drwxr-xr-x 3 root root    0 Apr 21 21:54 iscsi_session
drwxr-xr-x 2 root root    0 Apr 21 21:55 power
drwxr-xr-x 5 root root    0 Apr 21 21:54 target3:0:0
-rw-r--r-- 1 root root 4096 Apr 21 21:54 uevent

The 3:0:0 above is the host:channel:target of the iSCSI target.
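As a small illustration, the directory name can be split into its three parts like this. The function name split_target_dir is an assumption made for this sketch and does not come from os-brick.

```python
def split_target_dir(name):
    """Split a sysfs directory name such as 'target3:0:0'.

    Returns the (host, channel, target) parts as strings.
    """
    prefix = 'target'
    if not name.startswith(prefix):
        raise ValueError('not a target directory: %r' % name)
    host, channel, target = name[len(prefix):].split(':')
    return host, channel, target


print(split_target_dir('target3:0:0'))
```

The values stay as strings because they are only ever pasted back into sysfs paths and scan requests.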

BTW: if you don't see a directory structure like the one above, you need to connect to an iSCSI target first. Here is the target I am connected to:

$ sudo iscsiadm -m session
tcp: [1] 172.17.0.2:3260,1 tgt1 (non-flash)

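Solution

As described in part 1, os-brick used to run multipath -r and iscsiadm -m session -R when connecting (connect_volume) and disconnecting (disconnect_volume) a volume, respectively.

These commands cause every LUN on the bus of every iSCSI target to be rescanned.

os-brick addresses the root cause directly: based on the target and LUN the user wants to connect, it narrows the scan so that only that LUN on that target is scanned.

The process is as follows:

1. First, find the corresponding h-c-t-l from the session id and LUN id supplied by the caller (code link):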

    def get_hctl(self, session, lun):
        """Given an iSCSI session return the host, channel, target, and lun."""
        glob_str = '/sys/class/iscsi_host/host*/device/session' + session
        paths = glob.glob(glob_str + '/target*')
        if paths:
            __, channel, target = os.path.split(paths[0])[1].split(':')
        # Check if we can get the host
        else:
            target = channel = '-'
            paths = glob.glob(glob_str)

        if not paths:
            LOG.debug('No hctl found on session %s with lun %s', session, lun)
            return None

        # Extract the host number from the path
        # (26 == len('/sys/class/iscsi_host/host'), so the slice is the number)
        host = paths[0][26:paths[0].index('/', 26)]
        res = (host, channel, target, lun)
        LOG.debug('HCTL %s found on session %s with lun %s', res, session, lun)
        return res

The session parameter above is the [1] in tcp: [1] 172.17.0.2:3260,1 tgt1 (non-flash), and lun is the ID of the LUN to connect, which is normally provided by the Cinder driver.

For my session, LUN=1 maps to the following hctl: HCTL ('3', '0', '0', 1) found on session 1 with lun 1
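The lookup can be tried without a real iSCSI session. The sketch below is not the os-brick code: find_hctl and its sysfs_root parameter are names assumed for this example, and the demo builds a throwaway directory tree that mimics the session directory shown earlier. It also uses os.path helpers rather than a fixed string offset to extract the host number.

```python
import glob
import os
import tempfile


def find_hctl(session, lun, sysfs_root='/sys'):
    """Return (host, channel, target, lun) for an iSCSI session id, or None."""
    pattern = os.path.join(sysfs_root, 'class', 'iscsi_host', 'host*',
                           'device', 'session' + str(session))
    sessions = glob.glob(pattern)
    if not sessions:
        return None
    session_dir = sessions[0]
    # .../iscsi_host/host3/device/session1 -> 'host3' -> '3'
    host_dir = os.path.basename(os.path.dirname(os.path.dirname(session_dir)))
    host = host_dir[len('host'):]
    # Fall back to wildcards when no target directory exists yet.
    channel = target = '-'
    targets = glob.glob(os.path.join(session_dir, 'target*'))
    if targets:
        _, channel, target = os.path.basename(targets[0]).split(':')
    return host, channel, target, lun


def demo():
    """Run find_hctl against a fake tree containing host3/session1."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'class', 'iscsi_host', 'host3',
                                 'device', 'session1', 'target3:0:0'))
        return find_hctl('1', 1, sysfs_root=root)


print(demo())
```

Passing the sysfs root as a parameter is purely a testing convenience here; on a real host the default of /sys applies.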

2. Use the hctl found above when scanning (code link):

    def scan_iscsi(self, host, channel='-', target='-', lun='-'):
        """Send an iSCSI scan request given the host and optionally the ctl."""
        LOG.debug('Scanning host %(host)s c: %(channel)s, '
                  't: %(target)s, l: %(lun)s)',
                  {'host': host, 'channel': channel,
                   'target': target, 'lun': lun})
        self.echo_scsi_command('/sys/class/scsi_host/host%s/scan' % host,
                               '%(c)s %(t)s %(l)s' % {'c': channel,
                                                      't': target,
                                                      'l': lun})

In the logs you will see a SCSI command starting with tee. It has the same effect as echo '0 0 1' | tee -a /sys/class/scsi_host/host3/scan and asks the kernel to do a narrowly scoped host scan.
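The difference between a narrow scan and a full scan comes down to the string written to the scan file, where '-' acts as a wildcard. Here is a minimal sketch of that idea; build_scan_args and request_scan are hypothetical names, and the demo writes to a temporary file because writing to the real sysfs path requires root and triggers an actual scan.

```python
import os
import tempfile


def build_scan_args(channel='-', target='-', lun='-'):
    """Format the 'channel target lun' string written to a host scan file."""
    return '%s %s %s' % (channel, target, lun)


def request_scan(scan_file, channel='-', target='-', lun='-'):
    """Append a scan request to scan_file (the real sysfs path needs root)."""
    with open(scan_file, 'a') as f:
        f.write(build_scan_args(channel, target, lun))


# Demo against a temporary file instead of /sys/class/scsi_host/host3/scan.
fd, scan_path = tempfile.mkstemp()
os.close(fd)
request_scan(scan_path, channel='0', target='0', lun=1)
with open(scan_path) as f:
    written = f.read()
os.remove(scan_path)

print(written)            # narrow request: one channel, one target, one LUN
print(build_scan_args())  # all wildcards: scan everything on the host
```

With the defaults left in place, the request degenerates to '- - -', which asks for every channel, target and LUN on that host, so the narrowing only takes effect when all three values are supplied.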

This way only the one LUN the user wants is discovered, while unrelated LUNs are not scanned, which prevents faulty devices from forming.

References

[SCSI Addressing]: http://www.tldp.org/HOWTO/SCSI-2.4-HOWTO/scsiaddr.html

[os-brick]: https://github.com/openstack/os-brick/

[Refactor iSCSI connect]: https://github.com/openstack/os-brick/commit/56c8665d3d342ce90f5d9433966c0f244063b4c1

posted @ 2018-04-22 11:21 孤独的居士
