Using the lock from Curator (the ZooKeeper client wrapper), I found a large number of temporary nodes in ZooKeeper that were never deleted

Symptom: a large number of temporary nodes in the ZooKeeper cluster were never released, which made the cluster respond very slowly.

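Analysis:
Inspecting the cluster with a tool showed that many nodes created for lock objects had not been released. That was odd, because release should have deleted them. The only option left was to read the source.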

private static final String LOCK_NAME = "lock-";

internals = new LockInternals(client, driver, path, lockName, maxLeases);

this.path = ZKPaths.makePath(path, lockName);

public static String makePath(String parent, String child)
{
    StringBuilder path = new StringBuilder();

    joinPath(path, parent, child);

    return path.toString();
}

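Here is the logic I found: when InterProcessMutex acquires the lock, it creates a node of the form '..../name/lock-'. In other words, it creates a parent node with a single child under it. On release, only the child '...../lock-' is deleted; '..../name' is left behind. That is the pitfall. Once that is understood, the fix is straightforward: on unlock, also delete the '..../name' node.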
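To make the layout concrete, here is a toy in-memory model of the behavior described above. It is not Curator code and talks to no ZooKeeper server: the class name LockPathDemo, its methods, and the path /locks/order-42 are all made up for illustration, and the child name is simplified to "lock-" plus a zero-padded counter.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy in-memory model of the node layout; this is NOT Curator code.
class LockPathDemo {
    // parent path -> names of its child nodes
    private final Map<String, Set<String>> tree = new TreeMap<>();
    private int sequence = 0;

    // Mimics acquire: make sure the parent exists, then add a sequential "lock-" child.
    String acquire(String lockPath) {
        tree.computeIfAbsent(lockPath, p -> new TreeSet<>());
        String child = String.format("lock-%010d", sequence++);
        tree.get(lockPath).add(child);
        return lockPath + "/" + child;
    }

    // Mimics release: only the child is removed, the parent stays behind.
    void release(String childPath) {
        int slash = childPath.lastIndexOf('/');
        String parent = childPath.substring(0, slash);
        tree.get(parent).remove(childPath.substring(slash + 1));
    }

    // The extra cleanup step: drop the parent, but only when it has no children left.
    boolean deleteParentIfEmpty(String lockPath) {
        Set<String> children = tree.get(lockPath);
        if (children != null && children.isEmpty()) {
            tree.remove(lockPath);
            return true;
        }
        return false;
    }

    boolean exists(String lockPath) {
        return tree.containsKey(lockPath);
    }
}
```

In this model, calling release and then exists on the parent path still returns true, which is the leak. The cleanup step only removes the parent when it is empty, because another client may already have queued its own child under the same parent.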

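Why is it designed this way? Let's look at the underlying implementation first. InterProcessSemaphore turns out to be designed the same way, but I still could not see why a child node was needed until I read the implementation of InterProcessReadWriteLock, which has two fields: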

 private static final String READ_LOCK_NAME  = "__READ__";
 private static final String WRITE_LOCK_NAME = "__WRIT__";

Then look at how the read-write lock is initialized:

public InterProcessReadWriteLock(CuratorFramework client, String basePath, byte[] lockData)
{
    lockData = (lockData == null) ? null : Arrays.copyOf(lockData, lockData.length);

    writeMutex = new InternalInterProcessMutex
    (
        client,
        basePath,
        WRITE_LOCK_NAME,
        lockData,
        1,
        new SortingLockInternalsDriver()
        {
            @Override
            public PredicateResults getsTheLock(CuratorFramework client, List<String> children, String sequenceNodeName, int maxLeases) throws Exception
            {
                return super.getsTheLock(client, children, sequenceNodeName, maxLeases);
            }
        }
    );

    readMutex = new InternalInterProcessMutex
    (
        client,
        basePath,
        READ_LOCK_NAME,
        lockData,
        Integer.MAX_VALUE,
        new SortingLockInternalsDriver()
        {
            @Override
            public PredicateResults getsTheLock(CuratorFramework client, List<String> children, String sequenceNodeName, int maxLeases) throws Exception
            {
                return readLockPredicate(children, sequenceNodeName);
            }
        }
    );
}          

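Now it makes sense: a read-write lock needs to create two nodes (one for reads, one for writes) under the same path. With that, everything is clear.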
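The sketch below shows why sharing one parent matters: readers and writers queue up as siblings, so a reader can tell from the child list whether a writer is ahead of it. This is a simplified illustration of the idea, not Curator's actual readLockPredicate. The class ReadLockPredicateDemo and the node names in the usage note are hypothetical, and the list is assumed to be already ordered by sequence number, oldest first.

```java
import java.util.List;

// Simplified illustration of a read-lock check over sibling nodes; NOT Curator's implementation.
class ReadLockPredicateDemo {
    static final String READ_LOCK_NAME = "__READ__";
    static final String WRITE_LOCK_NAME = "__WRIT__";

    // children must already be ordered by sequence number, oldest first.
    static boolean readerGetsLock(List<String> children, String ourNode) {
        for (String child : children) {
            if (child.equals(ourNode)) {
                return true;   // reached our own node with no writer ahead of us
            }
            if (child.contains(WRITE_LOCK_NAME)) {
                return false;  // a writer is queued ahead of us, so we must wait
            }
        }
        throw new IllegalArgumentException("node not found: " + ourNode);
    }
}
```

With the children __READ__0000000000, __WRIT__0000000001 and __READ__0000000002, the first reader gets the lock and the last one has to wait behind the writer. That comparison is only possible because both kinds of node live under one parent.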

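Earlier I suspected the author kept the parent node on purpose as a kind of cache. After reading the source, it turns out to be a bug in how Curator handles compatibility with older versions: to support the 'CONTAINER' node type, when the ZooKeeper version is too old, a node that should have been temporary is stored as a persistent node instead.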

String    fixForNamespace(String path, boolean isSequential)
{
    if ( ensurePathNeeded.get() )
    {
        try
        {
            final CuratorZookeeperClient zookeeperClient = client.getZookeeperClient();
            RetryLoop.callWithRetry
            (
                zookeeperClient,
                new Callable<Object>()
                {
                    @Override
                    public Object call() throws Exception
                    {
                        ZKPaths.mkdirs(zookeeperClient.getZooKeeper(), ZKPaths.makePath("/", namespace), true, client.getAclProvider(), true);
                        return null;
                    }
                }
            );
            ensurePathNeeded.set(false);
        }
        catch ( Exception e )
        {
            client.logError("Ensure path threw exception", e);
        }
    }

    return ZKPaths.fixForNamespace(namespace, path, isSequential);
}
// This is the entry point that creates the parent node
ZKPaths.mkdirs(zookeeperClient.getZooKeeper(), ZKPaths.makePath("/", namespace), true, client.getAclProvider(), true);

zookeeper.create(subPath, new byte[0], acl, getCreateMode(asContainers));
// asContainers = true yields a CONTAINER-type node
return asContainers ? getContainerCreateMode() : CreateMode.PERSISTENT;

// The version-compatibility code contains a nasty bit
try
{
    localCreateMode = CreateMode.valueOf("CONTAINER");
}
catch ( IllegalArgumentException ignore )
{
    localCreateMode = NON_CONTAINER_MODE;
    log.warn("The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT will be used instead.");
}
Older versions do not have this type, so it is handled as:
private static final CreateMode NON_CONTAINER_MODE = CreateMode.PERSISTENT;
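
The fallback above can be reproduced with plain Java enums. The sketch below is only an illustration: OldCreateMode and NewCreateMode are hypothetical stand-ins for the CreateMode enum of an older and a newer ZooKeeper, and ContainerModeFallbackDemo is a made-up helper, not part of Curator.

```java
// Hypothetical stand-in for an older CreateMode enum that has no CONTAINER constant.
enum OldCreateMode { PERSISTENT, EPHEMERAL }

// Hypothetical stand-in for a newer CreateMode enum that does have it.
enum NewCreateMode { PERSISTENT, EPHEMERAL, CONTAINER }

class ContainerModeFallbackDemo {
    // Look up "CONTAINER" by name; fall back to PERSISTENT when the enum lacks it.
    static <E extends Enum<E>> E resolveContainerMode(Class<E> type) {
        try {
            return Enum.valueOf(type, "CONTAINER");
        } catch (IllegalArgumentException ignore) {
            // Enum.valueOf throws IllegalArgumentException for an unknown constant name.
            return Enum.valueOf(type, "PERSISTENT");
        }
    }
}
```

Resolving against OldCreateMode silently yields PERSISTENT, so the parent node is never cleaned up by the server, while resolving against NewCreateMode yields CONTAINER.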

This counts as a bug in how Curator handles version compatibility, and I fell right into it.

posted on 2017-11-01 12:00  岁月无痕之玻璃心