levelDB跳表实现

跳表的原理就是利用随机性建立索引，加速搜索，并且简化代码实现难度。具体的跳表原理不再赘述，主要是看了levelDB有一些实现细节的东西，凸显自己写的实现不足之处。

去除冗余的key

  template<typename Key, class Comparator>
  struct SkipList<Key,Comparator>::Node {
    explicit Node(const Key& k) : key(k) { }

    Key const key;

    // Accessors/mutators for links.  Wrapped in methods so we can
    // add the appropriate barriers as necessary.
    Node* Next(int n) {
      assert(n >= 0);
      // Use an 'acquire load' so that we observe a fully initialized
      // version of the returned Node.
      return reinterpret_cast<Node*>(next_[n].Acquire_Load());
    }
    void SetNext(int n, Node* x) {
      assert(n >= 0);
      // Use a 'release store' so that anybody who reads through this
      // pointer observes a fully initialized version of the inserted node.
      next_[n].Release_Store(x);
    }

    // No-barrier variants that can be safely used in a few locations.
    Node* NoBarrier_Next(int n) {
      assert(n >= 0);
      return reinterpret_cast<Node*>(next_[n].NoBarrier_Load());
    }
    void NoBarrier_SetNext(int n, Node* x) {
      assert(n >= 0);
      next_[n].NoBarrier_Store(x);
    }

   private:
    // Array of length equal to the node height.  next_[0] is lowest level link.
    port::AtomicPointer next_[1];
  };

这里使用一个Node节点表示所有相同key，不同高度的节点集合，仅保留了key和不同高度的向右指针，并且使用NewNode来动态分配随即高度的向右指针集合，而next_就指向这指针集合。这也是c/c++ tricky的地方。

  #include <stdio.h>
  struct Node {
  	char str[1];
  };
  int main() {
  	char* mem = new char[4];
  	for (int i = 0; i < 4; i++) {
  		mem[i] = i + '0';
  	}	
  	Node* node = (Node*)mem;
  	char* const pstr = node->str;
  	for (int i = 0; i < 4; i++) {
  		printf("%c", pstr[i]);
  	}
  	return 0;
  }

就像上面这个简单的sample，成员str可以作为指针指向从数组下标0开始的元素，并且不受申明时的限制，不局限于大小1，索引至分配的最大的内存地址。

简易随机数生成

  uint32_t Next() {
      static const uint32_t M = 2147483647L;   // 2^31-1
      static const uint64_t A = 16807;  // bits 14, 8, 7, 5, 2, 1, 0
      // We are computing
      //       seed_ = (seed_ * A) % M,    where M = 2^31-1
      //
      // seed_ must not be zero or M, or else all subsequent computed values
      // will be zero or M respectively.  For all other values, seed_ will end
      // up cycling through every number in [1,M-1]
      uint64_t product = seed_ * A;

      // Compute (product % M) using the fact that ((x << 31) % M) == x.
      seed_ = static_cast<uint32_t>((product >> 31) + (product & M));
      // The first reduction may overflow by 1 bit, so we may need to
      // repeat.  mod == M is not possible; using > allows the faster
      // sign-bit-based test.
      if (seed_ > M) {
        seed_ -= M;
      }
      return seed_;
  }

可以看到，他使用A和M对种子进行运算，达到一定数据范围内不会重复的数集，而里面对于(product % M)，使用(product >> 31) + (product & M)进行运算优化，考虑右移和与操作的代价远小于取余操作。

简洁清晰的私有帮助方法，帮助寻找小于指定key的节点

  template<typename Key, class Comparator>
  typename SkipList<Key,Comparator>::Node*
  SkipList<Key,Comparator>::FindLessThan(const Key& key) const {
    Node* x = head_;
    int level = GetMaxHeight() - 1;
    while (true) {
      assert(x == head_ || compare_(x->key, key) < 0);
      Node* next = x->Next(level);
      if (next == NULL || compare_(next->key, key) >= 0) {
        if (level == 0) {
          return x;
        } else {
          // Switch to next list
          level--;
  	  }
      } else {
        x = next;
      }
    }
  }

posted @ 2016-03-02 23:22 Srggggg 阅读(635) 评论(0) 收藏举报

刷新页面返回顶部

Blue Bird Singing

levelDB跳表实现

公告