Data Structure Notes 1: Search Trees

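My progress suddenly sped up a lot. Along the way there were some things worth writing down, plus some more industry-flavored material, so I'm catching up on them now.

Everything here comes from Introduction to Algorithms. After chewing through it once I think the book is really impressive. In theory, if you start with it you don't need to read anything else..... maybe that's the intent.

This round is mostly about trees. I may keep updating later, depending on whether I have time.....

These notes are written for myself, so they may be a bit idiosyncratic.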

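Binary Search Tree (BST)

A BST is essentially a linked structure combined with binary search. We want to keep the binary-search property while still supporting fast insertion and deletion of elements, and that need is what led to the BST.

Definition

A strict definition of a binary search tree:

A node has four basic fields lch, rch, fa, key: pointers to the left child, right child, and parent, plus the key (usually the data we care about). It may also carry so-called satellite data (data attached to the node that we don't care much about).
A single node is a BST (and so is the empty tree).
\(T\) is a BST if and only if its left and right subtrees are both BSTs, every key in the left subtree \(T_L\) is smaller than the key at the root of \(T\), and every key in the right subtree \(T_R\) is larger than the key at the root of \(T\).

This sidesteps the case of duplicate keys: we assume every key is unique.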
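The recursive definition can be turned into a checker almost word for word. The following is a minimal sketch, assuming int keys; `BstNode` and `isBst` are illustrative names that do not appear in the notes.

```cpp
#include <climits>

// Hypothetical node layout following the notes: lch, rch, fa, key.
struct BstNode {
    int key;
    BstNode *lch = nullptr, *rch = nullptr, *fa = nullptr;
    explicit BstNode(int k) : key(k) {}
};

// True if every key in the subtree lies strictly inside (lo, hi)
// and both subtrees satisfy the same condition recursively.
bool isBst(const BstNode *x, long long lo = LLONG_MIN, long long hi = LLONG_MAX) {
    if (x == nullptr) return true;                 // the empty tree is a BST
    if (x->key <= lo || x->key >= hi) return false;
    return isBst(x->lch, lo, x->key) && isBst(x->rch, x->key, hi);
}
```

Passing the open interval down is what enforces "all keys of the left subtree are smaller than the root", not just the direct child.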

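Operations

Since it's a data structure, it needs operations. A BST supports the classic insert, delete, search, and update, and can also find the k-th largest / k-th smallest key.

Search / Update

This is essentially binary search on the sorted sequence: compare with the key of the current node and go left or right.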
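As a rough illustration, the descent might look like this. It is only a sketch with int keys; `Node` and `search` are names chosen here, not taken from the book.

```cpp
// Hypothetical node type with the four fields from the definition.
struct Node {
    int key;
    Node *lch = nullptr, *rch = nullptr, *fa = nullptr;
    explicit Node(int k) : key(k) {}
};

// Iterative search: returns the node holding `key`, or nullptr if absent.
Node *search(Node *root, int key) {
    Node *x = root;
    while (x != nullptr && x->key != key) {
        // Smaller keys live on the left, larger keys on the right.
        x = (key < x->key) ? x->lch : x->rch;
    }
    return x;
}
```

An update is then just a search followed by changing the satellite data of the node that was found.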

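Predecessor / Successor

These two are symmetric.

To find the predecessor of \(p\), first locate \(p\), then consider two cases:

If \(p\) has a left child, the predecessor is the maximum of \(p\)'s left subtree.
If \(p\) has no left child, the predecessor must be one of \(p\)'s ancestors (why?). Climb upward until the current node is the right child of its parent; that parent is the predecessor. If we reach the root first, \(p\) holds the minimum key and has no predecessor.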
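The two cases can be sketched as follows, assuming the node is already located and parent pointers are maintained. `Node`, `maximum`, and `predecessor` are illustrative names.

```cpp
// Hypothetical node type with the four fields from the definition.
struct Node {
    int key;
    Node *lch = nullptr, *rch = nullptr, *fa = nullptr;
    explicit Node(int k) : key(k) {}
};

// Rightmost node of a non-empty subtree.
Node *maximum(Node *x) {
    while (x->rch != nullptr) x = x->rch;
    return x;
}

// Returns the predecessor of p, or nullptr if p holds the smallest key.
Node *predecessor(Node *p) {
    if (p->lch != nullptr) return maximum(p->lch);   // case 1
    Node *x = p;
    // Case 2: climb while we are a left child; stop once we came up
    // from a right child (or ran out of ancestors).
    while (x->fa != nullptr && x == x->fa->lch) x = x->fa;
    return x->fa;
}
```

The successor is the mirror image: swap lch with rch and maximum with minimum.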

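Insertion

This is again a binary search down the tree; the new node is then attached as a leaf at the empty spot where the search ends.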
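A minimal sketch of that, assuming unique int keys (a duplicate is silently ignored, which is a choice made for this example) and illustrative names `Node` and `insert`:

```cpp
// Hypothetical node type with the four fields from the definition.
struct Node {
    int key;
    Node *lch = nullptr, *rch = nullptr, *fa = nullptr;
    explicit Node(int k) : key(k) {}
};

// Inserts key and returns the (possibly new) root.
Node *insert(Node *root, int key) {
    Node *parent = nullptr, *x = root;
    while (x != nullptr) {                 // search for the empty spot
        if (key == x->key) return root;    // keys are assumed unique
        parent = x;
        x = (key < x->key) ? x->lch : x->rch;
    }
    Node *fresh = new Node(key);
    fresh->fa = parent;
    if (parent == nullptr) return fresh;   // the tree was empty
    (key < parent->key ? parent->lch : parent->rch) = fresh;
    return root;
}
```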

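Deletion

There are three cases. Say we want to delete \(p\):

\(p\) is a leaf: just delete it.
\(p\) has one child \(q\): delete \(p\) and let \(p\)'s parent adopt \(q\) as its new child.
\(p\) has two children: find \(p\)'s predecessor \(q\) (the largest key smaller than \(p\)'s), swap the keys of the two, and the problem reduces to deleting \(q\). Because \(q\) is the predecessor of \(p\), \(q\) has no right child (a proof by contradiction shows this), so one of the first two cases applies.

The first two cases are easy. For the third we need to show that swapping and then deleting does not break the BST definition, which is not hard either.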
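The three cases collapse nicely once the two-children case is reduced to the others. This is a sketch under the same assumptions as the text (unique int keys, parent pointers); `transplant` and `erase` are illustrative names. Note that swapping keys means a pointer to p held elsewhere ends up pointing at a different key.

```cpp
#include <utility>

// Hypothetical node type with the four fields from the definition.
struct Node {
    int key;
    Node *lch = nullptr, *rch = nullptr, *fa = nullptr;
    explicit Node(int k) : key(k) {}
};

// Replace the subtree rooted at u by the one rooted at v (v may be null).
void transplant(Node *&root, Node *u, Node *v) {
    if (u->fa == nullptr) root = v;
    else if (u == u->fa->lch) u->fa->lch = v;
    else u->fa->rch = v;
    if (v != nullptr) v->fa = u->fa;
}

// Remove node p from the tree rooted at root and free it.
void erase(Node *&root, Node *p) {
    if (p->lch != nullptr && p->rch != nullptr) {
        Node *q = p->lch;                       // predecessor: max of left subtree
        while (q->rch != nullptr) q = q->rch;
        std::swap(p->key, q->key);
        p = q;                                  // q has no right child
    }
    // Now p has at most one child: leaf case or single-child case.
    Node *child = (p->lch != nullptr) ? p->lch : p->rch;
    transplant(root, p, child);
    delete p;
}
```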

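k-th largest

For this we maintain the size of every subtree and then binary search on k: compare k with the size of the left subtree to decide which way to go.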
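A sketch of the selection step, written for the k-th smallest key with 1-based k (the k-th largest of n keys is the (n-k+1)-th smallest). The `size` field and the names `sizeOf` and `kth` are assumptions of this example, and keeping `size` up to date during insert and delete is left out.

```cpp
#include <cassert>

// Hypothetical node type augmented with the subtree size.
struct Node {
    int key;
    int size = 1;                 // number of nodes in this subtree
    Node *lch = nullptr, *rch = nullptr;
    explicit Node(int k) : key(k) {}
};

int sizeOf(const Node *x) { return x != nullptr ? x->size : 0; }

// Returns the node with the k-th smallest key (k is 1-based).
const Node *kth(const Node *x, int k) {
    assert(k >= 1 && k <= sizeOf(x));
    while (true) {
        int left = sizeOf(x->lch);
        if (k == left + 1) return x;          // x itself is the answer
        if (k <= left) {
            x = x->lch;                       // answer is in the left subtree
        } else {
            k -= left + 1;                    // skip left subtree and x
            x = x->rch;
        }
    }
}
```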

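Basic properties

A BST with \(n\) nodes has height at least \(\lfloor\log_2 n\rfloor\) (counting edges). This follows directly from it being a binary tree.
The in-order traversal of a BST visits the keys in sorted order. Induction gives this.
On a BST of height \(h\), a single search, deletion, or insertion costs \(O(h)\), and \(\Theta(h)\) in the worst case.
A BST built by inserting a random permutation has expected height \(O(\log n)\).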

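B-Tree

The "-" here is a hyphen; the name is not read as "B minus tree".

A B-tree is essentially an index tree. In a computer, different levels of storage differ significantly in speed. Usually the bulk of the data sits in external storage (disk), and memory only keeps an index into it. For this kind of problem, disk read/write time far exceeds CPU time, so the former is what we mainly care about.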

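Definition

A node holds size keys and, unless it is a leaf, size+1 children; it also records its parent.
All keys inside a node are stored in sorted order.
For any key \(x.key_j\) of a node x, if x is the i-th child of x.parent, then \(x.parent.key_{i-1} < x.key_j < x.parent.key_i\) (whichever of the two bounding keys exists).
All leaves have the same depth.
Except for the root, the number of keys in every node is bounded below and above: it lies in \([t-1,2t-1]\), where \(t\) is called the minimum degree. The root only has to respect the upper bound.

Again we assume keys are unique.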

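Operations

Still insert, delete, search, and update, plus two new ones: split and merge.

Search

By items 2 and 3 of the definition, we just scan the keys inside a node (binary search also works) and descend into the child between the two keys that bracket the target.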

Insertion

Every insertion into a B-tree happens at a leaf. First search down to the leaf, insert the key, and then split if necessary.

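Split

First define a full node: x is full if and only if it holds exactly 2t-1 keys.

By the definition of a B-tree, when the leaf we want to insert into is already full, we have to split it.

Concretely, splitting x means taking x's median key and inserting it into x.parent, then breaking the rest of x into xl and xr and making them the children to the left and right of that median key.

Note that bottom-up splitting has to handle splits of the ancestors recursively. Alternatively, split every full node on the way down before inserting, which avoids having to walk back up.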
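To make the split concrete, here is a deliberately simplified sketch that uses std::vector instead of fixed arrays and has no parent pointers. `BNode` and `splitChild` are names invented for this example, and it assumes the parent is not full, as in the pre-splitting variant.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified node: keys are sorted; children is empty for a
// leaf, otherwise children.size() == keys.size() + 1.
struct BNode {
    std::vector<int> keys;
    std::vector<BNode *> children;
    bool leaf() const { return children.empty(); }
};

// Split the full child parent->children[i] (2t-1 keys) around its median.
// Precondition: parent is not full, so it can absorb the median key.
void splitChild(BNode *parent, int i, int t) {
    BNode *x = parent->children[i];
    assert((int)x->keys.size() == 2 * t - 1);
    BNode *xr = new BNode;
    int median = x->keys[t - 1];
    // The upper t-1 keys move to the new right sibling xr.
    xr->keys.assign(x->keys.begin() + t, x->keys.end());
    x->keys.resize(t - 1);                    // x keeps the lower t-1 keys
    if (!x->leaf()) {
        // The upper t children follow their keys.
        xr->children.assign(x->children.begin() + t, x->children.end());
        x->children.resize(t);
    }
    // The median moves up, with x on its left and xr on its right.
    parent->keys.insert(parent->keys.begin() + i, median);
    parent->children.insert(parent->children.begin() + i + 1, xr);
}
```

Both halves end up with exactly t-1 keys, the minimum allowed, which is why a full node is defined as having 2t-1 keys.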

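Deletion

First define a rich node: x is rich if and only if it holds at least t keys, i.e. more than the minimum of t-1, so it can afford to lose one.

First locate the node x that holds the key.

1. If x is not a leaf, then as in a BST, find the successor, which sits in a leaf (why?), and swap the two keys. The problem becomes deleting the successor from its leaf; continue with case 2 or 3.
2. If x is a leaf and rich, delete the key directly.
3. If x is a leaf and not rich, there are two sub-cases:
3-1. (I call this "borrowing money from a sibling") A sibling of x (defined as a child adjacent to x in the parent) is rich. Move the sibling's extreme key up into the parent, and move the parent's key that separates the two children down into x. Now x is rich and the key can be deleted.
3-2. (I call this "moving in with a sibling") No sibling of x is rich. Merge x, one sibling, and the parent's key that separates the two into a new node x', hang it under the parent, and recursively restore property 5 for the parent, which has just lost a key.

It is easy to see that case 3-2 of deletion is the reverse of the split used in insertion.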

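Properties

Each B-tree node stores a whole page of data, so reading or writing an entire node is the smallest unit of disk time; we treat it as something done in one go.

The tree height is \(\Theta(\log_t n)\).
A single insertion or deletion costs \(O(h)\) disk accesses and \(O(ht)\) CPU time (with a linear scan inside each node).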
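The height bound can be checked with a short count, following the standard argument in Introduction to Algorithms. The conventions are assumptions of this sketch: \(h\) counts edges from the root to a leaf and \(n\geqslant 1\) is the total number of keys. The root holds at least 1 key, every other node holds at least \(t-1\), and depth \(i\geqslant 1\) contains at least \(2t^{i-1}\) nodes, so:

```latex
\[
n \;\geqslant\; 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1}
  \;=\; 1 + 2(t-1)\cdot\frac{t^{h}-1}{t-1}
  \;=\; 2t^{h}-1
\quad\Longrightarrow\quad
h \;\leqslant\; \log_t\frac{n+1}{2}.
\]
```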

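Related variants include the B+ tree (whose leaves are chained into a doubly linked list) and the B* tree (which demands higher utilization of each node), among others.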

Implementation

Here is a chunk of code. I diff-tested it against multiset, so it should be fine.

#include <cstdio>    // getchar, freopen
#include <iostream>
#include <algorithm>
#include <utility>   // std::pair, std::swap

#define rel(x) do { if (x != nullptr) delete x; x = nullptr; } while (0)

template <typename T_keys, typename T_data, int M> class BTree;

template <typename T> void read( T & );

// parameter order must match BTree, which instantiates BTreeNode <T_keys, T_data, M>
template <typename T_keys, typename T_data, int M> class BTreeNode {
	typedef BTreeNode <T_keys, T_data, M> Node;
	typedef std:: pair <Node *, int> NodePos;

public:

	BTreeNode( bool is_leaf = false, Node *parent = nullptr ): parent( parent ), is_leaf( is_leaf ), size(0 ) {
		for ( int i = 0; i <= M; ++ i ) this->p[i] = nullptr;
		for ( int i = 0; i < M; ++ i ) {
			this->keys[i] = 0;
		}
	}

	bool isLeaf() {
		return this->is_leaf;
	}

	bool isRoot() {
		return this->parent == nullptr;
	}

	bool isFull() {
		return this->size == M;
	}

	// return true if node is rich enough to delete a key
	bool isRich() {
		return this->size > ( M / 2 );
	}

	bool isValid() {
		for ( int i = 0; i <= size; ++ i ) {
			if ( p[i] && p[i]->parent != this ) return false;
		}
		if ( size < ( M / 2 ) || size > M ) return false;
		return true;
	}

	// returns the first i with key <= keys[i], or size if every key is smaller than key
	int findKey( const T_keys &key ) {
		for ( int i = 0; i < this->size; ++ i ) {
			if ( key <= this->keys[i]) return i;
		}
		return this->size;
	}

	int findWhich( Node *x ) {
		for (int i = 0; i <= size; ++ i) {
			if (p[i] == x) return i;
		}
		return -1;
	}

	void insert_front( const T_keys &key, Node *inp ) {
		for ( int i = this->size; i > 0; -- i ) {
			this->keys[i] = this->keys[i - 1];
			this->p[i + 1] = this->p[i];
		}
		this->p[1] = this->p[0];
		this->p[0] = inp;
		this->keys[0] = key;
		if (inp != nullptr) {
			inp->parent = this;
		}
		this->size ++;
	}

	void insert_back( const T_keys &key, Node *inp ) {
		this->keys[size] = key;
		this->p[++ size] = inp;
		if (inp != nullptr) {
			inp->parent = this;
		}
	}

	// insert key & data in leaf nodes
	void insert( const T_keys &key, Node *inp = nullptr ) {
		int pos = this->findKey( key );
		for ( int i = this->size; i > pos; -- i ) {
			this->keys[i] = this->keys[i - 1];
			this->p[i + 1] = this->p[i];
		}
		this->keys[pos] = key;
		this->p[pos + 1] = inp;
		this->size ++;
		if (inp != nullptr) {
			inp->parent = this;
		}
	}

	// insert the key promoted by a split, with the two halves lch and rch as its children
	void insert_remote( const T_keys &key, Node *lch, Node *rch ) {
		int pos = this->findKey( key );
		for ( int i = this->size; i > pos; -- i ) {
			this->keys[i] = this->keys[i - 1];
			this->p[i + 1] = this->p[i];
		}
		this->keys[pos] = key;
		this->p[pos] = lch;
		this->p[pos + 1] = rch;
		this->size ++;
	}

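	// drop keys[0] together with its left child p[0]
	void remove_front() {
		// stay inside the arrays: the old loop read keys[size + 1]
		for (int i = 0; i + 1 < size; ++ i) keys[i] = keys[i + 1];
		for (int i = 0; i < size; ++ i) p[i] = p[i + 1];
		keys[size - 1] = 0;
		p[size] = nullptr;
		size --;
	}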

	void remove_back() {
		keys[size - 1] = 0;
		p[size] = nullptr;
		size--;
	}

	// remove keys[pos] in leaf node
	void removeByPos(const int &pos) {
		for (int i = pos; i < size; ++ i) {
			keys[i] = keys[i + 1];
			p[i] = p[i + 1];
		}
		keys[size] = 0;
		p[size --] = nullptr;
	}

	// split node *this
	Node *split() {
		if ( !this->isFull() ) return nullptr;
		
		// if this is root, then split this and set a new root
		if ( this->isRoot() ) {
			this->parent = new Node( false, nullptr );
		}
		
		Node *newNode = new Node( this->isLeaf() , this->parent );
		int mid = M / 2;

		T_keys mid_key = this->keys[mid];

		this->keys[mid] = 0;

		this->parent->insert_remote( mid_key, this, newNode );

		for ( int i = mid + 1; i < this->size; ++ i ) {
			newNode->keys[i - mid - 1] = this->keys[i];
			newNode->p[i - mid - 1] = this->p[i];
			if ( this->p[i] != nullptr ) this->p[i]->parent = newNode;

			this->p[i] = nullptr;
			this->keys[i] = 0;
		}
		newNode->p[M / 2] = this->p[this->size];
		if ( this->p[this->size] != nullptr ) this->p[this->size]->parent = newNode;
		this->p[this->size] = nullptr;

		this->size = newNode->size = M / 2;
		return newNode;
	}

	NodePos minimum() {
		if ( p[0] == nullptr ) return NodePos( this, 0 );
		return p[0]->minimum();
	}

	// rightmost key of the subtree rooted at *this: (node, index of its last key)
	NodePos maximum() {
		if ( p[size] == nullptr ) return NodePos( this, size - 1 );
		return p[size]->maximum();
	}

	void mergeLeft( int pos, Node *&right ) {
		keys[size] = parent->keys[pos];
		
		for (int i = 0; i < right->size; ++ i) {
			keys[size + i + 1] = right->keys[i];
			p[size + i + 1] = right->p[i];
			if (right->p[i] != nullptr) right->p[i]->parent = this;

			right->keys[i] = 0;
			right->p[i] = nullptr;
		}
		
		size += 1 + right->size;
		p[size] = right->p[right->size];
		right->p[right->size] = nullptr;
		if (p[size] != nullptr) p[size]->parent = this;
		rel(right);

		parent->removeByPos(pos);
	}

private:

	Node *p[M + 2], *parent;
	T_keys keys[M + 1];
	bool is_leaf;
	// the number of keys in a Node*
	int size;
	
	friend class BTree <T_keys, T_data, M>;
} ;

template <typename T_keys, typename T_data, int M> class BTree {
	typedef BTreeNode <T_keys, T_data, M> Node;
	typedef std:: pair <Node *, int> NodePos;
	
public:

	BTree() {
		this->root = new Node( true, nullptr );
	}

	// read-only lookup: returns (node, index) of key, or (nullptr, -1) if absent;
	// unlike insert(), a search has no reason to split full nodes on the way down
	NodePos find( const T_keys &key ) {
		Node *x = root;
		while ( x != nullptr ) {
			int pos = x->findKey( key );
			// pos == size means every key in x is smaller, so keys[pos] would be stale
			if ( pos < x->size && x->keys[pos] == key ) {
				return NodePos( x, pos );
			}
			if ( x->isLeaf() ) break;
			x = x->p[pos];
		}
		return NodePos( nullptr, -1 );
	}

	void insert( T_keys key ) {
		Node *x = root, *res = nullptr;
		while ( !x->isLeaf() ) {
			int pos = x->findKey( key );
			Node *next = x->p[pos];

			res = x->split();
			if ( x == root && res != nullptr ) {
				root = root->parent;
			}

			x = next;
		}

		int pos = x->findKey( key );
		res = x->split();
		if ( x == root && res != nullptr ) {
			root = root->parent;
		}
		if ( res == nullptr || pos <= ( M / 2 ) ) {
			x->insert( key );
		} else {
			res->insert( key );
		}
	}

	void adjust(Node *x) {
		if ( x->isValid() ) return ;
		if ( x->isRoot() ) {
			if ( x->size == 0 ) {
				root = x->p[0];
				rel(root->parent);
			}
			return ;
		}
		int pos = x->parent->findWhich( x );
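		// the last child has no right sibling; do not read past p[size]
		Node *big_brother = (pos < x->parent->size) ? (x->parent->p[pos + 1]) : nullptr;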
		Node *little_brother = (pos > 0) ? (x->parent->p[pos - 1]) : nullptr;

		if ( big_brother != nullptr && big_brother->isRich() )
		{
			x->insert_back( x->parent->keys[pos], big_brother->p[0] );
			std:: swap( x->parent->keys[pos], big_brother->keys[0] );
			big_brother->remove_front();
		} 
		else if ( little_brother != nullptr && little_brother->isRich() ) 
		{
			x->insert_front( x->parent->keys[pos - 1], little_brother->p[little_brother->size] );
			std:: swap( x->parent->keys[pos - 1], little_brother->keys[little_brother->size - 1] );
			little_brother->remove_back();
		} 
		else {
			// neither brothers are rich, merge
			if ( big_brother != nullptr ) {
				std:: swap(x->parent->p[pos], x->parent->p[pos +1]);
				x->mergeLeft( pos, big_brother );
			} else {
				x->parent->p[pos] = little_brother;
				little_brother->mergeLeft( pos - 1, x );
				x = little_brother;
			}
			adjust(x->parent);
		}
	}

	void remove(NodePos p) {
		Node *x = p.first;
		// index of the key inside x: x->keys[pos] is the key to delete
		int pos = p.second;
		if ( !x->isLeaf() ) {
			NodePos successor = x->p[pos + 1]->minimum();
			Node *sx = successor.first; int spos = successor.second;
			std:: swap( sx->keys[spos], x->keys[pos] );
			remove( successor );
		} else {
			bool x_isRich = x->isRich();
			x->removeByPos( pos );
			if ( !x->isRoot() && !x_isRich ) {
				adjust( x );
			}
		}
	}

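	void remove(T_keys key) {
		NodePos p = find(key);
		// find() returns (nullptr, -1) for a missing key; removing it is a no-op
		if ( p.first == nullptr ) return ;
		remove( p );
	}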

	void preOrder() {
		myPreOrder( this->root );
	}

	bool isValid() {
		return myIsValid( this->root );
	}

private:

	void myPreOrder( Node *x ) {
		if ( x == nullptr ) return ;
		if ( x->isLeaf() ) {
			for ( int i = 0; i < x->size; ++ i ) {
				std:: cout << x->keys[i] << " ";
			}
			return ;
		}
		for ( int i = 0; i <= x->size; ++ i ) {
			myPreOrder( x->p[i] );
			if ( i != x->size ) {
				std:: cout << x->keys[i] << " ";
			}
		}
	}

	bool myIsValid( Node *x ) {
		// the root is exempt from the minimum-size rule
		bool flag = x->isValid() || (x == root);
		if ( x->isLeaf() ) return flag;
		// an internal node with size keys has size + 1 children
		for ( int i = 0; i <= x->size; ++ i ) {
			if (x->p[i] == nullptr || x->p[i]->parent != x) return false;
			flag = flag && myIsValid( x->p[i] );
		}
		return flag;
	}

	friend class BTreeNode <T_keys, T_data, M>;
	Node *root;
} ;

template <typename T> void read( T &x ) {
	T v = 1; x = 0; char ch = getchar();
	for (; ch < '0' || ch > '9'; v = ( ch == '-' ) ? -1 : v, ch = getchar() );
	for (; ch <= '9' && ch >= '0'; x = x * 10 + ch - '0', ch = getchar() );
	x *= v;
}

int main( void ) {
	freopen("data.in","r",stdin );
	freopen("myp.out","w",stdout );
	BTree <int, int, 5> *T = new BTree <int, int, 5> ();
	int n;
	for ( read(n); n --; ) {
		int opt, x;
		read( opt );
		read( x );
		switch ( opt ) {
			case 1: {
				T->insert( x );
				break;
			}
			case 2: {
				T->remove( x );
				break;
			}
		}
		T->preOrder(); std:: cout << std:: endl;
	}
	return 0;
}

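Red-Black Tree (RB-Tree)

A big problem with a plain BST is that monotone input easily degenerates it into a linked list, so there is no complexity guarantee. That is why balanced binary trees were introduced.

I've written splay, treap, and scapegoat trees before, but the balanced tree that industrial-strength code actually uses most is the red-black tree (why?).

A red-black tree is really a B-tree in disguise: it corresponds to a 2-3-4 tree, i.e. a B-tree with \(t=2\). Seen from that angle it becomes much easier to understand.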

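Definition

A red-black tree is a balanced binary search tree that satisfies:

Every node has exactly one color: red (R) or black (B).
The root is black.
No two red nodes are adjacent (a red node never has a red child).
For any node v, every path from v down to a leaf of v's subtree contains the same number of black nodes (empty child positions count as leaves).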
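The four conditions are easy to verify mechanically, which is handy when debugging. The sketch below checks only the color rules, not the BST ordering, and treats empty child positions as black leaves; `RbNode`, `blackHeight`, and `isRedBlack` are illustrative names.

```cpp
enum Color { RED, BLACK };

// Hypothetical node type; every node carries exactly one color (property 1).
struct RbNode {
    int key;
    Color color;
    RbNode *lch = nullptr, *rch = nullptr;
    RbNode(int k, Color c) : key(k), color(c) {}
};

// Returns the black-height of the subtree (an empty position counts as one
// black leaf), or -1 if property 3 or property 4 is violated inside it.
int blackHeight(const RbNode *x) {
    if (x == nullptr) return 1;
    if (x->color == RED) {
        bool redLeft = x->lch != nullptr && x->lch->color == RED;
        bool redRight = x->rch != nullptr && x->rch->color == RED;
        if (redLeft || redRight) return -1;          // two adjacent reds
    }
    int l = blackHeight(x->lch), r = blackHeight(x->rch);
    if (l < 0 || r < 0 || l != r) return -1;         // unequal black counts
    return l + (x->color == BLACK ? 1 : 0);
}

// Property 2 (black root) plus properties 3 and 4 for the whole tree.
bool isRedBlack(const RbNode *root) {
    return (root == nullptr || root->color == BLACK) && blackHeight(root) > 0;
}
```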

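Balance analysis

The key is combining properties 3 and 4. Take any two leaves; their paths to the root contain the same number of black nodes, call it \(bh\).

Let \(rh_1\) and \(rh_2\) be the numbers of red nodes on the two paths, so the path lengths are \(H_i=bh+rh_i\). The root is black and red nodes are never adjacent, so \(rh_1\leqslant bh\land rh_2\leqslant bh\), and therefore \(|H_1-H_2|=|(bh+rh_1)-(bh+rh_2)|=|rh_1-rh_2|\leqslant bh\). In particular, the longest root-to-leaf path is at most twice as long as the shortest.

So a red-black tree is only weakly balanced (compared with AVL).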
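The same two facts also give the usual logarithmic height bound. The conventions here are assumptions of this sketch: \(bh\) is the number of black nodes on any path from the root down to an empty position (counting the root), \(H\) is the number of nodes on the longest such path, and \(n\) is the number of nodes. Every such path contains at least \(bh\) nodes, so the top \(bh\) levels of the tree are completely filled, and on any path the red nodes are no more numerous than the black ones:

```latex
\[
n \;\geqslant\; 2^{bh}-1,
\qquad
H \;\leqslant\; 2\,bh
\quad\Longrightarrow\quad
H \;\leqslant\; 2\log_2(n+1).
\]
```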

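Operations

Search

As in an ordinary BST, searching a red-black tree is just binary search down the tree.

Insertion

As in an ordinary BST, inserting into a red-black tree means searching down the tree and then adding a new node.

The difference is that our nodes have colors. In a red-black tree the new node is always colored R, so the resulting tree still satisfies condition 4.

However, the new node v and its parent fa may both end up R, and in that case a fix-up is needed.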

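Insertion fix-up and recoloring

Let the new node be x, its parent y, its grandparent z, and the grandparent's other child (the uncle) w.

A fix-up is only needed when y is R, and then z must be B (the tree was a valid red-black tree before x was added).

1. If y and w are both R, color them B, color z R, and recursively fix up z (if z is the root, color it back to B).
2. If w is B (or absent), assume y is z's left child; the other case is symmetric. First make sure x is y's left child (otherwise rotate x above y and swap the pointers to x and y), then color y B, color z R, and rotate y above z.

Operation 1 amounts to a B node handing its color down to its two R children, so \(bh\) does not change.

Similarly, operation 2 does not change \(bh\) either.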
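Putting the two fix-up cases together, insertion could be sketched as below. This is one possible arrangement rather than the book's exact pseudocode: it uses null pointers instead of a sentinel, a single `rotateUp` helper for both rotation directions, and ignores duplicate keys. All names are illustrative.

```cpp
#include <utility>

enum Color { RED, BLACK };

struct RbNode {
    int key;
    Color color = RED;                       // new nodes start out red
    RbNode *lch = nullptr, *rch = nullptr, *fa = nullptr;
    explicit RbNode(int k) : key(k) {}
};

static bool isRed(const RbNode *x) { return x != nullptr && x->color == RED; }

// Rotate x up over its parent, preserving the BST order.
static void rotateUp(RbNode *&root, RbNode *x) {
    RbNode *y = x->fa, *z = y->fa;
    if (x == y->lch) {                       // right rotation
        y->lch = x->rch;
        if (x->rch != nullptr) x->rch->fa = y;
        x->rch = y;
    } else {                                 // left rotation
        y->rch = x->lch;
        if (x->lch != nullptr) x->lch->fa = y;
        x->lch = y;
    }
    y->fa = x;
    x->fa = z;
    if (z == nullptr) root = x;
    else if (z->lch == y) z->lch = x;
    else z->rch = x;
}

// Restore the red-black properties after attaching the red node x.
static void insertFixup(RbNode *&root, RbNode *x) {
    while (isRed(x->fa)) {
        RbNode *y = x->fa, *z = y->fa;       // z exists: a red y is never the root
        RbNode *w = (y == z->lch) ? z->rch : z->lch;
        if (isRed(w)) {                      // operation 1: recolor and move up
            y->color = w->color = BLACK;
            z->color = RED;
            x = z;
        } else {                             // operation 2: one or two rotations
            bool xLeft = (x == y->lch), yLeft = (y == z->lch);
            if (xLeft != yLeft) {            // x is an inner grandchild
                rotateUp(root, x);
                std::swap(x, y);             // y again names the upper node
            }
            y->color = BLACK;
            z->color = RED;
            rotateUp(root, y);
        }
    }
    root->color = BLACK;                     // property 2
}

void rbInsert(RbNode *&root, int key) {
    RbNode *parent = nullptr, *cur = root;
    while (cur != nullptr) {
        if (key == cur->key) return;         // keys are assumed unique
        parent = cur;
        cur = (key < cur->key) ? cur->lch : cur->rch;
    }
    RbNode *x = new RbNode(key);
    x->fa = parent;
    if (parent == nullptr) root = x;
    else (key < parent->key ? parent->lch : parent->rch) = x;
    insertFixup(root, x);
}
```

Operation 1 may repeat all the way up to the root, while operation 2 ends the loop after at most two rotations.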

Deleting elements

Such a hassle..... going to eat first.

posted @ 2021-05-29 17:02 jjppp
