深入理解迭代器与集合 Part1【Linq学习系列】

Posted on 2010-06-25 23:26 肖敏阅读(2083) 评论(5) 收藏举报

Linq学习系列

第一节深入理解迭代器与集合

本节分五部分：

第一部分，迭代器的原理与实现；

第二部分，迭代器的高级用法

第三部分，使用迭代器常见的错误

第四部分，Linq中用到的迭代器

第五部分，扩展Linq（Linq To Reflection）

Part1 解析迭代器的各种特性

1.0 前言

不管是Linq To Objects,Linq To Sql,还是Linq To Xml等等，Linq都是用于对集合的操作。而迭代器是Linq的基础之一。

迭代器是一种模式，用于顺序访问一个聚合对象中的各个元素。更具体的说，在.Net中，是通过实现IEnumerable，IEnumerable<T>来实现的。

迭代器模式参看：http://blog.sina.com.cn/s/blog_45d7d4010100bz50.html。

也许有人觉得集合不就是数组吗，比如Int[] myIntegers=Int[100];直接指定下标myIntergers[x]不就可以访问任意元素了吗？为什么要这么麻烦？

集合有很多种，链表，树(比如xml)，图都是集合，而且集合中的元素也不一定就是同样的类型。有些集合无法使用任意指定的方式来遍历。最特殊的例子（也许不是很恰当）,互联网就是一个集合。google的网络爬虫就是一个迭代器，它遍历整个互联网，抓取的东西包括网页，图片，视频，音频等等。

我们都熟悉SQL语句，稍微深入一点就知道，当我们的查询条件中不包含索引时，其实数据库系统在后台要执行所谓的“全表扫描”，而迭代器就是执行“全集合扫描”的对象。（Linq操作的对象是不是也可以建立一个索引呢？可以思考一下。）

重要概念：

集合：对于Linq来说，它可操作的对象是实现了IEnumerable<T> 的类。因此，对于Linq来说，集合是指相同类型的集合。不管它们是以何种形式组合在一起，必须都是返回T类型。而且，这种集合都包含了迭代器。但是，请注意，集合所包含的类型跟迭代器返回的类型并不需要相同，并且这个特性在C#2.0就可以实现。比如，一个字符串的集合，可以实现一个返回每个字符串长度的迭代器（请看1.5节）。

迭代器：实现了IEnumerator<T>和IEnumerator接口的类都是迭代器。.NET Fraemwork实现了许多迭代器：CharEnumerator,SZGenericArrayEnumerator<T> 等等。我们可以实现这两个接口定制自己的迭代器。你可以直接使用迭代器的方法来遍历，也可以使用foreach语句，但是这并不是本质区别。

从上面的描述也可以看出，Linq所能操作的集合要比SQL丰富，SQL只能操作关系表；其次，它提供了对强类型集合的操作，减少了错误的发生。既然提供了这么多丰富的功能，肯定有无数背后的故事。

但是，我们先从最简单的数组开始。

1.1 数组

在没有迭代器的时代，如果我们要遍历一个数组，我们一般用这样的写法：

Int myIntegers=Int[100];

for(i=0;i<myIntegers.Length;i++)
{
Console.WriteLine(myIntegers[i]);
}

对于这种方式，我们经常碰到的问题就是：越界。编译器无法判断，因此这种错误只能在运行时查出来。我们看看迭代器遍历的方式：

foreach(int i in myIntegers)
{
Console.WriteLine(i);
}

其实对于上述写法，编译器完全可以使用语法糖的技巧，偷偷的编译成如下语句：

int current=0;

while(myIntegers.Lenght> current)
{
    int i= myIntegers[current];

    Console.WriteLine(i);

    current++;
}

这样也可以解决越界的问题。但是实际上并没有使用上述方法，而是使用迭代器实现的。在C#（2.0以后）中，每一个数组对象都实现了IEnumerable<T>接口。（还实现了ICollection<T>,IList<T>接口）。定义如下：

public interface IEnumerable<T> : IEnumerable
{
IEnumerator<T> GetEnumerator();
}

以Int型数组为例，当我们声明一个Int型数组Int[]myIntegers时，系统会自动生成一个类:System.Int[],名称空间跟Int类型是一样的。示例代码：

代码

public Int[]:Array,IEnumerable<Int>
{
    IEnumerator<Int> GetEnumerator()
    {
        return new ArrayEnumerator<Int>(this);
    }
}

public ArrayEnumerator<T>:IEnumerator<T>
{
    T[] _array;
    Int _current=-1;

    // 构造函数,传入数组
    public ArrayEnumerator(T[] array)
    {
        _array=array;
    }

    public T Current
    {
        get { return _array[current]; }
    }

    public bool MoveNext()
    {
        if(_array.Length==_current+1 || _array.Length==0)
            return false;
        else
        {
            _current++;
            return ture;
        }
    }

    public Reset()
    {
        _current=0;
    }
}

因此，在使用foreach语句的时候，实际上调用的IEnumerable<Int>接口，相当于以下语句：

IEnumerable<Int> enumerator= myIntegers.GetEnumerator();

while(enumerator.MoveNext())
{
int i=enumerator.Current;

Console.WriteLine(i);
}

enumerator.Reset();

在这里，我们用到了数组迭代器：ArrayEnumerator<T>，针对不同的集合，我们可以使用不同的迭代器，只要可以实现IEnumerator<T>接口就行。下一节,我们举一个单向链表的例子。

注：传说VB对foreach语句的编译跟C#略有不同，如果对象是数组，它将不会使用迭代器，这样使得效率更高，但是我没有验证过。

1.2 自定义单向链表迭代器

链表定义：

代码

    public class LinkTable<T> : IEnumerable<T>
    {
        TableNode<T> _firstNode;

        public void Add(T value)
        {
            if (_firstNode == null)
                _firstNode = new TableNode<T>(value);
            else
                _firstNode.Next = new TableNode<T>(value);
        }
        #region IEnumerable<T> Members

        public IEnumerator<T> GetEnumerator()
        {
            return new LinkTableEnumerator<T>(_firstNode);
        }

        #endregion

        #region IEnumerable Members

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return new LinkTableEnumerator<T>(_firstNode);
        }

        #endregion
    }

链表节点定义：

代码

    public class TableNode<T>
    {
        public TableNode(T value)
        {
            _value = value;
        }

        T _value;

        public T Value
        {
            get { return _value; }
            set { _value = value; }
        }

        TableNode<T> _next;

        internal TableNode<T> Next
        {
            get { return _next; }
            set { _next = value; }
        }
    }

链表迭代器：

代码

    public class LinkTableEnumerator<T> : IEnumerator<T>
    {
        TableNode<T> _firstNode;
        TableNode<T> _current;

        public LinkTableEnumerator(TableNode<T> firstNode)
        {
            _firstNode = firstNode;
        }

        #region IEnumerator<T> Members

        public T Current
        {
            get { return _current.Value; }
        }

        #endregion

        #region IDisposable Members

        public void Dispose()
        {
            //throw new NotImplementedException();
        }

        #endregion

        #region IEnumerator Members

        object System.Collections.IEnumerator.Current
        {
            get { return _current.Value; }
        }

        public bool MoveNext()
        {
            if(_firstNode==null)
                return false;

            // 迭代器开始
            if(_current==null)
            {
                _current = _firstNode;
                return true;
            }
            else
            {
                if (_current.Next == null)
                    return false;
                else
                {
                    _current = _current.Next;
                    return true;
                }
            }
        }

        public void Reset()
        {
            _current = null;
        }

        #endregion
    }

【下载LinkTable源代码】

调用代码：

    LinkTable<string> sLink = new LinkTable<string>();
    sLink.Add("a");
    sLink.Add("b");
    foreach (string item in sLink)
    {
        Console.WriteLine(item);
    }

输出：

从MoveNext()方法可以看出来，针对不同的集合，迭代器的行为有可能完全不一样。在数组迭代器中，迭代器保存着数组的引用，而在单向链表迭代器中，只需要保存第一个节点就行。

再比如我们还可以自定义树的迭代器，显然，我们可以使用深度优先遍历或者广度优先遍历,那我们可以让一个集合对象实现两个迭代器，并且让用户自己选择需要的迭代器吗？

1.3 让集合拥有多个迭代器:扩展TreeView类

以System.Windows.Forms.TreeView类为例，这是一个树形结构，但是它并没有实现IEnumerable<T>接口因此我们并不能够对他使用迭代器进行方便的遍历。

显然，我们可以扩展TreeView类，让它实现IEnumerable<T>接口：

public class MyTreeView:TreeView,IEnumerator<TreeNode>

那么我们就可以使用如下语法来遍历MyTreeView类型对象：

MyTreeView tree=new MyTreeView ();

foreach(TreeNode node in tree)
    {
        // do someting...
    }

但是，如果我们要让用户选择是用深度遍历还是广度遍历，这种方法就不行了。因此我们考虑让MyTreeView拥有两个迭代器，其中关键部分是，在GetEnumerator()方法中，根据用户的选择（增加TreeEnumeratorType属性，该属性为枚举类型）返回不同的迭代器。完整代码如下：

    public enum EnumeratorType
    {
        DepthFirst,
        WidthFirst
    }

    public class MyTreeView : TreeView,IEnumerable<TreeNode>
    {
        public EnumeratorType TreeEnumeratorType = EnumeratorType.DepthFirst;

        IEnumerator<TreeNode> _depthFirst;

        IEnumerator<TreeNode> DepthFirst
        {
            get
            {
                if (_depthFirst == null)
                {
                    _depthFirst = new DepthFirstEnumerator(this);
                }
                return _depthFirst;
            }
        }

        IEnumerator<TreeNode> _widithFirst;

        IEnumerator<TreeNode> WidithFirst
        {
            get
            {
                if (_widithFirst == null)
                {
                    _widithFirst = new WidthFirstEnumerator(this);
                }
                return _widithFirst;
            }
        }

        #region IEnumerable<TreeNode> Members

        public IEnumerator<TreeNode> GetEnumerator()
        {
            switch (TreeEnumeratorType)
            {
                case EnumeratorType.DepthFirst:
                    return this.DepthFirst;
                case EnumeratorType.WidthFirst:
                    return this.WidithFirst;
            }
            return this.DepthFirst;
        }

        #endregion

        #region IEnumerable Members

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            switch (TreeEnumeratorType)
            {
                case EnumeratorType.DepthFirst:
                    return this.DepthFirst;
                case EnumeratorType.WidthFirst:
                    return this.WidithFirst;
            }
            return this.DepthFirst;
        }

        #endregion
    }

使用：

    MyTreeView myTree = new MyTreeView();

    // 使用深度优先迭代器
    myTree.TreeEnumeratorType = EnumeratorType.DepthFirst;

    foreach (var item in myTree)
    {

    }

    // 使用广度优先迭代器
    myTree.TreeEnumeratorType = EnumeratorType.WidthFirst;
    foreach (var item in myTree)
    {

    }

如果再往下扩展，让GetEnumerator()函数自己根据数据集合的情况智能的选择迭代器，倒是一个不错的主意。

1.4 语法糖：yield return

我们可以使用这样的方法返回一个迭代器：

static IEnumerable<int> GetInt()
    {
        yield return 1;
        yield return 2;
    }

但是实际上，IL语言中并没有yield return这个关键字。它只不过是一个为了书写方便书写的语法糖而已。编译器偷偷的生成了一个迭代器。原理跟上面的例子没有任何区别。具体可以参照一下链接：

http://www.cnblogs.com/Mayvar/archive/2009/11/24/wanghonghua-11241934.html

刷新页面返回顶部

xiaomin

公告

深入理解迭代器与集合 Part1【Linq学习系列】