IV. Search Algorithms (Basics)

I. Sequential Search

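Also known as linear search: scan a linear list sequentially from one end, comparing each element with the given key K. If an element equals K, the search succeeds; if the end of the list is reached without a match, the search fails.

First, consider the following code: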

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
    class Program
    {
        static void Main(string[] args)
        {
            int count = 100000000, key = 0, result = 0 ;
            int[] array = new int[count];
            init(array, count);
            Console.Write("Please enter the number you need to find:");
            string outkey = Console.ReadLine();
            // int.TryParse rejects empty, malformed, and out-of-range input without throwing
            if (!int.TryParse(outkey, out key))
            {
                Console.WriteLine("The number you entered is not correct. The program exits.");
                return;
            }
            TestSequentialSearch seqsearch = new TestSequentialSearch();
            //Time test
            System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
            stopwatch.Start();
            result = seqsearch.Search<int>(array, count, key);
            stopwatch.Stop();
            if (result >= 0)   // index 0 is a valid hit; only -1 means "not found"
                Console.WriteLine("Number has been found {0}, subscript is {1}",key.ToString(),result.ToString());
            else
                Console.WriteLine("Sorry. number not found {0}",key.ToString());
            Console.WriteLine("Time for array Search<T>:" + stopwatch.Elapsed.TotalMilliseconds);//key:99999999  output: 1455.*****
        }
        #region Data initialization.display and clear
        static void init(int[] array, int count)
        {
            Random random = new Random();
            for (int i = 0; i < count; i++)
            {
                array[i] = (i + 1);
                //array[i] = random.Next(0, count);
            }
        }
        static void display(int[] array)
        {
            foreach (var item in array)
            {
                Console.WriteLine(item);
            }
        }
        static void clear(int[] array, int count)
        {
            for (int i = 0; i < count; i++)
            {
                array[i] = 0;
            }
        } 
        #endregion
    }
    internal class TestSequentialSearch
    {
        public int Search<T>(T[] array,int count,T key)
        {
            for (int i = 0; i <= array.GetUpperBound(0); i++)   // <= so the last element is checked too
            {
                if (array[i].Equals(key)) return i;
            }
            return -1;
        }
    }
}


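Because the array holds the values 1 to count in ascending order, the smaller the number entered, the fewer iterations the loop runs and the less time the search takes; the larger the number, the more iterations and the longer the time. With the key 99999999, the code above takes a little over 1455 ms.

Next, test the array's Contains method with the following code: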

            System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
            TestSequentialSearch seqsearch = new TestSequentialSearch();
            //Contains time test
            stopwatch.Start();
            array.Contains(key);
            stopwatch.Stop();
            Console.WriteLine("Time for array contains:" + stopwatch.Elapsed.TotalMilliseconds);//key:99999999  output: 106.*****

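Why is Contains more than 10 times faster than the hand-written search? Let's dig into the .NET Framework source code to find out.

For how to view the .NET Framework source, click here.

Stepping through with the debugger shows that the call ends up in System.Array.IndexOf<T>: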

public static int IndexOf<T>(T[] array, T value, int startIndex, int count) {
            if (array==null) {
                throw new ArgumentNullException("array");
            }

            if (startIndex < 0 || startIndex > array.Length ) {
                throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_Index"));
            }

            if (count < 0 || count > array.Length - startIndex) {
                throw new ArgumentOutOfRangeException("count", Environment.GetResourceString("ArgumentOutOfRange_Count"));
            }
            Contract.Ensures(Contract.Result<int>() < array.Length);
            Contract.EndContractBlock();

            return EqualityComparer<T>.Default.IndexOf(array, value, startIndex, count);
        }

Stepping into the last statement leads to System.Collections.Generic.EqualityComparer<T>:

internal override int IndexOf(T[] array, T value, int startIndex, int count) {
            int endIndex = startIndex + count;
            if (value == null) {
                for (int i = startIndex; i < endIndex; i++) {
                    if (array[i] == null) return i;
                }
            }
            else {
                for (int i = startIndex; i < endIndex; i++) {
                   if (array[i] != null && array[i].Equals(value)) return i;
                }
            }
            return -1;
        }

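So Microsoft's underlying implementation is a sequential search as well. As for why an array's Contains ends up calling IndexOf, read up on IEnumerable, ICollection, IList, List, Array, and related types.

Recommended reading: click here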
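
To make the Contains-to-IndexOf relationship concrete, here is a minimal sketch (not taken from the framework source). It assumes, as the debugging session above suggests, that the LINQ `Contains` extension forwards to the array's `ICollection<T>.Contains`, which in turn relies on `Array.IndexOf`. The class name `ContainsDemo` and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ContainsDemo
{
    // LINQ extension method, the call used in the article (array.Contains(key)).
    public static bool ViaLinq(int[] data, int key)
    {
        return data.Contains(key);
    }

    // One-dimensional arrays implement ICollection<T>, so this cast is valid.
    public static bool ViaInterface(int[] data, int key)
    {
        return ((ICollection<int>)data).Contains(key);
    }

    // The linear scan the debugger ended up in.
    public static bool ViaIndexOf(int[] data, int key)
    {
        return Array.IndexOf(data, key) >= 0;
    }

    static void Main()
    {
        int[] data = { 3, 1, 4, 1, 5 };
        Console.WriteLine(ViaLinq(data, 4));      // True
        Console.WriteLine(ViaInterface(data, 4)); // True
        Console.WriteLine(ViaIndexOf(data, 9));   // False
    }
}
```

All three calls answer the same question, which is why timing `Contains` effectively times the framework's own sequential search.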

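Now let's improve the code slightly by dropping generics. In the generic method, array[i].Equals(key) binds to Object.Equals(object), so an int key has to be boxed for the call, and that adds to the running time.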

public int Search(int[] array,int count,int key)
        {
            for (int i = 0; i <= array.GetUpperBound(0); i++)   // <= so the last element is checked too
            {
                if (array[i].Equals(key)) return i;
            }
            return -1;
        }

 System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
            TestSequentialSearch seqsearch = new TestSequentialSearch();
            stopwatch.Start();
            seqsearch.Search(array, count, key);
            stopwatch.Stop();
            Console.WriteLine("Time for array:" + stopwatch.Elapsed.TotalMilliseconds);//key:99999999  output: 946.*****
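
A side note on the boxing concern: giving up generics is not the only way around it. The sketch below shows two generic variants that should avoid boxing the key, one using an `IEquatable<T>` constraint and one using `EqualityComparer<T>.Default` (the same comparer the framework code above goes through). The names `GenericSearch`, `IndexOfEquatable`, and `IndexOfWithComparer` are hypothetical, and no timings are claimed for them here.

```csharp
using System;
using System.Collections.Generic;

static class GenericSearch
{
    // With the IEquatable<T> constraint the compiler binds to Equals(T),
    // so a value-type key is passed as T and not boxed to object.
    // Assumes the array contains no null elements.
    public static int IndexOfEquatable<T>(T[] array, T key) where T : IEquatable<T>
    {
        for (int i = 0; i < array.Length; i++)
        {
            if (array[i].Equals(key)) return i;
        }
        return -1;
    }

    // No constraint needed: the default comparer takes both operands as T
    // and also copes with null elements.
    public static int IndexOfWithComparer<T>(T[] array, T key)
    {
        EqualityComparer<T> comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < array.Length; i++)
        {
            if (comparer.Equals(array[i], key)) return i;
        }
        return -1;
    }

    static void Main()
    {
        int[] numbers = { 10, 20, 30 };
        string[] words = { "a", null, "c" };
        Console.WriteLine(IndexOfEquatable(numbers, 30));   // 2
        Console.WriteLine(IndexOfWithComparer(words, "c")); // 2
        Console.WriteLine(IndexOfWithComparer(numbers, 5)); // -1
    }
}
```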

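Compared with the generic method, the non-generic version already cuts the time by more than a third. Next, following Microsoft's implementation, drop the GetUpperBound call: the loop condition calls it on every iteration, which costs extra CPU instructions, and temporaries take up memory as well, all of which takes time.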

        public int Search(int[] array,int count,int key)
        {
            for (int i = 0; i <count; i++)
            {
                if (array[i].Equals(key)) return i;
            }
            return -1;
        }

            System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
            TestSequentialSearch seqsearch = new TestSequentialSearch();
            stopwatch.Start();
            seqsearch.Search(array, count, key);
            stopwatch.Stop();
            Console.WriteLine("Time for array:" + stopwatch.Elapsed.TotalMilliseconds);//key:99999999  output: 414.*****

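Compared with the previous optimization, this cuts the time by more than half again, yet it still falls short of Microsoft's efficiency, even though the hand-written code is now almost identical to the code the framework executes. Why is that?

Finally, by chance, while stepping through the framework code in the debugger, I noticed this message: "Cannot evaluate expression because the code of the current method is optimized."

That made everything click. I built the project in Release mode and ran it, and the time finally dropped to 109 ms, on par with the framework's implementation.

Not bad: from over a thousand milliseconds down to around a hundred.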
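
Since the build configuration turned out to matter this much, a benchmark can check for it up front. The following is a rough sketch under stated assumptions: the helper names (`TimingHelper`, `IsJitOptimizerDisabled`, `BestOfMilliseconds`) are invented for this example, and taking the best of several runs after a warm-up call is just one common convention, not the method used for the numbers in this article.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

static class TimingHelper
{
    // True when the assembly was compiled with JIT optimizations disabled,
    // which is what a typical Debug build does.
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        DebuggableAttribute attr = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attr != null && attr.IsJITOptimizerDisabled;
    }

    // Calls the action once as a warm-up (so JIT compilation is not timed),
    // then returns the fastest of `runs` measured calls in milliseconds.
    public static double BestOfMilliseconds(Action action, int runs)
    {
        action();
        double best = double.MaxValue;
        for (int r = 0; r < runs; r++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }

    static void Main()
    {
        if (IsJitOptimizerDisabled(typeof(TimingHelper).Assembly) || Debugger.IsAttached)
            Console.WriteLine("Warning: unoptimized build or debugger attached; timings will be inflated.");

        int[] data = new int[1000000];
        for (int i = 0; i < data.Length; i++) data[i] = i + 1;

        // Search for the last value so the whole array is scanned.
        double ms = BestOfMilliseconds(() => Array.IndexOf(data, data.Length), 5);
        Console.WriteLine("Best of 5 runs: " + ms + " ms");
    }
}
```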

II. Binary Search

public int BinSearch(int[] array,int count,int key)
        {
            int upperbound, lowerbound,half;
            upperbound = count-1;
            lowerbound = 0;
            while (lowerbound <= upperbound)
            {
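                // midpoint; equals (lowerbound + upperbound) / 2 but cannot overflow int
                half = lowerbound + (upperbound - lowerbound) / 2;
                if (array[half] == key) return half;
                if (array[half] > key)
                    // guessed too high
                    upperbound = half - 1;
                else
                    // guessed too low
                    lowerbound = half + 1;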
            }
            return -1;
        }

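Requirement: take a sorted data set of 10 elements with the values 1 to 10, so the indices run from 0 to 9. We want to know whether it contains the value 7: return its index if it does, otherwise return -1.

Walk-through:

1. Range 0-9: the midpoint index is 4, value 5. Too low (the value at index 4 is smaller than the target, so index 4 and everything before it no longer need to be compared).

2. Range 5-9: the midpoint index is 7, value 8. Too high (the target must now be at index 5 or 6: above index 4 and below index 7).

3. Range 5-6: the midpoint index is 5, value 6. Too low (so it can only be at index 6).

4. Range 6-6: the midpoint index is 6, value 7. Equal (return index 6).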
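
The walk-through can be reproduced with a traced version of the search. This is only an illustrative sketch: `BinarySearchTrace` and the `probes` list are names introduced here to record which indices get examined, and are not part of the article's own classes.

```csharp
using System;
using System.Collections.Generic;

static class BinarySearchTrace
{
    // Returns the index of key or -1, appending every examined index to `probes`.
    public static int Search(int[] sorted, int key, List<int> probes)
    {
        int low = 0, high = sorted.Length - 1;
        while (low <= high)
        {
            int mid = low + (high - low) / 2;
            probes.Add(mid);
            Console.WriteLine("range {0}-{1}, midpoint index {2}, value {3}",
                              low, high, mid, sorted[mid]);
            if (sorted[mid] == key) return mid;
            if (sorted[mid] > key) high = mid - 1;  // guessed too high
            else low = mid + 1;                     // guessed too low
        }
        return -1;
    }

    static void Main()
    {
        int[] data = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        List<int> probes = new List<int>();
        int index = Search(data, 7, probes);
        // Prints the ranges 0-9, 5-9, 5-6, 6-6 and then "found at index 6".
        Console.WriteLine("found at index " + index);
    }
}
```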

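Implementation steps (summarized from the walk-through above):

1. Define three variables: the lower bound, the upper bound, and the midpoint index, and give the lower and upper bounds their initial values (here the upper bound is 9 and the lower bound is 0).

2. Write the loop. A while loop fits best, with the condition that the lower bound is less than or equal to the upper bound (once the lower bound exceeds the upper bound, the value is definitely not in the data).

3. The first step in the loop body is to compute the midpoint index: (lower bound + upper bound) / 2.

4. Compare the value at the midpoint index with the value being searched for; if they are equal, return the midpoint index.

5. If the midpoint value is smaller than the target, the guess was too low, so the lower bound becomes the midpoint index plus 1.

6. If the midpoint value is larger than the target, the guess was too high, so the upper bound becomes the midpoint index minus 1.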

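Comparing the running time of binary search and sequential search when looking for 99999999 among 100,000,000 numbers: sequential search takes about 102 ms, while binary search takes only 0.47 ms.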
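
The gap follows from the number of elements each approach examines. Sequential search needs 99,999,999 comparisons to reach that key, whereas every binary search step halves the remaining range, so at most floor(log2(100,000,000)) + 1 = 27 elements are examined. The sketch below counts the probes without allocating the array, by assuming the same layout as the test data (the element at index i holds i + 1); `ProbeCount` and `CountProbes` are illustrative names.

```csharp
using System;

static class ProbeCount
{
    // Binary search over the virtual sorted sequence value(i) = i + 1,
    // for i in [0, count). Returns how many elements were examined.
    public static int CountProbes(int count, int key)
    {
        int low = 0, high = count - 1, probes = 0;
        while (low <= high)
        {
            int mid = low + (high - low) / 2;
            int value = mid + 1;          // stands in for array[mid]
            probes++;
            if (value == key) break;
            if (value > key) high = mid - 1;
            else low = mid + 1;
        }
        return probes;
    }

    static void Main()
    {
        int probes = CountProbes(100000000, 99999999);
        Console.WriteLine("binary search probes: " + probes);
        Console.WriteLine("sequential comparisons: " + 99999999);
    }
}
```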

The binary search above can also be written recursively:

        public int ReBinSearch(int[] array, int upperbound, int lowerbound, int key)
        {
            if (lowerbound>upperbound)
                return -1;  // termination condition: the range is empty
            int half = lowerbound + (upperbound - lowerbound) / 2;  // overflow-safe midpoint
            if (array[half] == key) return half;
            if (array[half] > key)
                return ReBinSearch(array, half - 1, lowerbound, key);
            else
                return ReBinSearch(array, upperbound, half + 1, key);
        }

The key calling code:

 System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
            TestSequentialSearch seqsearch = new TestSequentialSearch();
            stopwatch.Start();
            result =seqsearch.ReBinSearch(array, count-1,0, key);
            stopwatch.Stop();
            if (result>=0)
                Console.WriteLine("find number.");
            else
                Console.WriteLine("not find.");
            Console.WriteLine("Time for array:" + stopwatch.Elapsed.TotalMilliseconds);//key:99999999  output:0.4802

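Of course, Array.BinarySearch in the FCL is also implemented as a binary search, yet it runs faster than the hand-written version. A look at the source shows that some of its operations use bitwise arithmetic. I will come back and study the source for optimizations once I understand it better; for now I cannot follow it.

In short: when a built-in method exists, use it instead of a hand-written one.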
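
For reference, here is a short usage sketch of the built-in method. `Array.BinarySearch` returns the index when the value is found; when it is not, it returns a negative number whose bitwise complement is the index at which the value would have to be inserted to keep the array sorted. The wrapper names `BuiltInSearch`, `Find`, and `InsertionPoint` are made up for this example.

```csharp
using System;

static class BuiltInSearch
{
    // Thin wrapper around the framework method; the input must already be sorted.
    public static int Find(int[] sorted, int key)
    {
        return Array.BinarySearch(sorted, key);
    }

    // Index of the key if present, otherwise the index where it would be inserted.
    public static int InsertionPoint(int[] sorted, int key)
    {
        int result = Array.BinarySearch(sorted, key);
        return result >= 0 ? result : ~result;
    }

    static void Main()
    {
        int[] data = { 1, 3, 5, 7, 9 };
        Console.WriteLine(Find(data, 7));           // 3
        Console.WriteLine(Find(data, 4));           // -3 (not found)
        Console.WriteLine(InsertionPoint(data, 4)); // 2
    }
}
```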

posted @ 2016-04-12 20:36 HUCEMAIL
