
A Fast String Search Algorithm (Boyer-Moore)

2005-07-05 10:53  灵感之源

The CodeProject article "Efficient Boyer-Moore Search in Unicode Strings" by leseul demonstrates the power of the Boyer-Moore algorithm. The code can be downloaded here:
Download source - 10.2 Kb
 Download demo project - 5.18 Kb
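
For readers who have not met the algorithm, here is a minimal sketch of the core idea: compare the pattern right to left and, on a mismatch, skip ahead by a precomputed distance instead of moving one character at a time. This is the simplified Horspool variant (bad-character shifts only), not leseul's implementation; the class name HorspoolSketch and its IndexOf method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class HorspoolSketch
{
    // Returns the index of the first occurrence of pattern in text at or
    // after start, or -1 if there is none. Case-sensitive, ordinal comparison.
    public static int IndexOf(string text, string pattern, int start)
    {
        int m = pattern.Length;
        if (m == 0) return start <= text.Length ? start : -1;

        // Shift table: for every pattern character except the last one, the
        // distance from its last occurrence to the end of the pattern.
        // A dictionary keeps the table small for Unicode text.
        var shift = new Dictionary<char, int>();
        for (int i = 0; i < m - 1; i++)
        {
            shift[pattern[i]] = m - 1 - i;
        }

        int pos = start;
        while (pos <= text.Length - m)
        {
            // Compare from the end of the pattern backwards.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;

            // Shift by the table entry for the text character aligned with
            // the pattern's last position, or by the whole pattern length
            // if that character does not occur in the table.
            char last = text[pos + m - 1];
            pos += shift.TryGetValue(last, out int step) ? step : m;
        }
        return -1;
    }
}
```

Calling HorspoolSketch.IndexOf(text, pattern, index + pattern.Length) in a loop mirrors the search loop used in the test below. The longer the pattern and the larger the alphabet, the bigger the typical skip, which is where the speed advantage comes from.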

I wrote a performance test:

using System;
using System.Text;

// CIBMSearcher comes from the CodeProject download above. HiPerfTimer is a
// high-resolution timer class that must also be available in the project.
public class Program
{
    public static void Main()
    {
        string pattern = "AbC";
        string target = "AbCaBc";
        string pressure;

        // Build a large test string by repeating the target 10 million times.
        StringBuilder b = new StringBuilder();
        int count = 10000000;
        for (int i = 0; i < count; i++)
        {
            b.Append(target);
        }
        pressure = b.ToString();

        // Boyer-Moore, case-insensitive
        HiPerfTimer time = new HiPerfTimer();
        time.Start();
        CIBMSearcher BMS = new CIBMSearcher(pattern, false);
        int index = BMS.Search(pressure, 0);
        while (index >= 0)
        {
            index = BMS.Search(pressure, index + pattern.Length);
        }
        time.Stop();
        Console.WriteLine("BM without case sensitive:" + time.Duration);
        GC.Collect();

        // Boyer-Moore, case-sensitive
        time = new HiPerfTimer();
        time.Start();
        BMS = new CIBMSearcher(pattern, true);
        index = BMS.Search(pressure, 0);
        while (index >= 0)
        {
            index = BMS.Search(pressure, index + pattern.Length);
        }
        time.Stop();
        Console.WriteLine("BM with    case sensitive:" + time.Duration);
        GC.Collect();

        // String.IndexOf ("SS" = SubString). Note: IndexOf(string) is itself
        // a case-sensitive search, although the output label below says
        // "without case sensitive".
        time = new HiPerfTimer();
        time.Start();
        index = pressure.IndexOf(pattern);
        while (index >= 0)
        {
            index = pressure.IndexOf(pattern, index + pattern.Length);
        }
        time.Stop();
        Console.WriteLine("SS without case sensitive:" + time.Duration);
        GC.Collect();

        Console.ReadLine();
    }
}


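The results are as follows:
BM without case sensitive:1.2411443536895
BM with    case sensitive:0.707685620917367
SS without case sensitive:1.77157282256596

SS stands for SubString, i.e. the search done with String.IndexOf. Despite its label, that run calls pressure.IndexOf(pattern), which performs a case-sensitive search.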

My machine is a Pentium 4 2.8 GHz with 1 GB of RAM.

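This gives a glimpse of the power of BM: in this test the case-sensitive BM search took about 0.71 versus 1.77 for IndexOf (roughly 2.5 times as fast), and even the case-insensitive BM search was about 1.4 times as fast as IndexOf. I expect that the function from my earlier post, "Efficient case-insensitive string Replace function (comparison of several methods)", could be improved considerably with it.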
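
As a rough sketch of how such a Replace could be layered on top of a fast searcher: the loop is the same one used in the test above, except that the text between matches is copied into a StringBuilder. The class ReplaceSketch and its search delegate parameter are hypothetical names for illustration, not the code from the earlier post. The delegate is assumed to return the index of the next match at or after the given start index, or a negative value when there is none.

```csharp
using System;
using System.Text;

public static class ReplaceSketch
{
    // Replaces every match reported by search with replacement.
    // search(text, startIndex) must return the index of the next match at or
    // after startIndex, or a negative value if there are no more matches.
    // patternLength is the length of each match.
    public static string Replace(string text, int patternLength,
                                 string replacement,
                                 Func<string, int, int> search)
    {
        if (patternLength <= 0)
            throw new ArgumentOutOfRangeException(nameof(patternLength));

        var result = new StringBuilder(text.Length);
        int copiedUpTo = 0;
        int index = search(text, 0);
        while (index >= 0)
        {
            // Copy the unmatched text before this match, then the replacement.
            result.Append(text, copiedUpTo, index - copiedUpTo);
            result.Append(replacement);
            copiedUpTo = index + patternLength;
            index = search(text, copiedUpTo);
        }
        // Copy whatever follows the last match.
        result.Append(text, copiedUpTo, text.Length - copiedUpTo);
        return result.ToString();
    }
}
```

For example, ReplaceSketch.Replace("AbCaBcabc", 3, "x", (t, i) => t.IndexOf("abc", i, StringComparison.OrdinalIgnoreCase)) uses the built-in search. Passing (t, i) => BMS.Search(t, i) with a case-insensitive CIBMSearcher instead would plug in the Boyer-Moore searcher from the test.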

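This is exactly why efficient algorithms matter! I won't dig into the algorithm itself for now; today is too busy, so I'll see whether I have time to study it tonight.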

Note: the code is shown as plain text because the blog's code-insertion feature has a bug and cannot be used.

Click here to download my test code