A while ago I read an article by Jeff Atwood, "Sorting for Humans: Natural Sort Order", available at http://www.codinghorror.com/blog/archives/001018.html
In it he shares his thoughts on sorting in natural order and lists the solutions of the following four people:
Dave Koelle's The Alphanum Algorithm
Martin Pool's Natural Order String Comparison
Ian Griffiths' Natural Sorting in C#
Ned Batchelder's Compact Python Human Sort, along with Jussi Salmela's internationalized version of same.
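What natural sort order means is best introduced with the following problem:
When we take pictures with a digital camera, it names the photos automatically, for example (sequence 1): photo1.jpg, photo2.jpg, ..., photo12.jpg, ... Once these photos are copied to a computer they are often listed in the wrong order: photo12.jpg ends up between photo1.jpg and photo2.jpg. We would like them sorted in natural order, as in sequence 1. I believe the cause is that such strings are usually compared by the ASCII codes of their characters, which gives photo1.jpg < photo12.jpg < photo2.jpg.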
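To make the problem concrete, here is a minimal sketch that sorts three file names by raw character codes. The class name OrdinalSortDemo and the helper SortOrdinal are made up for this illustration; StringComparer.Ordinal is the standard .NET comparer that compares strings code unit by code unit.

```csharp
using System;
using System.Collections.Generic;

public static class OrdinalSortDemo
{
    // Returns a sorted copy of the names, compared character code by character code.
    public static List<string> SortOrdinal(IEnumerable<string> names)
    {
        List<string> sorted = new List<string>(names);
        sorted.Sort(StringComparer.Ordinal);
        return sorted;
    }

    public static void Main()
    {
        string[] names = { "photo2.jpg", "photo12.jpg", "photo1.jpg" };
        foreach (string name in SortOrdinal(names))
        {
            Console.WriteLine(name);
        }
        // Prints:
        // photo1.jpg
        // photo12.jpg
        // photo2.jpg
    }
}
```

At the seventh character, '.' (46) is smaller than '2' (50), so photo1.jpg comes first; then '1' (49) is smaller than '2' (50), so photo12.jpg comes before photo2.jpg.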
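My idea for sorting strings made up of characters and digits is this: compare non-digit characters by their ASCII codes, but compare each run of digits by the value of the natural number it represents rather than by ASCII code. For the sorting itself I reuse the ready-made quick bubble sort from the book "Agile Software Development: Principles, Patterns, and Practices". The program is as follows: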
1. ISortHandle.cs: the interface that encapsulates the sorting operations
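using System;

namespace NameSorting
{
    public interface ISortHandle
    {
        // Swaps the elements at index and index + 1.
        void Swap(Int32 index);

        // Number of elements in the collection being sorted.
        Int32 Length { get; }

        // Returns true when the elements at index and index + 1 are out of order.
        Boolean Compare(Int32 index);

        // Sets the collection that the other members operate on.
        void SetCollection(Object obj);
    }
}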
2. StringSortHandle.cs: implements the interface above and defines the sorting operations for strings
using System;
using System.Collections.Generic;

namespace NameSorting
{
    public class StringSortHandle : ISortHandle
    {
        private IList<String> array = null;

        /// <summary>
        /// Swaps the elements at index and index + 1.
        /// </summary>
        /// <param name="index">The index of the first element of the pair.</param>
        public void Swap(Int32 index)
        {
            String temp = array[index];
            array[index] = array[index + 1];
            array[index + 1] = temp;
        }

        /// <summary>
        /// Sets the collection to sort.
        /// </summary>
        /// <param name="obj">The collection; expected to be an IList of String.</param>
        public void SetCollection(Object obj)
        {
            this.array = obj as IList<String>;
        }

        /// <summary>
        /// Gets the number of elements, or -1 if no collection has been set.
        /// </summary>
        public Int32 Length
        {
            get
            {
                return this.array != null ? this.array.Count : -1;
            }
        }

        /// <summary>
        /// Compares the elements at index and index + 1 in natural order.
        /// </summary>
        /// <param name="index">The index of the first element of the pair.</param>
        /// <returns>true if the first element is greater than the second, so the pair must be swapped.</returns>
        public Boolean Compare(Int32 index)
        {
            // The last element has no right-hand neighbour, so there is nothing to swap.
            // (Returning true here would make Swap read past the end of the list.)
            if (index >= array.Count - 1)
            {
                return false;
            }
            String first = this.array[index];
            String second = this.array[index + 1];
            Int32 fIndex = 0;
            Int32 sIndex = 0;
            while (fIndex < first.Length && sIndex < second.Length)
            {
                if (this.IsDigital(first[fIndex]) && this.IsDigital(second[sIndex]))
                {
                    // Both strings are at a run of digits: collect each run and compare by value.
                    String fStr = String.Empty;
                    String sStr = String.Empty;
                    while (fIndex < first.Length && this.IsDigital(first[fIndex]))
                    {
                        fStr += first[fIndex];
                        fIndex++;
                    }
                    while (sIndex < second.Length && this.IsDigital(second[sIndex]))
                    {
                        sStr += second[sIndex];
                        sIndex++;
                    }
                    // Int64.Parse throws OverflowException if a run does not fit in an Int64.
                    Int64 fNum = Int64.Parse(fStr);
                    Int64 sNum = Int64.Parse(sStr);
                    if (fNum != sNum)
                    {
                        return fNum > sNum;
                    }
                }
                else
                {
                    // Otherwise compare the two characters by character code.
                    Int32 result = first[fIndex].CompareTo(second[sIndex]);
                    if (result != 0)
                    {
                        return result > 0;
                    }
                    fIndex++;
                    sIndex++;
                }
            }
            // One string is a prefix of the other (or they are equivalent):
            // the first is greater only if it still has characters left.
            return fIndex != first.Length;
        }

        /// <summary>
        /// Determines whether the specified character is an ASCII digit.
        /// </summary>
        /// <param name="ch">The character to test.</param>
        /// <returns>
        /// <c>true</c> if the character is between '0' and '9'; otherwise, <c>false</c>.
        /// </returns>
        public Boolean IsDigital(Char ch)
        {
            return ch >= '0' && ch <= '9';
        }
    }
}
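The comparison rule above is tied to the adjacent-pair interface of the bubble sorter. As a rough sketch only, the same rule could also be packaged as an IComparer<string> so that the built-in List<T>.Sort can use it. The class name NaturalComparer and the helper ReadNumber are hypothetical names introduced for this illustration; the sketch assumes non-null strings and digit runs that fit in an Int64.

```csharp
using System;
using System.Collections.Generic;

public sealed class NaturalComparer : IComparer<string>
{
    // Assumes x and y are non-null.
    public int Compare(string x, string y)
    {
        int i = 0, j = 0;
        while (i < x.Length && j < y.Length)
        {
            if (IsDigit(x[i]) && IsDigit(y[j]))
            {
                // Both sides are at a digit run: compare the numeric values.
                long a = ReadNumber(x, ref i);
                long b = ReadNumber(y, ref j);
                if (a != b) return a < b ? -1 : 1;
            }
            else
            {
                // Otherwise compare by character code.
                if (x[i] != y[j]) return x[i] < y[j] ? -1 : 1;
                i++;
                j++;
            }
        }
        // Whichever string still has characters left is the greater one.
        return (x.Length - i).CompareTo(y.Length - j);
    }

    private static bool IsDigit(char ch)
    {
        return ch >= '0' && ch <= '9';
    }

    // Reads the digit run starting at pos and advances pos past it.
    // Assumption: the value fits in an Int64 (no overflow check).
    private static long ReadNumber(string s, ref int pos)
    {
        long value = 0;
        while (pos < s.Length && IsDigit(s[pos]))
        {
            value = value * 10 + (s[pos] - '0');
            pos++;
        }
        return value;
    }

    public static void Main()
    {
        List<string> names = new List<string> { "z10.txt", "z2.txt", "z1.txt" };
        names.Sort(new NaturalComparer());
        Console.WriteLine(string.Join(", ", names)); // z1.txt, z2.txt, z10.txt
    }
}
```

One difference to keep in mind: List<T>.Sort is not a stable sort, so entries that this rule treats as equal (for example 1.01 and 1.1) may come out in either order, whereas the bubble sort keeps them in their input order.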
3. QuickBubbleSorter.cs: the quick bubble sort
using System;

namespace NameSorting
{
    public class QuickBubbleSorter
    {
        private Int32 operations = 0;
        private Int32 length = 0;
        private ISortHandle sortHandle = null;

        /// <summary>
        /// Initializes a new instance of the <see cref="QuickBubbleSorter"/> class.
        /// </summary>
        /// <param name="sh">The handle that knows how to compare and swap elements.</param>
        public QuickBubbleSorter(ISortHandle sh)
        {
            this.sortHandle = sh;
        }

        /// <summary>
        /// Sorts the specified collection in place.
        /// </summary>
        /// <param name="obj">The collection to sort.</param>
        /// <returns>The number of comparisons performed.</returns>
        public Int32 Sort(Object obj)
        {
            this.sortHandle.SetCollection(obj);
            this.length = this.sortHandle.Length;
            this.operations = 0;
            if (this.length <= 1)
            {
                return this.operations;
            }
            // Stop as soon as a full pass makes no swap: the collection is then in order.
            Boolean thisPassInorder = false;
            for (Int32 nextToLast = this.length - 2; nextToLast >= 0 && !thisPassInorder; nextToLast--)
            {
                thisPassInorder = true;
                for (Int32 index = 0; index <= nextToLast; index++)
                {
                    if (this.sortHandle.Compare(index))
                    {
                        this.sortHandle.Swap(index);
                        thisPassInorder = false;
                    }
                    this.operations++;
                }
            }
            return this.operations;
        }
    }
}
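What makes this bubble sort "quick" is the thisPassInorder flag: the outer loop ends as soon as one pass completes without a swap. The following stripped-down sketch applies the same loop to a plain int array to show the effect on the comparison count. The class name EarlyExitBubbleSort is made up for this illustration and is not part of the program above.

```csharp
using System;

public static class EarlyExitBubbleSort
{
    // Sorts items in place and returns the number of comparisons performed.
    public static int Sort(int[] items)
    {
        int comparisons = 0;
        bool inOrder = false;
        for (int last = items.Length - 2; last >= 0 && !inOrder; last--)
        {
            inOrder = true;
            for (int i = 0; i <= last; i++)
            {
                if (items[i] > items[i + 1])
                {
                    int temp = items[i];
                    items[i] = items[i + 1];
                    items[i + 1] = temp;
                    inOrder = false; // a swap happened, so another pass is needed
                }
                comparisons++;
            }
        }
        return comparisons;
    }

    public static void Main()
    {
        Console.WriteLine(Sort(new int[] { 1, 2, 3, 4, 5 })); // 4: one pass, no swaps
        Console.WriteLine(Sort(new int[] { 5, 4, 3, 2, 1 })); // 10: 4 + 3 + 2 + 1
    }
}
```

For input that is already sorted, a single pass of n - 1 comparisons is enough; in the worst case the count is still n(n - 1)/2, the same as an ordinary bubble sort.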
4. FileUtils.cs: reads the input file into a list of strings
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace NameSorting
{
    public static class FileUtils
    {
        // Reads the file line by line and returns the lines as a list.
        public static IList<String> GetStringList(String fileName)
        {
            List<String> list = new List<String>();
            // Encoding.ASCII: any non-ASCII character in the file is read as '?'.
            // The using block closes the reader even if reading throws.
            using (StreamReader reader = new StreamReader(fileName, Encoding.ASCII))
            {
                String sLine;
                while ((sLine = reader.ReadLine()) != null)
                {
                    list.Add(sLine);
                }
            }
            return list;
        }
    }
}
5. Program.cs: the entry point
using System;
using System.Collections.Generic;

namespace NameSorting
{
    static class Program
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            QuickBubbleSorter sorter = new QuickBubbleSorter(new StringSortHandle());
            IList<String> list = FileUtils.GetStringList(@"D:\Values.txt");
            Console.WriteLine("---------------------Before sorted------------------------");
            Display(list);
            sorter.Sort(list);
            Console.WriteLine("---------------------After sorted------------------------");
            Display(list);
            Console.ReadKey();
        }

        // Writes each string in the list on its own line.
        public static void Display(IList<String> list)
        {
            if (list != null)
            {
                foreach (String str in list)
                {
                    Console.WriteLine(str);
                }
            }
        }
    }
}
The unsorted input data I used is as follows:
25
100
Banana
1.10.1
1_1
1_10
dumbuser10
9.9.08
12.25.2007
1.01
1.1
1.1.1
1.10
abc-d2-foo.txt
abc-d43-foo.txt
abc-23-foo.txt
smartdeveloper1
abc-100a-foo.txt
abc-12-foo.txt
abc-12a-foo.txt
banana
dumbdeveloper3
dumbdeveloper15
dumbuser2
apple
example-1.txt
example-2.txt
example-3.txt
example-4.txt
example1.txt
example2.txt
example3.txt
example-5.txt
example4.txt
example5.txt
z1.txt
z10.txt
z100.txt
z101.txt
z102.txt
z11.txt
z12.txt
z13.txt
z14.txt
z15.txt
z16.txt
z17.txt
z18.txt
z19.txt
z2.txt
z20.txt
z3.txt
z4.txt
z5.txt
z6.txt
z7.txt
z8.txt
z9.txt
photo2
photox001
photox10
photox3
The sorted result is as follows:
1.01
1.1
1.1.1
1.10
1.10.1
1_1
1_10
9.9.08
12.25.2007
25
100
Banana
abc-12-foo.txt
abc-12a-foo.txt
abc-23-foo.txt
abc-100a-foo.txt
abc-d2-foo.txt
abc-d43-foo.txt
apple
banana
dumbdeveloper3
dumbdeveloper15
dumbuser2
dumbuser10
example-1.txt
example-2.txt
example-3.txt
example-4.txt
example-5.txt
example1.txt
example2.txt
example3.txt
example4.txt
example5.txt
photo2
photox001
photox3
photox10
smartdeveloper1
z1.txt
z2.txt
z3.txt
z4.txt
z5.txt
z6.txt
z7.txt
z8.txt
z9.txt
z10.txt
z11.txt
z12.txt
z13.txt
z14.txt
z15.txt
z16.txt
z17.txt
z18.txt
z19.txt
z20.txt
z100.txt
z101.txt
z102.txt