[Repost] Text Comparison (C# Version)


 

The text comparison result looks like this:

 


 

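For example, compare abcdefg with a123defghik: the second string is what you get from the first by deleting bc, inserting 123, and then inserting hik. The API provided here computes exactly this.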
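To make the example concrete, here is a minimal sketch that derives those three steps. It uses a textbook longest-common-subsequence table rather than the algorithm from the paper, and the names EditScriptSketch and Describe are hypothetical, not part of the downloadable solution.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class EditScriptSketch
{
    // Returns the delete/insert steps that turn a into b.
    // Assumption: a simple O(n*m) LCS table, not the paper's algorithm.
    public static List<string> Describe(string a, string b)
    {
        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..]
        int[,] lcs = new int[a.Length + 1, b.Length + 1];
        for (int i = a.Length - 1; i >= 0; i--)
            for (int j = b.Length - 1; j >= 0; j--)
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        var steps = new List<string>();
        var deleted = new StringBuilder();
        var inserted = new StringBuilder();

        // Emits the pending runs of deleted and inserted characters, if any.
        void Flush()
        {
            if (deleted.Length > 0) steps.Add("delete " + deleted);
            if (inserted.Length > 0) steps.Add("insert " + inserted);
            deleted.Clear();
            inserted.Clear();
        }

        int x = 0, y = 0;
        while (x < a.Length || y < b.Length)
        {
            if (x < a.Length && y < b.Length && a[x] == b[y])
            {
                Flush();                 // a common character ends the current change
                x++;
                y++;
            }
            else if (x < a.Length && (y == b.Length || lcs[x + 1, y] >= lcs[x, y + 1]))
                deleted.Append(a[x++]);  // dropping a[x] keeps the LCS reachable
            else
                inserted.Append(b[y++]); // otherwise b[y] is an insertion
        }
        Flush();
        return steps;
    }

    static void Main()
    {
        foreach (string step in Describe("abcdefg", "a123defghik"))
            Console.WriteLine(step);
    }
}
```

For the two strings above, Main prints delete bc, insert 123 and insert hik, one per line.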

The paper describing this algorithm is available here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.4.6927&rep=rep1&type=pdf

If the paper is hard going, this author's analysis may help:

http://blog.csdn.net/clariones/archive/2006/11/19/1396880.aspx

http://blog.csdn.net/clariones/archive/2006/11/24/1412394.aspx

Someone wrote a Java version of the algorithm based on his explanation:
http://www.blogjava.net/phyeas/archive/2009/01/10/250807.html

I ported it to C# and fixed a few bugs:
https://files.cnblogs.com/zhouyinhui/TextComparisonSln.zip  

 

If you want to compare on the basis of text lines instead, see:
http://www.mathertel.de/Diff/default.aspx

 

http://www.mathertel.de/Diff/Default.aspx

 

 

This page contains a step-by-step sample that shows how to use the Diff class to compare the characters of two strings.

For more information about the Diff class, see the main page.

This page can be used with 2 URL parameters (a and b) to test this implementation:

Current parameters:

Text a = "Default Text For Line A."
Text b = "Default textline for line B."

1. Parameter Preparation

Before the algorithm can be used, the two input strings must be converted into the data type the algorithm works on: an int array.

Because we compare character by character, this is easy: use the character code of each char. This is done by the DiffCharCodes function:

private static int[] DiffCharCodes(string aText, bool ignoreCase) {
  int[] Codes;

  if (ignoreCase)
    aText = aText.ToUpperInvariant();

  Codes = new int[aText.Length];

  for (int n = 0; n < aText.Length; n++)
    Codes[n] = (int)aText[n];

  return (Codes);
} // DiffCharCodes

The codes for the two text lines, in hexadecimal, are:

a_codes = 44 65 66 61 75 6c 74 20 54 65 78 74 20 46 6f 72 20 4c 69 6e 65 20 41 2e
b_codes = 44 65 66 61 75 6c 74 20 74 65 78 74 6c 69 6e 65 20 66 6f 72 20 6c 69 6e 65 20 42 2e
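The dump above can be reproduced with a few lines. This is a sketch only: CharCodeDump, ToCodes and ToHex are hypothetical names, and ToCodes is a LINQ restatement of DiffCharCodes rather than the original function.

```csharp
using System;
using System.Linq;

static class CharCodeDump
{
    // Same conversion as DiffCharCodes, restated with LINQ.
    public static int[] ToCodes(string text, bool ignoreCase)
    {
        if (ignoreCase)
            text = text.ToUpperInvariant();
        return text.Select(c => (int)c).ToArray();
    }

    // Formats the codes as space-separated, two-digit lowercase hex.
    public static string ToHex(int[] codes)
    {
        return string.Join(" ", codes.Select(c => c.ToString("x2")));
    }

    static void Main()
    {
        Console.WriteLine("a_codes = " + ToHex(ToCodes("Default Text For Line A.", false)));
        Console.WriteLine("b_codes = " + ToHex(ToCodes("Default textline for line B.", false)));
    }
}
```

With ignoreCase set to true, both T and t map to 54, so differences in case no longer show up as changes.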

2. Calling the Diff Algorithm

The main entry point for the algorithm is the DiffInt function, which takes two int[] parameters and returns an array of Diff.Item structures. Each item describes one changed region: where it starts in both arrays and how many elements were deleted from A and inserted into B. Everything between two items is identical in both arrays.

Diff.Item[] diffs = Diff.DiffInt(a_codes, b_codes);

Here is a dump of the actual content of this structure:

The diff result has 5 items.
StartA=8, StartB=8, deletedA=1, insertedB=1
StartA=12, StartB=12, deletedA=0, insertedB=4
StartA=13, StartB=17, deletedA=1, insertedB=1
StartA=17, StartB=21, deletedA=1, insertedB=1
StartA=22, StartB=26, deletedA=1, insertedB=1
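One way to check a reading of these fields is to rebuild text b from text a and the items. The sketch below hardcodes the five items from the dump; ItemSketch is a hypothetical stand-in for Diff.Item that carries only the four fields shown, and Apply is an illustrative helper, not part of the Diff class.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for Diff.Item, limited to the fields in the dump.
struct ItemSketch
{
    public int StartA, StartB, deletedA, insertedB;

    public ItemSketch(int startA, int startB, int deleted, int inserted)
    {
        StartA = startA;
        StartB = startB;
        deletedA = deleted;
        insertedB = inserted;
    }
}

static class ApplyDiffSketch
{
    // Rebuilds b from a: copy unchanged runs from a, skip deleted runs,
    // and take inserted runs from b.
    public static string Apply(string a, string b, ItemSketch[] items)
    {
        var result = new StringBuilder();
        int posA = 0;
        foreach (ItemSketch it in items)
        {
            result.Append(a, posA, it.StartA - posA);  // unchanged run before the item
            result.Append(b, it.StartB, it.insertedB); // inserted run
            posA = it.StartA + it.deletedA;            // skip the deleted run
        }
        result.Append(a, posA, a.Length - posA);       // unchanged tail
        return result.ToString();
    }

    static void Main()
    {
        string a = "Default Text For Line A.";
        string b = "Default textline for line B.";
        var items = new[]
        {
            new ItemSketch(8, 8, 1, 1),
            new ItemSketch(12, 12, 0, 4),
            new ItemSketch(13, 17, 1, 1),
            new ItemSketch(17, 21, 1, 1),
            new ItemSketch(22, 26, 1, 1),
        };
        Console.WriteLine(Apply(a, b, items) == b); // True
    }
}
```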

3. Formatting the result

Now we can combine the original data with the result items to generate an intuitive, readable form of the result:

int pos = 0;
for (int n = 0; n < diffs.Length; n++) {
  Diff.Item it = diffs[n];

  // write unchanged chars
  while ((pos < it.StartB) && (pos < b_line.Length)) {
    this.Response.Write(b_line[pos]);
    pos++;
  } // while

  // write deleted chars
  if (it.deletedA > 0) {
    this.Response.Write("<span class='cd'>");
    for (int m = 0; m < it.deletedA; m++) {
      this.Response.Write(a_line[it.StartA + m]);
    } // for
    this.Response.Write("</span>");
  }

  // write inserted chars
  if (pos < it.StartB + it.insertedB) {
    this.Response.Write("<span class='ci'>");
    while (pos < it.StartB + it.insertedB) {
      this.Response.Write(b_line[pos]);
      pos++;
    } // while
    this.Response.Write("</span>");
  } // if
} // for

// write rest of unchanged chars
while (pos < b_line.Length) {
  this.Response.Write(b_line[pos]);
  pos++;
} // while

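And here is the formatted result, with deleted characters written as [-x-] and inserted characters as {+x+}:

Default [-T-]{+t+}ext{+line+} [-F-]{+f+}or [-L-]{+l+}ine [-A-]{+B+}.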
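The formatting loop can also be tried outside an ASP.NET page. The following sketch builds a string instead of calling Response.Write and marks deleted characters as [-x-] and inserted characters as {+x+} in place of the HTML spans. The class name, the Format method and the tuple standing in for Diff.Item are assumptions made for this illustration.

```csharp
using System;
using System.Text;

static class PlainTextDiffFormat
{
    // Same walk as the loop above, writing plain-text markers into a string.
    public static string Format(string a, string b,
        (int StartA, int StartB, int deletedA, int insertedB)[] diffs)
    {
        var sb = new StringBuilder();
        int pos = 0; // current position in b

        foreach (var it in diffs)
        {
            // unchanged chars up to the start of this item
            while (pos < it.StartB && pos < b.Length)
                sb.Append(b[pos++]);

            // deleted chars are taken from a
            if (it.deletedA > 0)
                sb.Append("[-").Append(a, it.StartA, it.deletedA).Append("-]");

            // inserted chars are taken from b
            if (pos < it.StartB + it.insertedB)
            {
                sb.Append("{+");
                while (pos < it.StartB + it.insertedB)
                    sb.Append(b[pos++]);
                sb.Append("+}");
            }
        }

        // rest of the unchanged chars
        while (pos < b.Length)
            sb.Append(b[pos++]);

        return sb.ToString();
    }

    static void Main()
    {
        var diffs = new[]
        {
            (8, 8, 1, 1), (12, 12, 0, 4), (13, 17, 1, 1), (17, 21, 1, 1), (22, 26, 1, 1)
        };
        Console.WriteLine(Format("Default Text For Line A.", "Default textline for line B.", diffs));
        // prints: Default [-T-]{+t+}ext{+line+} [-F-]{+f+}or [-L-]{+l+}ine [-A-]{+B+}.
    }
}
```

For real HTML output the characters would also need encoding before being written, which neither this sketch nor the loop above does.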

This page is part of the http://www.mathertel.de/ web site.

posted @ 2018-11-29 10:53  Net-Spider