Using Regular Expressions in ASP.NET

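I. Quantifiers: A quantifier is a simple way to specify how many times a particular character or character set may repeat. A quantifier always applies to the pattern immediately before it (to its left). That pattern is usually a single character, unless parentheses are used to create a group.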

(1) Implicit quantifiers

1. * means "zero or more times".
2. + means "one or more times".
3. ? means "zero or one time".
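Here is a minimal C# sketch of the three implicit quantifiers. It also shows a parenthesized group, which a quantifier repeats as a unit. The class and method names (ImplicitQuantifierDemo, ZeroOrMore and so on) are hypothetical. Each pattern is anchored with ^ and $, so the whole input must match.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class: each method applies one quantifier
// to the single character (or group) on its left.
public static class ImplicitQuantifierDemo
{
    // "ab*c": zero or more b characters between a and c.
    public static bool ZeroOrMore(string input) => Regex.IsMatch(input, "^ab*c$");

    // "ab+c": at least one b.
    public static bool OneOrMore(string input) => Regex.IsMatch(input, "^ab+c$");

    // "colou?r": the u is optional (zero or one occurrence).
    public static bool ZeroOrOne(string input) => Regex.IsMatch(input, "^colou?r$");

    // "(ab)+": the parentheses make + repeat the whole two-character group.
    public static bool RepeatedGroup(string input) => Regex.IsMatch(input, "^(ab)+$");
}
```

For example, ZeroOrMore("ac") and OneOrMore("abc") are both true, while OneOrMore("ac") is false.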

(2) Explicit quantifiers

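Explicit quantifiers use curly braces {n,m} containing numbers that set the lower and upper bounds on how many times a pattern may occur.
If only one number is given, it is the exact count. For example, x{5} matches exactly five x characters (xxxxx). If the number is followed by a comma, as in x{5,}, it matches five or more x characters. With both numbers, as in x{2,4}, it matches from two to four x characters.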
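The next sketch covers the three forms of explicit quantifier under the same assumptions: a hypothetical ExplicitQuantifierDemo class, and patterns anchored with ^ and $.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for the {n}, {n,} and {n,m} forms.
public static class ExplicitQuantifierDemo
{
    // x{5}: exactly five x characters.
    public static bool ExactlyFive(string s) => Regex.IsMatch(s, "^x{5}$");

    // x{5,}: five or more x characters.
    public static bool FiveOrMore(string s) => Regex.IsMatch(s, "^x{5,}$");

    // x{2,4}: between two and four x characters, inclusive.
    public static bool TwoToFour(string s) => Regex.IsMatch(s, "^x{2,4}$");
}
```

The anchors matter. Without them, x{5} also succeeds on a longer run such as xxxxxxx, because it only needs five consecutive x characters somewhere in the input.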

II. Metacharacters

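The . (period or dot) metacharacter is one of the simplest and most frequently used metacharacters. It matches any single character, except \n unless RegexOptions.Singleline is specified. The period is useful when part of a pattern may contain any combination of characters, as long as that part stays within a specific length range, as in .{3,5}.
The ^ metacharacter marks the start of the string (or of a line, when RegexOptions.Multiline is specified).
The $ metacharacter marks the end of the string (or of a line, when RegexOptions.Multiline is specified). Put ^ at the start of a pattern and $ at its end, and the pattern matches only when it matches the entire input string. Note that $ also matches just before a final \n. Use \z to require the absolute end of the string. When ^ is the first character of a character class in square brackets [ ], it has a different, special meaning, described below.
The \ (backslash) metacharacter has two roles. It can "escape" a character that has a special meaning, or it can introduce one of the predefined set metacharacters (also described below). To use a metacharacter as a literal character in a regular expression, you must escape it with a backslash. For example, to match strings that begin with "c:\", use ^c:\\. The ^ metacharacter requires the string to begin with this pattern, and the first backslash escapes the second, literal backslash. In C# source code, write this pattern as @"^c:\\" or "^c:\\\\".
The | (pipe) metacharacter specifies alternation, that is, "this or that" within a pattern. For example, a|b matches any input that contains "a" or "b", which is very similar to the character class [ab].
Parentheses ( ) group part of a pattern. A quantifier can then apply to the whole subpattern, so that it can occur multiple times. Groups also make a pattern easier to read, and they let you match specific parts of the input separately so that those parts can be parsed or reformatted.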
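The following sketch tests each metacharacter in turn. The MetacharacterDemo class and its method names are hypothetical. Regex.IsMatch and Regex.Replace are standard static methods of System.Text.RegularExpressions.Regex.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class: one method per metacharacter discussed above.
public static class MetacharacterDemo
{
    // ^ and $ anchor the whole input; .{3} is exactly three arbitrary characters.
    public static bool ThreeAnyChars(string s) => Regex.IsMatch(s, "^.{3}$");

    // \. is a literal period, whereas an unescaped . would match any character.
    public static bool IsAbDotC(string s) => Regex.IsMatch(s, @"^ab\.c$");

    // The verbatim string @"^c:\\" hands the regex engine ^c:\\,
    // that is, a literal "c:\" prefix.
    public static bool StartsWithDriveC(string s) => Regex.IsMatch(s, @"^c:\\");

    // | is alternation; the parentheses make ^ and $ apply to both choices.
    public static bool CatOrDog(string s) => Regex.IsMatch(s, "^(cat|dog)$");

    // Parentheses also capture: $1 and $2 in the replacement refer to the
    // two groups, so the captured parts can be reformatted (here, swapped).
    public static string SwapWords(string s) => Regex.Replace(s, "^(.+) (.+)$", "$2 $1");
}
```

The group in CatOrDog is significant. Without it, ^cat|dog$ means "starts with cat, or ends with dog", so that pattern accepts "catalog".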

III. Character classes

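A character class is a "mini" language inside regular expressions, defined within square brackets [ ]. When a character class appears in an expression, any one of its characters may occur at that position in the pattern. Only one character is matched unless a quantifier is applied. Note that a character class cannot define words or patterns, only single characters.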

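A hyphen - inside the brackets defines a range of characters. The hyphen has this special meaning only inside a character class. It has no such meaning in the rest of the regular expression, so strictly speaking it is not a regular expression metacharacter. It also has this meaning only when it is not the first character of the class. To match any decimal digit, use [0-9]. Likewise, use [a-z] for lowercase letters and [A-Z] for uppercase letters. Which characters a range includes depends on the character set in use: the order in which characters appear in, for example, the ASCII or Unicode table determines what falls inside the range. To include a literal hyphen in a class, make it the first character. For example, [-.? ] matches any one of four characters (note that the last character is a space). Also note that most regular expression metacharacters, such as . and ?, have no special meaning inside a character class, so they do not need to be escaped there. A character class is effectively a separate language from the rest of the regular expression, with its own rules and syntax.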
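The next sketch combines several ranges in one class and uses a class with a leading hyphen. The CharacterClassDemo name and its methods are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for ranges and literal characters inside [ ].
public static class CharacterClassDemo
{
    // [0-9A-Fa-f] combines three ranges into one class: a single hex digit.
    // The + quantifier outside the class allows one or more of them.
    public static bool IsHex(string s) => Regex.IsMatch(s, "^[0-9A-Fa-f]+$");

    // The hyphen comes first, so it is literal. The class matches '-', '.', '?'
    // or a space; '.' and '?' are ordinary characters inside the brackets.
    public static int CountPunctuation(string s) => Regex.Matches(s, "[-.? ]").Count;
}
```

For example, CountPunctuation("a-b.c?d e") finds four matches: the hyphen, the period, the question mark and the space.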

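A class can also be negated so that it matches any character that is not one of its members: make ^ the first character of the class. For example, to match any character that is not a vowel, use the class [^aAeEiIoOuU]. To negate a hyphen, put the hyphen second in the class, right after ^, as in [^-]. Remember that ^ inside a character class does something completely different from what it does elsewhere in a regular expression pattern.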
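Both negated classes from the paragraph above appear in this sketch. NegatedClassDemo, KeepVowels and NonHyphenRuns are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for negated character classes.
public static class NegatedClassDemo
{
    // [^aAeEiIoOuU] matches one non-vowel; replacing every such
    // character with "" leaves only the vowels.
    public static string KeepVowels(string s) => Regex.Replace(s, "[^aAeEiIoOuU]", "");

    // [^-] is "any character except a hyphen" (the hyphen right after ^
    // is literal); with + it matches each run of non-hyphen characters.
    public static string[] NonHyphenRuns(string s)
    {
        MatchCollection mc = Regex.Matches(s, "[^-]+");
        string[] parts = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            parts[i] = mc[i].Value;
        }
        return parts;
    }
}
```

For example, KeepVowels("Regular Expression") returns "euaEeio", and NonHyphenRuns("2010-01-22") returns the three parts "2010", "01" and "22".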

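IV. Predefined set metacharacters

Metacharacter	Meaning or equivalent character class
\a	Matches a bell (alarm) character, \u0007.
\b	Outside a character class, matches a word boundary. Inside a character class ([\b]), matches a backspace character, \u0008.
\t	Matches a tab, \u0009.
\r	Matches a carriage return, \u000D.
\v	Matches a vertical tab, \u000B.
\f	Matches a form feed, \u000C.
\n	Matches a newline (line feed), \u000A.
\e	Matches an escape character, \u001B.
\040	Matches an ASCII character given as a 3-digit octal number. \040 is a space (decimal 32).
\x20	Matches an ASCII character given as a 2-digit hexadecimal number. Here, \x20 is a space.
\cC	Matches an ASCII control character; here, Ctrl-C.
\u0020	Matches a Unicode character given as a 4-digit hexadecimal number. Here, \u0020 is a space.
\*	A backslash followed by a character that does not denote a predefined class makes that character literal. Thus \* is equivalent to \x2A (a literal asterisk, not the metacharacter).
\p{name}	Matches any character in the named character class "name". Supported names are Unicode general categories and named blocks, for example Ll, Nd, Z, IsGreek, IsBoxDrawing, and Sc (currency).
\P{name}	Matches any character that is not in the named character class "name".
\w	Matches any word character. For non-Unicode (ECMAScript) behavior, this is equivalent to [a-zA-Z_0-9]. In terms of Unicode categories, it is equivalent to [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
\W	The negation of \w. Equivalent to the ECMAScript-compatible set [^a-zA-Z_0-9] or to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
\s	Matches any white-space character. Equivalent to the Unicode class [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is requested with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v] (note the leading space).
\S	Matches any non-white-space character. Equivalent to the Unicode class [^\f\n\r\t\v\x85\p{Z}]. With the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v] (note the space after ^).
\d	Matches any decimal digit. Equivalent to \p{Nd} in the default Unicode mode and to [0-9] in ECMAScript mode.
\D	Matches any character that is not a decimal digit. Equivalent to \P{Nd} in the default Unicode mode and to [^0-9] in ECMAScript mode.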
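The Unicode and ECMAScript equivalents in the table behave differently only for characters outside ASCII. This sketch tests a single character against \d, \w and \s, with and without RegexOptions.ECMAScript, and uses \b as a word boundary. The PredefinedSetDemo class and its members are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class comparing default (Unicode) and ECMAScript behavior.
public static class PredefinedSetDemo
{
    private static RegexOptions Mode(bool ecma) =>
        ecma ? RegexOptions.ECMAScript : RegexOptions.None;

    // Is s exactly one \d character?
    public static bool IsDigit(string s, bool ecma) => Regex.IsMatch(s, @"^\d$", Mode(ecma));

    // Is s exactly one \w character?
    public static bool IsWordChar(string s, bool ecma) => Regex.IsMatch(s, @"^\w$", Mode(ecma));

    // Is s exactly one \s character?
    public static bool IsSpace(string s, bool ecma) => Regex.IsMatch(s, @"^\s$", Mode(ecma));

    // \b outside a class is a word boundary, so only whole-word occurrences count.
    // Regex.Escape makes any metacharacters in 'word' literal.
    public static int CountWholeWord(string text, string word) =>
        Regex.Matches(text, @"\b" + Regex.Escape(word) + @"\b").Count;
}
```

Take "\u0663" (ARABIC-INDIC DIGIT THREE) as an example. It is in category Nd, so \d accepts it by default but rejects it in ECMAScript mode. The same split applies to "\u00e9" (é) with \w and to the no-break space "\u00a0" with \s.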

V. Creating and using the System.Text.RegularExpressions classes in ASP.NET (see also "Using Regular Expressions in ASP.NET (Part 2)")

Regex

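The Regex class represents an immutable (read-only) regular expression. It also has various static methods that let you use the other regular expression classes without explicitly creating instances of them.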

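The following code example creates an instance of the Regex class and defines a simple regular expression when the object is initialized. Note that the C# version adds a backslash as an escape character, so that the backslash of the \s character class reaches the regular expression as a literal backslash. Visual Basic string literals do not treat the backslash specially, so the VB version needs only one.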

[Visual Basic]
    ' Declare object variable of type Regex.
    Dim r As Regex 
    ' Create a Regex object and define its regular expression.
    r = New Regex("\s2000")

[C#]
    // Declare object variable of type Regex.
    Regex r; 
    // Create a Regex object and define its regular expression.
    r = new Regex("\\s2000");
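The two literals above produce the same pattern, \s2000: a white-space character followed by 2000. The following sketch checks that. It also shows the static methods of Regex, which can be called without creating an instance. The RegexConstructionDemo class and its members are hypothetical. Regex.IsMatch, Regex.Replace and Regex.Split are standard static methods of System.Text.RegularExpressions.Regex.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class contrasting instance and static usage.
public static class RegexConstructionDemo
{
    // Both constants hold the same pattern text \s2000: the regular literal
    // needs "\\" to produce one backslash; the verbatim (@) literal does not.
    public const string Escaped = "\\s2000";
    public const string Verbatim = @"\s2000";

    // Instance usage: a Regex is immutable, so one instance can be reused.
    private static readonly Regex SpaceBefore2000 = new Regex(Verbatim);

    public static bool HasSpaceBefore2000(string s) => SpaceBefore2000.IsMatch(s);

    // Static usage: no Regex object is created by the caller.
    public static string CollapseWhitespace(string s) => Regex.Replace(s, @"\s+", " ");

    public static string[] SplitOnCommas(string s) => Regex.Split(s, @"\s*,\s*");
}
```

Verbatim strings are usually the easier choice for patterns that contain many backslashes.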

Match

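The Match class represents the result of a regular expression match operation. The following example calls the Match method of the Regex class, which returns an object of type Match representing the first match in the input string. The example uses the Match.Success property to check whether a match was found.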

[Visual Basic]
    ' Create a new Regex object.
    Dim r As New Regex("abc") 
    ' Find a single match in the input string.
    Dim m As Match = r.Match("123abc456") 
    If m.Success Then
        ' Print out the character position where a match was found. 
        ' (Character position 3 in this case.)
        Console.WriteLine("Found match at position " & m.Index.ToString())
    End If

[C#]
    // Create a new Regex object.
    Regex r = new Regex("abc"); 
    // Find a single match in the string.
    Match m = r.Match("123abc456"); 
    if (m.Success) 
    {
        // Print out the character position where a match was found. 
        // (Character position 3 in this case.)
        Console.WriteLine("Found match at position " + m.Index);
    }

MatchCollection

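The MatchCollection class represents a sequence of successful, non-overlapping matches. The collection is immutable (read-only) and has no public constructor. Instances of MatchCollection are returned by the Regex.Matches method.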

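The following example uses the Matches method of the Regex class to fill a MatchCollection with all the matches found in the input string. It then copies the collection into two arrays: a string array that holds each match and an integer array that holds the position of each match.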

[Visual Basic]
    Dim mc As MatchCollection
    Dim results(20) As String
    Dim matchposition(20) As Integer

    ' Create a new Regex object and define the regular expression.
    Dim r As New Regex("abc")
    ' Use the Matches method to find all matches in the input string.
    mc = r.Matches("123abc4abcd")
    ' Loop through the match collection to retrieve all 
    ' matches and positions.
    Dim i As Integer
    For i = 0 To mc.Count - 1
        ' Add the match string to the string array.
        results(i) = mc(i).Value
        ' Record the character position where the match was found.
        matchposition(i) = mc(i).Index
    Next i

[C#]
    MatchCollection mc;
    String[] results = new String[20];
    int[] matchposition = new int[20];
    
    // Create a new Regex object and define the regular expression.
    Regex r = new Regex("abc"); 
    // Use the Matches method to find all matches in the input string.
    mc = r.Matches("123abc4abcd");
    // Loop through the match collection to retrieve all 
    // matches and positions.
    for (int i = 0; i < mc.Count; i++) 
    {
        // Add the match string to the string array.   
        results[i] = mc[i].Value;
        // Record the character position where the match was found.
        matchposition[i] = mc[i].Index;   
    }
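Both versions above use fixed-size arrays, so an input with more matches than the arrays can hold would throw an IndexOutOfRangeException. The sketch below sizes the arrays from the collection's Count and iterates with foreach. MatchCollectionDemo.FindAll is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class: collects every match and its position.
public static class MatchCollectionDemo
{
    public static void FindAll(string pattern, string input,
                               out string[] values, out int[] positions)
    {
        MatchCollection mc = Regex.Matches(input, pattern);
        // Size the results from the actual number of matches.
        values = new string[mc.Count];
        positions = new int[mc.Count];
        int i = 0;
        foreach (Match m in mc)
        {
            values[i] = m.Value;      // the matched text
            positions[i] = m.Index;   // zero-based position in the input
            i++;
        }
    }
}
```

With the pattern "abc" and the input "123abc4abcd" used above, this finds two matches, at positions 3 and 7.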

GroupCollection

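The GroupCollection class represents the set of groups captured in a single match. The collection is immutable (read-only) and has no public constructor. Instances of GroupCollection are returned by the Match.Groups property.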

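The following console application finds and prints the number of groups captured by a regular expression. For an example of how to extract the individual captures from each member of the group collection, see the CaptureCollection example in the next section.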

[Visual Basic]
    Imports System
    Imports System.Text.RegularExpressions

    Public Class RegexTest
        Public Shared Sub RunTest()
            ' Define groups "abc", "ab", and "b".
            Dim r As New Regex("(a(b))c") 
            Dim m As Match = r.Match("abdabc")
            Console.WriteLine("Number of groups found = " _
            & m.Groups.Count.ToString())
        End Sub    
    
        Public Shared Sub Main()
            RunTest()
        End Sub
    End Class

[C#]
    using System;
    using System.Text.RegularExpressions;

    public class RegexTest 
    {
        public static void RunTest() 
        {
            // Define groups "abc", "ab", and "b".
            Regex r = new Regex("(a(b))c"); 
            Match m = r.Match("abdabc");
            Console.WriteLine("Number of groups found = " + m.Groups.Count);
        }
        public static void Main() 
        {
            RunTest();
        }
    }

The example produces the following output.

[Visual Basic]
    Number of groups found = 3

[C#]
    Number of groups found = 3

CaptureCollection

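The CaptureCollection class represents a sequence of captured substrings: the set of captures made by a single capturing group. Because of quantifiers, a capturing group can capture several strings in a single match. The Captures property, an object of type CaptureCollection, is a member of both the Match and Group classes and gives access to the collection of captured substrings.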

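For example, suppose the regular expression ((a(b))c)+ is used to capture matches in the string "abcabcabc". Here the + quantifier specifies one or more matches. The CaptureCollection of each capturing group (groups 1 to 3) then contains three members.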
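A short sketch can check these capture counts. The hypothetical CaptureCountDemo.CaptureCounts method returns, for each group of the first match, the number of entries in that group's Captures collection. For ((a(b))c)+ on "abcabcabc", group 0 (the whole match) has one capture and groups 1 to 3 have three each.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class: capture count per group of the first match.
public static class CaptureCountDemo
{
    public static int[] CaptureCounts(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        int[] counts = new int[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            // Each repetition driven by + adds one Capture to the group.
            counts[i] = m.Groups[i].Captures.Count;
        }
        return counts;
    }
}
```

Applied to the (Abc)+ example that follows, the same method returns one capture for group 0 and three for group 1, which agrees with that example's printed output.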

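The following console application uses the regular expression (Abc)+ to find one or more matches in the string "XYZAbcAbcAbcXYZAbcAb". It shows how the Captures property returns the multiple substrings captured by a group.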

[Visual Basic]
    Imports System
    Imports System.Text.RegularExpressions

    Public Class RegexTest
        Public Shared Sub RunTest()
            Dim counter As Integer
            Dim m As Match
            Dim cc As CaptureCollection
            Dim gc As GroupCollection
            ' Look for groupings of "Abc".
            Dim r As New Regex("(Abc)+") 
            ' Define the string to search.
            m = r.Match("XYZAbcAbcAbcXYZAbcAb")
            gc = m.Groups
            
            ' Print the number of groups.
            Console.WriteLine("Captured groups = " & gc.Count.ToString())
            
            ' Loop through each group.
            Dim i, ii As Integer
            For i = 0 To gc.Count - 1
                cc = gc(i).Captures
                counter = cc.Count
                
                ' Print number of captures in this group.
                Console.WriteLine("Captures count = " & counter.ToString())
                
                ' Loop through each capture in group.            
                For ii = 0 To counter - 1
                    ' Print capture and position.
                    Console.WriteLine(cc(ii).ToString() _
                        & "   Starts at character " & cc(ii).Index.ToString())
                Next ii
            Next i
        End Sub
    
        Public Shared Sub Main()
            RunTest()
        End Sub
    End Class

[C#]
    using System;
    using System.Text.RegularExpressions;

    public class RegexTest 
    {
        public static void RunTest() 
        {
            int counter;
            Match m;
            CaptureCollection cc;
            GroupCollection gc;

            // Look for groupings of "Abc".
            Regex r = new Regex("(Abc)+"); 
            // Define the string to search.
            m = r.Match("XYZAbcAbcAbcXYZAbcAb"); 
            gc = m.Groups;

            // Print the number of groups.
            Console.WriteLine("Captured groups = " + gc.Count.ToString());

            // Loop through each group.
            for (int i=0; i < gc.Count; i++) 
            {
                cc = gc[i].Captures;
                counter = cc.Count;
                
                // Print number of captures in this group.
                Console.WriteLine("Captures count = " + counter.ToString());
                
                // Loop through each capture in group.
                for (int ii = 0; ii < counter; ii++) 
                {
                    // Print capture and position.
                    Console.WriteLine(cc[ii] + "   Starts at character " + 
                        cc[ii].Index);
                }
            }
        }

        public static void Main() {
            RunTest();
        }
    }

This example produces the following output.

[Visual Basic]
    Captured groups = 2
    Captures count = 1
    AbcAbcAbc   Starts at character 3
    Captures count = 3
    Abc   Starts at character 3
    Abc   Starts at character 6
    Abc   Starts at character 9

[C#]
    Captured groups = 2
    Captures count = 1
    AbcAbcAbc   Starts at character 3
    Captures count = 3
    Abc   Starts at character 3
    Abc   Starts at character 6
    Abc   Starts at character 9

Group

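The Group class represents the results of a single capturing group. A Group can capture zero, one, or more strings in a single match (when quantifiers are used), so it contains a collection of Capture objects. Group inherits from Capture, so the last captured substring can be accessed directly: the Group instance itself is equivalent to the last item in the collection returned by the Captures property.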

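Group instances are returned by the Match.Groups(groupnum) property. When the "(?<groupname>)" grouping construct is used, they are also returned by the Match.Groups("groupname") property. In C#, use the indexer syntax Match.Groups[groupnum] or Match.Groups["groupname"].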

The following code example uses nested grouping constructs to capture substrings into groups.

[Visual Basic]
    Dim matchposition(20) As Integer
    Dim results(20) As String
    ' Define substrings abc, ab, b.
    Dim r As New Regex("(a(b))c") 
    Dim m As Match = r.Match("abdabc")
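    ' Loop over the groups of this match (0 = whole match, 1 = "ab", 2 = "b").
    ' Using Groups.Count avoids stopping early on a group that captured "".
    For i As Integer = 0 To m.Groups.Count - 1
        ' Copy groups to string array.
        results(i) = m.Groups(i).Value
        ' Record character position.
        matchposition(i) = m.Groups(i).Index
    Next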

[C#]
    int[] matchposition = new int[20];
    String[] results = new String[20];
    // Define substrings abc, ab, b.
    Regex r = new Regex("(a(b))c"); 
    Match m = r.Match("abdabc");
    for (int i = 0; i < m.Groups.Count; i++) 
    {
        // Copy groups to string array.
        results[i]=m.Groups[i].Value; 
        // Record character position.
        matchposition[i] = m.Groups[i].Index; 
    }

This example produces the following results.

[Visual Basic]
    results(0) = "abc"   matchposition(0) = 3
    results(1) = "ab"    matchposition(1) = 3
    results(2) = "b"     matchposition(2) = 4

[C#]
    results[0] = "abc"   matchposition[0] = 3
    results[1] = "ab"    matchposition[1] = 3
    results[2] = "b"     matchposition[2] = 4

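The following code example uses named grouping constructs to capture substrings from a string that contains data in the format "DATANAME:VALUE". The regular expression splits the data at the colon ":".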

[Visual Basic]
    Dim r As New Regex("^(?<name>\w+):(?<value>\w+)")
    Dim m As Match = r.Match("Section1:119900")

[C#]
    Regex r = new Regex("^(?<name>\\w+):(?<value>\\w+)");
    Match m = r.Match("Section1:119900");

This regular expression produces the following results.

[Visual Basic]
    m.Groups("name").Value = "Section1"
    m.Groups("value").Value = "119900"

[C#]
    m.Groups["name"].Value = "Section1"
    m.Groups["value"].Value = "119900"
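When the input contains several NAME:VALUE lines, the same named groups can be read from every match. This sketch rests on several assumptions. NamedGroupDemo.Parse is a hypothetical method. RegexOptions.Multiline makes ^ and $ apply to each line. An optional \r lets Windows line endings work too. A later duplicate name overwrites an earlier one.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class: parses "NAME:VALUE" lines via named groups.
public static class NamedGroupDemo
{
    // Multiline: ^ and $ match at line boundaries; \r? absorbs a CR before \n.
    private static readonly Regex Pair =
        new Regex(@"^(?<name>\w+):(?<value>\w+)\r?$", RegexOptions.Multiline);

    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(text))
        {
            // Groups are looked up by the names given in (?<name>...).
            result[m.Groups["name"].Value] = m.Groups["value"].Value;
        }
        return result;
    }
}
```

For example, Parse("Section1:119900\r\nSection2:42") yields two entries, Section1 = 119900 and Section2 = 42.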

Capture

The Capture class contains the result of a single subexpression capture.

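The following example loops through the Group collection and extracts the Capture collection from each member. For each capture, it assigns to the variable posn the character position in the original string where the capture was found, and to the variable length the capture's length.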

[Visual Basic]
    Dim r As Regex
    Dim m As Match
    Dim cc As CaptureCollection
    Dim posn, length As Integer

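    ' Use (abc)+ rather than (abc)*: a pattern that can match the empty
    ' string succeeds at index 0 of "bcabcabc" with an empty match, so
    ' no "abc" captures would be found.
    r = New Regex("(abc)+")
    m = r.Match("bcabcabc")
    Dim i, j As Integer
    For i = 0 To m.Groups.Count - 1
        ' Grab the Collection for Group(i).
        cc = m.Groups(i).Captures
        For j = 0 To cc.Count - 1
            ' Position of Capture object.
            posn = cc(j).Index
            ' Length of Capture object.
            length = cc(j).Length
        Next j
    Next i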

[C#]
    Regex r;
    Match m;
    CaptureCollection cc;
    int posn, length;

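    // Use (abc)+ rather than (abc)*: a pattern that can match the empty
    // string succeeds at index 0 of "bcabcabc" with an empty match, so
    // no "abc" captures would be found.
    r = new Regex("(abc)+");
    m = r.Match("bcabcabc");
    for (int i = 0; i < m.Groups.Count; i++) 
    {
        // Get the capture collection for Group(i).
        cc = m.Groups[i].Captures; 
        for (int j = 0; j < cc.Count; j++) 
        {
            // Position of Capture object.
            posn = cc[j].Index; 
            // Length of Capture object.
            length = cc[j].Length; 
        }
    }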
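A pattern that can match the empty string, such as (abc)*, needs care. Regex.Match returns the leftmost match, and in "bcabcabc" that is an empty match at index 0, so group 1 records no captures. The sketch below makes the difference between (abc)* and (abc)+ visible. CaptureListDemo.DescribeCaptures is a hypothetical method: it reports the match index, then index:length for each capture of group 1.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class: summarizes the captures of group 1.
public static class CaptureListDemo
{
    // Returns "<match index>|<index>:<length>,<index>:<length>,..."
    public static string DescribeCaptures(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        var parts = new List<string>();
        foreach (Capture c in m.Groups[1].Captures)
        {
            parts.Add(c.Index + ":" + c.Length);
        }
        return m.Index + "|" + string.Join(",", parts);
    }
}
```

DescribeCaptures("(abc)*", "bcabcabc") returns "0|", an empty match with no captures. DescribeCaptures("(abc)+", "bcabcabc") returns "2|2:3,5:3".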

[C#]
    private void button1_Click(object sender, System.EventArgs e)
    {
        // A digit followed by "H"; IgnoreCase also accepts "h".
        Regex regexTest = new Regex(@"[0-9]H", RegexOptions.IgnoreCase);
        Match match = regexTest.Match(this.textBox1.Text);
        System.Text.StringBuilder strb = new System.Text.StringBuilder();

        #region "while"
        while (match.Success)
        {
            if (strb.Length == 0)
            {
                strb.Append("Match Succeeded!");
            }
            // Read the groups of the current match. Taking match.Groups once
            // before the loop would keep reporting the first match's groups.
            GroupCollection groups = match.Groups;
            for (int index = 0; index < groups.Count; index++)
            {
                strb.Append(Environment.NewLine);
                strb.Append(groups[index].Value);
            }
            // Advance to the next match (Success is false when none is left).
            match = match.NextMatch();
        }
        #endregion

        if (strb.Length == 0)
        {
            strb.Append("Match Failed!");
        }
        this.textBox2.Text = strb.ToString();
    }
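The handler above depends on Windows Forms controls (textBox1, textBox2). Here is a console-friendly sketch of the same NextMatch loop that runs without a form. NextMatchDemo and Describe are hypothetical names, and the input parameter stands in for textBox1.Text.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class: the handler's logic without the UI.
public static class NextMatchDemo
{
    public static string Describe(string input)
    {
        Regex regexTest = new Regex(@"[0-9]H", RegexOptions.IgnoreCase);
        StringBuilder strb = new StringBuilder();
        // Walk the matches one by one with NextMatch().
        for (Match match = regexTest.Match(input); match.Success; match = match.NextMatch())
        {
            if (strb.Length == 0)
            {
                strb.Append("Match Succeeded!");
            }
            // Append every group value of the current match.
            foreach (Group g in match.Groups)
            {
                strb.Append(Environment.NewLine);
                strb.Append(g.Value);
            }
        }
        return strb.Length == 0 ? "Match Failed!" : strb.ToString();
    }
}
```

For "1h 2H x", the result is "Match Succeeded!" followed by the lines "1h" and "2H". For "abc", it is "Match Failed!".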

Posted on 2010-01-22 14:22 by VIP-爷
