Regular Expressions in C#

Regular expressions provide a powerful, efficient, and flexible way to process text. Typical uses:

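1. Quickly parse large amounts of text to find specific character patterns. This is analysis of bulk string data, for example locating one account among 100,000 email accounts.
2. Extract, edit, or delete text substrings. This covers text processing, bulk data processing, and handling of specific strings.
3. Add the extracted strings to a collection to generate a report, which makes statistical analysis possible.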

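The regular expression engine

The engine is exposed through the Regex class. Regex.IsMatch() reports whether an input string matches a pattern, which makes it suitable for validating strings.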

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string[] values = { "111-22-3333", "111-2-3333"};
      string pattern = @"^\d{3}-\d{2}-\d{4}$";
      foreach (string value in values) {
         if (Regex.IsMatch(value, pattern))
            Console.WriteLine("{0} is a valid SSN.", value);
         else   
            Console.WriteLine("{0}: Invalid", value);
      }
   }
}

Output:

111-22-3333 is a valid SSN.
111-2-3333: Invalid

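| Pattern | Description |
|:---:|:------:|
| `^` | Matches the beginning of the input string |
| `\d{3}` | Matches three decimal digits |
| `-` | Matches a hyphen |
| `\d{2}` | Matches two decimal digits |
| `-` | Matches a hyphen |
| `\d{4}` | Matches four decimal digits |
| `$` | Matches the end of the input string |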
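
The anchors matter here: Regex.IsMatch succeeds if the pattern matches anywhere in the input, so without `^` and `$` a longer string that merely contains an SSN-like run would also pass. A minimal sketch of the difference follows. The class name `AnchorDemo`, the helper names, and the sample input are illustrative assumptions, not part of the original example.

```c#
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
   // Anchored: the whole input must have the form ddd-dd-dddd.
   public static bool IsExactSsn(string value) =>
      Regex.IsMatch(value, @"^\d{3}-\d{2}-\d{4}$");

   // Unanchored: true if the form appears anywhere in the input.
   public static bool ContainsSsnLike(string value) =>
      Regex.IsMatch(value, @"\d{3}-\d{2}-\d{4}");

   public static void Main()
   {
      string value = "ID 111-22-3333 (copy)";    // hypothetical sample input
      Console.WriteLine(IsExactSsn(value));      // False
      Console.WriteLine(ContainsSsnLike(value)); // True
   }
}
```

For validation the anchored form is usually what is wanted; the unanchored form is closer to a search.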

Regex.Match() and Regex.Matches(): the first returns a Match object describing the first match in the input; the second returns a collection of all matches.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "This is a a farm that that raises dairy cattle.";
      string pattern = @"\b(\w+)\W+(\1)\b";
      Match match = Regex.Match(input, pattern);
      while (match.Success) // true while a match was found, false once there are no more
      {
         Console.WriteLine("Duplicate '{0}' found at position {1}.",
                           match.Groups[1].Value, match.Groups[2].Index);
         match = match.NextMatch(); // advance to the next match
      }
   }
}

// The same result can be obtained with Regex.Matches:
foreach (Match match in Regex.Matches(input, pattern))
      Console.WriteLine("Duplicate '{0}' found at position {1}.",  
                   match.Groups[1].Value, match.Groups[2].Index);

Output:

Duplicate 'a' found at position 10.
Duplicate 'that' found at position 22.

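| Pattern | Description |
|:---:|:------:|
| `\b` | Begins the match at a word boundary |
| `(\w+)` | Matches one or more word characters. This is the first capturing group |
| `\W+` | Matches one or more non-word characters |
| `(\1)` | Matches the string captured by the first group. This is the second capturing group |
| `\b` | Ends the match at a word boundary |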
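
Because Regex.Matches returns every match, it fits the reporting use case listed at the start: gather the matches into a collection, then summarize. Below is a rough sketch that reuses the duplicate-word pattern. The `DuplicateWordReport` class and the `FindDuplicates` helper are hypothetical names introduced for illustration.

```c#
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DuplicateWordReport
{
   // Returns each doubled word together with the index of its second occurrence.
   public static List<KeyValuePair<string, int>> FindDuplicates(string input)
   {
      var found = new List<KeyValuePair<string, int>>();
      foreach (Match match in Regex.Matches(input, @"\b(\w+)\W+(\1)\b"))
         found.Add(new KeyValuePair<string, int>(match.Groups[1].Value,
                                                 match.Groups[2].Index));
      return found;
   }

   public static void Main()
   {
      var duplicates = FindDuplicates("This is a a farm that that raises dairy cattle.");
      Console.WriteLine("{0} duplicate word(s) found.", duplicates.Count);
      foreach (var pair in duplicates)
         Console.WriteLine("  '{0}' at position {1}", pair.Key, pair.Value);
   }
}
```

Returning a list rather than printing inside the loop keeps the matching step separate from whatever report is built on top of it.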

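Regex.Replace replaces matched text with a replacement string. It can be used to change date formats or to remove invalid characters from a string.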

using System;
using System.Text.RegularExpressions;
public class Example
{
   public static void Main()
   {
      string pattern = @"\b\d+\.\d{2}\b";
      string replacement = "$$$&"; 
      string input = "Total Cost: 103.64";
      Console.WriteLine(Regex.Replace(input, pattern, replacement));     
   }
}

Output:

Total Cost: $103.64

The pattern \b\d+\.\d{2}\b is interpreted as follows:

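| Pattern | Description |
|:---:|:------:|
| `\b` | Begins the match at a word boundary |
| `\d+` | Matches one or more decimal digits |
| `\.` | Matches a period |
| `\d{2}` | Matches two decimal digits |
| `\b` | Ends the match at a word boundary |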

The replacement pattern `$$$&` is interpreted as follows:

| Pattern | Replacement string |
|:---:|:------:|
| `$$` | The dollar sign ($) character |
| `$&` | The entire matched substring |

Groups and captures
-------------------

The Match.Groups property returns a GroupCollection object containing multiple Group objects, which represent the captured groups in a single match.

**Individual groups**:

```c#
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\w+)\s(\d{1,2}),\s(\d{4})\b";
      string input = "Born: July 28, 1989";
      Match match = Regex.Match(input, pattern);
      if (match.Success)
         for (int ctr = 0; ctr < match.Groups.Count; ctr++)
            Console.WriteLine("Group {0}: {1}", ctr, match.Groups[ctr].Value);
   }
}
// Output:
//       Group 0: July 28, 1989
//       Group 1: July
//       Group 2: 28
//       Group 3: 1989
```

**Multiple groups**

```c#
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

List<int> matchposition = new List<int>();
List<string> results = new List<string>();
// The pattern defines the substrings abc, ab, and b.
Regex r = new Regex("(a(b))c");
Match m = r.Match("abdabc");
// An out-of-range group index yields an empty group, which ends the loop.
for (int i = 0; m.Groups[i].Value != ""; i++)
{
   // Store the text captured by the group.
   results.Add(m.Groups[i].Value);
   // Record the position of the group.
   matchposition.Add(m.Groups[i].Index);
}

// Display the results.
for (int ctr = 0; ctr < results.Count; ctr++)
   Console.WriteLine("{0} at position {1}",
                     results[ctr], matchposition[ctr]);
// Output:
//       abc at position 3
//       ab at position 3
//       b at position 4
```

**Capture collections**

```c#
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = "((a(b))c)+";
      string input = "abcabcabc";

      Match match = Regex.Match(input, pattern);
      if (match.Success)
      {
         Console.WriteLine("Match: '{0}' at position {1}",
                           match.Value, match.Index);
         GroupCollection groups = match.Groups;
         for (int ctr = 0; ctr < groups.Count; ctr++)
         {
            Console.WriteLine("   Group {0}: '{1}' at position {2}",
                              ctr, groups[ctr].Value, groups[ctr].Index);
            CaptureCollection captures = groups[ctr].Captures;
            for (int ctr2 = 0; ctr2 < captures.Count; ctr2++)
            {
               Console.WriteLine("      Capture {0}: '{1}' at position {2}",
                                 ctr2, captures[ctr2].Value, captures[ctr2].Index);
            }
         }
      }
   }
}
// Output:
//       Match: 'abcabcabc' at position 0
//          Group 0: 'abcabcabc' at position 0
//             Capture 0: 'abcabcabc' at position 0
//          Group 1: 'abc' at position 6
//             Capture 0: 'abc' at position 0
//             Capture 1: 'abc' at position 3
//             Capture 2: 'abc' at position 6
//          Group 2: 'ab' at position 6
//             Capture 0: 'ab' at position 0
//             Capture 1: 'ab' at position 3
//             Capture 2: 'ab' at position 6
//          Group 3: 'b' at position 7
//             Capture 0: 'b' at position 1
//             Capture 1: 'b' at position 4
//             Capture 2: 'b' at position 7
```

**Individual captures**

```c#
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "Miami,78;Chicago,62;New York,67;San Francisco,59;Seattle,58;";
      string pattern = @"((\w+(\s\w+)*),(\d+);)+";
      Match match = Regex.Match(input, pattern);
      if (match.Success)
      {
         Console.WriteLine("Current temperatures:");
         for (int ctr = 0; ctr < match.Groups[2].Captures.Count; ctr++)
            Console.WriteLine("{0,-20} {1,3}", match.Groups[2].Captures[ctr].Value,
                              match.Groups[4].Captures[ctr].Value);
      }
   }
}
// Output:
//       Current temperatures:
//       Miami                 78
//       Chicago               62
//       New York              67
//       San Francisco         59
//       Seattle               58
```
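
Captured groups can also be referenced from a Regex.Replace replacement string, which is one way to do the date-format change and invalid-character removal mentioned for Regex.Replace earlier. This is a sketch under stated assumptions: the `ReplaceDemo` class, both helper names, the mm/dd/yyyy input format, and the set of characters treated as valid are all illustrative choices, not taken from the original post.

```c#
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
   // Rewrites dates of the assumed form mm/dd/yyyy as yyyy-mm-dd using named groups.
   public static string ToIsoDate(string input) =>
      Regex.Replace(input,
                    @"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})\b",
                    "${year}-${month}-${day}");

   // Removes every character that is not a word character, '.', '@' or '-'.
   public static string StripInvalidChars(string input) =>
      Regex.Replace(input, @"[^\w\.@-]", "");

   public static void Main()
   {
      Console.WriteLine(ToIsoDate("Born: 07/28/1989"));           // Born: 1989-07-28
      Console.WriteLine(StripInvalidChars("<user>@example.com")); // user@example.com
   }
}
```

In the replacement string, `${name}` inserts the text captured by the named group, in the same way that `$&` inserts the whole match.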

Posted on 2016-08-08 11:28 by Black_Jack
