Regular Expressions: Using Them in C# Back-End Code

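In .NET, a regular expression is written as a string. However unusual its format looks, the C# language treats it as an ordinary string; what it means is decided by the Regex class, which parses the pattern syntax internally.

The Regex class lives in the System.Text.RegularExpressions namespace.

Regular expressions can be used to match strings, to extract substrings, and to replace text.

In C#, these three tasks correspond to three important methods.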

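1. IsMatch() returns a bool:

Format: Regex.IsMatch("input string", "regular expression");

Purpose: checks whether the string conforms to the pattern.

For example: bool b = Regex.IsMatch("bbbbg", "^b.*g$"); checks whether the string starts with b and ends with g, with any characters allowed in between. It returns true if the string matches and false otherwise.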
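To make the role of the ^ and $ anchors concrete, here is a minimal sketch built around the same pattern. The class name IsMatchDemo and the helper StartsWithBEndsWithG are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IsMatchDemo
{
    // Hypothetical helper: true when the input starts with 'b' and ends with 'g'.
    public static bool StartsWithBEndsWithG(string input)
    {
        return Regex.IsMatch(input, "^b.*g$");
    }

    public static void Main()
    {
        Console.WriteLine(StartsWithBEndsWithG("bbbbg"));    // True
        Console.WriteLine(StartsWithBEndsWithG("bg"));       // True: ".*" may match nothing
        Console.WriteLine(StartsWithBEndsWithG("abbbbg"));   // False: does not start with b
        // Without the anchors, IsMatch succeeds if the pattern occurs anywhere in the input.
        Console.WriteLine(Regex.IsMatch("abbbbgz", "b.*g")); // True
    }
}
```

Leaving the anchors out turns the check from "the whole string has this shape" into "the string contains something of this shape", which is a common source of surprises.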

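2. Match() returns a Match and finds only the first match.

Matches() returns a MatchCollection containing every match.

Format: Match match = Regex.Match("input string", "regular expression");

or MatchCollection matches = Regex.Matches("input string", "regular expression");

Purpose:

① Extract the matched substring.

② Extract groups. In Groups, the captured groups are indexed from 1; index 0 holds the value of the whole match.

For example: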

Match match = Regex.Match("age=30", @"^(.+)=(.+)$");
if (match.Success)
{
    Console.WriteLine(match.Groups[0].Value); // the whole matched substring: age=30
    Console.WriteLine(match.Groups[1].Value); // content of the first group: age
    Console.WriteLine(match.Groups[2].Value); // content of the second group: 30
}

Or:

MatchCollection matches = Regex.Matches("2010年10月10日", @"\d+");
for (int i = 0; i < matches.Count; i++)
{
    Console.WriteLine(matches[i].Value);
}
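The two ideas above, matching and group extraction, can be combined. The sketch below pulls year, month and day out of the same date string with three groups. The class DateGroupsDemo and the method ParseDate are hypothetical names, and [0-9] is used instead of \d as a cautious choice so that int.Parse only ever sees ASCII digits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateGroupsDemo
{
    // Hypothetical helper: returns { year, month, day }, or an empty array when nothing matches.
    public static int[] ParseDate(string text)
    {
        Match m = Regex.Match(text, "([0-9]+)年([0-9]+)月([0-9]+)日");
        if (!m.Success)
        {
            return Array.Empty<int>();
        }
        // Groups[0] is the whole match; the captured groups start at index 1.
        return new[]
        {
            int.Parse(m.Groups[1].Value),
            int.Parse(m.Groups[2].Value),
            int.Parse(m.Groups[3].Value)
        };
    }

    public static void Main()
    {
        int[] parts = ParseDate("2010年10月10日");
        Console.WriteLine($"{parts[0]}-{parts[1]}-{parts[2]}"); // 2010-10-10
    }
}
```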

3. Replace() returns a string

            // Collapse every run of whitespace into a single space
            string str = "   aa afds     fds  f ";
            str = Regex.Replace(str, @"\s+", " ");
            Console.WriteLine(str);

            string quoted = "hello“welcome to ”beautiful “China”";
            // Expected result: hello"welcome to "beautiful "China"
            // $1 refers to the first group; $2 would refer to the second group.
            string strresult = Regex.Replace(quoted, "“(.+?)”", "\"$1\"");
            Console.WriteLine(strresult);
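The comment above mentions $2, but neither snippet uses it. As a small sketch, the following swaps the two sides of a key=value pair by referring to both groups in the replacement string. The class ReplaceDemo and the method SwapPairs are names invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Hypothetical helper: turns every "key=value" into "value=key".
    public static string SwapPairs(string input)
    {
        // $1 is the text captured by the first group, $2 by the second.
        return Regex.Replace(input, @"(\w+)=(\w+)", "$2=$1");
    }

    public static void Main()
    {
        Console.WriteLine(SwapPairs("age=30"));          // 30=age
        Console.WriteLine(SwapPairs("age=30;name=tom")); // 30=age;tom=name
    }
}
```

Replace substitutes every match in the input, not just the first one, which is why both pairs in the second call are swapped.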

Common scenarios:

1. Greedy mode and ending greedy mode (lazy matching)

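        string str = "1。 11。 111。 111。 ";
        // ".+" matches one or more of any character and is greedy: it takes as much as it can.
        // The pattern must still end with "。 ", so the match stops at the last "。 ",
        // which gives the whole string "1。 11。 111。 111。 "
        // Greedy mode
        Match matchA = Regex.Match(str, "^.+。 ");

        // A "?" after the quantifier ends greedy mode: ".+?" takes as few characters as possible.
        // The pattern must still end with "。 ", so the match stops at the first "。 ", which gives "1。 "
        // (A trailing "$" is left out on purpose: it would force both patterns to match the whole string.)
        Match matchB = Regex.Match(str, "^.+?。 ");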

If a result differs from what you expected, check whether greedy matching is the cause.
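One way to run that check is to try the greedy and the lazy form of a pattern side by side. The sketch below does so on a made-up HTML fragment, and also shows that anchoring with ^ and $ can cancel out the effect of a lazy quantifier. The class LazyDemo and the helper FirstMatch are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyDemo
{
    // Hypothetical helper: value of the first match, or "" when there is none.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string html = "<b>one</b><b>two</b>";

        // Greedy: ".+" runs to the last "</b>" it can still reach.
        Console.WriteLine(FirstMatch(html, "<b>.+</b>"));    // <b>one</b><b>two</b>

        // Lazy: ".+?" stops at the first "</b>".
        Console.WriteLine(FirstMatch(html, "<b>.+?</b>"));   // <b>one</b>

        // Lazy but anchored: "$" forces the match to reach the end of the input,
        // so the lazy quantifier has to keep expanding and the result is the whole string.
        Console.WriteLine(FirstMatch(html, "^<b>.+?</b>$")); // <b>one</b><b>two</b>
    }
}
```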

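2. Practical applications: scrapers (collecting email addresses, images, or other information from a web page), sensitive-word filtering, and a UBB translator.

[1] Scraper

Collecting email addresses: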

        string url = "http://www.example.com";
        // Download the page source with WebClient
        System.Net.WebClient client = new System.Net.WebClient();
        client.Encoding = System.Text.Encoding.UTF8;
        string strHtml = client.DownloadString(url);
        // Match email addresses. The pattern has no ^ or $ anchors,
        // because the addresses sit inside the page text rather than making up the whole string.
        MatchCollection collection = Regex.Matches(strHtml, @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*");
        for (int i = 0; i < collection.Count; i++)
        {
            Console.WriteLine(collection[i].Value);
        }
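Regex.Matches scans for occurrences inside a larger text, so a pattern used for extraction should not be wrapped in ^ and $ (that form is for validating a single value). The sketch below tries the unanchored email pattern on an in-memory string, so it runs without network access. The class EmailDemo, the method ExtractEmails and the sample addresses are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmailDemo
{
    // Hypothetical helper: returns every email-like substring found in the text.
    public static List<string> ExtractEmails(string text)
    {
        var result = new List<string>();
        // Same pattern as in the scraper, without ^ and $.
        foreach (Match m in Regex.Matches(text, @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        string text = "Contact alice@example.com or bob.smith@mail.example.org today.";
        foreach (string email in ExtractEmails(text))
        {
            Console.WriteLine(email);
        }
        // Prints:
        // alice@example.com
        // bob.smith@mail.example.org
    }
}
```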

Saving images:

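        string url = "https://www.example.com/";
        // Download the page source with WebClient
        System.Net.WebClient client = new System.Net.WebClient();
        client.Encoding = System.Text.Encoding.UTF8;
        string strHtml = client.DownloadString(url);
        // Assume every tag to match looks like: <img alt="" src="img/example.jpeg" />
        // \s also covers a line break right after <img: <img\n alt="" src="img/example.jpeg" />
        // [^>]* keeps each match inside one tag; a greedy .* could run across several tags on the same line
        MatchCollection collection = Regex.Matches(strHtml, "<img\\s[^>]*?src=\"(.+?)\"[^>]*/>");
        for (int i = 0; i < collection.Count; i++)
        {
            string img = collection[i].Groups[1].Value;
            // src may be relative (as in img/example.jpeg), so resolve it against the page URL
            Uri imgUri = new Uri(new Uri(url), img);
            // Path is in System.IO; the D:\Images folder must already exist
            client.DownloadFile(imgUri, Path.Combine(@"D:\Images", Path.GetFileName(imgUri.LocalPath)));
            Console.WriteLine(collection[i].Value);
        }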

URL handling:

        // Replace "http://www.example.com/" with
        // <a href="http://www.example.com/">http://www.example.com/</a>
        // The character class includes "/" so that the trailing slash and any path stay inside the link.
        string url = "http://www.example.com/";
        url = Regex.Replace(url, @"(http://[a-zA-Z0-9_\-\?=\.&/]+)", "<a href=\"$1\">$1</a>");

[2] Sensitive-word filtering:

[3] UBB translation:

Posted on 2015-11-19 10:27 by Now,DayBreak
