Regular Expressions in Boost

Reposted from http://blog.sina.com.cn/s/blog_70dd03910100np6u.html

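A few regular expression basics:

1. "." matches any single character

2. "*" repeats the preceding item zero or more times; "+" is similar but requires at least one occurrence

3. \d a digit; \s a whitespace character; {} gives an explicit repeat count

4. \A start of the buffer; \Z end of the buffer (allowing trailing newlines; \z matches only at the very end)

5. \w matches one word character (letter, digit or underscore); wide characters are supported

6. ? matches zero or one occurrence

It can also follow *, giving *?, a non-greedy repetition that matches the shortest string satisfying the pattern

(\d+),? matches numbers separated by ","; the trailing ? makes the comma optional, so the last number (which has no comma after it) is matched too

7. ^ at the start of a character class negates it: [^123] matches any character other than 1, 2 and 3

8. {5} exactly 5 repetitions; {2,4} 2, 3 or 4 repetitions; {2,} 2 or more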
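The rules above are easier to remember after trying them out. The following is a minimal sketch, not code from the original post. As an assumption made so that it compiles without Boost installed, it uses the standard <regex> header, whose interface was modelled on Boost.Regex; the helper names first_match and found are hypothetical. For these simple patterns, switching the alias to boost and including <boost/regex.hpp> should behave the same way.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex here.
// To try it with Boost, include <boost/regex.hpp> and use: namespace rx = boost;
namespace rx = std;

// Hypothetical helper: returns the text of the first match of `pattern`
// in `text`, or an empty string when nothing matches.
std::string first_match(const std::string& text, const std::string& pattern) {
    rx::regex reg(pattern);
    rx::smatch m;
    return rx::regex_search(text, m, reg) ? m[0].str() : std::string();
}

// Hypothetical helper: true if `pattern` matches anywhere in `text`
// (an empty match counts as a match).
bool found(const std::string& text, const std::string& pattern) {
    return rx::regex_search(text, rx::regex(pattern));
}
```

For example, first_match("<b>x</b>", "<.*>") gives the whole string because * is greedy, while "<.*?>" stops at "<b>". Likewise found("xyz", "\\d*") is true (zero digits is acceptable) but found("xyz", "\\d+") is false.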

The Boost regular expression interface

1. Header: <boost/regex.hpp>

2. boost::regex reg("(a.*)"); // declares a regular expression

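3. boost::regex_match(); // succeeds only if the entire input matches the regular expression

bool b=boost::regex_match(
"This expression could match from A and beyond.",
reg); // b is false: the input as a whole does not match "(a.*)" because it does not start with 'a'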
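The difference between matching the whole input and finding a match somewhere inside it is the usual stumbling block here. A small sketch of the contrast follows. It is an illustration under assumptions, not code from the original post: it uses the standard <regex> header as a stand-in for Boost.Regex so that it builds without Boost, and the function names matches_whole and matches_part are hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex here.
// To try it with Boost, include <boost/regex.hpp> and use: namespace rx = boost;
namespace rx = std;

// True only if the entire text matches the pattern (regex_match).
bool matches_whole(const std::string& text, const std::string& pattern) {
    return rx::regex_match(text, rx::regex(pattern));
}

// True if the pattern matches any substring of the text (regex_search).
bool matches_part(const std::string& text, const std::string& pattern) {
    return rx::regex_search(text, rx::regex(pattern));
}
```

With the sentence used above, matches_whole(text, "(a.*)") is false while matches_part(text, "(a.*)") is true, because the search can start at the first 'a'. Prefixing the pattern with ".*" lets regex_match succeed as well.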

4. boost::regex_search()

Usage example:

#include <iostream>
#include <string>
#include "boost/regex.hpp"

int main() {
    // Do "new" and "delete" occur the same number of times?
    boost::regex reg("(new)|(delete)");
    boost::smatch m;
    std::string s =
        "Calls to new must be followed by delete. "
        "Calling simply new results in a leak!";
    int new_counter = 0;
    int delete_counter = 0;
    std::string::const_iterator it = s.begin();
    std::string::const_iterator end = s.end();

    while (boost::regex_search(it, end, m, reg)) {
        // Was it "new" or "delete"?
        m[1].matched ? ++new_counter : ++delete_counter;
        // Continue searching just past the end of this match.
        it = m[0].second;
    }

    if (new_counter != delete_counter)
        std::cout << "Leak detected!\n";
    else
        std::cout << "Seems ok...\n";
}

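Because the regular expression contains several sub-expressions, the match result has several elements: m[0] is the match for the whole expression, and m[1], m[2], ... are the matches for the individual sub-expressions, so checking which of them is matched tells us which alternative was hit. boost::smatch is a typedef of match_results. Boost predefines four such types: cmatch, wcmatch, smatch and wsmatch, for const char*, const wchar_t*, std::string and std::wstring input respectively. We can also declare match_results for other iterator types, such as match_results<LPCWSTR>. match_results is a collection in which each element holds the match for one sub-expression of the regex. How do the match_results differ for the following two expressions?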

("(new)|(delete)") VS ("(new)(delete)");

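The first is an alternation ("or"), so in any one match only one of the sub-expressions can take part, and only one of m[1] and m[2] is matched.

The second is a concatenation (in effect an "and"): "new" must be followed immediately by "delete", so after every successful match both m[1] and m[2] are matched.

Experimenting also shows that m[0] is the match for the whole expression. For the second expression above, m[0].first == m[1].first and m[0].second == m[2].second.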
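The comparison above can be checked with a few lines of code. This is only a sketch under assumptions: the standard <regex> header is used as a stand-in for Boost.Regex so that it builds without Boost, and the GroupInfo struct and the inspect function are hypothetical names introduced for illustration. The pattern passed in is expected to contain exactly two capture groups.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex here.
// To try it with Boost, include <boost/regex.hpp> and use: namespace rx = boost;
namespace rx = std;

// Hypothetical summary of one search with a two-group pattern.
struct GroupInfo {
    bool found = false;        // did the pattern match at all?
    bool g1 = false;           // did group 1 take part in the match?
    bool g2 = false;           // did group 2 take part in the match?
    bool spans_both = false;   // does m[0] run from m[1].first to m[2].second?
    std::string whole;         // text of m[0]
};

GroupInfo inspect(const std::string& text, const std::string& pattern) {
    rx::regex reg(pattern);
    rx::smatch m;
    GroupInfo info;
    if (rx::regex_search(text, m, reg)) {
        info.found = true;
        info.g1 = m[1].matched;
        info.g2 = m[2].matched;
        info.spans_both = info.g1 && info.g2 &&
                          m[0].first == m[1].first &&
                          m[0].second == m[2].second;
        info.whole = m[0].str();
    }
    return info;
}
```

Searching "a new b" with "(new)|(delete)" sets only g1, while searching "newdelete" with "(new)(delete)" sets both groups and spans_both. The second pattern finds nothing in "new delete", because the space breaks the concatenation.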

5. boost::regex_replace

Replaces text using a regular expression. Usage is much like regex_search, with one extra argument: the replacement format string.

Example:

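#include <iostream>
#include <string>
#include "boost/regex.hpp"

int main() {
    // icase ignores case; perl is the default syntax, so when you pass
    // your own flags remember to OR the default back in.
    boost::regex reg("(Colo)(u)(r)",
                     boost::regex::icase|boost::regex::perl);

    std::string s="Colour, colours, color, colourize";

    s=boost::regex_replace(s,reg,"$1$3"); // $2 is left out, i.e. the "u" is deleted
    std::cout << s; // prints: Color, colors, color, colorize
}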
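The $n references in the format string can also be reordered, not just dropped. Here is a small sketch of that idea, not taken from the original post. It assumes dates written as dd/mm/yyyy, uses the standard <regex> header as a stand-in for Boost.Regex so that it builds without Boost, and the function name to_iso_date is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex here.
// To try it with Boost, include <boost/regex.hpp> and use: namespace rx = boost;
namespace rx = std;

// Rewrites every dd/mm/yyyy date in `text` as yyyy-mm-dd by emitting
// the three capture groups in reverse order.
std::string to_iso_date(const std::string& text) {
    static const rx::regex reg("(\\d{2})/(\\d{2})/(\\d{4})");
    return rx::regex_replace(text, reg, "$3-$2-$1");
}
```

Text outside the matches is copied through unchanged, so a sentence containing several dates keeps its wording and only the dates are rewritten.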

6. Using regex_iterator to walk through all matches

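#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <string>
#include "boost/regex.hpp"

// Function object that adds up the first sub-match of every match.
class regex_callback
{
    int sum_;
public:
    regex_callback() : sum_(0) {}
    template <typename T> void operator()(const T& what) {
        sum_ += atoi(what[1].str().c_str());
    }
    int sum() const { return sum_; }
};

int main() {
    boost::regex reg("(\\d+),?");
    std::string s = "1,1,2,3,5,8,13,21";
    boost::sregex_iterator it(s.begin(), s.end(), reg);  // the constructor takes the range and the regex to use
    boost::sregex_iterator end;                          // default-constructed: marks the end
    regex_callback c;
    int sum = std::for_each(it, end, c).sum();           // sum = 54
    std::cout << sum;
}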

Without regex_iterator, we would have to call regex_search repeatedly in a while loop.
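For comparison, the manual loop mentioned above might look like the sketch below. It is an illustration under assumptions rather than code from the original post: the standard <regex> header is used as a stand-in for Boost.Regex so that it builds without Boost, and sum_numbers is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex here.
// To try it with Boost, include <boost/regex.hpp> and use: namespace rx = boost;
namespace rx = std;

// Adds up all comma-separated numbers in `s` by calling regex_search
// repeatedly, restarting each search where the previous match ended.
int sum_numbers(const std::string& s) {
    static const rx::regex reg("(\\d+),?");
    rx::smatch m;
    int sum = 0;
    std::string::const_iterator it = s.begin();
    const std::string::const_iterator end = s.end();
    while (rx::regex_search(it, end, m, reg)) {
        sum += std::stoi(m[1].str());  // group 1 holds the digits only
        it = m[0].second;              // skip past the digits and the comma
    }
    return sum;
}
```

The loop does by hand the bookkeeping that the iterator otherwise hides: keeping the match object, advancing the start position, and detecting the end.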

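regex_token_iterator works much like regex_iterator, but each element is a single sub-match rather than a full match result. With a sub-match index of -1 it yields the pieces of the string between the matches, in other words the string split on the delimiter. The regex_iterator program from item 6 can be rewritten as follows: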

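#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <string>
#include "boost/regex.hpp"

// Function object that adds up every token it is given.
class regex_callback
{
    int sum_;
public:
    regex_callback() : sum_(0) {}
    template <typename T> void operator()(const T& what) {
        sum_ += atoi(what.str().c_str());
    }
    int sum() const { return sum_; }
};

int main() {
    boost::regex reg(",");
    std::string s = "1,1,2,3,5,8,13,21";
    boost::sregex_token_iterator it(s.begin(), s.end(), reg, -1);  // -1: yield the text between matches
    boost::sregex_token_iterator end;
    regex_callback c;
    int sum = std::for_each(it, end, c).sum();
    std::cout << sum;
}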
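The last constructor argument of the token iterator selects what each element refers to: -1 gives the text between matches, while a positive number gives that capture group of every match. The sketch below illustrates both. It is not code from the original post: it uses the standard <regex> header as a stand-in for Boost.Regex so that it builds without Boost, and tokens is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Assumption: std::regex stands in for boost::regex here.
// To try it with Boost, include <boost/regex.hpp> and use: namespace rx = boost;
namespace rx = std;

// Collects the tokens produced by a regex_token_iterator.
// submatch == -1 : the pieces of `text` between matches (a split)
// submatch == n  : capture group n of every match
std::vector<std::string> tokens(const std::string& text,
                                const std::string& pattern,
                                int submatch) {
    rx::regex reg(pattern);  // must outlive the iterators below
    rx::sregex_token_iterator it(text.begin(), text.end(), reg, submatch);
    rx::sregex_token_iterator end;
    std::vector<std::string> out;
    for (; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

Calling tokens("1,1,2,3", ",", -1) splits on the comma, and tokens("a=1;b=22", "(\\w+)=(\\d+)", 2) picks out just the values "1" and "22".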

Posted on 2014-03-29 22:08 by 言止予思