Java Regular Expressions

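The Java regular expression library, java.util.regex, consists of three main classes:

1. Pattern:

A Pattern object is the compiled representation of a regular expression. The class has no public constructor; you obtain an instance from the static Pattern.compile method, which takes the regular expression as its first argument.

2. Matcher:

A Matcher object is the engine that interprets the pattern and performs match operations against an input string.

3. PatternSyntaxException:

PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern.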
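
As a rough illustration of how the three classes fit together, here is a minimal sketch. The class name RegexBasics and the helper isValidRegex are made up for this example and do not come from the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexBasics {
    // Hypothetical helper: returns true if the regex compiles.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Unchecked, so catching it is optional; getDescription() explains the error.
            System.out.println("bad pattern '" + e.getPattern() + "': " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[0-9]+");  // compiled representation
        Matcher matcher = pattern.matcher("abc123");  // engine bound to one input
        System.out.println(matcher.find());           // true: "123" is found
        System.out.println(isValidRegex("[0-9"));     // false: unclosed character class
    }
}
```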

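Typical ways to invoke pattern matching:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    private static final String REGEX = "[a-zA-Z! ]*";
    private static final String INPUT = "Hello world!";

    public static void main(String[] args) {
        // m1: String.matches
        boolean b1 = INPUT.matches(REGEX);

        // m2: static Pattern.matches
        boolean b2 = Pattern.matches(REGEX, INPUT);

        // m3: explicit Pattern and Matcher
        Pattern pattern = Pattern.compile(REGEX);
        Matcher matcher = pattern.matcher(INPUT);
        boolean b3 = matcher.matches();

        System.out.println(b1 + " " + b2 + " " + b3); // true true true
    }
}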

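m1 and m2 are convenience wrappers: internally they still compile the regex and call Matcher.matches(), exactly as m3 does.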
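
Because the two shortcut forms compile the regex on every call, it is usually cheaper to compile once when the same regex is applied to many inputs. A minimal sketch, assuming a hypothetical class PrecompiledPattern and helper countFullMatches:

```java
import java.util.regex.Pattern;

class PrecompiledPattern {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern LETTERS = Pattern.compile("[a-zA-Z! ]*");

    // Hypothetical helper: counts the inputs that match the pattern in full.
    static int countFullMatches(String[] inputs) {
        int count = 0;
        for (String input : inputs) {
            // A Matcher is cheap to create, but it is stateful and not thread-safe.
            if (LETTERS.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = {"Hello world!", "abc123", ""};
        System.out.println(countFullMatches(inputs)); // 2 ("abc123" contains digits)
    }
}
```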

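Table 1 lists commonly used regular expression syntax.

Special characters must be escaped with "\". Because "\" is also the escape character in Java string literals, you have to write "\\" in source code, and matching a literal "\" requires "\\\\". Besides "\", wrapping a character in "[]" also works in some cases, since most metacharacters lose their special meaning inside a character class. Here are some simple examples: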

    public static void main(String[] args) {
        System.out.println("(".matches("[(]")); // output: true
        System.out.println("[".matches("\\[")); // output: true
        System.out.println("\\".matches("\\\\")); // output: true

        "*".matches("*"); // throws java.util.regex.PatternSyntaxException (dangling meta character '*')

    }
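
When the text to match is an arbitrary literal, escaping each character by hand is error-prone. One alternative is Pattern.quote, which wraps the string so that the regex engine treats it literally. A minimal sketch; the class QuoteLiteral and the helper matchesLiterally are illustrative names only:

```java
import java.util.regex.Pattern;

class QuoteLiteral {
    // Hypothetical helper: true if input equals the literal, with no character
    // of the literal interpreted as a regex metacharacter.
    static boolean matchesLiterally(String input, String literal) {
        return input.matches(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        System.out.println(matchesLiterally("*", "*"));     // true, no exception
        System.out.println(matchesLiterally("a+b", "a+b")); // true
        System.out.println(matchesLiterally("aab", "a+b")); // false: "+" is literal here
    }
}
```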

Common Matcher methods

1. find method

The find() method can be used to locate every matching substring together with its position:

    public static void main(String[] args) {
        String regex = "o[a-z]";
        String input = "All for one, one for all";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        while (m.find()) {
            System.out.println("value:" + m.group() + "; start:" + m.start() + "; end:" + m.end());
        }
        /*
         output:
         value:or; start:5; end:7
         value:on; start:8; end:10
         value:on; start:13; end:15
         value:or; start:18; end:20
        */
    }

2. matches and lookingAt methods:

    public static void main(String[] args) {
        String regex = "hello";
        String input = "hello world";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        System.out.println("matches:"+m.matches()+"; lookingAt:"+m.lookingAt());
        /*
         output:
         matches:false; lookingAt:true
         */
    }

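find, matches, and lookingAt can all be used for matching. find returns true as soon as the pattern matches anywhere in the string; matches returns true only if the entire string matches; lookingAt returns true if the beginning of the string matches, without requiring the whole string to match.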
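
To see the three methods side by side, the sketch below runs each of them on a fresh Matcher so that no state carries over between calls. The class MatchModes and the helper modes are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class MatchModes {
    // Returns {find, lookingAt, matches} for the given regex and input.
    static boolean[] modes(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).find(),      // match anywhere
            p.matcher(input).lookingAt(), // match at the beginning
            p.matcher(input).matches()    // match the whole input
        };
    }

    public static void main(String[] args) {
        String input = "hello world";
        System.out.println(Arrays.toString(modes("world", input)));       // [true, false, false]
        System.out.println(Arrays.toString(modes("hello", input)));       // [true, true, false]
        System.out.println(Arrays.toString(modes("hello world", input))); // [true, true, true]
    }
}
```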

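3. replaceFirst and replaceAll methods:

replaceFirst and replaceAll replace text that matches the regular expression. replaceFirst replaces only the first match, while replaceAll replaces every match. Both reset the matcher before they start, and in the OpenJDK implementation both are built on top of appendReplacement.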

    public static void main(String[] args) {
        String regex = "o";
        String input = "hello world";
        String replace = "a";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        System.out.println(m.replaceAll(replace));
        System.out.println(m.replaceFirst(replace));
        /*
        output:
            hella warld
            hella world
        */
    }
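
One detail worth knowing is that "$" and "\" are special in the replacement string: "$1" refers to capturing group 1. If the replacement is arbitrary text, Matcher.quoteReplacement makes it literal. A minimal sketch with hypothetical names ReplacementEscaping, swapWords, and replaceWithLiteral:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementEscaping {
    // Uses group references in the replacement to swap two words.
    static String swapWords(String input) {
        return input.replaceAll("(\\w+) (\\w+)", "$2 $1");
    }

    // Treats the replacement as plain text, even if it contains '$' or '\'.
    static String replaceWithLiteral(String input, String regex, String literal) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(swapWords("hello world"));                  // world hello
        // Without quoteReplacement, "$5" would be read as a reference to group 5.
        System.out.println(replaceWithLiteral("cost: X", "X", "$5")); // cost: $5
    }
}
```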

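4. appendReplacement and appendTail methods:

appendReplacement performs the replacement step by step: it appends the text between the previous match and the current one, followed by the replacement. appendTail appends whatever input remains after the last match.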

    public static void main(String[] args) {
        String regex = "hello";
        String input = "hello hello hhh";
        String replace = "hi";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, replace);
        }
        System.out.println(sb.toString()); // output: hi hi
        m.appendTail(sb);
        System.out.println(sb.toString()); // output: hi hi hhh
    }
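
The main reason to use this pair directly, rather than replaceAll, is that the replacement can be computed per match. The sketch below upper-cases every match; the class ComputedReplacement and the helper upperCaseMatches are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComputedReplacement {
    // Replaces each match with its upper-case form.
    static String upperCaseMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement guards against '$' or '\' in the computed text.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        m.appendTail(sb); // copy the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseMatches("o[a-z]", "All for one, one for all"));
        // output: All fOR ONe, ONe fOR all
    }
}
```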

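5. reset method:

reset() discards the matcher's current match state so that matching starts again from the beginning; the overload reset(CharSequence) also replaces the input string to be processed.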
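
A minimal sketch of both overloads follows. It reuses a single Matcher across several inputs; the class ResetDemo and the helper firstMatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Returns the first match in each input (null if none), reusing one Matcher.
    static List<String> firstMatches(String regex, String... inputs) {
        Matcher m = Pattern.compile(regex).matcher("");
        List<String> result = new ArrayList<>();
        for (String input : inputs) {
            m.reset(input); // swap in the new input and clear the match state
            result.add(m.find() ? m.group() : null);
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("o[a-z]").matcher("All for one");
        m.find();                      // first match: "or"
        m.find();                      // second match: "on"
        m.reset();                     // back to the start of the same input
        m.find();
        System.out.println(m.group()); // or
        System.out.println(firstMatches("o[a-z]", "All for one", "one for all", "xyz"));
        // output: [or, on, null]
    }
}
```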

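Capturing groups:

A capturing group treats several characters as a single unit; it is created by enclosing part of the pattern in parentheses. Groups are numbered from 1 by the order of their opening parentheses, and group(0) is the entire match (not necessarily the entire input string).

    public static void main(String[] args) {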
        String regex = "(.*)(.{7,})";
        String input = "All for one, one for all";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        boolean found = m.find();
        int count = m.groupCount();
        if (found) {
            for (int i = 0; i <= count; i++) {
                System.out.println("count:" + i + "; value:" + m.group(i));
            }
        }

        /*
         output:
         count:0; value:All for one, one for all
         count:1; value:All for one, one 
         count:2; value:for all
         */
    }
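
In the example above the match happens to span the whole input, so group(0) looks identical to the input. The sketch below uses an input where the two differ. The date pattern, the class GroupDemo, and the helper extractDate are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Returns {whole match, year, month, day}, or null when no date is found.
    static String[] extractDate(String input) {
        Matcher m = DATE.matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] parts = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            parts[i] = m.group(i); // index 0 is the entire match
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extractDate("posted 2016-03-25 18:54")));
        // output: [2016-03-25, 2016, 03, 25]
    }
}
```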

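Pattern's split method splits a string around matches of the pattern. Its first argument is the string to process, and the second is the limit: when positive, the result contains at most that many pieces.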

    public static void main(String[] args) {
        String regex = " ";
        String input = "dog cat mouse chicken duck rabbit";
        String[] animals = Pattern.compile(regex).split(input, 4);
        System.out.println(Arrays.toString(animals));
        
        //output: [dog, cat, mouse, chicken duck rabbit]
        
    }

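String's split method is implemented by calling Pattern's split method, except that String takes a fast path for certain simple regex values (for example, a single character that is not a regex metacharacter) and skips compiling a Pattern.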
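
The limit argument also accepts zero and negative values, which control whether trailing empty strings are kept. A small sketch, with SplitLimits as a hypothetical class name:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimits {
    // Splits on commas with the given limit.
    static String[] split(String input, int limit) {
        return Pattern.compile(",").split(input, limit);
    }

    public static void main(String[] args) {
        String input = "a,b,,";
        System.out.println(Arrays.toString(split(input, 2)));  // [a, b,,]   at most 2 pieces
        System.out.println(Arrays.toString(split(input, 0)));  // [a, b]     trailing empties dropped
        System.out.println(Arrays.toString(split(input, -1))); // [a, b, , ] trailing empties kept
        // String.split gives the same result as Pattern.split.
        System.out.println(Arrays.equals(input.split(",", -1), split(input, -1))); // true
    }
}
```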

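Table 1. Common regular expression syntax

Character	Description
\	Escape character.
^	Matches the position at the start of the input string.
$	Matches the position at the end of the input string.
*	Matches the preceding character or subexpression zero or more times. Equivalent to {0,}.
+	Matches the preceding character or subexpression one or more times. Equivalent to {1,}.
?	Matches the preceding character or subexpression zero or one time. Equivalent to {0,1}.
{n}	Matches exactly n times.
{n,}	Matches n or more times.
{n,m}	Matches at least n and at most m times. Note: do not insert spaces between the comma and the numbers.
.	Matches any single character except a line terminator (such as "\n" or "\r"). To include line terminators, use "[\s\S]" or the Pattern.DOTALL flag.
x|y	Matches x or y.
[xyz]	Character set. Matches any one of the enclosed characters.
[^xyz]	Negated character set. Matches any character that is not enclosed.
[a-z]	Character range. Matches any character in the specified range.
[^a-z]	Negated character range. Matches any character outside the specified range.
\b	Matches a word boundary, i.e. the position between a word character and a non-word character.
\B	Matches a non-word boundary.
\d	Matches a digit. Equivalent to [0-9].
\D	Matches a non-digit. Equivalent to [^0-9].
\f	Matches a form feed. Equivalent to \x0c and \cL.
\n	Matches a line feed. Equivalent to \x0a and \cJ.
\r	Matches a carriage return. Equivalent to \x0d and \cM.
\s	Matches any whitespace character, including space, tab, and form feed. Equivalent to [ \t\n\x0B\f\r].
\S	Matches any non-whitespace character. Equivalent to [^ \t\n\x0B\f\r].
\w	Matches any word character, including underscore. Equivalent to "[A-Za-z0-9_]".
\W	Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\uhhhh	Matches the Unicode character whose code is the four-digit hexadecimal number hhhh.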
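
A few of the entries above are easier to grasp in action. The following sketch exercises ".", "\b", and "{n,}"; the class SyntaxSamples and its helper names are made up for this example.

```java
import java.util.regex.Pattern;

class SyntaxSamples {
    // "." skips line terminators unless Pattern.DOTALL is set.
    static boolean dotMatchesNewline(boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    // \b keeps the replacement from touching "cat" inside "concat".
    static String replaceWholeWord(String input) {
        return input.replaceAll("\\bcat\\b", "dog");
    }

    // {2,} requires at least two consecutive digits.
    static String maskDigitRuns(String input) {
        return input.replaceAll("\\d{2,}", "#");
    }

    public static void main(String[] args) {
        System.out.println(dotMatchesNewline(false));       // false
        System.out.println(dotMatchesNewline(true));        // true
        System.out.println(replaceWholeWord("cat concat")); // dog concat
        System.out.println(maskDigitRuns("a1b22c333"));     // a1b#c#
    }
}
```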

Table 2. Commonly used regular expressions

Rule	Regular expression
Chinese characters	[\u4e00-\u9fa5]
Postal code	^[1-9]\d{5}$
QQ number	^[1-9]\d{4,10}$
Email	^[a-zA-Z_]{1,}[0-9]{0,}@(([a-zA-Z0-9]-*){1,}\.){1,3}[a-zA-Z\-]{1,}$
Mobile phone number	^1[3458][0-9]\d{8}$
URL	^((http|https)://)?([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?$
18-digit ID card number	^(\d{6})(18|19|20)?(\d{2})([01]\d)([0123]\d)(\d{3})(\d|X|x)?$
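
To use these patterns from Java, each backslash has to be doubled inside the string literal. The sketch below wires up two of the rows as an illustration; the class CommonPatterns and its method names are hypothetical, and the "+" added to the Chinese character class (so that it accepts a whole string rather than one character) is an assumption of this example.

```java
import java.util.regex.Pattern;

class CommonPatterns {
    // Patterns taken from Table 2 and compiled once for reuse.
    // The trailing "+" on CHINESE is an addition made for this sketch.
    private static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]+");
    private static final Pattern QQ = Pattern.compile("^[1-9]\\d{4,10}$");

    static boolean isChineseText(String s) {
        return CHINESE.matcher(s).matches();
    }

    static boolean isQqNumber(String s) {
        // matches() already anchors at both ends, so ^ and $ are redundant here.
        return QQ.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isChineseText("\u6c49\u5b57")); // true (two Chinese characters)
        System.out.println(isChineseText("abc"));          // false
        System.out.println(isQqNumber("10001"));           // true
        System.out.println(isQqNumber("01234"));           // false: leading zero
    }
}
```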

posted @ 2016-03-25 18:54 maozs