Java Notes 27: Regular Expressions

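Regular expressions are a powerful tool for string processing.

Uses: string matching (character matching), string searching, and string replacement.

For example: checking whether an IP address is well formed, pulling email addresses out of a web page (as spam senders do), pulling links out of a web page, and so on.

Classes involved: java.lang.String, java.util.regex.Pattern, java.util.regex.Matcher

Example 1: Pattern is the compiled pattern; Matcher is the object produced by applying that pattern to an input string, and it holds the match result.

The typical call sequence is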

 Pattern p = Pattern.compile("a*b");
 Matcher m = p.matcher("aaaaab");
 boolean b = m.matches();
import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        System.out.println("abc".matches("..."));  
        System.out.println("a3435f".replaceAll("\\d","-"));
        Pattern p = Pattern.compile("[a-z]{3}");
        Matcher m = p.matcher("fgh");
        System.out.println(m.matches());
        System.out.println("fgha".matches("[a-z]{3}"));
    }
}

Output:

true
a----f
true
false
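
String.matches compiles its regex on every call, so when one pattern is applied to many inputs it is usually worth compiling it once and reusing the Pattern. A minimal sketch; the class name PatternReuse and the helper isThreeLower are made up for illustration:

```java
import java.util.regex.Pattern;

class PatternReuse {
    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern THREE_LOWER = Pattern.compile("[a-z]{3}");

    // Hypothetical helper: true if the whole input is exactly three lowercase letters.
    static boolean isThreeLower(String input) {
        // A Matcher is cheap to create and is not shared between calls.
        return THREE_LOWER.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[]{"fgh", "fgha", "FGH"}) {
            System.out.println(s + " -> " + isThreeLower(s));
        }
    }
}
```

For a one-off check, "fgh".matches("[a-z]{3}") as used above is simpler and perfectly adequate.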

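Example 2: quantifiers

X?      X, once or not at all
X*      X, zero or more times
X+      X, one or more times
X{n}    X, exactly n times
X{n,}   X, at least n times
X{n,m}  X, at least n but not more than m times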
import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        //?={0,1},      *={0,},     +={1,}  
        System.out.println("a".matches("."));  
        System.out.println("aa".matches("aa"));  
        System.out.println("aaaa".matches("a*"));  
        System.out.println("aaaa".matches("a+"));  
        System.out.println("aaaa".matches("a?"));  //false
        System.out.println("".matches("a*"));  
        System.out.println("".matches("a?"));  
        System.out.println("a".matches("a?"));  
        System.out.println("2455668678".matches("\\d{3,100}"));  
        System.out.println("192.168.0.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"));//false
        System.out.println("192".matches("[0-2][0-9][0-9]"));  
    }
}
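
The \d{1,3} pattern above only checks the shape of an IP address, so it would also accept 999.999.999.999. One possible way to tighten it, sketched here under the assumption that a range check in plain Java is acceptable, is to capture each part with parentheses and compare it against 255. The class Ipv4Check and method isIpv4 are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ipv4Check {
    // Shape check only: four runs of 1-3 digits separated by literal dots.
    private static final Pattern SHAPE =
        Pattern.compile("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");

    // Hypothetical helper: shape check plus a 0-255 range check on each part.
    static boolean isIpv4(String s) {
        Matcher m = SHAPE.matcher(s);
        if (!m.matches()) {
            return false;
        }
        for (int g = 1; g <= m.groupCount(); g++) {
            if (Integer.parseInt(m.group(g)) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.0.1"));
        System.out.println(isIpv4("999.168.0.1"));
        System.out.println(isIpv4("192.168.0.aaa"));
    }
}
```

This sketch still accepts leading zeros such as 001; whether that is acceptable depends on the use case.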

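Example 3: [] matches any single one of the characters inside it; [^] matches a single character other than those.

[abc]           a, b, or c (simple class)
[^abc]          Any character except a, b, or c (negation)
[a-zA-Z]        a through z or A through Z, inclusive (range)
[a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)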
import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        System.out.println("a".matches("[abc]"));  
        System.out.println("a".matches("[^abc]"));  // anything except a, b, c: false
        System.out.println("A".matches("[a-zA-Z]"));  
        System.out.println("A".matches("[a-z]|[A-Z]")); 
        System.out.println("A".matches("[a-z[A-Z]]")); 
        System.out.println("R".matches("[A-Z&&[RFG]]")); 
    }
}
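
The listing above exercises union and intersection but not subtraction. A small sketch of the subtraction form from the table, using a made-up helper isLowerConsonant that intersects [a-z] with a negated vowel class:

```java
class ClassSubtraction {
    // Lowercase letters minus the vowels: an intersection with a negated class.
    static boolean isLowerConsonant(String s) {
        return s.matches("[a-z&&[^aeiou]]");
    }

    public static void main(String[] args) {
        System.out.println(isLowerConsonant("b"));
        System.out.println(isLowerConsonant("a"));
        System.out.println(isLowerConsonant("B"));
    }
}
```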

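Example 4: predefined character classes

\d  A digit: [0-9]
\D  A non-digit: [^0-9]
\s  A whitespace character: [ \t\n\x0B\f\r]
\S  A non-whitespace character: [^\s]
\w  A word character: [a-zA-Z_0-9]
\W  A non-word character: [^\w]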

 

import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        System.out.println(" \n\r\t".matches("\\s{4}"));  
        System.out.println(" ".matches("\\S"));  // false
        System.out.println("a_8".matches("\\w{3}"));  
        System.out.println("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+")); 
        System.out.println("\\".matches("\\\\")); 
    }
}

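Note: inside a regular expression, matching one literal \ requires \\. And when the regular expression is written as a Java string literal, every \ in the regex needs two \ in the string. That is why matching a single backslash takes four backslashes in the source code above.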
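
When the text to match is meant literally, counting backslashes by hand can be avoided with Pattern.quote, which turns an arbitrary string into a pattern that matches exactly that string. A minimal sketch; LiteralSearch and containsLiteral are illustrative names only:

```java
import java.util.regex.Pattern;

class LiteralSearch {
    // True if text contains literal as-is; regex metacharacters in literal lose their meaning.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        // The Java string "\\temp\\" is the six characters \temp\
        System.out.println(containsLiteral("C:\\temp\\a.txt", "\\temp\\"));
        // A quoted dot matches only a real dot, not any character.
        System.out.println(containsLiteral("abc", "."));
        System.out.println(containsLiteral("a.c", "."));
    }
}
```

The string-literal level of escaping is still needed, since that is a Java source rule rather than a regex rule.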

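Example 5: POSIX character classes (rarely used)

\p{Lower}   A lowercase alphabetic character: [a-z]
\p{Upper}   An uppercase alphabetic character: [A-Z]
\p{ASCII}   All ASCII: [\x00-\x7F]
\p{Alpha}   An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}   A decimal digit: [0-9]
\p{Alnum}   An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}   Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}   A visible character: [\p{Alnum}\p{Punct}]
\p{Print}   A printable character: [\p{Graph}\x20]
\p{Blank}   A space or a tab: [ \t]
\p{Cntrl}   A control character: [\x00-\x1F\x7F]
\p{XDigit}  A hexadecimal digit: [0-9a-fA-F]
\p{Space}   A whitespace character: [ \t\n\x0B\f\r]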
import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        System.out.println("a".matches("\\p{Lower}"));  
    }
}
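
Two more of the POSIX classes from the table, as a short sketch. The helpers isHexColor and stripPunct are invented for this example and are not part of any library:

```java
class PosixClasses {
    // A '#' followed by exactly six hexadecimal digits, e.g. #1a2B3c.
    static boolean isHexColor(String s) {
        return s.matches("#\\p{XDigit}{6}");
    }

    // Removes every ASCII punctuation character.
    static String stripPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        System.out.println(isHexColor("#1a2B3c"));
        System.out.println(isHexColor("#12345g"));
        System.out.println(stripPunct("Hello, world!"));
    }
}
```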

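Example 6: boundary matchers

^   The beginning of a line
$   The end of a line
\b  A word boundary
\B  A non-word boundary
\A  The beginning of the input
\G  The end of the previous match
\Z  The end of the input but for the final terminator, if any
\z  The end of the input

Note: inside [] a ^ means negation; outside [] it matches the beginning of a line (the beginning of the whole input unless Pattern.MULTILINE is set).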

import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        System.out.println("hello sir".matches("^h.*")); 
        System.out.println("hello sir".matches(".*ir$"));  
        System.out.println("hello sir".matches("^h[a-z]{1,3}o\\b.*"));  
        System.out.println("hellosir".matches("^h[a-z]{1,3}o\\b.*"));  //false
        System.out.println(" \n".matches("^[\\s&&[^\\n]]*\\n$")); // a blank line
    }
}
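
Boundary matchers are easiest to see with find rather than matches. The sketch below counts whole-word occurrences with \b and shows how Pattern.MULTILINE changes what ^ matches. The class BoundaryDemo and its helpers are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times p is found in text.
    static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Counts occurrences of word that are delimited by word boundaries on both sides.
    static int countWord(String text, String word) {
        return count(Pattern.compile("\\b" + Pattern.quote(word) + "\\b"), text);
    }

    public static void main(String[] args) {
        // "hellosir" does not count: there is no boundary between 'o' and 's'.
        System.out.println(countWord("hello sir, hellosir", "hello"));
        String lines = "hello\nhi\noh";
        // Default mode: ^ matches only at the start of the input.
        System.out.println(count(Pattern.compile("^h"), lines));
        // MULTILINE: ^ also matches right after each line terminator.
        System.out.println(count(Pattern.compile("^h", Pattern.MULTILINE), lines));
    }
}
```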

 

Exercise 1: true or false?

import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        System.out.println("aaa 8888c".matches(".*\\d{4}.")); 
        System.out.println("aaa 8888c".matches(".*\\b\\d{4}."));  //true!
        System.out.println("aaa 8888c".matches(".{3}\\b\\d{4}."));  //false
        System.out.println("aaa8888c".matches(".*\\d{4}."));  
        System.out.println("aaa8888c".matches(".*\\b\\d{4}."));  //false
    }
}

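Example 7: matches, find, lookingAt

matches tries to match the entire string; find looks for the next matching substring. The two affect each other when called on the same Matcher: each call consumes the part of the input it has already examined, so a later find continues from where the previous call stopped unless reset() is called.

find does not have to match from the beginning; it succeeds as soon as it finds a matching substring.

lookingAt always matches from the beginning of the input, but unlike matches it does not require the whole input to match.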

import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        String s = "123-34545-234-00";
        Pattern p = Pattern.compile("\\d{3,5}");
        Matcher m = p.matcher(s);
        System.out.println(m.matches());//false
        m.reset();
        System.out.println(m.find()); 
        System.out.println(m.find()); 
        System.out.println(m.find()); 
        System.out.println(m.find()); //false
        System.out.println(m.lookingAt()); 
        System.out.println(m.lookingAt()); 
        System.out.println(m.lookingAt()); 
        System.out.println(m.lookingAt()); 
    }
}
import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        String s = "123-34545-234-00";
        Pattern p = Pattern.compile("\\d{3,5}");
        Matcher m = p.matcher(s);
        System.out.println(m.matches());//false
        //m.reset();
        System.out.println(m.find()); 
        System.out.println(m.find()); 
        System.out.println(m.find()); //false
        System.out.println(m.find()); //false
        System.out.println(m.lookingAt()); 
        System.out.println(m.lookingAt()); 
        System.out.println(m.lookingAt()); 
        System.out.println(m.lookingAt()); 
    }
}
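
Since earlier calls on the same Matcher can change where find starts, a defensive habit is to call reset() before a find loop. A minimal sketch of that habit; FindAfterReset and allMatches are illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAfterReset {
    // Collects every match, rewinding first so that earlier calls on m do not matter.
    static List<String> allMatches(Matcher m) {
        m.reset();
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\d{3,5}").matcher("123-34545-234-00");
        m.matches();                        // result ignored; called only to disturb the matcher state
        System.out.println(allMatches(m));  // reset() inside makes the result independent of the call above
        m.reset();
        System.out.println(m.lookingAt() + " " + m.group());
    }
}
```

Creating a fresh Matcher with p.matcher(s) for each task has the same effect and avoids shared state altogether.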

Example 8: start() and end() describe the half-open range [start, end): start is included, end is excluded.

 

import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        String s = "123--34545--234-00";
        Pattern p = Pattern.compile("\\d{3,5}");
        Matcher m = p.matcher(s);
        System.out.println(m.matches());//false
        m.reset();
        System.out.println(m.find()); 
        System.out.println(m.start()+"-"+m.end()); 
        System.out.println(m.find()); 
        System.out.println(m.start()+"-"+m.end()); 
        System.out.println(m.find()); 
        System.out.println(m.start()+"-"+m.end()); 
        System.out.println(m.find()); //false
        System.out.println(m.lookingAt()); 
        System.out.println(m.lookingAt()); 
        System.out.println(m.lookingAt()); 
        System.out.println(m.lookingAt()); 
    }
} 

Output:

false
true
0-3
true
5-10
true
12-15
false
true
true
true
true
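
The half-open range is the same convention String.substring uses, so start() and end() can be passed to it directly and yield the same text as group(). A short sketch with a made-up helper spans:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSpans {
    // Returns one "start-end:text" entry per match.
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // substring(begin, end) also includes begin and excludes end.
            String piece = input.substring(m.start(), m.end());
            out.add(m.start() + "-" + m.end() + ":" + piece);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(spans("\\d{3,5}", "123--34545--234-00"));
    }
}
```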

Example 9: replacement

(1)

import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        Pattern p = Pattern.compile("java");
        Matcher m = p.matcher("java Java JAva java IloveJAVA YOUhatejavajava end");
        while(m.find()){
            System.out.println(m.group()); 
        }
    }
}

Output:

java
java
java
java

(2)

import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        Pattern p = Pattern.compile("java",Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("java Java JAva java IloveJAVA YOUhatejavajava end");
        while(m.find()){
            System.out.println(m.group()); 
        }
    }
}

Output:

java
Java
JAva
java
JAVA
java
java

(3)

import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        Pattern p = Pattern.compile("java",Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("java Java JAva java IloveJAVA YOUhatejavajava end");
        System.out.println(m.replaceAll("JAVA")); 
    }
}

Output:

JAVA JAVA JAVA JAVA IloveJAVA YOUhateJAVAJAVA end

(4)

import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        Pattern p = Pattern.compile("java",Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("java Java JAva java IloveJAVA YOUhatejavajava end");
        StringBuffer buf = new StringBuffer();
        int i = 0 ;
        while(m.find()){
            i++;
            if(i%2 == 0){
                m.appendReplacement(buf,"java");
            }else{
                m.appendReplacement(buf,"JAVA");
            }
        }
        m.appendTail(buf);
        System.out.println(buf); 
    }
}

Output:

JAVA java JAVA java IloveJAVA YOUhatejavaJAVA end
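
One thing to watch for: in replaceAll and appendReplacement the replacement string is not plain text, because $ refers to a captured group and \ escapes the next character. If the replacement comes from elsewhere and should be inserted literally, Matcher.quoteReplacement can be applied first. A sketch under that assumption; LiteralReplace and replaceIgnoreCase are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplace {
    // Replaces every case-insensitive match of regex with replacement taken literally.
    static String replaceIgnoreCase(String input, String regex, String replacement) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        // quoteReplacement escapes '$' and '\' so they are not read as group references.
        return p.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceIgnoreCase("java Java JAVA", "java", "$J"));
    }
}
```

Without the quoting step, a replacement such as "$J" is rejected as an illegal group reference.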

Example 10: groups. A group's number is found by counting left parentheses from left to right.

import java.util.regex.*;
public class Test{
    public static void main(String args[]){
        Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
        String s = "123aa-34556bb-456cc-00";
        Matcher m = p.matcher(s);
        while(m.find()){
            System.out.println(m.group(1));
        }
        
    }
}

Output:

123
34556
456

With group() instead, the output is

123aa
34556bb
456cc
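
The left-parenthesis rule matters most with nested groups, and group numbers can also be used in a replacement string as $1, $2 and so on. A small sketch of both; the class GroupDemo and its helpers are invented for illustration:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns group(0) .. group(groupCount()) of the first match, or an empty array.
    static String[] groupsOfFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int g = 0; g <= m.groupCount(); g++) {
            out[g] = m.group(g);
        }
        return out;
    }

    // Swaps the digit part and the letter part of every match via $2$1.
    static String swap(String input) {
        return input.replaceAll("(\\d{3,5})([a-z]{2})", "$2$1");
    }

    public static void main(String[] args) {
        // Outer parenthesis opens first, so it is group 1; digits are 2, letters are 3.
        System.out.println(Arrays.toString(
            groupsOfFirstMatch("((\\d{3,5})([a-z]{2}))", "123aa-34556bb")));
        System.out.println(swap("123aa-34556bb-456cc-00"));
    }
}
```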

Exercise 1: extract email addresses from a web page

import java.util.regex.*;
import java.io.*;
public class Test{
    public static void main(String args[]){
        // try-with-resources closes the reader even if an exception is thrown
        try(BufferedReader br = new BufferedReader(new FileReader("abc.htm"))){
            String s = null ;
            while((s = br.readLine())!= null){
                parse(s);
            }
        }catch(FileNotFoundException e){
            e.printStackTrace();
        }catch(IOException e){
            e.printStackTrace();
        }
    }
    private static void parse(String s){                
        Pattern p = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
        Matcher m = p.matcher(s);
        while(m.find()){
            System.out.println(m.group());
        }
    }
}

Saving the results to a file:

import java.util.regex.*;
import java.io.*;
public class Test{
    public static void main(String args[]){
        // try-with-resources closes both streams, even if an exception is thrown
        try(BufferedReader br = new BufferedReader(new FileReader("abc.htm"));
            BufferedWriter bw = new BufferedWriter(new FileWriter("email.txt"))){
            String s = null ;
            while((s = br.readLine())!= null){
                parse(s,bw);
            }
        }catch(FileNotFoundException e){
            e.printStackTrace();
        }catch(IOException e){
            e.printStackTrace();
        }
    }
    private static void parse (String s, BufferedWriter bw) throws IOException{                
        Pattern p = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
        Matcher m = p.matcher(s);
        while(m.find()){
            bw.write(m.group());
            bw.newLine();
        }
        bw.flush();
    }
}
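
The same find loop also covers the other use mentioned at the top of these notes, pulling links out of a page. The sketch below assumes links are written as href="..." with double quotes, which real pages do not always follow; a regex is not a full HTML parser. LinkFinder and links are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkFinder {
    // Assumption: links look like href="..." in double quotes; group 1 is the URL.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the href values found in html, in order of appearance.
    static List<String> links(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(links("<a href=\"http://a.com/x\">A</a> <A HREF = \"b.html\">B</A>"));
    }
}
```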

Exercise 2: count lines of code

import java.util.regex.*;
import java.io.*;
public class CodeCounter{
    static long normalLines = 0;
    static long commentLines = 0;
    static long whiteLines = 0;
    public static void main(String args[]){
        File f = new File("E:/javacode/20140426");
        File[] codeFiles = f.listFiles();
        // listFiles() returns null if the path is not a readable directory
        if(codeFiles == null){
            System.out.println("not a readable directory: "+f);
            return;
        }
        for(File child : codeFiles){
            if(child.getName().matches(".*\\.java$"))
                parse(child);
        }
        System.out.println("normalLines: "+normalLines);
        System.out.println("commentLines: "+commentLines);
        System.out.println("whiteLines: "+whiteLines);
    }

    private static void parse(File f){
        BufferedReader br = null ;
        boolean comment = false;
        try{
            br = new BufferedReader(new FileReader(f));
            String line = "";
            while((line = br.readLine())!=null){
                line = line.trim();
                if(line.matches("^[\\s&&[^\\n]]*$")){
                    whiteLines++;
                }else if(line.startsWith("/*")&&line.endsWith("*/")){
                    commentLines++;
                }else if(line.startsWith("/*")&&!line.endsWith("*/")){
                    commentLines++;
                    comment=true;
                }else if(true == comment){
                    commentLines++;
                    if(line.endsWith("*/")){
                        comment=false;
                    }
                }else if(line.startsWith("//")){
                    commentLines++;
                }else{
                    normalLines++;
                }
            }
        }catch(FileNotFoundException e){
            e.printStackTrace();
        }catch(IOException e){
            e.printStackTrace();
        }finally{
            if(br!=null){
                try{
                    br.close();
                    br=null;
                }catch(IOException e){
                    e.printStackTrace();
                }
            }
        }
    }
}
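
The classification rules in parse can also be tried without touching the file system. The sketch below applies the same rules, in the same order, to a list of lines and returns the three counts. LineClassifier and count are invented names, and the blank-line test uses isEmpty() after trim() instead of the regex:

```java
import java.util.Arrays;
import java.util.List;

class LineClassifier {
    // Returns {normal, comment, blank} counts for the given lines.
    static long[] count(List<String> lines) {
        long normal = 0, comment = 0, blank = 0;
        boolean inBlock = false;   // true while inside an unfinished /* ... */ comment
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                blank++;           // blank lines are counted as blank even inside a block comment
            } else if (line.startsWith("/*") && line.endsWith("*/")) {
                comment++;
            } else if (line.startsWith("/*")) {
                comment++;
                inBlock = true;
            } else if (inBlock) {
                comment++;
                if (line.endsWith("*/")) {
                    inBlock = false;
                }
            } else if (line.startsWith("//")) {
                comment++;
            } else {
                normal++;
            }
        }
        return new long[]{normal, comment, blank};
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("/* a", "", " b */", "// c", "int x;", "   ", "/* d */");
        System.out.println(Arrays.toString(count(sample)));
    }
}
```

Like the original, this treats a line that mixes code and a trailing comment as a normal line.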

 

posted @ 2014-04-26 22:09  seven7seven