[Python] Regular Expressions

1. regular expression

A regular expression is a special sequence of characters, written in a specialized pattern syntax, that helps you match or find strings or sets of strings. Regular expressions are widely used in the UNIX world.

 

2. re module

The re module supports Perl-like regular expressions.

The re module raises the exception re.error if an error occurs while compiling or using a regular expression.
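As a minimal sketch of how re.error can be handled, the snippet below wraps re.compile in a small helper. The helper name try_compile and the sample patterns are made up for illustration only.

```python
import re

def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern is invalid.

    try_compile is a hypothetical helper, not part of the re module.
    """
    try:
        return re.compile(pattern)
    except re.error as exc:
        # re.error is raised for a malformed pattern
        print("invalid pattern %r: %s" % (pattern, exc))
        return None

good = try_compile(r'\d+')        # valid: one or more digits
bad = try_compile(r'(unclosed')   # invalid: missing closing parenthesis
```

Here good is a usable pattern object, while bad is None because compiling the second pattern raised re.error.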

 

To avoid confusion over backslash escapes, write regular expression patterns as raw strings: r'expression'.
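The following sketch illustrates why raw strings help, using the word-boundary escape \b as an example. The sample text is an assumption chosen for illustration.

```python
import re

text = "a word in a sentence"

# In a normal string literal, '\b' is the backspace character,
# so the regex engine never sees the word-boundary escape.
plain = re.search('\bword\b', text)

# In a raw string the backslash is preserved, so the engine sees \b.
raw = re.search(r'\bword\b', text)

print(plain)        # None
print(raw.group())  # word
```

Without the r prefix, the same pattern would have to be written with doubled backslashes, as '\\bword\\b'.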

 

3. match function

Syntax:
re.match(pattern, string, flags=0)
pattern #the regular expression to be matched
string #the string to search; the pattern must match at the beginning of the string
flags #modifiers. You can combine different flags using bitwise OR (|).

  

returns a match object on success, None on failure

 

Example:

import re

line = "Cats are smarter than dogs"

# (.*) is greedy, (.*?) is non-greedy; each pair of parentheses captures a group
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M|re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

#group() is a method of the match object
#group() or group(0) returns the entire matched text
#group(1) returns the text captured by the first parenthesized group, here "Cats"
#group(2) returns the text captured by the second parenthesized group, here "smarter"
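As a small sketch of how the group numbers line up with the parentheses in the pattern, the snippet below stores each result in a variable. It also uses groups(), another match object method, which returns all captured groups as a tuple.

```python
import re

line = "Cats are smarter than dogs"
m = re.match(r'(.*) are (.*?) .*', line, re.I)

whole = m.group(0)    # same as m.group(): the entire matched text
first = m.group(1)    # text captured by the first (...)
second = m.group(2)   # text captured by the second (...), non-greedy
both = m.groups()     # all captured groups as a tuple

print(whole)   # Cats are smarter than dogs
print(first)   # Cats
print(second)  # smarter
print(both)    # ('Cats', 'smarter')
```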

  

4. search function

#Syntax:
re.search(pattern, string, flags=0)
#pattern: the regular expression to be matched.
#string: the string to search; the pattern may match anywhere in the string.
#flags: the same as for match()

  

returns a match object on success, None on failure

 

The match object it returns has the same group() method as the one returned by match.

 

import re

line = "Cats are smarter than dogs."

# Same pattern as in the match example; search may match anywhere in the string
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M|re.I)

if searchObj:
    print("searchObj.group(): ", searchObj.group())
    print("searchObj.group(1): ", searchObj.group(1))
    print("searchObj.group(2): ", searchObj.group(2))
else:
    print("no match")

  

5. Match VS Search

match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string.

import re

line = "Cats are smarter than dogs."

# 'dogs' is not at the start of the line, so only search finds it
searchObj = re.search(r'dogs', line, re.M|re.I)
matchObj = re.match(r'dogs', line, re.M|re.I)

if searchObj:
    print("searchObj.group(): ", searchObj.group())
else:
    print("no match\n")

if matchObj:
    print("matchObj.group(): ", matchObj.group())
else:
    print("no match\n")

  

When the code is executed, it produces the following result:

searchObj.group():  dogs
no match

  

6. sub

#Syntax:
re.sub(pattern, repl, string, count=0, flags=0)
#This method replaces occurrences of the RE pattern in string with repl,
#substituting all occurrences unless count is provided.
#This method returns the modified string.

  

Example:

import re

phone = "32580-110-517 #nhmhhh"

#Delete Python-style comment
num = re.sub(r'#.*$', "", phone)
print("phone num:", num)

#Delete non-digit characters
num = re.sub(r'\D', "", phone)
print("phone num:", num)

  

When the above code is executed, it produces the following result:

 

phone num: 32580-110-517
phone num: 32580110517
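The optional limit on the number of replacements is named count in Python's re.sub. As a rough sketch of its effect, the snippet below removes hyphens from the same phone string with and without a limit.

```python
import re

phone = "32580-110-517 #nhmhhh"

# count=1 replaces only the first hyphen
first_only = re.sub(r'-', "", phone, count=1)

# count defaults to 0, which means replace every occurrence
all_hyphens = re.sub(r'-', "", phone)

print(first_only)   # 32580110-517 #nhmhhh
print(all_hyphens)  # 32580110517 #nhmhhh
```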

  

7. Regular Expression Modifiers: Option flags

You can combine multiple modifiers using bitwise OR (|).

re.I #Performs case-insensitive matching.
re.L #Interprets words according to the current locale.
re.M #Makes $ match the end of each line (not just the end of the string)
#and makes ^ match the start of each line (not just the start of the string).
re.S #Makes a period (dot) match any character, including a newline.
re.U #Interprets letters according to the Unicode character set.
re.X #Permits more readable regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats an unescaped # as a comment marker.
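The snippet below is a short sketch of how some of these flags change matching. The sample text and the phone-like pattern are assumptions chosen for illustration.

```python
import re

text = "first line\nSecond LINE"

# Without re.M, ^ matches only at the start of the whole string
no_multiline = re.search(r'^second', text, re.I)

# re.I | re.M: ignore case, and let ^ match at the start of each line
multiline = re.search(r'^second', text, re.I | re.M)

# Without re.S the dot stops at the newline; with re.S it crosses lines
no_dotall = re.match(r'.*', text).group()
dotall = re.match(r'.*', text, re.S).group()

# re.X allows whitespace and comments inside the pattern
pattern = re.compile(r'''
    (\d{3})   # first three digits (hypothetical format)
    -         # literal hyphen
    (\d{4})   # last four digits
''', re.X)
parts = pattern.match("555-1234").groups()

print(no_multiline)       # None
print(multiline.group())  # Second
print(parts)              # ('555', '1234')
```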

  

8. Regular Expression Patterns

https://www.tutorialspoint.com/python/python_reg_expressions.htm

  

 

posted @ 2017-02-06 01:19  KennyRom