Golang Basics - Regular Expressions

Backticks

A string written between backticks (`) is a raw string literal. Inside it, backslashes (\) have no special meaning and do not introduce escape sequences such as tab (\t) or newline (\n):

"\t\n" // regular string literal with 2 characters: a tab and a newline
`\t\n` // raw string literal with 4 characters: two backslashes, a 't', and an 'n'

"\\" // string with a single backslash
`\\` // string with 2 backslashes
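  • A raw string literal can span multiple lines
  • A backslash is not an escape character; it stands only for itself
  • Regular expressions contain many backslashes, so a raw string avoids having to escape every one of them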
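The points above can be checked with a short program. This is only a sketch: the names sameLiteral and multiLine are made up for illustration and are not part of any library.

```go
package main

import (
	"fmt"
	"regexp"
)

// sameLiteral (hypothetical helper) reports whether the raw and the
// interpreted spelling of the digit pattern denote the same string.
func sameLiteral() bool {
	raw := `\d+`          // raw literal: the backslash is kept as-is
	interpreted := "\\d+" // interpreted literal: the backslash must be doubled
	return raw == interpreted
}

// multiLine is a raw literal that spans two lines; the newline between
// them is part of the string value.
const multiLine = `first
second`

func main() {
	fmt.Println(sameLiteral())  // true
	fmt.Println(len(`\t\n`))    // 4
	fmt.Println(len("\t\n"))    // 2
	fmt.Println(len(multiLine)) // 12

	// The raw form keeps the pattern readable.
	re := regexp.MustCompile(`\d+`)
	fmt.Println(re.FindString("abc123")) // 123
}
```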

Common functions

import (
	"fmt"
	"regexp"
)

re, err := regexp.Compile(`(a|b)+`)
// MustCompile panics if the pattern is invalid, so use it only with patterns known to be correct
re = regexp.MustCompile(`[a-z]+\d*`)

b = re.MatchString("[a12]")       // => true
b = re.MatchString("12abc34(ef)") // => true
b = re.MatchString(" abc!")       // => true
b = re.MatchString("123 456")     // => false    

s = re.FindString("[a12]")       // => "a12"
s = re.FindString("12abc34(ef)") // => "abc34"
s = re.FindString(" abc!")       // => "abc"
s = re.FindString("123 456")     // => ""

re = regexp.MustCompile(`[a-z]+(\d*)`)
sl = re.FindStringSubmatch("[a12]")       // => []string{"a12","12"}
sl = re.FindStringSubmatch("12abc34(ef)") // => []string{"abc34","34"}
sl = re.FindStringSubmatch(" abc!")       // => []string{"abc",""}
sl = re.FindStringSubmatch("123 456")     // => nil (no match)

s = re.ReplaceAllString("[a12]", "X")       // => "[X]"
s = re.ReplaceAllString("12abc34(ef)", "X") // => "12X(X)"
s = re.ReplaceAllString(" abc!", "X")       // => " X!"
s = re.ReplaceAllString("123 456", "X")     // => "123 456"

sl = re.Split("[a12]", -1)      // => []string{"[","]"}
sl = re.Split("12abc34(ef)", 2) // => []string{"12","(ef)"}
sl = re.Split(" abc!", -1)      // => []string{" ","!"}
sl = re.Split("123 456", -1)    // => []string{"123 456"}
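  • [ marks the start of a bracket expression. To match a literal [, use \[.
  • + matches the preceding subexpression one or more times. To match a literal +, use \+.
  • \d matches a digit character. It is equivalent to [0-9].
  • * matches the preceding subexpression zero or more times. For example, zo* matches "z" as well as "zoo". * is equivalent to {0,}.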
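As a small illustration of escaping, the sketch below matches metacharacters literally, first with hand-written backslashes and then with regexp.QuoteMeta from the standard library. The names literalBracketPlus and matchesLiteral are hypothetical helpers invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// literalBracketPlus matches the literal text "[+]" by escaping each
// metacharacter with a backslash.
var literalBracketPlus = regexp.MustCompile(`\[\+\]`)

// matchesLiteral (hypothetical helper) reports whether s contains text
// verbatim. regexp.QuoteMeta escapes any metacharacters in text first.
func matchesLiteral(text, s string) bool {
	re := regexp.MustCompile(regexp.QuoteMeta(text))
	return re.MatchString(s)
}

func main() {
	fmt.Println(literalBracketPlus.MatchString("a[+]b")) // true
	fmt.Println(regexp.QuoteMeta("[a+b]"))               // \[a\+b\]
	fmt.Println(matchesLiteral("1+1", "1+1=2"))          // true
	// Unescaped, the pattern 1+1 would match "11"; quoted, it does not.
	fmt.Println(matchesLiteral("1+1", "11=2")) // false
}
```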

So the regular expression [a-z]+\d* above means: one or more letters from a to z, followed by zero or more digits.
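To see that reading of the pattern in action, here is a minimal sketch that lists every match with FindAllString, and also contrasts Compile (which returns an error) with MustCompile (which panics). The helper names findWords and compiles are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// findWords (hypothetical helper) returns every non-overlapping match
// of [a-z]+\d* in s, or nil when there is none.
func findWords(s string) []string {
	re := regexp.MustCompile(`[a-z]+\d*`)
	return re.FindAllString(s, -1)
}

// compiles (hypothetical helper) reports whether pattern is a valid
// regular expression. It uses Compile, which returns an error instead
// of panicking.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(findWords("12abc34(ef)")) // [abc34 ef]
	fmt.Println(findWords("123 456"))     // []
	fmt.Println(compiles(`(a|b)+`))       // true
	fmt.Println(compiles(`(a|b`))         // false: missing closing parenthesis
}
```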

FindStringSubmatch

Extracts the matching substrings from a string.

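  • ( and ) mark the start and end of a subexpression. The text matched by the subexpression is captured for later use. To match these characters literally, use \( and \).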

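In other words, wrap the part of the regular expression you want to extract in parentheses, and the text it matches is returned as a group.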

package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`I am ([A-Za-z]+), (\d+) years old`)

	s := re.FindStringSubmatch("I am GG, 18 years old")
	fmt.Printf("%T\n", s)
	for i := range s {
		fmt.Println(s[i])
	}
}

// =>
// []string
// I am GG, 18 years old
// GG
// 18

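The return value is a string slice: the first element is the entire match, and the remaining elements are the text matched by each group, in order.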
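Because FindStringSubmatch returns nil when nothing matches, indexing the result without a check can panic. The sketch below guards against that and converts the second group to an int. The names personRe and parsePerson are hypothetical and not part of the original example.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// personRe is the same pattern as in the example above, compiled once.
var personRe = regexp.MustCompile(`I am ([A-Za-z]+), (\d+) years old`)

// parsePerson (hypothetical helper) extracts the name and age from s.
// ok is false when s does not match or the age does not fit in an int.
func parsePerson(s string) (name string, age int, ok bool) {
	m := personRe.FindStringSubmatch(s)
	if m == nil {
		return "", 0, false // no match: m[1] would panic
	}
	n, err := strconv.Atoi(m[2]) // m[2] is the second group, (\d+)
	if err != nil {
		return "", 0, false
	}
	return m[1], n, true // m[1] is the first group, ([A-Za-z]+)
}

func main() {
	fmt.Println(parsePerson("I am GG, 18 years old")) // GG 18 true
	fmt.Println(parsePerson("no match here"))         //  0 false
}
```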

Exercise

  • Task1
    You need some idea of how many log lines in your archive do not comply with current standards. You believe that a simple test reveals whether a log line is valid. To be considered valid a line should begin with one of the following strings:
    [TRC]
    [DBG]
    [INF]
    [WRN]
    [ERR]
    [FTL]
  • Task2
    A new team has joined the organization, and you find their log files are using a strange separator for "fields". Instead of something sensible like a colon ":" they use a string such as "<--->" or "<=>" (because it's prettier). In fact, the separator can be any string whose first character is "<", whose last character is ">", and which has any combination of the characters "~", "*", "=" and "-" in between.
  • Task3
    The team needs to know about references to passwords in quoted text so that they can be examined manually.
  • Task4
    You have found that some upstream processing of the logs has been scattering the text "end-of-line" followed by a line number (without an intervening space) throughout the logs.
  • Task5
    You have noticed that some of the log lines include sentences that refer to users. These sentences always contain the string "User", followed by one or more space characters, and then a user name. You decide to tag such lines.
package parsinglogfiles

import (
	"fmt"
	"regexp"
)

// IsValidLine reports whether text begins with one of the accepted log-level tags (Task1).
func IsValidLine(text string) bool {
	re := regexp.MustCompile(`^\[(TRC|DBG|INF|WRN|ERR|FTL)\]`)
	return re.MatchString(text)
}

// SplitLogLine splits text on separators such as "<--->" or "<=>" (Task2).
func SplitLogLine(text string) []string {
	re := regexp.MustCompile(`<[~*=-]*>`)
	return re.Split(text, -1)
}

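// CountQuotedPasswords counts the lines that mention "password", in any letter case, inside double quotes (Task3).
func CountQuotedPasswords(lines []string) int {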
	re := regexp.MustCompile(`(?i)".*password.*"`)
	res := 0
	for _, l := range lines {
		if re.MatchString(l) {
			res++
		}
	}
	return res
}

// RemoveEndOfLineText removes every "end-of-line" marker and the digits that follow it (Task4).
func RemoveEndOfLineText(text string) string {
	re := regexp.MustCompile(`end-of-line\d*`)
	return re.ReplaceAllString(text, "")
}

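// TagWithUserName prefixes each line that mentions a user with "[USR] <name>", modifying lines in place (Task5).
func TagWithUserName(lines []string) []string {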
	re := regexp.MustCompile(`User\s+(\w+)`)
	for i, l := range lines {
		founds := re.FindStringSubmatch(l)
		if founds != nil {
			lines[i] = fmt.Sprintf("[USR] %s %s", founds[1], l)
		}
	}
	return lines
}
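
The solution above compiles each pattern on every call. One possible variation, sketched here under the assumption that the patterns never change, is to compile them once at package level. The lowercase wrapper names and the sample log lines are made up for this demonstration.

```go
package main

import (
	"fmt"
	"regexp"
)

// The same five patterns as the solution, compiled once at package level.
var (
	validLineRe = regexp.MustCompile(`^\[(TRC|DBG|INF|WRN|ERR|FTL)\]`)
	separatorRe = regexp.MustCompile(`<[~*=-]*>`)
	passwordRe  = regexp.MustCompile(`(?i)".*password.*"`)
	endOfLineRe = regexp.MustCompile(`end-of-line\d*`)
	userRe      = regexp.MustCompile(`User\s+(\w+)`)
)

// Hypothetical wrappers around the precompiled patterns.
func isValid(line string) bool           { return validLineRe.MatchString(line) }
func splitLine(line string) []string     { return separatorRe.Split(line, -1) }
func hasQuotedPassword(line string) bool { return passwordRe.MatchString(line) }
func removeEOL(line string) string       { return endOfLineRe.ReplaceAllString(line, "") }

// tagLine prefixes line with "[USR] <name>" when it mentions a user,
// and returns it unchanged otherwise.
func tagLine(line string) string {
	m := userRe.FindStringSubmatch(line)
	if m == nil {
		return line
	}
	return fmt.Sprintf("[USR] %s %s", m[1], line)
}

func main() {
	// All inputs below are invented sample lines.
	fmt.Println(isValid("[ERR] A good error here")) // true
	fmt.Println(isValid("Any old [ERR] text"))      // false

	fmt.Printf("%q\n", splitLine("section 1<*>section 2<~~~>section 3"))
	// ["section 1" "section 2" "section 3"]

	fmt.Println(hasQuotedPassword(`[INF] passWord`))                // false: not quoted
	fmt.Println(hasQuotedPassword(`[INF] "my passWord is secret"`)) // true

	fmt.Printf("%q\n", removeEOL("[INF] end-of-line23033 Network Failure end-of-line27"))
	// "[INF]  Network Failure "

	fmt.Println(tagLine("[WRN] User   Alice created a new project"))
	// [USR] Alice [WRN] User   Alice created a new project
}
```

Package-level MustCompile panics at program start if a pattern is invalid, which surfaces mistakes early instead of on the first call.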
posted @ 2023-02-20 17:06  roadwide