Learning grep

grep

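grep: the name comes from the ed command g/re/p (globally search for a regular expression and print the matching lines).

grep works on whole lines: when it searches data for a string, each line is the unit that is tested and selected.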

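1. Definition

1) grep is a powerful text search tool: it searches text with regular expressions and prints the lines that match. The Unix grep family consists of grep, egrep and fgrep. egrep and fgrep differ only slightly from grep. egrep is an extension of grep that supports more regular expression metacharacters. fgrep means fixed grep or fast grep: it treats every character in the pattern literally, so regular expression metacharacters lose their special meaning and stand for themselves. Linux uses the GNU version of grep, which is more capable and provides the grep, egrep and fgrep behaviors through the -G, -E and -F command-line options.

2) grep works by searching one or more files for a pattern. If the pattern contains spaces it must be quoted; every string after the pattern is treated as a file name. The results go to standard output (the screen by default), and the original files are not modified.

3) grep is useful in shell scripts because it reports the outcome of the search through its exit status: 0 if the pattern was found, 1 if it was not, and 2 if an error occurred, for example when a file to be searched does not exist. These exit statuses can be used to automate text processing.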
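The three exit statuses can be observed directly. This is a minimal sketch, assuming GNU grep and a POSIX shell; the scratch file sample.txt and the variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical scratch directory and file, used only for this illustration.
tmp=$(mktemp -d)
printf 'a1\nb2\nc3\n' > "$tmp/sample.txt"

grep 'b' "$tmp/sample.txt" > /dev/null
status_match=$?      # 0: at least one line matched

grep 'z' "$tmp/sample.txt" > /dev/null
status_nomatch=$?    # 1: no line matched

grep 'b' "$tmp/missing.txt" > /dev/null 2>&1
status_error=$?      # 2: an error occurred (the file does not exist)

echo "match=$status_match nomatch=$status_nomatch error=$status_error"
rm -rf "$tmp"
```

A script can branch on these values, for example to tell "pattern absent" apart from "file unreadable".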

2. Syntax: grep [options] pattern [filename...]

1) -A NUM, --after-context=NUM: in addition to each matching line, print the NUM lines that follow it.

Example 1:

ompmsc35 chuntaoh> cat test1

ompmsc35 chuntaoh> grep -A 1 'b' test1

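2) -a or --text

grep is meant for searching text files. If the target is a binary file and a match is found, it prints only the message "Binary file FILENAME matches" and stops.

With -a, the binary file is searched as if it were text; this is equivalent to --binary-files=text.

Example 2: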

ompmsc35 chuntaoh> grep 'redistribute' /bin/mv

Binary file /bin/mv matches

ompmsc35 chuntaoh> grep -a 'redistribute' /bin/mv

This is free software. You may redistribute copies of it under the terms of

Example 3:

(1) Pick a binary file, for example /usr/bin/[.

ompmsc35 chuntaoh> file /usr/bin/[

/usr/bin/[: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.18, dynamically linked (uses shared libs), stripped

(2) Inspect the binary file with strings

ompmsc35 chuntaoh> strings [

/lib64/ld-linux-x86-64.so.2

__gmon_start__

libc.so.6

setlocale

mbrtowc

optind

fflush_unlocked

dcgettext

error

__lxstat

iswprint

(3) Use grep -a

ompmsc35 chuntaoh> grep -a shell /usr/bin/[

3) -B NUM, --before-context=NUM

The counterpart of -A NUM: in addition to each matching line, print the NUM lines that precede it.

Example 3:

ompmsc35 chuntaoh> cat test1

ompmsc35 chuntaoh> grep -B 1 'b' test1

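4) -b, --byte-offset: print, before each matching line, its byte offset in the input, that is, the total number of bytes that precede the line.

Example 4: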

ompmsc35 chuntaoh> cat test1

ompmsc35 chuntaoh> grep -b 'a' test1 # 0 bytes precede this line

0:a1

ompmsc35 chuntaoh> grep -b '1' test1

0:a1

ompmsc35 chuntaoh> grep -b 'b' test1 # 3 bytes precede this line, because \n also counts as one byte

3:b2

ompmsc35 chuntaoh> grep -b '2' test1

3:b2

ompmsc35 chuntaoh> od -N4 test1 -t c # -N4 reads only 4 bytes and -t c prints them as characters, which shows that \n counts as one byte

0000000 a 1 \n b

0000004

ompmsc35 chuntaoh> od -N6 test1 -t c

0000000 a 1 \n b 2 \n

0000006

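5) -C NUM, -NUM, --context=NUM

In addition to each matching line, print NUM lines of context both before and after it. Older GNU grep man pages documented NUM as optional with a default of 2; current versions require NUM to be given explicitly, which is why relying on the default of two lines does not work.

Example 5: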

ompmsc35 chuntaoh> grep -C 2 '3' test1

ompmsc35 chuntaoh> grep -2 '3' test1

ompmsc35 chuntaoh> grep --context=2 '3' test1
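The transcripts for -A, -B and -C above show no output, so here is a small sketch of what each option prints. It assumes GNU grep (these options are not part of POSIX grep); the five-line file sample.txt and the variable names are invented for the illustration.

```shell
#!/bin/sh
# Hypothetical five-line input: a1, b2, c3, d4, e5.
tmp=$(mktemp -d)
printf 'a1\nb2\nc3\nd4\ne5\n' > "$tmp/sample.txt"

after=$(grep -A 1 'c' "$tmp/sample.txt")    # c3 plus the one line after it
before=$(grep -B 1 'c' "$tmp/sample.txt")   # the one line before c3, then c3
around=$(grep -C 1 'c' "$tmp/sample.txt")   # one line on each side of c3

printf '%s\n' '-A 1:' "$after" '-B 1:' "$before" '-C 1:' "$around"
rm -rf "$tmp"
```

With this input, -A 1 yields c3 and d4, -B 1 yields b2 and c3, and -C 1 yields b2, c3 and d4.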

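6) -c, --count: do not print the matching lines; print only the number of lines that match. This counts lines, not individual occurrences of the search string.

Combined with -v (--invert-match), it prints the number of lines that do not match.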

ompmsc35 chuntaoh> grep -c '3' test1

ompmsc35 chuntaoh> grep -c '3' test1 -v
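The difference between counting lines and counting occurrences is easy to miss. The sketch below assumes a made-up file in which one line contains the digit 3 twice; the file and variable names are illustrative only.

```shell
#!/bin/sh
# Hypothetical input: the second line contains "3" twice.
tmp=$(mktemp -d)
printf 'a3\nb3 3\nc4\n' > "$tmp/sample.txt"

matching=$(grep -c '3' "$tmp/sample.txt")        # lines that contain 3
nonmatching=$(grep -c -v '3' "$tmp/sample.txt")  # lines that do not contain 3

echo "matching lines: $matching, non-matching lines: $nonmatching"
rm -rf "$tmp"
```

Here -c reports 2 even though the digit occurs three times, because each line is counted once.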

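7) -d ACTION, --directories=ACTION

If an input file is a directory, use ACTION to process it.
The default ACTION is read, which means the directory is read as if it were an ordinary file.
If ACTION is skip, grep silently skips directories.
If ACTION is recurse, grep reads all files under each directory recursively; this is equivalent to -r.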

ompmsc35 chuntaoh> cp test1 ./dir/test1

ompmsc35 chuntaoh> grep -d recurse '3' dir

dir/test1:c3

ompmsc35 chuntaoh> grep -r '3' dir

dir/test1:c3

ompmsc35 chuntaoh> grep -r 'c3' /home/chuntaoh/dir

/home/chuntaoh/dir/test1:c3

ompmsc35 chuntaoh> grep -r 'c3' /home/chuntaoh/dir -d skip # the directory is skipped, so there is no output

ompmsc35 chuntaoh> grep -r 'c3' /home/chuntaoh/dir -d read # the directory is treated as an ordinary file, so there is no output

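8) -E, --extended-regexp: interpret the pattern as an extended regular expression; equivalent to egrep.

ompmsc35 chuntaoh> grep '3|4' test1 # in a basic regular expression | is an ordinary character, so it does not separate two alternatives

ompmsc35 chuntaoh> grep '3\|4' test1 # escaped as \| it does act as alternation (a GNU extension)

ompmsc35 chuntaoh> egrep '3|4' test1 # egrep accepts | directly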
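The three commands above can be compared side by side. This is a sketch assuming GNU grep, since \| in a basic regular expression is an extension; the file sample.txt and the variable names are invented here.

```shell
#!/bin/sh
# Hypothetical input file with four lines.
tmp=$(mktemp -d)
printf 'a1\nb2\nc3\nd4\n' > "$tmp/sample.txt"

bre_plain=$(grep '3|4' "$tmp/sample.txt")      # looks for the literal text "3|4"
bre_escaped=$(grep '3\|4' "$tmp/sample.txt")   # GNU extension: alternation in a basic regex
ere=$(grep -E '3|4' "$tmp/sample.txt")         # extended regex: | is alternation

printf 'plain BRE: [%s]\n' "$bre_plain"
printf 'escaped BRE:\n%s\n' "$bre_escaped"
printf 'ERE:\n%s\n' "$ere"
rm -rf "$tmp"
```

The first search finds nothing, while the other two both print c3 and d4.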

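9) -e PATTERN, --regexp=PATTERN

Specify a pattern explicitly. The option can be repeated to give several patterns, and lines matching any one of them are printed.

It is most often used to protect a pattern that begins with -.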

ompmsc35 chuntaoh> cat test1

-c3

ompmsc35 chuntaoh> grep '-c' test1 # no match is printed: -c is parsed as the count option instead of as the pattern

ompmsc35 chuntaoh> grep -e -c test1

-c3

Example 6: use -e to specify several patterns and get every line that matches either of them

ompmsc35 chuntaoh> cat test

one

two

three

four

five

six

ompmsc35 chuntaoh> grep -e t -e f test

two

three

four

five

This prints every line containing the character t or the character f. A bracket expression gives the same result:

ompmsc35 chuntaoh> grep '[tf]' test

two

three

four

five

10) -f FILE, --file=FILE

Read the patterns from FILE, one per line, and search with them. An empty file contains no patterns and therefore matches nothing.

ompmsc35 chuntaoh> cat test1

-c3

ompmsc35 chuntaoh> cat reg

ompmsc35 chuntaoh> grep -f reg test1

-c3
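The contents of the pattern file are not shown in the transcript above, so the following sketch uses its own. The file names words.txt and patterns.txt and the two patterns are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical word list and pattern file (one pattern per line).
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\nsix\n' > "$tmp/words.txt"
printf '^t\nx$\n' > "$tmp/patterns.txt"   # lines starting with t, or ending with x

# A line is printed if it matches any pattern in the file.
result=$(grep -f "$tmp/patterns.txt" "$tmp/words.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

This prints two, three and six, in the order they appear in the input.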

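11) -G, --basic-regexp: interpret the pattern as a basic regular expression. (This is the default.)

12) -H, --with-filename: prefix each matching line with the name of the file it came from, including the path if one was given.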

ompmsc35 chuntaoh> grep -H 'c' /home/chuntaoh/test1

/home/chuntaoh/test1:-c3

ompmsc35 chuntaoh> pwd

/home/chuntaoh

ompmsc35 chuntaoh> grep -H 'c' test1

test1:-c3

13) -h, --no-filename: the opposite of -H; suppress the file name prefix on output.

ompmsc35 chuntaoh> grep -h 'c' /home/chuntaoh/test1

-c3

ompmsc35 chuntaoh> grep -h 'c' test1

-c3

14) --help: print a brief help message.

15) -I: treat a binary file as if it contained no match for the pattern; equivalent to --binary-files=without-match.

ompmsc35 chuntaoh> grep -a 'redistribute' /bin/mv

This is free software. You may redistribute copies of it under the terms of

ompmsc35 chuntaoh> grep -I 'redistribute' /bin/mv

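16) --binary-files=TYPE
The default TYPE is binary.

With the default, searching a binary file has only two possible outcomes:
1. If there is a match: the message "Binary file FILENAME matches" is printed.
2. If there is no match: nothing is printed.
If TYPE is without-match, grep assumes that a binary file contains no match for the pattern; this is the same as -I.
If TYPE is text, grep processes a binary file as if it were text; this is the same as -a.
Warning: with --binary-files=text, grep may write binary garbage, which can have unwanted side effects if the output is a terminal.

Ordinary search: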

ompmsc35 chuntaoh> grep 'redistribute' /bin/mv

Binary file /bin/mv matches

ompmsc35 chuntaoh> grep --binary-files=text 'redistribute' /bin/mv

This is free software. You may redistribute copies of it under the terms of

17) -i, --ignore-case: ignore case distinctions in both the pattern and the files being searched.

ompmsc35 chuntaoh> grep -i 'C' test1

-c3

18) -L, --files-without-match: suppress normal output; instead print the name of each file that contains no match.

ompmsc35 chuntaoh> grep -L 'c' test1 test2

test2

19) -l, --files-with-matches: suppress normal output; print only the name of each file that contains a match.

ompmsc35 chuntaoh> grep -l 'c' test1 test2

test1
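Since -l and -L are complements of each other, it can help to run both against the same pair of files. A minimal sketch, assuming GNU grep; the files a.txt and b.txt and their contents are hypothetical.

```shell
#!/bin/sh
# Two hypothetical files: only a.txt contains the word "alpha".
tmp=$(mktemp -d)
printf 'alpha\n' > "$tmp/a.txt"
printf 'beta\n'  > "$tmp/b.txt"

# Run inside the directory so that only bare file names are printed.
with=$(cd "$tmp" && grep -l 'alpha' a.txt b.txt)      # files that contain a match
without=$(cd "$tmp" && grep -L 'alpha' a.txt b.txt)   # files that contain no match

echo "with match: $with"
echo "without match: $without"
rm -rf "$tmp"
```

Every file given on the command line ends up in exactly one of the two lists.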

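20) --mmap (I do not fully understand this option.)

If possible, use the mmap system call to read input instead of the default read system call.

In some situations --mmap yields better performance. However, --mmap can cause undefined behavior (including core dumps) if a file shrinks while grep is operating on it, or if an I/O error occurs.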

21) -n, --line-number: prefix each output line with its line number in the file.

ompmsc35 chuntaoh> grep -n '3' test1

3:-c3

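22) -q, --quiet, --silent: do not write anything to standard output. See also -s or --no-messages.

grep -q turns out to be very handy for if tests in shell scripts.

-q: quiet mode, nothing is written to standard output. grep exits immediately with status 0 as soon as a match is found.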

# cat a.txt

nihao

nihaooo

hello

# if grep -q hello a.txt ; then echo yes;else echo no; fi

yes

# if grep -q word a.txt; then echo yes; else echo no; fi

23) -R, -r, --recursive: read all files under each directory recursively; this is equivalent to -d recurse.

ompmsc35 chuntaoh> grep -r 'c3' /home/chuntaoh/dir

/home/chuntaoh/dir/test1:c3

-r/-R

ompmsc35 chuntaoh> grep -R 'goface' /home/chuntaoh

/home/chuntaoh/goface.txt:goface

/home/chuntaoh/goface.txt:gofaceme

ompmsc35 chuntaoh> grep -r 'goface' /home/chuntaoh

/home/chuntaoh/goface.txt:goface

/home/chuntaoh/goface.txt:gofaceme

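24) -s, --no-messages: suppress error messages about nonexistent or unreadable files.

(I do not fully understand this option.)

Portability note: unlike GNU grep, traditional grep did not conform to POSIX.2, because it lacked a -q option and its -s option behaved like GNU grep's -q option.
Shell scripts intended to be portable to traditional grep should avoid both -q and -s and redirect output to /dev/null instead. POSIX defines the functionality that UNIX and UNIX-like systems are required to provide.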

ompmsc35 chuntaoh> grep 'c3' test1 test2 test3

test1:-c3

grep: test3: No such file or directory

ompmsc35 chuntaoh> grep -s 'c3' test1 test2 test3

test1:-c3
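The portable alternative to -q and -s mentioned in the note, redirecting output to /dev/null, might look like the sketch below. The helper name has_line is hypothetical, and the file reuses the a.txt contents from the -q example.

```shell
#!/bin/sh
# Same three lines as the a.txt used in the -q example.
tmp=$(mktemp -d)
printf 'nihao\nnihaooo\nhello\n' > "$tmp/a.txt"

# has_line PATTERN FILE: succeeds if FILE has a line matching PATTERN.
# Output and error messages are discarded instead of relying on -q or -s.
has_line() {
    grep "$1" "$2" > /dev/null 2>&1
}

if has_line hello "$tmp/a.txt"; then found_hello=yes; else found_hello=no; fi
if has_line word  "$tmp/a.txt"; then found_word=yes;  else found_word=no;  fi

echo "hello: $found_hello, word: $found_word"
rm -rf "$tmp"
```

Unlike -q, this form cannot stop at the first match, so on very large inputs it may read more than necessary.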

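25) -V, --version: print the version number of grep (to standard error, according to the man page of the version used here).

This version number should be included in all bug reports about grep.

26) -v, --invert-match: print all lines except those that match the pattern.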

ompmsc35 chuntaoh> grep -v 'c3' test1

27) -w, --word-regexp: match whole words only rather than arbitrary substrings. For example, a search for "is" does not match "this".

ompmsc35 chuntaoh> cat goface.txt

goface

gofaceme

ompmsc35 chuntaoh> grep 'goface' goface.txt

goface

gofaceme

ompmsc35 chuntaoh> grep -w 'goface' goface.txt

goface

28) -x, --line-regexp: treat the pattern as a whole line; only lines that match it exactly, from start to end, are printed.

ompmsc35 chuntaoh> cat test1

bb2

-c3

ompmsc35 chuntaoh> grep -x 'b2' test1

ompmsc35 chuntaoh> grep -x 'bb2' test1

bb2

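3. grep regular expression metacharacters (basic set)

^

Anchors the start of a line, e.g. '^grep' matches all lines that begin with grep.

$

Anchors the end of a line, e.g. 'grep$' matches all lines that end with grep.

.

Matches any single character except a newline, e.g. 'gr.p' matches gr followed by any one character and then p.

*

Matches zero or more of the preceding character, e.g. ' *grep' matches lines with zero or more spaces immediately followed by grep. Used together, .* matches any sequence of characters.

[]

Matches one character from the specified set or range, e.g. '[Gg]rep' matches Grep and grep.

[^]

Matches one character that is not in the specified set or range, e.g. '[^A-FH-Z]rep' matches any character other than A-F and H-Z, followed by rep.

\(..\)

Marks the enclosed match for later reference, e.g. '\(love\)' marks love as group 1, which can then be referenced as \1.

\<

Anchors the start of a word, e.g. '\<grep' matches lines containing a word that begins with grep.

\>

Anchors the end of a word, e.g. 'grep\>' matches lines containing a word that ends with grep.

x\{m\}

Repeats the character x exactly m times, e.g. 'o\{5\}' matches lines containing 5 consecutive o characters.

x\{m,\}

Repeats the character x at least m times, e.g. 'o\{5,\}' matches lines containing at least 5 consecutive o characters.

x\{m,n\}

Repeats the character x at least m and at most n times, e.g. 'o\{5,10\}' matches lines containing 5 to 10 consecutive o characters.

\w

Matches one word character, that is [A-Za-z0-9_], e.g. 'G\w*p' matches G followed by zero or more word characters and then p.

\W

The inverse of \w: matches one non-word character, such as a period or other punctuation. \W* matches zero or more of them.

\b

Word boundary, e.g. '\bgrep\b' matches grep only as a whole word, with no word character on either side.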
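A few of the basic metacharacters can be tried against a small file. This is a sketch assuming GNU grep (\< and \> are extensions to POSIX basic regular expressions); the sample lines and variable names are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical sample lines chosen so that each pattern matches exactly one.
tmp=$(mktemp -d)
printf '%s\n' 'grep is here' 'use egrep' 'ooooo' 'western estate' 'Grep' > "$tmp/sample.txt"

starts=$(grep '^grep' "$tmp/sample.txt")           # line beginning with grep
ends=$(grep 'grep$' "$tmp/sample.txt")             # line ending with grep
five_o=$(grep 'o\{5\}' "$tmp/sample.txt")          # five consecutive o characters
backref=$(grep 'w\(es\)t.*\1' "$tmp/sample.txt")   # west, then a later es
word=$(grep '\<grep\>' "$tmp/sample.txt")          # grep as a whole word

printf '%s\n' "$starts" "$ends" "$five_o" "$backref" "$word"
rm -rf "$tmp"
```

Note that the whole-word search skips "use egrep", because the g in egrep is preceded by another word character.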

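4. Extended metacharacters for egrep and grep -E

+

Matches one or more of the preceding character, e.g. '[a-z]+able' matches one or more lowercase letters followed by able, such as loveable, enable and disable.

?

Matches zero or one of the preceding character, e.g. 'gr?p' matches g followed by an optional r and then p, that is gp or grp.

a|b|c

Matches a or b or c, e.g. 'grep|sed' matches grep or sed.

()

Grouping, e.g. 'love(able|rs)' matches loveable or lovers, and '(ov)+' matches one or more repetitions of ov.

x{m}, x{m,}, x{m,n}

Same as x\{m\}, x\{m,\}, x\{m,n\} in the basic set.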
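The extended operators can be checked the same way with grep -E. The word list below is an assumption chosen so that each operator selects a different subset; the patterns are anchored with ^ and $ to make the results unambiguous.

```shell
#!/bin/sh
# Hypothetical word list for trying the extended operators.
tmp=$(mktemp -d)
printf '%s\n' loveable lovers love gp grp grrp enable > "$tmp/words.txt"

plus=$(grep -E '^[a-z]+able$' "$tmp/words.txt")      # one or more letters, then able
optional=$(grep -E '^gr?p$' "$tmp/words.txt")        # g, optional r, p
group=$(grep -E '^love(able|rs)$' "$tmp/words.txt")  # love followed by able or rs
interval=$(grep -E '^gr{2}p$' "$tmp/words.txt")      # g, exactly two r, p

printf '%s\n' "+: $plus" "?: $optional" "(): $group" "{}: $interval"
rm -rf "$tmp"
```

With grep -E none of these operators needs a backslash, in contrast to \{m\} and \(..\) in the basic set.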

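5. POSIX character classes

To keep behavior consistent across the character encodings of different countries, POSIX (the Portable Operating System Interface) added special character classes; for example, [:alnum:] is another way of writing A-Za-z0-9. They must be placed inside a bracket expression to form a regular expression, as in [A-Za-z0-9] or [[:alnum:]]. On Linux, every grep variant except fgrep supports the POSIX character classes.

fgrep treats every character in the pattern literally, so regular expression metacharacters stand for themselves and have no special meaning.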

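Class          Equivalent expression   Meaning
[[:upper:]]    [A-Z]                   uppercase letters
[[:lower:]]    [a-z]                   lowercase letters
[[:alpha:]]    [a-zA-Z]                letters
[[:alnum:]]    [0-9a-zA-Z]             letters and digits
[[:digit:]]    [0-9]                   digits
[[:space:]]    space, tab, etc.        all whitespace characters (newline, space, tab)
[[:graph:]]    -                       visible characters (excludes space and control characters)
[[:cntrl:]]    -                       control characters
[[:print:]]    -                       printable characters (visible characters plus space)
[[:punct:]]    -                       punctuation characters
[[:xdigit:]]   [0-9a-fA-F]             hexadecimal digits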
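A short sketch of the classes in use, assuming GNU grep in a C or UTF-8 locale. The sample file and variable names are invented, and the patterns use -E with + so that each must cover a whole non-empty line.

```shell
#!/bin/sh
# Hypothetical input: letters, digits, a mix, and one line of three spaces.
tmp=$(mktemp -d)
printf '%s\n' 'ABC' 'abc' '123' 'abc123' '   ' > "$tmp/sample.txt"

upper=$(grep -E '^[[:upper:]]+$' "$tmp/sample.txt")       # only uppercase letters
digits=$(grep -E '^[[:digit:]]+$' "$tmp/sample.txt")      # only digits
alnum=$(grep -E '^[[:alnum:]]+$' "$tmp/sample.txt")       # only letters and digits
blank=$(grep -c -E '^[[:space:]]+$' "$tmp/sample.txt")    # count of whitespace-only lines

printf '%s\n' "upper: $upper" "digits: $digits" "alnum:" "$alnum" "whitespace-only lines: $blank"
rm -rf "$tmp"
```

The double brackets matter: the outer pair forms the bracket expression and the inner [:name:] selects the class.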

6. Special examples

Example 7: match IP addresses in which every field has exactly 3 digits

grep "[0-9]\{3\}\.[0-9]\{3\}\.[0-9]\{3\}\.[0-9]\{3\}" file

Example 8: match any IP address

grep "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" file
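In an IP address pattern the dots need to be escaped, because an unescaped . matches any character. The sketch below compares the two forms on a made-up file; the file name and its contents are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical input: two lines with real dotted addresses, two without.
tmp=$(mktemp -d)
printf '%s\n' '192.168.1.10' '1234567' '10.0.0.1 gateway' '1x2x3x4' > "$tmp/hosts.txt"

# Unescaped dots: "." matches any character, so all four lines match.
unescaped=$(grep -c '[0-9]\{1,3\}.[0-9]\{1,3\}.[0-9]\{1,3\}.[0-9]\{1,3\}' "$tmp/hosts.txt")

# Escaped dots: only lines with literal dots between the number groups match.
escaped=$(grep -c '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' "$tmp/hosts.txt")

echo "unescaped: $unescaped lines, escaped: $escaped lines"
rm -rf "$tmp"
```

Even the escaped pattern only checks the shape of the address: it would still accept out-of-range fields such as 999.999.999.999.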

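Example 9: show all lines that contain a run of at least 5 consecutive lowercase letters.

grep '[a-z]\{5\}' aa

Example 10:

grep 'w\(es\)t.*\1' aa

If west is matched, es is stored in memory and marked as group 1; grep then looks for any number of characters (.*) followed by another es (\1), and prints the line if it finds one. With egrep or grep -E the parentheses need no backslash escaping, so the pattern can be written directly as 'w(es)t.*\1'.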

ompmsc35 chuntaoh> cat test2

westljfldes

ompmsc35 chuntaoh> grep 'w\(es\).*\1' test2

westljfldes

ompmsc35 chuntaoh> egrep 'w(es).*\1' test2

westljfldes

Example 11:

ompmsc35 chuntaoh> cat file1.txt

dfdf

This is the first line of file1.txt.

fdf

ompmsc35 chuntaoh> cat file2.txt

fdfldjf

This is the First line of file2.txt.

fdfdlfjl

ompmsc35 chuntaoh> grep '[Ff]irst' *.txt # this finds both lines, one in each file

file1.txt:This is the first line of file1.txt.

file2.txt:This is the First line of file2.txt.

ompmsc35 chuntaoh> grep [Ff]irst *.txt # this also finds both lines, one in each file

file1.txt:This is the first line of file1.txt.

file2.txt:This is the First line of file2.txt.

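ompmsc35 chuntaoh> touch first # without quotes, if a file named first exists in the current directory, the shell expands [Ff]irst to first, so only the line containing first is found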

ompmsc35 chuntaoh> grep [Ff]irst *.txt

file1.txt:This is the first line of file1.txt.

ompmsc35 chuntaoh> rm first

ompmsc35 chuntaoh> grep [Ff]irst *.txt

file1.txt:This is the first line of file1.txt.

file2.txt:This is the First line of file2.txt.

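ompmsc35 chuntaoh> touch First # without quotes, if a file named First exists in the current directory, the shell expands [Ff]irst to First, so only the line containing First is found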

ompmsc35 chuntaoh> grep [Ff]irst *.txt

file2.txt:This is the First line of file2.txt.

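Example 12: search all files for the phrase "sort it"
grep "sort it" *

Example 13: find empty lines, and lines that begin or end with a given pattern.
Combine ^ and $ to find empty lines. Use the -n option to show the actual line numbers.
ompmsc35 chuntaoh> grep -n '^$' test1 # lines 3 and 4 are empty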
3:
4:
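The same '^$' pattern can also be inverted to drop empty lines. A minimal sketch with a made-up file whose third and fourth lines are empty; the names are illustrative only.

```shell
#!/bin/sh
# Hypothetical input: lines 3 and 4 are empty.
tmp=$(mktemp -d)
printf 'a1\nb2\n\n\nc3\n' > "$tmp/sample.txt"

blank_lines=$(grep -n '^$' "$tmp/sample.txt")     # line numbers of the empty lines
without_blank=$(grep -v '^$' "$tmp/sample.txt")   # the file with empty lines removed

printf 'empty lines:\n%s\n' "$blank_lines"
printf 'remaining lines:\n%s\n' "$without_blank"
rm -rf "$tmp"
```

A line containing only spaces is not empty in this sense; '^[[:space:]]*$' would match those as well.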

http://blog.chinaunix.net/uid-7294334-id-168180.html

posted on 2014-05-29 14:44 hank1982
