Linux: Finding the paragraphs that contain a keyword (awk implementation)

2017-09-22 23:04

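The "-p" option of the grep command on AIX finds the paragraphs that contain a keyword. Here a paragraph means a record delimited by blank lines, with at least one blank line between consecutive paragraphs. For example, the following text contains two paragraphs: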

$ cat test.txt
Hello,world

This is a file with
two paragraph.

The following command finds every paragraph in db2diag.log that records a database deactivation ("-i" ignores case, "-p" selects whole paragraphs):
$ grep -ip 'DEACTIVATED' db2diag.log
2017-09-17-12.03.33.048373+480 E1594733A513         LEVEL: Event
PID     : 19726438             TID : 3343           PROC : db2sysc 0
INSTANCE: e105q5a              NODE : 000           DB   : SAMPLE
APPHDL  : 0-81                 APPID: *LOCAL.e105q5a.170917035458
AUTHID  : E105Q5A              HOSTNAME: db2b
EDUID   : 3343                 EDUNAME: db2agent (idle) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FreeResourcesOnDBShutdown, probe:15127
STOP    : DATABASE: SAMPLE   : DEACTIVATED: NO

2017-09-17-12.03.58.149245+480 E1601224A513         LEVEL: Event
PID     : 19726438             TID : 3343           PROC : db2sysc 0
INSTANCE: e105q5a              NODE : 000           DB   : SAMPLE
APPHDL  : 0-109                APPID: *LOCAL.e105q5a.170917040333
AUTHID  : E105Q5A              HOSTNAME: db2b
EDUID   : 3343                 EDUNAME: db2agent (idle) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FreeResourcesOnDBShutdown, probe:15127
STOP    : DATABASE: SAMPLE   : DEACTIVATED: NO

2017-09-17-12.16.49.507211+480 E1609705A513         LEVEL: Event
PID     : 19726438             TID : 3343           PROC : db2sysc 0
INSTANCE: e105q5a              NODE : 000           DB   : SAMPLE
APPHDL  : 0-125                APPID: *LOCAL.e105q5a.170917040401
AUTHID  : E105Q5A              HOSTNAME: db2b
EDUID   : 3343                 EDUNAME: db2agent (idle) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FreeResourcesOnDBShutdown, probe:15127
STOP    : DATABASE: SAMPLE   : DEACTIVATED: NO


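GNU grep on Linux has no such paragraph option (its "-P" option, in upper case, selects Perl-compatible regular expressions and is unrelated), and no other grep option provides this feature. A different approach is to treat each paragraph as a single "line". That turns out to be possible, and it is where awk shows its strength. awk has two relevant built-in variables, which the manual describes as follows: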

 ORS terminates each record on output, initially = "\n".

 RS input record separator, initially = "\n".

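RS is the input record separator. Its default is the newline character "\n", so each line is one record. If each paragraph is to be one record, the separator between records is two or more consecutive newlines, so setting RS to "\n\n+" is all that is needed. This relies on RS being interpreted as a regular expression, which mawk and gawk do; POSIX leaves the behavior unspecified when RS has more than one character. The mawk manual provides an excellent template: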
12. Multi-line records
Since mawk interprets RS as a regular expression, multi-line records are easy. Setting RS = "\n\n+", makes one or more blank lines separate records. If FS = " " (the default), then single newlines, by the rules for <SPACE> above, become space and single newlines are field separators.

For example, if a file is "a b\nc\n\n", RS = "\n\n+" and FS = " ", then there is one record "a b\nc" with three fields "a", "b" and "c". Changing FS = "\n", gives two fields "a b" and "c"; changing FS = "", gives one field identical to the record.

If you want lines with spaces or tabs to be considered blank, set RS = "\n([ \t]*\n)+". For compatibility with other awks, setting RS = "" has the same effect as if blank lines are stripped from the front and back of files and then records are determined as if RS = "\n\n+". Posix requires that "\n" always separates records when RS = "" regardless of the value of FS. mawk does not support this convention, because defining "\n" as <SPACE> makes it unnecessary.

Most of the time when you change RS for multi-line records, you will also want to change ORS to "\n\n" so the record spacing is preserved on output.

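The field-splitting example in the manual excerpt can be checked directly. The following is a minimal sketch and not part of the original article: count_fields is a hypothetical helper name, and it uses RS = "" (which the excerpt describes as equivalent for this input) so that it does not depend on regular-expression support in RS.

```shell
# count_fields FS_VALUE: hypothetical helper that reads stdin in paragraph
# mode and prints the number of fields (NF) of each record.
count_fields() {
    # awk -v processes escape sequences, so '\n' arrives in awk as a newline.
    awk -v fs="$1" 'BEGIN { RS = ""; FS = fs } { print NF }'
}

# The sample input from the manual: a single record, "a b\nc".
printf 'a b\nc\n\n' | count_fields ' '     # default FS: fields "a", "b", "c"
printf 'a b\nc\n\n' | count_fields '\n'    # FS = "\n": fields "a b", "c"
```

The first call should print 3 and the second 2, in line with the manual's description.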

So on Linux the command is:

$ awk 'BEGIN {RS = "\n\n+";ORS = "\n\n"} /DEACTIVATED/ {print $0}' db2diag.log

To invert the selection, that is, to print the paragraphs that do not contain the keyword, put "!" in front of the pattern:

$ awk 'BEGIN {RS = "\n\n+";ORS = "\n\n"} !/DEACTIVATED/ {print $0}' db2diag.log
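To try both commands without a db2diag.log at hand, here is a small sketch that wraps them in two shell functions and runs them on a two-paragraph sample like the test.txt shown at the beginning. The function names match_paragraphs and skip_paragraphs are made up for this illustration, and the keyword is passed in as an awk dynamic regex. Like the commands above, the sketch assumes an awk that treats RS as a regular expression (mawk or gawk).

```shell
# match_paragraphs KEYWORD [FILE...]: print the paragraphs that match KEYWORD.
# With no FILE argument, awk reads standard input.
match_paragraphs() {
    keyword=$1; shift
    awk -v kw="$keyword" 'BEGIN {RS = "\n\n+"; ORS = "\n\n"} $0 ~ kw {print $0}' "$@"
}

# skip_paragraphs KEYWORD [FILE...]: print the paragraphs that do not match.
skip_paragraphs() {
    keyword=$1; shift
    awk -v kw="$keyword" 'BEGIN {RS = "\n\n+"; ORS = "\n\n"} $0 !~ kw {print $0}' "$@"
}

# Two paragraphs separated by one blank line.
sample='Hello,world

This is a file with
two paragraph.'

printf '%s\n' "$sample" | match_paragraphs 'world'
printf '%s\n' "$sample" | skip_paragraphs 'world'
```

The first call should print only the "Hello,world" paragraph, and the second only the two-line paragraph.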

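Alternatively, set RS to the empty string. As the manual excerpt above notes, this behaves like RS = "\n\n+" after blank lines at the start and end of the file have been stripped: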

$ awk 'BEGIN {RS = "";ORS = "\n\n"} /DEACTIVATED/ {print $0}' db2diag.log
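The AIX command used "-i" to ignore case, whereas the awk patterns above are case-sensitive. One portable way to get a similar effect, offered here as a sketch rather than as the author's method, is to lower-case both the record and the keyword before comparing them. match_paragraphs_i is a hypothetical name, the sample text is invented, and the keyword is assumed to be a plain word without regex escapes (lower-casing would alter escapes such as "\S").

```shell
# match_paragraphs_i KEYWORD [FILE...]: case-insensitive paragraph search.
match_paragraphs_i() {
    keyword=$1; shift
    awk -v kw="$keyword" '
        BEGIN { RS = ""; ORS = "\n\n" }
        # Compare lower-cased copies; the record itself is printed unchanged.
        tolower($0) ~ tolower(kw) { print $0 }
    ' "$@"
}

# Leading blank lines are ignored because RS is the empty string.
printf '\n\nfirst paragraph\nDEACTIVATED here\n\nsecond paragraph\n' |
    match_paragraphs_i 'deactivated'
```

Only the first paragraph should be printed, with its original capitalization intact.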

Records can be delimited in other ways too: just set RS to the appropriate value.
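As an illustration of a different separator, the sketch below splits records on a line consisting of three dashes instead of on blank lines. The function name match_records, the "---" delimiter and the sample data are all invented for this example, and a multi-character RS again assumes mawk or gawk.

```shell
# match_records SEPARATOR KEYWORD [FILE...]: print the records, delimited by
# SEPARATOR (interpreted as a regular expression), that match KEYWORD.
match_records() {
    separator=$1; keyword=$2; shift 2
    awk -v sep="$separator" -v kw="$keyword" '
        BEGIN { RS = sep; ORS = "\n" }
        $0 ~ kw { print $0 }
    ' "$@"
}

# Three records separated by "---" lines; only the second one matches.
printf 'id=1\nstate=up\n---\nid=2\nstate=down\n---\nid=3\nstate=up\n' |
    match_records '\n---\n' 'state=down'
```

Because the separator is passed with -v, awk converts the "\n" escapes into real newlines before assigning RS, so the dashes only count as a separator when they occupy a whole line.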