Linux: Finding the paragraphs in a text file that contain a keyword (awk implementation)

2017-09-22 23:04

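The "-p" option of the grep command on AIX finds the paragraphs that contain a given keyword. Here a paragraph is a block of lines separated from its neighbours by at least one blank line. For example, the following text contains two paragraphs: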

$ cat test.txt
Hello,world

This is a file with
two paragraphs.

The following command finds every paragraph in db2diag.log that records a database deactivation:

$ grep -ip 'DEACTIVATED' db2diag.log
2017-09-17-12.03.33.048373+480 E1594733A513         LEVEL: Event
PID     : 19726438             TID : 3343           PROC : db2sysc 0
INSTANCE: e105q5a              NODE : 000           DB   : SAMPLE
APPHDL  : 0-81                 APPID: *LOCAL.e105q5a.170917035458
AUTHID  : E105Q5A              HOSTNAME: db2b
EDUID   : 3343                 EDUNAME: db2agent (idle) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FreeResourcesOnDBShutdown, probe:15127
STOP    : DATABASE: SAMPLE   : DEACTIVATED: NO

2017-09-17-12.03.58.149245+480 E1601224A513         LEVEL: Event
PID     : 19726438             TID : 3343           PROC : db2sysc 0
INSTANCE: e105q5a              NODE : 000           DB   : SAMPLE
APPHDL  : 0-109                APPID: *LOCAL.e105q5a.170917040333
AUTHID  : E105Q5A              HOSTNAME: db2b
EDUID   : 3343                 EDUNAME: db2agent (idle) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FreeResourcesOnDBShutdown, probe:15127
STOP    : DATABASE: SAMPLE   : DEACTIVATED: NO

2017-09-17-12.16.49.507211+480 E1609705A513         LEVEL: Event
PID     : 19726438             TID : 3343           PROC : db2sysc 0
INSTANCE: e105q5a              NODE : 000           DB   : SAMPLE
APPHDL  : 0-125                APPID: *LOCAL.e105q5a.170917040401
AUTHID  : E105Q5A              HOSTNAME: db2b
EDUID   : 3343                 EDUNAME: db2agent (idle) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FreeResourcesOnDBShutdown, probe:15127
STOP    : DATABASE: SAMPLE   : DEACTIVATED: NO

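GNU grep on Linux has no such paragraph mode: lowercase -p is not one of its options, and the similarly named -P selects Perl-compatible regular expressions, which is unrelated. No other grep option provides this feature either. A different approach is to treat each paragraph as a single "line", that is, as one record. This is where awk is useful. Two of its built-in variables control how records are delimited: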

ORS terminates each record on output, initially = "\n".

RS input record separator, initially = "\n".

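RS is the input record separator, which defaults to the newline character "\n". If each paragraph is treated as one record, the separator between records is a run of two or more newlines, so setting RS to "\n\n+" is enough. This relies on awk interpreting a multi-character RS as a regular expression, which mawk and gawk do, although POSIX leaves that behaviour unspecified. The mawk manual describes the technique well: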

12. Multi-line records
Since mawk interprets RS as a regular expression, multi-line records are easy. Setting RS = "\n\n+", makes one or more blank lines separate records. If FS = " " (the default), then single newlines, by the rules for <SPACE> above, become space and single newlines are field separators.

For example, if a file is "a b\nc\n\n", RS = "\n\n+" and FS = " ", then there is one record "a b\nc" with three fields "a", "b" and "c". Changing FS = "\n", gives two fields "a b" and "c"; changing FS = "", gives one field identical to the record.

If you want lines with spaces or tabs to be considered blank, set RS = "\n([ \t]*\n)+". For compatibility with other awks, setting RS = "" has the same effect as if blank lines are stripped from the front and back of files and then records are determined as if RS = "\n\n+". Posix requires that "\n" always separates records when RS = "" regardless of the value of FS. mawk does not support this convention, because defining "\n" as <SPACE> makes it unnecessary.

Most of the time when you change RS for multi-line records, you will also want to change ORS to "\n\n" so the record spacing is preserved on output.

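The manual's example can be checked directly. The following is a minimal sketch, assuming an awk that treats a multi-character RS as a regular expression (mawk and gawk do). The function name count_fields is made up for this illustration and is not part of awk.

```shell
#!/bin/sh
# count_fields is a hypothetical helper: it feeds the manual's sample input
# "a b\nc\n\n" to awk and prints the number of fields in each record.
# $1 is the value to use for FS (awk expands escapes such as \n in -v values).
count_fields() {
    printf 'a b\nc\n\n' |
        awk -v FS="$1" 'BEGIN { RS = "\n\n+" } { print NF }'
}

count_fields ' '     # default FS, newlines also split fields: prints 3
count_fields '\n'    # FS = "\n", one field per line: prints 2
```

Each call prints a single number because the sample input forms exactly one record.
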
So on Linux the command is:

$ awk 'BEGIN {RS = "\n\n+";ORS = "\n\n"} /DEACTIVATED/ {print $0}' db2diag.log
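
Note that the AIX command shown earlier uses -i, whereas the awk pattern /DEACTIVATED/ is case-sensitive. As a sketch of one way to close that gap, the function below lowercases both the keyword and each paragraph before comparing them. The name para_grep is hypothetical (it is not a standard command), and the keyword is matched as a plain substring rather than as a regular expression.

```shell
#!/bin/sh
# para_grep is a hypothetical helper, not a standard command:
#   para_grep KEYWORD FILE
# Prints every blank-line-separated paragraph of FILE that contains KEYWORD,
# ignoring case. KEYWORD is a plain substring; note that awk expands
# backslash escapes in values passed with -v.
# Usage: para_grep deactivated db2diag.log
para_grep() {
    awk -v kw="$1" '
        BEGIN { RS = "\n\n+"; ORS = "\n\n"; kw = tolower(kw) }
        index(tolower($0), kw) > 0 { print }
    ' "$2"
}
```

tolower and index are both part of POSIX awk, so only the regular-expression RS depends on the awk implementation.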

To invert the selection, that is, to print the paragraphs that do not contain the keyword, put ! in front of the pattern:

$ awk 'BEGIN {RS = "\n\n+";ORS = "\n\n"} !/DEACTIVATED/ {print $0}' db2diag.log

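Alternatively, RS can be set to the empty string. For this purpose the result is the same, and because this form is the paragraph mode defined by POSIX, it also works in awk implementations that do not treat RS as a regular expression: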

$ awk 'BEGIN {RS = "";ORS = "\n\n"} /DEACTIVATED/ {print $0}' db2diag.log
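
One visible difference between the two settings is how blank lines at the start of the input are handled. According to the manual excerpt quoted earlier, RS = "" behaves as if they were stripped, while with RS = "\n\n+" they are an ordinary separator and, in gawk for instance, produce an empty first record. The sketch below makes this observable; count_records is a made-up name, and the regular-expression form again assumes gawk or mawk.

```shell
#!/bin/sh
# count_records is a hypothetical helper: it prints how many records awk
# sees in a fixed sample that begins with two blank lines.
# $1 is the value to assign to RS.
count_records() {
    printf '\n\nHello,world\n\nThis is a file with\ntwo paragraphs.\n' |
        awk -v RS="$1" 'END { print NR }'
}

count_records ''        # paragraph mode ignores the leading blank lines: prints 2
count_records '\n\n+'   # regex RS: the leading blank lines count as a separator
```

If the input may start with blank lines, RS = "" avoids having to filter out an empty record.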

Text can also be split into records in other ways; only the value of RS needs to change.
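
As an illustration of a different separator, suppose (hypothetically) that the entries in a file are delimited by a line consisting only of "----" rather than by a blank line. The following is a sketch under that assumption; the file layout and the function name dash_grep are invented for the example, and a multi-character RS again requires an awk such as gawk or mawk.

```shell
#!/bin/sh
# dash_grep is a hypothetical helper:
#   dash_grep REGEX FILE
# Treats each block of FILE delimited by a line containing only "----"
# as one record and prints the records that match REGEX.
dash_grep() {
    awk -v re="$1" '
        BEGIN { RS = "\n----\n"; ORS = "\n----\n" }
        { sub(/\n$/, "") }      # drop the newline left on the final record
        $0 ~ re { print }
    ' "$2"
}
```

The separator is written back through ORS so that the output keeps the same layout as the input.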
