您的位置:首页 > 数据库 > Oracle

ORACLE 11g 由新特性引发lsnrctl hang住卡死迷雾的详细剖析历程

2016-10-17 21:20 453 查看
 
 

1、问题描述

同事说卡住了,连接oracle数据库很慢,需要很久,连上了做一个简单的查询也非常慢,感觉像是hang主了一般。
 
 

2、分析oracle服务器负载

一开始登录进去,查看oracle服务器,负载很低,服务器毫无压力,感觉不是服务器卡的问题了:
[root@pldb236 data]# w
 19:59:47 up 122 days,  4:32,  4 users,  load average: 0.65, 0.71, 0.59
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     tty1     :0               17Jun16 122days  5:00   5:00  /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-YoRXlJ/database -nolisten tcp vt1
root     pts/0    192.168.120.154  12:49    4:42m  0.04s  0.04s -bash
root     pts/1    192.168.120.154  12:49    0.00s  0.13s  0.05s w
root     pts/2    192.168.120.154  13:41    6:09m  0.06s  0.02s -bash
[root@pldb236 data]#
 
 
 
 

3、分析oracle服务器lsnrctl监听

既然不是oracle服务器负载的问题,那么就再看看lsnrctl监听状态,比较慢,而且看到最后有异常信息:
[oracle@pldb236 admin]$ lsnrctl status
 
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 17-OCT-2016 13:27:51
 
Copyright (c) 1991, 2009, Oracle.  All rights reserved.
 
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))
 
# 然后卡住了,卡住了,hang主了,没有任何动弹,需要很久很久才能释放出来。
 
 
# 再看下tnsping下服务,也比较慢,发现有错误问题出现,如下所示
[oracle@pldb236 admin]$ tnsping PD236
 
TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 17-OCT-2016 13:21:24
 
Copyright (c) 1997, 2009, Oracle.  All rights reserved.
 
Used parameter files:
 
 
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.10.104)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = powerdes)))
TNS-12547: TNS:lost contact
[oracle@pldb236 admin]$
 
 
 
 
 
 
 
 

4、分析监听lsnrctl的trace跟踪文件

进去lsnrctl管理界面看trace文件目录:
(1)进去lsnrctl管理命令行
[oracle@pldb236 oradata]$ lsnrctl
 
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 17-OCT-2016 20:36:34
 
Copyright (c) 1991, 2009, Oracle.  All rights reserved.
 
Welcome to LSNRCTL, type "help" for information.
 
LSNRCTL>
(2)查看命令
LSNRCTL> show
The following operations are available after show
An asterisk (*) denotes a modifier or extended command:
 
rawmode                            displaymode                       
rules                              trc_file                          
trc_directory                      trc_level                         
log_file                           log_directory                     
log_status                         current_listener                  
inbound_connect_timeout            startup_waittime                  
snmp_visible                       save_config_on_stop               
dynamic_registration               enable_global_dynamic_endpoint    
oracle_home                        pid                               
 
(3)查看跟踪文件已经文件目录
LSNRCTL> show trc_file
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))
LISTENER parameter "trc_file" set to ora_2994_139957275965184.trc
The command completed successfully
LSNRCTL>
LSNRCTL> show trc_directory
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))
LISTENER parameter "trc_directory" set to /oracle/app/oracle/diag/tnslsnr/pldb236/listener/trace
The command completed successfully
LSNRCTL>
 
 
 
然后去看后台的日志,看到显示正常,没有发现路径错误之类的问题:
[root@pldb236 ~]# tail -f /oracle/app/oracle/diag/tnslsnr/pldb236/listener/trace/ora_2994_139957275965184.trc
2016-10-17 13:42:38.443831 : nsbrfr:nsbfs at 0x25d94c0, data at 0x25aa940.
2016-10-17 13:42:38.443843 : nsbrfr:nsbfs at 0x256f3b0, data at 0x2576610.
2016-10-17 13:42:38.443855 : nsbrfr:nsbfs at 0x25a4ac0, data at 0x25af850.
2016-10-17 13:42:38.443867 : nsbrfr:nsbfs at 0x25a4b70, data at 0x25b52c0.
2016-10-17 13:42:38.443880 : nsbrfr:nsbfs at 0x25b1fb0, data at 0x25b8a10.
2016-10-17 13:42:38.443891 : nsbrfr:nsbfs at 0x2568f10, data at 0x2566670.
2016-10-17 13:42:38.443903 : nsbrfr:nsbfs at 0x256afe0, data at 0x256b090.
2016-10-17 13:42:38.443916 : nsbrfr:nsbfs at 0x250fd00, data at 0x2568fc0.
2016-10-17 13:42:38.443937 : nlse_term_audit:entry
2016-10-17 13:42:38.443951 : nlse_term_audit:exit
 

 

5、出绝招lsnrctl重启

绝招无效,还是很慢,甚至lsnrctl stop以及lsnrctl start都非常慢。
# stop很慢,卡住了,至少需要50秒才能完成
LSNRCTL> stop
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))
……卡呀卡……卡到外婆家
The command completed successfully
LSNRCTL> start   # start会快一些,快到10秒到20秒就完成了。
Starting /oracle/app/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait...
 
TNSLSNR for Linux: Version 11.2.0.1.0 - Production
System parameter file is /oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Log messages written to /oracle/app/oracle/diag/tnslsnr/pldb236/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.104)(PORT=1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))
 
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date                17-OCT-2016 20:46:58
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Listener Log File         /oracle/app/oracle/diag/tnslsnr/pldb236/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.104)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))
Services Summary...
Service "powerdes" has 1 instance(s).
  Instance "powerdes", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
LSNRCTL>
 
 
PS:重启之后,远程连接oracle实例还是非常慢的,所以重启也没有啥效果
 
 
 

6、开启lsnrctl的trace 16分析

尝试开启更高一级别的trace16,看看,这回stop快了许多,但是远程连接还是非常慢的,开启过程如下:
LSNRCTL> trace 16
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.104)(PORT=1521)))
Opened trace file: /oracle/app/oracle/diag/tnslsnr/pldb236/listener/trace/ora_14662_140432957417216.trc
The command completed successfully
LSNRCTL>
PS:然后去查看这个日志,也没有发现异常信息,看来此途径无效了。
 
 
 

7、迷雾中的灵光一闪

各种地方都考虑检查到了,问题还是没有解决,仔细想来,这种情况也就是做了rman恢复之后的事情,在没有做rman恢复之前,都是正常的;
 
Rman恢复、rman恢复、rman恢复、rman恢复……
 
对了,rman恢复的是2015年的数据,那么数据库用户名密码也是2015年的,2016年数据库用户密码修改过了,想起了rman恢复是恢复了所有的东西包括环境变量博包括系统参数等等。那么意味着现在应用连接这个数据库所使用的用户名密码是连接不上的,连接不上就会报错啊,但是alert日志后台没有看到,那会是啥问题呢,突然脑中亮光一闪,oracle11g新特性密码验证延迟啊,会卡住数据库,用户连接不上去了。
 
赶紧去屏蔽密码延迟验证:
SQL> ALTER SYSTEM SET EVENT = '28401 TRACE NAME CONTEXT FOREVER, LEVEL 1' SCOPE = SPFILE;
 
System altered.
 
SQL> create pfile from spfile;
 
File created.
 
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
 
Total System Global Area 6680915968 bytes
Fixed Size              2213936 bytes
Variable Size              4898949072 bytes
Database Buffers    1744830464 bytes
Redo Buffers                34922496 bytes
Database mounted.
Database opened.
SQL>
 
 
然后皆大欢喜,连接数据库也不卡了,上去查询个数据也不卡了,一切都很快速了。因此分析是就是oracle11g的新特性密码验证延迟引发lsnrctl hang住了,以前也遇到类似的由于oracle11g新特性引发的问题,当时花了漫长的时间去分析,去求证,而且详情参见我以前整理得博客blog记录,里面针对oracle11g特性有比较深入的探讨,blog连接地址为:http://blog.csdn.net/mchdba/article/details/51794443
 
 
 
另外trace参考文章地址:http://blog.itpub.net/17203031/viewspace-713587/
 
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: