Using oradebug short_stack and strace -p to check whether an Oracle process is dead or faulty
2017-09-10 20:26
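1, Use oradebug or strace -p to trace a background or foreground process and see whether it is dead or hung.
2, A process that runs into trouble writes its latest information to its own TRC file, which gives you important material for further analysis and diagnosis.
The trace files are in background_dump_dest.
3, Use ll -lhrt *lgwr*|tail -10f to list the most recent TRC files of the process.
4, A failure is also usually recorded in the ALERT log, so checking it is an important first step in troubleshooting.
5, oradebug setospid ospid
oradebug short_stack
This prints the call stack of the process. Note: run it several times at intervals. If the stack is identical every time, the process is very likely dead or faulty.
6, Trace the process with strace -p ospid.
---Typical output when the process is hung or faulty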
semtimedop(9273344, 0x7fffe66199d0, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
---Typical output when the process is healthy
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440015944
semtimedop(9273344, 0x7fffe661b1f0, 1, {1, 800000000}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440017025
open("/proc/4385/stat", O_RDONLY) = 35
read(35, "4385 (oracle) S 1 4385 4385 0 -1"..., 999) = 225
In short, check whether the output changes: if it changes, the process is working normally; if it does not, something is wrong.
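The "does the output change" check can be sketched in Python. This is only a minimal illustration of the rule of thumb above, not a definitive diagnostic: the function names (syscall_names, strace_shows_no_change, stacks_identical) are hypothetical, and the input is assumed to be text you have already captured from strace -p or from repeated oradebug short_stack runs. A process that is merely idle can produce the same picture, so treat the result as a hint.

```python
import re

# Captures the syscall name at the start of an strace line, e.g. "times(" -> "times".
SYSCALL_RE = re.compile(r"^\s*(\w+)\(")


def syscall_names(strace_lines):
    """Return the syscall name from every line that looks like strace output."""
    names = []
    for line in strace_lines:
        match = SYSCALL_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def strace_shows_no_change(strace_lines):
    """True when the capture contains at most one distinct syscall name.

    An empty capture (strace printed nothing) also counts as "no change".
    This is a heuristic assumed for illustration, not an Oracle-defined rule.
    """
    return len(set(syscall_names(strace_lines))) <= 1


def stacks_identical(stack_samples):
    """True when at least two short_stack samples were taken and all are equal."""
    cleaned = [sample.strip() for sample in stack_samples]
    return len(cleaned) >= 2 and len(set(cleaned)) == 1
```

Feed strace_shows_no_change the lines captured over a fixed interval, and stacks_identical the text of each short_stack run taken a few seconds apart.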
Test
SQL> select * from v$version where rownum=1;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
List the background processes
SQL> select pid,spid,pname,username from v$process order by 1;
PID SPID PNAME USERNAME
---------- ---------- ---------- ------------------------------
1
2 4385 PMON oracle
3 4387 VKTM oracle
4 4391 GEN0 oracle
5 4393 DIAG oracle
6 4395 DBRM oracle
7 4397 PSP0 oracle
8 4399 DIA0 oracle
9 4401 MMAN oracle
10 4403 DBW0 oracle
11 4405 LGWR oracle
PID SPID PNAME USERNAME
---------- ---------- ---------- ------------------------------
12 4407 CKPT oracle
13 4409 SMON oracle
14 4411 RECO oracle
15 4413 MMON oracle
16 4415 MMNL oracle
17 4417 D000 oracle
18 4419 S000 oracle
19 4652 SMCO oracle
20 5266 W000 oracle
21 4936 oracle
27 4468 ARC0 oracle
PID SPID PNAME USERNAME
---------- ---------- ---------- ------------------------------
28 4481 ARC1 oracle
29 4486 ARC2 oracle
30 4489 ARC3 oracle
31 4496 QMNC oracle
32 4549 Q000 oracle
33 4551 Q001 oracle
34 4568 oracle
29 rows selected.
SQL>
---List the TRC file directory
[oracle@seconary trace]$ ll -lhrt *lgwr*|tail -10f
-rw-r----- 1 oracle oinstall 213 Dec 14 19:05 guowang_lgwr_5297.trm
-rw-r----- 1 oracle oinstall 2.4K Dec 14 19:05 guowang_lgwr_5297.trc
-rw-r----- 1 oracle oinstall 2.3K Dec 15 01:05 guowang_lgwr_22295.trm
-rw-r----- 1 oracle oinstall 27K Dec 15 01:05 guowang_lgwr_22295.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 02:18 guowang_lgwr_31280.trm
-rw-r----- 1 oracle oinstall 903 Dec 15 02:18 guowang_lgwr_31280.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 02:44 guowang_lgwr_32077.trm
-rw-r----- 1 oracle oinstall 906 Dec 15 02:44 guowang_lgwr_32077.trc
-rw-r----- 1 oracle oinstall 62 Dec 15 03:27 guowang_lgwr_1032.trm
-rw-r----- 1 oracle oinstall 887 Dec 15 03:27 guowang_lgwr_1032.trc
---Suspend LGWR to simulate a hang
SQL> oradebug setospid 4405
Oracle pid: 11, Unix process pid: 4405, image: oracle@seconary (LGWR)
SQL> oradebug suspend
Statement processed.
---The ALERT log records the event at the same moment
Tue Dec 15 04:46:15 2015
Unix process pid: 4405, image: oracle@seconary (LGWR) flash frozen [ command #1 ]
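Scanning the ALERT log for such entries can be sketched as follows. The pattern is derived only from the single sample line shown here, so it is an assumption that other versions or platforms word the message the same way. The names FROZEN_RE and find_frozen_processes are hypothetical.

```python
import re

# Pattern assumed from the one sample alert log line in this article.
FROZEN_RE = re.compile(
    r"Unix process pid: (?P<ospid>\d+), image: \S+ \((?P<pname>\w+)\) flash frozen"
)


def find_frozen_processes(alert_lines):
    """Return (ospid, process name) for each line reporting a flash-frozen process."""
    found = []
    for line in alert_lines:
        match = FROZEN_RE.search(line)
        if match:
            found.append((int(match.group("ospid")), match.group("pname")))
    return found
```

Pass it the lines of the alert log file. Lines without "flash frozen" are ignored, such as the confirmation printed by oradebug setospid.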
---The TRC directory shows a new trace file for the same event
[oracle@seconary trace]$ ll -lhrt *lgwr*|tail -10f
-rw-r----- 1 oracle oinstall 2.3K Dec 15 01:05 guowang_lgwr_22295.trm
-rw-r----- 1 oracle oinstall 27K Dec 15 01:05 guowang_lgwr_22295.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 02:18 guowang_lgwr_31280.trm
-rw-r----- 1 oracle oinstall 903 Dec 15 02:18 guowang_lgwr_31280.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 02:44 guowang_lgwr_32077.trm
-rw-r----- 1 oracle oinstall 906 Dec 15 02:44 guowang_lgwr_32077.trc
-rw-r----- 1 oracle oinstall 62 Dec 15 03:27 guowang_lgwr_1032.trm
-rw-r----- 1 oracle oinstall 887 Dec 15 03:27 guowang_lgwr_1032.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 04:46 guowang_lgwr_4405.trm
-rw-r----- 1 oracle oinstall 896 Dec 15 04:46 guowang_lgwr_4405.trc
[oracle@seconary trace]$
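The same "newest trace files last" listing can be sketched in Python for use inside a monitoring script. This is a minimal sketch under assumptions: the function name newest_trace_files and its default pattern are hypothetical, and trace_dir is whatever directory background_dump_dest points to on your system.

```python
from pathlib import Path


def newest_trace_files(trace_dir, pattern="*lgwr*.trc", limit=10):
    """Return up to `limit` matching files, oldest first and newest last.

    The ordering mirrors `ls -rt | tail`: sort by modification time,
    then keep the tail of the list.
    """
    if limit <= 0:
        return []
    files = [p for p in Path(trace_dir).glob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime)
    return files[-limit:]
```

The last element of the returned list is the most recently modified trace file. In the test above that would be the file created when LGWR was suspended.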