用oradebug short_stack及strace -p分析oracle进程是否dead或出现故障
2017-09-10 20:27
465 查看
1,可以采用oradebug或者strace -p跟踪后台或前台进程是否dead或hang住
2,如果进程出现故障,必会在对应的TRC文件写入最新信息,基于此可以获取非常重要的信息进一步分析与诊断
日志文件在background_dump_dest
3,采用 ll -lhrt *lgwr*|tail -10f 获取最新的进程的TRC文件
4,而且出现故障时,多半会在ALERT日志记录相关信息,此是排除故障重要且首要的方法及思路
5,oradebug setospid ospid
oradebug short_stack
会显示进程的堆栈信息,注意:可以间隔多次运行,如果多次显示的堆栈信息一致,可以肯定此进程肯定是dead或出现故障了
6,可以用strace -p ospid跟踪分析,
---hang或故障时的类似信息如下
semtimedop(9273344, 0x7fffe66199d0, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
---正常时的类似信息如下
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440015944
semtimedop(9273344, 0x7fffe661b1f0, 1, {1, 800000000}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440017025
open("/proc/4385/stat", O_RDONLY) = 35
read(35, "4385 (oracle) S 1 4385 4385 0 -1"..., 999) = 225
说白了,就是看信息有没有变化,有变化就说明进程是正常的,否则就说明是不正常的
SQL> select * from v$version where rownum=1;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
查看后台进程
SQL> select pid,spid,pname,username from v$process order by 1;
PID SPID PNAME USERNAME
---------- ---------- ---------- ------------------------------
1
2 4385 PMON oracle
3 4387 VKTM oracle
4 4391 GEN0 oracle
5 4393 DIAG oracle
6 4395 DBRM oracle
7 4397 PSP0 oracle
8 4399 DIA0 oracle
9 4401 MMAN oracle
10 4403 DBW0 oracle
11 4405 LGWR oracle
PID SPID PNAME USERNAME
---------- ---------- ---------- ------------------------------
12 4407 CKPT oracle
13 4409 SMON oracle
14 4411 RECO oracle
15 4413 MMON oracle
16 4415 MMNL oracle
17 4417 D000 oracle
18 4419 S000 oracle
19 4652 SMCO oracle
20 5266 W000 oracle
21 4936 oracle
27 4468 ARC0 oracle
PID SPID PNAME USERNAME
---------- ---------- ---------- ------------------------------
28 4481 ARC1 oracle
29 4486 ARC2 oracle
30 4489 ARC3 oracle
31 4496 QMNC oracle
32 4549 Q000 oracle
33 4551 Q001 oracle
34 4568 oracle
29 rows selected.
SQL>
---查看TRC文件目录
[oracle@seconary trace]$ ll -lhrt *lgwr*|tail -10f
-rw-r----- 1 oracle oinstall 213 Dec 14 19:05 guowang_lgwr_5297.trm
-rw-r----- 1 oracle oinstall 2.4K Dec 14 19:05 guowang_lgwr_5297.trc
-rw-r----- 1 oracle oinstall 2.3K Dec 15 01:05 guowang_lgwr_22295.trm
-rw-r----- 1 oracle oinstall 27K Dec 15 01:05 guowang_lgwr_22295.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 02:18 guowang_lgwr_31280.trm
-rw-r----- 1 oracle oinstall 903 Dec 15 02:18 guowang_lgwr_31280.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 02:44 guowang_lgwr_32077.trm
-rw-r----- 1 oracle oinstall 906 Dec 15 02:44 guowang_lgwr_32077.trc
-rw-r----- 1 oracle oinstall 62 Dec 15 03:27 guowang_lgwr_1032.trm
-rw-r----- 1 oracle oinstall 887 Dec 15 03:27 guowang_lgwr_1032.trc
---HANG LGWR
SQL> oradebug setospid 4405
Oracle pid: 11, Unix process pid: 4405, image: oracle@seconary (LGWR)
SQL> oradebug suspend
Statement processed.
--ALERT同步记录上述信息
Tue Dec 15 04:46:15 2015
Unix process pid: 4405, image: oracle@seconary (LGWR) flash frozen [ command #1 ]
---TRC目录同步记录上述信息
[oracle@seconary trace]$ ll -lhrt *lgwr*|tail -10f
-rw-r----- 1 oracle oinstall 2.3K Dec 15 01:05 guowang_lgwr_22295.trm
-rw-r----- 1 oracle oinstall 27K Dec 15 01:05 guowang_lgwr_22295.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 02:18 guowang_lgwr_31280.trm
-rw-r----- 1 oracle oinstall 903 Dec 15 02:18 guowang_lgwr_31280.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 02:44 guowang_lgwr_32077.trm
-rw-r----- 1 oracle oinstall 906 Dec 15 02:44 guowang_lgwr_32077.trc
-rw-r----- 1 oracle oinstall 62 Dec 15 03:27 guowang_lgwr_1032.trm
-rw-r----- 1 oracle oinstall 887 Dec 15 03:27 guowang_lgwr_1032.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 04:46 guowang_lgwr_4405.trm
-rw-r----- 1 oracle oinstall 896 Dec 15 04:46 guowang_lgwr_4405.trc
[oracle@seconary trace]$
2,如果进程出现故障,必会在对应的TRC文件写入最新信息,基于此可以获取非常重要的信息进一步分析与诊断
日志文件在background_dump_dest
3,采用 ll -lhrt *lgwr*|tail -10f 获取最新的进程的TRC文件
4,而且出现故障时,多半会在ALERT日志记录相关信息,此是排除故障重要且首要的方法及思路
5,oradebug setospid ospid
oradebug short_stack
会显示进程的堆栈信息,注意:可以间隔多次运行,如果多次显示的堆栈信息一致,可以肯定此进程肯定是dead或出现故障了
6,可以用strace -p ospid跟踪分析,
---hang或故障时的类似信息如下
semtimedop(9273344, 0x7fffe66199d0, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
---正常时的类似信息如下
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440015944
semtimedop(9273344, 0x7fffe661b1f0, 1, {1, 800000000}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440017025
open("/proc/4385/stat", O_RDONLY) = 35
read(35, "4385 (oracle) S 1 4385 4385 0 -1"..., 999) = 225
说白了,就是看信息有没有变化,有变化就说明进程是正常的,否则就说明是不正常的
测试
SQL> select * from v$version where rownum=1;BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
查看后台进程
SQL> select pid,spid,pname,username from v$process order by 1;
PID SPID PNAME USERNAME
---------- ---------- ---------- ------------------------------
1
2 4385 PMON oracle
3 4387 VKTM oracle
4 4391 GEN0 oracle
5 4393 DIAG oracle
6 4395 DBRM oracle
7 4397 PSP0 oracle
8 4399 DIA0 oracle
9 4401 MMAN oracle
10 4403 DBW0 oracle
11 4405 LGWR oracle
PID SPID PNAME USERNAME
---------- ---------- ---------- ------------------------------
12 4407 CKPT oracle
13 4409 SMON oracle
14 4411 RECO oracle
15 4413 MMON oracle
16 4415 MMNL oracle
17 4417 D000 oracle
18 4419 S000 oracle
19 4652 SMCO oracle
20 5266 W000 oracle
21 4936 oracle
27 4468 ARC0 oracle
PID SPID PNAME USERNAME
---------- ---------- ---------- ------------------------------
28 4481 ARC1 oracle
29 4486 ARC2 oracle
30 4489 ARC3 oracle
31 4496 QMNC oracle
32 4549 Q000 oracle
33 4551 Q001 oracle
34 4568 oracle
29 rows selected.
SQL>
---查看TRC文件目录
[oracle@seconary trace]$ ll -lhrt *lgwr*|tail -10f
-rw-r----- 1 oracle oinstall 213 Dec 14 19:05 guowang_lgwr_5297.trm
-rw-r----- 1 oracle oinstall 2.4K Dec 14 19:05 guowang_lgwr_5297.trc
-rw-r----- 1 oracle oinstall 2.3K Dec 15 01:05 guowang_lgwr_22295.trm
-rw-r----- 1 oracle oinstall 27K Dec 15 01:05 guowang_lgwr_22295.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 02:18 guowang_lgwr_31280.trm
-rw-r----- 1 oracle oinstall 903 Dec 15 02:18 guowang_lgwr_31280.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 02:44 guowang_lgwr_32077.trm
-rw-r----- 1 oracle oinstall 906 Dec 15 02:44 guowang_lgwr_32077.trc
-rw-r----- 1 oracle oinstall 62 Dec 15 03:27 guowang_lgwr_1032.trm
-rw-r----- 1 oracle oinstall 887 Dec 15 03:27 guowang_lgwr_1032.trc
---HANG LGWR
SQL> oradebug setospid 4405
Oracle pid: 11, Unix process pid: 4405, image: oracle@seconary (LGWR)
SQL> oradebug suspend
Statement processed.
--ALERT同步记录上述信息
Tue Dec 15 04:46:15 2015
Unix process pid: 4405, image: oracle@seconary (LGWR) flash frozen [ command #1 ]
---TRC目录同步记录上述信息
[oracle@seconary trace]$ ll -lhrt *lgwr*|tail -10f
-rw-r----- 1 oracle oinstall 2.3K Dec 15 01:05 guowang_lgwr_22295.trm
-rw-r----- 1 oracle oinstall 27K Dec 15 01:05 guowang_lgwr_22295.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 02:18 guowang_lgwr_31280.trm
-rw-r----- 1 oracle oinstall 903 Dec 15 02:18 guowang_lgwr_31280.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 02:44 guowang_lgwr_32077.trm
-rw-r----- 1 oracle oinstall 906 Dec 15 02:44 guowang_lgwr_32077.trc
-rw-r----- 1 oracle oinstall 62 Dec 15 03:27 guowang_lgwr_1032.trm
-rw-r----- 1 oracle oinstall 887 Dec 15 03:27 guowang_lgwr_1032.trc
-rw-r----- 1 oracle oinstall 63 Dec 15 04:46 guowang_lgwr_4405.trm
-rw-r----- 1 oracle oinstall 896 Dec 15 04:46 guowang_lgwr_4405.trc
[oracle@seconary trace]$
相关文章推荐
- 用oradebug short_stack及strace -p分析oracle进程是否dead或出现故障
- [故障分析]出现大量僵尸进程(zombie)
- win7系统IE浏览器提示“出现运行错误,是否纠正错误”的故障分析及2种解决方法
- 在启动Oracle服务的时候出现:错误1067 进程意外终止
- Oracle 11.2.0.3数据库CJQ进程造成row cache lock等待事件影响job无法停止问题分析
- Oracle启动监听服务时,出现错误1067:进程意外终止
- 转载:用oralce连接.net客户端出现问题:“数据连接不成功,请检查该数据库是否已启动尝试加载oracle客户端时引发BadImageFormatException.如果在安装32位Oracle客户端组件的情况下以64位模式运行,”的解
- Oracle ORA-03137: TTC protocol internal error : [12333] 故障分析
- [Oracle 11g r2(11.2.0.4.0)]案例分析4-由gipc 进程导致的节点无法启动
- << Oracle高可用>>部分书面作业 - 第八课 Data Gaurd 故障分析和处理
- 用oralce连接.net客户端出现问题:“数据连接不成功,请检查该数据库是否已启动尝试加载oracle客户端时引发BadImageFormatException.如果在安装32位Oracle客户端组件的情况下以64位模式运行,”的解决办法
- strace 分析 跟踪 进程错误
- ResultSet:判断变量是否为null时出现deadcode
- oracle ----系统服务 --- 文件体系结构 ----网络配置 -----利用企业管理器登录数据库 -----利用SQL Plus登录数据库 -------运行时故障分析与解决
- Oracle安装出现 安装检测到系统的主IP地址是DHCP分配的地址 及 无法与该代理取得联系。请验证此代理的 url 是否为 null 问题解决
- Oracle ORA-03137: TTC protocol internal error : [12333] 故障分析
- Oracle ORA-03137: TTC protocol internal error : [12333] 故障分析
- Win7系统打开默认程序出现“软件管理-打开未知文件”窗口的故障分析及解决方法
- [Oracle]Redo log日志组故障分析
- oracle出现无法启动OracleXETNSListener服务故障怎么办?