
Debugging multithreaded deadlocks in Linux C/C++ with gdb

2017-07-31 11:14
I won't go into the causes of deadlock in detail. In essence, some threads request a lock and can never obtain it.

 

First, here is the multithreaded code that deadlocks:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t g_smutex ;

void *func(void *arg)
{
    int i = 0;

    // take the lock

    pthread_mutex_lock(&g_smutex);

    for (i = 0; i < 0x7fffffff; i++)
    {

    }

    // bug: returns without calling pthread_mutex_unlock

    return NULL;
}

int main(void)
{
    pthread_t thread_id_01;
    pthread_t thread_id_02;
    pthread_t thread_id_03;
    pthread_t thread_id_04;
    pthread_t thread_id_05;

    pthread_mutex_init(&g_smutex, NULL);

    pthread_create(&thread_id_01, NULL, func, NULL);
    pthread_create(&thread_id_02, NULL, func, NULL);
    pthread_create(&thread_id_03, NULL, func, NULL);
    pthread_create(&thread_id_04, NULL, func, NULL);
    pthread_create(&thread_id_05, NULL, func, NULL);

    while (1)
    {
        sleep(0xfff);
    }
    return 0;
}


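The first thread to acquire the lock in func returns without unlocking it, so the other threads can never obtain the lock. This is one of the simplest kinds of deadlock, and it is the one used as the example here.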
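For comparison, here is a minimal sketch of what the worker could look like with the missing unlock added. The names func_fixed, run_workers and g_finished are hypothetical and not part of the original program, and the loop count is shortened so that the sketch finishes quickly.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t g_smutex = PTHREAD_MUTEX_INITIALIZER;
static int g_finished = 0;   /* hypothetical counter, protected by g_smutex */

/* Same shape as func in the article, but the lock is released before returning. */
static void *func_fixed(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&g_smutex);
    for (volatile int i = 0; i < 100000; i++) {
        /* placeholder for the work done while holding the lock */
    }
    g_finished++;
    pthread_mutex_unlock(&g_smutex);   /* the call that func forgets */
    return NULL;
}

/* Starts up to 16 threads, waits for all of them, and returns how many
 * passed through the critical section. */
int run_workers(int n)
{
    pthread_t tid[16];
    int started = 0;

    if (n > 16)
        n = 16;
    g_finished = 0;   /* no worker is running yet, so no lock is needed here */
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tid[i], NULL, func_fixed, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return g_finished;
}
```

Calling run_workers(5) from main mirrors the five threads of the article. Because every worker unlocks, all of the pthread_join calls return, which they would not do with the original func.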

 

Compile:

gcc New0001.c -g -lpthread -o a.out

 

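The -g option is required here: it makes the compiler emit debugging and symbol information. Do not run strip on the resulting a.out, because without that information gdb cannot show which line of code is at fault.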

 

 

Method 1:

1. Start gdb on the executable with gdb a.out, then enter the r command to run the program.

 

gdb a.out

GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10

Copyright (C) 2015 Free Software Foundation, Inc.

License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law.  Type "show copying"

and "show warranty" for details.

This GDB was configured as "i686-linux-gnu".

Type "show configuration" for configuration details.

For bug reporting instructions, please see:

<http://www.gnu.org/software/gdb/bugs/>.

Find the GDB manual and other documentation resources online at:

<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".

Type "apropos word" to search for commands related to "word"...

Reading symbols from a.out...done.

 

 

 

(gdb) r

Starting program: /share/a.out

[Thread debugging using libthread_db enabled]

Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

[New Thread 0xb7de6b40 (LWP 15436)]

[New Thread 0xb75e5b40 (LWP 15437)]

[New Thread 0xb6de4b40 (LWP 15438)]

[New Thread 0xb65e3b40 (LWP 15439)]

[New Thread 0xb5de2b40 (LWP 15440)]

[Thread 0xb7de6b40 (LWP 15436) exited]

 

2. While the program is running, press Ctrl+C.

 

^C

Program received signal SIGINT, Interrupt.

0xb7fdbbe8 in __kernel_vsyscall ()

 

3. Inspect the stack with info stack (an alias of bt). This command only shows the stack of the thread currently selected in gdb.

 

(gdb) info stack

#0  0xb7fdbbe8 in __kernel_vsyscall ()

#1  0xb7e9c3e6 in nanosleep () at ../sysdeps/unix/syscall-template.S:81

#2  0xb7e9c1a9 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138

#3  0x08048679 in main () at New0001.c:46

 

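4. Use info threads to list all threads. The entry marked with * is the thread currently selected in gdb, not "the running thread": after Ctrl+C every thread is stopped. The threads without * are the candidates to check for blocking or deadlock.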

 

(gdb) info threads

  Id   Target Id         Frame

  6    Thread 0xb5de2b40 (LWP 15440) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()

  5    Thread 0xb65e3b40 (LWP 15439) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()

  4    Thread 0xb6de4b40 (LWP 15438) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()

  3    Thread 0xb75e5b40 (LWP 15437) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()

* 1    Thread 0xb7de7700 (LWP 15432) "a.out" 0xb7fdbbe8 in __kernel_vsyscall ()
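The LWP numbers in this listing are kernel thread IDs. As a sketch, assuming Linux, a program can log the same numbers itself so that its own output can be matched against what gdb shows. The helper names current_lwp, report_lwp and lwp_of_new_thread are made up for this illustration.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the calling thread (Linux-specific).
 * This is the number gdb prints as "LWP". */
long current_lwp(void)
{
    return (long)syscall(SYS_gettid);
}

/* Thread body: records its own LWP and logs it next to the process ID. */
static void *report_lwp(void *arg)
{
    long *out = arg;

    *out = current_lwp();
    fprintf(stderr, "worker started: pid=%ld lwp=%ld\n", (long)getpid(), *out);
    return NULL;
}

/* Creates one thread and returns the LWP it reported, or -1 on failure. */
long lwp_of_new_thread(void)
{
    pthread_t t;
    long lwp = -1;

    if (pthread_create(&t, NULL, report_lwp, &lwp) != 0)
        return -1;
    pthread_join(t, NULL);
    return lwp;
}
```

In the main thread the LWP equals the process ID, which is why thread 1 in the listing carries the lowest LWP number; every other thread gets its own value.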

 

 

5. thread apply all bt (thread apply all makes gdb run the given command in every thread; with bt as the command, it prints the full stack of every thread).

 

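Note: when the process has many threads, switching to each one with thread <id> (the numbers in the Id column above) is impractical. With 100 threads you would have to type the command 100 times.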

 

So it is better to use thread apply all bt directly.

 

 

(gdb) thread apply all bt

 

Thread 6 (Thread 0xb5de2b40 (LWP 15440)):

#0  0xb7fdbbe8 in __kernel_vsyscall ()

#1  0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb7faa1aa in start_thread (arg=0xb5de2b40) at pthread_create.c:333

#5  0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Thread 5 (Thread 0xb65e3b40 (LWP 15439)):

#0  0xb7fdbbe8 in __kernel_vsyscall ()

#1  0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb7faa1aa in start_thread (arg=0xb65e3b40) at pthread_create.c:333

#5  0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Thread 4 (Thread 0xb6de4b40 (LWP 15438)):

#0  0xb7fdbbe8 in __kernel_vsyscall ()

#1  0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb7faa1aa in start_thread (arg=0xb6de4b40) at pthread_create.c:333

#5  0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Thread 3 (Thread 0xb75e5b40 (LWP 15437)):

#0  0xb7fdbbe8 in __kernel_vsyscall ()

#1  0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb7faa1aa in start_thread (arg=0xb75e5b40) at pthread_create.c:333

#5  0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

Thread 1 (Thread 0xb7de7700 (LWP 15432)):

#0  0xb7fdbbe8 in __kernel_vsyscall ()

---Type <return> to continue, or q <return> to quit---

#1  0xb7e9c3e6 in nanosleep () at ../sysdeps/unix/syscall-template.S:81

#2  0xb7e9c1a9 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138

#3  0x08048679 in main () at New0001.c:46

 

 

6. The threads whose stacks contain __lll_lock_wait are the ones blocked waiting for the lock.

 

Repeat the steps above a few times. Threads that show up in __lll_lock_wait every time are very likely deadlocked.

 

Take thread 3, for example:

Thread 3 (Thread 0xb75e5b40 (LWP 15437)):

#0  0xb7fdbbe8 in __kernel_vsyscall ()

#1  0xb7fb2302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144

#2  0xb7fac5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

#4  0xb7faa1aa in start_thread (arg=0xb75e5b40) at pthread_create.c:333

#5  0xb7ed2fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122

 

 

#3  0x080485b5 in func (arg=0x0) at New0001.c:16

is where the thread is stuck. Start from this location in the code and look for a path that may fail to release the lock.

 

Method 2:

 

Start the program first, then open another session and find its process ID with ps -aux | grep <executable name>.

 

ps -axu | grep a.out

root     15463  0.4  0.1  43320   732 pts/4    Sl+  19:29   0:03 ./a.out

root     15476  0.0  0.3   4540  1864 pts/6    S+   19:44   0:00 grep --color=auto a.out

 

From the output above, the process ID is 15463.

 

 

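1. Attach with gdb attach <pid> (gdb -p <pid> is the documented option for this),

  or start gdb first and then run attach <pid>,

  or run gdb <executable> <pid>, which also attaches automatically.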

 

 

root@ubuntu:/share# gdb

GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10

Copyright (C) 2015 Free Software Foundation, Inc.

License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law.  Type "show copying"

and "show warranty" for details.

This GDB was configured as "i686-linux-gnu".

Type "show configuration" for configuration details.

For bug reporting instructions, please see:

<http://www.gnu.org/software/gdb/bugs/>.

Find the GDB manual and other documentation resources online at:

<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".

Type "apropos word" to search for commands related to "word".

(gdb)

 

Attach to the process ID:

 

(gdb) attach 15463

Attaching to process 15463

Reading symbols from /share/a.out...done.

Reading symbols from /lib/i386-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug//lib/i386-linux-gnu/libpthread-2.21.so...done.

done.

[New LWP 15467]

[New LWP 15466]

[New LWP 15465]

[New LWP 15464]

[Thread debugging using libthread_db enabled]

Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Reading symbols from /lib/i386-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/i386-linux-gnu/libc-2.21.so...done.

done.

Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug//lib/i386-linux-gnu/ld-2.21.so...done.

done.

0xb773abe8 in __kernel_vsyscall ()

 

2. List the threads.

 

(gdb) info threads

  Id   Target Id         Frame

  5    Thread 0xb7545b40 (LWP 15464) "a.out" 0xb773abe8 in __kernel_vsyscall ()

  4    Thread 0xb6d44b40 (LWP 15465) "a.out" 0xb773abe8 in __kernel_vsyscall ()

  3    Thread 0xb6543b40 (LWP 15466) "a.out" 0xb773abe8 in __kernel_vsyscall ()

  2    Thread 0xb5d42b40 (LWP 15467) "a.out" 0xb773abe8 in __kernel_vsyscall ()

* 1    Thread 0xb7546700 (LWP 15463) "a.out" 0xb773abe8 in __kernel_vsyscall ()

 

 

3. Run bt in every thread.

 
(gdb) thread apply all bt
 
Thread 5 (Thread 0xb7545b40 (LWP 15464)):
#0  0xb773abe8 in __kernel_vsyscall ()
#1  0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2  0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3  0x080485b5 in func (arg=0x0) at New0001.c:16
#4  0xb77091aa in start_thread (arg=0xb7545b40) at pthread_create.c:333
#5  0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
 
Thread 4 (Thread 0xb6d44b40 (LWP 15465)):
#0  0xb773abe8 in __kernel_vsyscall ()
#1  0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2  0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3  0x080485b5 in func (arg=0x0) at New0001.c:16
#4  0xb77091aa in start_thread (arg=0xb6d44b40) at pthread_create.c:333
#5  0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
 
Thread 3 (Thread 0xb6543b40 (LWP 15466)):
#0  0xb773abe8 in __kernel_vsyscall ()
#1  0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2  0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3  0x080485b5 in func (arg=0x0) at New0001.c:16
#4  0xb77091aa in start_thread (arg=0xb6543b40) at pthread_create.c:333
#5  0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
 
Thread 2 (Thread 0xb5d42b40 (LWP 15467)):
#0  0xb773abe8 in __kernel_vsyscall ()
#1  0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2  0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3  0x080485b5 in func (arg=0x0) at New0001.c:16
#4  0xb77091aa in start_thread (arg=0xb5d42b40) at pthread_create.c:333
#5  0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
 
Thread 1 (Thread 0xb7546700 (LWP 15463)):
#0  0xb773abe8 in __kernel_vsyscall ()
---Type <return> to continue, or q <return> to quit---
#1  0xb75fb3e6 in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#2  0xb75fb1a9 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#3  0x08048679 in main () at New0001.c:46
 
 
 
4. Pick one of the threads that shows __lll_lock_wait and look at it more closely,
for example the thread whose gdb Id is 4.
(gdb) thread 4
[Switching to thread 4 (Thread 0xb6d44b40 (LWP 15465))]
#0  0xb773abe8 in __kernel_vsyscall ()
(gdb) bt
#0  0xb773abe8 in __kernel_vsyscall ()
#1  0xb7711302 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2  0xb770b5fe in __GI___pthread_mutex_lock (mutex=0x804a030 <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3  0x080485b5 in func (arg=0x0) at New0001.c:16
#4  0xb77091aa in start_thread (arg=0xb6d44b40) at pthread_create.c:333
#5  0xb7631fde in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:122
 
Select frame 3 on the stack:
(gdb) frame 3
#3  0x080485b5 in func (arg=0x0) at New0001.c:16
16 pthread_mutex_lock( &g_smutex);
The thread is blocked in this lock call.
 
(gdb) p  g_smutex
$1 = {__data = {__lock = 2, __count = 0, __owner = 15468, __kind = 0, __nusers = 1, {__elision_data = {__espins = 0,
        __elision = 0}, __list = {__next = 0x0}}},
  __size = "\002\000\000\000\000\000\000\000l<\000\000\000\000\000\000\001\000\000\000\000\000\000", __align = 2}
 
 
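The __owner field shows that the lock is held by thread ID (LWP) 15468, but that thread has already exited. In other words, the thread finished without unlocking.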
 
For reference, here are the remaining threads as seen in step 2; LWP 15468 is not among them:
(gdb) info threads

  Id   Target Id         Frame

  5    Thread 0xb7545b40 (LWP 15464) "a.out" 0xb773abe8 in __kernel_vsyscall ()

  4    Thread 0xb6d44b40 (LWP 15465) "a.out" 0xb773abe8 in __kernel_vsyscall ()

  3    Thread 0xb6543b40 (LWP 15466) "a.out" 0xb773abe8 in __kernel_vsyscall ()

  2    Thread 0xb5d42b40 (LWP 15467) "a.out" 0xb773abe8 in __kernel_vsyscall ()

* 1    Thread 0xb7546700 (LWP 15463) "a.out" 0xb773abe8 in __kernel_vsyscall ()

 

Method 3 does not use gdb; it uses the pstack tool.

Usage: pstack <pid>

Note that pstack (at least the version I installed) does not support 64-bit.

Also, for some reason pstack does not work on my Ubuntu system, even though it is installed.

root@ubuntu:/share# pstack 15463

15463: ./a.out

(No symbols found in )

(No symbols found in /lib/i386-linux-gnu/libc.so.6)

(No symbols found in /lib/ld-linux.so.2)

0xb773abe8: _fini + 0x25f14 (0, 0, 0, 0, 0, 0) + 400d04fc

crawl: Input/output error

Error tracing through process 15463

Does anyone know what causes this?