MPI Debugging: A Collection of Error Messages
2011-06-18 18:29
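If you write your program in FORTRAN, add implicit none. Especially in a large code base, it lets the compiler catch many problems, such as misspelled variable names.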
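As a minimal sketch of what implicit none buys you (the program and variable names below are made up for illustration and are not from the original code):

```fortran
program implicit_none_demo
  implicit none            ! every name must now be declared explicitly
  integer :: total, i

  total = 0
  do i = 1, 10
     total = total + i
  end do

  ! Misspelling the variable, e.g. "print *, totl", is now rejected at
  ! compile time. Without implicit none, totl would silently become a
  ! new, uninitialized REAL variable and the program would still build.
  print *, 'sum = ', total
end program implicit_none_demo
```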
1.
[root@c0108 parallel]# mpiexec -n 5 ./simple
aborting job:
Fatal error in MPI_Irecv: Invalid rank, error stack:
MPI_Irecv(143): MPI_Irecv(buf=0x25dab60, count=0, MPI_DOUBLE_PRECISION, src=5, tag=99, MPI_COMM_WORLD, request=0x7fffa02ca86c) failed
MPI_Irecv(95): Invalid rank has value 5 but must be nonnegative and less than 5
rank 4 in job 5 c0108_52041 caused collective abort of all ranks
exit status of rank 4: return code 13
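The message says that rank 5 is invalid: mpiexec -n 5 ./simple starts five processes, numbered 0 1 2 3 4. So the fault must lie in the code itself. Here rank 4 asks for src=MYID+1=5, but the cause is not always the rank expression itself; an argument that was not passed correctly can produce the same message. MPI errors are often puzzling at first sight.
My code contains only a few MPI_Irecv calls, so I debugged by adding print statements until I found the offending line: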
print *, myid+1,'111111111111111111'!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
call MPI_Irecv(P(1,1,location),IMAX*JMAX*MIN(ITSP, ke-myke),
&MPI_DOUBLE_PRECISION,MYID+1,RELY,MPI_COMM_WORLD,REQ,IERR)
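One common way to avoid this error is to replace a non-existent neighbour with MPI_PROC_NULL. The following is a minimal sketch, not the original program: the buffer names, the message length n, and the use of free-form source with "use mpi" are assumptions made for illustration.

```fortran
program guard_neighbor
  use mpi
  implicit none
  integer, parameter :: n = 4          ! hypothetical message length
  integer, parameter :: tag = 99       ! tag value seen in the error stack
  double precision :: sendbuf(n), recvbuf(n)
  integer :: myid, nprocs, left, right, ierr
  integer :: reqs(2)
  integer :: statuses(MPI_STATUS_SIZE, 2)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Valid ranks are 0 .. nprocs-1. The last rank has no right-hand
  ! neighbour and the first has no left-hand one; MPI_PROC_NULL turns
  ! communication with a missing neighbour into a no-op.
  right = myid + 1
  if (right > nprocs - 1) right = MPI_PROC_NULL
  left = myid - 1
  if (left < 0) left = MPI_PROC_NULL

  sendbuf = dble(myid)
  recvbuf = -1.0d0

  ! Receive from the right neighbour, send to the left one.
  call MPI_Irecv(recvbuf, n, MPI_DOUBLE_PRECISION, right, tag, &
                 MPI_COMM_WORLD, reqs(1), ierr)
  call MPI_Isend(sendbuf, n, MPI_DOUBLE_PRECISION, left, tag, &
                 MPI_COMM_WORLD, reqs(2), ierr)
  call MPI_Waitall(2, reqs, statuses, ierr)

  print *, 'rank', myid, 'received', recvbuf(1)
  call MPI_Finalize(ierr)
end program guard_neighbor
```

On the last rank the receive completes immediately and recvbuf keeps its initial value, so no special-case branch is needed around the MPI_Irecv call.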
2.
[root@c0109 test]# mpiexec -n 5 ./simple
rank 3 in job 22 c0109_51164 caused collective abort of all ranks
exit status of rank 3: killed by signal 11
[root@c0109 test]#
Signal 11 is a segmentation fault: the program accessed a memory location that was not assigned to it. That is usually a bug in the program, such as an out-of-bounds array index.
3.
[root@c0108 test]# mpirun -np 4 ./simple
aborting job:
Fatal error in MPI_Wait: Invalid MPI_Request, error stack:
MPI_Wait(139): MPI_Wait(request=0x7fff1f675228, status0x7fff1f675218) failed
MPI_Wait(75): Invalid MPI_Request
rank 2 in job 24 c0108_52041 caused collective abort of all ranks
exit status of rank 2: return code 13
Solution:
Generally, this happens because MPI_Test or MPI_Wait is given a request that is unknown to MPICH (the request was not the one returned by MPICH when you called Isend/Irecv/send_init/recv_init). In other words, the MPI_Irecv does not correspond to the MPI_Wait(req,status,IERR): the handles are mismatched. If there are many MPI_Wait() calls, comment them out one at a time to isolate the faulty one. Also, in a FORTRAN program, first check the declaration of the status variable:
integer req,status(MPI_STATUS_SIZE),ierr
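A minimal sketch of a correctly paired request and wait follows. It is not the original code: the buffer, message length, and rank layout are assumptions, and initializing the handle to MPI_REQUEST_NULL is one possible safeguard rather than the author's method.

```fortran
program wait_pairing
  use mpi
  implicit none
  integer, parameter :: n = 8                 ! hypothetical message length
  double precision :: buf(n)
  integer :: myid, nprocs, ierr
  integer :: req                              ! request handle
  integer :: status(MPI_STATUS_SIZE)          ! an array, not a scalar

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Start from the null handle so that a wait on a request that was
  ! never posted returns immediately instead of using an undefined value.
  req = MPI_REQUEST_NULL

  if (nprocs >= 2) then
     if (myid == 0) then
        buf = 1.0d0
        call MPI_Isend(buf, n, MPI_DOUBLE_PRECISION, 1, 99, &
                       MPI_COMM_WORLD, req, ierr)
     else if (myid == 1) then
        call MPI_Irecv(buf, n, MPI_DOUBLE_PRECISION, 0, 99, &
                       MPI_COMM_WORLD, req, ierr)
     end if
  end if

  ! Wait on the same variable that Isend/Irecv filled in. Ranks that
  ! posted nothing still hold MPI_REQUEST_NULL, which MPI_Wait accepts.
  call MPI_Wait(req, status, ierr)

  if (myid == 1) print *, 'rank 1 received', buf(1)
  call MPI_Finalize(ierr)
end program wait_pairing
```

When several transfers are in flight, keeping the handles in an array and completing them with MPI_Waitall makes it harder to pass the wrong handle to a wait call.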
4.
aborting job:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(195): Initialization failed
MPID_Init(170): failure during portals initialization
MPIDI_Portals_Init(321): progress_init failed
MPIDI_PortalsI_Progress_init(653): Out of memory
There is not enough memory on the nodes for the program plus MPI buffers to fit.
You can reduce the amount of memory that MPI uses for buffers by setting the MPICH_UNEX_BUFFER_SIZE environment variable.
Corrections and comments are welcome. Thanks!