MPI Debugging: A Collection of Error Messages

2011-06-18 18:29
If you write your program in Fortran, add implicit none. Especially in a large code base, it lets the compiler catch many problems at compile time.

1.

[root@c0108 parallel]# mpiexec -n 5 ./simple

aborting job:

Fatal error in MPI_Irecv: Invalid rank, error stack:

MPI_Irecv(143): MPI_Irecv(buf=0x25dab60, count=0, MPI_DOUBLE_PRECISION, src=5, tag=99, MPI_COMM_WORLD, request=0x7fffa02ca86c) failed

MPI_Irecv(95): Invalid rank has value 5 but must be nonnegative and less than 5

rank 4 in job 5 c0108_52041 caused collective abort of all ranks

exit status of rank 4: return code 13

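The message says that rank 5 is invalid. The job was launched with mpiexec -n 5 ./simple, which starts five processes with ranks 0 through 4, so the fault lies in the code itself. The cause is not necessarily the rank value as such; an argument that was passed incorrectly can produce the same symptom, and MPI error messages are often puzzling at first sight.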

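My code contains only a few MPI_Irecv statements, so I debugged it by adding print statements until I found the offending line:

      print *, myid+1, '111111111111111111'

      call MPI_Irecv(P(1,1,location),IMAX*JMAX*MIN(ITSP, ke-myke),
     &MPI_DOUBLE_PRECISION,MYID+1,RELY,MPI_COMM_WORLD,REQ,IERR)

On rank 4, MYID+1 evaluates to 5, which is exactly the invalid source rank reported in the error stack.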
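
One common way to avoid an out-of-range neighbour is to substitute MPI_PROC_NULL on the boundary ranks, which turns the communication call into a no-op. The following is a minimal free-form Fortran sketch and not the author's actual fix: the program name, the buffer size, and the variables src, dest, sendbuf and recvbuf are hypothetical, and it assumes an MPI installation that provides the mpi module.

```fortran
program guard_neighbor
  use mpi
  implicit none
  integer :: myid, nprocs, src, dest, req, ierr
  integer :: status(MPI_STATUS_SIZE)
  double precision :: sendbuf(10), recvbuf(10)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Valid ranks are 0 .. nprocs-1, so the last rank has no right neighbour.
  if (myid + 1 < nprocs) then
    src = myid + 1
  else
    src = MPI_PROC_NULL   ! receive from "nobody" completes immediately
  end if

  ! Likewise, rank 0 has no left neighbour to send to.
  if (myid > 0) then
    dest = myid - 1
  else
    dest = MPI_PROC_NULL  ! send to "nobody" is a no-op
  end if

  sendbuf = dble(myid)
  recvbuf = -1.0d0

  call MPI_Irecv(recvbuf, 10, MPI_DOUBLE_PRECISION, src, 99, &
                 MPI_COMM_WORLD, req, ierr)
  call MPI_Send(sendbuf, 10, MPI_DOUBLE_PRECISION, dest, 99, &
                MPI_COMM_WORLD, ierr)
  call MPI_Wait(req, status, ierr)

  print *, 'rank', myid, 'src', src, 'first value', recvbuf(1)

  call MPI_Finalize(ierr)
end program guard_neighbor
```

With this pattern the same call sequence runs on every rank, and the boundary ranks need no separate code path around the communication calls.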

2.

[root@c0109 test]# mpiexec -n 5 ./simple

rank 3 in job 22 c0109_51164 caused collective abort of all ranks

exit status of rank 3: killed by signal 11

[root@c0109 test]#

Signal 11 is a segmentation fault (SIGSEGV): the program accessed a memory location that was not assigned to it. That usually indicates a bug in the program.
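
In Fortran codes a frequent source of signal 11 is an array index that runs past the declared bounds. The sketch below is a hypothetical illustration (the array p and the loop limit n are invented for the example). Assuming gfortran is the underlying compiler, building with -g -fcheck=bounds should make the run stop with a message naming the array and the offending index, rather than silently corrupting memory or dying with a bare segmentation fault.

```fortran
program bounds_demo
  implicit none
  integer :: i, n
  double precision :: p(10)

  ! Hypothetical off-by-one: the loop limit exceeds the array size.
  n = 11

  do i = 1, n
    ! When i reaches 11 this writes outside p. Without run-time
    ! bounds checking the behaviour is undefined.
    p(i) = dble(i)
  end do

  print *, p(10)
end program bounds_demo
```

Since the MPI compiler wrappers normally pass extra options through to the underlying compiler, the same flags can usually be added to the existing build line while debugging and removed again afterwards.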

3.

[root@c0108 test]# mpirun -np 4 ./simple

aborting job:

Fatal error in MPI_Wait: Invalid MPI_Request, error stack:

MPI_Wait(139): MPI_Wait(request=0x7fff1f675228, status0x7fff1f675218) failed

MPI_Wait(75): Invalid MPI_Request

rank 2 in job 24 c0108_52041 caused collective abort of all ranks

exit status of rank 2: return code 13

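Solution:

Generally this happens because MPI_Test or MPI_Wait is supplied a request that is unknown to MPICH (the request wasn't the one returned by MPICH when you made the Isend/Irecv/send_init/recv_init call). In other words, the MPI_Irecv does not correspond to the MPI_Wait(req,status,IERR): the request handles have been mismatched. If the code contains many MPI_Wait calls, comment them out one at a time to pin down the faulty one. Also, in a Fortran program, first check how the status variable is declared:

integer req,status(MPI_STATUS_SIZE),ierr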
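
The pairing of request handles can be sketched as follows. This is a minimal free-form Fortran example under stated assumptions, not the author's code: the ring exchange, the names req_recv, req_send, left and right, and the buffer size are all hypothetical, and the mpi module is assumed to be available. Each nonblocking call gets its own request variable, each variable is waited on exactly once, and status is declared with MPI_STATUS_SIZE elements.

```fortran
program wait_pairs
  use mpi
  implicit none
  integer :: myid, nprocs, left, right, ierr
  integer :: req_recv, req_send
  integer :: status(MPI_STATUS_SIZE)   ! an array, not a scalar
  double precision :: sendbuf(4), recvbuf(4)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Periodic neighbours, always within 0 .. nprocs-1.
  left  = mod(myid - 1 + nprocs, nprocs)
  right = mod(myid + 1, nprocs)

  sendbuf = dble(myid)

  ! Start from the null handle: MPI_Wait on MPI_REQUEST_NULL returns
  ! immediately, so a request that was never posted is harmless.
  req_recv = MPI_REQUEST_NULL
  req_send = MPI_REQUEST_NULL

  call MPI_Irecv(recvbuf, 4, MPI_DOUBLE_PRECISION, left, 99, &
                 MPI_COMM_WORLD, req_recv, ierr)
  call MPI_Isend(sendbuf, 4, MPI_DOUBLE_PRECISION, right, 99, &
                 MPI_COMM_WORLD, req_send, ierr)

  ! Wait on exactly the handles returned by the calls above.
  call MPI_Wait(req_recv, status, ierr)
  call MPI_Wait(req_send, status, ierr)

  print *, 'rank', myid, 'received', recvbuf(1), 'from rank', left

  call MPI_Finalize(ierr)
end program wait_pairs
```

Giving each request a descriptive name makes it easier to see at a glance which MPI_Wait belongs to which Isend or Irecv when there are many of them.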

4.

aborting job:

Fatal error in MPI_Init: Other MPI error, error stack:

MPIR_Init_thread(195): Initialization failed

MPID_Init(170): failure during portals initialization

MPIDI_Portals_Init(321): progress_init failed

MPIDI_PortalsI_Progress_init(653): Out of memory

There is not enough memory on the nodes for the program plus MPI buffers to fit.

You can decrease the amount of memory that MPI uses for buffers by setting the MPICH_UNEX_BUFFER_SIZE environment variable.

Corrections and comments are welcome. Thank you!