
caffe:cudaSuccess (2 vs. 0) out of memory

2016-11-25 11:07
Problem description:

I am trying to train a network in Caffe. The images are 512x640 and the batch size is 1. I am trying to implement FCN-8s.

I am currently running on an Amazon EC2 instance (g2.2xlarge) with 4GB of GPU memory. But when I run the solver, it immediately throws an error:

Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
Aborted (core dumped)


The error you get is indeed out of memory, but it's not RAM: it's GPU memory (note that the error comes from CUDA).
Usually, when Caffe runs out of memory, the first thing to do is reduce the batch size (at the cost of gradient accuracy), but since you are already at batch size = 1...
Are you sure the batch size is 1 for both the TRAIN and TEST phases?
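For reference, the batch size is configured per phase in the data layers of the network prototxt, so it is worth checking both. A minimal sketch (the layer names and LMDB paths below are illustrative, not taken from the question):

```protobuf
# train_val.prototxt -- illustrative sketch
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param {
    source: "train_lmdb"   # illustrative path
    batch_size: 1          # must be 1 here...
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST }
  data_param {
    source: "val_lmdb"     # illustrative path
    batch_size: 1          # ...and also here
    backend: LMDB
  }
}
```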



I guessed so. And yes, the batch size is 1 for both the train and test phases. I think I'll resize the training images to something smaller and try it out. But why is 4GB of GPU memory turning out to be too little space? It says the total number of bytes read was 537399810, which is much smaller than 4GB. – Abhilash Panigrahi Nov 19 '15 at 8:11

@AbhilashPanigrahi is it possible some other processes are using GPU at the same time? try command line nvidia-smi to see what's going on on your GPU. – Shai Nov 19 '15 at 8:18

I did. No other process is running apart from this (which automatically quits after a few seconds because of the error). – Abhilash Panigrahi Nov 19 '15 at 8:21
I just reduced the image and label size to about 256x320. It runs successfully. I saw it is using around 3.75 GB of GPU memory. Thanks for the help. – Abhilash Panigrahi Nov 19 '15 at 8:47
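The arithmetic behind that fix is simple: activation memory in a fully convolutional network scales roughly linearly with the number of input pixels, so going from 512x640 to 256x320 cuts it by about a factor of four. A back-of-the-envelope sketch (a rough heuristic, not Caffe's actual memory accounting):

```python
def pixel_ratio(w_old, h_old, w_new, h_new):
    """Approximate factor by which activation memory shrinks
    when the input resolution is reduced (rough heuristic:
    activations scale with pixel count)."""
    return (w_old * h_old) / (w_new * h_new)

# 512x640 -> 256x320: roughly 4x less activation memory
print(pixel_ratio(512, 640, 256, 320))  # → 4.0
```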
-----------------------


Of course, you can also run the computation on the CPU, though it is extremely slow. Below is a screenshot of my CPU run:

[Figure: Caffe training log running in CPU mode]

As the timestamps in the figure show, the CPU run is painfully slow.
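For reference, CPU mode is selected in the solver prototxt (or with `caffe.set_mode_cpu()` from the Python interface); a minimal sketch:

```protobuf
# solver.prototxt -- relevant line only
solver_mode: CPU   # default is GPU; CPU avoids the CUDA out-of-memory error but is far slower
```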