Caffe: cudaSuccess (2 vs. 0) out of memory
2016-11-25 11:07
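Problem description:
I am trying to train a network in Caffe. My images are 512x640 and the batch size is 1. I am trying to implement FCN-8s.
I am currently running on an Amazon EC2 instance (g2.2xlarge) with 4GB of GPU memory. But when I run the solver, it immediately throws an error.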
Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)
The error you get is indeed out of memory, but it is GPU memory that has run out, not RAM (note that the error comes from CUDA). Usually, when Caffe runs out of memory, the first thing to do is reduce the batch size (at the cost of gradient accuracy), but since you are already at batch size = 1... Are you sure the batch size is 1 for both the TRAIN and TEST phases?
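One way to double-check the batch size in both phases is to scan the network definition for every batch_size entry. The snippet below is only a minimal sketch: the helper name find_batch_sizes and the example prototxt text are hypothetical and not taken from the original question, and a plain regex will also match entries inside commented-out lines. Layers that do not use a batch_size field (for example custom Python data layers) will not show up at all.

```python
import re


def find_batch_sizes(prototxt_text):
    """Return every integer that follows a 'batch_size:' key in the text."""
    return [int(m) for m in re.findall(r"\bbatch_size\s*:\s*(\d+)", prototxt_text)]


# Hypothetical network definition, used only to illustrate the check.
example = """
layer { name: "data" type: "Data" include { phase: TRAIN } data_param { batch_size: 1 } }
layer { name: "data" type: "Data" include { phase: TEST } data_param { batch_size: 4 } }
"""

# A TEST-phase value larger than 1 would be easy to overlook by eye.
print(find_batch_sizes(example))
```

In practice you would read the text from your own train/val prototxt file instead of the inline string.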
I guessed so. And yes, the batch size is 1 for both the train and test phases. I think I have to resize the training images to something smaller and try again. But why does 4GB of GPU memory turn out to be too little? It says "The total number of bytes read was 537399810", which is much smaller than 4GB. – Abhilash Panigrahi Nov 19 '15 at 8:11
@AbhilashPanigrahi is it possible that some other processes are using the GPU at the same time? Try running nvidia-smi on the command line to see what is going on on your GPU. – Shai Nov 19 '15 at 8:18
I did. No other process is running apart from this one (which quits automatically after a few seconds because of the error). – Abhilash Panigrahi Nov 19 '15 at 8:21
I just reduced the image and label size to about 256x320. It runs successfully. I saw that it uses around 3.75 GB of GPU memory. Thanks for the help. – Abhilash Panigrahi Nov 19 '15 at 8:47
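A rough back-of-the-envelope estimate helps explain why 4GB can be too little even at batch size 1: GPU memory during training is dominated by the intermediate feature maps, not by the number of bytes read from disk (that figure most likely refers to the serialized weights being parsed). The sketch below assumes float32 values and that each blob keeps both its data and its gradient (diff); the 64-channel full-resolution feature map is an illustrative assumption, and the extra padding used by FCN-8s is ignored.

```python
def blob_bytes(num, channels, height, width, bytes_per_value=4, with_diff=True):
    """Approximate bytes for one blob: data plus, optionally, its gradient."""
    count = num * channels * height * width
    copies = 2 if with_diff else 1
    return count * bytes_per_value * copies


# One assumed 64-channel feature map at the two input resolutions discussed.
full = blob_bytes(1, 64, 512, 640)
half = blob_bytes(1, 64, 256, 320)

# Halving both height and width cuts each feature map to a quarter.
print(full / 2**20, "MiB at 512x640")
print(half / 2**20, "MiB at 256x320")
print(full // half, "x reduction")
```

A real network has many such blobs, so the total is far larger than a single feature map, but the same factor of roughly four applies to most of them when the input is halved in each dimension.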
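Of course, you can also run the computation on the CPU, but it is extremely slow. Below is a screenshot of my CPU run:
As the timings in the screenshot show, the CPU run is very slow...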