[Problem solved] Very low Train-accuracy when training on the MNIST dataset with mxnet
2017-01-22 17:23
Update, 2017.2.7
After I asked on GitHub, someone solved my problem. For details, see
[Issue #4900] low train-accuracy when train mnist with gpus
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The original question follows:
On dl4, train_mnist.py runs normally, and Train-accuracy climbs above 0.9 within the first epoch:
[hx@dl4 image-classification]$ python train_mnist.py --gpus 0
INFO:root:start with arguments Namespace(batch_size=64, disp_batches=100, gpus='0', kv_store='device', load_epoch=None, lr=0.1, lr_factor=0.1, lr_step_epochs='10', model_prefix=None, mom=0.9, network='mlp', num_classes=10, num_epochs=20, num_examples=60000, num_layers=None, optimizer='sgd', test_io=0, top_k=0, wd=0.0001)
INFO:root:Start training with [gpu(0)]
INFO:root:Epoch[0] Batch [100] Speed: 37082.51 samples/sec Train-accuracy=0.804531
INFO:root:Epoch[0] Batch [200] Speed: 44443.18 samples/sec Train-accuracy=0.906875
INFO:root:Epoch[0] Batch [300] Speed: 43431.66 samples/sec Train-accuracy=0.921562
INFO:root:Epoch[0] Batch [400] Speed: 43475.55 samples/sec Train-accuracy=0.935000
INFO:root:Epoch[0] Batch [500] Speed: 50749.41 samples/sec Train-accuracy=0.932187
INFO:root:Epoch[0] Batch [600] Speed: 52282.47 samples/sec Train-accuracy=0.947344
INFO:root:Epoch[0] Batch [700] Speed: 52977.93 samples/sec Train-accuracy=0.947812
INFO:root:Epoch[0] Batch [800] Speed: 53146.91 samples/sec Train-accuracy=0.945000
INFO:root:Epoch[0] Batch [900] Speed: 53487.58 samples/sec Train-accuracy=0.953906
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=1.518
INFO:root:Epoch[0] Validation-accuracy=0.950338
INFO:root:Epoch[1] Batch [100] Speed: 54243.19 samples/sec Train-accuracy=0.955781
INFO:root:Epoch[1] Batch [200] Speed: 53883.88 samples/sec Train-accuracy=0.957344
INFO:root:Epoch[1] Batch [300] Speed: 53730.51 samples/sec Train-accuracy=0.959063
INFO:root:Epoch[1] Batch [400] Speed: 53162.27 samples/sec Train-accuracy=0.966094
INFO:root:Epoch[1] Batch [500] Speed: 53799.10 samples/sec Train-accuracy=0.959063
INFO:root:Epoch[1] Batch [600] Speed: 54203.21 samples/sec Train-accuracy=0.963906
INFO:root:Epoch[1] Batch [700] Speed: 55385.43 samples/sec Train-accuracy=0.961406
INFO:root:Epoch[1] Batch [800] Speed: 55597.99 samples/sec Train-accuracy=0.962812
INFO:root:Epoch[1] Batch [900] Speed: 55328.35 samples/sec Train-accuracy=0.969688
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=1.106
INFO:root:Epoch[1] Validation-accuracy=0.961186
INFO:root:Epoch[2] Batch [100] Speed: 53831.25 samples/sec Train-accuracy=0.967031
INFO:root:Epoch[2] Batch [200] Speed: 55773.00 samples/sec Train-accuracy=0.971875
INFO:root:Epoch[2] Batch [300] Speed: 56175.44 samples/sec Train-accuracy=0.967656
INFO:root:Epoch[2] Batch [400] Speed: 55856.68 samples/sec Train-accuracy=0.971719
INFO:root:Epoch[2] Batch [500] Speed: 55476.08 samples/sec Train-accuracy=0.966250
INFO:root:Epoch[2] Batch [600] Speed: 55452.59 samples/sec Train-accuracy=0.969219
INFO:root:Epoch[2] Batch [700] Speed: 55739.42 samples/sec Train-accuracy=0.969063
INFO:root:Epoch[2] Batch [800] Speed: 55773.46 samples/sec Train-accuracy=0.968437
INFO:root:Epoch[2] Batch [900] Speed: 55457.29 samples/sec Train-accuracy=0.972344
INFO:root:Epoch[2] Resetting Data Iterator
INFO:root:Epoch[2] Time cost=1.084
On another machine, however, the Train-accuracy when training MNIST is very low and stays near 0.1:
[root@box image-classification]# python train_mnist.py --gpus 0
INFO:root:start with arguments Namespace(batch_size=64, disp_batches=100, gpus='0', kv_store='device', load_epoch=None, lr=0.1, lr_factor=0.1, lr_step_epochs='10', model_prefix=None, mom=0.9, network='mlp', num_classes=10, num_epochs=50, num_examples=60000, num_layers=None, optimizer='sgd', test_io=0, top_k=0, wd=0.0001)
INFO:root:Start training with [gpu(0)]
INFO:root:Epoch[0] Batch [100] Speed: 32293.73 samples/sec Train-accuracy=0.103594
INFO:root:Epoch[0] Batch [200] Speed: 35961.76 samples/sec Train-accuracy=0.096094
INFO:root:Epoch[0] Batch [300] Speed: 39744.90 samples/sec Train-accuracy=0.098125
INFO:root:Epoch[0] Batch [400] Speed: 34682.12 samples/sec Train-accuracy=0.092969
INFO:root:Epoch[0] Batch [500] Speed: 38506.46 samples/sec Train-accuracy=0.095312
INFO:root:Epoch[0] Batch [600] Speed: 42484.79 samples/sec Train-accuracy=0.106250
INFO:root:Epoch[0] Batch [700] Speed: 37907.76 samples/sec Train-accuracy=0.097969
INFO:root:Epoch[0] Batch [800] Speed: 36108.62 samples/sec Train-accuracy=0.096562
INFO:root:Epoch[0] Batch [900] Speed: 40969.94 samples/sec Train-accuracy=0.104063
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=2.638
INFO:root:Epoch[0] Validation-accuracy=0.098029
INFO:root:Epoch[1] Batch [100] Speed: 35087.31 samples/sec Train-accuracy=0.103594
INFO:root:Epoch[1] Batch [200] Speed: 42854.64 samples/sec Train-accuracy=0.096094
INFO:root:Epoch[1] Batch [300] Speed: 38468.49 samples/sec Train-accuracy=0.098125
INFO:root:Epoch[1] Batch [400] Speed: 41700.91 samples/sec Train-accuracy=0.092969
INFO:root:Epoch[1] Batch [500] Speed: 43624.63 samples/sec Train-accuracy=0.095312
INFO:root:Epoch[1] Batch [600] Speed: 43745.13 samples/sec Train-accuracy=0.106250
INFO:root:Epoch[1] Batch [700] Speed: 43572.09 samples/sec Train-accuracy=0.097969
INFO:root:Epoch[1] Batch [800] Speed: 38464.52 samples/sec Train-accuracy=0.096562
INFO:root:Epoch[1] Batch [900] Speed: 37482.35 samples/sec Train-accuracy=0.104063
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=1.489
INFO:root:Epoch[1] Validation-accuracy=0.098029
INFO:root:Epoch[2] Batch [100] Speed: 38364.69 samples/sec Train-accuracy=0.103594
INFO:root:Epoch[2] Batch [200] Speed: 39203.17 samples/sec Train-accuracy=0.096094
INFO:root:Epoch[2] Batch [300] Speed: 42395.08 samples/sec Train-accuracy=0.098125
INFO:root:Epoch[2] Batch [400] Speed: 39442.16 samples/sec Train-accuracy=0.092969
INFO:root:Epoch[2] Batch [500] Speed: 37321.06 samples/sec Train-accuracy=0.095312
INFO:root:Epoch[2] Batch [600] Speed: 40335.30 samples/sec Train-accuracy=0.106250
INFO:root:Epoch[2] Batch [700] Speed: 43203.50 samples/sec Train-accuracy=0.097969
INFO:root:Epoch[2] Batch [800] Speed: 37934.22 samples/sec Train-accuracy=0.096562
INFO:root:Epoch[2] Batch [900] Speed: 39594.41 samples/sec Train-accuracy=0.104063
INFO:root:Epoch[2] Resetting Data Iterator
INFO:root:Epoch[2] Time cost=1.511
INFO:root:Epoch[2] Validation-accuracy=0.098029
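Two things stand out in the second log: with num_classes=10, an accuracy near 0.1 is what random guessing would give, and the per-batch Train-accuracy values are identical from one epoch to the next, as if the weights never change. Below is a minimal sketch for checking both symptoms on a saved log. The names parse_train_accuracy and diagnose, and the tolerance of 0.02, are hypothetical choices made for this illustration and are not part of MXNet; the sketch only reports the symptoms and says nothing about their cause.

```python
import re
from collections import defaultdict

# Matches entries such as:
#   INFO:root:Epoch[0] Batch [100] Speed: 32293.73 samples/sec Train-accuracy=0.103594
# \s+ also matches line breaks, so an entry wrapped across two lines still parses.
LOG_PATTERN = re.compile(
    r"Epoch\[(\d+)\]\s+Batch\s+\[(\d+)\]\s+Speed:\s+[\d.]+\s+samples/sec\s+"
    r"Train-accuracy=([\d.]+)"
)


def parse_train_accuracy(log_text):
    """Return {epoch: [accuracy, ...]} with accuracies in batch order."""
    per_epoch = defaultdict(list)
    for epoch, _batch, acc in LOG_PATTERN.findall(log_text):
        per_epoch[int(epoch)].append(float(acc))
    return dict(per_epoch)


def diagnose(per_epoch, num_classes=10, tolerance=0.02):
    """Flag chance-level accuracy and accuracies that repeat every epoch."""
    chance = 1.0 / num_classes
    values = [a for accs in per_epoch.values() for a in accs]
    # True only if there is at least one value and all of them sit near chance.
    at_chance = bool(values) and all(abs(a - chance) <= tolerance for a in values)
    epochs = sorted(per_epoch)
    # True only if there are at least two epochs with identical accuracy lists.
    repeats = len(epochs) > 1 and all(
        per_epoch[e] == per_epoch[epochs[0]] for e in epochs[1:]
    )
    return {"at_chance": at_chance, "repeats_each_epoch": repeats}


# A few entries copied from the second log above.
SAMPLE = (
    "INFO:root:Epoch[0] Batch [100] Speed: 32293.73 samples/sec Train-accuracy=0.103594\n"
    "INFO:root:Epoch[0] Batch [200] Speed: 35961.76 samples/sec Train-accuracy=0.096094\n"
    "INFO:root:Epoch[1] Batch [100] Speed: 35087.31 samples/sec Train-accuracy=0.103594\n"
    "INFO:root:Epoch[1] Batch [200] Speed: 42854.64 samples/sec Train-accuracy=0.096094\n"
)

print(diagnose(parse_train_accuracy(SAMPLE)))
# → {'at_chance': True, 'repeats_each_epoch': True}
```

Running the same check on the dl4 log should report False for both flags, since its accuracies are far from 0.1 and change between epochs.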
The cause is unknown.
Still to be resolved...