
[Troubleshooting] Very low Train-accuracy when training the MNIST dataset with mxnet

2017-01-22 17:23
Updated 2017-02-07

After I asked on GitHub, someone resolved my problem. For details, see
[Issue #4900] low train-accuracy when train mnist with gpus

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The original problem description follows:

Running train_mnist.py on the machine dl4 works normally: Train-accuracy passes 0.9 within the first 200 batches and keeps rising.

[hx@dl4 image-classification]$  python train_mnist.py --gpus 0
INFO:root:start with arguments Namespace(batch_size=64, disp_batches=100, gpus='0', kv_store='device', load_epoch=None, lr=0.1, lr_factor=0.1, lr_step_epochs='10', model_prefix=None, mom=0.9, network='mlp', num_classes=10, num_epochs=20, num_examples=60000, num_layers=None, optimizer='sgd', test_io=0, top_k=0, wd=0.0001)
INFO:root:Start training with [gpu(0)]
INFO:root:Epoch[0] Batch [100]  Speed: 37082.51 samples/sec     Train-accuracy=0.804531
INFO:root:Epoch[0] Batch [200]  Speed: 44443.18 samples/sec     Train-accuracy=0.906875
INFO:root:Epoch[0] Batch [300]  Speed: 43431.66 samples/sec     Train-accuracy=0.921562
INFO:root:Epoch[0] Batch [400]  Speed: 43475.55 samples/sec     Train-accuracy=0.935000
INFO:root:Epoch[0] Batch [500]  Speed: 50749.41 samples/sec     Train-accuracy=0.932187
INFO:root:Epoch[0] Batch [600]  Speed: 52282.47 samples/sec     Train-accuracy=0.947344
INFO:root:Epoch[0] Batch [700]  Speed: 52977.93 samples/sec     Train-accuracy=0.947812
INFO:root:Epoch[0] Batch [800]  Speed: 53146.91 samples/sec     Train-accuracy=0.945000
INFO:root:Epoch[0] Batch [900]  Speed: 53487.58 samples/sec     Train-accuracy=0.953906
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=1.518
INFO:root:Epoch[0] Validation-accuracy=0.950338
INFO:root:Epoch[1] Batch [100]  Speed: 54243.19 samples/sec     Train-accuracy=0.955781
INFO:root:Epoch[1] Batch [200]  Speed: 53883.88 samples/sec     Train-accuracy=0.957344
INFO:root:Epoch[1] Batch [300]  Speed: 53730.51 samples/sec     Train-accuracy=0.959063
INFO:root:Epoch[1] Batch [400]  Speed: 53162.27 samples/sec     Train-accuracy=0.966094
INFO:root:Epoch[1] Batch [500]  Speed: 53799.10 samples/sec     Train-accuracy=0.959063
INFO:root:Epoch[1] Batch [600]  Speed: 54203.21 samples/sec     Train-accuracy=0.963906
INFO:root:Epoch[1] Batch [700]  Speed: 55385.43 samples/sec     Train-accuracy=0.961406
INFO:root:Epoch[1] Batch [800]  Speed: 55597.99 samples/sec     Train-accuracy=0.962812
INFO:root:Epoch[1] Batch [900]  Speed: 55328.35 samples/sec     Train-accuracy=0.969688
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=1.106
INFO:root:Epoch[1] Validation-accuracy=0.961186
INFO:root:Epoch[2] Batch [100]  Speed: 53831.25 samples/sec     Train-accuracy=0.967031
INFO:root:Epoch[2] Batch [200]  Speed: 55773.00 samples/sec     Train-accuracy=0.971875
INFO:root:Epoch[2] Batch [300]  Speed: 56175.44 samples/sec     Train-accuracy=0.967656
INFO:root:Epoch[2] Batch [400]  Speed: 55856.68 samples/sec     Train-accuracy=0.971719
INFO:root:Epoch[2] Batch [500]  Speed: 55476.08 samples/sec     Train-accuracy=0.966250
INFO:root:Epoch[2] Batch [600]  Speed: 55452.59 samples/sec     Train-accuracy=0.969219
INFO:root:Epoch[2] Batch [700]  Speed: 55739.42 samples/sec     Train-accuracy=0.969063
INFO:root:Epoch[2] Batch [800]  Speed: 55773.46 samples/sec     Train-accuracy=0.968437
INFO:root:Epoch[2] Batch [900]  Speed: 55457.29 samples/sec     Train-accuracy=0.972344
INFO:root:Epoch[2] Resetting Data Iterator
INFO:root:Epoch[2] Time cost=1.084


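On another machine, however, the same command with the same hyperparameters (only num_epochs differs) gives a very low Train-accuracy when training MNIST. It stays around 0.1, which is chance level for 10 classes.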

[root@box image-classification]# python train_mnist.py --gpus 0
INFO:root:start with arguments Namespace(batch_size=64, disp_batches=100, gpus='0', kv_store='device', load_epoch=None, lr=0.1, lr_factor=0.1, lr_step_epochs='10', model_prefix=None, mom=0.9, network='mlp', num_classes=10, num_epochs=50, num_examples=60000, num_layers=None, optimizer='sgd', test_io=0, top_k=0, wd=0.0001)
INFO:root:Start training with [gpu(0)]
INFO:root:Epoch[0] Batch [100]  Speed: 32293.73 samples/sec     Train-accuracy=0.103594
INFO:root:Epoch[0] Batch [200]  Speed: 35961.76 samples/sec     Train-accuracy=0.096094
INFO:root:Epoch[0] Batch [300]  Speed: 39744.90 samples/sec     Train-accuracy=0.098125
INFO:root:Epoch[0] Batch [400]  Speed: 34682.12 samples/sec     Train-accuracy=0.092969
INFO:root:Epoch[0] Batch [500]  Speed: 38506.46 samples/sec     Train-accuracy=0.095312
INFO:root:Epoch[0] Batch [600]  Speed: 42484.79 samples/sec     Train-accuracy=0.106250
INFO:root:Epoch[0] Batch [700]  Speed: 37907.76 samples/sec     Train-accuracy=0.097969
INFO:root:Epoch[0] Batch [800]  Speed: 36108.62 samples/sec     Train-accuracy=0.096562
INFO:root:Epoch[0] Batch [900]  Speed: 40969.94 samples/sec     Train-accuracy=0.104063
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=2.638
INFO:root:Epoch[0] Validation-accuracy=0.098029
INFO:root:Epoch[1] Batch [100]  Speed: 35087.31 samples/sec     Train-accuracy=0.103594
INFO:root:Epoch[1] Batch [200]  Speed: 42854.64 samples/sec     Train-accuracy=0.096094
INFO:root:Epoch[1] Batch [300]  Speed: 38468.49 samples/sec     Train-accuracy=0.098125
INFO:root:Epoch[1] Batch [400]  Speed: 41700.91 samples/sec     Train-accuracy=0.092969
INFO:root:Epoch[1] Batch [500]  Speed: 43624.63 samples/sec     Train-accuracy=0.095312
INFO:root:Epoch[1] Batch [600]  Speed: 43745.13 samples/sec     Train-accuracy=0.106250
INFO:root:Epoch[1] Batch [700]  Speed: 43572.09 samples/sec     Train-accuracy=0.097969
INFO:root:Epoch[1] Batch [800]  Speed: 38464.52 samples/sec     Train-accuracy=0.096562
INFO:root:Epoch[1] Batch [900]  Speed: 37482.35 samples/sec     Train-accuracy=0.104063
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=1.489
INFO:root:Epoch[1] Validation-accuracy=0.098029
INFO:root:Epoch[2] Batch [100]  Speed: 38364.69 samples/sec     Train-accuracy=0.103594
INFO:root:Epoch[2] Batch [200]  Speed: 39203.17 samples/sec     Train-accuracy=0.096094
INFO:root:Epoch[2] Batch [300]  Speed: 42395.08 samples/sec     Train-accuracy=0.098125
INFO:root:Epoch[2] Batch [400]  Speed: 39442.16 samples/sec     Train-accuracy=0.092969
INFO:root:Epoch[2] Batch [500]  Speed: 37321.06 samples/sec     Train-accuracy=0.095312
INFO:root:Epoch[2] Batch [600]  Speed: 40335.30 samples/sec     Train-accuracy=0.106250
INFO:root:Epoch[2] Batch [700]  Speed: 43203.50 samples/sec     Train-accuracy=0.097969
INFO:root:Epoch[2] Batch [800]  Speed: 37934.22 samples/sec     Train-accuracy=0.096562
INFO:root:Epoch[2] Batch [900]  Speed: 39594.41 samples/sec     Train-accuracy=0.104063
INFO:root:Epoch[2] Resetting Data Iterator
INFO:root:Epoch[2] Time cost=1.511
INFO:root:Epoch[2] Validation-accuracy=0.098029

Cause unknown.

To be resolved...
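
The two symptoms visible in the second log, accuracy pinned near 1/num_classes and identical per-batch values in every epoch, can be checked mechanically. The following is only a minimal sketch for flagging a run like this from its console output. It assumes the log line format shown above; the names `parse_train_accuracy` and `diagnose` and the `tolerance` value are hypothetical choices for illustration and are not part of mxnet. It reports the symptom only and says nothing about the root cause.

```python
import re
from collections import defaultdict

# Matches lines such as:
# INFO:root:Epoch[0] Batch [100]  Speed: 32293.73 samples/sec     Train-accuracy=0.103594
TRAIN_RE = re.compile(r"Epoch\[(\d+)\] Batch \[(\d+)\].*Train-accuracy=([0-9.]+)")


def parse_train_accuracy(log_text):
    """Return {epoch: [(batch, accuracy), ...]} parsed from the log text."""
    per_epoch = defaultdict(list)
    for line in log_text.splitlines():
        match = TRAIN_RE.search(line)
        if match:
            epoch, batch, acc = match.groups()
            per_epoch[int(epoch)].append((int(batch), float(acc)))
    return dict(per_epoch)


def diagnose(per_epoch, num_classes=10, tolerance=0.02):
    """Flag two symptoms: chance-level accuracy and identical epochs.

    tolerance is an assumed threshold, not a value taken from mxnet.
    """
    epochs = sorted(per_epoch)
    chance = 1.0 / num_classes
    accuracies = [acc for e in epochs for _, acc in per_epoch[e]]
    # Every logged accuracy is within `tolerance` of random guessing.
    at_chance = bool(accuracies) and all(
        abs(acc - chance) <= tolerance for acc in accuracies
    )
    # Every epoch logs exactly the same (batch, accuracy) pairs as the first.
    repeats = len(epochs) > 1 and all(
        per_epoch[e] == per_epoch[epochs[0]] for e in epochs[1:]
    )
    return {"at_chance": at_chance, "repeats": repeats}


# Short excerpts copied from the two logs in this post.
GOOD_LOG = """\
INFO:root:Epoch[0] Batch [100]  Speed: 37082.51 samples/sec     Train-accuracy=0.804531
INFO:root:Epoch[0] Batch [200]  Speed: 44443.18 samples/sec     Train-accuracy=0.906875
INFO:root:Epoch[1] Batch [100]  Speed: 54243.19 samples/sec     Train-accuracy=0.955781
INFO:root:Epoch[1] Batch [200]  Speed: 53883.88 samples/sec     Train-accuracy=0.957344
"""

BAD_LOG = """\
INFO:root:Epoch[0] Batch [100]  Speed: 32293.73 samples/sec     Train-accuracy=0.103594
INFO:root:Epoch[0] Batch [200]  Speed: 35961.76 samples/sec     Train-accuracy=0.096094
INFO:root:Epoch[1] Batch [100]  Speed: 35087.31 samples/sec     Train-accuracy=0.103594
INFO:root:Epoch[1] Batch [200]  Speed: 42854.64 samples/sec     Train-accuracy=0.096094
"""

if __name__ == "__main__":
    print("dl4:", diagnose(parse_train_accuracy(GOOD_LOG)))
    print("box:", diagnose(parse_train_accuracy(BAD_LOG)))
```

To check a full run, redirect the training output to a file and pass its contents to `parse_train_accuracy`. If both flags are set, the model is probably not learning at all between epochs, which is a different failure from slow or noisy convergence.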