
Running the fpn.pytorch source code from GitHub: problems encountered and fixes

2019-04-05 14:13

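Today I tried running the PyTorch implementation of FPN. Below are the problems (pitfalls) I ran into and how I solved them. The author apparently no longer maintains this code, many issues have gone unanswered, and I hit most of the problems reported there myself, so the code does not run cleanly straight after downloading it. Still, the author says this FPN was adapted from the code in his other repo, faster rcnn, which is quite mature, so I'm inclined to trust him!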

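Some of these fixes come from discussions in the issues and some are my own changes. Whenever I got stuck, I compared against the code at the same place in the faster rcnn repo, which worked very well! With the changes below, training now runs; I'm waiting for the results.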

FPN PyTorch source used in this post: https://github.com/jwyang/fpn.pytorch

(Faster R-CNN source used as a reference: https://github.com/jwyang/faster-rcnn.pytorch)

Compilation

In make.sh, change -arch=sm_52 to the value that matches your own GPU's compute capability. I use a TITAN X, so I changed it to sm_61.
Build environment: Python 2 and PyTorch 0.4.

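In the issues, people using torch 0.4.1 or torch 1.0 reported that the code would not compile, and when I compiled with Python 3 myself I also hit some print-related problems. I recommend building with the versions above.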
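If you are not sure which sm_XX value your card needs, PyTorch can report its compute capability. The sketch below is a hypothetical helper (arch_flag and detect_arch_flag are not part of the repo); it assumes a CUDA-enabled PyTorch build and that your nvcc version supports the resulting architecture.

```python
# Hypothetical helper (not part of fpn.pytorch): derive the nvcc -arch value
# for make.sh from the GPU's compute capability.


def arch_flag(major, minor):
    """Return an nvcc arch string such as 'sm_61' for compute capability 6.1."""
    return "sm_%d%d" % (major, minor)


def detect_arch_flag(device=0):
    """Query a GPU through PyTorch; returns None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # get_device_capability returns a (major, minor) tuple, e.g. (6, 1).
    major, minor = torch.cuda.get_device_capability(device)
    return arch_flag(major, minor)


if __name__ == "__main__":
    flag = detect_arch_flag()
    print(flag if flag else "No CUDA device found")
```

For a compute capability 6.1 card this prints sm_61, which is the value to put after -arch= in make.sh.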

output_dir: error about the model output location

Fix: in trainval.py, replace the author's absolute directory path with your own path.
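One way to avoid hard-coding any absolute path is to build the output directory from a relative save directory and create it on demand. This is a sketch under an assumption: the save_dir/net/dataset layout and the make_output_dir helper are hypothetical, not taken from trainval.py.

```python
import os


def make_output_dir(save_dir, net, dataset):
    """Hypothetical helper: build <save_dir>/<net>/<dataset> and create it if missing."""
    output_dir = os.path.join(save_dir, net, dataset)
    # os.makedirs(..., exist_ok=True) is Python 3 only; this check also works on Python 2.
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    return output_dir
```

For example, make_output_dir("models", "res101", "pascal_voc") would create models/res101/pascal_voc relative to the directory you launch training from.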

No such file or directory: log

These lines raise the error:

```python
logging.basicConfig(filename="logs/" + args.net + "_" + args.dataset + "_" + str(args.session) + ".log",
                    filemode='w', level=logging.DEBUG)
logging.info(str(datetime.now()))
```

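Fix: create a logs folder in the project's top-level directory (the code writes to logs/, so the folder must be named logs).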
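Alternatively, the script can create the folder itself before configuring logging. A minimal sketch: setup_logging is a hypothetical wrapper around the logging lines quoted above, keeping the same file-name pattern.

```python
import logging
import os
from datetime import datetime


def setup_logging(net, dataset, session, log_dir="logs"):
    """Create log_dir if needed, then log to <log_dir>/<net>_<dataset>_<session>.log."""
    if not os.path.isdir(log_dir):
        os.makedirs(log_dir)
    log_file = os.path.join(log_dir, net + "_" + dataset + "_" + str(session) + ".log")
    logging.basicConfig(filename=log_file, filemode="w", level=logging.DEBUG)
    logging.info(str(datetime.now()))
    return log_file
```

Note that logging.basicConfig only takes effect the first time it configures the root logger, so call this once, near the start of the script.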

softmax needs to be changed

Specify dim in softmax function

In both fpn.py and rpn_fpn.py, pass the dimension explicitly, i.e. use F.softmax(tensor, 1).
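Without an explicit dim, PyTorch has to guess the axis and warns about the implicit dimension choice. The toy scores below are made up for illustration; they show that dim=1 normalizes across the class scores of each row.

```python
import torch
import torch.nn.functional as F

# Toy classification scores: 4 proposals x 2 classes (background / foreground).
scores = torch.tensor([[2.0, 1.0],
                       [0.5, 0.5],
                       [-1.0, 3.0],
                       [0.0, 0.0]])

# dim=1 normalizes over the class axis, so each row (one proposal) sums to 1.
probs = F.softmax(scores, 1)
print(probs.sum(1))
```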

In trainval.py, switch to item(), and also change the single-GPU branch for rpn_loss_cls, rpn_loss_box and the other losses.

After the change:

```python
loss_temp += loss.item()

if args.mGPUs:
    loss_rpn_cls = rpn_loss_cls.mean().item()
    loss_rpn_box = rpn_loss_box.mean().item()
    loss_rcnn_cls = RCNN_loss_cls.mean().item()
    loss_rcnn_box = RCNN_loss_bbox.mean().item()
    fg_cnt = torch.sum(roi_labels.data.ne(0))
    bg_cnt = roi_labels.data.numel() - fg_cnt
else:
    loss_rpn_cls = rpn_loss_cls.item()
    loss_rpn_box = rpn_loss_box.item()
    loss_rcnn_cls = RCNN_loss_cls.item()
    loss_rcnn_box = RCNN_loss_bbox.item()
    fg_cnt = torch.sum(roi_labels.data.ne(0))
    bg_cnt = roi_labels.data.numel() - fg_cnt
```
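The reason for the two branches: a single-GPU loss is a 0-dim tensor, and .item() turns it into a Python number (the original code presumably used the pre-0.4 idiom loss.data[0], which PyTorch 0.4 deprecates for 0-dim tensors). With multiple GPUs, the gathered loss holds one value per GPU, so it is averaged first. The tensors below are toy values standing in for real losses.

```python
import torch

# Single GPU: a loss is a 0-dim tensor; .item() returns a Python float.
single_loss = torch.tensor(0.75)
print(single_loss.item())  # 0.75

# Multi-GPU: one loss per GPU ends up in a 1-D tensor, so reduce with .mean() first.
per_gpu_loss = torch.tensor([0.5, 1.0])
print(per_gpu_loss.mean().item())  # 0.75

# .item() on a tensor with more than one element fails (the exception type
# differs between PyTorch versions), which is why the mGPUs branch keeps .mean().
try:
    per_gpu_loss.item()
except (RuntimeError, ValueError) as e:
    print("item() failed: %s" % e)
```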

RuntimeError: reciprocal is not implemented for type torch.cuda.LongTensor

Change anchor_target_layer.py as follows (cast the count to float before taking the reciprocal):

```python
if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
    num_examples = torch.sum(labels[i] >= 0)
    num_examples = num_examples.float()
    positive_weights = 1.0 / num_examples
    negative_weights = 1.0 / num_examples
```
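torch.sum over a comparison returns an integer (LongTensor) count, and as the error message shows, PyTorch 0.4 computes 1.0 / tensor via reciprocal(), which is not implemented for integer CUDA tensors. Newer PyTorch versions promote this division to float, so the toy CPU example below will not reproduce the error; it only shows the dtypes involved and the effect of the .float() cast. The label values are made up.

```python
import torch

labels = torch.tensor([1, 0, -1, 1, 0, -1])  # toy labels: 1 = fg, 0 = bg, -1 = ignore

# Summing a comparison gives an integer count, not a float.
count = torch.sum(labels >= 0)
print(count.dtype)  # an integer dtype

# Casting to float first makes 1.0 / num_examples an ordinary float reciprocal.
num_examples = count.float()
positive_weights = 1.0 / num_examples
print(positive_weights.item())  # 0.25
```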

TypeError: index(): argument 'indices' (position 1) must be tuple of Tensors, not Tensor

In proposal_target_layer.py, follow proposal_target_layer_cascade.py from faster rcnn and wrap the argument of index() in an extra pair of parentheses so that it is a tuple. After the change:

```python
labels = gt_boxes[:, :, 4].contiguous().view(-1).index((offset.view(-1),)) \
    .view(batch_size, -1)
```
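index() expects a tuple of index tensors, which is why the trailing comma matters: (x,) is a one-element tuple, while (x) is just x. If you would rather not rely on the undocumented Tensor.index method, ordinary advanced indexing should perform the same lookup; treat that equivalence as an assumption and check it against your PyTorch version. The shapes and values below are toy data, not taken from the repo.

```python
import torch

batch_size, num_boxes = 2, 3
# Toy gt_boxes of shape (batch, num_boxes, 5) with the class label in column 4.
gt_boxes = torch.zeros(batch_size, num_boxes, 5)
gt_boxes[:, :, 4] = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

# Toy flat indices into the (batch * num_boxes) label vector.
offset = torch.tensor([[0, 2], [3, 5]])

flat_labels = gt_boxes[:, :, 4].contiguous().view(-1)
# Advanced indexing with a 1-D index tensor gathers the same entries.
labels = flat_labels[offset.view(-1)].view(batch_size, -1)
print(labels)  # values [[1, 3], [4, 6]]
```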

RCNN_roi_align ERROR when training

This happens because idx_l holds only one index for that roi_level; you can print idx_l to check. idx_l should be a list (a 1-D index tensor), but when only one RoI is assigned to a given level l, squeezing it leaves a 0-dim tensor and you get this error.
You can change the line https://github.com/jwyang/fpn.pytorch/blob/master/lib/model/fpn/fpn.py#L131 to:

```python
idx_l = (roi_level == l).nonzero()
if idx_l.shape[0] > 1:
    idx_l = idx_l.squeeze()
else:
    idx_l = idx_l.view(-1)
```
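To see why the fix is needed, the toy example below (the roi_level values are made up) compares squeeze() and view(-1) on the output of nonzero(), which has shape (k, 1) for a 1-D input. With exactly one match, squeeze() collapses it to a 0-dim tensor, while view(-1) always yields a 1-D index tensor.

```python
import torch

roi_level = torch.tensor([2, 3, 3, 4])  # toy FPN level assigned to each RoI

for l in (2, 3, 5):
    matches = (roi_level == l).nonzero()  # shape (k, 1)
    squeezed = matches.squeeze()          # k == 1 gives a 0-dim tensor
    flat = matches.view(-1)               # always 1-D, shape (k,)
    print("level %d: squeeze -> %s, view(-1) -> %s"
          % (l, tuple(squeezed.shape), tuple(flat.shape)))
```

Since view(-1) also gives the right shape when k > 1, a single idx_l = (roi_level == l).nonzero().view(-1) should behave the same as the if/else above.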

During training

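GPU memory usage:
I train on a single GPU, and memory usage fluctuates a lot, ranging from about 4 GB to 10 GB.
I haven't run into any other problems so far; I'll update this post if I do.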


