
Neural Networks | The DeepVO Framework: Towards End-to-End Visual Odometry with Deep Recurrent Convolutional Neural Networks


Author's GitHub: https://github.com/MichaelBeechan
Author's CSDN: https://blog.csdn.net/u011344545

DeepVO code: https://pan.baidu.com/s/1bSNuZaj0KouXAXlhM4XK_g (extraction code: wpzz)
Paper: http://www.cs.ox.ac.uk/files/9026/DeepVO.pdf

1. End-to-End VO via an RCNN (traditional VO vs. end-to-end VO)

2. Network Architecture and CNN Configuration



The CNN part has 9 convolutional layers. Every convolutional layer except Conv6 is followed by a ReLU activation layer, giving 17 layers in total.
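For concreteness, here is a minimal PyTorch sketch of this CNN. The channel, kernel, stride, and padding values follow the FlowNet-style configuration described above; the `DeepVOCNN`/`conv` names and the module layout are assumptions of this sketch, not the authors' released code:

```python
import torch.nn as nn

def conv(in_ch, out_ch, k, stride, relu=True):
    """One convolutional layer; every layer except Conv6 is followed by ReLU."""
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=stride, padding=k // 2)]
    if relu:
        layers.append(nn.ReLU(inplace=True))
    return layers

class DeepVOCNN(nn.Module):
    """FlowNet-style feature extractor over two stacked RGB frames (6 input channels)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            *conv(6, 64, 7, 2),                  # Conv1
            *conv(64, 128, 5, 2),                # Conv2
            *conv(128, 256, 5, 2),               # Conv3
            *conv(256, 256, 3, 1),               # Conv3_1
            *conv(256, 512, 3, 2),               # Conv4
            *conv(512, 512, 3, 1),               # Conv4_1
            *conv(512, 512, 3, 2),               # Conv5
            *conv(512, 512, 3, 1),               # Conv5_1
            *conv(512, 1024, 3, 2, relu=False),  # Conv6: no ReLU
        )

    def forward(self, x):
        # x: (batch, 6, H, W) — a pair of consecutive frames stacked along channels
        return self.features(x)
```

Counting modules, this gives the 9 convolutional layers plus 8 ReLU layers, i.e. the 17 layers mentioned above.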

3. LSTM

4. Theory

4.1 Sequence Modeling
An RNN differs from a CNN in that it maintains a memory of its hidden states over time and has feedback loops among them, so that its current hidden state is a function of the previous ones.
Given the convolutional feature x_k at time k, the RNN updates at time k by

h_k = H(W_{xh} x_k + W_{hh} h_{k-1} + b_h)
y_k = W_{hy} h_k + b_y

h_k and y_k are the hidden state and output at time k, respectively.
The W terms denote the corresponding weight matrices.
The b terms denote the bias vectors.
H is an element-wise nonlinear activation function.
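A minimal sketch of this update in PyTorch, treating features and states as row vectors; H is taken to be tanh here, which the formulation above leaves generic:

```python
import torch

def rnn_step(x_k, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One vanilla RNN update: hidden state from input and previous state, then output."""
    h_k = torch.tanh(x_k @ W_xh + h_prev @ W_hh + b_h)  # H taken to be tanh (assumption)
    y_k = h_k @ W_hy + b_y
    return h_k, y_k
```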
4.2 LSTM

The LSTM updates at time k as

i_k = σ(W_{xi} x_k + W_{hi} h_{k-1} + b_i)
f_k = σ(W_{xf} x_k + W_{hf} h_{k-1} + b_f)
g_k = tanh(W_{xg} x_k + W_{hg} h_{k-1} + b_g)
c_k = f_k ⊙ c_{k-1} + i_k ⊙ g_k
o_k = σ(W_{xo} x_k + W_{ho} h_{k-1} + b_o)
h_k = o_k ⊙ tanh(c_k)

⊙ is the element-wise product of two vectors.
σ is the sigmoid nonlinearity.
tanh is the hyperbolic tangent nonlinearity.
The W terms denote the corresponding weight matrices.
The b terms denote the bias vectors.
i_k, f_k, g_k, c_k and o_k are the input gate, forget gate, input modulation gate, memory cell and output gate at time k, respectively.
Each LSTM layer has 1000 hidden states.
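A minimal PyTorch sketch of this recurrent part; the `PoseRNN` name, the input feature size, and the final linear layer regressing a 6-DoF pose are assumptions of this sketch:

```python
import torch.nn as nn

class PoseRNN(nn.Module):
    """Two stacked LSTM layers with 1000 hidden units each, followed by a
    linear layer regressing a 6-DoF pose (3-D position + 3 Euler angles)."""
    def __init__(self, feature_dim=1024):  # feature_dim: flattened CNN feature size (assumed)
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_size=1000, num_layers=2, batch_first=True)
        self.fc = nn.Linear(1000, 6)

    def forward(self, feats, state=None):
        out, state = self.lstm(feats, state)  # feats: (batch, time, feature_dim)
        return self.fc(out), state            # poses: (batch, time, 6)
```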

5. Loss Function and Optimization

Given a sequence of input images X, the network outputs the conditional probability of the poses:

p(Y_t | X_t) = p(y_1, ..., y_t | x_1, ..., x_t)

Parameter optimization:

θ* = argmax_θ p(Y_t | X_t; θ)

The hyperparameters θ of the DNN are learned by minimizing the mean squared error (MSE) of all positions and orientations:

θ* = argmin_θ (1/N) Σ_{i=1}^{N} Σ_{k=1}^{t} ( ||p̂_k − p_k||²_2 + κ ||φ̂_k − φ_k||²_2 )

(p_k, φ_k) denotes the ground-truth pose.
(p̂_k, φ̂_k) denotes the estimated pose.
κ (set to 100 in the experiments) is a scale factor balancing the weights of positions and orientations.
N is the number of samples.
The orientation φ is represented by Euler angles rather than a quaternion, since a quaternion is subject to an extra unit constraint, which hinders the optimization problem of DL.
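A minimal sketch of this loss, assuming the predicted and ground-truth poses are packed as 6-D vectors with position in the first three channels and Euler angles in the last three:

```python
import torch.nn.functional as F

def deepvo_loss(pred, gt, kappa=100.0):
    """MSE of positions plus kappa-weighted MSE of Euler-angle orientations."""
    pos_loss = F.mse_loss(pred[..., :3], gt[..., :3])  # position channels (assumed first 3)
    ori_loss = F.mse_loss(pred[..., 3:], gt[..., 3:])  # Euler-angle channels (assumed last 3)
    return pos_loss + kappa * ori_loss
```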

6. Experimental Results





