
Deep Learning Reading List

2016-03-02 12:18

Reading List

List of reading lists and survey papers:


Books

Deep Learning, Yoshua Bengio, Ian Goodfellow, Aaron Courville, MIT Press, In preparation.


Review Papers

Representation Learning: A Review and New Perspectives, Yoshua Bengio, Aaron Courville, Pascal Vincent, arXiv, 2012.
The monograph or review paper Learning Deep Architectures for AI, Yoshua Bengio, Foundations and Trends in Machine Learning, 2009.
Deep Machine Learning – A New Frontier in Artificial Intelligence Research – a survey
paper by Itamar Arel, Derek C. Rose, and Thomas P. Karnowski.
Graves, A. (2012). Supervised sequence labelling with recurrent neural networks(Vol. 385). Springer.
Schmidhuber, J. (2014). Deep Learning in Neural Networks: An Overview. 75 pages, 850+ references, http://arxiv.org/abs/1404.7828. PDF, LaTeX source, and a complete public BibTeX file are available at http://www.idsia.ch/~juergen/deep-learning-overview.html.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep
learning.” Nature 521, no. 7553 (2015): 436-444.


Reinforcement Learning

Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. “Playing Atari with deep reinforcement learning.” arXiv preprint arXiv:1312.5602 (2013).

Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu. “Recurrent Models of Visual Attention.” arXiv e-print, 2014.
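
For orientation: the DQN paper above scales up the classical Q-learning rule by replacing the value table with a convolutional network. Below is a minimal tabular sketch of that underlying update; the state/action counts and the sample transition are illustrative, not from the paper.

```python
import numpy as np

# Tabular Q-learning: the Bellman-backup rule that DQN approximates
# with a deep network. All sizes here are made up for illustration.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99            # learning rate, discount factor

def q_update(s, a, r, s_next, done):
    """Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a))."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

# Example transition: state 0, action 2, reward 1, next state 5.
q_update(s=0, a=2, r=1.0, s_next=5, done=False)
```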


Computer Vision

ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky,
Ilya Sutskever, Geoffrey E Hinton, NIPS 2012.
Going Deeper with Convolutions, Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir
Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, 19-Sept-2014.
Learning Hierarchical Features for Scene Labeling, Clement Farabet, Camille
Couprie, Laurent Najman and Yann LeCun, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
Learning Convolutional Feature Hierarchies for Visual Recognition, Koray Kavukcuoglu, Pierre Sermanet, Y-Lan Boureau, Karol Gregor, Michaël Mathieu and Yann LeCun, Advances in Neural Information Processing Systems (NIPS 2010), 23, 2010.
Graves, Alex, et al. “A novel connectionist system for unconstrained
handwriting recognition.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 31.5 (2009): 855-868.
Cireşan, D. C., Meier, U., Gambardella, L. M., & Schmidhuber, J. (2010). Deep, big, simple neural nets for handwritten
digit recognition. Neural computation, 22(12), 3207-3220.
Ciresan, Dan, Ueli Meier, and Jürgen Schmidhuber. “Multi-column deep neural networks for image classification.” Computer
Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
Ciresan, D., Meier, U., Masci, J., & Schmidhuber, J. (2011, July). A committee of neural networks
for traffic sign classification. In Neural Networks (IJCNN), The 2011 International Joint Conference on (pp. 1918-1921). IEEE.
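
The convolutional papers in this section all rest on one core operation. Here is a naive NumPy sketch of a single “valid” convolution (strictly, cross-correlation) followed by a ReLU, purely for illustration; the edge filter and input are toy examples.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2-D cross-correlation: slide the kernel over the
    image and take dot products. image (H, W), kernel (kH, kW)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.empty((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

edge = np.array([[1.0, 0.0, -1.0]] * 3)                          # toy edge filter
feature_map = np.maximum(conv2d(np.random.rand(8, 8), edge), 0)  # conv + ReLU
```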


NLP and Speech

Joint Learning of Words and
Meaning Representations for Open-Text Semantic Parsing, Antoine Bordes, Xavier Glorot, Jason Weston and Yoshua Bengio (2012), in: Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS)
Dynamic
pooling and unfolding recursive autoencoders for paraphrase detection. Socher, R., Huang, E. H., Pennington, J., Ng, A. Y., and Manning, C. D. (2011a).  In NIPS’2011.
Semi-supervised recursive autoencoders for predicting sentiment distributions.
Socher, R., Pennington, J., Huang, E. H., Ng, A. Y., and Manning, C. D. (2011b).  In EMNLP’2011.
Mikolov Tomáš: Statistical Language Models based on Neural Networks. PhD thesis, Brno
University of Technology, 2012.
Graves, Alex, and Jürgen Schmidhuber. “Framewise
phoneme classification with bidirectional LSTM and other neural network architectures.” Neural Networks 18.5 (2005): 602-610.

Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. “Distributed
representations of words and phrases and their compositionality.” In Advances in Neural Information Processing Systems, pp. 3111-3119. 2013.

K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio. Learning
Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. EMNLP 2014.

Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. “Sequence to sequence learning with neural networks.” Advances in Neural Information Processing Systems. 2014.
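
As a concrete anchor for the word-vector papers above, here is a compressed sketch of one skip-gram-with-negative-sampling update in the spirit of Mikolov et al. (2013). The vocabulary size, dimensions, learning rate, and the uniform negative sampler are simplifying assumptions; the paper samples negatives from a smoothed unigram distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, lr = 1000, 50, 0.025          # vocab size, embedding dim, step size
W_in = rng.normal(0, 0.1, (V, d))   # center-word vectors
W_out = rng.normal(0, 0.1, (V, d))  # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, k=5):
    """One SGD step: pull the true (center, context) pair together,
    push k random 'negative' pairs apart."""
    targets = np.concatenate(([context], rng.integers(0, V, size=k)))
    labels = np.zeros(k + 1)
    labels[0] = 1.0                      # first target is the true context
    v = W_in[center]                     # (d,)
    u = W_out[targets]                   # (k+1, d)
    g = sigmoid(u @ v) - labels          # logistic-loss gradient per pair
    W_in[center] -= lr * (g @ u)
    W_out[targets] -= lr * np.outer(g, v)

sgns_step(center=3, context=17)
```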


Disentangling Factors and Variations with Depth

Goodfellow, Ian, et al. “Measuring invariances in deep networks.” Advances in neural information processing systems 22 (2009): 646-654.

Bengio, Yoshua, et al. “Better Mixing via Deep Representations.” arXiv preprint arXiv:1207.4404 (2012).

Xavier Glorot, Antoine Bordes and Yoshua Bengio, Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach, in: Proceedings of the Twenty-Eighth International Conference on Machine Learning (ICML’11), pages 97-110, 2011.


Transfer Learning and Domain Adaptation

Raina, Rajat, et al. “Self-taught learning: transfer learning from unlabeled data.” Proceedings of the 24th international conference on Machine learning. ACM, 2007.

Xavier Glorot, Antoine Bordes and Yoshua Bengio, Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach, in: Proceedings of the Twenty-Eighth International Conference on Machine Learning (ICML’11), pages 97-110, 2011.
R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu and P. Kuksa. Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, 12:2493-2537, 2011.

Mesnil, Grégoire, et al. “Unsupervised and transfer learning challenge: a deep learning approach.” Unsupervised and Transfer Learning Workshop, in conjunction with ICML. 2011.

Ciresan, D. C., Meier, U., & Schmidhuber, J. (2012, June). Transfer learning for Latin and
Chinese characters with deep neural networks. In Neural Networks (IJCNN), The 2012 International Joint Conference on (pp. 1-6). IEEE.
Goodfellow, Ian, Aaron Courville, and Yoshua Bengio. “Large-Scale Feature Learning With Spike-and-Slab Sparse
Coding.” ICML 2012.


Practical Tricks and Guides

Hinton, Geoffrey E., et al. “Improving neural networks by preventing co-adaptation of feature detectors.” arXiv preprint arXiv:1207.0580 (2012).
Practical recommendations for gradient-based training of deep architectures, Yoshua Bengio, U. Montreal,
arXiv report:1206.5533, Lecture Notes in Computer Science Volume 7700, Neural Networks: Tricks of the Trade Second Edition, Editors: Grégoire Montavon, Geneviève B. Orr, Klaus-Robert Müller, 2012.
A practical guide to training Restricted Boltzmann Machines, by Geoffrey Hinton.
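
The first entry above introduced what is now called dropout. A minimal sketch of the “inverted” variant, which rescales at training time so test-time code is unchanged (the original paper instead halves the weights at test time):

```python
import numpy as np

def dropout(h, p_drop=0.5, train=True, rng=np.random.default_rng()):
    """Inverted dropout: zero each unit with probability p_drop during
    training and rescale the survivors by 1/(1 - p_drop)."""
    if not train:
        return h                      # no-op at test time
    mask = (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)
    return h * mask

activations = np.ones((4, 8))
print(dropout(activations))           # about half zeroed, the rest scaled to 2.0
```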


Sparse Coding

Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Bruno Olshausen and David Field, Nature, 1996.
Kavukcuoglu, Koray, Marc’Aurelio Ranzato, and Yann LeCun. “Fast inference
in sparse coding algorithms with applications to object recognition.” arXiv preprint arXiv:1010.3467 (2010).
Goodfellow, Ian, Aaron Courville, and Yoshua Bengio. “Large-Scale Feature Learning With Spike-and-Slab Sparse
Coding.” ICML 2012.
Efficient sparse coding algorithms. Honglak Lee, Alexis Battle, Rajat Raina and Andrew Y. Ng. In NIPS 19, 2007.

Olshausen, Bruno A., and David J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?” Vision Research 37.23 (1997): 3311-3326.
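
The papers above revolve around the sparse coding objective min_z 0.5*||x - Dz||^2 + lam*||z||_1. One standard solver is ISTA (iterative shrinkage-thresholding); the fast-inference line of work above learns to approximate solutions like this one. A sketch with an arbitrary random dictionary:

```python
import numpy as np

def ista(x, D, lam=0.1, n_iter=100):
    """Iterative shrinkage-thresholding for
    min_z 0.5*||x - D z||^2 + lam*||z||_1, with the dictionary D fixed."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = z - (D.T @ (D @ z - x)) / L    # gradient step on the quadratic term
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return z

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 128))
D /= np.linalg.norm(D, axis=0)             # unit-norm dictionary atoms
codes = ista(rng.normal(size=64), D)       # most entries end up exactly zero
```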


Foundation Theory and Motivation

Hinton, Geoffrey E. “Deterministic Boltzmann learning performs steepest descent in weight-space.” Neural computation 1.1 (1989): 143-150.

Bengio, Yoshua, and Samy Bengio. “Modeling high-dimensional discrete data with multi-layer neural networks.” Advances in Neural Information Processing Systems 12 (2000): 400-406.

Bengio, Yoshua, et al. “Greedy layer-wise training of deep networks.” Advances in neural information processing systems 19 (2007): 153.

Bengio, Yoshua, Martin Monperrus, and Hugo Larochelle. “Nonlocal estimation of manifold structure.” Neural Computation 18.10 (2006): 2509-2528.

Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. “Reducing the dimensionality of data with neural networks.” Science 313.5786 (2006): 504-507.

Ranzato, Marc’Aurelio, Y-Lan Boureau, and Yann LeCun. “Sparse feature learning for deep belief networks.” Advances in neural information processing systems 20 (2007): 1185-1192.

Bengio, Yoshua, and Yann LeCun. “Scaling learning algorithms towards AI.” Large-Scale Kernel Machines 34 (2007).

Le Roux, Nicolas, and Yoshua Bengio. “Representational power of restricted boltzmann machines and deep belief networks.” Neural Computation 20.6 (2008): 1631-1649.

Sutskever, Ilya, and Geoffrey Hinton. “Temporal-Kernel Recurrent Neural Networks.” Neural Networks 23.2 (2010): 239-243.

Le Roux, Nicolas, and Yoshua Bengio. “Deep belief networks are compact universal approximators.” Neural computation 22.8 (2010): 2192-2207.

Bengio, Yoshua, and Olivier Delalleau. “On the expressive power of deep architectures.” Algorithmic Learning Theory. Springer Berlin/Heidelberg, 2011.

Montufar, Guido F., and Jason Morton. “When Does a Mixture of Products Contain a Product of Mixtures?” arXiv preprint arXiv:1206.0387 (2012).

Montúfar, Guido, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. “On the Number of Linear Regions of
Deep Neural Networks.” arXiv preprint arXiv:1402.1869 (2014).


Supervised Feedforward Neural Networks

The Manifold Tangent Classifier, Salah Rifai, Yann Dauphin, Pascal Vincent, Yoshua
Bengio and Xavier Muller, in: NIPS’2011.
Discriminative Learning of Sum-Product Networks, Gens, Robert, and Pedro Domingos, NIPS 2012 Best Student Paper.
Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013). Maxout networks. Technical Report, Université de Montréal.

Hinton, Geoffrey E., et al. “Improving neural networks by preventing co-adaptation
of feature detectors.” arXiv preprint arXiv:1207.0580 (2012).

Wang, Sida, and Christopher Manning. “Fast dropout training.” In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 118-126. 2013.

Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. “Deep Sparse Rectifier Neural Networks.” In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP Volume, vol. 15, pp. 315-323. 2011.

ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky,
Ilya Sutskever, Geoffrey E Hinton, NIPS 2012.
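
To make the maxout entry above concrete: a maxout unit computes the max over k affine “pieces” instead of applying a fixed nonlinearity. A tiny sketch with made-up sizes:

```python
import numpy as np

def maxout(x, W, b):
    """Maxout layer: each output unit is the max over k affine pieces.
    x: (d_in,), W: (k, d_in, d_out), b: (k, d_out) -> (d_out,)."""
    return np.max(np.einsum('i,kio->ko', x, W) + b, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=10)
W = 0.1 * rng.normal(size=(3, 10, 5))   # k=3 pieces, 10 inputs -> 5 units
b = np.zeros((3, 5))
print(maxout(x, W, b).shape)            # (5,)
```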


Large Scale Deep Learning

Building High-level Features Using Large Scale Unsupervised Learning, Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng, ICML 2012.
Bengio, Yoshua, et al. “Neural probabilistic language models.” Innovations in Machine Learning (2006): 137-186. Section 3 of this paper, in particular, discusses asynchronous SGD.

Dean, Jeffrey, et al. “Large scale distributed
deep networks.” Advances in Neural Information Processing Systems. 2012.
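
The Dean et al. paper’s Downpour SGD lets many workers update shared parameters without waiting for one another. Below is a toy lock-free sketch in that spirit; it is closer to “Hogwild”-style shared-memory updates than to the paper’s parameter-server setup, and the regression problem and all sizes are synthetic.

```python
import numpy as np
import threading

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))          # synthetic linear-regression data
true_w = rng.normal(size=10)
y = X @ true_w
w = np.zeros(10)                         # parameters shared by all workers
lr = 0.01

def worker(rows):
    """Apply per-sample gradient updates to the shared w without locks;
    occasional races between workers are tolerated."""
    global w
    for i in rows:
        grad = (X[i] @ w - y[i]) * X[i]  # squared-error gradient, one sample
        w -= lr * grad

threads = [threading.Thread(target=worker, args=(range(k, 1000, 4),))
           for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(np.linalg.norm(w - true_w))        # should be near zero
```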


Recurrent Networks

Training Recurrent Neural Networks, Ilya Sutskever, PhD Thesis, 2012.
Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. “Learning long-term dependencies
with gradient descent is difficult.” Neural Networks, IEEE Transactions on 5.2 (1994): 157-166.
Mikolov Tomáš: Statistical Language Models based on Neural Networks. PhD thesis, Brno
University of Technology, 2012.

Hochreiter, Sepp, and Jürgen Schmidhuber. “Long
short-term memory.” Neural computation 9.8 (1997): 1735-1780.

Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient
flow in recurrent nets: the difficulty of learning long-term dependencies.
Schmidhuber, J. (1992). Learning complex, extended sequences using the principle of history compression. Neural
Computation, 4(2), 234-242.
Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006, June). Connectionist
temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd international conference on Machine learning (pp. 369-376). ACM.
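
For reference while reading the LSTM papers above, here is one forward step of a standard LSTM cell in NumPy. The i/f/o/g gate packing and all sizes are conventions chosen here, not anything from a specific paper; the forget gate is a later addition (Gers et al.) to the 1997 cell.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step. x: (D,) input, h/c: (H,) hidden/cell state,
    W: (4H, D+H) packed gate weights, b: (4H,) biases."""
    H = h.shape[0]
    z = W @ np.concatenate([x, h]) + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate update
    c = f * c + i * g              # additive cell path eases long-term credit
    h = o * np.tanh(c)
    return h, c

D, H = 8, 16
rng = np.random.default_rng(0)
W, b = 0.1 * rng.normal(size=(4 * H, D + H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=D), h, c, W, b)
```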


Hyper Parameters

“Practical Bayesian Optimization of Machine Learning Algorithms”, Jasper Snoek,
Hugo Larochelle, Ryan Adams, NIPS 2012.
Random Search for Hyper-Parameter Optimization, James Bergstra and Yoshua Bengio (2012), in: Journal of Machine Learning Research, 13:281-305.
Algorithms for Hyper-Parameter Optimization, James Bergstra, Rémy Bardenet, Yoshua
Bengio and Balázs Kégl, in: NIPS’2011, 2011.
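
Bergstra and Bengio’s random search is short enough to sketch directly. The particular ranges and the log-uniform draws below are illustrative defaults, and ‘evaluate’ is a hypothetical stand-in for “train a model, return validation error”:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_config():
    """Draw one configuration; log-uniform ranges suit scale parameters."""
    return {
        "lr": 10 ** rng.uniform(-5, -1),        # learning rate
        "l2": 10 ** rng.uniform(-6, -2),        # weight decay
        "n_hidden": int(rng.integers(64, 1025)),
    }

def random_search(evaluate, n_trials=60):
    """Independently sample n_trials configs, keep the best one."""
    best_err, best_cfg = float("inf"), None
    for _ in range(n_trials):
        cfg = sample_config()
        err = evaluate(cfg)
        if err < best_err:
            best_err, best_cfg = err, cfg
    return best_err, best_cfg

# Hypothetical objective: pretend validation error is minimized at lr = 1e-3.
err, cfg = random_search(lambda c: abs(np.log10(c["lr"]) + 3))
```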


Optimization

Training Deep and Recurrent Neural Networks with Hessian-Free Optimization,
James Martens and Ilya Sutskever, Neural Networks: Tricks of the Trade, 2012.
Schaul, Tom, Sixin Zhang, and Yann LeCun. “No More Pesky Learning Rates.” arXiv preprint arXiv:1206.1106 (2012).
Le Roux, Nicolas, Pierre-Antoine Manzagol, and Yoshua Bengio. “Topmoumoute
online natural gradient algorithm.” Neural Information Processing Systems (NIPS). 2007.
Bordes, Antoine, Léon Bottou, and Patrick Gallinari. “SGD-QN: Careful quasi-Newton
stochastic gradient descent.” The Journal of Machine Learning Research 10 (2009): 1737-1754.
Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of
training deep feedforward neural networks.” Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’10). Society for Artificial Intelligence and Statistics. 2010.
Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. “Deep Sparse Rectifier Neural Networks.” Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP Volume. Vol. 15. 2011.

Martens, James. “Deep learning via Hessian-free optimization.” Proceedings of the 27th International Conference on Machine Learning (ICML). Vol. 951. 2010.

Hochreiter, Sepp, and Jürgen Schmidhuber. “Flat minima.” Neural Computation, 9.1
(1997): 1-42.

Pascanu, Razvan, and Yoshua Bengio. “Revisiting natural gradient for deep networks.” arXiv
preprint arXiv:1301.3584 (2013).

Dauphin, Yann N., Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. “Identifying
and attacking the saddle point problem in high-dimensional non-convex optimization.” In Advances in Neural Information Processing Systems, pp. 2933-2941. 2014.
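
One concrete, easy-to-state takeaway from the Glorot and Bengio paper above is their “normalized” initialization. A sketch (the layer sizes are arbitrary examples):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=np.random.default_rng()):
    """Glorot/Xavier 'normalized' initialization: uniform in [-a, a] with
    a = sqrt(6 / (fan_in + fan_out)), keeping activation and gradient
    variances roughly constant across layers at the start of training."""
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

W1 = glorot_uniform(784, 256)                  # e.g. MNIST-sized first layer
print(W1.std(), np.sqrt(2.0 / (784 + 256)))    # empirical vs. target std
```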


Unsupervised Feature Learning

Salakhutdinov, Ruslan, and Geoffrey E. Hinton. “Deep boltzmann machines.” Proceedings
of the international conference on artificial intelligence and statistics. Vol. 5. No. 2. Cambridge, MA: MIT Press, 2009.
Scholarpedia page on Deep Belief Networks.


Deep Boltzmann Machines

An Efficient Learning Procedure for Deep Boltzmann Machines, Ruslan Salakhutdinov and Geoffrey Hinton, Neural Computation, August 2012, Vol. 24, No. 8: 1967-2006.
Montavon, Grégoire, and Klaus-Robert Müller. “Deep Boltzmann Machines and the
Centering Trick.” Neural Networks: Tricks of the Trade (2012): 621-637.
Salakhutdinov, Ruslan, and Hugo Larochelle. “Efficient learning of deep boltzmann machines.” International
Conference on Artificial Intelligence and Statistics. 2010.
Salakhutdinov, Ruslan. Learning deep generative models. Diss. University
of Toronto, 2009.

Goodfellow, Ian, et al. “Multi-prediction
deep Boltzmann machines.” Advances in Neural Information Processing Systems. 2013.


RBMs

Unsupervised Models of Images by Spike-and-Slab RBMs, Aaron Courville, James
Bergstra and Yoshua Bengio, in: ICML’2011
Hinton, Geoffrey. “A practical guide to training restricted Boltzmann machines.” Technical Report UTML TR 2010-003, University of Toronto, 2010.
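
Hinton’s practical guide centers on contrastive divergence with one Gibbs step (CD-1). A minimal binary-RBM sketch of that update; the sizes, learning rate, and random input are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 784, 256, 0.1
W = 0.01 * rng.normal(size=(n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """CD-1: positive statistics from the data, negative statistics
    from a single Gibbs step (sample h, reconstruct v, recompute h)."""
    global W, b_v, b_h
    ph0 = sigmoid(v0 @ W + b_h)               # p(h=1 | data)
    h0 = (rng.random(n_hid) < ph0) * 1.0      # sampled hidden states
    pv1 = sigmoid(h0 @ W.T + b_v)             # reconstruction probabilities
    ph1 = sigmoid(pv1 @ W + b_h)              # the guide suggests probabilities here
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b_v += lr * (v0 - pv1)
    b_h += lr * (ph0 - ph1)

cd1_step((rng.random(n_vis) < 0.5) * 1.0)     # one step on a random binary vector
```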


Autoencoders

Regularized Auto-Encoders Estimate Local Statistics, Guillaume Alain, Yoshua Bengio and Salah Rifai, Université
de Montréal, arXiv report 1211.4246, 2012
A Generative Process for Sampling Contractive Auto-Encoders, Salah Rifai, Yoshua Bengio, Yann Dauphin
and Pascal Vincent, in: ICML’2012, Edinburgh, Scotland, U.K., 2012
Contractive Auto-Encoders: Explicit invariance during feature extraction, Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot and Yoshua Bengio, in: ICML’2011.
Disentangling factors of variation for facial expression recognition,
Salah Rifai, Yoshua Bengio, Aaron Courville, Pascal Vincent and Mehdi Mirza, in: ECCV’2012.
Vincent, Pascal, et al. “Stacked denoising autoencoders: Learning useful
representations in a deep network with a local denoising criterion.” The Journal of Machine Learning Research 11 (2010): 3371-3408.
Vincent, Pascal. “A connection between score matching and denoising
autoencoders.” Neural computation 23.7 (2011): 1661-1674.
Chen, Minmin, et al. “Marginalized denoising autoencoders for domain adaptation.” arXiv preprint arXiv:1206.4683 (2012).
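
A minimal denoising-autoencoder step in the spirit of Vincent et al. (2010): corrupt the input with masking noise, then train to reconstruct the clean version. Tied weights, a linear decoder, squared error, and all sizes are simplifying choices made here, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, lr = 64, 32, 0.01
W = 0.1 * rng.normal(size=(D, H))      # tied weights: decoder uses W.T
b_h, b_r = np.zeros(H), np.zeros(D)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dae_step(x, corrupt=0.3):
    """One SGD step: encode a masked copy of x, reconstruct, and
    backprop the squared error against the *clean* x."""
    global W, b_h, b_r
    x_tilde = x * (rng.random(D) >= corrupt)   # masking noise
    h = sigmoid(x_tilde @ W + b_h)             # encoder
    r = h @ W.T + b_r                          # linear decoder
    err = r - x                                # dLoss/dr for 0.5*||r - x||^2
    dh = (err @ W) * h * (1.0 - h)             # backprop through the sigmoid
    W -= lr * (np.outer(x_tilde, dh) + np.outer(err, h))  # both uses of W
    b_h -= lr * dh
    b_r -= lr * err

dae_step(rng.random(D))
```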


Miscellaneous

The ICML 2009 Workshop on Learning Feature Hierarchies webpage has a reading
list.
Stanford’s UFLDL Recommended Readings.
The LISA public wiki has a reading list and a bibliography.
Geoff Hinton has readings from his NIPS 2007 tutorial.
The LISA publications database contains a deep architectures category.
A very brief introduction to AI, machine learning, and deep learning in Yoshua Bengio’s IFT6266 graduate class.
Memkite’s deep learning reading list, http://memkite.com/deep-learning-bibliography/.
Deep learning resources page, http://www.jeremydjacksonphd.com/?cat=7.
Source: http://deeplearning.net/reading-list/