
[Paper Notes] Residual Neural Network - Kaiming He

2016-03-18 16:41
http://arxiv.org/abs/1512.03385

Simply stacking more layers causes degradation: with the same training setup, a deeper network reaches lower accuracy, and this is not caused by overfitting or by vanishing gradients.

H(x) is the desired underlying mapping and F(x) is the residual, with H(x) = F(x) + x. The hypothesis is that F(x) is easier for a stack of convolutional layers to approximate than H(x) itself; the experiments support this hypothesis.
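A minimal sketch of this idea, assuming PyTorch (the layer widths are illustrative, not the paper's exact configuration):

import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    # Two 3x3 conv layers learn the residual F(x); the input x is added back
    # before the final ReLU, so the block outputs H(x) = F(x) + x.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = F.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return F.relu(residual + x)  # identity shortcut: H(x) = F(x) + x

When the shortcut and F(x) have different dimensions, the paper uses zero-padding or a 1x1 projection on the shortcut instead of the plain identity.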

The bottleneck architecture of ResNets is more economical; see the sketch below.
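A rough sketch of the bottleneck block (again assuming PyTorch; batch norm is omitted for brevity): a 1x1 convolution reduces the channel dimension, the 3x3 convolution operates on the reduced dimension, and a second 1x1 convolution restores it, which is cheaper than two full-width 3x3 convolutions.

import torch.nn as nn
import torch.nn.functional as F

class BottleneckBlock(nn.Module):
    # 1x1 reduce -> 3x3 -> 1x1 restore, with an identity shortcut.
    def __init__(self, channels, bottleneck_channels):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck_channels, 1, bias=False)
        self.conv3x3 = nn.Conv2d(bottleneck_channels, bottleneck_channels, 3,
                                 padding=1, bias=False)
        self.restore = nn.Conv2d(bottleneck_channels, channels, 1, bias=False)

    def forward(self, x):
        out = F.relu(self.reduce(x))
        out = F.relu(self.conv3x3(out))
        out = self.restore(out)
        return F.relu(out + x)

For example, BottleneckBlock(256, 64) mirrors the 256-64-256 design used in the deeper ResNets (ResNet-50/101/152).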

ReLU

ReLU is the activation function y = max(0, x). Its derivative is a step function, 0 for x < 0 and 1 for x > 0 (its smooth approximation, softplus, has the logistic function as its derivative). Because the output is 0 when x < 0, only part of the neurons are active at any time, so roughly half of the parameters receive gradient updates during backpropagation, which speeds up training.
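A small numerical sketch (NumPy) of ReLU, its gradient mask, and the softplus relation mentioned above:

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative is 0 for x < 0 and 1 for x > 0, so units with negative
    # pre-activation pass no gradient back during backpropagation.
    return (x > 0).astype(float)

def softplus(x):
    # Smooth approximation of ReLU: log(1 + exp(x)).
    return np.log1p(np.exp(x))

def softplus_grad(x):
    # Its derivative is the logistic (sigmoid) function.
    return 1.0 / (1.0 + np.exp(-x))

x = np.random.randn(5)
print(relu(x), relu_grad(x))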

Gradient vanishing

Saturating activation functions such as tanh (output range (-1, 1)) or the sigmoid (output range (0, 1)) have derivatives smaller than 1, so the gradient shrinks as it is back-propagated through many layers; the parameters of the shallow layers then receive almost no effective update. This is the vanishing-gradient problem.
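A toy illustration of the effect (NumPy, sigmoid units, weights ignored): the sigmoid derivative is at most 0.25, so the product of per-layer derivatives across a deep chain collapses toward zero.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25

# Multiply per-layer derivatives along a deep chain of scalar sigmoid units.
grad = 1.0
for layer in range(30):
    pre_activation = np.random.randn()
    grad *= sigmoid_grad(pre_activation)

print(grad)  # typically on the order of 1e-20: almost no signal reaches the shallow layers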