
Learning Face Hallucination in the Wild -- Reading Notes

2017-04-27 18:00
1、However, conventional hallucination methods are often designed for controlled settings and cannot handle varying conditions of pose, resolution degree, and blur.

Conventional hallucination methods are designed for controlled settings and cannot handle varying pose, resolution, and blur.

2、Our method is based on a novel network architecture called Bi-channel Convolutional Neural Network (Bi-channel CNN). 

The proposed method is based on a novel network architecture called the Bi-channel Convolutional Neural Network (Bi-channel CNN).

3、It extracts robust face representations from raw input by using deep convolutional network, then adaptively integrates two channels of information (the raw input image and face representations) to predict the high-resolution image.

In short, robust face representations are extracted from the raw input with a deep convolutional network, and the two channels of information (the raw input image and the face representations) are then adaptively combined to predict the high-resolution image.

4、Our model consists of two modules: a feature extractor, and an image generator.

The proposed model consists of two modules: a feature extractor and an image generator.

5、The deep convolutional extractor learns from raw LR images and extracts descriptive face representations. The image generator takes two channels of information as inputs: the representations extracted by feature extractor and raw LR image.

Feature extractor: a deep convolutional network that learns from raw low-resolution (LR) images and extracts descriptive face representations.

Image generator: combines the extracted face representations with the raw LR image to produce the high-resolution (HR) output.

6、In this paper, we exploit a simple strategy to combine two channels of information by linear combination.

The two channels of information from point 5 are merged by a simple linear combination.

7、the process of getting the LR image from HR image can be modeled as:

$$I_L = (I_H \otimes G)\!\downarrow$$

Here G is the blur kernel, ⊗ denotes the convolution operation and ↓ means down sampling.

This models how an HR (normal) image is blurred and down-sampled to produce an LR image.
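To make the degradation model concrete, here is a minimal NumPy/SciPy sketch of I_L = (I_H ⊗ G)↓ — blur each channel, then subsample. The reflect padding and plain strided subsampling are my own simplifications, not details taken from the paper.

```python
import numpy as np
from scipy.ndimage import convolve

def degrade(hr_image: np.ndarray, blur_kernel: np.ndarray, factor: int) -> np.ndarray:
    """I_L = (I_H ⊗ G) ↓ : blur an H×W×C image with kernel G, then subsample by `factor`."""
    blurred = np.stack(
        [convolve(hr_image[..., c], blur_kernel, mode="reflect")   # I_H ⊗ G, per channel
         for c in range(hr_image.shape[-1])],
        axis=-1,
    )
    return blurred[::factor, ::factor]                             # ↓ down-sampling
```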

8、For a given LR image I_L, the face hallucination system f is expected to predict a hallucinated face as similar as possible to the ground truth I_H by minimizing:

$$\left\| f(I_L, \Phi) - I_H \right\|^2$$

where Φ means the parameters of the system.

This states how the parameters are optimized so that the HR image produced by f is as close as possible to the original (ground-truth) HR image.

9、For a given training set composed of LR and HR image pairs D = {(I_L^1, I_H^1), (I_L^2, I_H^2), ..., (I_L^N, I_H^N)}, the parameter Φ can be determined by minimizing the objective function in Eq. 3:

$$\Phi = \arg\min_{\Phi} \sum_{i=1}^{N} \left\| f(I_L^{i}, \Phi) - I_H^{i} \right\|^{2}$$

Same as point 8, but the loss is summed over the whole training set.
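A minimal PyTorch-style sketch of minimizing the objective in Eq. 3 with mini-batch gradient descent; `model`, `loader`, and `optimizer` are placeholders, and the mean-squared-error loss here is the batch average rather than the exact sum in Eq. 3.

```python
import torch

def train_epoch(model, loader, optimizer):
    """One pass over D = {(I_L, I_H)}: minimize ||f(I_L, Φ) - I_H||² over the pairs."""
    mse = torch.nn.MSELoss()
    for lr_batch, hr_batch in loader:      # batches of (I_L, I_H) pairs
        optimizer.zero_grad()
        pred = model(lr_batch)             # f(I_L, Φ)
        loss = mse(pred, hr_batch)         # squared reconstruction error
        loss.backward()                    # gradient w.r.t. Φ
        optimizer.step()
```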

10、Gaussian blur is usually caused by out-of-focus. It is defined in Eq. 4:

$$G_g(x, y) = \frac{1}{S_g} \exp\!\left(-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}\right)$$

σ_x, σ_y are variance parameters on the horizontal and vertical directions and S_g is a normalization constant.

This is the Gaussian blur kernel, i.e., the kind of blur produced by an out-of-focus camera.
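A small NumPy sketch of building such an anisotropic Gaussian kernel; the kernel radius and treating σ_x, σ_y as standard deviations are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(sigma_x: float, sigma_y: float, radius: int = 7) -> np.ndarray:
    """2-D Gaussian blur kernel (σ_x, σ_y > 0), normalized so its entries sum to 1 (the 1/S_g term)."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = np.exp(-(x ** 2 / (2 * sigma_x ** 2) + y ** 2 / (2 * sigma_y ** 2)))
    return kernel / kernel.sum()
```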

11、Motion blur is caused by the relative movement between the image system and subjects. For simplicity, the blur kernel is modeled by two parameters θ, l, which represent the blur direction and moving distance, respectively. S_m is a normalized constant.

$$G_m(x, y) = \begin{cases} \dfrac{1}{S_m} & \text{if } (x, y) \text{ lies on the line segment of length } l \text{ along direction } \theta \\ 0 & \text{otherwise} \end{cases}$$

Motion blur is the blur caused by relative movement between the imaging device and the subject.
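One common way to build such a linear motion-blur kernel is to spread weight uniformly along a line of length l in direction θ on a discrete grid; the sketch below follows that idea, and the grid size and sampling step are my own choices.

```python
import numpy as np

def motion_kernel(theta: float, length: float, size: int = 15) -> np.ndarray:
    """Uniform blur along a line of `length` pixels in direction `theta` (radians), normalized by S_m."""
    kernel = np.zeros((size, size))
    cx = cy = size // 2
    # Mark the pixels along the motion path, oversampling so no cell is skipped.
    steps = max(int(round(length)) * 4, 1)
    for t in np.linspace(-length / 2, length / 2, num=steps):
        x = int(round(cx + t * np.cos(theta)))
        y = int(round(cy + t * np.sin(theta)))
        if 0 <= x < size and 0 <= y < size:
            kernel[y, x] = 1.0
    return kernel / kernel.sum()
```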

12、The feature extractor focuses on the robust global representations and naturally tends to lose information from the raw LR input. Thus we provide an extra data path for the image generator to obtain the raw LR input directly. Specifically, we want the image generator to use both the raw LR input image and the robust representations extracted by the prior extractor to hallucinate the HR output.

$$I_{out} = G\big(F(I_{in}, \Phi_F),\ I_{in},\ \Phi_G\big)$$

Here G is the image generator and F is the feature extractor: G takes as inputs the representations extracted by F, the raw LR image, and its own parameters Φ_G.

13、In this paper, we exploit a simple way to integrate both the raw input image and face representations with linear combination. It is controlled by one parameter, the fusion coefficient α, as:

$$I_{out} = \alpha \cdot \uparrow\! I_{in} + (1 - \alpha) \cdot I_{rec}$$

where ↑I_in means the input image upsampled by bicubic interpolation, and α is the fusion coefficient controlling the incorporating behavior. I_rec is the intermediate image predicted by a two-layer fully-connected network as described in Basic CNN. The fusion coefficient is predicted implicitly in the image generator, based on the face representations F(I_in, Φ_F).

In short, this is how the two inputs are combined when generating the HR image: I_in is the LR input (upsampled), and I_rec is the HR image obtained by passing the output of the feature extractor F through two fully-connected layers.
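A one-line sketch of this fusion step; here α is passed in as a scalar for clarity, whereas in the Bi-channel CNN it is predicted by the image generator itself (see the network sketch further below).

```python
import numpy as np

def fuse(i_in_up: np.ndarray, i_rec: np.ndarray, alpha: float) -> np.ndarray:
    """I_out = α · ↑I_in + (1 - α) · I_rec"""
    return alpha * i_in_up + (1.0 - alpha) * i_rec
```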

14、Fig. 3 shows network details of our approach.



Figure 3: The network details of our method. The first three convolutional layers extract features from the LR image I_in. Each layer outputs feature maps by convolving the previous feature maps with linear filters, applying a non-linear function tanh(·), and then down sampling by using max-pooling. The following fully-connected layers are combined into two groups. One group predicts a reconstructed face image I_rec and another group estimates a fusion coefficient α. The HR output integrates I_rec and I_in linearly with α.

More specifically:

Different from the Basic CNN, the image generator in the Bi-channel CNN contains four fully-connected layers. The first two layers predict the intermediate image I_rec as described in Basic CNN, and the remaining two estimate a fusion coefficient α. Eqs. 11 and 12 present the output of each layer, where W_j^i is the weight matrix and b_j^i is the bias term.



In other words, F(·) is the set of feature maps obtained by passing the LR image through three convolutional layers. These feature maps feed two branches: one goes through two fully-connected layers to produce I_rec, the other through two fully-connected layers to produce α.
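A compact PyTorch sketch of the architecture as described above: three conv + tanh + max-pool layers as the feature extractor, then two fully-connected branches (one for I_rec, one for α), fused linearly with the bicubic-upsampled input. The channel widths, kernel sizes, hidden sizes, and the sigmoid that keeps α in (0, 1) are illustrative guesses, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiChannelCNN(nn.Module):
    """Sketch of the Bi-channel CNN: feature extractor + two-branch image generator."""

    def __init__(self, out_size: int = 100):
        super().__init__()
        self.out_size = out_size
        # Feature extractor: three conv layers, each followed by tanh and max-pooling.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 5, padding=2), nn.Tanh(), nn.MaxPool2d(2),   # 48 -> 24
            nn.Conv2d(32, 64, 5, padding=2), nn.Tanh(), nn.MaxPool2d(2),  # 24 -> 12
            nn.Conv2d(64, 128, 5, padding=2), nn.Tanh(), nn.MaxPool2d(2), # 12 -> 6
        )
        feat_dim = 128 * 6 * 6
        # Branch 1: two fully-connected layers predicting the intermediate image I_rec.
        self.rec_branch = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.Tanh(),
            nn.Linear(1024, 3 * out_size * out_size),
        )
        # Branch 2: two fully-connected layers predicting the fusion coefficient α.
        self.alpha_branch = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.Tanh(),
            nn.Linear(256, 1), nn.Sigmoid(),   # keep α in (0, 1)
        )

    def forward(self, i_in):
        feats = self.conv(i_in).flatten(1)                        # F(I_in, Φ_F)
        i_rec = self.rec_branch(feats).view(-1, 3, self.out_size, self.out_size)
        alpha = self.alpha_branch(feats).view(-1, 1, 1, 1)
        i_up = F.interpolate(i_in, size=(self.out_size, self.out_size),
                             mode="bicubic", align_corners=False)  # ↑I_in
        return alpha * i_up + (1 - alpha) * i_rec                  # linear fusion
```

Running `BiChannelCNN()(torch.randn(1, 3, 48, 48))` would return a 1×3×100×100 tensor, matching the input/output sizes quoted in points 18–19.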

15、Our dataset contains more than 100,000 faces.

The experiments use a dataset of more than 100,000 faces.

16、60% of the faces are used as the training set, 20% as the validation set, and the remainder is left out for testing. All images are scaled to 100 × 100 pixels.

Three fifths for training, one fifth for validation, and the remaining fifth for testing; all images are scaled to 100×100.

17、Both the training and validation sets are applied Gaussian blur or motion blur randomly, down sampled by a factor from 2 to 5 (i.e., the resolution of the LR image lies in the range 20 × 20 to 50 × 50 pixels). The variances of Gaussian blur σ_x, σ_y lie in the range 0 to 7. The moving distance of motion blur l lies in the range 0 to 11, and the blur direction θ is uniformly selected from −π to π.

This explains how training and validation images are blurred and down-sampled: Gaussian or motion blur is chosen at random, the down-sampling factor gives LR images between 20×20 and 50×50 pixels, the Gaussian blur variances lie between 0 and 7, the motion-blur moving distance lies between 0 and 11, and the blur direction lies between −π and π.
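A sketch of this random degradation for generating training pairs, reusing the `degrade`, `gaussian_kernel`, and `motion_kernel` helpers sketched earlier; the 50/50 choice between the two blur types and the tiny floor on σ are my assumptions.

```python
import numpy as np

def random_degrade(hr_image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply Gaussian or motion blur at random, then down-sample by a factor in {2, ..., 5}."""
    if rng.random() < 0.5:
        kernel = gaussian_kernel(sigma_x=rng.uniform(1e-3, 7),   # σ_x, σ_y in (0, 7]
                                 sigma_y=rng.uniform(1e-3, 7))
    else:
        kernel = motion_kernel(theta=rng.uniform(-np.pi, np.pi), # θ uniform in [-π, π]
                               length=rng.uniform(0, 11))        # moving distance l in [0, 11]
    factor = int(rng.integers(2, 6))                             # down-sampling factor 2–5
    return degrade(hr_image, kernel, factor)
```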

18、The size of the network's input I_in is 48 × 48 pixels with 3 RGB channels. The network's output is an HR image of 100 × 100 pixels with 3 RGB channels.

That is, the HR ground truth is 100×100, while the LR image fed to the network is 48×48.

19、Data Pre-processing: We train our model to handle LR input with different resolutions. The resolution of the LR input lies in the range 20 × 20 to 50 × 50. Since the resolution of the network input I_in is 48 × 48, we upsample or downsample the LR image I_L to 48 × 48 by using bicubic interpolation:



Since all the original images are scaled to 100×100, the blurred and down-sampled images end up between 20×20 and 50×50 in size, but the network expects a fixed 48×48 input, so each LR image is upsampled or downsampled to 48×48 with bicubic interpolation.
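A one-step sketch of this resizing using Pillow (the library choice is mine, not the paper's):

```python
from PIL import Image

def to_network_input(lr_image: Image.Image, size: int = 48) -> Image.Image:
    """Bicubic up/down-sampling of an LR face to the fixed 48×48 network input I_in."""
    return lr_image.resize((size, size), resample=Image.BICUBIC)
```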

20、All entries of the input image I_in and the ground truth image I_H are normalized to lie in the range -1 to 1. Specifically, we denote I_in and I_H as:



This simply says that the input images need to be normalized.

21、The input image’s mean and standard deviation are computed as



I_in and I_H are normalized in Eq. 17:



This gives the concrete normalization procedure.
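Since Eq. 17 itself is not reproduced in these notes, the sketch below only shows one plausible reading — per-image standardization by the computed mean and standard deviation — and should be treated as an illustration rather than the paper's exact formula.

```python
import numpy as np

def normalize(image: np.ndarray) -> tuple[np.ndarray, float, float]:
    """Standardize an image by its own mean and standard deviation; return the stats for later."""
    mu, sigma = float(image.mean()), float(image.std())
    return (image - mu) / sigma, mu, sigma
```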

22、When we test an image, we recover the image from its normalized output I_out using Eq. 18.



At test time the network outputs a normalized result; Eq. 18 maps this normalized result back to the final image.
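Correspondingly, a sketch of undoing that normalization on the network output I_out, again assuming the simple mean/std form above rather than the paper's exact Eq. 18:

```python
import numpy as np

def denormalize(i_out: np.ndarray, mu: float, sigma: float) -> np.ndarray:
    """Map the normalized network output back to 8-bit pixel intensities (assumes a 0–255 range)."""
    return np.clip(i_out * sigma + mu, 0, 255).astype(np.uint8)
```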