
A method for making semantic segmentation labels

2017-10-27 09:50
I have recently wanted to make my own labels for image segmentation. I considered several methods, but all of them involved a lot of manual work. I then found a scheme in a paper from the Amazon picking challenge: photograph the empty background, place the object on it, and let an algorithm produce the segmentation label. The original text follows.



VI. SELF-SUPERVISED TRAINING
By bringing deep learning into the approach we gain robustness. This, however, comes at the expense of amassing quality training data, which is necessary to learn high-capacity models with many parameters. Gathering and manually labeling such large amounts of training data is expensive. The existing large-scale datasets used by deep learning (e.g. ImageNet [20]) are mostly Internet photos, which have very different object and image statistics from our warehouse setting.

To automatically capture and pixel-wise label images, we propose a self-supervised method, based on three observations:
· Batch-training on scenes with a single object can yield deep models that perform well on scenes with multiple objects [17] (i.e., simultaneous training on cat-only or dog-only images enables successful testing on cat-with-dog images);
· An accurate robot arm and accurate camera calibration give us at-will control over the camera viewpoint;
· For single-object scenes, with known background and known camera viewpoint, we can automatically obtain precise segmentation labels by foreground masking.

The captured training dataset contains 136,575 RGB-D images of 39 objects, all automatically labeled.
Semi-automatic data gathering. To semi-autonomously gather large quantities of training data, we place single known objects inside the shelf bins or tote in arbitrary poses, and configure the robot to move the camera and capture RGB-D images of the objects from a variety of different viewpoints. The position of the shelf/tote is known to the robot, as is the camera viewpoint, which we use to transform the collected RGB-D images into the shelf/tote frame. After capturing several hundred RGB-D images, the objects are manually re-arranged to different poses, and the process is repeated several times. Human involvement sums up to re-arranging the objects and labeling which objects correspond to which bin/tote. Selecting and changing the viewpoint, capturing sensor data, and labeling each image by object is automated. We collect RGB-D images of the empty shelf and tote from the same exact camera viewpoints to model the background, in preparation for the automatic data labeling.
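
The paper does not give code for this frame transform, but as a rough sketch: with a pinhole camera model, each depth pixel can be back-projected into the camera frame and then mapped into the shelf/tote frame using the known camera pose. The names `K`, `T_shelf_cam`, and `depth_to_shelf_frame` below are all illustrative assumptions, not from the paper.

```python
import numpy as np

def depth_to_shelf_frame(depth, K, T_shelf_cam):
    """Deproject a depth image into 3D and express the points in the
    shelf/tote frame, given camera intrinsics K (3x3) and the known
    camera pose T_shelf_cam (4x4, camera -> shelf). All names here
    are illustrative; the paper does not specify an API."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)                 # depth in meters
    x = (u - K[0, 2]) * z / K[0, 0]              # pinhole back-projection
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1)  # HxWx4 homogeneous
    pts_shelf = pts_cam @ T_shelf_cam.T          # rigid transform into shelf frame
    return pts_shelf[..., :3]
```
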
Automatic data labeling. To obtain pixel-wise object segmentation labels, we create an object mask that separates foreground from background. The process is composed of 2D and 3D pipelines. The 2D pipeline is robust to thin objects (objects without sufficient volume to be reliably segmented in 3D when placed too close to a wall or the ground) and objects with no depth information, while the 3D pipeline is robust to large misalignments between the pre-scanned shelf bin and tote. Results from both pipelines are combined to automatically label an object mask for each training RGB-D image.

The 2D pipeline starts by fixing possible minor image misalignments by using multimodal 2D intensity-based registration to align the two RGB-D images [21]. We then convert the aligned color image from RGB to HSV, and do pixel-wise comparisons of the HSV and depth channels to separate and label foreground from background.
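
As a minimal sketch of that pixel-wise comparison (assuming the registration step has already been done, and using placeholder thresholds that are not from the paper), the foreground test might look like this in Python with OpenCV:

```python
import cv2
import numpy as np

def foreground_mask_2d(rgb, depth, bg_rgb, bg_depth,
                       hsv_thresh=(10, 60, 60), depth_thresh=0.02):
    """Label a pixel as foreground when its HSV color or its depth
    deviates enough from the empty-shelf background. Images are BGR
    (OpenCV convention); thresholds are illustrative placeholders."""
    hsv = cv2.cvtColor(rgb, cv2.COLOR_BGR2HSV).astype(np.int16)
    bg_hsv = cv2.cvtColor(bg_rgb, cv2.COLOR_BGR2HSV).astype(np.int16)
    # Hue is cyclic (0..179 in OpenCV), so compare along the shorter arc.
    dh = np.abs(hsv[..., 0] - bg_hsv[..., 0])
    dh = np.minimum(dh, 180 - dh)
    ds = np.abs(hsv[..., 1] - bg_hsv[..., 1])
    dv = np.abs(hsv[..., 2] - bg_hsv[..., 2])
    color_fg = (dh > hsv_thresh[0]) | (ds > hsv_thresh[1]) | (dv > hsv_thresh[2])
    # Depth comparison only where both frames have valid readings.
    valid = (depth > 0) & (bg_depth > 0)
    depth_fg = valid & (np.abs(depth - bg_depth) > depth_thresh)
    mask = (color_fg | depth_fg).astype(np.uint8) * 255
    # Clean up speckle noise with a small morphological opening.
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```

In practice this 2D mask is combined with the 3D pipeline's mask, as the paper describes, to cover both thin/depthless objects and misaligned backgrounds.
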

The 3D pipeline uses multiple views of an empty shelf bin and tote to create their pre-scanned 3D models. We then use ICP to align all training images to the background model, and remove points too close to the background to identify the foreground mask. Finally, we project the foreground points back to 2D to retrieve the object mask.
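
A rough sketch of that background-distance test, assuming the training cloud has already been ICP-aligned to the pre-scanned model (the paper uses ICP but does not publish this code; the distance threshold and all names below are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

def foreground_mask_3d(points_shelf, pixel_index, bg_points,
                       image_shape, dist_thresh=0.01):
    """After ICP alignment, keep points farther than dist_thresh (meters)
    from the pre-scanned empty shelf/tote model, then project the
    surviving points back to the image to form the 2D object mask.
    The threshold and all names are illustrative, not from the paper.

    points_shelf : (N, 3) aligned 3D points from one training image
    pixel_index  : (N, 2) the (row, col) each point was deprojected from
    bg_points    : (M, 3) points of the pre-scanned background model
    """
    tree = cKDTree(bg_points)
    d, _ = tree.query(points_shelf)          # nearest-background distance
    fg = d > dist_thresh                     # far from background => object
    mask = np.zeros(image_shape, dtype=np.uint8)
    rows, cols = pixel_index[fg, 0], pixel_index[fg, 1]
    mask[rows, cols] = 255                   # back-project foreground to 2D
    return mask
```
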
