
Introduction to the BAM/SAM data format (Part 2)

2017-06-12 16:36
5. Detailed explanation

Example:

E00606:11:H2CC3CCXY:8:1101:7172:14195 77 * 0 0 * * 0 0
CTACGAGTCATTTAGCACCGGGTTCTCCACAAACTTGCGGTGCGTCTCCAGAGAGGGGCGGCACTCGTTCGGCCGCACCCCGGTCCAGTCACGAACGGCTCTCCACACCGGCCGGCCCCGGGGGGTCGACCGGCTATCCCAGGCCAATCA
AAFFFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<JFJJJJ<FJJJJJJJJJJJJJJJJJ)FJ<JJJJJJFJJJJJJJJJJFJJ<
XM:i:0

E00606:11:H2CC3CCXY:8:1101:7172:14195 2:N:0:ATCACG 141 * 0 0 * * 0 0
AGACATTTGGTGCGTGTGCTTGGCTGAGGAGCCACTGGTGCGAAGCTACCATCTGTGGGATTATGACTGAACGCCTCTAAGTCAGAATCCCGCCTAAACGTAACGATACCGCAGCGCCGCGGGACTTTGATTGGCCTGGGATAGCCGGTC
AAAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJFJJJJ<JAJJJJJJJ<JJJJJ<JJJJJJJJJF7JJFFFJJJJJFJAJJJAJFJJJJ7JJFJJFFA-A7FFJJJJF-AFJJJJJJJJ
XM:i:0

E00606:11:H2CC3CCXY:8:1101:6400:14195 77 * 0 0 * * 0 0
GCGGGATGCAGGCCGCTCACCATGGCGACGGAGCTGGAGGCGTGGCTCATGTATGAGGATGTCTGGGGCAGCGGATACGTCACCACCTCCAGTACATCATGAGAGCTGCGCTTGAAGCGGTTATTACTGGGCAGCGGCAGCAGGGGGCAG
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
XM:i:0

E00606:11:H2CC3CCXY:8:1101:7963:14195 2:N:0:ATCACG 141 * 0 0 * * 0 0
GAGTCTAACGCACGCGCGAGTCAAAGGGTGTCTCCGAGCCCCCACGGCGCAATGAAGGTGAAGGCCGGCGCTCGCCGGCCCAGGTGGGATCCCCCCGCCCCGGCGGGGGGCGCACCACCGGCCCGTCTCGCCCGCACCGCCGGGCAGGTG
AAAFFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJJJJJJJFJFJJFJJJJJ<JJJJJJJ<FJJJFJJJFFFJJJJFJJJFJJJJJJJJJJJJJ)AFFJJJJFJJJFFJJJJJJJJ)7<JF--<FJFJ)<
XM:i:0
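
A minimal sketch of how one such record can be split into its fields. It assumes the record is a single tab-delimited line, as the SAM specification describes (the records above are displayed with spaces and line breaks). The names SamRecord and parse_sam_line are made up for this illustration, and SEQ and QUAL are cut to their first 10 characters to keep the example short.

```python
from typing import List, NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM fields plus any optional tags (illustrative type)."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: List[str]


def parse_sam_line(line: str) -> SamRecord:
    """Split one tab-delimited alignment line into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2],
        pos=int(fields[3]), mapq=int(fields[4]), cigar=fields[5],
        rnext=fields[6], pnext=int(fields[7]), tlen=int(fields[8]),
        seq=fields[9], qual=fields[10], tags=fields[11:],
    )


# First record from the example, with SEQ and QUAL abbreviated (assumption).
example = "\t".join([
    "E00606:11:H2CC3CCXY:8:1101:7172:14195", "77", "*", "0", "0",
    "*", "*", "0", "0", "CTACGAGTCA", "AAFFFFFJJJ", "XM:i:0",
])
record = parse_sam_line(example)
print(record.qname, record.flag, record.rname)
```

For real files, an established parser is usually a better choice than splitting lines by hand.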

1) QNAME

Query name, usually the read name, e.g. E00606:11:H2CC3CCXY:8:1101:7172:14195

2) FLAG

The following information comes from: http://www.cnblogs.com/xudongliang/p/5437850.html

/*! @abstract the read is paired in sequencing, no matter whether it is mapped in a pair */

#define BAM_FPAIRED        1

/*! @abstract the read is mapped in a proper pair */

#define BAM_FPROPER_PAIR   2

/*! @abstract the read itself is unmapped; conflictive with BAM_FPROPER_PAIR */

#define BAM_FUNMAP         4

/*! @abstract the mate is unmapped */

#define BAM_FMUNMAP        8

/*! @abstract the read is mapped to the reverse strand */

#define BAM_FREVERSE      16

/*! @abstract the mate is mapped to the reverse strand */

#define BAM_FMREVERSE     32

/*! @abstract this is read1 */

#define BAM_FREAD1        64

/*! @abstract this is read2 */

#define BAM_FREAD2       128

/*! @abstract not primary alignment */

#define BAM_FSECONDARY   256

/*! @abstract QC failure */

#define BAM_FQCFAIL      512

/*! @abstract optical or PCR duplicate */

#define BAM_FDUP        1024

/*! @abstract supplementary alignment */

#define BAM_FSUPPLEMENTARY 2048

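1: the read comes from paired-end (PE) sequencing

2: the read and its mate are mapped as a proper pair, as judged by the aligner (typically the expected orientation and insert size); it does not mean that the read matches the reference without mismatches or indels

4: the read is not mapped to the reference

8: the mate of this read is not mapped to the reference; for example, this read is R1 and its R2 mate is unmapped

16: the read is mapped to the reverse strand of the reference

32: the mate of this read is mapped to the reverse strand of the reference

64: the read is the R1 read (read1)

128: the read is the R2 read (read2)

256: the alignment is not the primary one; a read may align to several positions on the reference, only one of which is the primary alignment, and the others are secondary

512: the read failed QC (# this flag is rarely used)

1024: the read is a PCR or optical duplicate (# normally set only by a duplicate-marking step, not by the aligner)

2048: supplementary alignment; in the SAM specification this marks the additional records of a chimeric (split) alignment, in which different parts of one read align to different places (# rarely encountered)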

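All of the flags above are powers of 2. Such a series has a useful property: the sum of any subset of its members is unique. For example, 65 can only be 1 + 64, meaning the read comes from paired-end sequencing and is read1.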
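
Because each flag occupies its own bit, a combined value can be taken apart with a bitwise AND. The following is a minimal sketch of that idea; the function name decode_flag is made up for illustration, and the labels are simply the macro names listed above with the BAM_F prefix removed.

```python
# (bit value, label) pairs taken from the BAM_F* macros listed above.
FLAG_BITS = [
    (1, "PAIRED"),
    (2, "PROPER_PAIR"),
    (4, "UNMAP"),
    (8, "MUNMAP"),
    (16, "REVERSE"),
    (32, "MREVERSE"),
    (64, "READ1"),
    (128, "READ2"),
    (256, "SECONDARY"),
    (512, "QCFAIL"),
    (1024, "DUP"),
    (2048, "SUPPLEMENTARY"),
]


def decode_flag(flag: int) -> list:
    """Return the labels of all bits that are set in a FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


print(decode_flag(65))  # 65 = 1 + 64
```

Since every bit is tested independently, no other combination of flags can produce the same value.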

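The flags subcommand of samtools shows in detail what a flag value contains, e.g.:

$ samtools flags 77

0x4d    77      PAIRED,UNMAP,MUNMAP,READ1

The flag value is 77.

PAIRED means the read comes from paired-end sequencing; its value is 1.

UNMAP means the read is not mapped to the reference; its value is 4.

MUNMAP means the mate of this read is not mapped to the reference; its value is 8.

READ1 means the read is the R1 read; its value is 64.

These values add up to 77.

$ samtools flags 141

0x8d    141     PAIRED,UNMAP,MUNMAP,READ2

The flag value is 141.

PAIRED means the read comes from paired-end sequencing; its value is 1.

UNMAP means the read is not mapped to the reference; its value is 4.

MUNMAP means the mate of this read is not mapped to the reference; its value is 8.

READ2 means the read is the R2 read; its value is 128.

These values add up to 141.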
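
The two sums can be checked by going in the opposite direction, from flag names to a numeric value. This is only a sketch: encode_flag is a made-up helper, it covers just the five names that occur in these two examples, and the printed line merely imitates the layout of the output shown above rather than reproducing how samtools works internally.

```python
# Values of the flag names that appear in the two examples above.
FLAG_VALUES = {
    "PAIRED": 1,
    "UNMAP": 4,
    "MUNMAP": 8,
    "READ1": 64,
    "READ2": 128,
}


def encode_flag(names) -> int:
    """Combine flag names into a single FLAG value with bitwise OR."""
    value = 0
    for name in names:
        value |= FLAG_VALUES[name]
    return value


for names in ("PAIRED,UNMAP,MUNMAP,READ1", "PAIRED,UNMAP,MUNMAP,READ2"):
    value = encode_flag(names.split(","))
    # Hexadecimal value, decimal value, then the comma-separated names.
    print(f"{value:#x}\t{value}\t{names}")
```

Bitwise OR is used instead of plain addition so that naming the same flag twice does not change the result.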

3) RNAME

Reference sequence name

Usually the name of a chromosome in the reference genome; if the read is not aligned, it is shown as *