Introduction to the BAM/SAM data format (Part 2)
2017-06-12 16:36
5. Detailed explanation
Example (in the SAM file each record is a single line with tab-separated fields):
E00606:11:H2CC3CCXY:8:1101:7172:14195 77 * 0 0 * * 0 0 CTACGAGTCATTTAGCACCGGGTTCTCCACAAACTTGCGGTGCGTCTCCAGAGAGGGGCGGCACTCGTTCGGCCGCACCCCGGTCCAGTCACGAACGGCTCTCCACACCGGCCGGCCCCGGGGGGTCGACCGGCTATCCCAGGCCAATCA AAFFFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<JFJJJJ<FJJJJJJJJJJJJJJJJJ)FJ<JJJJJJFJJJJJJJJJJFJJ< XM:i:0
E00606:11:H2CC3CCXY:8:1101:7172:14195 2:N:0:ATCACG 141 * 0 0 * * 0 0 AGACATTTGGTGCGTGTGCTTGGCTGAGGAGCCACTGGTGCGAAGCTACCATCTGTGGGATTATGACTGAACGCCTCTAAGTCAGAATCCCGCCTAAACGTAACGATACCGCAGCGCCGCGGGACTTTGATTGGCCTGGGATAGCCGGTC AAAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJFJJJJ<JAJJJJJJJ<JJJJJ<JJJJJJJJJF7JJFFFJJJJJFJAJJJAJFJJJJ7JJFJJFFA-A7FFJJJJF-AFJJJJJJJJ XM:i:0
E00606:11:H2CC3CCXY:8:1101:6400:14195 77 * 0 0 * * 0 0 GCGGGATGCAGGCCGCTCACCATGGCGACGGAGCTGGAGGCGTGGCTCATGTATGAGGATGTCTGGGGCAGCGGATACGTCACCACCTCCAGTACATCATGAGAGCTGCGCTTGAAGCGGTTATTACTGGGCAGCGGCAGCAGGGGGCAG AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ XM:i:0
E00606:11:H2CC3CCXY:8:1101:7963:14195 2:N:0:ATCACG 141 * 0 0 * * 0 0 GAGTCTAACGCACGCGCGAGTCAAAGGGTGTCTCCGAGCCCCCACGGCGCAATGAAGGTGAAGGCCGGCGCTCGCCGGCCCAGGTGGGATCCCCCCGCCCCGGCGGGGGGCGCACCACCGGCCCGTCTCGCCCGCACCGCCGGGCAGGTG AAAFFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJJJJJJJFJFJJFJJJJJ<JJJJJJJ<FJJJFJJJFFFJJJJFJJJFJJJJJJJJJJJJJ)AFFJJJJFJJJFFJJJJJJJJ)7<JF--<FJFJ)< XM:i:0
1) QNAME
Query name, usually the read name, e.g. E00606:11:H2CC3CCXY:8:1101:7172:14195
2) FLAG
The following is taken from: http://www.cnblogs.com/xudongliang/p/5437850.html
/*! @abstract the read is paired in sequencing, no matter whether it is mapped in a pair */
#define BAM_FPAIRED 1
/*! @abstract the read is mapped in a proper pair */
#define BAM_FPROPER_PAIR 2
/*! @abstract the read itself is unmapped; conflictive with BAM_FPROPER_PAIR */
#define BAM_FUNMAP 4
/*! @abstract the mate is unmapped */
#define BAM_FMUNMAP 8
/*! @abstract the read is mapped to the reverse strand */
#define BAM_FREVERSE 16
/*! @abstract the mate is mapped to the reverse strand */
#define BAM_FMREVERSE 32
/*! @abstract this is read1 */
#define BAM_FREAD1 64
/*! @abstract this is read2 */
#define BAM_FREAD2 128
/*! @abstract not primary alignment */
#define BAM_FSECONDARY 256
/*! @abstract QC failure */
#define BAM_FQCFAIL 512
/*! @abstract optical or PCR duplicate */
#define BAM_FDUP 1024
/*! @abstract supplementary alignment */
#define BAM_FSUPPLEMENTARY 2048
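1: the read comes from paired-end (PE) sequencing
2: the read is mapped in a proper pair, i.e. both reads of the pair are aligned the way the aligner expects for a pair (typically the right relative orientation and a plausible insert size). It does not mean the read matches the reference perfectly without mismatches or indels
4: the read itself is not mapped to the reference
8: the other read of the pair (the mate) is not mapped to the reference; e.g. if this read is R1, its R2 is unmapped
16: the read is mapped to the reverse (minus) strand of the reference
32: the mate is mapped to the reverse strand of the reference
64: the read is R1 (read1)
128: the read is R2 (read2)
256: secondary alignment. A read may align to several places in the reference; only one of them is the primary alignment and the others are secondary
512: the read failed quality control (QC) and is marked as filtered (this flag is rarely used)
1024: the read is a PCR or optical duplicate (usually set by duplicate-marking tools such as Picard MarkDuplicates or samtools markdup)
2048: supplementary alignment. When different parts of one read align to different places (a chimeric alignment, e.g. across a structural-variant breakpoint), one part is reported as the representative alignment and the other parts are flagged as supplementary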
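All of these values are powers of two, so each one occupies its own bit and any combination of them adds up to a unique number. For example, 65 can only be 1 + 64, meaning the read comes from paired-end sequencing and is read1.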
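The samtools flags command shows what a FLAG value means, for example:
$ samtools flags 77
0x4d 77 PAIRED,UNMAP,MUNMAP,READ1
The FLAG value is 77:
PAIRED: the read comes from paired-end sequencing, value 1;
UNMAP: the read is not mapped to the reference, value 4;
MUNMAP: the mate of this read is not mapped to the reference, value 8;
READ1: the read is R1, value 64.
These values add up to 77 (1 + 4 + 8 + 64).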
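$ samtools flags 141
0x8d 141 PAIRED,UNMAP,MUNMAP,READ2
The FLAG value is 141:
PAIRED: the read comes from paired-end sequencing, value 1;
UNMAP: the read is not mapped to the reference, value 4;
MUNMAP: the mate of this read is not mapped to the reference, value 8;
READ2: the read is R2, value 128.
These values add up to 141 (1 + 4 + 8 + 128).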
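3) RNAME
Reference sequence name
Usually the name of a chromosome (or other sequence) in the reference genome; if the read is not aligned, * is used. An unmapped read whose mate is mapped is conventionally given the same RNAME and POS as its mate.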