Run a local BLAST
2015-07-02 17:18
In this context, 'local' means that you run BLAST on your own server rather than at NCBI or on anyone else's server. This lets you compare your query either against precomputed databases (such as NR, Swiss-Prot, or TrEMBL) or against a customized database containing a specific set of sequences.
Query Sequence Set: contigs.supercontigs.filtered_012808.fasta
Reference Database: all_gene-CDS-111313.txt
(Hint: choose among: BLASTP, BLASTN, BLASTX, TBLASTN, TBLASTX)
BLAST Reference: http://www.ncbi.nlm.nih.gov/BLAST/blast_program.shtml
Example 1: blastn, which compares a nucleotide query sequence against a nucleotide sequence database
1. Generate a customized search database
$ makeblastdb -dbtype nucl -in contigs.supercontigs.filtered_012808.fasta
2. BLAST all_gene-CDS-111313.txt against the new database
$ blastn -query all_gene-CDS-111313.txt -db contigs.supercontigs.filtered_012808.fasta -outfmt 6 -out gene.blastn
3. Wait until it finishes, then take a look at the output:
$ less gene.blastn
4. The BLAST tabular format (as specified by -outfmt 6) has multiple columns and is the easiest output format to work with; see http://www.pangloss.com/wiki/Blast
Cut out the second column (-f 2), which holds the subject (hit) sequence IDs:
$ cut -f 2 gene.blastn | sort -u > hits.ids
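To make the column layout concrete, here is a small sketch that runs on a made-up three-row file in the default -outfmt 6 layout. The IDs and numbers are invented for illustration and are not real BLAST output; only the column order is the standard default.

```shell
#!/bin/sh
# Made-up sample rows in -outfmt 6 layout (tab-separated); not real results.
workdir=$(mktemp -d)
sample="$workdir/sample.blastn"
printf 'geneA\tcontig1\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-95\t350\n'    > "$sample"
printf 'geneA\tcontig7\t81.00\t100\t19\t0\t1\t100\t10\t109\t2e-12\t80.5\n'  >> "$sample"
printf 'geneB\tcontig1\t99.00\t300\t3\t0\t1\t300\t900\t1199\t1e-150\t530\n' >> "$sample"

# Default column order:
#   1 qseqid  2 sseqid  3 pident  4 length  5 mismatch  6 gapopen
#   7 qstart  8 qend    9 sstart  10 send   11 evalue   12 bitscore
# Print query ID, subject ID, percent identity, E-value and bit score.
awk -F '\t' -v OFS='\t' '{ print $1, $2, $3, $11, $12 }' "$sample"

# Unique subject IDs, the same cut | sort -u step as in the walkthrough.
cut -f 2 "$sample" | sort -u > "$workdir/hits.ids"
cat "$workdir/hits.ids"
```

Because contig1 is hit by two different queries, sort -u collapses it to a single line, which is why the number of lines counts unique hit sequences rather than alignments.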
5. How many unique hits did we get? This equals the number of lines in the file:
$ wc -l hits.ids
6. It appears we are getting lots of hits. We may need to go back to the BLAST command and add an E-value cutoff to make the search more stringent. How do we add an E-value cutoff? Let's look at the help for blastn:
$ blastn -help
7. Now run BLAST with more stringent settings (E-value cutoff: 1e-20):
$ blastn -query all_gene-CDS-111313.txt -db contigs.supercontigs.filtered_012808.fasta -outfmt 6 -out gene.blastn -evalue 1e-20
$ cut -f 2 gene.blastn | sort -u > hits.ids
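If rerunning BLAST is slow, one possible alternative is to filter an existing tabular file on its E-value column (column 11) with awk. The sketch below uses a made-up sample file and invented output file names (strict.blastn, strict.ids). It assumes the earlier run used a looser cutoff, so every row that passes 1e-20 is already in the file; the result may still differ from a rerun in edge cases such as per-query hit limits.

```shell
#!/bin/sh
# Made-up sample rows in -outfmt 6 layout (tab-separated); not real results.
workdir=$(mktemp -d)
sample="$workdir/sample.blastn"
printf 'geneA\tcontig1\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-95\t350\n'    > "$sample"
printf 'geneA\tcontig7\t81.00\t100\t19\t0\t1\t100\t10\t109\t2e-12\t80.5\n'  >> "$sample"
printf 'geneB\tcontig1\t99.00\t300\t3\t0\t1\t300\t900\t1199\t1e-150\t530\n' >> "$sample"

# Keep rows whose E-value (column 11) is at most 1e-20.
# Adding 0 forces awk to compare the field as a number, not as a string.
awk -F '\t' '($11 + 0) <= 1e-20' "$sample" > "$workdir/strict.blastn"

# Rebuild the list of unique hit IDs from the filtered rows.
cut -f 2 "$workdir/strict.blastn" | sort -u > "$workdir/strict.ids"
cat "$workdir/strict.ids"
```

Here the contig7 row (E-value 2e-12) is dropped, leaving contig1 as the only hit.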
8. Extract sequences from contigs.supercontigs.filtered_012808.fasta
The faSomeRecords command can extract multiple sequences at once. Let's use it to pull out all the BLAST hits and retrieve a gene family:
$ faSomeRecords contigs.supercontigs.filtered_012808.fasta hits.ids hits.fasta
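If faSomeRecords is not installed, a rough awk substitute can do a similar job. This is only a sketch: it matches each ID against the first word of the FASTA header line, and the toy FASTA file and IDs below are invented for illustration.

```shell
#!/bin/sh
# Toy FASTA file and ID list, made up for this example.
workdir=$(mktemp -d)
cat > "$workdir/toy.fasta" <<'EOF'
>contig1 length=12
ACGTACGT
ACGT
>contig2 length=4
GGGG
>contig7 length=4
TTTT
EOF
printf 'contig1\ncontig7\n' > "$workdir/hits.ids"

# First file (hits.ids): remember every wanted ID.
# Second file (FASTA): on each header line, decide whether to keep the record;
# the bare "keep" pattern then prints header and sequence lines while it is set.
awk 'NR == FNR { want[$1] = 1; next }
     /^>/      { id = substr($1, 2); keep = (id in want) }
     keep' "$workdir/hits.ids" "$workdir/toy.fasta" > "$workdir/hits.fasta"

cat "$workdir/hits.fasta"
```

Multi-line sequences are handled because the keep flag stays set until the next header line is read.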
Example 2: blastp, which compares an amino acid query sequence against a protein sequence database
1. makeblastdb -dbtype prot -in rice-target.fasta
2. blastp -query rice-aa.fasta -db rice-target.fasta -outfmt 6 -out rice.blastp
3. less rice.blastp
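A common next step with tabular output like rice.blastp is to keep only the best hit for each query. The sketch below picks the row with the highest bit score (column 12) per query ID. The sample rows and the best.blastp file name are made up for illustration, and "best by bit score" is just one reasonable criterion.

```shell
#!/bin/sh
# Made-up sample rows in -outfmt 6 layout (tab-separated); not real results.
workdir=$(mktemp -d)
sample="$workdir/sample.blastp"
printf 'protA\ttarget3\t45.00\t210\t110\t5\t1\t210\t3\t208\t3e-40\t150\n'   > "$sample"
printf 'protA\ttarget9\t88.00\t215\t25\t1\t1\t215\t1\t214\t1e-120\t410\n'  >> "$sample"
printf 'protB\ttarget2\t60.00\t100\t40\t0\t5\t104\t7\t106\t2e-30\t115\n'   >> "$sample"

# For each query ID (column 1), remember the row with the highest bit score
# (column 12). The final sort only makes the output order predictable.
awk -F '\t' '
  !($1 in best) || ($12 + 0) > score[$1] { best[$1] = $0; score[$1] = $12 + 0 }
  END { for (q in best) print best[q] }
' "$sample" | sort > "$workdir/best.blastp"

cut -f 1,2,12 "$workdir/best.blastp"
```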