<Next-Generation Sequencing> Downloading SRA Files from NCBI
2016-04-06 18:19
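As sequencing technology keeps improving, the volume of next-generation sequencing data is growing exponentially.
NCBI provides the SRA (Sequence Read Archive) database to store these data:
http://www.ncbi.nlm.nih.gov/sra
To make the data easier to download and analyze, NCBI provides a set of command-line tools, the SRA Toolkit (sra-toolkit).
Official documentation:
http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc
The toolkit includes the following commands: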
prefetch: Allows command-line downloading of SRA, dbGaP, and ADSP data
fastq-dump: Convert SRA data into fastq format (supports paired-end data)
sam-dump: Convert SRA data to sam format
sra-pileup: Generate pileup statistics on aligned SRA data
vdb-config: Display and modify VDB configuration information
vdb-decrypt: Decrypt non-SRA dbGaP data ("phenotype data")
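Before running any of the commands above, it can help to confirm that they are actually on your PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name chosen for illustration, not part of the toolkit.

```shell
#!/usr/bin/env bash
# Print every command name from the arguments that is NOT found on PATH.
# missing_tools is a hypothetical helper, not an sra-toolkit command.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report which of the toolkit commands listed above are missing (prints nothing if all are installed).
missing_tools prefetch fastq-dump sam-dump sra-pileup vdb-config vdb-decrypt
```

An empty result suggests the toolkit's bin directory is already on PATH.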
prefetch
Common options
Data transfer:
  # Whether to download again when the file already exists; the default is not to
  -f | --force <value>      Force object download. One of: no, yes, all.
                            no [default]: Skip download if the object is found and complete;
                            yes: Download it even if it is found and is complete;
                            all: Ignore lock files (stale locks or if it is currently being downloaded: use at your own risk!).
  # Transport method, ascp or http; by default ascp is tried first, then http
  --transport <value>       Value one of: ascp (only), http (only), both (first try ascp, fallback to http). Default: both.
  # List the contents and sizes of a kart file
  # (you can put the items you want to download into a kart file)
  -l | --list               List the contents of a kart file.
  -s | --list-sizes         List the content of kart file with target file sizes.
  # Minimum file size
  -N | --min-size <size>    Minimum file size to download in KB (inclusive).
  # Maximum file size
  -X | --max-size <size>    Maximum file size to download in KB (exclusive). Default: 20G.
  # Download order
  -o | --order <value>      Kart prefetch order. One of: kart (in kart order), size (by file size: smallest first). default: size.
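Because the size limits above are documented in KB, raising the cap for a large run means converting units first. The sketch below only prints the resulting command instead of running it; `gb_to_kb` and `prefetch_capped_cmd` are hypothetical helper names, and it assumes 1 GB = 1024 * 1024 KB. Newer toolkit releases may accept other size notations, so check `prefetch --help` on your installation.

```shell
#!/usr/bin/env bash
# Convert whole gigabytes to kilobytes (assumes 1 GB = 1024 * 1024 KB).
gb_to_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print (rather than run) a prefetch call capped at the given size in GB.
# prefetch_capped_cmd is a hypothetical helper name.
prefetch_capped_cmd() {
    local acc="$1" max_gb="$2"
    printf 'prefetch --max-size %s %s\n' "$(gb_to_kb "$max_gb")" "$acc"
}

# Show the command for a 50 GB cap on the run used in this article.
prefetch_capped_cmd ERR732926 50
```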
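Examples
prefetch ERR732926
Downloads the files for run ERR732926 directly; by default they are placed in the ~/ncbi/public/sra directory.
prefetch cart_0.krt
Downloads every item listed in the kart file.
prefetch -l cart_0.krt
Lists the contents of cart_0.krt.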
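If you have a plain text file of accessions rather than a kart file, one possible approach is to loop over it and call prefetch once per line. This is only a sketch: `fetch_all`, `build_prefetch_cmd`, and the `DRY_RUN` variable are hypothetical names introduced here, and with the default `DRY_RUN=1` the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Print the prefetch command line for one accession.
# --force no skips runs that are already downloaded and complete.
build_prefetch_cmd() {
    printf 'prefetch --force no %s\n' "$1"
}

# Read accessions (one per line) from a file; print or run one prefetch per accession.
# DRY_RUN is a hypothetical switch: 1 (default) prints the commands, 0 runs them.
fetch_all() {
    local list="$1" acc
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Skip blank lines and lines starting with '#'.
        case "$acc" in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-1}" = "1" ]; then
            build_prefetch_cmd "$acc"
        else
            prefetch --force no "$acc"
        fi
    done < "$list"
}
```

For example, `fetch_all accessions.txt` previews the commands, and `DRY_RUN=0 fetch_all accessions.txt` performs the downloads (`accessions.txt` is a placeholder file name).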
fastq-dump
General:
  -h | --help               Displays ALL options, general usage, and version information.
  -V | --version            Display the version of the program.
Data formatting:
  # Split paired-end data
  --split-files             Dump each read into separate file. Files will receive suffix corresponding to read number.
  --split-spot              Split spots into individual reads.
  # FASTA only, without quality scores
  --fasta <[line width]>    FASTA only, no qualities. Optional line wrap width (set to zero for no wrapping).
  -I | --readids            Append read id after spot id as 'accession.spot.readid' on defline.
  -F | --origfmt            Defline contains only original sequence name.
  -C | --dumpcs <[cskey]>   Formats sequence using color space (default for SOLiD). "cskey" may be specified for translation.
  -B | --dumpbase           Formats sequence using base space (default for other than SOLiD).
  -Q | --offset <integer>   Offset to use for ASCII quality scores. Default is 33 ("!").
Filtering:
  -N | --minSpotId <rowid>  Minimum spot id to be dumped. Use with "X" to dump a range.
  -X | --maxSpotId <rowid>  Maximum spot id to be dumped. Use with "N" to dump a range.
  -M | --minReadLen <len>   Filter by sequence length >= <len>
  --skip-technical          Dump only biological reads.
  --aligned                 Dump only aligned sequences. Aligned datasets only; see sra-stat.
  --unaligned               Dump only unaligned sequences. Will dump all for unaligned datasets.
# Output
Workflow and piping:
  -O | --outdir <path>      Output directory, default is current working directory ('.').
  -Z | --stdout             Output to stdout, all split data become joined into single stream.
  --gzip                    Compress output using gzip.
  --bzip2                   Compress output using bzip2.
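The -N and -X filters listed above can be combined to dump a range of spots. A minimal sketch, assuming you want the range streamed to stdout; `dump_range_cmd` is a hypothetical helper that only prints the command so you can inspect it before running it.

```shell
#!/usr/bin/env bash
# Print a fastq-dump command that streams spots first..last to stdout.
# dump_range_cmd is a hypothetical helper, not part of the toolkit.
dump_range_cmd() {
    local acc="$1" first="$2" last="$3"
    if [ "$first" -gt "$last" ]; then
        echo "first spot id must not exceed last spot id" >&2
        return 1
    fi
    printf 'fastq-dump -N %s -X %s -Z %s\n' "$first" "$last" "$acc"
}

# Show the command for spots 101 through 200 of the example run.
dump_range_cmd SRR390728 101 200
```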
Examples
fastq-dump -X 5 -Z SRR390728
Prints the first five spots (20 lines) of SRR390728 to standard output, without downloading the run beforehand.
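The "five spots, 20 lines" figure follows from the FASTQ layout of four lines per record. As a rough sanity check, the sketch below counts records on stdin under that assumption; `count_fastq_records` is a hypothetical helper, and the two records fed to it are synthetic data made up for the demonstration.

```shell
#!/usr/bin/env bash
# Count FASTQ records on stdin, assuming exactly four lines per record.
# Exits non-zero if the line count is not a multiple of four.
count_fastq_records() {
    awk 'END { if (NR % 4 != 0) exit 1; print NR / 4 }'
}

# Demonstration on two synthetic records (not real SRA data).
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | count_fastq_records
```

Piping `fastq-dump -X 5 -Z SRR390728` into `count_fastq_records` would be expected to report 5 if the output has the layout described above.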
fastq-dump -I --split-files SRR390728
Handles paired-end data: produces two fastq files (--split-files) containing ".1" and ".2" read suffixes (-I).
fastq-dump --split-files --fasta 60 SRR390728
Produces two (--split-files) fasta files (--fasta) with 60 bases per line ("60" included after --fasta).
fastq-dump --split-files --aligned -Q 64 SRR390728
Produces two fastq files (--split-files) that contain only aligned reads (--aligned; note: only for files submitted as aligned data), with a quality offset of 64 (-Q 64). Please see the documentation on vdb-dump if you wish to produce fasta/qual data.
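For paired-end runs it is often convenient to write compressed output into a dedicated directory and then confirm that both mate files exist. The sketch below combines --split-files, --gzip, and -O from the option list; `pe_file` and `dump_paired` are hypothetical helpers, and the `<accession>_1.fastq.gz` / `<accession>_2.fastq.gz` naming is an assumption about how the split files are named.

```shell
#!/usr/bin/env bash
# Print the path expected for mate N of a paired-end run.
# Assumes split files are named <accession>_<N>.fastq.gz when --gzip is used.
pe_file() {
    printf '%s/%s_%s.fastq.gz\n' "$1" "$2" "$3"
}

# Dump a paired-end run into outdir as gzip-compressed fastq,
# then succeed only if both mate files exist and are non-empty.
dump_paired() {
    local outdir="$1" acc="$2"
    fastq-dump --split-files --gzip -O "$outdir" "$acc" || return 1
    [ -s "$(pe_file "$outdir" "$acc" 1)" ] && [ -s "$(pe_file "$outdir" "$acc" 2)" ]
}

# Show where the two mate files would be written; nothing is downloaded here.
pe_file fastq SRR390728 1
pe_file fastq SRR390728 2
```

Calling `dump_paired fastq SRR390728` would perform the actual conversion and requires the toolkit to be installed.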
Only the most commonly used commands are listed here; for anything else, please consult the official documentation.