Extract User Defined Region From An Chromosome Fasta File
2016-12-30 09:59
260 查看
Extract User Defined Region From An Chromosome Fasta File
小麦基因组一条染色体序列至少有500M,我们想要获得染色体某一段的序列还是比较难办的。除了samtool可以做到,今天发现一个用python写的脚本pyfaidx,使用起来也比较方便。首先安装
pip install --user pyfaidx #安装在自己账户的 $HOME/.local/bin/faidx sudo pip install pyfaidx #需要权限,安装在/usr/local/bin/faidx
输入命令 faidx 默认返回使用帮助
faidx $ faidx usage: faidx [-h] [-b BED] [-o OUT] [-i {bed,chromsizes,nucleotide,transposed}] [-c] [-r] [-a SIZE_RANGE] [-n | -f] [-x] [-l] [-s DEFAULT_SEQ] [-d DELIMITER] [-g REGEX | -v] [-m | -M] [--no-rebuild] [--version] fasta [regions [regions ...]] Fetch sequences from FASTA. If no regions are specified, all entries in the input file are returned. Input FASTA file must be consistently line-wrapped, and line wrapping of output is based on input line lengths. positional arguments: fasta FASTA file regions space separated regions of sequence to fetch e.g. chr1:1-1000 optional arguments: -h, --help show this help message and exit -b BED, --bed BED bed file of regions -o OUT, --out OUT output file name (default: stdout) -i {bed,chromsizes,nucleotide,transposed}, --transform {bed,chromsizes,nucleotide,transposed} transform the requested regions into another format. default: None -c, --complement complement the sequence. default: False # -r, --reverse reverse the sequence. default: False -a SIZE_RANGE, --size-range SIZE_RANGE selected sequences are in the size range [low, high]. example: 1,1000 default: None -n, --no-names omit sequence names from output. default: False -f, --full-names output full names including description. default: False -x, --split-files write each region to a separate file (names are derived from regions) -l, --lazy fill in --default-seq for missing ranges. default: False -s DEFAULT_SEQ, --default-seq DEFAULT_SEQ default base for missing positions and masking. default: N -d DELIMITER, --delimiter DELIMITER delimiter for splitting names to multiple values (duplicate names will be discarded). default: None -g REGEX, --regex REGEX selected sequences are those matching regular expression. default: .* -v, --invert-match selected sequences are those not matching 'regions' argument. default: False -m, --mask-with-default-seq mask the FASTA file using --default-seq default: False -M, --mask-by-case mask the FASTA file by changing to lowercase. default: False --no-rebuild do not rebuild the .fai index even if it is out of date. default: False --version print pyfaidx version number Please cite: Shirley MD, Ma Z, Pedersen BS, Wheelan SJ. (2015) Efficient "pythonic" access to FASTA files using pyfaidx. PeerJ PrePrints 3:e1196 https://dx.doi.org/10.7287/peerj.preprints.970v1[/code]
从上边的帮助信息了解到除了获取指定的序列外还有诸如倒序反转等等功能。
举个最简单的例子,想要获取chr5B_part2上75800000-76000000的序列,则命令如下faidx wheat_genome.fasta chr5B_part2:75800000-76000000 # 默认返回序列屏幕 >chr5B_part2:75800000-76000000 GTCGTCTGCGAAGCCCCTACCAGAAATCCGCAACCTCCATGACATCCACCACTACACCAG GCCTCCACGTCTCTTGTCCTCTGCCGCCAATCCACACCTCTTGTCCGCCCACCATGTGGT ACCACATCCGCCCCTATAGAAGTCACCACGCCATGCAAGTGTTCGGTGAAATGCTCATGC TAATTTCCTTGTCCCTTCTTTATACAGTAATGGATTCAGACATGGAGAACATATATGAGC GCTATGTTGAGTCGTCCGAGGAGCAAGACTACATGGATGAAATGATAATGATGCAGGTGG TCCTTGCAGACGCGGAGCAAGCAGAAGAGCATGTTCTCAATTTCAAGGGATCCATCAA.... #剩下的序列未列出
相关文章推荐
- jar - user of jar command to list/extract file from a jar file
- Extract sequences from FASTA file based on a pair of list
- Extract Sobel feature from an image
- Extract UserControls from IssueVision Part I -- PaneCaption
- Extract audio from video file.
- The path "fos_user.from_email.address" cannot contain an empty value, but got null.
- How to create an XTR file from XML in Delphi XE4 using XML Mapper?(delphi中如何通过xmlmapper创建xtr文件)
- How to access UI controls from a user defined class (CFormView based SDI application)
- How to extract .dmg further .pkg file from Leopard
- An association from the table User refers to an unmapped class
- org.hibernate.MappingException: An association from the table tuser refers to an unmapped c
- From Source Code to an Executalbe File ----- Learning of Gcc on Linux Operating Systems
- How to manually remove an infected file from your computer
- Use jQuery To Retrieve Data From An XML File
- Security Tutorials系列第十二章:Building an Interface to Select One User Account from Many
- Hibernate学习: 异常 identifier of an instance of com.zhssh.vo.TUser was altered from 1 to 1
- How to manually remove an infected file from your computer
- Recovering an InnoDB table from only an .ibd file
- linux extract a file from a rpm, ko, deb file
- Alert: Querying v$asm_file Gives ORA-15196 After ASM Was Upgraded From 10gR2 To 11gR2 with an AU size > 1M [ID 1145365.1]