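fakit: a small tool for processing FASTA sequences.

I have been learning Rust on and off and wanted to write something simple, mainly to get familiar with the syntax. This time I wrote fakit for simple processing of FASTA files. It has only a few options, and they can be combined through pipes. The main reason is that I don't know how to write the more complex features, haha.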

github:

https://github.com/sharkLoc/fakit

install

git clone https://github.com/sharkLoc/fakit.git
cd fakit
cargo b --release
# mv target/release/fakit to anywhere you want 

usage

fakit -h
fqkit: a simple program for fasta file manipulation

Usage: fakit [OPTIONS] [INPUT]

Arguments:
  [INPUT]  input fasta[.gz] file

Options:
  -u, --upper           convert base to uppercase
  -l, --lower           convert base to lowercase
  -w, --length <LEN>    base number of each line, 0 for long single line
  -f, --fake <FAKE>     fasta to fastq and generate fake fastq quality
  -d, --drop <DROP>     drop sequences with length shorter than int
  -c, --convert <CONV>  r for reverse seq, m for match seq
  -s, --summary         simple statistics of fasta file
  -h, --help            Print help information
  -V, --version         Print version information

example

test.fa

>s1
GAGATCGGAGAAGATAGTTTTAGGGTTTGAGATTGAGAAGAAGATGAAGAAAATTTATGA
>s2
gactnacntacnncGCACAAACAGGACgatgatgttgatCCGTGTGTGTACGTGAGTTGG
>s3
GAGAGACTCTTCGTAAGACAGTAAGATTGTGAAAGTCA

fakit -u test.fa

>s1
GAGATCGGAGAAGATAGTTTTAGGGTTTGAGATTGAGAAGAAGATGAAGAAAATTTATGA
>s2
GACTNACNTACNNCGCACAAACAGGACGATGATGTTGATCCGTGTGTGTACGTGAGTTGG
>s3
GAGAGACTCTTCGTAAGACAGTAAGATTGTGAAAGTCA

fakit -u test.fa | fakit -w 30

>s1
GAGATCGGAGAAGATAGTTTTAGGGTTTGA
GATTGAGAAGAAGATGAAGAAAATTTATGA
>s2
GACTNACNTACNNCGCACAAACAGGACGAT
GATGTTGATCCGTGTGTGTACGTGAGTTGG
>s3
GAGAGACTCTTCGTAAGACAGTAAGATTGT
GAAAGTCA
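
For readers curious how the line wrapping could work, here is a minimal Rust sketch. It is not taken from the fakit source; the function name `wrap_sequence` is made up for illustration, and it assumes the sequence is plain ASCII.

```rust
/// Split a sequence into lines of at most `width` bases.
/// A width of 0 keeps the whole sequence on one line, mirroring `-w 0`.
/// (Hypothetical helper, not fakit's actual code.)
fn wrap_sequence(seq: &str, width: usize) -> Vec<String> {
    if width == 0 || seq.is_empty() {
        return vec![seq.to_string()];
    }
    seq.as_bytes()
        .chunks(width)
        // Assumes ASCII bases, so each byte chunk is valid UTF-8.
        .map(|chunk| String::from_utf8_lossy(chunk).into_owned())
        .collect()
}
```

Calling `wrap_sequence(seq, 30)` on the s3 sequence gives two lines, the second holding the remaining 8 bases, as in the output above.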

fakit -u test.fa | fakit -w 0 | fakit -l

>s1
gagatcggagaagatagttttagggtttgagattgagaagaagatgaagaaaatttatga
>s2
gactnacntacnncgcacaaacaggacgatgatgttgatccgtgtgtgtacgtgagttgg
>s3
gagagactcttcgtaagacagtaagattgtgaaagtca

fakit -u test.fa | fakit -d 50

>s1
GAGATCGGAGAAGATAGTTTTAGGGTTTGAGATTGAGAAGAAGATGAAGAAAATTTATGA
>s2
GACTNACNTACNNCGCACAAACAGGACGATGATGTTGATCCGTGTGTGTACGTGAGTTGG
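
The length filter can be sketched in a few lines of Rust. This is only an illustration under my own assumptions: the `Record` struct and `drop_short` function are hypothetical names, and records exactly as long as the threshold are kept, following the help text ("shorter than").

```rust
/// A minimal FASTA record: header name plus sequence (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Record {
    id: String,
    seq: String,
}

/// Keep only records whose sequence has at least `min_len` bases.
fn drop_short(records: Vec<Record>, min_len: usize) -> Vec<Record> {
    records
        .into_iter()
        .filter(|r| r.seq.len() >= min_len)
        .collect()
}
```

With a threshold of 50, the 38-base s3 is dropped while the 60-base s1 and s2 pass through, which is what the output above shows.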

fakit -c r test.fa

>s1
AGTATTTAAAAGAAGTAGAAGAAGAGTTAGAGTTTGGGATTTTGATAGAAGAGGCTAGAG
>s2
GGTTGAGTGCATGTGTGTGCCtagttgtagtagCAGGACAAACACGcnncatncantcag
>s3
ACTGAAAGTGTTAGAATGACAGAATGCTTCTCAGAGAG
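
Note that the output above is a plain reversal: bases are not complemented and their case is kept. A minimal Rust sketch of that behaviour, with a made-up function name and no claim to match the fakit source:

```rust
/// Reverse a sequence without complementing it; case is preserved.
/// (Hypothetical helper for illustration.)
fn reverse_seq(seq: &str) -> String {
    seq.chars().rev().collect()
}
```

For example, `reverse_seq("ACGTn")` returns `"nTGCA"`.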

fakit -s test.fa

id      base_A  base_T  base_G  base_C  base_N  GC_Rate seq_Len
s1      24      16      19      1       0       0.33    60
s2      14      14      17      15      0       0.53    60
s3      14      9       10      5       0       0.39    38
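
The columns of this table could be computed roughly as in the Rust sketch below. It is an assumption-laden illustration, not fakit's code: `BaseStats` and `base_stats` are hypothetical names, counting is case-insensitive, and the GC rate is taken over the full sequence length. Because the sketch counts n/N separately, it would report 11 C and 4 N for s2, which differs from the s2 row above (15 and 0), so fakit evidently counts differently.

```rust
/// Per-sequence base counts (hypothetical struct, fields follow the table columns).
#[derive(Debug, Default, PartialEq)]
struct BaseStats {
    a: usize,
    t: usize,
    g: usize,
    c: usize,
    n: usize,
    len: usize,
}

impl BaseStats {
    /// GC fraction over the full sequence length; 0.0 for an empty sequence.
    fn gc_rate(&self) -> f64 {
        if self.len == 0 {
            0.0
        } else {
            (self.g + self.c) as f64 / self.len as f64
        }
    }
}

/// Count bases case-insensitively (an assumption of this sketch).
fn base_stats(seq: &str) -> BaseStats {
    let mut s = BaseStats::default();
    for b in seq.bytes() {
        match b.to_ascii_uppercase() {
            b'A' => s.a += 1,
            b'T' => s.t += 1,
            b'G' => s.g += 1,
            b'C' => s.c += 1,
            b'N' => s.n += 1,
            _ => {} // any other symbol counts toward the length only
        }
        s.len += 1;
    }
    s
}
```

Formatting `base_stats(seq).gc_rate()` with `{:.2}` gives a two-decimal value like the GC_Rate column.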

fakit -f E test.fa

@s1
GAGATCGGAGAAGATAGTTTTAGGGTTTGAGATTGAGAAGAAGATGAAGAAAATTTATGA
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
@s2
gactnacntacnncGCACAAACAGGACgatgatgttgatCCGTGTGTGTACGTGAGTTGG
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
@s3
GAGAGACTCTTCGTAAGACAGTAAGATTGTGAAAGTCA
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
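
The FASTA to FASTQ conversion above repeats one quality character for every base. A minimal Rust sketch of that idea follows; the function name `fasta_to_fastq` is hypothetical, and it assumes an ASCII sequence so that byte length equals base count.

```rust
/// Build one FASTQ record with a constant quality character per base.
/// (Hypothetical helper; assumes an ASCII sequence.)
fn fasta_to_fastq(id: &str, seq: &str, qual: char) -> String {
    // One quality character for each base in the sequence.
    let quality: String = std::iter::repeat(qual).take(seq.len()).collect();
    format!("@{}\n{}\n+\n{}\n", id, seq, quality)
}
```

For example, `fasta_to_fastq("s", "ACGT", 'E')` yields the four lines `@s`, `ACGT`, `+` and `EEEE`.
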
posted @ 2022-10-30 20:15  天使不设防