fasta多行文件处理

1.创建fa文件,如下,命令为1.fa
>SOX2    
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGGAC
>POU5F1    
CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT
CGGAAGGTAGTCGTCAGTGCAGCGAGTCC
>NANOG    
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGG
ACGAGGGACGCATCGGACGACTGCAGGACTGTC
ACGAGGGACGCATCGGACGACTGCAGGACTGT

2.给>开头的行的行尾加上TAB键,以便隔开名字和序列,

sed 's/^\(>.*\)/\1\t/' test.fasta | cat -A   > 2.fa(cat -A可以显示所有的符号)  ###  \(\)表示记录匹配的内容,\1则表示()中记录的匹配的内容(没怎么看懂这个命令)

结果如下:

>SOX2^I$
ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
ACGAGGGACGCATCGGACGACTGCAGGAC$
>POU5F1^I$
CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT$
CGGAAGGTAGTCGTCAGTGCAGCGAGTCC$
>NANOG^I$
ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
ACGAGGGACGCATCGGACGACTGCAGG$
ACGAGGGACGCATCGGACGACTGCAGGACTGTC$

3.把所有的换行符替换为空格,tr

cat 2.fa  | tr '\n'   ' ' > 3.fa

>SOX2     ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGAC >POU5F1     CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT CGGAAGGTAGTCGTCAGTGCAGCGAGTCC >NANOG     ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGG ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGT

 

4.把最后一个空格替换成换行符

sed -e 's/ $/\n/' 3.fa > 4.fa

5.把‘ >’替换成换行符(空格+>)

sed -e 's/ >/\n>/g' 4.fa  > 5.fa

>SOX2     ACGAGGGACGCATCGGACGACTGCAGGACTGTC   ACGAGGGACGCATCGGACGACTGCAGGACTGTC   ACGAGGGACGCATCGGACGACTGCAGGAC
>POU5F1     CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT  CGGAAGGTAGTCGTCAGTGCAGCGAGTCC
>NANOG     ACGAGGGACGCATCGGACGACTGCAGGACTGTC   ACGAGGGACGCATCGGACGACTGCAGG ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGT

6.把所有的空格替换掉

sed 's/  //g' 5.fa > 6.fa

>SOX2    ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGAC
>POU5F1    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGTCGGAAGGTAGTCGTCAGTGCAGCGAGTCC
>NANOG ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGT

7.把TAB键转换为换行符

sed 's/\t/\n/g'  6.fa > 7.fa

>SOX2
ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGAC
>POU5F1
CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGTCGGAAGGTAGTCGTCAGTGCAGCGAGTCC
>NANOG
ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGT

 

posted @ 2017-10-23 10:13  遗世独立的愚公  阅读(1418)  评论(0)    收藏  举报