BAM sort issue
Author: Sunny-King
Published: 2022-07-22
Original link: https://www.cnblogs.com/Sunny-King/p/Bioinformatics-Bam_sort.html
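After merging two BAM files, I found that the order of the tags was inconsistent between records. Tracing this back showed that sorting tools reorder the tags while sorting, and different tools reorder them differently.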
I. Problem
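While inspecting BAM files recently, I noticed something by chance. Two BAM files that had been extracted from the same source file showed inconsistent tag order once they were merged. The investigation showed that one of the two BAM files had been sorted, and the sorting step had changed the order of the tags in its records.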
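One way to see this symptom is to group records by the order of their tag names. The minimal Python sketch below assumes SAM text as printed by `samtools view`: tab-separated, with 11 mandatory columns followed by TAG:TYPE:VALUE fields. The names `tag_signature` and `count_signatures` are hypothetical helpers, not part of any library.

```python
from collections import Counter
from typing import Iterable, Tuple


def tag_signature(sam_line: str) -> Tuple[str, ...]:
    """Return the two-letter tag names of one SAM record, in file order.

    Assumes tab-separated SAM text: 11 mandatory columns followed by
    optional TAG:TYPE:VALUE fields.
    """
    fields = sam_line.rstrip("\n").split("\t")
    return tuple(field[:2] for field in fields[11:])


def count_signatures(sam_lines: Iterable[str]) -> Counter:
    """Count how many records share each tag order.

    Header lines (starting with '@') and blank lines are skipped.
    """
    return Counter(
        tag_signature(line)
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )
```

Suppose the lines of `samtools view merged.bam` are fed to `count_signatures`, and the merged file mixes sorted and unsorted sources. In that case more than one signature would be expected. Reads that carry extra tags (for example SA:Z) also form their own groups.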
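To confirm that the sorting software was responsible, I sorted the same BAM file with both Picard SortSam and samtools sort. I then compared how the tag order changed in each output.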
II. Testing
1. Test commands
# Samtools
samtools sort test.bam -o test.samtools_sort.bam
# Picard
java ${JAVA_OPTS} -jar ${PICARD} SortSam \
I=test.bam \
O=test.picard_sort.bam \
SORT_ORDER=coordinate
2. Results
Two reads were picked at random for inspection.
Input reads (original BAM)
UMI-TAATA-TGCCT-5427_1 177 chr1 5531170 60 93M chr9 115986372 0 CACCAGCAGACACGCGGCTGGACCAGGATTTGAGGCAAGCTGCAGCATTTCCTCCTGGTGCTGTTAGTGGTCTTCCCAGTAAGGAGTCTACAA tlttprmttttpbtrttutusmsUttttrtmtnttttqnrmtstttkttiotptlttttttqtttttttttttltrttuttsqtttrtttsts NM:i:0 MD:Z:93 MC:Z:93M AS:i:93 XS:i:21 RG:Z:lib0225 BD:Z:jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjN BI:Z:llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllN MI:Z:TAATA-TGCCT-5427 RX:Z:TAATA-TGCCT XR:Z:TAATA-TGCCT-0 XZ:Z:1,2
UMI-ATAAT-GTGTT-32_1 145 chr1 3213457 60 29S35M chr7 55086650 0 GCGAGCTAGACGTCCGGGCAGCCCCCGGCGCAGCACGGCTTAGCCTCACCCACCTTAACGCGCT F-FEF=FEEFDEEFBDDFCEECEFEFDDDDDDDjZnmimljmkmmTjnnlmnnmlllmhlklnl NM:i:0 MD:Z:35 MC:Z:93M AS:i:35 XS:i:21 RG:Z:lib0225 SA:Z:chr7,55086703,-,34M30S,60,0; BD:Z:HHHHHHHHHHHHHHHHHHHHHHHHHHHHHEEEEiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiN BI:Z:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjN MI:Z:ATAAT-GTGTT-32 RX:Z:ATAAT-GTGTT XR:Z:ATAAT-GTGTT-0 XZ:Z:1,1
samtools sort output
UMI-TAATA-TGCCT-5427_1 177 chr1 5531170 60 93M chr9 115986372 0 CACCAGCAGACACGCGGCTGGACCAGGATTTGAGGCAAGCTGCAGCATTTCCTCCTGGTGCTGTTAGTGGTCTTCCCAGTAAGGAGTCTACAA tlttprmttttpbtrttutusmsUttttrtmtnttttqnrmtstttkttiotptlttttttqtttttttttttltrttuttsqtttrtttsts NM:i:0 MD:Z:93 MC:Z:93M AS:i:93 XS:i:21 BD:Z:jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjN BI:Z:llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllN MI:Z:TAATA-TGCCT-5427 RX:Z:TAATA-TGCCT XR:Z:TAATA-TGCCT-0 XZ:Z:1,2 RG:Z:lib0225
UMI-ATAAT-GTGTT-32_1 145 chr1 3213457 60 29S35M chr7 55086650 0 GCGAGCTAGACGTCCGGGCAGCCCCCGGCGCAGCACGGCTTAGCCTCACCCACCTTAACGCGCT F-FEF=FEEFDEEFBDDFCEECEFEFDDDDDDDjZnmimljmkmmTjnnlmnnmlllmhlklnl NM:i:0 MD:Z:35 MC:Z:93M AS:i:35 XS:i:21 SA:Z:chr7,55086703,-,34M30S,60,0; BD:Z:HHHHHHHHHHHHHHHHHHHHHHHHHHHHHEEEEiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiN BI:Z:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjN MI:Z:ATAAT-GTGTT-32 RX:Z:ATAAT-GTGTT XR:Z:ATAAT-GTGTT-0 XZ:Z:1,1 RG:Z:lib0225
Picard SortSam output
UMI-TAATA-TGCCT-5427_1 177 chr1 5531170 60 93M chr9 115986372 0 CACCAGCAGACACGCGGCTGGACCAGGATTTGAGGCAAGCTGCAGCATTTCCTCCTGGTGCTGTTAGTGGTCTTCCCAGTAAGGAGTCTACAA tlttprmttttpbtrttutusmsUttttrtmtnttttqnrmtstttkttiotptlttttttqtttttttttttltrttuttsqtttrtttsts MC:Z:93M BD:Z:jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjN MD:Z:93 RG:Z:lib0225 BI:Z:llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllN MI:Z:TAATA-TGCCT-5427 NM:i:0 XR:Z:TAATA-TGCCT-0 AS:i:93 XS:i:21 RX:Z:TAATA-TGCCT XZ:Z:1,2
UMI-ATAAT-GTGTT-32_1 145 chr1 3213457 60 29S35M chr7 55086650 0 GCGAGCTAGACGTCCGGGCAGCCCCCGGCGCAGCACGGCTTAGCCTCACCCACCTTAACGCGCT F-FEF=FEEFDEEFBDDFCEECEFEFDDDDDDDjZnmimljmkmmTjnnlmnnmlllmhlklnl SA:Z:chr7,55086703,-,34M30S,60,0; MC:Z:93M BD:Z:HHHHHHHHHHHHHHHHHHHHHHHHHHHHHEEEEiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiN MD:Z:35 RG:Z:lib0225 BI:Z:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjN MI:Z:ATAAT-GTGTT-32 NM:i:0 XR:Z:ATAAT-GTGTT-0 AS:i:35 XS:i:21 RX:Z:ATAAT-GTGTT XZ:Z:1,1
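Comparing these records by eye is error-prone. The Python sketch below makes the same SAM-text assumption as before. It computes a longest common subsequence (LCS) of the tag orders before and after sorting, and reports the tags outside it as "moved". The LCS approach and the helper names (`tag_names`, `moved_tags`, `compare_tag_order`) are illustrative choices, not features of Picard or samtools.

```python
from typing import Dict, List


def tag_names(sam_line: str) -> List[str]:
    """Tag names (e.g. 'NM', 'RG') of a tab-separated SAM record, in file order."""
    return [field[:2] for field in sam_line.rstrip("\n").split("\t")[11:]]


def moved_tags(before: List[str], after: List[str]) -> List[str]:
    """Tags outside one longest common subsequence (LCS) of the two orders.

    Tag names are unique within a SAM record, so a set is enough to track
    which tags kept their relative order.
    """
    n, m = len(before), len(after)
    # dp[i][j] = LCS length of before[i:] and after[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if before[i] == after[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table once to recover one LCS.
    kept, i, j = set(), 0, 0
    while i < n and j < m:
        if before[i] == after[j]:
            kept.add(before[i])
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return [tag for tag in after if tag not in kept]


def compare_tag_order(original: str, sorted_line: str) -> Dict[str, object]:
    """Compare the optional fields of one read before and after sorting."""
    before, after = tag_names(original), tag_names(sorted_line)
    return {
        "same_tag_set": set(before) == set(after),
        "same_order": before == after,
        "moved": moved_tags(before, after),
    }
```

For the first example read, comparing the input with the samtools sort output reports only RG as moved. Comparing it with the Picard SortSam output reports several tags as moved. In that case, exactly which tags are listed depends on how ties in the LCS are broken.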
III. Conclusions
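- After Picard SortSam, the tag order differs from the original BAM. The tags are reshuffled throughout the record; for example, MC:Z and BD:Z now come first and NM:i moves to the middle.
- After samtools sort, the tag order also differs from the original BAM. In both example reads, RG:Z is moved to the end of the record, while the other tags keep their relative order.
- In BAM files sorted by Picard SortSam or samtools sort, the tag order can also vary from read to read. Reads that carry tags the others lack, such as SA:Z in the second example read, end up with a different layout.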
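Some downstream steps, such as merging or line-by-line comparison, may need a consistent tag layout. One possible workaround is to rewrite every record with its optional fields in a fixed order. This is an assumption here, not something tested in the experiments above. The sketch below sorts the fields alphabetically by tag name. The function name `normalize_tag_order` is hypothetical, and the sketch assumes tab-separated SAM text with 11 mandatory columns.

```python
def normalize_tag_order(sam_line: str) -> str:
    """Return the SAM line with its optional fields sorted by tag name.

    Header lines (starting with '@') are returned unchanged apart from the
    trailing newline. The 11 mandatory columns are left untouched; only the
    order of the TAG:TYPE:VALUE fields changes.
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    core, tags = fields[:11], fields[11:]
    # Sort by the two-letter tag only; tag values are never compared or changed.
    tags.sort(key=lambda field: field[:2])
    return "\t".join(core + tags)
```

The function could be applied line by line to the output of `samtools view -h in.bam`, and the result converted back to BAM with `samtools view -b -o out.bam -`. Alphabetical order is an arbitrary choice. Any fixed order should work, as long as every file goes through the same step before merging.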
Reprinting: reprints are welcome, but please clearly credit the original link and the author in a prominent place.
License: this work is licensed under the Attribution-NonCommercial-NoDerivatives (BY-NC-ND) license.