LeftAlignIndels Alignments added out of order... Offending records are at [chr7:55268881] and [chr7:55268881]
Hello GATK Community,
We've encountered a unique bam that fails LeftAlignIndels when sorted by coordinate, but passes LeftAlignIndels when sorted by query name. We are unable to determine why this bam is producing an error when sorted by coordinate.
Any help would be greatly appreciated.
The following is a description of the issue observed.
Re-sorting the bam by coordinate with Picard-tools or samtools produces an identical error:
picard SortSam -I bedtools_intersected.bam -O bedtools_intersected.picard.sorted.bam -SO coordinate
gatk LeftAlignIndels -R ~/Source/ruo-analysis/archer-dependencies/hg19.fa -I bedtools_intersected.picard.sorted.bam -O broken.bam
Sorting the bam by query name resolves the error:
picard SortSam -I bedtools_intersected.bam -O bedtools_intersected.picard.sorted.queryname.bam -SO queryname
gatk LeftAlignIndels -R ~/Source/ruo-analysis/archer-dependencies/hg19.fa -I bedtools_intersected.picard.sorted.queryname.bam -O not_broken.bam
picard ValidateSamFile produces an ERROR for the file header:
ERROR::MISSING_PLATFORM_VALUE:Read name Ion_V2_BC15_rawlib.basecaller, A platform (PL) attribute was not found for read group.
This ERROR is also present in "not_broken.bam", which was sorted by queryname and made it through LeftAlignIndels.
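To see which read groups trigger MISSING_PLATFORM_VALUE without rerunning ValidateSamFile, a minimal Python sketch along these lines may help. It scans SAM header lines for @RG entries that lack a PL tag. The function name is hypothetical, the lines are assumed to be tab-separated as in the SAM format, and the @RG values are the ones from this BAM's header.

```python
def read_groups_missing_platform(header_lines):
    """Return the IDs of @RG header lines that carry no PL tag."""
    missing = []
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@RG":
            continue
        # Each remaining field is TAG:VALUE; split on the first colon only.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if "PL" not in tags:
            missing.append(tags.get("ID", "<no ID>"))
    return missing


# Header of the affected BAM (tab-separated here by assumption).
header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr7\tLN:159138663",
    "@RG\tID:Ion_V2_BC15_rawlib.basecaller\tSM:Ion_V2_BC15_rawlib.basecaller",
]
print(read_groups_missing_platform(header))  # ['Ion_V2_BC15_rawlib.basecaller']
```

Since the same read group is reported for both the failing and the passing BAM, this check only confirms the header warning and does not explain the LeftAlignIndels failure.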
We used bedtools intersect (at chr7 55268881 55268881) to create a smaller bam (521 entries) with all entries at the same location. The resulting bam (bedtools_intersected.bam) produces the aforementioned error with LeftAlignIndels when sorted by coordinate.
REQUIRED for all errors and issues:
a) GATK version used: 4.4.0.0
b) Exact command used: gatk LeftAlignIndels -R ~/Source/ruo-analysis/archer-dependencies/hg19.fa -I bedtools_intersected.bam -O broken.bam
c) Entire program log:
-
Hi James Covino
I am not sure whether you have already solved your issue, but let me ask: have you observed any unusual reads in the bedtools intersect output, either in IGV or with a plain samtools view command?
Would you be able to share any images of the region?
Also, I noticed that you are running these tools on an Apple Silicon Mac, which is why you are receiving many error messages about library incompatibilities. Can you try running them in a compatible x86_64 compute environment and see whether the issue persists?
Regards
-
Hi @SkyWarrior,
Thanks for the reply.
I ran this on my personal Mac (Apple Silicon), but we also saw and reproduced this error on our deployed instance (Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz).
I'm attaching a sam with two alignments that result in the error from LeftAlignIndels. It appears that the first alignment is simplified into 56S74M, but with a starting coordinate of 55268882. When the next alignment is processed and written to the output, the coordinate-sorted order is no longer preserved and the processing crashes with the aforementioned error. LeftAlignIndels assumes that if the input is coordinate-sorted, the output will be as well.
@HD VN:1.6 SO:coordinate
@SQ SN:chr7 LN:159138663
@RG ID:Ion_V2_BC15_rawlib.basecaller SM:Ion_V2_BC15_rawlib.basecaller
CCAAATAACATGTCTTCT_molbar_1 2064 chr7 55268881 60 56S2M4D2M3I67M * 0 0 TGATCATCGAATTCTCCAAAATGGCCCGAGACCCCCAGCGCTACCTTGTCATTTAGGGGATGAAAGAATGCATTTGCCAAGTCCTACAGACTCCAACTTCTACCGTGCCCTGATGGATGAAGAAGACATG <<>>>==>>9?7///)/);;;<7=8>==>>>1>>B>>>;<<<<7<6==<<<7@<8*9988=7+780399///19997=7===8<=<;;=<<<8<9==9??==9====8A===@=9=?@B??<7=;;:;<= SA:Z:chr7,55268051,-,56M74S,60,*; AS:i:73 YX:i:41 RG:Z:Ion_V2_BC15_rawlib.basecaller NM:i:7 MD:Z:2^GGAT69
CCAACAGCTGGCAAATGC_molbar_1 2064 chr7 55268881 60 64S24M * 0 0 CCGTGAGTTGATCATCGAATTCTCCAAAATGGCCCGAGACCCCCAGCGCTACCTTGTCATTCAGGGGGATGAAAGAATGCATTTGCCA 8=====<7<=>>===<=9<7<<<68)77777<6=>===<0;;;;;====>=9=9==>=>9>=<1<<<<<=<7=<<9>>;?89@@<9=9 SA:Z:chr7,55268043,-,64M24S,60,*; HI:i:1 AS:i:88 RG:Z:Ion_V2_BC15_rawlib.basecaller NM:i:0 MD:Z:24
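The ordering problem described above can be illustrated with a small Python sketch. This is not GATK's implementation, just a hypothetical check with the same intent as the "Alignments added out of order" guard: it walks records as (contig index, start) pairs and reports the first pair that breaks coordinate order. The contig index 0 for chr7 and the shifted start 55268882 are taken from the description above as assumptions.

```python
def first_out_of_order(records):
    """Return the first adjacent pair (prev, cur) where cur sorts before prev.

    records: iterable of (contig_index, start) tuples.
    Returns None if the records are in coordinate order.
    """
    prev = None
    for cur in records:
        if prev is not None and cur < prev:
            return prev, cur
        prev = cur
    return None


# Both input reads start at chr7:55268881, so the input is coordinate-sorted.
input_starts = [(0, 55268881), (0, 55268881)]
# Assumed effect of left-alignment: the first read's start moves to 55268882,
# while the second read keeps its original start.
output_starts = [(0, 55268882), (0, 55268881)]

print(first_out_of_order(input_starts))   # None
print(first_out_of_order(output_starts))  # ((0, 55268882), (0, 55268881))
```

A shift of a single base in the first record is enough to break the order, because the second record shares the original start position.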
We are currently using STAR 2.7.9a to generate this alignment, and perhaps a better alignment will be generated with a more recent version of STAR, which we will try next.
-
Hi James Covino
We were able to replicate the issue. There appears to be a conflict among the conditional statements in the method that checks read sort order when reads are supplied in coordinate-sorted order; this looks like a previously uncaught edge case. The issue does not occur when reads are supplied in queryname order, so until a fix arrives the only solution is to left-align indels on queryname-sorted BAM files. We recommend this workaround for your use case until we have a PR ready.
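As a rough sketch of that workaround, the Python helper below only builds and prints the command lines; it does not run them. The first two commands use the same picard SortSam and gatk LeftAlignIndels flags that appear earlier in this thread. The final coordinate re-sort is an added assumption for downstream tools that expect coordinate order, and the function name and output file names are hypothetical.

```python
import shlex


def workaround_commands(input_bam, reference, prefix):
    """Build the command lines for: queryname sort, left-align, coordinate re-sort."""
    queryname_bam = f"{prefix}.queryname.bam"
    left_aligned_bam = f"{prefix}.leftaligned.bam"
    final_bam = f"{prefix}.leftaligned.coordinate.bam"
    return [
        # 1. Sort by queryname so LeftAlignIndels skips the coordinate-order check.
        ["picard", "SortSam", "-I", input_bam, "-O", queryname_bam, "-SO", "queryname"],
        # 2. Left-align indels on the queryname-sorted BAM.
        ["gatk", "LeftAlignIndels", "-R", reference, "-I", queryname_bam, "-O", left_aligned_bam],
        # 3. (Assumption) restore coordinate order for downstream tools.
        ["picard", "SortSam", "-I", left_aligned_bam, "-O", final_bam, "-SO", "coordinate"],
    ]


for cmd in workaround_commands("bedtools_intersected.bam", "hg19.fa", "bedtools_intersected"):
    print(shlex.join(cmd))
```

Each printed line can be reviewed and then run by hand, or passed to `subprocess.run(cmd, check=True)` once the paths are confirmed.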
I hope this helps.
-
Hi GATK team and Gökalp Çelik,
We are interested in the fix for this LeftAlignIndels edge case.
Do you have a GitHub PR or issue that we can follow?
Best,
James
-
Hi James Covino
Sorry for the delay on this issue. We have opened an issue ticket on GitHub; however, the current workaround remains to supply reads in queryname-sorted order. Our team will look into whether this is an easy fix or requires more work. We cannot promise a quick solution, and the issue may well stay in the backlog for some time. If you have a fix, you are welcome to open a PR with proper tests, and we will certainly review it.
https://github.com/broadinstitute/gatk/issues/8975
Regards.