Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

LeftAlignIndels Alignments added out of order... Offending records are at [chr7:55268881] and [chr7:55268881]

5 comments

  • SkyWarrior

    Hi James Covino

    I am not sure whether you have already solved your issue, but let me ask: have you observed any unusual reads in the bedtools intersect output, either in IGV or with a plain samtools view command?

    Would you be able to share any images of the region? 

    I also noticed that you are running these tools on an Apple Silicon Mac, which is why you are receiving many error messages related to library incompatibilities. Can you try running them in a compatible x86_64 compute environment and see if the issue still persists?

    Regards 

  • James Covino

    Hi @SkyWarrior,  
     
    Thanks for the reply.  
     
    I ran this on my personal Mac (Apple Silicon), but we also saw and reproduced this error on our deployed instance (Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz).
     
    I'm attaching a SAM file with two alignments that trigger the LeftAlignIndels error. It appears that the first alignment is simplified to 56S74M, but with a starting coordinate of 55268882. When the next alignment (which starts at 55268881) is processed and written to the output, coordinate-sorted order is no longer preserved and processing crashes with the error above. LeftAlignIndels assumes that if the input is coordinate-sorted, the output will be as well.
     
     
    @HD  VN:1.6  SO:coordinate  
    @SQ  SN:chr7  LN:159138663  
    @RG  ID:Ion_V2_BC15_rawlib.basecaller  SM:Ion_V2_BC15_rawlib.basecaller  
    CCAAATAACATGTCTTCT_molbar_1  2064  chr7  55268881  60  56S2M4D2M3I67M  *  0  0  TGATCATCGAATTCTCCAAAATGGCCCGAGACCCCCAGCGCTACCTTGTCATTTAGGGGATGAAAGAATGCATTTGCCAAGTCCTACAGACTCCAACTTCTACCGTGCCCTGATGGATGAAGAAGACATG  <<>>>==>>9?7///)/);;;<7=8>==>>>1>>B>>>;<<<<7<6==<<<7@<8*9988=7+780399///19997=7===8<=<;;=<<<8<9==9??==9====8A===@=9=?@B??<7=;;:;<=  SA:Z:chr7,55268051,-,56M74S,60,*;  AS:i:73  YX:i:41  RG:Z:Ion_V2_BC15_rawlib.basecaller  NM:i:7  MD:Z:2^GGAT69  
    CCAACAGCTGGCAAATGC_molbar_1  2064  chr7  55268881  60  64S24M  *  0  0  CCGTGAGTTGATCATCGAATTCTCCAAAATGGCCCGAGACCCCCAGCGCTACCTTGTCATTCAGGGGGATGAAAGAATGCATTTGCCA  8=====<7<=>>===<=9<7<<<68)77777<6=>===<0;;;;;====>=9=9==>=>9>=<1<<<<<=<7=<<9>>;?89@@<9=9  SA:Z:chr7,55268043,-,64M24S,60,*;  HI:i:1  AS:i:88  RG:Z:Ion_V2_BC15_rawlib.basecaller  NM:i:0  MD:Z:24  
     
    We are currently using STAR 2.7.9a to generate this alignment. A more recent version of STAR may produce a better alignment, so we will try that next.

  • Gökalp Çelik

    Hi James Covino

    We were able to replicate the issue. It looks like there is a conflict in the conditional statements of the method that checks read sorting order when reads are supplied in coordinate-sorted order (this appears to be a previously uncaught edge case). The issue does not occur when reads are supplied in queryname order, so until a fix arrives the only workaround is to left-align indels on queryname-sorted BAM files. Until we have a PR ready, we recommend this quick workaround for your use case.

    I hope this helps. 

  • James Covino

    Hi GATK team and Gökalp Çelik,

    We are interested in the fix for this LeftAlignIndels edge case. Do you have a GitHub PR or issue that we can follow?

     

    Best,

    James

  • Gökalp Çelik

    Hi James Covino

    Sorry for the delay on this issue. We have opened an issue ticket on GitHub; for now, the workaround is still to feed the reads in queryname-sorted order. Our team will look into whether this is an easy fix or requires more work. We cannot promise an immediate solution, and it may well remain in the backlog for some time. If you have a fix, you are welcome to submit a PR with proper tests and we will certainly review it.

    https://github.com/broadinstitute/gatk/issues/8975 

    Regards. 

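For readers who want to check the arithmetic behind this thread, here is a minimal Python sketch. It is not GATK code and makes no claim about how LeftAlignIndels is implemented. It only uses the start positions and CIGAR strings quoted above to show that the rewritten first record keeps its read length and reference end while its start moves one base to the right, which places it after the second record. The helper names (`cigar_lengths`, `is_coordinate_sorted`, `workaround_commands`) and the output file names are hypothetical. The last function merely builds the command lines for the queryname-sort workaround suggested above and assumes the `gatk SortSam` and `gatk LeftAlignIndels` arguments shown match your installed version; please verify them with `--help` before running anything.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")    # operations that advance along the reference
QUERY_CONSUMING = set("MIS=X")  # operations that advance along the read sequence


def cigar_lengths(cigar):
    """Return (reference_span, read_length) implied by a CIGAR string."""
    ref_span = read_len = 0
    for count, op in CIGAR_OP.findall(cigar):
        if op in REF_CONSUMING:
            ref_span += int(count)
        if op in QUERY_CONSUMING:
            read_len += int(count)
    return ref_span, read_len


def is_coordinate_sorted(starts):
    """True if start positions on a single contig never decrease."""
    return all(a <= b for a, b in zip(starts, starts[1:]))


def workaround_commands(ref, bam, prefix):
    """Build (but do not run) the workaround steps as argument lists.

    Hypothetical file naming; flags are assumed, check them locally.
    """
    qname = f"{prefix}.queryname.bam"
    aligned = f"{prefix}.leftaligned.bam"
    final = f"{prefix}.leftaligned.sorted.bam"
    return [
        ["gatk", "SortSam", "-I", bam, "-O", qname, "--SORT_ORDER", "queryname"],
        ["gatk", "LeftAlignIndels", "-R", ref, "-I", qname, "-O", aligned],
        ["gatk", "SortSam", "-I", aligned, "-O", final, "--SORT_ORDER", "coordinate"],
    ]


# First record as posted, and as described in the thread after left-alignment.
before = (55268881, "56S2M4D2M3I67M")
after = (55268882, "56S74M")
second_record_start = 55268881

for start, cigar in (before, after):
    ref_span, read_len = cigar_lengths(cigar)
    print(cigar, "read length:", read_len, "reference end:", start + ref_span - 1)

# Input order is valid; order after the first record is rewritten is not.
print(is_coordinate_sorted([before[0], second_record_start]))
print(is_coordinate_sorted([after[0], second_record_start]))
```

Each list returned by `workaround_commands` could be passed to `subprocess.run(cmd, check=True)` once the arguments have been confirmed. The final coordinate sort is included only because most downstream tools expect coordinate-sorted input; the thread itself recommends just the queryname-sorted step.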
