Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GenomeSTRiP ReciprocalOverlapAnnotator Annotation track not sorted

8 comments

  • Bhanu Gandham

    Hi Asma Riyaz

    Tagging Bob Handsaker to this thread. Bob will be able to help you out with your question.

  • Asma Riyaz

    Bhanu Gandham, thank you for connecting me with Bob. Could you let me know why the other question was deleted? Although both questions are about GenomeSTRiP ReciprocalOverlapAnnotator, the errors are very different and occur on different samples. This one is about the annotation track not being sorted; the one that was deleted was about a "Non-overlapping coordinates ERROR". Could you please restore that thread? I really need to resolve both, and I haven't posted duplicate questions.

  • Bob Handsaker

    Did you grep the entire VCF?

    If you can generate a small VCF that reproduces the problem (you can leave out the genotypes), then I can take a look.

    I'm sure the GenomeSTRiP tools will not understand variants like the one at 125905998, where END refers to a position on a different chromosome. POS and END are presumed to represent an interval on a single reference sequence.
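
    Records like the one at 125905998 can be found before running the annotator. The following is a hypothetical pre-filter, not part of Genome STRiP: it assumes an uncompressed VCF whose INFO field uses the standard SVTYPE and END keys, and it flags breakend records (SVTYPE=BND) plus any record whose END is smaller than POS, since neither reads as a valid single-contig POS-END interval. The function names are illustrative.

```python
# Hypothetical pre-filter (not part of Genome STRiP): flags VCF records
# whose POS/END cannot be read as one interval on CHROM.
# Assumes an uncompressed VCF with standard SVTYPE and END INFO keys.


def parse_info(info):
    """Split a VCF INFO field into a dict; flag keys map to True."""
    fields = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def suspicious_intervals(lines):
    """Yield (chrom, pos, end, svtype) for breakend records (SVTYPE=BND)
    and for any record whose END is smaller than POS."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos = cols[0], int(cols[1])
        info = parse_info(cols[7])
        svtype = info.get("SVTYPE")
        end = info.get("END")
        end = int(end) if isinstance(end, str) else None
        if svtype == "BND" or (end is not None and end < pos):
            yield chrom, pos, end, svtype


def report(path):
    """Print one line per suspicious record in the VCF at `path`."""
    with open(path) as vcf:
        for chrom, pos, end, svtype in suspicious_intervals(vcf):
            print(f"{chrom}\t{pos}\tEND={end}\tSVTYPE={svtype}")
```

    For example, report("trimmed.cuteSV.sorted.vcf") would list the candidates; whether to drop them or correct their END values is a separate decision.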

  • Asma Riyaz

    Here is the VCF file (this is a Google Drive link). The SV caller I used called the variant at 125905998 as a breakend. Does GenomeSTRiP not support breakends?

  • Bob Handsaker

    Your file isn't sorted in reference sequence order.

    % cat trimmed.cuteSV.sorted.vcf | grep -v ^# | cut -f 1 | sort -u | head -30

    chr1
    chr10
    chr11
    chr12
    chr13
    chr14
    chr14_GL000009v2_random
    chr14_GL000225v1_random
    chr15
    chr16
    chr16_KI270728v1_random
    chr17
    chr17_GL000205v2_random
    chr17_KI270729v1_random
    chr17_KI270730v1_random
    chr17_KI270860v1_alt
    chr17_KI270862v1_alt
    chr18
    chr19
    chr19_KI270866v1_alt
    chr1_KI270709v1_random
    chr1_KI270712v1_random
    chr2
    chr20
    chr21
    chr22
    chr22_KI270732v1_random
    chr22_KI270733v1_random
    chr22_KI270735v1_random
    chr22_KI270736v1_random
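
    Reference sequence order means the contig order of the reference's sequence dictionary, not lexical order (in which chr10 sorts before chr2). A minimal sketch of a sortedness check, assuming every contig is declared in the VCF header through ##contig=<ID=...> lines listed in reference order; the function name is hypothetical and this is not how Genome STRiP itself validates input.

```python
import re

# Matches the contig name in a header line such as
# ##contig=<ID=chr1,length=248956422>
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def first_unsorted_record(lines):
    """Return (line_number, message) for the first record that breaks
    header contig order, or None if all records are in order."""
    rank = {}    # contig name -> position in the ##contig header list
    prev = None  # (contig rank, POS) of the previous data record
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##"):
            match = CONTIG_RE.match(line)
            if match:
                rank.setdefault(match.group(1), len(rank))
            continue
        if line.startswith("#") or not line.strip():
            continue  # #CHROM column header or blank line
        chrom, pos = line.split("\t", 2)[:2]
        if chrom not in rank:
            return lineno, f"contig {chrom} is not declared in the header"
        key = (rank[chrom], int(pos))
        if prev is not None and key < prev:
            return lineno, f"{chrom}:{pos} precedes the previous record"
        prev = key
    return None
```

    A non-None result means the file needs to be re-sorted against the reference's sequence dictionary rather than with a plain text sort.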

  • Bob Handsaker

    As I said, the Genome STRiP tools interpret END as being on CHROM. There is no special processing for breakends, so I'm not sure how it will interpret this invalid interval.

    It's also unclear to me what reciprocal overlap means for breakends.

  • Asma Riyaz

    Bob Handsaker, even though the tool reported an error, it still produced output. Does it ignore these variants and generate output based on the rest?

  • Bob Handsaker

    These breakend variants will not be ignored, per se, but as I explained, the interval for a breakend will be taken to be CHROM:POS-END. I checked the VCF specification (v4.3), and this appears to be the correct interpretation, so it is arguable that the END coordinates on those breakend records are incorrect.

    Given that the variant interval will be assumed to be CHROM:POS-END, the next question is how the ReciprocalOverlap annotator treats intervals where END < POS. The short answer is that (a) there is no error checking (the code currently won't complain, although perhaps it should) and (b) I believe they will be treated like zero-length intervals: they will not be found to overlap with any other variant.

    So this will be the behavior for breakends where END < POS. Breakends where END > POS will be treated like any other variant.
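
    To make the END < POS case concrete, here is a minimal sketch of a conventional reciprocal-overlap test on closed, 1-based intervals. The 50% default threshold, the length formula, and the function names are assumptions for illustration, not Genome STRiP's implementation; the sketch simply encodes the behavior described above, where an interval with END < POS has zero length and therefore overlaps nothing.

```python
def interval_length(start: int, end: int) -> int:
    """Length of the closed interval [start, end]; 0 when end < start."""
    return max(0, end - start + 1)


def reciprocal_overlap(a, b, fraction: float = 0.5) -> bool:
    """True if intervals a and b, each (chrom, start, end), lie on the
    same contig and their overlap covers at least `fraction` of each."""
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return False
    len_a = interval_length(start_a, end_a)
    len_b = interval_length(start_b, end_b)
    if len_a == 0 or len_b == 0:
        return False  # degenerate intervals (END < POS) overlap nothing
    overlap = interval_length(max(start_a, start_b), min(end_a, end_b))
    return overlap >= fraction * len_a and overlap >= fraction * len_b
```

    With this definition, a record read as chr1:125905998-1000 has length 0 and never passes the test, whereas a breakend whose END happens to exceed POS is compared like any other interval.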
