Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Spanning or overlapping deletions (* allele)

7 comments

  • Joanna Kelley

    I am getting the following error with Genome Analysis Toolkit (GATK) v4.1.4.1:

    htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 78746: unparsable vcf record with allele *CCCCCCCCCGCCCCTCCCCC, for input source: test.vcf.gz

    How can I solve this error? The offending VCF line begins:

    NC_036443.1 78983 . ACCCCCCCCCCCCCCCCCCCGCCCCTCCCCC ACCCCCCCCCGCCCCTCCCCC,*CCCCCCCCCGCCCCTCCCCC,ACCCCCCCGCCCCTCCCCC,*

  • Degang Wu

    Representing a spanning deletion with * is reasonable in itself, but most of the downstream bioinformatics software I know of cannot handle it. Is there a tool that converts * alleles into ordinary INDEL records?

  • Sam Khalouei

    Hello, my question is related to the one Degang posted. I am using GATK 4.1.7.0 and was wondering whether there is a flag for choosing between the two VCF representations described in your article (i.e. with or without the * designation). Thanks.

  • Eric Roller

    GATK Team

    The VCF entry shown for position 14 seems problematic. Lian is assigned a GT of 0/1, which says that one of his haplotypes carries the reference allele at that position (i.e. GCCCCCACCC), and it does not. What would be the proper way to represent Lian's two variant alleles as separate VCF records? Perhaps we need to use <*> like this:

    14    GCCCCCACCC    G,<*>    1/2
    20    A             T,*      1/2

    The entry at position 14 is similarly problematic for Bob.

  • Jalini Rajapakse

    Why do some variant callers not call these spanning deletions when "Dels=0.25", and instead report a heterozygous SNP?

    Thanks.

  • Begonia_pavonina

    Like Degang Wu and Sam Khalouei, I have run into difficulties with the spanning or overlapping deletion allele notation (*). Downstream analysis tools do not recognise it, in my case the ANGSD population genetics software. Is there a way to convert these alleles to indels?

  • L T

    The effort the author put into describing all these terms is unprecedented! Thank you so much for this.

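Two questions recur in the comments above: why a record with an allele such as *CCCCCCCCCGCCCCTCCCCC is rejected, and how a * allele relates to an ordinary deletion record. The Python sketch below is one rough way to explore both. It is not GATK or htsjdk code, and all function names in it are hypothetical. It assumes that * is only meaningful as a standalone ALT allele (which is how the VCF specification describes it, as far as I can tell), that records are left-aligned, and that breakend notation can be ignored.

```python
import re

# Plain-base alleles: A, C, G, T, N only.
_BASES = re.compile(r"^[ACGTNacgtn]+$")
# Symbolic alleles such as <*> or <DEL>.
_SYMBOLIC = re.compile(r"^<[^<>\s]+>$")


def is_valid_alt(allele):
    """Return True for the ALT forms this sketch recognises.

    Breakend notation is deliberately not handled (assumption).
    """
    if allele in ("*", "."):
        return True
    return bool(_BASES.match(allele) or _SYMBOLIC.match(allele))


def parse_record(line):
    """Split the first five whitespace-separated VCF columns."""
    chrom, pos, _id, ref, alt = line.split()[:5]
    return chrom, int(pos), ref, alt.split(",")


def malformed_alts(line):
    """List the ALT alleles in one record that fail is_valid_alt."""
    return [a for a in parse_record(line)[3] if not is_valid_alt(a)]


def spanning_deletions(records, chrom, pos):
    """Return upstream records that could explain a * allele at pos.

    A record qualifies when it starts before pos, its REF span
    (start .. start + len(REF) - 1) reaches pos, and at least one
    plain-base ALT is shorter than REF, i.e. it deletes bases.
    """
    hits = []
    for r_chrom, r_pos, ref, alts in records:
        if r_chrom != chrom or r_pos >= pos:
            continue
        covers = r_pos + len(ref) - 1 >= pos
        deletes = any(_BASES.match(a) and len(a) < len(ref) for a in alts)
        if covers and deletes:
            hits.append((r_chrom, r_pos, ref, alts))
    return hits


if __name__ == "__main__":
    # The record quoted in the first comment.
    record = (
        "NC_036443.1 78983 . ACCCCCCCCCCCCCCCCCCCGCCCCTCCCCC "
        "ACCCCCCCCCGCCCCTCCCCC,*CCCCCCCCCGCCCCTCCCCC,ACCCCCCCGCCCCTCCCCC,*"
    )
    print(malformed_alts(record))  # ['*CCCCCCCCCGCCCCTCCCCC']
```

Applied to the two records from Eric Roller's comment (with a placeholder chromosome name, since none is given there), `spanning_deletions` returns the position-14 record for a query at position 20. That is the deletion the * at position 20 refers to. Turning a * allele into an indel therefore means looking at the upstream record, not rewriting the * in place.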
