Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Spanning or overlapping deletions (* allele)


    Joanna Kelley

    I am getting the following error with Genome Analysis Toolkit (GATK) v4.1.4.1:

    htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 78746: unparsable vcf record with allele *CCCCCCCCCGCCCCTCCCCC, for input source: test.vcf.gz

    How can I solve this error? The start of that line in the VCF is
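The allele in the error message, *CCCCCCCCCGCCCCTCCCCC, fuses the spanning-deletion * with ordinary bases. In the VCF 4.2 specification, * is reserved for an allele that is missing because of an upstream deletion and is written as a standalone allele, which appears to be why htsjdk rejects the record. The minimal sketch below is a simplified check written for illustration, not htsjdk's actual parser; the helper names (is_valid_alt, bad_alts) are hypothetical, and the rules are an approximation that does not handle breakend notation.

```python
import re
from typing import List

# Simplified ALT-allele rules (assumption: loosely based on the VCF 4.2
# specification; breakend alleles such as "G]17:198982]" are not handled).
BASES = re.compile(r"^[ACGTNacgtn]+$")
SYMBOLIC = re.compile(r"^<[^<>\s]+>$")


def is_valid_alt(allele: str) -> bool:
    """Return True if a single ALT allele looks well-formed."""
    if allele == "*":
        # The spanning-deletion allele must stand alone, never fused with bases.
        return True
    if SYMBOLIC.match(allele):
        # Symbolic alleles such as <DEL> or <*>.
        return True
    # Otherwise the allele must consist only of A, C, G, T, N (any case).
    return bool(BASES.match(allele))


def bad_alts(alt_field: str) -> List[str]:
    """Return the alleles in a comma-separated ALT column that fail the check."""
    if alt_field == ".":
        # '.' means "no ALT allele" and is valid on its own.
        return []
    return [a for a in alt_field.split(",") if not is_valid_alt(a)]


print(bad_alts("*CCCCCCCCCGCCCCTCCCCC"))  # ['*CCCCCCCCCGCCCCTCCCCC']
print(bad_alts("T,*"))  # []
```

Scanning the ALT column of a problematic file with a check like this can help locate every malformed record, not just the first one the parser stops on.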


    Degang Wu

    Representing spanning deletions with * is fine in itself, but most of the downstream bioinformatics software I know of cannot handle spanning deletions. Is there a tool to convert * alleles into INDELs?
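A * allele cannot really be turned into a self-contained INDEL at its own position, because the deletion that spans it is already recorded at the deletion's start position. One workaround sometimes used for tools that reject * (an assumption about what a given downstream tool tolerates, not a GATK feature) is to drop the * ALT allele and mark any genotype allele that pointed at it as missing. The hypothetical helper below sketches that remapping for a single sample's GT string.

```python
def drop_star_allele(alts, gt):
    """Remove the '*' ALT allele from one record and remap one sample's GT.

    Hypothetical helper: genotype alleles that pointed at '*' become missing
    ('.'), and ALT indices after '*' shift down by one so they still refer
    to the same allele sequence.
    """
    if "*" not in alts:
        return list(alts), gt
    star_index = alts.index("*") + 1  # GT index of '*' (index 0 is REF)
    new_alts = [a for a in alts if a != "*"]
    sep = "|" if "|" in gt else "/"  # keep the phased/unphased separator
    remapped = []
    for idx in gt.split(sep):
        if idx == "." or int(idx) == star_index:
            remapped.append(".")
        elif int(idx) > star_index:
            remapped.append(str(int(idx) - 1))
        else:
            remapped.append(idx)
    return new_alts, sep.join(remapped)


# Position 20 from the example later in this thread: A -> T,* with GT 1/2.
print(drop_star_allele(["T", "*"], "1/2"))  # (['T'], '1/.')
```

If * was the only ALT allele, the returned ALT list is empty and the record would normally be dropped. Note that this approach discards the information that the position lies inside a deletion on that haplotype.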

    Sam Khalouei

    Hello, my question is related to Degang's. I am using GATK 4.1.7.0 and was wondering whether there is a flag to choose between the two VCF formats described in your article (i.e., with or without the * designation). Thanks.

    Eric Roller

    GATK Team

    The VCF entry shown for position 14 seems problematic. Lian is assigned a GT of 0/1, which indicates that one of his haplotypes carries the reference allele at that position (i.e., GCCCCCACCC), but it does not. What would be the proper way to represent Lian's two variant alleles as separate VCF records? Perhaps we need to use <*>, like this:

    14    GCCCCCACCC    G,<*>    1/2
    20    A             T,*      1/2
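The objection rests on how GT indices are read: 0 refers to the REF allele and 1, 2, ... refer to the ALT alleles in the order listed, so a 0/1 genotype asserts that one haplotype carries the reference sequence. The small sketch below (decode_gt is a hypothetical helper, not part of GATK or htsjdk) decodes the records above to make that concrete.

```python
def decode_gt(ref, alts, gt):
    """Translate a GT string such as '0/1' into the allele sequences it names."""
    alleles = [ref] + list(alts)  # index 0 is REF; 1..n are ALT alleles in order
    sep = "|" if "|" in gt else "/"
    return [None if i == "." else alleles[int(i)] for i in gt.split(sep)]


# Proposed record at position 14: a GT of 1/2 names two ALT alleles.
print(decode_gt("GCCCCCACCC", ["G", "<*>"], "1/2"))  # ['G', '<*>']
# With the same ALT list, 0/1 would claim one haplotype carries the REF allele.
print(decode_gt("GCCCCCACCC", ["G", "<*>"], "0/1"))  # ['GCCCCCACCC', 'G']
# Position 20: 1/2 resolves to T on one haplotype and '*' on the other.
print(decode_gt("A", ["T", "*"], "1/2"))  # ['T', '*']
```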

    The entry at position 14 is similarly problematic for Bob.

