Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

How to get rid of HaplotypeCaller variants caused by weird artificial haplotypes

5 comments

  • Kelly Rabionet

    Hi Bhanu,

    We have gone through these recommendations several times and have also raised some of the thresholds to try to remove these variants. However, when all samples are considered together, these variant positions pass every filter, even if the variant is supported by only 0.15 of the reads in a single sample. When we try to filter per sample on the FORMAT fields, we get a badly formatted VCF, and in addition almost no variants pass (neither these nor any others).

    We then went back to the generategvcf step, where we modified --min-pruning, but I am worried that this might remove real variants:

    $GATK --java-options "-Xmx28G" HaplotypeCaller -R $REF -I !{READ} -O !{NAME}.gvcf.gz --dbsnp $DBSNP -ERC GVCF -L $EXOME -ip 100 --min-pruning 10 --bamout !{NAME}.haplotypecaller.bam

    This seems to get rid of several of the weird calls that we observed, but not all, and I am still not convinced that this is the best option.

     

  • Bhanu Gandham

    Hi Kelly Rabionet

     

    I discussed this with our dev team, and we realized this is something we need to fix in our code. The issue arises because phased variants are assigned higher likelihoods, since they differ more from the reference. Thank you for bringing this up. We will work on fixing this in the future, but at this time I cannot say how long it will take.

     

    A workaround in the meantime would be to use Picard FilterVcf with the argument --MIN_AB 0.2 (or 0.15) to filter out variants whose allele balance falls below that threshold.
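
    To make the allele balance idea concrete, here is a minimal sketch that reports, for each heterozygous call in each sample, the fraction of reads supporting the ALT allele, computed from the AD FORMAT field. It is a hand-rolled per-sample check for inspecting calls, not a reimplementation of Picard FilterVcf, whose filter may aggregate differently across samples. The function name allele_balance, the LOW_AB/OK labels, and the restriction to uncompressed VCFs and simple biallelic heterozygous genotypes are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: per-sample allele balance = ALT reads / (REF reads + ALT reads),
# taken from the AD FORMAT field. Assumes an uncompressed VCF and only
# looks at simple heterozygous genotypes (0/1, 0|1, 1|0).
# Usage: allele_balance input.vcf [min_ab]   (min_ab defaults to 0.2)
allele_balance() {
  local vcf="$1" min_ab="${2:-0.2}"
  awk -F'\t' -v min_ab="$min_ab" '
    /^##/     { next }                                   # skip meta lines
    /^#CHROM/ { for (i = 10; i <= NF; i++) sample[i] = $i; next }
    {
      # Locate GT and AD within the FORMAT column of this record.
      n = split($9, keys, ":")
      gt_idx = 0; ad_idx = 0
      for (k = 1; k <= n; k++) {
        if (keys[k] == "GT") gt_idx = k
        if (keys[k] == "AD") ad_idx = k
      }
      if (!gt_idx || !ad_idx) next

      for (i = 10; i <= NF; i++) {
        split($i, vals, ":")
        gt = vals[gt_idx]
        if (gt != "0/1" && gt != "0|1" && gt != "1|0") continue
        split(vals[ad_idx], ad, ",")
        total = ad[1] + ad[2]
        if (total == 0) continue
        ab = ad[2] / total
        # Columns: CHROM, POS, sample, allele balance, label
        printf "%s\t%s\t%s\t%.3f\t%s\n", $1, $2, sample[i], ab, (ab < min_ab ? "LOW_AB" : "OK")
      }
    }' "$vcf"
}
```

    Running allele_balance cohort.vcf 0.2 (cohort.vcf being a placeholder name) prints one line per heterozygous call, which makes it easy to spot positions where a single sample carries the variant on only about 0.15 of its reads.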

  • Estefanía Alcaide

    Hi Bhanu,

    I am working with Kelly on this sequencing data and, although it has been a long time since this post, we are still having the same problem.

    We decided to call variants with HaplotypeCaller (GATK v4.1.6) and modified --min-pruning:

    $GATK --java-options "-Xmx28G" HaplotypeCaller -R $REF -I !{READ} -O !{NAME}.gvcf.gz --dbsnp $DBSNP -ERC GVCF -L $EXOME -ip 100 --min-pruning 10 --bamout !{NAME}.haplotypecaller.bam

    At first, this setting eliminated many false large insertions and deletions. However, some false variants remained in our final file.

    We have tried HaplotypeCaller v4.2.0 to see whether it solves this issue, but there is no change compared with v4.1.6. We do not know how to avoid these variants and would appreciate any ideas.

    Thanks,

  • Genevieve Brandt (she/her)

    Hi Estefanía Alcaide,

    I am another member of the GATK team and can follow up on this for Bhanu. 

    After your HaplotypeCaller command, are you running any filtering steps and are those filtering steps eliminating these insertions and deletions? Could you send your false positive and sensitivity rate after filtering so that we can compare with what we would expect?

    You should be able to get rid of these insertions and deletions in post-HaplotypeCaller steps. HaplotypeCaller is meant to be very sensitive to allow for haplotype discovery, and our downstream filtering steps determine if the found haplotypes are real or not. Again, we recommend allele balance filtering.

    You can also look into our filtering method CNNScoreVariants, which could work well for your data and is pre-trained.
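
    As a rough sketch of what that could look like, the two steps are scoring with CNNScoreVariants and then filtering on the resulting CNN_1D annotation with FilterVariantTranches. The wrapper below only prints the commands unless DRY_RUN=0 is set. The helper names run and cnn_filter, all file names, and the tranche values shown are assumptions for illustration, so please check the tool documentation (including whether the pre-trained model suits your data type) before relying on them.

```shell
#!/usr/bin/env bash
# Sketch: 1D CNN scoring followed by tranche filtering.
# By default the commands are only printed; set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: cnn_filter <reference.fasta> <input.vcf.gz> <output_prefix> <resource.vcf.gz>
# The resource is a VCF of known sites used to place the tranche cutoffs.
cnn_filter() {
  local ref="$1" in_vcf="$2" out_prefix="$3" resource="$4"

  # Step 1: annotate each variant with a CNN_1D score.
  run gatk CNNScoreVariants -R "$ref" -V "$in_vcf" -O "${out_prefix}.cnn.vcf.gz"

  # Step 2: apply tranche filters on the CNN_1D score
  # (sensitivity values here are illustrative, not a recommendation).
  run gatk FilterVariantTranches -V "${out_prefix}.cnn.vcf.gz" \
    --resource "$resource" --info-key CNN_1D \
    --snp-tranche 99.95 --indel-tranche 99.4 \
    -O "${out_prefix}.cnn.filtered.vcf.gz"
}
```

    Calling cnn_filter ref.fasta sample.vcf.gz sample known_sites.vcf.gz in dry-run mode prints both gatk commands so they can be reviewed before anything is executed.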

    Hope this helps,

    Genevieve
