Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Differences between -ERC GVCF and -ERC BP_RESOLUTION in chr7:142458451 region from HaplotypeCaller GVCF.

3 comments

  • Gökalp Çelik

    Hi 차주영

    With -ERC BP_RESOLUTION, HaplotypeCaller writes one record per base, as the name implies. As a result, --include-non-variant-sites will produce an entry in the VCF file for every single base, regardless of whether any variant site is found at the locus.

    In GVCF mode, however, that particular site sits inside a long reference block. Genotyping that block will not return each nucleotide as an individual HOMREF call, so you will not get that entry in the VCF.
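
To check whether a given position is actually covered by a reference block in a GVCF, one option is to look for a record whose span, from POS to the END= value in the INFO column, contains it. The following is a minimal sketch, not part of GATK: the helper names `parse_end` and `record_covering` are hypothetical, and the two example records are made up for illustration rather than taken from the data discussed in this thread.

```python
from typing import Optional


def parse_end(info: str) -> Optional[int]:
    """Return the END=<pos> value from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        if field.startswith("END="):
            return int(field[4:])
    return None


def record_covering(gvcf_lines, chrom, pos):
    """Return the columns of the first record whose span [POS, END] contains chrom:pos."""
    for line in gvcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        start = int(cols[1])
        end = parse_end(cols[7]) or start  # records without END span one base
        if cols[0] == chrom and start <= pos <= end:
            return cols
    return None


# Made-up records for illustration only (hypothetical values).
example = [
    "chr7\t142458400\t.\tT\t<NON_REF>\t.\t.\tEND=142458500\tGT:DP:GQ:MIN_DP:PL\t0/0:30:81:25:0,81,1215",
    "chr7\t142458501\t.\tA\t<NON_REF>\t.\t.\tEND=142458501\tGT:DP:GQ:MIN_DP:PL\t0/0:28:75:28:0,75,1125",
]

hit = record_covering(example, "chr7", 142458451)
print(hit[1], hit[7])
```

If the function returns None for a position in a real GVCF, no record covers that position at all, which is a different situation from the position being folded into a reference block.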

    Regardless, judging by the counts at that position, it is almost certainly HOMREF, or possibly a NO_CALL if there is not enough evidence.

    I hope this helps. 

  • 차주영

    Thank you for your response. However, what I'm curious about is slightly different from your explanation.

    What I'm wondering about is the following. When analyzing the position chr7:142458451, which has no variants:

    1. In the gVCF generated with -ERC BP_RESOLUTION, the record for chr7:142458451 is present, and the VCF produced by GenotypeGVCFs --include-non-variant-sites contains a non-variant HOMREF result for it.
    2. In the gVCF generated with -ERC GVCF, chr7:142458451 is not present at all, not even within a reference block, so the VCF produced by GenotypeGVCFs --include-non-variant-sites has no result for that position.

    Whether -ERC BP_RESOLUTION or -ERC GVCF is used, applying GenotypeGVCFs --include-non-variant-sites should yield a non-variant HOMREF result in the VCF. I'm confused about why the results differ between the two approaches. Could you please provide more details on this?

    The only reasons given for recommending -ERC GVCF are server resources and file size. However, if the results differ, is it correct to assume that the accuracy of the analysis differs as well? If so, could you please provide guidance on which option yields higher accuracy?

  • Gökalp Çelik

    Hi again. It looks like you might have hit a bug present in earlier versions of the tool. Looking at the output of 4.5 here, -ERC GVCF also produces HOM_REF sites for each nucleotide position. We therefore recommend upgrading your workflow to the latest version.

    As for your other question, BP_RESOLUTION would be the most accurate, since it keeps depth information individually for each site.
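
The point about per-site depth can be illustrated with a small sketch. It contrasts keeping one depth value per position with collapsing a run of positions into a single block that stores only summary values. The depth numbers are invented, the function `summarize_block` is hypothetical, and using the median and minimum as the block's DP and MIN_DP is an assumption made for illustration, not a statement of exactly what GATK writes.

```python
from statistics import median

# Invented per-base depths around the position of interest (illustration only).
per_base_dp = {
    142458449: 31,
    142458450: 30,
    142458451: 12,
    142458452: 29,
    142458453: 30,
}


def summarize_block(dp_by_pos):
    """Collapse consecutive positions into one block holding only summary depths."""
    positions = sorted(dp_by_pos)
    depths = [dp_by_pos[p] for p in positions]
    return {
        "start": positions[0],
        "end": positions[-1],
        "DP": median(depths),   # assumed summary statistic
        "MIN_DP": min(depths),  # lowest depth seen anywhere in the block
    }


block = summarize_block(per_base_dp)

# Per-base view: the exact depth at the site is still available.
print(per_base_dp[142458451])
# Block view: only the summary survives; the depth at one site cannot be recovered.
print(block)
```

In the block view, the low depth at one site shows up only through the minimum, without indicating which position it belongs to.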

