Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

--output-mode EMIT_ALL_CONFIDENT_SITES lists variants with QUAL 0

10 comments

  • danilovkiri

    Hi DHWANI DHOLAKIA

    First of all, I hope you are not trying to infer genotypes directly from the GVCF produced by HC in ERC mode; that is not correct. The correct way is to pass the GVCF to the GenotypeGVCFs tool to obtain a VCF with all the metrics you want (see the GenotypeGVCFs documentation for reference).

    Now, about your questions. As you might have deduced, a non-zero QUAL value is calculated and assigned only when the probabilistic model decides there is enough evidence to assign a genotype other than homozygous reference. In other words, when there is enough evidence for the presence of an ALT allele, the assigned genotype is 0/1 or 1/1 (or another genotype, depending on the specified or default ploidy) and the QUAL value is calculated. For sites where no reads with ALT alleles were found, the genotype is 0/0 and the QUAL value is set to `.` (unspecified). A QUAL value of zero occurs when an ALT allele is supported by some reads, but not enough for HC to call the ALT allele.
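
    To see how the records of a given file fall into these three classes, a minimal sketch follows. It assumes an uncompressed, tab-separated VCF in which QUAL is the sixth column; the function name `tally_qual` and the file name in the usage line are hypothetical.

    ```shell
    # Count VCF records by QUAL class: "." (unspecified), zero, or positive.
    # Assumes an uncompressed VCF; QUAL is the sixth tab-separated column.
    tally_qual() {
        awk -F'\t' '
            /^#/        { next }                 # skip header lines
            $6 == "."   { missing++; next }      # QUAL not assigned
            $6 + 0 == 0 { zero++; next }         # QUAL of exactly zero
                        { positive++ }           # any QUAL above zero
            END { printf "missing=%d zero=%d positive=%d\n", missing, zero, positive }
        ' "$1"
    }

    # Hypothetical usage: tally_qual sample.vcf
    ```

    For a bgzip-compressed file, the contents would first need to be decompressed to a plain-text file or stream.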

    After you pass this GVCF to GenotypeGVCFs, you will get a VCF with all QUAL scores recalculated. There will be no zero-QUAL genotypes (only unspecified values might occur).
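
    The two-step workflow described above can be sketched as follows. This is an illustration under assumptions, not a tested pipeline: it presumes the GATK4 `gatk` wrapper is on the PATH, and the function name and the file names in the usage line are placeholders.

    ```shell
    # Sketch of the GVCF workflow: call in ERC mode, then genotype.
    # $1 = reference FASTA, $2 = input BAM, $3 = output prefix (all placeholders).
    run_gvcf_workflow() {
        # Step 1: per-sample calling with reference confidence (GVCF output).
        gatk HaplotypeCaller \
            -R "$1" -I "$2" \
            -O "$3.g.vcf.gz" \
            -ERC GVCF &&
        # Step 2: genotype the GVCF to obtain the final VCF.
        gatk GenotypeGVCFs \
            -R "$1" -V "$3.g.vcf.gz" \
            -O "$3.vcf.gz"
    }

    # Hypothetical usage: run_gvcf_workflow reference.fasta sample.bam sample
    ```

    If homozygous-reference sites are needed in the final VCF, the GenotypeGVCFs documentation describes an `--include-non-variant-sites` option; check the documentation for your GATK version before relying on it.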

    As for the FILTER column, how do you suppose HC should filter the genotypes? It does not have criteria for that (at least none that I am aware of). The FILTER column is populated by filtering tools (such as VariantRecalibrator + ApplyVQSR, and many others), depending on your goal, criteria, and the databases used for filtering. VQSR can be applied to a genotyped VCF file, not to a GVCF. Note that VQSR operates only on variant sites. If you have homozygous reference genotypes in your final VCF, they will not be processed by VQSR and their FILTER column will remain unspecified.
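
    A quick way to check what the FILTER column of a file actually contains is to count its distinct values. The sketch below assumes an uncompressed, tab-separated VCF in which FILTER is the seventh column; the function name `filter_counts` and the file name are hypothetical.

    ```shell
    # Count the distinct values in the FILTER column (seventh column) of a VCF.
    # Prints one "value<TAB>count" line per distinct value, sorted by value.
    filter_counts() {
        awk -F'\t' '
            /^#/ { next }                          # skip header lines
                 { n[$7]++ }                       # tally each FILTER value
            END  { for (f in n) print f "\t" n[f] }
        ' "$1" | LC_ALL=C sort
    }

    # Hypothetical usage: filter_counts sample.vcf
    ```

    On a file that has not been through any filtering tool, one would expect to see only `.` here.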

  • DHWANI DHOLAKIA

    danilovkiri

    Thanks a lot for the information. Can I run HC to produce a VCF by setting -ERC NONE?

    java -jar ${gatk_jar_name} HaplotypeCaller -R ${base_path}/${ref_file_name}.fasta -I ${base_path}/base_recalib/$i\_vqsr.bam -O ${base_path}/gatk4/haplotype_caller_gatk4/$i\_haplotype_4.vcf --output-mode EMIT_ALL_CONFIDENT_SITES -ERC NONE --native-pair-hmm-threads 8 --dbsnp ${base_path}/${vcf_file_name} -L ${base_path}/${bed_file_name}

    If so, what difference can I expect in the results?

  • danilovkiri

    You can run HC without specifying an ERC mode to produce a ready-to-use VCF. The difference is that the output VCF file will contain only ALT-containing entries. You will not find any sites where no variation was found, even if there is evidence for a homozygous reference genotype. Whether to use ERC depends on the purpose of your analysis. Do you want to find only variant sites, or do you need homozygous reference sites as well? Answering these questions in the context of your task is necessary to choose between the ERC and non-ERC pipelines.

    As for the command line you provided above, it is OK. `-ERC NONE` is the default behaviour, so you can omit this argument unless you want to state it explicitly to document the command.
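
    The choice between the two pipelines can be sketched as a small helper that only prints the HaplotypeCaller command it would use and does not run GATK. The function name `hc_command`, the `reference.fasta` path, and the output naming are assumptions made for illustration.

    ```shell
    # Print (do not run) a HaplotypeCaller command for either pipeline.
    # $1 = sample name, $2 = "yes" if homozygous-reference sites are needed.
    hc_command() {
        if [ "$2" = "yes" ]; then
            # ERC pipeline: GVCF output, to be passed to GenotypeGVCFs afterwards.
            echo "gatk HaplotypeCaller -R reference.fasta -I $1.bam -O $1.g.vcf.gz -ERC GVCF"
        else
            # Non-ERC pipeline: -ERC NONE is the default, so it is omitted here.
            echo "gatk HaplotypeCaller -R reference.fasta -I $1.bam -O $1.vcf.gz"
        fi
    }

    # Hypothetical usage:
    #   hc_command sampleA yes
    #   hc_command sampleA no
    ```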

  • DHWANI DHOLAKIA

    danilovkiri

    My query was resolved by the post linked here: https://gatk.broadinstitute.org/hc/en-us/community/posts/360067428552-Large-number-of-variants-called-using-haplotypecaller. Thanks.

    Are the -ERC option and GenotypeGVCFs also available in GATK3.8, so that I can get information on homozygous calls in the VCF?

    I understand that GATK3.8 is no longer supported by the GATK team, but I did my earlier analysis with GATK3.8, so your suggestion would help me a lot.

    Thanks

  • danilovkiri

    Yes, GATK3.8 supports ERC modes in exactly the same manner. The archived documentation can be found at https://github.com/broadinstitute/gatk-docs/tree/master/gatk3-tooldocs/3.8-0

    The file you are looking for is https://github.com/broadinstitute/gatk-docs/blob/master/gatk3-tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.html

    You have to download it and open it in your browser to view it conveniently. You can also view the raw HTML, specifically line 660, to find the ERC information.

    Please note that the CLI syntax differs between GATK major releases, so be careful.
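
    As an illustration of the syntax difference, here is a sketch of the same ERC workflow in GATK3.8 style, where the tool is selected with `-T` and the output is given with `-o`. The jar path, the function name, and the file names are placeholders; check the archived 3.8 documentation linked above before running anything.

    ```shell
    # Placeholder path to the GATK3.8 jar; override by setting GATK3_JAR beforehand.
    GATK3_JAR=${GATK3_JAR:-GenomeAnalysisTK.jar}

    # Sketch of the GVCF workflow in GATK3.8 syntax.
    # $1 = reference FASTA, $2 = input BAM, $3 = output prefix (all placeholders).
    gatk3_gvcf_workflow() {
        # Step 1: HaplotypeCaller in reference-confidence (GVCF) mode.
        java -jar "$GATK3_JAR" -T HaplotypeCaller \
            -R "$1" -I "$2" \
            --emitRefConfidence GVCF \
            -o "$3.g.vcf" &&
        # Step 2: genotype the GVCF to obtain the final VCF.
        java -jar "$GATK3_JAR" -T GenotypeGVCFs \
            -R "$1" --variant "$3.g.vcf" \
            -o "$3.vcf"
    }

    # Hypothetical usage: gatk3_gvcf_workflow reference.fasta sample.bam sample
    ```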

  • DHWANI DHOLAKIA

    danilovkiri Are ERC modes also available for UnifiedGenotyper in GATK3.8?

  • danilovkiri

    DHWANI DHOLAKIA

    UnifiedGenotyper documentation is also available at the link in my previous comment. The exact file is here (https://github.com/broadinstitute/gatk-docs/blob/master/gatk3-tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.html). It does not have an ERC option, though it does have an `--output_mode` argument (as does HC), which I guess is not exactly what you are looking for. However, `--output_mode EMIT_ALL_CONFIDENT_SITES` might do the trick. I still advise you not to use UnifiedGenotyper, as it was dropped in favour of HaplotypeCaller for many reasons. If you care about the quality of genotyping data, choose HC.
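
    For reference, a UnifiedGenotyper call with that output mode might look like the sketch below in GATK3.8 syntax. The jar path, the function name, and the file names are placeholders, and whether the result matches what you need should be checked against the archived documentation.

    ```shell
    # Placeholder path to the GATK3.8 jar; override by setting GATK3_JAR beforehand.
    GATK3_JAR=${GATK3_JAR:-GenomeAnalysisTK.jar}

    # Sketch: UnifiedGenotyper emitting all confident sites (GATK3.8 syntax).
    # $1 = reference FASTA, $2 = input BAM, $3 = output VCF (all placeholders).
    gatk3_ug_all_confident() {
        java -jar "$GATK3_JAR" -T UnifiedGenotyper \
            -R "$1" -I "$2" \
            --output_mode EMIT_ALL_CONFIDENT_SITES \
            -o "$3"
    }

    # Hypothetical usage: gatk3_ug_all_confident reference.fasta sample.bam sample.ug.vcf
    ```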

  • DHWANI DHOLAKIA

    danilovkiri Thanks a lot for answering all my queries. It would be really helpful if you could send a link where I can compare the results or methods of GATK3.8 and GATK4 HC.

    I mean, what advantage do the GATK4 pipeline and HC offer over the GATK3.8 pipeline and HC?

  • danilovkiri

    Unfortunately, there is little information available on this topic. Searching the web helps with some aspects, but not with the bigger picture. In a nutshell, there have been improvements in performance and in the correct processing of ambiguous regions for some tools, including HC. Perhaps the GATK team can elaborate.

  • Bhanu Gandham

    Hi DHWANI DHOLAKIA

    Unfortunately, we do not support or provide any comparisons with any of the GATK3 tools.
