Trio Denovo annotation using VariantAnnotator PossibleDenovo not working?
AnsweredHi,
I've tried to annotate denovo variants in my TRIO samples (Mother, Father, Proband) vcf using VariantAnnotator PossibleDenovo. However, none of my variants were annotated with either hiConfDeNovo or loConfDeNovo.
I manually checked some of the variants, and they should pass the crieteria for either hiConfDeNovo or loConfDeNovo variants.
What could went wrong here, and how can I fix it? Thanks in advance !!!
Below are the command I used to merge the gvcf and run VariantAnnotator PossibleDenovo
1) Merge gvcfs (called by GATK haplotypecaller-done by my colleague)
(gatk) root@2dc5158253fb:/gatk/my_data# gatk CombineGVCFs -R Homo_sapiens_assembly38.fasta --variant GGSpi006.g.vcf.gz --variant GGSpi007.g.vcf.gz --variant GGSpi005.g.vcf.gz -O GGSpi005.cohort.GATKpipeline.g.vcf.gz
2) The genotypes were missing after I merged the gvcfs, so after reading some guide on GATK forum, I perform joint genotyping
(gatk) root@2dc5158253fb:/gatk/my_data# gatk GenotypeGVCFs -R Homo_sapiens_assembly38.fasta -V GGSpi005.cohort.GATKpipeline.g.vcf.gz -O GGSpi005.cohort.GATKpipeline.combineGT.g.vcf.gz
3) Ran VariantAnnotator
(gatk) root@2dc5158253fb:/gatk/my_data# gatk VariantAnnotator -A PossibleDeNovo -R Homo_sapiens_assembly38.fasta --pedigree GGSpi005.ped -V GGSpi005.cohort.GATKpipeline.combineGT.g.vcf.gz -O GGSpi005.cohort.GATKpipeline.combineGT.possibledenovo.g.vcf.gz
Logs from VariantAnnotator:
08:09:36.135 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Sep 02, 2021 8:09:37 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
08:09:37.841 INFO VariantAnnotator - ------------------------------------------------------------
08:09:37.841 INFO VariantAnnotator - The Genome Analysis Toolkit (GATK) v4.1.3.0
08:09:37.842 INFO VariantAnnotator - For support and documentation go to https://software.broadinstitute.org/gatk/
08:09:37.842 INFO VariantAnnotator - Executing as root@2dc5158253fb on Linux v4.15.0-128-generic amd64
08:09:37.842 INFO VariantAnnotator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
08:09:37.842 INFO VariantAnnotator - Start Date/Time: September 2, 2021 8:09:36 AM UTC
08:09:37.842 INFO VariantAnnotator - ------------------------------------------------------------
08:09:37.842 INFO VariantAnnotator - ------------------------------------------------------------
08:09:37.843 INFO VariantAnnotator - HTSJDK Version: 2.20.1
08:09:37.843 INFO VariantAnnotator - Picard Version: 2.20.5
08:09:37.843 INFO VariantAnnotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:09:37.843 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:09:37.843 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:09:37.843 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:09:37.843 INFO VariantAnnotator - Deflater: IntelDeflater
08:09:37.843 INFO VariantAnnotator - Inflater: IntelInflater
08:09:37.843 INFO VariantAnnotator - GCS max retries/reopens: 20
08:09:37.843 INFO VariantAnnotator - Requester pays: disabled
08:09:37.844 WARN VariantAnnotator -
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: VariantAnnotator is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
08:09:37.844 INFO VariantAnnotator - Initializing engine
08:09:38.286 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/GGSpi005.cohort.GATKpipeline.combineGT.g.vcf.gz
08:09:38.647 INFO VariantAnnotator - Done initializing engine
08:09:38.735 INFO ProgressMeter - Starting traversal
08:09:38.735 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
08:09:39.031 INFO PedReader - Reading PED file GGSpi005.ped with missing fields: []
08:09:39.033 INFO PedReader - Phenotype is other? false
08:09:48.749 INFO ProgressMeter - chr1:80703442 0.2 185000 1108558.9
08:09:58.782 INFO ProgressMeter - chr1:182224598 0.3 396000 1185273.9
08:10:08.787 INFO ProgressMeter - chr2:16645232 0.5 599000 1195927.1
08:10:18.835 INFO ProgressMeter - chr2:107532026 0.7 815000 1219451.4
08:10:28.869 INFO ProgressMeter - chr2:213961585 0.8 1043000 1248254.7
08:10:38.896 INFO ProgressMeter - chr3:70258454 1.0 1279000 1275577.2
08:10:48.922 INFO ProgressMeter - chr3:157701073 1.2 1485000 1269465.9
08:10:58.955 INFO ProgressMeter - chr4:43128759 1.3 1695000 1267779.5
08:11:08.979 INFO ProgressMeter - chr4:138932992 1.5 1926000 1280528.3
08:11:18.997 INFO ProgressMeter - chr5:36332061 1.7 2158000 1291416.5
08:11:29.007 INFO ProgressMeter - chr5:152626381 1.8 2402000 1306950.1
08:11:39.033 INFO ProgressMeter - chr6:51516372 2.0 2634000 1313737.6
08:11:49.046 INFO ProgressMeter - chr6:151071223 2.2 2853000 1313636.7
08:11:59.047 INFO ProgressMeter - chr7:56735293 2.3 3060000 1308512.5
08:12:09.077 INFO ProgressMeter - chr8:343368 2.5 3307000 1319790.9
08:12:19.094 INFO ProgressMeter - chr8:103320886 2.7 3556000 1330514.7
08:12:29.115 INFO ProgressMeter - chr9:42375718 2.8 3774000 1329029.2
08:12:39.134 INFO ProgressMeter - chr10:17793490 3.0 4016000 1335705.9
08:12:49.136 INFO ProgressMeter - chr10:112786823 3.2 4256000 1341169.4
08:12:59.172 INFO ProgressMeter - chr11:71343184 3.3 4499000 1346757.3
08:13:09.192 INFO ProgressMeter - chr12:32329105 3.5 4740000 1351351.4
08:13:19.200 INFO ProgressMeter - chr13:16124482 3.7 4979000 1355045.0
08:13:29.228 INFO ProgressMeter - chr13:109140124 3.8 5225000 1360128.1
08:13:39.261 INFO ProgressMeter - chr15:23086088 4.0 5468000 1364016.2
08:13:49.293 INFO ProgressMeter - chr16:17065339 4.2 5715000 1368550.9
08:13:59.313 INFO ProgressMeter - chr17:32971285 4.3 5959000 1372103.6
08:14:09.322 INFO ProgressMeter - chr18:56553291 4.5 6202000 1375237.4
08:14:19.355 INFO ProgressMeter - chr20:4817194 4.7 6447000 1378447.7
08:14:29.358 INFO ProgressMeter - chr21:31112171 4.8 6692000 1381583.7
08:14:39.391 INFO ProgressMeter - chrX:56103925 5.0 6939000 1384772.0
08:14:45.800 INFO ProgressMeter - chrY:56863546 5.1 7089754 1385330.9
08:14:45.800 INFO ProgressMeter - Traversal complete. Processed 7089754 total variants in 5.1 minutes.
08:14:45.966 INFO VariantAnnotator - Shutting down engine
[September 2, 2021 8:14:45 AM UTC] org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator done. Elapsed time: 5.16 minutes.
Runtime.totalMemory()=1352138752
GATK was run in a docker container:
docker run -v ~/my_project:/gatk/my_data -it broadinstitute/gatk:4.1.3.0
Pedigree file:
FAM1 GGSpi006 0 0 1 1
FAM2 GGSpi007 0 0 2 1
FAM1 GGSpi005 GGSpi006 GGspi007 1 2
Some relevant header in the output header:
##GATKCommandLine=<ID=CombineGVCFs,CommandLine="CombineGVCFs --output GGSpi005.cohort.GATKpipeline.g.vcf.gz --variant GGSpi006.g.vcf.gz --variant GGSpi007.g.vcf.gz --variant GGSpi005.g.vcf.gz --reference Homo_sapiens_assembly38.fasta --convert-to-base-pair-resolution false --break-bands-at-multiples-of 0 --input
##GATKCommandLine=<ID=GenotypeGVCFs,CommandLine="GenotypeGVCFs --output GGSpi005.cohort.GATKpipeline.combineGT.g.vcf.gz --variant GGSpi005.cohort.GATKpipeline.g.vcf.gz --reference Homo_sapiens_assembly38.fasta --include-non-variant-sites false --merge-input-intervals false --input-is-somatic false --tumor-lod-to
##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --emit-ref-confidence GVCF --gvcf-gq-bands 10 --gvcf-gq-bands 20 --gvcf-gq-bands 30 --gvcf-gq-bands 40 --gvcf-gq-bands 50 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --use-new-qual-calculator true --contamination-fr
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AS_InbreedingCoeff,Number=A,Type=Float,Description="allele specific heterozygosity as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation; relate to inbreeding coefficient">
##INFO=<ID=AS_QD,Number=A,Type=Float,Description="Allele-specific Variant Confidence/Quality by Depth">
##INFO=<ID=AS_RAW_BaseQRankSum,Number=1,Type=String,Description="raw data for allele specific rank sum test of base qualities">
##INFO=<ID=AS_RAW_MQ,Number=1,Type=String,Description="Allele-specfic raw data for RMS Mapping Quality">
##INFO=<ID=AS_RAW_MQRankSum,Number=1,Type=String,Description="Allele-specfic raw data for Mapping Quality Rank Sum">
##INFO=<ID=AS_RAW_ReadPosRankSum,Number=1,Type=String,Description="allele specific raw data for rank sum test of read position bias">
##INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="Allele-specific forward/reverse read counts for strand bias tests">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=RAW_MQandDP,Number=2,Type=Integer,Description="Raw data (sum of squared MQ and total depth) for improved RMS Mapping Quality calculation. Incompatible with deprecated RAW_MQ formulation.">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
##INFO=<ID=hiConfDeNovo,Number=1,Type=String,Description="High confidence possible de novo mutation (GQ >= 20 for all trio members)=[comma-delimited list of child samples]">
##INFO=<ID=loConfDeNovo,Number=1,Type=String,Description="Low confidence possible de novo mutation (GQ >= 10 for child, GQ > 0 for parents)=[comma-delimited list of child samples]">
##reference=file:///gatk/my_data/Homo_sapiens_assembly38.fasta
##source=CombineGVCFs
##source=GenotypeGVCFs
##source=HaplotypeCaller
Some output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GGSpi005 GGSpi006 GGSpi007
chr1 72078 . A C 76.50 . AC=2;AF=0.333;AN=6;DP=16;ExcessHet=0.4576;FS=0.000;MLEAC=1;MLEAF=0.167;MQ=29.00;QD=29.56;SOR=2.303 GT:AD:DP:GQ:PGT:PID:PL 0/0:7,0:7:21:.:.:0,21,203 0/0:7,0:7:21:.:.:0,21,223 1/1:0,2:2:6:1|1:72078_A_C:90,6,0
chr1 85222 . A T 35.48 . AC=2;AF=0.333;AN=6;DP=23;ExcessHet=0.4576;FS=0.000;MLEAC=1;MLEAF=0.167;MQ=21.00;QD=17.74;SOR=2.303 GT:AD:DP:GQ:PL 0/0:8,0:8:21:0,21,248 0/0:13,0:13:36:0,36,452 1/1:0,2:2:6:49,6,0
chr1 183598 . C G 57.58 . AC=1;AF=0.250;AN=4;BaseQRankSum=1.58;DP=122;ExcessHet=3.0103;FS=3.444;MLEAC=1;MLEAF=0.250;MQ=40.04;MQRankSum=-2.817e+00;QD=0.89;ReadPosRankSum=1.22;SOR=1.477 GT:AD:DP:GQ:PL 0/1:57,8:65:65:65,0,1604 0/0:57,0:57:32:0,32,1533 ./.:0,0:0:.:0,0,0
-
Hi Jan,
I apologize for the delay in getting back to you. I was attending a conference.
Can you please try the latest version of GATK which is v4.2.2.0. The version you are using is quite old at this point. So just updating your version might solve this issue.
But if you are still not seeing any variants annotated as hi/loconfdenovo even with the updated version, then please share a few variant records from your input vcf that follow the rules for those annotation. The rules are provided in the header — ##INFO=
= 20 for all trio members)=[comma-delimited list of child samples]">
##INFO== 10 for child, GQ > 0 for parents)=[comma-delimited list of child samples]">
It would be helpful if you could provide some variant records from the input that follow those rules and the same set of the variant records in the VariantAnnottator output vcf. This will help me debug the issue.
— Bhanu
Please sign in to leave a comment.
1 comment