Trio de novo variants calling pipeline confuse
Hi,
I'm running trio data analysis but confused about some details. Following the steps in gatk document "Genotype Refinement workflow for germline short variants", I noticed that allele frequency will be used in last step for indentifying de novo mutations. AF < 0.1% must mean external group AF information. However, in my trio vcf file, AF are determined by the family three people. VariantAnnotator can add group AF information but the string is not "AF" but such as "gnomad.AF".
chr1 69511 . A G 9126.73 PASS AC=6;AF=1.00;AN=6;BaseQRankSum=1.25;DP=315;ExcessHet=0.0000;FS=0.000;MLEAC=6;MLEAF=1.00;MQ=44.31;MQRankSum=-2.170e+00;NEGATIVE_TRAIN_SITE;QD=29.63;ReadPosRankSum=1.92;SOR=1.145;VQSLOD=-2.285e+00;culprit=MQ;gnomad.AF=0.871 GT:AD:DP:GQ:JL:JP:PL:PP 1/1:0,129:129:99:127:127:3954,387,0:4014,387,0 1/1:2,83:85:99:127:127:2445,205,0:2505,205,0 1/1:0,94:94:99:127:127:2741,282,0:2861,342,0
What's more, if I use this gnomad.AF contained vcf file for VariantAnnotator -A PossibleDeNovo, no de novo information will be annotated in any sites.
11:48:50.898 INFO ProgressMeter - Starting traversal
11:48:50.899 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
11:48:50.945 INFO PedReader - Phenotype is other? false
11:48:51.139 INFO ProgressMeter - unmapped 0.0 830 207500.0
11:48:51.140 INFO ProgressMeter - Traversal complete. Processed 830 total variants in 0.0 minutes.
Vcf without gnomad.AF can be annotated normally.
Is it because I did wrong in former steps?
Here's my commands:
gatk CalculateGenotypePosteriors -V filtered.vcf -O cgp.vcf -ped ped
--skip-population-priors
gatk VariantFiltration -R $REF -V cgp.vcf -O vf.vcf --genotype-filter-name "GQfilter" --genotype-filter-expression "GQ < 20"
gatk VariantAnnotator -R $REF -V vf.vcf -O va.vcf -A PossibleDeNovo -ped ped
( If annotate group AF information:
gatk VariantAnnotator -R $REF -V vf.vcf -O va1.vcf --resource:gnomad $bundle/af-only-gnomad.hg38.vcf.gz -E gnomad.AF
gatk VariantAnnotator -R $REF -V va1.vcf -O va2.vcf -A PossibleDeNovo -ped ped
)
Any steps before are the same as normal germline calling, applying HaplotypeCaller, CombineGVCFs and GenotypeGVCFs and VQSR. GATK version is 4.5.0.0.
-
Can anyone help me?
-
Hi yx zhang,
Looking at the code, the AF used in the PossibleDeNovo annotation is derived from the genotypes in the VCF: https://github.com/broadinstitute/gatk/blob/264040448aa8b8a580f7f818731c80dc24deea7e/src/main/java/org/broadinstitute/hellbender/tools/walkers/annotator/PossibleDeNovo.java#L144. Given that, I don't expect the gnomad.AF annotation to change anything.
Is chr1:69511 a position you care about? I wouldn't expect that position to get annotated because I believe the de novo mutations are required to be heterozygous, i.e. mother and father are 0/0 and offspring is 0/1. After CalculateGenotypePosteriors, only sites where mother and father are 0/0 and offspring is 0/1 have the potentially to be called as de novos.
The log isn't very helpful here. Can you give me the output from the two different pipelines?
-
Hi,
I think I know why no de novo mutants are annotated. Just now I noticed that the vcf file with genomad AF is very small and only contains few chr1 data. It's so few that no de novo mutants are contained, so won't get any de novo annotation. I guess it's related to the error caused in "gatk VariantAnnotator -R $REF -V vf.vcf -O va1.vcf --resource:gnomad $bundle/af-only-gnomad.hg38.vcf.gz -E gnomad.AF". It's listed at the back.[March 12, 2024 at 2:52:44 PM CST] org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator done. Elapsed time: 0.07 minutes.
Runtime.totalMemory()=1224736768
java.lang.IllegalStateException: Allele in genotype G not in the variant context [G*, GT]
at htsjdk.variant.variantcontext.VariantContext$Validation.validateGenotypes(VariantContext.java:386)
at htsjdk.variant.variantcontext.VariantContext$Validation$2.validate(VariantContext.java:335)
at htsjdk.variant.variantcontext.VariantContext.lambda$validate$0(VariantContext.java:1421)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1421)
at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:493)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:647)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:638)
at org.broadinstitute.hellbender.utils.variant.GATKVariantContextUtils.trimAlleles(GATKVariantContextUtils.java:1467)
at org.broadinstitute.hellbender.utils.variant.GATKVariantContextUtils.trimAlleles(GATKVariantContextUtils.java:1423)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.getMinRepresentationBiallelics(VariantAnnotatorEngine.java:609)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateExpressions(VariantAnnotatorEngine.java:550)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.addInfoAnnotations(VariantAnnotatorEngine.java:382)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:363)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:334)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator.apply(VariantAnnotator.java:243)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)Furthermore, is "af-only-gnomad.hg38.vcf.gz" file in gatk resource appropriate to be used in gatk VariantAnnotator for group AF? If not, are there any files recommanded? And how can I change AF into group AF so that de novo mutations can be properly identified?
-
Hi yx zhang,
That error would certainly stop VariantAnnotator from completing. Are you running the latest GATK (4.5.0.0)? I remember fixing a very similar htsjdk exception to provide more information like the position because that's not a lot to go on.
The PossibleDeNovo annotation will only use the cohort calculated AF value, which is likely to be higher than the gnomAD AF value. So PossibleDeNovo should work without VariantAnnotator, but adding the gnomAD AF and filtering on that should improve your specificity. If you're not seeing de novos being annotated without gnomAD then gnomAD won't recover them.
Please sign in to leave a comment.
4 comments