The problem
You specified -A <some annotation> in a command line invoking one of the annotation-capable tools (HaplotypeCaller, Mutect2, GenotypeGVCFs, and VariantAnnotator), but that annotation did not show up in your output VCF.
Keep in mind that all annotations that are necessary to run our Best Practices are annotated by default, so you should generally not need to request annotations unless you're doing something a bit special.
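Before digging into causes, it can help to confirm exactly which annotation keys are present in the output records. The following is a minimal sketch, not part of GATK: the function name annotation_usage is hypothetical, and the keys you pass in (for example DP for Coverage) are assumptions that you should check against the documentation of the annotation you requested. It counts how many records carry each key in either the INFO column or the FORMAT column.

```python
def annotation_usage(lines, keys):
    """Count the VCF records that carry each key in INFO or FORMAT.

    `lines` is any iterable of VCF text lines (header lines are skipped).
    `keys` are the annotation keys to look for; the caller supplies them.
    """
    counts = {key: 0 for key in keys}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a well-formed record
        # INFO is column 8: semicolon-separated KEY=VALUE pairs or bare flags
        info_keys = {item.split("=", 1)[0] for item in fields[7].split(";")}
        # FORMAT is column 9 (only present when the VCF has sample columns)
        format_keys = set(fields[8].split(":")) if len(fields) > 8 else set()
        for key in keys:
            if key in info_keys or key in format_keys:
                counts[key] += 1
    return counts
```

To run it on a file, pass an open text handle, for example open("output.vcf") or gzip.open("output.vcf.gz", "rt") for a compressed VCF. A count of zero for a key means the annotation was never written. A count lower than the number of records suggests that only some sites met the annotation's requirements.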
Why this happens & solutions
There can be several reasons why this happens, depending on the tool, the annotation, and your data. These are the four we see most often; if you encounter another that is not listed here, let us know in the comments.
1. You requested an annotation that cannot be calculated by the tool
For example, you're running Mutect2 but requested an annotation that is specific to HaplotypeCaller. There should be an error message to that effect in the output log. It's not possible to override this, but if you believe the annotation should be available to the tool, let us know in the forum and we'll consider putting in a feature request.
2. You requested an annotation that can only be calculated if an optional input is provided
For example, you're running HaplotypeCaller and you want InbreedingCoefficient, but you didn't specify a pedigree file. There should be an error message to that effect in the output log. The solution is simply to provide the missing input file. Another example: you're running VariantAnnotator and you want to annotate Coverage, but you didn't specify a BAM file. The tool needs to see the read data in order to calculate the annotation, so again, you simply need to provide the BAM file.
3. You requested an annotation that has requirements which are not met by some or all sites
For example, you're looking at RankSumTest annotations, which require heterozygous sites in order to perform the necessary calculations, but you're running on haploid data so you don't have any het sites. There is no workaround; the annotation is not applicable to your data. Another example: you requested InbreedingCoefficient, but your population includes fewer than 10 founder samples, which are required for the annotation calculation. Again, there is no workaround; the annotation is not applicable to your data.
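To check whether the first situation applies to your callset, you can count the heterozygous genotype calls in the VCF. This is a minimal sketch under stated assumptions: the function name count_het_calls is hypothetical, the input is an uncompressed VCF read as text lines, and a call is treated as heterozygous when its GT field contains more than one distinct called allele.

```python
import re


def count_het_calls(lines):
    """Return the number of heterozygous genotype calls across all samples."""
    het = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # sites-only record: no genotypes to inspect
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        for sample in fields[9:]:
            values = sample.split(":")
            if gt_index >= len(values):
                continue  # trailing fields were dropped for this sample
            # Split on phased (|) or unphased (/) separators; ignore no-calls
            alleles = [a for a in re.split(r"[/|]", values[gt_index]) if a != "."]
            if len(set(alleles)) > 1:
                het += 1
    return het
```

If the function returns 0, for instance because every genotype is haploid, there are no het sites for the rank sum tests to work with, and their absence from the output is expected.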
4. You requested an annotation that is already applied by default by the tool you are running
For example, you requested Coverage from HaplotypeCaller, which already annotates this by default. There is currently a bug that causes some default annotations to be dropped from the list if specified on the command line. This will be addressed in an upcoming version. For now the workaround is to check what annotations are applied by default and NOT request them with -A.
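If you assemble command lines in a script, the workaround can be applied automatically. The sketch below is an illustration only: annotation_args is a hypothetical helper, and the list of default annotations is an assumption that you must fill in yourself from the documentation of the tool and version you are running. It drops any requested annotation that is already a default (and any duplicate) and returns the remaining ones as -A arguments.

```python
def annotation_args(requested, defaults):
    """Build -A arguments, skipping annotations the tool applies by default.

    `requested` is the list of annotation names you want.
    `defaults` is the list of annotations the tool already applies;
    it is supplied by the caller and is not looked up from GATK.
    """
    default_set = set(defaults)
    seen = set()
    args = []
    for name in requested:
        if name in default_set or name in seen:
            continue  # already a default, or already requested once
        seen.add(name)
        args.extend(["-A", name])
    return args
```

For example, annotation_args(["Coverage", "StrandBiasBySample"], ["Coverage"]) returns ["-A", "StrandBiasBySample"], which can be appended to the rest of the command line.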
3 comments
Dear,
I am running HaplotypeCaller (Version="4.1.6.0") with the following options :
--annotation StrandBiasBySample
--annotation TandemRepeat
--annotation BaseQuality
I read the annotations in the output VCF files (the variants are annotated with the SB, RPA, RU, STR, and MBQ tags). HaplotypeCaller works perfectly.
Then, I am running GenomicsDBImport (Version="4.1.6.0") followed by GenotypeGVCFs (Version="4.1.6.0").
I notice that all three annotations (StrandBiasBySample, TandemRepeat and BaseQuality) are missing in the multi-sample gVCF output file (three VCF files are merged).
I have not found an explanation for this issue yet, and after exploring the GATK forum, I still don't have a clue. I would really like to report the SB and MBQ tags in the final gVCF file.
Thank you very much for your help,
Best regards,
v.
Dear, do you have any feedback to share on my previous post? Thank you.
Hi V,
You have posted your question on the blog and not on the forum. You should repost this question in the forum to get an answer from the GATK community.
^_^