Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK-VQSR issue

12 comments

  • Qianru Lenus SUN

    Following the error above, I ran the command below. My first attempt failed with an error that mentioned samtools index, so I indexed the BAM and then passed the resulting .bam.bai file as input:

    gatk VariantAnnotator \
    -R /path/Homo_sapiens_assembly38.fasta \
    -I /path/A2.sorted.markdup.bam.bai \
    -V /path/A2.hc.vcf.gz \
    -O /path/A2_VariantAnnotator.vcf \
    -A Coverage \
    --dbsnp /path/Homo_sapiens_assembly38.dbsnp138.vcf \
    1>/path/A2_VariantAnnotatorlog1.recal 2>&1 && echo "VariantAnnotator Done!" 
    The log file showed an error, even though all the reference files came from the GATK bundle. Could you share your ideas? Thanks so much!
  • Qianru Lenus SUN

    Hi dear developer, may I have your advice? Thanks so much!

  • Qianru Lenus SUN

    Gökalp Çelik 

    Dear developer, may I have your thoughts on this? Thanks!

  • Gökalp Çelik

    Hi Qianru Lenus SUN

    You passed the .bai index file to VariantAnnotator as the input BAM (-I). Can you point -I at the BAM file itself and try again?

    Regards. 
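
To make the fix concrete, here is a minimal sketch of the corrected call. It reuses the paths from the command above; `bam_from_index` is a hypothetical helper (not part of GATK or samtools) that recovers the BAM path from an index path, on the assumption that the index sits next to its BAM as either `X.bam.bai` or `X.bai`. The script only prints the command so the paths can be reviewed before it is run.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map an index path back to the BAM it belongs to.
bam_from_index() {
    local path=$1
    case "$path" in
        *.bam.bai) printf '%s\n' "${path%.bai}" ;;      # A.bam.bai -> A.bam
        *.bai)     printf '%s\n' "${path%.bai}.bam" ;;  # A.bai     -> A.bam
        *)         printf '%s\n' "$path" ;;             # already a BAM path
    esac
}

bam=$(bam_from_index /path/A2.sorted.markdup.bam.bai)

# Same arguments as the original command, with -I pointing at the BAM itself.
cmd=(gatk VariantAnnotator
    -R /path/Homo_sapiens_assembly38.fasta
    -I "$bam"
    -V /path/A2.hc.vcf.gz
    -O /path/A2_VariantAnnotator.vcf
    -A Coverage
    --dbsnp /path/Homo_sapiens_assembly38.dbsnp138.vcf)

# Print the command for review instead of executing it.
printf '%s ' "${cmd[@]}"; printf '\n'
```

The index is normally located from the BAM path, so it only needs to sit beside the BAM and is not itself passed to `-I`.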

  • Qianru Lenus SUN

    Gökalp Çelik Thanks! It worked after I switched to the BQSR-recalibrated .bam. May I ask how long VariantAnnotator usually takes to run? I could not find any information online, and so far it has been running for over 7 hours with the default settings.

    Thank you so much!

  • Qianru Lenus SUN

    Gökalp Çelik

    Genevieve Brandt (she/her)

    Dear developers, VariantAnnotator completed successfully after running for 23 hours and generated sample_VariantAnnotator.vcf.

    Next, I got the same error as in my original post: "Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations."

    From similar posts by other users, I am confident that VariantRecalibrator processed a meaningful number of variants rather than 0, as in some of those posts: my log shows "Processed 52180315 total variants". Could you help me figure this out? Thanks so much!

  • Gökalp Çelik

    Hi again. 

    You may want to check your original VCF files to see which annotations your analysis still needs, then use VariantAnnotator to add them all in one run. Below are all the annotations that VariantAnnotator can add when specified. QualByDepth is the one this error message refers to. Alternatively, if you are not interested in the QD annotation or in recalibration based on it, you can remove it from the VariantRecalibrator parameters.

    --annotation,-A <String>      One or more specific annotations to add to variant calls  This argument may be specified 0
                                  or more times. Default value: null. Possible values: {AlleleFraction, AllelePseudoDepth,
                                  AS_BaseQualityRankSumTest, AS_FisherStrand, AS_InbreedingCoeff,
                                  AS_MappingQualityRankSumTest, AS_QualByDepth, AS_ReadPosRankSumTest, AS_RMSMappingQuality,
                                  AS_StrandBiasMutectAnnotation, AS_StrandOddsRatio, AssemblyComplexity, BaseQuality,
                                  BaseQualityHistogram, BaseQualityRankSumTest, ChromosomeCounts, ClippingRankSumTest,
                                  CountNs, Coverage, CycleSkipStatus, DepthPerAlleleBySample, DepthPerSampleHC, ExcessHet,
                                  FisherStrand, FragmentDepthPerAlleleBySample, FragmentLength, GcContent,
                                  GenotypeSummaries, HaplotypeFilteringAnnotation, HmerIndelLength, HmerIndelNuc,
                                  HmerMotifs, InbreedingCoeff, IndelClassify, IndelLength, LikelihoodRankSumTest,
                                  MappingQuality, MappingQualityRankSumTest, MappingQualityZero, OrientationBiasReadCounts,
                                  OriginalAlignment, PossibleDeNovo, QualByDepth, RawGtCount, ReadPosition,
                                  ReadPosRankSumTest, ReferenceBases, RMSMappingQuality, SampleList, StrandBiasBySample,
                                  StrandOddsRatio, TandemRepeat, TransmittedSingleton, UniqueAltReadCount, VariantType}

    Regards.
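
Before re-running VariantAnnotator, it can help to count how many records actually carry the annotation named in the error. The sketch below is one way to do that with awk; `count_info_key` is a hypothetical helper, and the two inline records are made-up test data rather than output from this thread.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read VCF text on stdin and print
# "<records with KEY in INFO> <total records>".
count_info_key() {
    awk -F'\t' -v key="$1" '
        /^#/ { next }                              # skip header lines
        {
            total++
            # INFO is column 8; match KEY= at the start or after a semicolon
            if ($8 ~ ("(^|;)" key "=")) hits++
        }
        END { printf "%d %d\n", hits, total }'
}

# Two made-up records: only the first carries a QD value.
printf 'chr1\t100\t.\tA\tG\t50\t.\tDP=10;QD=5.0\nchr1\t200\t.\tC\tT\t30\t.\tDP=8\n' \
    | count_info_key QD    # prints "1 2"
```

On a real file, something like `gzip -dc /path/A2.hc.vcf.gz | count_info_key QD` would apply the same check. A first number of 0 matches the situation the error message describes.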

  • Qianru Lenus SUN

    Gökalp Çelik Dear Celik, 

    Thank you so much! I checked my sample.hc.vcf.gz (generated by gatk HaplotypeCaller) and took screenshots of the first three pages only. Do the "##FILTER=<ID=xxx" header lines indicate which annotation types I need to add with VariantAnnotator? Thank you so much!

  • Qianru Lenus SUN

    Gökalp Çelik Thanks! I have added the different annotation types in one go. (Previously, in the VariantRecalibrator step, I found that if I removed one annotation, e.g. QD, the next error would be about MQ, following the order in the command, so I added this many.)

    I tried to add them all in a single comma-separated argument, as below, but it was rejected with an unrecognized-name error:

    -A Coverage,QualByDepth,MappingQuality,MappingQualityRankSumTest,ReadPosRankSumTest,FisherStrand,StrandOddsRatio,DepthPerAlleleBySample,DepthPerSampleHC \
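
The help text quoted earlier in the thread says `-A` "may be specified 0 or more times", which suggests repeating the flag once per annotation rather than passing a comma-separated list. A small sketch of that conversion, using the annotation names from the attempt above; the variable names are illustrative only:

```shell
#!/usr/bin/env bash

annotations="Coverage,QualByDepth,MappingQuality,MappingQualityRankSumTest,ReadPosRankSumTest,FisherStrand,StrandOddsRatio,DepthPerAlleleBySample,DepthPerSampleHC"

# Split on commas, then emit one "-A NAME" pair per annotation.
IFS=',' read -r -a names <<<"$annotations"
annotation_args=()
for name in "${names[@]}"; do
    annotation_args+=(-A "$name")
done

# Show the flags that would be appended to the VariantAnnotator call.
printf '%s ' "${annotation_args[@]}"; printf '\n'
```

The resulting array can then be spliced into the call as `"${annotation_args[@]}"` in place of the single `-A` argument, with the remaining arguments unchanged.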

     
  • Gökalp Çelik

    Hi again. 

    The problem is now clearer. You are trying to perform VQSR on a GVCF that has not been genotyped. The file must first be genotyped with GenotypeGVCFs; after that, you can perform VQSR.

    Regards. 
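
As a rough illustration of this advice: HaplotypeCaller output written in GVCF mode typically contains `##GVCFBlock` header lines and `<NON_REF>` alternate alleles, so their presence is a reasonable hint that a file still needs genotyping. The sketch below checks for those markers and prints a GenotypeGVCFs command. `looks_like_gvcf` is a hypothetical helper, the output filename is an assumption, and the other paths are reused from earlier in the thread.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when the VCF text on stdin
# contains typical GVCF markers.
looks_like_gvcf() {
    grep -q -e '^##GVCFBlock' -e '<NON_REF>'
}

# Made-up single record in GVCF style, for demonstration only.
if printf 'chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=200\n' | looks_like_gvcf; then
    echo "Input looks like a GVCF; genotype it before VQSR."
fi

# The output name A2.genotyped.vcf.gz is an assumption for illustration.
genotype_cmd=(gatk GenotypeGVCFs
    -R /path/Homo_sapiens_assembly38.fasta
    -V /path/A2.hc.vcf.gz
    -O /path/A2.genotyped.vcf.gz)

# Print the command for review instead of executing it.
printf '%s ' "${genotype_cmd[@]}"; printf '\n'
```

The genotyped output, not the original GVCF, would then be the `-V` input to VariantRecalibrator. For a cohort of several samples, the per-sample GVCFs are normally combined first (for example with GenomicsDBImport or CombineGVCFs) before GenotypeGVCFs.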

  • Qianru Lenus SUN

    Gökalp Çelik 

    Thank you so much! That issue is fixed. However, I now get an error about the versions of R, ggplot2 and other dependencies (I'm not sure which), as shown below. A .pdf was generated, but I'm not sure whether the run completed.

    I referred to https://github.com/broadinstitute/gatk/issues/8664, which you mentioned previously regarding the required packages and older versions, but the error still appeared. Thanks!

  • Gökalp Çelik

    Hi Qianru Lenus SUN

    If the run got as far as the R script, your recalibration is complete. The error message indicates that you are using a more recent version of R than the one we require, so there is an incompatibility between the newer libraries and our R script. We hope to fix this in one of the next two releases.

    In the meantime, you can fix the error in the script and re-run the R script to generate your plots if they were not generated properly. Alternatively, you can use our conda environment, which provides R 3.6, and re-run the unmodified R script to get the plots.

    Regards. 
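
A sketch of the second option, assuming the `gatkcondaenv.yml` file that ships with the GATK release is in the current directory and that the R script is the file passed to VariantRecalibrator via `--rscript-file`. The environment name `gatk` and the script path `/path/A2.plots.R` are assumptions for illustration. The commands are only printed, since creating the environment takes a while and the paths will differ.

```shell
#!/usr/bin/env bash

env_name=gatk                  # assumed environment name
rscript_file=/path/A2.plots.R  # hypothetical path to the generated R script

# Create the conda environment from the file bundled with GATK.
create_env=(conda env create -n "$env_name" -f gatkcondaenv.yml)

# Re-run the unmodified R script with the R version from that environment.
rerun_plots=(conda run -n "$env_name" Rscript "$rscript_file")

# Print both commands for review instead of executing them.
printf '%s ' "${create_env[@]}"; printf '\n'
printf '%s ' "${rerun_plots[@]}"; printf '\n'
```

Because the recalibration itself has already finished, only the plotting step needs to be repeated; the .recal and .tranches outputs do not have to be regenerated.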
