Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK Variant Filtration undefined variable

7 comments

  • Pamela Bretscher

    Hi Krithika_Subramanian,

    These messages are most likely occurring at locations in your file where there is no value for ReadPosRankSum or MQRankSum. However, these are just warning messages stating that VariantFiltration won't be able to filter for those annotations at sites where they aren't present. These aren't actual error messages and it seems that the tool is still finishing properly, so you should be able to ignore the warnings. Please let me know if this helps answer your question.

    Kind regards,

    Pamela

  • Krithika_Subramanian

    Hi Pamela Bretscher,

    Thanks for your reply. Here is a variant line that has values for both ReadPosRankSum and MQRankSum. Also, I do not see any difference after applying this filtration. Could you please explain why the filtration is not being applied to my input file?

    Example line:

    chr22 10536709 . T C 153717.48 VQSRTrancheSNP99.70to99.80 AC=520;AF=0.633;AN=822;AS_UNIQ_ALT_READ_COUNT=0;BaseQRankSum=-1.072e+00;DP=4929;ExcessHet=-0.0000;FS=5.776;GQ_MEAN=31.25;GQ_STDDEV=25.93;InbreedingCoeff=0.4864;MBQ=0,0;MFRL=0,0;MLEAC=1519;MLEAF=1.00;MMQ=60,60;MPOS=50;MQ=32.42;MQ0=0;MQRankSum=0.529;NCC=600;NCount=0;NEGATIVE_TRAIN_SITE;QD=33.68;ReadPosRankSum=1.30;SOR=0.013;VQSLOD=-1.995e+00;culprit=DP GT:AD:AF:DP:F1R2:F2R1:GQ:PL 1/1:0,0:1.00:0:0,0:0,0:54:624,54,0

     

     

    Regards

    Krithika S

  • Pamela Bretscher

    Hi Krithika_Subramanian,

    Thank you for providing this example line. Are there other lines where MQRankSum and ReadPosRankSum are not present? The values are present in this particular line, but the warning messages are likely coming from lines where they are missing. These values can't be calculated at sites where there are no heterozygous individuals, but that should not affect filtration. Regarding the filtration not being applied, do you mean that VariantFiltration is not annotating any variants as passing or failing, or that filtration is not being applied for these two annotations?

    Kind regards,

    Pamela
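
One way to answer the question above (are there records without MQRankSum or ReadPosRankSum?) is to scan the INFO column directly. The following is a minimal sketch using only the Python standard library, assuming an uncompressed, tab-delimited VCF; the helper names `parse_info` and `records_missing` and the two sample records are made up for illustration and are not part of GATK.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=10;DB;MQ=60' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True  # flag entries carry no value
    return info


def records_missing(vcf_lines, keys=("MQRankSum", "ReadPosRankSum")):
    """Yield (chrom, pos, missing_keys) for records lacking any of `keys`."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])  # INFO is the 8th column
        missing = [k for k in keys if k not in info]
        if missing:
            yield fields[0], fields[1], missing


# Two made-up records: the second lacks both rank-sum annotations.
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr22\t100\t.\tT\tC\t50.0\t.\tMQ=60.00;MQRankSum=0.529;ReadPosRankSum=1.30\n",
    "chr22\t200\t.\tG\tA\t45.0\t.\tMQ=58.10;QD=30.12\n",
]
missing_sites = list(records_missing(sample))
print(missing_sites)
```

On a real file, the same function could be fed an open file handle (for example `open("Input_SNP.vcf")`); the sites it reports are the likely sources of the "undefined variable" warnings.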

  • Krithika_Subramanian

    Hi Pamela Bretscher,

    Thanks for your reply. Could you please explain how I can check whether VariantFiltration was applied to my data?

    There were 742,973 variants before applying VariantFiltration. After running the above command, the same number of variants was present in the output .vcf file. That is why I am wondering whether the filtration was applied to my input or not.

     

    Regards,

    Krithika S

     

     

  • Pamela Bretscher

    Hi Krithika_Subramanian,

    The VariantFiltration tool annotates each variant either as PASS (if it passes all filter criteria) or by setting the FILTER field to the reason why the variant failed filtration. However, all of the variants are still kept in the VCF file unless you specify that they should be removed. Therefore, it makes sense that you still have the same number of variants in the output file, but they should now be annotated according to whether they passed or failed the applied filters. To check whether VariantFiltration is being applied properly, you can either inspect the annotations in the output file or specify that the failing variants should be removed. More information can be found in the tool documentation.

    Kind regards,

    Pamela
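
The check described above can be sketched by tallying the FILTER column of the output file: if filtration ran, the values should be PASS or filter names rather than all `.`. This is a minimal standard-library sketch, assuming an uncompressed, tab-delimited VCF; `filter_counts`, `passing_only`, and the sample records are hypothetical and not part of GATK. Within GATK itself, SelectVariants offers an `--exclude-filtered` option for dropping failing records.

```python
from collections import Counter


def filter_counts(vcf_lines):
    """Count how often each value appears in the FILTER column (7th field)."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        counts[line.split("\t")[6]] += 1
    return counts


def passing_only(vcf_lines):
    """Keep header lines plus records whose FILTER is exactly PASS."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
        elif line.strip() and line.split("\t")[6] == "PASS":
            kept.append(line)
    return kept


# Made-up records showing the three kinds of FILTER value one might see.
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr22\t100\t.\tT\tC\t50.0\tPASS\tMQ=60.00;QD=30.12\n",
    "chr22\t200\t.\tG\tA\t45.0\tMQ40\tMQ=32.42;QD=33.68\n",
    "chr22\t300\t.\tA\tG\t31.0\tQD2;MQ40\tMQ=20.00;QD=1.50\n",
]
counts = filter_counts(sample)
for name, n in sorted(counts.items()):
    print(name, n)
print(len(passing_only(sample)) - 2, "record(s) pass all filters")
```

The total across all FILTER values equals the number of input records, which is why the variant count does not change after VariantFiltration.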

  • Krithika_Subramanian

    Hi Pamela,

    Thank you for your clarification. I have given specific exclusion parameters in my command, for example QD < 2.0, SOR > 3.0, MQ < 40.0, etc.

    In my input, there are variants that have MQ < 30, so I expected them to be removed by this filtration. But I am getting the same number of variants. Kindly help me resolve this issue.

    Exact command used: gatk VariantFiltration -V Input_SNP.vcf -filter "QD < 2.0" --filter-name "QD2" -filter "QUAL < 30.0" --filter-name "QUAL30" -filter "SOR > 3.0" --filter-name "SOR3" -filter "FS > 60.0" --filter-name "FS60" -filter "MQ < 40.0" --filter-name "MQ40" -filter "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" -filter "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" -O Output_SNP_filtered.vcf

     

     

    Thank you

    Regards

    Krithika S

  • Pamela Bretscher

    Hi Krithika_Subramanian,

    These filtering parameters do not actually remove variants from your VCF file if they do not pass the filters. Rather, these variants will just be labeled in the output VCF with the reason they fail your specifications. If variants pass all filters, they will be labeled as PASS, but all of the variants will still be present in the VCF. Could you look at your output VCF file, or include a few lines of it here, to see if the variants are properly labeled (e.g. chr2 213455 . G GTT 34449.7 PASS)?

    Kind regards,

    Pamela
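
To see which labels to expect on a given record, the thresholds from the command in this thread can be re-evaluated by hand. The sketch below is a simplified stand-in, not GATK's JEXL evaluation: `FILTERS` and `triggered_filters` are hypothetical names, and skipping an absent annotation is an assumption that follows the behaviour described earlier in the thread.

```python
import operator

# (filter name, annotation, comparison, threshold), copied from the command above.
FILTERS = [
    ("QD2", "QD", operator.lt, 2.0),
    ("QUAL30", "QUAL", operator.lt, 30.0),
    ("SOR3", "SOR", operator.gt, 3.0),
    ("FS60", "FS", operator.gt, 60.0),
    ("MQ40", "MQ", operator.lt, 40.0),
    ("MQRankSum-12.5", "MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum-8", "ReadPosRankSum", operator.lt, -8.0),
]


def triggered_filters(qual, info_field):
    """Return the names of the filters whose expression is true for one record."""
    values = {"QUAL": qual}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        if sep:
            values[key] = value
    names = []
    for name, key, compare, threshold in FILTERS:
        if key not in values:
            continue  # assumption: an absent annotation cannot trigger its filter
        if compare(float(values[key]), threshold):
            names.append(name)
    return names


# QUAL and a subset of the INFO annotations from the example record in the thread.
info = "FS=5.776;MQ=32.42;MQRankSum=0.529;QD=33.68;ReadPosRankSum=1.30;SOR=0.013"
labels = triggered_filters("153717.48", info)
print(labels)
```

For the example record, only the MQ threshold is crossed (MQ=32.42 is below 40.0), so MQ40 is the label one would look for in that record's FILTER field in the output VCF.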

