Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

VariantFiltration issue

Answered

3 comments

  • Theresa Saunders

    Update: The problem seems to be tied to the input file for the VariantFiltration step, which in my case is Rorida_quinquenervia.SNPall.vcf. VariantFiltration fails as soon as it comes to a SNP in this file that has any value for ReadPosRankSum in the INFO column. The value doesn't matter; it fails even when the value should pass the ReadPosRankSum filter I have set. If I manually delete "ReadPosRankSum=*" from the input file, the VariantFiltration step can continue.

    For example, I've copied the first 6 SNP rows from my input file below. The SNPs at positions 146, 166, 190, 236, and 269 all pass and are included in the output file; however, the SNP at position 415 does not show up, nor do any after it. If I edit the input file and delete "ReadPosRankSum=1.26;", this SNP passes and is included in the output file. It should have passed anyway, since its ReadPosRankSum is greater than -8. I'm not sure what is going wrong, but I would appreciate any help! Thanks so much!

    #CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    158_Rorida_quinquenervia_?fimbriata    159_Rorida_quinquenervia_?noeana    160_Rorida_quinquenervia_?noeana    161_Rorida_quinquenervia_?noeana_brachystyla    44_Rorida_quinquenervia_?dolichostyla    96_Rorida_quinquenervia_?dolichostyla

    4471_supercontig_158    146    .    A    C    357.82    .    AC=6;AF=0.600;AN=10;DP=39;ExcessHet=0.0000;FS=0.000;MLEAC=5;MLEAF=0.500;MQ=60.00;QD=25.36;SOR=4.615    GT:AD:DP:GQ:PL    0/0:25,0:25:69:0,69,1035    1/1:0,4:4:12:165,12,0    0/0:5,0:5:15:0,15,205    ./.:0,0:0:0:0,0,0    1/1:0,3:3:9:128,9,0    1/1:0,2:2:6:85,6,0
    4471_supercontig_158    166    .    G    A    1225.88    .    AC=8;AF=0.800;AN=10;DP=69;ExcessHet=0.0000;FS=0.000;MLEAC=9;MLEAF=0.900;MQ=60.00;QD=28.73;SOR=3.545    GT:AD:DP:GQ:PL    0/0:35,0:35:99:0,99,1485    1/1:0,11:11:33:411,33,0    1/1:0,10:10:30:379,30,0    ./.:0,0:0:0:0,0,0    1/1:0,6:6:18:250,18,0    1/1:0,5:5:15:196,15,0
    4471_supercontig_158    190    .    T    A    1753.4    .    AC=6;AF=0.500;AN=12;DP=95;ExcessHet=0.0000;FS=0.000;MLEAC=6;MLEAF=0.500;MQ=60.00;QD=27.24;SOR=2.093    GT:AD:DP:GQ:PGT:PID:PL:PS    0/0:35,0:35:99:.:.:0,99,1485    1|1:0,22:22:66:1|1:182_A_AT:929,66,0:182    0/0:17,0:17:51:.:.:0,51,580    0/0:1,0:1:3:.:.:0,3,42    1|1:0,15:15:45:1|1:182_A_AT:622,45,0:182    1|1:0,5:5:15:1|1:182_A_AT:225,15,0:182
    4471_supercontig_158    236    .    G    T    3595.17    .    AC=10;AF=0.833;AN=12;DP=159;ExcessHet=0.0000;FS=0.000;MLEAC=10;MLEAF=0.833;MQ=60.00;QD=29.47;SOR=1.071    GT:AD:DP:GQ:PL    0/0:35,0:35:99:0,99,1485    1/1:0,39:39:99:1168,117,0    1/1:0,33:33:99:940,99,0    1/1:0,2:2:6:49,6,0    1/1:0,33:33:99:963,99,0    1/1:0,15:15:45:484,45,0
    4471_supercontig_158    269    .    A    G    4466.17    .    AC=10;AF=0.833;AN=12;DP=193;ExcessHet=0.0000;FS=0.000;MLEAC=10;MLEAF=0.833;MQ=60.00;QD=28.63;SOR=0.887    GT:AD:DP:GQ:PL    0/0:35,0:35:99:0,99,1485    1/1:0,42:42:99:1183,126,0    1/1:0,44:44:99:1344,132,0    1/1:0,2:2:6:49,6,0    1/1:0,47:47:99:1271,141,0    1/1:0,21:21:63:628,63,0
    4471_supercontig_158    415    .    A    G    88.74    .    AC=1;AF=0.100;AN=10;BaseQRankSum=-2.326e+00;DP=97;ExcessHet=0.0000;FS=2.218;MLEAC=1;MLEAF=0.100;MQ=60.00;MQRankSum=0.00;QD=5.22;ReadPosRankSum=1.26;SOR=0.180    GT:AD:DP:GQ:PL    0/0:35,0:35:99:0,99,1485    0/1:11,6:17:97:97,0,347    0/0:29,0:29:78:0,78,1170    ./.:0,0:0:0:0,0,0    0/0:8,0:8:0:0,0,87    0/0:8,0:8:24:0,24,306
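
One visible difference in the rows above is that the first five records carry no MQRankSum or ReadPosRankSum at all, while the record at position 415 carries every annotation named in the filter. The sketch below is only an illustration of how to check that; `parse_info` and `missing_annotations` are hypothetical helper names, and the two INFO strings are copied from the rows at positions 269 and 415.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'AC=1;QD=5.22' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Flag-type INFO keys have no '=value' part; store True for those.
        fields[key] = value if sep else True
    return fields


def missing_annotations(info, wanted):
    """Return the annotation names in `wanted` that the record does not carry."""
    present = parse_info(info)
    return [name for name in wanted if name not in present]


# Annotations referenced by the filter expression in this thread
WANTED = ["QD", "FS", "MQ", "MQRankSum", "ReadPosRankSum"]

# INFO columns copied from the records at positions 269 and 415
RECORDS = {
    269: "AC=10;AF=0.833;AN=12;DP=193;ExcessHet=0.0000;FS=0.000;MLEAC=10;"
         "MLEAF=0.833;MQ=60.00;QD=28.63;SOR=0.887",
    415: "AC=1;AF=0.100;AN=10;BaseQRankSum=-2.326e+00;DP=97;ExcessHet=0.0000;"
         "FS=2.218;MLEAC=1;MLEAF=0.100;MQ=60.00;MQRankSum=0.00;QD=5.22;"
         "ReadPosRankSum=1.26;SOR=0.180",
}

missing = {pos: missing_annotations(info, WANTED) for pos, info in RECORDS.items()}
for pos, names in missing.items():
    print(pos, "missing:", names if names else "none")
```

So position 415 is the first record in the excerpt where every term of the filter expression has a value to compare against.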

  • Theresa Saunders

    Well, haha, I figured out where everything was going wrong, and it was a really simple fix: in my original command, I just had to replace each & in the filter expression with ||.

    Original:

    gatk VariantFiltration -R rorida_quinquenervia_supercontig_reference.fasta -V Rorida_quinquenervia.SNPall.vcf --filter-name "hardfilter" -O Rorida_quinquenervia.snp.filtered.vcf --filterExpression "QD < 5.0 & FS > 60.0 & MQ < 40.0 & MQRankSum < -12.5 & ReadPosRankSum < -8.0"

    Fixed:

    gatk VariantFiltration -R rorida_quinquenervia_supercontig_reference.fasta -V Rorida_quinquenervia.SNPall.vcf --filter-name "hardfilter" -O Rorida_quinquenervia.snp.filtered.vcf --filterExpression "QD < 5.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"

    Hopefully this helps someone else eventually :)
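
Note that switching from & to || also changes what the filter means: with ||, a site is flagged when any one condition is true, rather than only when all of them are. As far as I know, GATK evaluates these expressions with JEXL, where && and || are the logical AND and OR. The sketch below is plain Python, not GATK's evaluator; it only illustrates the two combinations using the thresholds from the commands above. The helper names are hypothetical, the second INFO string is made up for contrast, and treating a missing annotation as "not failing" is a simplifying assumption.

```python
import operator

# Thresholds copied from the filter expression in the commands above
CONDITIONS = [
    ("QD", operator.lt, 5.0),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]


def failed_conditions(info):
    """List the annotations in an INFO string that trip their threshold.

    Assumption for this sketch: an annotation that is absent from the
    record is treated as not failing.
    """
    values = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    return [
        name
        for name, compare, threshold in CONDITIONS
        if name in values and compare(float(values[name]), threshold)
    ]


def flagged_with_or(info):
    """Flag the site if ANY condition is true (the || version)."""
    return len(failed_conditions(info)) > 0


def flagged_with_and(info):
    """Flag the site only if ALL conditions are true (the AND version)."""
    return len(failed_conditions(info)) == len(CONDITIONS)


# INFO column of the SNP at position 415, copied from the first comment
site_415 = (
    "AC=1;AF=0.100;AN=10;BaseQRankSum=-2.326e+00;DP=97;ExcessHet=0.0000;"
    "FS=2.218;MLEAC=1;MLEAF=0.100;MQ=60.00;MQRankSum=0.00;QD=5.22;"
    "ReadPosRankSum=1.26;SOR=0.180"
)
# Made-up record with a low QD only, for contrast
low_qd_site = "QD=1.50;FS=3.000;MQ=60.00;MQRankSum=0.00;ReadPosRankSum=1.26"

for label, info in [("site_415", site_415), ("low_qd_site", low_qd_site)]:
    print(label, failed_conditions(info), flagged_with_or(info), flagged_with_and(info))
```

In this sketch the SNP at position 415 trips none of the thresholds, so it is not flagged under either combination, while the made-up low-QD record is flagged only by the OR version.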

  • Anthony DiCi

    Hi Theresa Saunders,

    Thank you for writing to the GATK forum! I’m happy to hear that you were able to identify and fix this issue.

    We appreciate the time and effort you took to post in our forum. As you said, we hope it will help others encountering the same problem in the future.

    Thank you for being a vital part of the GATK community! If any other issue should arise, please do not hesitate to reach out again.

    Best,
    Anthony

