Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Hard filters on VariantFiltration following HaplotypeCaller; Filter contains an illegal character

Answered
7 comments

  • Genevieve Brandt (she/her)

    Hi Graeme Thorn,

    I think the issue is in your no reads filter:

    --filter-name "no reads" --filter-expression "DP < 10"

    Try switching the 10 to 10.0 so that it is read as a floating-point number rather than an integer.

    Let me know if this works.

    Best,

    Genevieve

  • Graeme Thorn

    Hi Genevieve,

    I have tried changing the filter expression to "DP < 10.0", but still get the same error.

    One thing I did find with switching from GATK3(.8) to GATK4 was that I needed to index the file first using

    gatk IndexFeatureFile -I <VCF>

    before filtering 

    gatk VariantFiltration -R <FA> \
    -V <VCF_IN> \
    -O <VCF_OUT> \
    --cluster-size 3 \
    --cluster-window-size 35 \
    --filter-name "low coverage" --filter-expression "QD < 5.0" \
    --filter-name "no reads" --filter-expression "DP < 10.0" \
    --filter-name "failed RPRS" --filter-expression "ReadPosRankSum < -8.0" \
    --filter-name "failed MQRS" --filter-expression "MQRankSum < -12.5" \
    --filter-name "failed MQ" --filter-expression "MQ < 40.0" \
    --filter-name "failed FS" --filter-expression "FS > 60.0"

    Might this be causing the issue? 

  • Genevieve Brandt (she/her)

    Hi Graeme Thorn,

    Thanks for checking that. I don't think the indexing has anything to do with this issue. It looks like the spaces in your filter names are throwing this exception. Spaces are not allowed in filter names because of the VCF specification.

    I would also recommend keeping the 10.0 instead of 10 because other users have had that come up as a problem before.

    Let me know if this fixes it!

    Best,

    Genevieve
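
The point about filter names can be checked before running VariantFiltration. The following is a minimal sketch, not a GATK feature: `is_valid_filter_name` is a hypothetical helper, and it assumes the only constraints that matter are the VCF specification's rule that FILTER codes contain no whitespace or semicolons. The underscore names are illustrative replacements, not names taken from the thread.

```shell
# Hypothetical pre-flight check, not part of GATK: returns 0 if a filter name
# is non-empty and contains no whitespace or semicolon, 1 otherwise.
is_valid_filter_name() {
  case "$1" in
    "") return 1 ;;                 # empty name
    *[[:space:]]*) return 1 ;;      # space, tab, newline
    *";"*) return 1 ;;              # semicolon separates FILTER codes in a VCF
    *) return 0 ;;
  esac
}

# Names from the command earlier in the thread, plus assumed underscore variants.
for name in "low coverage" "no reads" "failed RPRS" "low_coverage" "no_reads"; do
  if is_valid_filter_name "$name"; then
    echo "ok:      $name"
  else
    echo "illegal: $name"
  fi
done
```

Under these assumptions, renaming "no reads" to something like no_reads (and likewise for the other names) should avoid the exception; the expressions themselves do not need to change.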

  • Graeme Thorn

    Hi Genevieve,

    That seems to have fixed it. I've also pre-emptively changed the other filter names to remove the spaces. It evidently comes down to how GATK4 handles the strings in filter names compared to GATK3(.8).

    Thanks again,

    Graeme

  • Genevieve Brandt (she/her)

    Great! Glad that this is fixed and thank you for posting the solution!

  • bioinfo analyst

    Hi Genevieve Brandt (she/her)

    I am using GATK version 4.2.1.0 for variant filtration, but it generates the same undefined variable warnings for ReadPosRankSum and MQRankSum.

    Command used:


    java -jar -Xmx30G gatk.jar VariantFiltration \
    -R reference.fa \
    -V A_raw_snps.vcf \
    -O A_filtered_snps.vcf \
    --filter-name "QD_filter" -filter "QD < 2.0" \
    --filter-name "FS_filter" -filter "FS > 60.0" \
    --filter-name "SOR_filter" -filter "SOR > 10.0" \
    --filter-name "MQRankSum_filter" -filter "MQRankSum<-12.5" \
    --filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum<-8.0" \
    --genotype-filter-expression "DP < 10" --genotype-filter-name "DP_filter" \
    --genotype-filter-expression "GQ < 10" --genotype-filter-name "GQ_filter"

  • Gökalp Çelik

    Hi bioinfo analyst

    Those warning messages are not a problem. Not all variant contexts contain all the parameters that you are using to filter, so those warnings are issued. The tool works just as expected.

    Regards. 
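
To see how many records are behind such warnings, one option is to count the records whose INFO column lacks a given annotation. This is only a sketch: `count_missing_annotation` is a hypothetical helper, not a GATK tool, and it assumes an uncompressed, tab-delimited VCF with INFO in column 8.

```shell
# Hypothetical helper: print the number of records in an uncompressed VCF
# whose INFO column (column 8) does not contain the given annotation key.
count_missing_annotation() {
  vcf="$1"
  key="$2"
  awk -F'\t' -v key="$key" '
    /^#/ { next }                      # skip header lines
    {
      found = 0
      n = split($8, fields, ";")       # INFO entries are semicolon-separated
      for (i = 1; i <= n; i++) {
        split(fields[i], kv, "=")      # kv[1] is the key, with or without a value
        if (kv[1] == key) { found = 1; break }
      }
      if (!found) missing++
    }
    END { print missing + 0 }          # prints 0 when every record has the key
  ' "$vcf"
}
```

For the command above, this might be called as `count_missing_annotation A_raw_snps.vcf MQRankSum`. A non-zero count would be consistent with the explanation that some records simply do not carry that annotation.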
