Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

VariantFiltration MQRankSum error: "-12.5 is not a recognized option"

Answered

  • Bhanu Gandham

    Hi Chloé Girard,

    For what you are trying to do, use VariantFiltration rather than SelectVariants. There are also some other corrections to be made in the command you are using:

    1. --filter-expression is used for INFO-level filtering and --genotype-filter-expression is used for FORMAT-level filtering. See the documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360057440031-VariantFiltration
    2. You must use --filter-name or --genotype-filter-name for each filter.
    3. The GT (genotype) field is an exception. We have put in convenience methods to enable filtering out heterozygous calls (isHet == 1), homozygous-reference calls (isHomRef == 1), and homozygous-variant calls (isHomVar == 1). Take a look at this document: https://gatk.broadinstitute.org/hc/en-us/articles/360035891011-JEXL-filtering-expressions

    This should help resolve the errors.
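One way to follow points 1 and 2 above without losing track of which name belongs to which expression is to keep name/expression pairs in bash arrays and expand them quoted, as "${args[@]}", which passes every element (spaces included) as a single argument. This is a minimal sketch, assuming bash 4 or later; the reference and VCF paths are placeholders, the thresholds are the ones used in this thread, and build_variantfiltration_args is a hypothetical helper name, not part of GATK.

```shell
#!/usr/bin/env bash
# Sketch: build VariantFiltration arguments from name/expression pairs held in
# bash arrays, so every JEXL expression (spaces included) stays one argument
# and every expression gets its own name. Paths below are placeholders;
# build_variantfiltration_args is a hypothetical helper name.

build_variantfiltration_args() {
  local -a args=(VariantFiltration
    -R TAIR10_chr_all.fasta
    -V input.vcf.gz
    -O filtered.vcf.gz)

  # INFO-level (site) filters: --filter-name / --filter-expression pairs.
  local -a site=(
    "QD2"       "QD < 2.0"
    "QUAL30"    "QUAL < 30.0"
    "SOR3"      "SOR > 3.0"
    "FS60"      "FS > 60.0"
    "MQ40"      "MQ < 40.0"
    "MQRS-12.5" "MQRankSum < -12.5"
    "RPRS-8"    "ReadPosRankSum < -8.0"
  )

  # FORMAT-level (genotype) filters using the GT convenience methods.
  local -a geno=(
    "HOM" "isHomVar == 1"
    "HET" "isHet == 1"
  )

  local i
  for ((i = 0; i < ${#site[@]}; i += 2)); do
    args+=(--filter-name "${site[i]}" --filter-expression "${site[i+1]}")
  done
  for ((i = 0; i < ${#geno[@]}; i += 2)); do
    args+=(--genotype-filter-name "${geno[i]}" --genotype-filter-expression "${geno[i+1]}")
  done

  # One argument per line (none of the expressions contains a newline).
  printf '%s\n' "${args[@]}"
}

# Usage (bash 4+): read the lines back into an array and expand it quoted.
#   mapfile -t args < <(build_variantfiltration_args)
#   gatk "${args[@]}"
```

The pairing loop makes it impossible to add an expression without a name, which is the mistake point 2 warns about.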

  • Chloé Girard

    Hello Bhanu,
    Thanks for your reply.

    My code now reads:

    gatk VariantFiltration \
    -R ~/Genomes/TAIR10_chr_all.fasta \
    -V ~/$sample1.vcf.gz \
    -O ~/$sample1$addon.vcf.gz \
    --filter-name "QD2" \
    --filter-expression "QD < 2.0" \
    --filter-name "QUAL30" \
    --filter-expression "QUAL < 30.0" \
    --filter-name "SOR3" \
    --filter-expression "SOR > 3.0" \
    --filter-name "FS60" \
    --filter-expression "FS > 60.0" \
    --filter-name "MQ40" \
    --filter-expression "MQ < 40.0" \
    --filter-name "MQRS-12.5" \
    --filter-expression "MQRankSum < -12.5" \
    --filter-name "RPRS-8" \
    --filter-expression "ReadPosRankSum < -8.0" \
    --genotype-filter-name "HOM" \
    --genotype-filter-expression "isHomVar==1" \
    --genotype-filter-name "HET" \
    --genotype-filter-expression "isHet==1"
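In multi-line commands like this one, every trailing backslash needs whitespace before it. Bash deletes a backslash-newline pair entirely, so a closing quote followed directly by a backslash glues the next line onto the same argument whenever that next line starts in the first column. A small bash sketch of the difference; count_args is a hypothetical helper that only prints how many arguments it received.

```shell
#!/usr/bin/env bash
# Sketch: effect of a trailing backslash with and without a preceding space.
# count_args is a hypothetical helper that prints how many arguments it got.
count_args() { echo "$#"; }

# No space before "\": bash deletes the backslash-newline pair, so the next
# line (starting in column 1) is glued onto the quoted expression.
glued=$(count_args --filter-expression "ReadPosRankSum < -8.0"\
--genotype-filter-name "HOM")

# With a space before "\", the two lines stay separate arguments.
separate=$(count_args --filter-expression "ReadPosRankSum < -8.0" \
--genotype-filter-name "HOM")

echo "glued=$glued separate=$separate"   # glued=3 separate=4
```

In the glued case GATK would receive "ReadPosRankSum < -8.0--genotype-filter-name" as a single expression, followed by a stray "HOM".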

    And I still get the same error:

    A USER ERROR has occurred: -12.5 is not a recognized option

    If I use 0 instead of -12.5 for MQRankSum and 0 instead of -8 for ReadPosRankSum, I get the following error instead:

    A USER ERROR has occurred: Illegal argument value: Positional arguments were provided ',<{2.0{<{30.0{>{3.0{>{60.0{<{40.0{<{0{<{0}' but no positional argument is defined for this tool.
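Both messages fit what GATK would report if each expression arrived split at its spaces: "-12.5" then begins with a dash and is parsed as an option name, while with the 0 thresholds the leftover "<" and number tokens become positional arguments. One possible cause, not confirmed in this thread, is a launcher between the shell and GATK (for instance a site-specific container wrapper such as the /opt/singularity/calls_gatk/gatk script used later in this thread) that forwards its arguments with unquoted $* instead of "$@". The sketch below reproduces that splitting in plain bash; show_args and splitting_launcher are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of the suspected failure mode: a launcher that forwards its
# arguments with unquoted $* re-splits them on whitespace.
# show_args and splitting_launcher are hypothetical names.

# Print each received argument on its own line, in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Hypothetical buggy launcher: unquoted $* undergoes word splitting.
splitting_launcher() { show_args $*; }

splitting_launcher --filter-expression "MQRankSum < -12.5"
# [--filter-expression]
# [MQRankSum]
# [<]
# [-12.5]    <- starts with "-", so an argument parser can take it for an option

splitting_launcher --filter-expression "QD < 2.0"
# [--filter-expression]
# [QD]
# [<]        <- "<" and "2.0" are left over as positional arguments
# [2.0]
```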

    Thanks again
    Best
    Chloé

  • Bhanu Gandham

    Hi Chloé Girard,

    Can you please share with me the reference you are using and the VCF file? I would like to recreate this error on my end to figure out why it is happening. Please follow the instructions provided here to share the bug report: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671

  • Chloé Girard

    Dear Bhanu,
    Thank you for your message.

    • The exact command line that you used when you had the problem (in a text file).
    #!/bin/bash

    #PBS -N Nselectvariants

    #PBS -q lowprio
    #PBS -j oe
    #PBS -l select=1:ncpus=10

    #export TMPDIR=/scratchlocal/$USER/temp
    #mkdir -p $TMPDIR

    ### Name variables
    sample1=7R

    /opt/singularity/calls_gatk/gatk VariantFiltration \
    -R "$WRKPATH/Genomes/TAIR10_chr_all.fasta" \
    -V "$JOBPATH/$sample1$addon1.vcf.gz" \
    -O "$JOBPATH/$sample1$addon1$addon2.vcf.gz" \
    --filter-name "QD2" \
    --filter-expression "QD < 2.0" \
    --filter-name "QUAL30" \
    --filter-expression "QUAL < 30.0" \
    --filter-name "SOR3" \
    --filter-expression "SOR > 3.0" \
    --filter-name "FS60" \
    --filter-expression "FS > 60.0" \
    --filter-name "MQ40" \
    --filter-expression "MQ < 40.0" \
    --filter-name "MQRS-12.5" \
    --filter-expression "MQRankSum < 0" \
    --filter-name "RPRS-8" \
    --filter-expression "ReadPosRankSum < 0" \
    --genotype-filter-name "HOM" \
    --genotype-filter-expression "isHomVar==1" \
    --genotype-filter-name "HET" \
    --genotype-filter-expression "isHet==1"
    • The full log output (program output in the console) from the start of the run to the end or error message (in a text file).
    node20
    Using GATK jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar VariantFiltration -V /home/chloe.girard/RIL_seq/7RV168_recalibrated.vcf.gz -filter QD < 2.0 --filter-name QD2 -filter QUAL < 30.0 --filter-name QUAL30 -filter SOR > 3.0 --filter-name SOR3 -filter FS > 60.0 --filter-name FS60 -filter MQ < 40.0 --filter-name MQ40 -filter MQRankSum < 0 --filter-name MQRankSum0 -filter ReadPosRankSum < 0 --filter-name ReadPosRankSum0 -O /home/chloe.girard/RIL_seq/7RV168_recalibrated_SNPs_filtered.vcf.gz
    USAGE: VariantFiltration [arguments]

    Filter variant calls based on INFO and/or FORMAT annotations.
    Version:4.1.9.0-SNAPSHOT


    Required Arguments:

    (The output goes on to list all required arguments, then:)

    ***********************************************************************

    A USER ERROR has occurred: Illegal argument value: Positional arguments were provided ',<{2.0{<{30.0{>{3.0{>{60.0{<{40.0{<{0{<{0}' but no positional argument is defined for this tool.

    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

    • A snippet of the BAM or VCF file if applicable and the index file associated with it.

    (Viewed with zcat piped to more:)

    ##fileformat=VCFv4.2
    ##ALT=<ID=NON_REF,Description="Represents any possible alternative allele not already represented at this location by REF and ALT">
    ##FILTER=<ID=LowQual,Description="Low quality">
    ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
    ##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
    ##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
    ##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
    ##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
    ##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
    ##GATKCommandLine=<ID=GenotypeGVCFs,CommandLine="GenotypeGVCFs --output /home/chloe.girard/RIL_seq/7RV168_recalibrated.vcf.gz --variant /home/chloe.girard/RIL_seq/7RV168_recalibrated.g.vcf.gz --reference /home/chloe.girard/Genomes/TAIR10_chr_all.fasta --include-non-variant-sites false --merge-input-intervals false --input-is-somatic false --tumor-lod-to-emit 3.5 --allele-fraction-error 0.001 --keep-combined-raw-annotations false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --genomicsdb-use-bcf-codec false --genomicsdb-shared-posixfs-optimizations false --only-output-calls-starting-in-intervals false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --disable-tool-default-annotations false --enable-all-annotations false --allow-old-rms-mapping-quality-annotation-data false",Version="4.1.9.0-SNAPSHOT",Date="April 29, 2021 2:36:27 PM GMT">
    ##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --emit-ref-confidence GVCF --output /home/chloe.girard/RIL_seq/7RV168_recalibrated.g.vcf.gz --input /home/chloe.girard/RIL_seq/bam-files/7RV168_recalibrated_sorted.bam --reference /home/chloe.girard/Genomes/TAIR10_chr_all.fasta --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 --gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --floor-blocks false --indel-size-to-eliminate-in-ref-model 10 --disable-optimizations false --just-determine-active-regions false --dont-genotype false --do-not-run-physical-phasing false --do-not-correct-overlapping-quality false --use-filtered-reads-for-annotations false --adaptive-pruning false --do-not-recover-dangling-branches false --recover-dangling-heads false --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --pruning-seeding-lod-threshold 9.210340371976184 --max-unpruned-variants 100 --linked-de-bruijn-graph false --disable-artificial-haplotype-recovery false --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --error-correction-log-odds -Infinity --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --min-base-quality-score 10 --smith-waterman JAVA --max-mnp-distance 0 --force-call-filtered-alleles false --allele-informative-reads-overlap-margin 2 --min-assembly-region-size 50 --max-assembly-region-size 300 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --assembly-region-padding 100 --padding-around-indels 75 --padding-around-snps 20 --padding-around-strs 75 --max-reads-per-alignment-start 50 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 7RV168
    1 346 . C T 440.06 . AC=2;AF=1.00;AN=2;DP=15;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=39.89;QD=29.34;SOR=0.818 GT:AD:DP:GQ:PL 1/1:0,15:15:45:454,45,0
    1 502 . T C 976.06 . AC=2;AF=1.00;AN=2;DP=22;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=41.73;QD=25.36;SOR=0.874 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,22:22:66:1|1:502_T_C:990,66,0:502
    1 508 . T C 1001.06 . AC=2;AF=1.00;AN=2;DP=23;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=41.74;QD=28.73;SOR=0.963 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,23:23:69:1|1:502_T_C:1015,69,0:502
    1 657 . C T 559.06 . AC=2;AF=1.00;AN=2;DP=20;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=38.27;QD=27.95;SOR=0.892 GT:AD:DP:GQ:PL 1/1:0,20:20:60:573,60,0
    1 698 . G A 914.06 . AC=2;AF=1.00;AN=2;DP=22;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=35.34;QD=30.97;SOR=0.784 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,21:21:63:1|1:698_G_A:928,63,0:698
    1 700 . AT A 914.03 . AC=2;AF=1.00;AN=2;DP=21;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=34.99;QD=27.24;SOR=0.784 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,21:21:63:1|1:698_G_A:928,63,0:698
    1 711 . T C 912.06 . AC=2;AF=1.00;AN=2;DP=21;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=34.88;QD=28.20;SOR=0.990 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,21:21:63:1|1:698_G_A:926,63,0:698
    1 730 . G A 546.06 . AC=2;AF=1.00;AN=2;DP=20;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=35.22;QD=27.30;SOR=1.127 GT:AD:DP:GQ:PL 1/1:0,20:20:60:560,60,0
    1 754 . C T 920.06 . AC=2;AF=1.00;AN=2;DP=21;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=35.57;QD=25.00;SOR=1.230 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,21:21:63:1|1:754_C_T:934,63,0:754
    1 755 . T C 920.06 . AC=2;AF=1.00;AN=2;DP=21;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=35.57;QD=29.56;SOR=1.230 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,21:21:63:1|1:754_C_T:934,63,0:754
    1 864 . C T 601.06 . AC=2;AF=1.00;AN=2;DP=22;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=39.71;QD=27.32;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,22:22:66:615,66,0

    • If you used a non-standard reference (i.e. not available in our resource bundle), we need the .fasta, .fai, and .dict files for the reference.

    I used TAIR10_chr_all.fas from

    https://www.arabidopsis.org/download/index-auto.jsp?dir=%2Fdownload_files%2FGenes%2FTAIR10_genome_release%2FTAIR10_chromosome_files

    Thank you for your help,
    Chloé

  • Bhanu Gandham

    Hi Chloé Girard

    I think you might have missed the last step: sending us the files. The instructions for sending them are in the article linked above.

  • gouret

    Hi everyone,

    I had the same kind of trouble. I just removed the blank characters (spaces) from the filtering expressions, and it works.

  • Genevieve Brandt (she/her)

    Thank you for posting your solution, gouret!
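A likely reason gouret's workaround helps, assuming the expressions were being re-split on whitespace somewhere between the shell and GATK (for example by a wrapper that forwards unquoted $*): with no spaces left in an expression, there is nothing to split on. This sketch compares that workaround with a wrapper that forwards "$@", which keeps quoted arguments intact whether or not they contain spaces. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: space-free expressions survive a whitespace-splitting launcher,
# while a launcher that forwards "$@" preserves any quoted argument.
# All function names here are hypothetical.
count_args() { echo "$#"; }

splitting_launcher()  { count_args $*; }    # unquoted: re-split on whitespace
preserving_launcher() { count_args "$@"; }  # quoted: each argument passed as-is

splitting_launcher  --filter-expression "MQRankSum < -12.5"   # prints 4
splitting_launcher  --filter-expression "MQRankSum<-12.5"     # prints 2
preserving_launcher --filter-expression "MQRankSum < -12.5"   # prints 2
```

Removing spaces only sidesteps the whitespace splitting. An unquoted $* also performs filename globbing, so an expression containing *, ? or [ could still be altered by such a wrapper; quoting "$@" in the wrapper, where you control it, avoids both problems.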

  • Chloé Girard

    Dear all,
    Sorry I didn't update sooner; the project got sidetracked.

    @gouret's solution (removing the spaces) worked for all the --filter-expression arguments.
    So the code now reads:

    gatk VariantFiltration \
    -R ~/Genomes/TAIR10_chr_all.fasta \
    -V ~/RIL_seq/essais/$sample1$addon1.vcf.gz \
    -O ~/RIL_seq/essais/$sample1$addon1$addon2$addon3.vcf.gz \
    --filter-name "QD2" \
    --filter-expression "QD<2.0" \
    --filter-name "QUAL30" \
    --filter-expression "QUAL<30.0" \
    --filter-name "SOR3" \
    --filter-expression "SOR>3.0" \
    --filter-name "FS60" \
    --filter-expression "FS>60.0" \
    --filter-name "MQ40" \
    --filter-expression "MQ<40.0" \
    --filter-name "MQRS-12.5" \
    --filter-expression "MQRankSum<-12.5" \
    --filter-name "RPRS-8" \
    --filter-expression "ReadPosRankSum<-8.0"

    And it works beautifully, adding the right filter name to the FILTER column of my VCF.
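To check how many records each site-level filter caught, a sketch that tallies the FILTER column (the 7th tab-separated field) of a plain-text VCF read from stdin. Per the VCF format, a record failing several filters lists them joined by semicolons (for example MQ40;QUAL30), so the sketch splits on ";" and counts each filter separately. It assumes standard tab-delimited VCF lines; tally_filters is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count how often each FILTER value occurs in a tab-delimited VCF
# read from stdin. Semicolon-joined values (e.g. "MQ40;QUAL30") are split so
# each filter is counted separately. tally_filters is a hypothetical name.
tally_filters() {
  awk -F'\t' '
    !/^#/ {
      n = split($7, f, ";")          # FILTER is the 7th column
      for (i = 1; i <= n; i++) count[f[i]]++
    }
    END { for (k in count) print k "\t" count[k] }' | sort
}

# Usage (file name is a placeholder):
#   zcat filtered.vcf.gz | tally_filters
```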

    I'm now working out how to flag homozygous and heterozygous calls with --genotype-filter-expression, which still does not work.

    Will keep everyone updated.

    Thanks for your help
    Chloé

  • Genevieve Brandt (she/her)

    Thanks for the update, Chloé Girard!
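On the genotype-level filters still under investigation in this thread: genotypes matched by --genotype-filter-expression are marked in the per-sample FT field of the FORMAT columns rather than in the record's FILTER column, so they will not show up in a site-level FILTER check. A sketch for inspecting FT, assuming a single-sample, tab-delimited VCF on stdin; show_genotype_filters is a hypothetical helper name, and it prints "." when a record carries no FT value.

```shell
#!/usr/bin/env bash
# Sketch: print CHROM, POS and the first sample's FT (genotype-level filter)
# value for each record of a single-sample, tab-delimited VCF on stdin.
# show_genotype_filters is a hypothetical helper name.
show_genotype_filters() {
  awk -F'\t' '
    !/^#/ {
      nkeys = split($9, keys, ":")   # FORMAT keys, e.g. GT:AD:FT
      split($10, vals, ":")          # matching values for the first sample
      ft = "."                       # default when no FT value is present
      for (i = 1; i <= nkeys; i++)
        if (keys[i] == "FT" && (i in vals)) ft = vals[i]
      print $1 "\t" $2 "\t" ft
    }'
}

# Usage (file name is a placeholder):
#   zcat filtered.vcf.gz | show_genotype_filters | head
```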
