Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Mutect2 forced calling via --alleles does not always generate an output

0

4 comments

  • Avatar
    Gökalp Çelik

    Hi M Carta

    The site that you are trying to genotype looks like it has a very high allelic fraction therefore it is a potential germline site with a very high GERMQ value. Germline sites by default are filtered and not emitted by Mutect2 so you may need to add the following parameter to your commandline to force call those alleles albeit they will be filtered by FilterMutectCalls command due to high probability of being germline. 

    --genotype-germline-sites <Boolean>    Call all apparent germline site even though they will ultimately be filtered.  Default
                                value: false. Possible values: {true, false}

    I hope this helps. 

    0
    Comment actions Permalink
  • Avatar
    M Carta

    Thank you Gökalp Çelik. I tried adding the option suggested by you, but I'm afraid that the output VCF is still empty:

    gatk Mutect2 \
    -R "hg38.fa" \
    --alleles "variant.vcf.gz" \
    --genotype-germline-sites TRUE \
    -L "variant.vcf.gz" \
    -I "input.bam" \
    -O "output.vcf"

    As a non-expert, to me the issue seems to primarily depend on "variant.vcf.gz", given that using a VCF with a certain format (the first one in my post above) allows an output to be generated, even if the variant I'm looking for isn't present at all.

    In fact, I just tried editing the functional "variant.vcf.gz" and entered random POS values; an output was produced.


    In case you're wondering why I'm doing this: I want to see whether there are any reads supporting certain somatic mutations of interest.

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi again. 

    For the sake of reproducing this issue I created the below vcf file for targets.

    ##fileformat=VCFv4.3
    #CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    sample
    chr17    7654101    .    A    G    .    .    .    .    .
    chr17    7654112    .    A    G    .    .    .    .    .

    The site at the top has a variant in my bam file whereas the site at the bottom is HOMREF. When Mutect2 is run with the following command line 

    gatk Mutect2 -R /mnt/E/Genomes/hg38/hg38v3.fasta -I sorted.bam -O gtalleles.vcf --alleles targets2.vcf.gz -L targets2.vcf.gz

    I get the below result

    #CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    sample
    chr17    7654101    .    A    G    .    .    AS_SB_TABLE=0,0|71,79;DP=155;ECNT=1;MBQ=0,37;MFRL=0,301;MMQ=60,60;MPOS=37;POPAF=7.30;TLOD=526.99    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:0,150:0.992:150:0,64:0,56:0,122:0,0,71,79

    But when I run the Mutect2 with the additional

    -ERC GVCF 

    I get the following output.

    #CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    sample
    chr17    7654101    .    A    G,<NON_REF>    .    .    AS_SB_TABLE=0,0|71,79|0,0;DP=155;ECNT=1;MBQ=0,37,0;MFRL=0,301,0;MMQ=60,60,60;MPOS=37,50;POPAF=7.30,7.30;TLOD=526.15,-1.794e+00    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1/2:0,150,0:0.984,7.893e-03:150:0,64,0:0,56,0:0,122,0:0,0,71,79
    chr17    7654112    .    A    <NON_REF>    .    .    END=7654112    GT:DP:MIN_DP:TLOD    0/0:161:161:-2.210e+00

    Looks like Mutect2 is not emitting sites when there is no evidence. Is this what you have been observing? 

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    M Carta I'm not sure this is the root of the issue but the lack of sequence dictionary lines eg 

    ##contig=<ID=chr2,length=242193529>

    in the header of your "manual" VCFs worries me a bit.  How about you try again after pasting in the sequence dictionary lines to the header (this may require re-indexing as well with IndexFeatureFile) and let us know how it goes?

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk