Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Huge difference in number of mutations when switching GATK versions

  • Gökalp Çelik

    Hi Dylan Hennessey

    There have been quite a few changes under the hood of the Mutect2 engine between version 4.0 and now, and Mutect2 has become more sensitive and smarter along the way. Applying Mutect2's filters will remove many of those false positives, so our suggestions are to:

    1- Use a matched normal for tumor calling

    2- Use a valid panel of normals (PoN), specifically the one we provide, to remove sequencing artifacts

    3- Use the contamination filter to account for potential cross-sample contamination
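
    The three suggestions correspond to specific arguments: the matched normal is passed to Mutect2 as a second -I together with -normal and the normal's sample name (the name GetSampleName reads from the BAM header), the PoN goes in through --panel-of-normals, and the CalculateContamination table reaches FilterMutectCalls through --contamination-table. Below is a minimal sketch of the Mutect2 part only. All file and sample names are hypothetical placeholders, and the PoN name 1000g_pon.hg38.vcf.gz is assumed to match the hg38 PoN in the GATK resource bundle, so check it against your own download. By default the script only prints the command.

```bash
#!/usr/bin/env bash
# Minimal sketch: assemble a Mutect2 command that applies suggestions 1 and 2.
# Every file name and sample name below is a hypothetical placeholder.
# By default the command is only printed; run with DRY_RUN=0 to execute it.
set -euo pipefail

build_mutect2_cmd() {
  local ref="$1" tumor_bam="$2" normal_bam="$3" normal_name="$4"
  local pon="$5" germline="$6" out="$7"
  # A bash array keeps each argument intact, even if a path contains spaces.
  MUTECT2_CMD=(
    gatk Mutect2
    -R "$ref"
    -I "$tumor_bam"
    -I "$normal_bam"
    -normal "$normal_name"       # 1- matched normal (sample name from the BAM header)
    --panel-of-normals "$pon"    # 2- PoN used to flag recurrent sequencing artifacts
    --germline-resource "$germline"
    -O "$out"
  )
}

build_mutect2_cmd Homo_sapiens_assembly38.fasta tumor_recal.bam normal_recal.bam \
  NORMAL_SAMPLE 1000g_pon.hg38.vcf.gz af-only-gnomad.hg38.vcf.gz unfiltered.vcf.gz

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  # Print one argument per line for inspection.
  printf '%s\n' "${MUTECT2_CMD[@]}"
else
  "${MUTECT2_CMD[@]}"
fi
```

    Collecting the arguments in a bash array rather than a single string variable keeps each path intact even if it contains spaces; run the script with DRY_RUN=0 to actually execute Mutect2.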

    If you would like to know more about Mutect2's filtering strategy, I recommend reading the following documentation:

    https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf 

    I hope this helps.

    Regards. 

  • Dylan Hennessey

    Thanks for your reply. These results are with a matched normal control and FilterMutectCalls, as described in the guide I linked. If it would help, I can post the exact commands used for one run. I didn't use a PoN because I thought it had to be sequenced the same way as our data, i.e., that we would basically have to make our own, but I'll retry with one.
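
    If a custom PoN does turn out to be necessary, GATK builds one in three steps: Mutect2 in tumor-only mode on each normal with --max-mnp-distance 0 (needed because GenomicsDBImport does not support MNPs), GenomicsDBImport to combine the per-normal VCFs, and CreateSomaticPanelOfNormals. The sketch below only prints those commands; the normal BAMs, interval list and output names are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the three-step somatic PoN workflow.
# Reference, intervals, germline resource and normal BAMs are hypothetical placeholders.
set -euo pipefail

REF=ref.fasta
INTERVALS=targets.interval_list
GERMLINE=af-only-gnomad.hg38.vcf.gz
NORMALS=(normal1.bam normal2.bam normal3.bam)

pon_commands() {
  local bam vcf
  local -a v_args=()
  # Step 1: tumor-only Mutect2 on each normal; --max-mnp-distance 0 because
  # GenomicsDBImport does not support MNPs.
  for bam in "${NORMALS[@]}"; do
    vcf="${bam%.bam}.vcf.gz"
    echo "gatk Mutect2 -R $REF -I $bam --max-mnp-distance 0 -O $vcf"
    v_args+=(-V "$vcf")
  done
  # Step 2: consolidate the per-normal VCFs into a GenomicsDB workspace.
  echo "gatk GenomicsDBImport -R $REF -L $INTERVALS --genomicsdb-workspace-path pon_db ${v_args[*]}"
  # Step 3: build the panel of normals from the workspace.
  echo "gatk CreateSomaticPanelOfNormals -R $REF --germline-resource $GERMLINE -V gendb://pon_db -O pon.vcf.gz"
}

pon_commands
```

    GATK's -L also accepts BED files, so a capture BED like the one used for calling could stand in for the interval list.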

    Edit: Actually, I'll just post how I'm running the new version anyway. The old version will be up soon.

    # Pileup summaries for the normal at common germline sites (input to CalculateContamination)
    gatk --java-options -Xmx4000m GetPileupSummaries \
    -I GATK_runs/S77_2_1EX_BUC/ApplyBQSR/S77_2_1EX_BUC_recal.bam \
    --interval-set-rule INTERSECTION \
    -L /home/dylan/ref_data/hg38/remapped_S07604624_Padded_primaryOnly.bed \
    -V ~/ref_data/hg38//small_exac_common_3.hg38.vcf.gz \
    -L ~/ref_data/hg38//small_exac_common_3.hg38.vcf.gz \
    -O GATK_runs/S077_2_2EX_OCT/CalculateContamination/normal_pileups.table &> GATK_runs/S077_2_2EX_OCT/CalculateContamination/out.log
    NORMAL_CMD="-matched GATK_runs/S077_2_2EX_OCT/CalculateContamination/normal_pileups.table"

    # Same pileup summaries for the tumor
    gatk --java-options -Xmx4000m GetPileupSummaries \
    -R ~/ref_data/hg38//Homo_sapiens_assembly38.fasta \
    -I GATK_runs/S077_2_2EX_OCT/ApplyBQSR/S077_2_2EX_OCT_recal.bam \
    --interval-set-rule INTERSECTION \
    -L /home/dylan/ref_data/hg38/remapped_S07604624_Padded_primaryOnly.bed \
    -V ~/ref_data/hg38//small_exac_common_3.hg38.vcf.gz \
    -L ~/ref_data/hg38//small_exac_common_3.hg38.vcf.gz \
    -O GATK_runs/S077_2_2EX_OCT/CalculateContamination/pileups.table &>> GATK_runs/S077_2_2EX_OCT/CalculateContamination/out.log

    # Estimate contamination (and tumor segmentation), using the normal pileups as the matched sample
    gatk CalculateContamination \
    -I GATK_runs/S077_2_2EX_OCT/CalculateContamination/pileups.table \
    -O GATK_runs/S077_2_2EX_OCT/CalculateContamination/con_tab.table \
    --tumor-segmentation GATK_runs/S077_2_2EX_OCT/CalculateContamination/seg_tab.table \
    -matched GATK_runs/S077_2_2EX_OCT/CalculateContamination/normal_pileups.table &>> GATK_runs/S077_2_2EX_OCT/CalculateContamination/out.log

    # Tumor sample name from the BAM header, used for Mutect2's -tumor argument
    gatk --java-options -Xmx4000m GetSampleName \
    -R ~/ref_data/hg38//Homo_sapiens_assembly38.fasta \
    -I GATK_runs/S077_2_2EX_OCT/ApplyBQSR/S077_2_2EX_OCT_recal.bam \
    -O GATK_runs/S077_2_2EX_OCT/M2/tumor_name.txt -encode
    tumor_command_line="-I GATK_runs/S077_2_2EX_OCT/ApplyBQSR/S077_2_2EX_OCT_recal.bam -tumor `cat GATK_runs/S077_2_2EX_OCT/M2/tumor_name.txt`"

    # Normal sample name from the BAM header, used for Mutect2's -normal argument
    gatk --java-options -Xmx4000m GetSampleName \
    -R ~/ref_data/hg38//Homo_sapiens_assembly38.fasta \
    -I GATK_runs/S77_2_1EX_BUC/ApplyBQSR/S77_2_1EX_BUC_recal.bam \
    -O GATK_runs/S077_2_2EX_OCT/M2/normal_name.txt -encode
    normal_command_line="-I GATK_runs/S77_2_1EX_BUC/ApplyBQSR/S77_2_1EX_BUC_recal.bam -normal `cat GATK_runs/S077_2_2EX_OCT/M2/normal_name.txt`"

    # Somatic calling, tumor vs. matched normal; also writes a bamout and F1R2 counts
    gatk --java-options -Xmx4000m Mutect2 \
    -R ~/ref_data/hg38//Homo_sapiens_assembly38.fasta \
    ${tumor_command_line} \
    ${normal_command_line} \
    --germline-resource ~/ref_data/hg38//af-only-gnomad.hg38.vcf.gz \
    -L /home/dylan/ref_data/hg38/remapped_S07604624_Padded_primaryOnly.bed \
    -O "GATK_runs/S077_2_2EX_OCT/M2/out.vcf" \
    --bam-output GATK_runs/S077_2_2EX_OCT/M2/out.bam \
    --f1r2-tar-gz GATK_runs/S077_2_2EX_OCT/M2/f1r2.tar.gz &> GATK_runs/S077_2_2EX_OCT/M2/out.log

    # Learn the read-orientation artifact model from the F1R2 counts
    gatk --java-options -Xmx4000m LearnReadOrientationModel \
    -I "GATK_runs/S077_2_2EX_OCT/M2/f1r2.tar.gz" \
    -O "GATK_runs/S077_2_2EX_OCT/LearnReadOrientationModel/art_tab.tsv.tar.gz" &> GATK_runs/S077_2_2EX_OCT/LearnReadOrientationModel/log.log

    # Filter the raw calls using the contamination table, segmentation and orientation priors
    gatk --java-options -Xmx4000m FilterMutectCalls \
    -V GATK_runs/S077_2_2EX_OCT/M2/out.vcf \
    -O GATK_runs/S077_2_2EX_OCT/Filter/S077_2_2EX_OCT.vcf \
    -R ~/ref_data/hg38//Homo_sapiens_assembly38.fasta \
    --contamination-table GATK_runs/S077_2_2EX_OCT/CalculateContamination/con_tab.table \
    --ob-priors GATK_runs/S077_2_2EX_OCT/LearnReadOrientationModel/art_tab.tsv.tar.gz \
    --tumor-segmentation GATK_runs/S077_2_2EX_OCT/CalculateContamination/seg_tab.table &> GATK_runs/S077_2_2EX_OCT/Filter/out.log
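
    Once the old-version run is available, one way to see where the extra calls come from is to compare the two filtered VCFs site by site. A minimal sketch, assuming tab-delimited VCFs with PASS in the FILTER column, and assuming identical CHROM:POS:REF:ALT keys mean the same call (this ignores differences in how the two versions might represent indels or multi-allelic sites); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Compare PASS calls from two VCFs (e.g. old vs. new GATK version) by site.
# Plain or bgzip-compressed input both work: gzip -dcf passes plain text through.
set -euo pipefail

pass_keys() {
  # One CHROM:POS:REF:ALT key per PASS record (column 7 is FILTER).
  gzip -dcf -- "$1" \
    | awk -F'\t' '!/^#/ && $7 == "PASS" { print $1 ":" $2 ":" $4 ":" $5 }' \
    | LC_ALL=C sort -u
}

compare_calls() {
  local old="$1" new="$2" shared old_only new_only
  # comm requires both inputs sorted in the same collation order (LC_ALL=C).
  shared=$(LC_ALL=C comm -12 <(pass_keys "$old") <(pass_keys "$new") | wc -l)
  old_only=$(LC_ALL=C comm -23 <(pass_keys "$old") <(pass_keys "$new") | wc -l)
  new_only=$(LC_ALL=C comm -13 <(pass_keys "$old") <(pass_keys "$new") | wc -l)
  printf 'shared\t%d\nold_only\t%d\nnew_only\t%d\n' "$shared" "$old_only" "$new_only"
}
```

    For example, compare_calls old_run/S077_2_2EX_OCT.vcf GATK_runs/S077_2_2EX_OCT/Filter/S077_2_2EX_OCT.vcf (the first path is hypothetical) prints how many PASS sites are shared, old-only and new-only; replacing wc -l with head in the new_only line would list example sites to inspect in the bamout.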

  • Gökalp Çelik

    Hi again.

    Your workflow looks fine. As I said, quite a few changes under the hood have made Mutect2 more sensitive, so seeing more variant calls is expected.
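
    Since FilterMutectCalls annotates the FILTER column rather than removing records, a tally of FILTER values shows how many calls pass and which filters catch the rest, which makes runs from different versions easier to compare. A minimal sketch with gzip and awk, assuming tab-delimited VCFs; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Tally FILTER values in a FilterMutectCalls output VCF, largest count first.
# Plain or bgzip-compressed input both work: gzip -dcf passes plain text through.
set -euo pipefail

filter_breakdown() {
  # Column 7 is FILTER; header lines start with '#'.
  # Multiple failed filters are joined with ';' and counted as one combination.
  gzip -dcf -- "$1" \
    | awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print n[f] "\t" f }' \
    | sort -k1,1nr -k2,2
}
```

    For example, filter_breakdown GATK_runs/S077_2_2EX_OCT/Filter/S077_2_2EX_OCT.vcf prints one count and FILTER value per line, so the PASS count sits next to whichever filters remove the remaining candidates.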
