Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 resulting variant numbers


  • Gökalp Çelik

    Hi 

    Mutect2 is a highly sensitive, ploidy-independent variant caller that will call every candidate variant it can, given the proper parameters. Since you are dealing with a whole-genome sample, the number of variants you get without any pre-filtering is quite plausible. Also, since you have not provided any intervals to call variants from, you will get many calls, especially from intergenic regions with many repeats and structural variation.
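A minimal sketch of restricting calling to regions of interest with Mutect2's `-L` argument, assuming you have a BED or Picard-style interval list (for example, exome targets). The helper name `call_with_intervals` and the file names in the usage comment are hypothetical; `-R`, `-I`, `-L` and `-O` are standard GATK arguments. With a single `-I` and no matched normal, Mutect2 runs in tumor-only mode, so adapt the inputs to your own setup.

```shell
# Hypothetical helper: run Mutect2 only over the supplied intervals.
# $1 = reference FASTA, $2 = input BAM, $3 = intervals (.bed or .interval_list), $4 = output VCF
call_with_intervals() {
    local ref="$1" bam="$2" intervals="$3" out="$4"
    # -L limits calling to the listed regions, which avoids many calls in
    # repetitive intergenic sequence when only, e.g., exons are of interest.
    gatk Mutect2 \
        -R "$ref" \
        -I "$bam" \
        -L "$intervals" \
        -O "$out"
}

# Example usage (placeholder file names):
# call_with_intervals GRCh38_latest_genomic.fna sample.bam targets.bed somatic_targets.vcf.gz
```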

    You also mentioned that you have not performed BQSR on your BAM files. Without it, systematic errors in the base quality scores (over- or under-estimated qualities) are not corrected, which can clutter all of the variant calls.
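BQSR is a two-step process: BaseRecalibrator models systematic base-quality errors while excluding positions listed in a known-variants resource, and ApplyBQSR writes a recalibrated BAM. This sketch assumes such a resource (e.g. dbSNP) whose contig names match your reference; `run_bqsr` and the `.recal.table` file naming are hypothetical choices.

```shell
# Hypothetical helper: two-step base quality score recalibration (BQSR).
# $1 = reference FASTA, $2 = input BAM, $3 = known-sites VCF (e.g. dbSNP), $4 = output BAM
run_bqsr() {
    local ref="$1" bam="$2" known="$3" out="$4"
    local table="${out%.bam}.recal.table"
    # Step 1: build the recalibration model, masking known variant sites.
    gatk BaseRecalibrator -R "$ref" -I "$bam" --known-sites "$known" -O "$table" || return 1
    # Step 2: rewrite base qualities into a new BAM using that model.
    gatk ApplyBQSR -R "$ref" -I "$bam" --bqsr-recal-file "$table" -O "$out"
}

# Example usage (placeholder file names):
# run_bqsr GRCh38_latest_genomic.fna sample.bam dbsnp.vcf.gz sample.recal.bam
```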

    Even if both samples are derived from the same cell line, you can never have a 100% pure, homogeneous population of cells in your samples, due to the nature of cell lines. One possible solution is to call variants from multiple technical and/or biological replicates and look for common patterns of errors and true variants.
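One way to look for common patterns across replicates is to intersect their PASS calls. This is a minimal sketch, assuming plain or gzipped VCFs and GNU gzip; sites are compared by exact CHROM:POS:REF:ALT, so differently normalized or split multi-allelic records will not match. `site_keys` and `shared_sites` are hypothetical helpers; `bcftools isec` is a more complete alternative.

```shell
# Hypothetical helper: print sorted CHROM:POS:REF:ALT keys of PASS records.
site_keys() {
    # gzip -cdf decompresses .gz input and passes plain text through unchanged.
    gzip -cdf "$1" \
        | awk -F'\t' '!/^#/ && $7 == "PASS" { print $1 ":" $2 ":" $4 ":" $5 }' \
        | LC_ALL=C sort -u
}

# Hypothetical helper: PASS sites present in both replicate call sets.
shared_sites() {
    local a b
    a="$(mktemp)"; b="$(mktemp)"
    site_keys "$1" > "$a"
    site_keys "$2" > "$b"
    # comm -12 keeps only lines common to both sorted lists.
    LC_ALL=C comm -12 "$a" "$b"
    rm -f "$a" "$b"
}
```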

    These are just the points that came to mind, but I will also ask the team whether they have any further suggestions.

    Regards. 

     

  • Tanya Sarkin Jain

    Thank you. Further, when I run FilterMutectCalls I see no changes. I am also running Mutect2 on the BQSR-corrected files, and although it is still running, based on the current numbers I don't expect the order of magnitude of the number of identified variants to change drastically. Below are my commands for the filtering:

     

    #!/usr/bin/env bash
    # Lines starting with #$ are instructions to the job scheduler (SGE).
    #$ -S /bin/bash       # the shell language when run via the job scheduler [IMPORTANT]
    #$ -cwd               # job should run in the current working directory
    #$ -j y               # STDERR and STDOUT should be joined
    #$ -R yes             # request a resource reservation for this job
    #$ -l h_rt=300:00:00  # job requires up to 300 hours of runtime
    #$ -r y               # if job crashes, it should be restarted

    # Apply FilterMutectCalls to the Mutect2 output VCF and write a filtered VCF.
    # This is currently for the non-recalibrated (no BQSR) files.

    module load CBI
    module load gatk

    input="somatic_mt_wo_bqsr.vcf.gz"
    output="somatic_mt_wo_bqsr_PE_filtered.vcf.gz"
    ref="GRCh38_latest_genomic.fna"

    # FilterMutectCalls annotates the FILTER column of each record; it does not
    # remove records. It expects the Mutect2 stats file next to the input VCF.
    gatk FilterMutectCalls -R "$ref" -V "$input" -O "$output"

     

    The stats for "somatic_mt_wo_bqsr_PE_filtered.vcf.gz" are the same as for the unfiltered input.
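FilterMutectCalls keeps every record and writes its decision into the FILTER column (PASS, or the names of the failed filters), so a plain record count will not change. A minimal sketch for checking that filtering took effect is to tally the FILTER column; `filter_tally` is a hypothetical helper that assumes GNU gzip (for `gzip -cdf`) and awk.

```shell
# Hypothetical helper: count records per FILTER value in a (plain or gzipped) VCF.
# Prints "count<TAB>filter" lines, most frequent first.
filter_tally() {
    gzip -cdf "$1" \
        | awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print n[f] "\t" f }' \
        | sort -k1,1nr -k2,2
}
```

For example, `filter_tally somatic_mt_wo_bqsr_PE_filtered.vcf.gz` should report the same total as the input, split between PASS and the individual filter names.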

  • Gökalp Çelik

    What are the sequencing depths of these samples?

    Depending on the depth of the samples, the sensitivity for certain variant calls will increase or decrease.
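Checking depth on the BAMs directly (for example with `samtools coverage`) is the most reliable route. As a rough proxy from the calls themselves, this sketch takes the median of the INFO `DP` field that Mutect2 writes; `median_info_dp` is a hypothetical helper, and depth at called sites is not the same as genome-wide coverage.

```shell
# Hypothetical helper: median of INFO/DP across records in a (plain or gzipped) VCF.
# Records without a DP entry are skipped; prints nothing if no DP values are found.
median_info_dp() {
    gzip -cdf "$1" \
        | awk -F'\t' '!/^#/ {
              n = split($8, kv, ";")
              for (i = 1; i <= n; i++)
                  if (kv[i] ~ /^DP=/) { print substr(kv[i], 4); break }
          }' \
        | sort -n \
        | awk '{ v[NR] = $1 } END {
              if (NR == 0) exit 1
              if (NR % 2) print v[(NR + 1) / 2]
              else print (v[NR / 2] + v[NR / 2 + 1]) / 2
          }'
}
```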

    Also, without proper filtering applied to those calls, it is too early to talk about their validity. We would suggest running thorough variant filtration using FilterMutectCalls, and possibly applying other thresholds you may think of depending on the results you get.

    Regards. 

  • Tanya Sarkin Jain

    Thank you for the reply. I'll have to check the depth. In my post above I was attempting to run FilterMutectCalls with the pasted commands; however, the number of records in the output VCF seems to be the same.

  • Gökalp Çelik

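    The number of records will be the same, because FilterMutectCalls annotates the FILTER column rather than removing records. This time, however, the filters have been applied, so you can select only the FILTER=PASS variants to reduce the numbers.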
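Two common ways to keep only PASS records are `bcftools view -f PASS` and GATK `SelectVariants --exclude-filtered`. For a dependency-free sketch, `pass_only` is a hypothetical helper (assuming GNU gzip and awk) that keeps all header lines and only the records whose FILTER column is exactly PASS.

```shell
# Hypothetical helper: emit the VCF header plus PASS records only.
# $1 = input VCF (plain or gzipped); output is plain-text VCF on stdout.
pass_only() {
    gzip -cdf "$1" | awk -F'\t' '/^#/ || $7 == "PASS"'
}

# Example usage with the file name from the filtering script:
# pass_only somatic_mt_wo_bqsr_PE_filtered.vcf.gz > somatic_mt_wo_bqsr_PE_pass.vcf
# grep -vc '^#' somatic_mt_wo_bqsr_PE_pass.vcf   # number of PASS records
```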
