Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

FAQ for Mutect2

3 comments

  • Robert Bremel

    Hi,

    I had a strange occurrence that looks like a glitch somewhere in the system. A week or so ago something cropped up with Funcotator that led me to install v1.7 of the data sources. I then ran a set of files through the workflow we have used for nearly a year and sent the results to a colleague to do her part of the process. She came back saying that a significant number (not all) of the 'Protein_Change' column items were missing. Thinking that something might be amiss with v1.7, I re-ran Funcotator with v1.6 and got the same results. So I went back to an unaligned .bam and re-ran everything, with exactly the same results. In perusing the output there seemed to be a pattern: Chr 1 seemed to be totally absent.

    This led me to check a bit further. I extracted the missense SNPs with an AS_FilterStatus of 'SITE' from the .maf and tallied four columns that seemed to be problematic: Transcript_Position, cDNA_Change, Codon_Change, and Protein_Change.

    Below are the counts per chromosome. Nrows is the number of SITEs on that chromosome; the other columns give the number of those rows in which the field is populated.

    Chr    Nrows  Transcript_Position  cDNA_Change  Codon_Change  Protein_Change
    chr1   44     0                    0            0             0
    chr2   14     0                    9            0             0
    chr3   20     0                    20           0             0
    chr4   5      0                    5            0             0
    chr5   13     0                    13           0             0
    chr6   12     0                    12           0             0
    chr7   17     0                    17           0             0
    chr8   7      0                    7            0             0
    chr9   16     0                    16           0             0
    chr10  10     0                    10           0             0
    chr11  15     0                    15           0             0
    chr12  12     0                    12           0             0
    chr13  9      0                    9            0             0
    chr14  5      0                    5            0             0
    chr15  14     9                    14           10            8
    chr16  15     15                   15           15            15
    chr17  12     12                   12           12            12
    chr18  2      2                    2            2             2
    chr19  22     22                   22           22            22
    chr20  4      4                    4            4             4
    chr21  5      5                    5            5             5
    chr22  13     13                   13           13            13
    chrX   14     14                   14           14            14

    As you can see, Chr1 has none of the four fields populated, Chr2 through Chr14 have at most cDNA_Change, Chr15 is only partially populated, and everything beyond that is okay.
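A minimal sketch of how a per-chromosome tally like the one above could be reproduced from a MAF file. The function name `tally_populated`, the `keep` filter, and the set of values treated as "empty" (including the `__UNKNOWN__` placeholder) are assumptions for illustration, not part of the original post; adjust them to match the actual file.

```python
import csv
from collections import Counter, defaultdict

# The four columns tallied in the post.
FIELDS = ["Transcript_Position", "cDNA_Change", "Codon_Change", "Protein_Change"]

# Assumption: values in this set count as "not populated".
EMPTY_VALUES = {"", "__UNKNOWN__"}


def tally_populated(maf_lines, fields=FIELDS, keep=None):
    """Return {chromosome: (n_rows, [populated count per field])}.

    maf_lines: iterable of text lines from a tab-delimited MAF.
    keep: optional predicate on a row dict, e.g. to select missense SITE rows.
    """
    # Skip '#' comment lines so the first remaining line is the column header.
    body = (line for line in maf_lines if not line.startswith("#"))
    reader = csv.DictReader(body, delimiter="\t", quoting=csv.QUOTE_NONE)
    n_rows = Counter()
    populated = defaultdict(Counter)
    for row in reader:
        if keep is not None and not keep(row):
            continue
        chrom = row["Chromosome"]
        n_rows[chrom] += 1
        for field in fields:
            if (row.get(field) or "").strip() not in EMPTY_VALUES:
                populated[chrom][field] += 1
    return {c: (n_rows[c], [populated[c][f] for f in fields]) for c in n_rows}
```

With a real file this might be called as `tally_populated(open(path), keep=lambda r: r.get("Variant_Classification") == "Missense_Mutation")`; the filter column used there is likewise an assumption about the MAF header.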

    I also did a manual cross-check of whether the 'ref_context' oligo could be found in the 'Annotation_Transcript'. The ref_context oligo was NOT found in the Annotation_Transcript nucleotide sequence (downloaded from Ensembl) in 194 of 300 SITEs. When it was found, it wasn't always at the 10 nt offset (sometimes it was). I did not check the Refseq_mRNA.
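A hedged sketch of how the ref_context cross-check could be automated. It searches a transcript sequence for the context oligo on both strands, since a genomic context string would only match a reverse-strand transcript after reverse complementing; whether that explains any of the misses here is an assumption, not something the post establishes. A context window that spans an exon boundary would also fail to match a spliced transcript sequence. The function names are hypothetical.

```python
# Translation table for complementing DNA, preserving case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_context(ref_context, transcript_seq):
    """Return {'+': [offsets], '-': [offsets]} of 0-based match positions.

    '+' searches for ref_context as given; '-' searches for its reverse
    complement. Overlapping matches are reported.
    """
    context = ref_context.upper()
    transcript = transcript_seq.upper()
    hits = {}
    for strand, query in (("+", context), ("-", reverse_complement(context))):
        offsets = []
        start = transcript.find(query)
        while start != -1:
            offsets.append(start)
            start = transcript.find(query, start + 1)
        hits[strand] = offsets
    return hits
```

An empty list on both strands means the oligo does not occur contiguously in the supplied sequence at all, which is the case the post counts as "not found".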

    I did a bit of manual cross-checking of the .vcf generated by Mutect2 and that seems okay, matching what is in the Funcotator .maf.

    I am not at all sure how to proceed. I can send the problem files.

    Notes:

    1) I am using the :latest Docker version of GATK, running on Docker Desktop on a high-end Windows 10 workstation with 128 GB RAM.

    2) I don't know whether there can be some kind of timing issue at Google. Funcotator connects to Google and there is a lot of traffic for an extended period on our slow DSL line in this rural area. I usually set it to run overnight. I don't know how it works, i.e. whether the entire Mutect2 .vcf is pushed to Google and then the output trickles back over time.

    :-) We use anonymal to anonymize the data (random adjective + random animal). This is not a goat sequence!

    ## GATKCommandLine=<ID=Funcotator,CommandLine="Funcotator --output mydata/GBM_00067_NiceGoat/analysis/GBM_00067-92007_DT_NiceGoat_mutect2_funcotator_hg38_1.7.maf --ref-version hg38 --data-sources-path mydata/dataSourcesFolder/funcotator_dataSources.v1.7.20200521s/ --output-file-format MAF --variant mydata/GBM_00067_NiceGoat/analysis/GBM_00067-92007_DT_NiceGoat_mutect2_filtered_hg38.vcf --reference mydata/refs/Homo_sapiens_assembly38.fasta --verbosity ERROR --remove-filtered-variants false --five-prime-flank-size 5000 --three-prime-flank-size 0 --force-b37-to-hg19-reference-contig-conversion false --transcript-selection-mode CANONICAL --lookahead-cache-bp 100000 --min-num-bases-for-segment-funcotation 150 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false",Version="4.2.0.0",Date="June 2, 2021 7:17:41 PM GMT

  • Youichi Naoe

    Dear GATK team,

    I ran GATK4 Mutect2 with "--tumor-lod-to-emit -10" and "--bam-output". When checking the BAM file, I noticed that some of the mismatches (variants) were not present in the ArtificialHaplotypeRG reads. As far as I know, the BAM file from "--bam-output" contains two types of reads: the non-ArtificialHaplotypeRG reads, which come from the raw read data, and the ArtificialHaplotypeRG reads, which summarize them. I cannot understand why some of the mutations in the non-ArtificialHaplotypeRG reads failed to make it into the ArtificialHaplotypeRG reads and were not recorded in the VCF file. I would appreciate it if you could tell me the reason, or the options that would let me include these dropped variants in the VCF.

    Best regards,

  • 杜鹏

    Hello, I run Mutect2 in 'Scatter Gather' mode, in parallel on separate chromosomes, but the results are not consistent with those obtained by running without this mode. How can I ensure consistent results when using 'Scatter Gather' mode?
    https://github.com/broadinstitute/gatk/issues/8152
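One way to make the inconsistency concrete before reporting it is to compare the two call sets site by site. The sketch below is an illustration only: it assumes both outputs are available as uncompressed VCF text, and the function names `vcf_sites` and `compare_runs` are hypothetical. It does not explain why the runs differ.

```python
def vcf_sites(lines):
    """Map (CHROM, POS, REF, ALT) to the FILTER value for each VCF record."""
    sites = {}
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt, _qual, flt = line.rstrip("\n").split("\t")[:7]
        sites[(chrom, int(pos), ref, alt)] = flt
    return sites


def compare_runs(single_lines, scattered_lines):
    """Report sites unique to each run and shared sites whose FILTER differs."""
    single = vcf_sites(single_lines)
    scattered = vcf_sites(scattered_lines)
    shared = single.keys() & scattered.keys()
    return {
        "only_single": sorted(single.keys() - scattered.keys()),
        "only_scattered": sorted(scattered.keys() - single.keys()),
        "filter_differs": sorted(k for k in shared if single[k] != scattered[k]),
    }
```

Listing the differing sites this way shows whether the discrepancy is in which variants are emitted or only in how they are filtered, which is useful detail to attach to the linked issue.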

