Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Funcotator and mutect2 coordinate system

7 comments

  • Genevieve Brandt (she/her)

    Robert Bremel could you supply the Mutect2 and Funcotator commands you used, as well as the version numbers?

  • Robert Bremel

    I am using the Docker image, version 4.1.8.1.

    The VCF file header contains:

    ##source=FilterMutectCalls
    ##source=Mutect2   (run with the default command)

    The Funcotator command was:

     Funcotator --output mydata/P58772/analysis/P58772_7_mutect2_funcotator_hg38.maf --ref-version hg38 --data-sources-path mydata/dataSourcesFolder/funcotator_dataSources.v1.6.20190124s/ --output-file-format MAF --variant mydata/P58772/analysis/P58772_7_mutect2_filtered_hg38.vcf --reference mydata/refs/Homo_sapiens_assembly38.fasta --verbosity ERROR --remove-filtered-variants false --five-prime-flank-size 5000 --three-prime-flank-size 0 --force-b37-to-hg19-reference-contig-conversion false --transcript-selection-mode CANONICAL --lookahead-cache-bp 100000 --min-num-bases-for-segment-funcotation 150 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false

     

     

  • Robert Bremel

    I combined the funcotator.maf files and mutect2.vcf files from 20 variant sets and tallied the differences.

    I have only been working at this for a couple of months, but the issue seems to arise in both the 4.1.7.1 and 4.1.8.1 Docker versions.

    The issue seems to be confined to deletions. For these, the 'POS' coordinate in the .vcf equals the 'Start_Position' minus 1 in the .maf; for all other variants the coordinates match. The affected variant classes are:

    In_Frame_Del,DEL
    5'Flank,DEL
    Intron,DEL
    RNA,DEL
    3'UTR,DEL
    IGR,DEL
    Splice_Site,DEL
    Frame_Shift_Del,DEL
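
A tally like the one described above could be sketched roughly as follows. This is only an illustration, not the script actually used: the function name `tally_offsets` is hypothetical, the MAF rows are assumed to be already parsed into dictionaries keyed by the standard MAF column names, and the VCF sites are assumed to be available as (chromosome, POS) pairs. The SNP row in the example is made up, and the 'In_Frame_Del' label on the chr5 deletion is an assumption.

```python
from collections import Counter


def tally_offsets(maf_rows, vcf_sites):
    """Count MAF rows by (Variant_Classification, Variant_Type, offset).

    offset is 0 when Start_Position equals a VCF POS on the same contig,
    1 when Start_Position - 1 equals a VCF POS, and None when neither does.
    An exact match takes priority over an off-by-one match.
    """
    sites = set(vcf_sites)
    tally = Counter()
    for row in maf_rows:
        chrom = row["Chromosome"]
        start = int(row["Start_Position"])
        if (chrom, start) in sites:
            offset = 0
        elif (chrom, start - 1) in sites:
            offset = 1
        else:
            offset = None
        key = (row["Variant_Classification"], row["Variant_Type"], offset)
        tally[key] += 1
    return tally


# Illustrative input: the chr5 deletion discussed in this thread plus a made-up SNP.
example_maf = [
    {"Chromosome": "chr5", "Start_Position": "68293751",
     "Variant_Classification": "In_Frame_Del", "Variant_Type": "DEL"},
    {"Chromosome": "chr1", "Start_Position": "1000",
     "Variant_Classification": "Missense_Mutation", "Variant_Type": "SNP"},
]
example_vcf_sites = [("chr5", 68293750), ("chr1", 1000)]

for key, count in sorted(tally_offsets(example_maf, example_vcf_sites).items(), key=str):
    print(key, count)
```

With real data, every DEL row would be expected to land in the offset-1 bucket and every other row in the offset-0 bucket, which matches the pattern in the list above.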

     

     

  • Genevieve Brandt (she/her)

    Robert Bremel is this issue present in older GATK versions?

  • Robert Bremel

    Sorry, I really don't know about anything prior to 4.1.7.1 (Docker only). I am relatively new to this whole area, having spent my career downstream in the protein world! :-)

    I only happened on it when one of my collaborators asked what had happened to a common mutation that had gone missing from a dataset. It happened when we joined the .maf to the .vcf to cross-check a couple of things. After convincing myself that I hadn't screwed up, I set about trying to figure out what had happened.

    Indels can really be a bear to 'proteinate' with only a single nucleotide coordinate, a couple of oligos and a half dozen protein variants.

    For example, with deletions, the 'Reference_Allele' and 'Tumor_allele' oligos, although they are the same, are often difficult to assign unambiguously to the proper reading frame. So when the coordinate is different it can really make a mess of things. Loading the sequences into Lasergene helps to figure things out most of the time.

    Actually, including some upstream and downstream context oligos in the Mutect2 output would really help. It seems that Mutect2 could do that unambiguously while it is running.

     

     

     

  • Genevieve Brandt (she/her)

    Robert Bremel thank you for the info, I'll look into this and keep you up to date when I have more information.

  • Genevieve Brandt (she/her)

    Robert Bremel I have heard back from the team and confirmed that this is not a bug. The annotation you are referring to for this mutation has a different meaning from the one you describe. The annotations are covered in the Funcotator documentation.

    This annotation (g.chr5:68293751_68293765delAAATTACATGAATAT) is field 12, genomeChange, which shows which bases were changed, not the position at which the variant is recorded. For this annotation, you are seeing the bases that were deleted in this mutation, 68293751-68293765, because those are the bases that were specifically changed. In a VCF, a deletion is recorded at the base immediately before the deleted bases, so POS is one less than the first deleted base. If you have the VCF output from Funcotator, you will see that the position matches the Mutect2 position (chr5   68293750).
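
To make the relationship concrete, here is a minimal sketch of how the deleted-base range follows from a VCF deletion record. It is not Funcotator's own code: `deletion_span` is a hypothetical helper, it assumes a simple deletion whose ALT is a prefix of REF, and the anchor base at chr5:68293750 is not given in this thread, so `N` stands in for it.

```python
def deletion_span(pos, ref, alt):
    """Return (start, end, deleted_bases) for a simple VCF deletion.

    Coordinates are 1-based and inclusive. The record is assumed to be
    left-anchored: ALT holds the retained leading base(s) and REF holds
    those same bases followed by the deleted ones.
    """
    if not (len(ref) > len(alt) and ref.startswith(alt)):
        raise ValueError("not a simple left-anchored deletion")
    start = pos + len(alt)      # first deleted base
    end = pos + len(ref) - 1    # last deleted base
    return start, end, ref[len(alt):]


# 'N' is a placeholder for the unknown anchor base at POS 68293750.
print(deletion_span(68293750, "NAAATTACATGAATAT", "N"))
# prints (68293751, 68293765, 'AAATTACATGAATAT')
```

The start of that range is what appears as 'Start_Position' in the .maf, which is why it is POS + 1 for deletions and equal to POS for SNPs.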

     
