Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 Questions

4 comments

  • David Benjamin

    Joseph Ong, thank you for organizing your questions so thoroughly and clearly. This is very helpful to us.

    There are a few good ways to find the sample name:

    (i) In the BAM header, the sample name is the SM field of the read group (@RG) line.  For example, the sample is "20" below:

    @RG ID:4 LB:lib1 PL:ILLUMINA SM:20 PU:unit1

    Note that a BAM may have multiple read groups, which may all share one sample name or may carry different sample names.

    (ii) There is a GATK tool GetSampleName.  You can run it as follows:

    gatk GetSampleName -I normal.bam -O normal_sample.txt

    (iii) If you give Mutect2 a bogus sample name it gives you an error message with the actual sample names.  For example,

    gatk Mutect2 -R ref.fasta -I tumor.bam -I normal.bam -normal bogus_name -O out.vcf

    gives an error like "Sample name bogus_name is not in the list of sample names [tumor_sample, normal_sample]".
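
Option (i) can be sketched in shell as follows. This is a minimal sketch, not part of GATK: the function name list_samples is made up for illustration, and it assumes the header text arrives on standard input with tab-separated fields, as in a real BAM header.

```shell
#!/bin/sh
# list_samples (hypothetical helper): read SAM header text on stdin and
# print each distinct SM (sample) value found on @RG lines, one per line.
list_samples() {
    awk -F '\t' '
        /^@RG/ {
            # Field 1 is the "@RG" tag itself; scan the rest for SM:
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) print substr($i, 4)
        }' | sort -u
}

# Demonstration on a hand-written header line (no BAM file needed).
printf '@RG\tID:4\tLB:lib1\tPL:ILLUMINA\tSM:20\tPU:unit1\n' | list_samples
```

With samtools installed, the same function can read a real header via `samtools view -H normal.bam | list_samples`. If the tumor and normal BAMs print the same name, Mutect2 cannot tell them apart.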

     

    Running Mutect2 without the -normal flag means Mutect2 runs in multi-sample tumor-only mode (any sample not explicitly stated to be normal is assumed to be a tumor sample).  Thus the 25,000 variants you found are sites where either 18CHa or 0CSa differ from the reference.

     

    Finally, although the --germline-resource option is technically optional, for all human calling you should use our af-only gnomAD VCF.  Depending on your reference, it is in one of the following public Google buckets:

    gs://gatk-best-practices/somatic-b37/af-only-gnomad.raw.sites.vcf

    gs://gatk-best-practices/somatic-hg38/af-only-gnomad.hg38.vcf.gz
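
As a rough illustration of how the resource might be wired into a call, the sketch below picks one of the two paths above by reference build and prints the resulting Mutect2 command instead of running it. The helper name germline_resource_for, the build labels "b37" and "hg38", and the file and sample names are all placeholders invented for this example.

```shell
#!/bin/sh
# germline_resource_for (hypothetical helper): print the af-only gnomAD
# path quoted in this thread for a given reference build label.
germline_resource_for() {
    case "$1" in
        b37)  echo "gs://gatk-best-practices/somatic-b37/af-only-gnomad.raw.sites.vcf" ;;
        hg38) echo "gs://gatk-best-practices/somatic-hg38/af-only-gnomad.hg38.vcf.gz" ;;
        *)    echo "unknown reference build: $1" >&2; return 1 ;;
    esac
}

# Print the command rather than executing it, so it can be checked first.
# ref.fasta, tumor.bam, normal.bam and normal_sample are placeholders.
resource=$(germline_resource_for hg38)
echo gatk Mutect2 -R ref.fasta -I tumor.bam -I normal.bam \
    -normal normal_sample --germline-resource "$resource" -O out.vcf
```

This only applies to human data; for other organisms there is no gnomAD resource to pass.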

  • Joseph Ong

    David Benjamin -- whatever they are paying you, it is not enough. Thank you very much for your help. 

    • Running Mutect2 without the -normal flag means Mutect2 runs in multi-sample tumor-only mode (any sample not explicitly stated to be normal is assumed to be a tumor sample).  Thus the 25,000 variants you found are sites where either 18CHa or 0CSa differ from the reference.

    I was wondering for a very long time why the number of somatic variants identified was so large (it was, in fact, the same as when I just did regular variant calling and not somatic variant calling). I knew I had ~25k variants between my genomes and the reference genome, so I figured something was wrong when I was seeing ~25k somatic variants.

    ---------------

    So if I interpreted everything correctly, the name for the BAM file (the normal in this case, 0CSa) was 20. So you are saying I should run the command like this:

    [jong2@crcfe02 ~/Private]$ gatk-4.1.7.0/gatk Mutect2 -R RedoReference/S288C_reference_sequence_R64-2-1_20150113.fa -I 18CHa_S288C_Groups_8May2020.bam -I 0CSa_S288C_Groups_8May2020.bam -normal [NAME HERE, which was 20 from the above example] -O 18CHa_Mutect2_0CSaS288CBase_8May2020.vcf

    Unfortunately, my normal and my tumor are both named "20".

    Would you suggest that I change the name of my BAM files via something like this or this so they all have unique names?

     

    Please pardon the questions. I'm self-learning, which has been rewarding but challenging.

    Thanks,

    Joseph

     

  • David Benjamin

    That's right, you will need the tumor and normal to have different sample names, and samtools reheader is a good way to do this.  If something goes wrong, please let us know, and good luck with the self-education.
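
One way the samtools reheader step might look is sketched below. The function names set_sample_name and rename_bam_sample are invented for this example, and the sketch assumes each BAM holds a single sample, so every @RG line receives the same new SM value. Only the header-rewriting part runs in the demonstration; the samtools wrapper is defined but not called.

```shell
#!/bin/sh
# set_sample_name NEW (hypothetical helper): read SAM header text on stdin
# and write it to stdout with the SM field of every @RG line set to NEW.
set_sample_name() {
    awk -v new="$1" 'BEGIN { FS = OFS = "\t" }
        /^@RG/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) $i = "SM:" new
        }
        { print }'
}

# rename_bam_sample IN.bam NEW OUT.bam (hypothetical helper): needs samtools
# on PATH. Dumps the header, rewrites SM, and writes a reheadered copy.
rename_bam_sample() {
    samtools view -H "$1" | set_sample_name "$2" > "$3.header.sam" &&
    samtools reheader "$3.header.sam" "$1" > "$3" &&
    rm -f "$3.header.sam"
}

# Demonstration on header text only (no BAM file needed).
printf '@RG\tID:4\tLB:lib1\tPL:ILLUMINA\tSM:20\tPU:unit1\n' | set_sample_name 0CSa
```

For the files in this thread, a call such as `rename_bam_sample 0CSa_S288C_Groups_8May2020.bam 0CSa 0CSa_renamed.bam` (output name chosen here for illustration) would give the normal a distinct name to pass to -normal. The renamed BAM needs a fresh index before running Mutect2.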

  • Joseph Ong

    Hi David Benjamin, I got it to work and my data looks reasonable! Thank you very much!! :)
