Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Somatic variant calling of WES using the GRCh38.p14 reference and mutect_resources.wdl

4 comments

  • Gökalp Çelik

    Hi D S

    The answer to your first question is to exclude those unlocalized contigs if you do not want reads to map anywhere but the primary contigs. However, those unlocalized contigs usually absorb some of the clutter from the primary contigs, so if you have concerns you may need to analyze your data both with and without them and check whether additional FP and TN calls accumulate in either case. The DRAGEN workflow normally has its own recommendations for the masked reference sequence.

    https://gatk.broadinstitute.org/hc/en-us/articles/17295731870235-Masked-reference-genomes 

    Common biallelic SNPs are used to calculate tumor segmentation and contamination, so we recommend using them.

    Cromwell options depend on the version as well as the server configuration, so there may not be a simple answer. Most of our workflows are already built in and ready to go on Terra; however, custom WDLs may need further testing for compatibility.

    I hope this helps. 

  • D S

    Hi Gökalp Çelik,

    Thank you so much for your answer! I think even the [preemptible, disks, cpu, memory] options are not recognized; the system just goes with the default settings, which still runs, but more slowly. May I ask two follow-up questions?

    I looked into the code of mutect_resources.wdl (gatk/scripts/mutect2_wdl/mutect_resources.wdl at master in the broadinstitute/gatk repository on GitHub).

    If this was used to generate the files in the best practices bundle, I suppose using it on gnomAD 4.1 would generate valid allele-frequency-only VCFs?

    Secondly, in the SelectCommonBiallelicSNPs function, there is a minimum_allele_frequency option. Is the value 0 recommended for this function?

    I am just thinking that since gnomAD 4.1 is so big, generating an AF-only VCF and a common biallelic SNP set from it would be more helpful.
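
    To make the two operations in this question concrete, here is a minimal Python sketch of what "AF-only" and "common biallelic SNP" selection mean at the level of individual VCF records. It is not the implementation in mutect_resources.wdl, which runs GATK tools. The helper names, the example records, and the strict "greater than" comparison against the threshold are all assumptions made for illustration.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AC=3;AF=0.01' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def is_common_biallelic_snp(record_line, minimum_allele_frequency):
    """Hypothetical filter: one ALT allele, single-base REF/ALT, AF above threshold."""
    fields = record_line.rstrip("\n").split("\t")
    ref, alt, info_field = fields[3], fields[4], fields[7]
    # Biallelic means exactly one ALT allele; SNP means single-base REF and ALT.
    if "," in alt or len(ref) != 1 or len(alt) != 1 or alt in {".", "*"}:
        return False
    try:
        af = float(parse_info(info_field)["AF"])
    except (KeyError, ValueError):
        # No usable AF annotation on this record.
        return False
    # Assumption: strict comparison, so a threshold of 0 keeps any AF > 0.
    return af > minimum_allele_frequency


def to_af_only(record_line):
    """Keep the eight fixed VCF columns and reduce INFO to the AF entry."""
    fields = record_line.rstrip("\n").split("\t")
    af = parse_info(fields[7]).get("AF")
    fields[7] = f"AF={af}" if af is not None else "."
    return "\t".join(fields[:8])


# Made-up records: a common SNP, a multiallelic site, an indel, a rare SNP.
records = [
    "chr1\t100\t.\tA\tG\t.\tPASS\tAC=5;AF=0.125;AN=40",
    "chr1\t200\t.\tA\tG,T\t.\tPASS\tAC=12,8;AF=0.3,0.2;AN=40",
    "chr1\t300\t.\tAT\tA\t.\tPASS\tAC=16;AF=0.4;AN=40",
    "chr1\t400\t.\tC\tT\t.\tPASS\tAC=1;AF=0.0001;AN=10000",
]
common = [to_af_only(r) for r in records if is_common_biallelic_snp(r, 0.05)]
```

    Under these assumptions, a threshold of 0.05 keeps only the first record, while a threshold of 0 would also keep the rare SNP at position 400, so the threshold controls how "common" the retained sites are.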

    Best,

  • Gökalp Çelik

    Hi again. 

    We do have our own resource files for such purposes; you may use them instead of creating your own. An AF-only gnomAD resource is available among them.

    https://storage.googleapis.com/gatk-best-practices/somatic-hg38/af-only-gnomad.hg38.vcf.gz 

    All other resource bundle files can be found at the following link:

    https://console.cloud.google.com/storage/browser/gatk-best-practices 

    For compatibility purposes you may need to readjust the header sections of these resource files and remove any non-applicable contigs from the variant contexts. We usually work with variants on the primary reference contigs, so anything outside of chr1-22, X, Y is usually not useful unless you have a specific purpose.
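
    The contig cleanup described above can be sketched in plain Python as follows. This is only an illustration on uncompressed VCF text, assuming "chr"-prefixed contig names; the function name and the example lines are hypothetical, and for real compressed, indexed resource files a dedicated VCF tool would be the more practical choice.

```python
import re

# Primary contigs named in the comment above: chr1-22, chrX, chrY.
PRIMARY_CONTIGS = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}

_CONTIG_HEADER = re.compile(r"^##contig=<ID=([^,>]+)")


def keep_primary_contigs(vcf_lines):
    """Yield only header and record lines that refer to primary contigs."""
    for line in vcf_lines:
        if line.startswith("##contig="):
            match = _CONTIG_HEADER.match(line)
            # Drop contig header entries for non-primary contigs.
            if match and match.group(1) in PRIMARY_CONTIGS:
                yield line
        elif line.startswith("#"):
            # All other header lines pass through unchanged.
            yield line
        else:
            # Variant record: the first tab-separated column is CHROM.
            chrom = line.split("\t", 1)[0]
            if chrom in PRIMARY_CONTIGS:
                yield line


# Made-up example lines, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=248956422>",
    "##contig=<ID=chrUn_KI270302v1,length=2274>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10067\t.\tT\tC\t.\tPASS\tAF=0.01",
    "chrUn_KI270302v1\t100\t.\tA\tG\t.\tPASS\tAF=0.2",
]
filtered = list(keep_primary_contigs(example))
```

    In this sketch both the `##contig` header entry and the variant record for the unplaced contig are dropped, so the header and the records stay consistent with each other.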

    I hope this helps. 

  • D S

    Dear Gökalp Çelik,

    Thank you so much for your reply and information. I understand the header adjustment issue, because the new dbSNP and GRCh38 use a different header.

    The AF-only file you provided uses gnomAD 2. I am just exploring the outcome of using gnomAD 4.1 and comparing it with gnomAD 2. I have been using other files from the best practices, such as the indels VCF and the PoN.

    And thank you for the suggestion of removing the non-applicable contigs.

    Hope you have a nice week. :D
