Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

CreateSomaticPanelOfNormals BETA tag

Answered

11 comments

  • Genevieve Brandt (she/her)

    Hi G E,

    I am going to move your post into our Community Discussions -> Documentation Questions topic, as the Somatic topic is for reporting bugs and issues with GATK.

    You can read more about our forum guidelines and the topics here: Forum Guidelines.

    Best,

    Genevieve

  • Genevieve Brandt (she/her)

    Hi G E,

    I confirmed with our developers that CreateSomaticPanelOfNormals is no longer in BETA; the results from the tool have been fully tested.

    The BETA tag does not affect Mutect2.

    Genevieve

  • Arijit Panda

    Could you please explain the meaning of the BETA value?

    ```
    chr1    33432   .       A       G       .       .       BETA=0.706,0.141;FRACTION=0.080

    ```

    What does the `BETA=0.706,0.141` value represent, and how is it derived?

  • David Benjamin

    Arijit Panda BETA holds the shape parameters of a beta distribution of artifact allele fractions. For example, in the line above we see that at position 33432 of chr1, a fraction 0.08 of all PoN samples exhibited an artifact, and when artifacts occurred their allele fractions followed a beta distribution with shape parameters (alpha = 0.706, beta = 0.141). (If you plot these particular parameters you'll see a wide spread from 0 to 1, meaning that allele fractions are all over the place for this artifact.)
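
For readers who want to inspect such a pair of shape parameters without plotting, here is a minimal Python sketch. It uses only the standard closed-form mean, variance, and density of a beta distribution; the function names `beta_summary` and `beta_pdf` are made up for this illustration and are not part of GATK.

```python
import math


def beta_summary(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)


def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at x, valid for 0 < x < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x))


# Shape parameters copied from the PoN record quoted in the thread.
alpha, beta = 0.706, 0.141
mean, std = beta_summary(alpha, beta)
print(f"mean={mean:.3f} std={std:.3f}")

# Evaluate the density at a few allele fractions to see its overall shape.
for x in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"pdf({x:.2f}) = {beta_pdf(x, alpha, beta):.3f}")
```

When both shape parameters are below 1, as here, the density rises toward both ends of the interval rather than peaking in the middle, which fits the description of allele fractions being spread widely.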

  • Arijit Panda

    Thanks for the response. There are some entries like:

     

    ```
    chr1    33521   .       T       A,*     .       .       BETA=1.00,1.00;FRACTION=1.00
    chr1    33530   .       CTT     C,CT    .       .       BETA=1.00,1.00;FRACTION=1.00
    chr1    94119   .       G       T,A     .       .       BETA=1.00,1.00;FRACTION=1.00
    ```

    I examined a few of these variants. Each is found in a few samples but not in all of them, so I am unsure how the fraction reached 1. Could the BETA information be relevant here?

  • David Benjamin

    It's possible that all samples had some sort of artifact at these positions, even if they were different from those in the VCF.  It's also important to remember that the tool ignores samples with germline variation (since the PoN is a blacklist of technical artifacts, which we don't wish to conflate with germline variation).  In this part of the genome where there's a lot of unmappability and the reference is less reliable it's quite possible that every sample differed from the reference and hence very few samples were left over as non-germline.

    In any case, I wouldn't worry too much because FilterMutectCalls does not use either of these tags.
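
The point about ignored germline samples can be shown with a toy calculation. This is only an arithmetic illustration of the reasoning above, not the tool's actual code: the function `artifact_fraction`, the three state labels, and the sample counts are all invented for the example.

```python
def artifact_fraction(sample_states):
    """Fraction of non-germline samples showing an artifact.

    sample_states is a list of 'artifact', 'clean', or 'germline' labels.
    Samples labelled 'germline' are dropped from both numerator and
    denominator (an assumption made for this illustration).
    """
    eligible = [state for state in sample_states if state != "germline"]
    if not eligible:
        return None  # no samples left to compute a fraction from
    return sum(state == "artifact" for state in eligible) / len(eligible)


# Hypothetical panel: 40 normals, 36 of them excluded as germline at this site.
states = ["germline"] * 36 + ["artifact"] * 4
print(artifact_fraction(states))
```

Under these assumed counts only four samples remain eligible, so an artifact seen in just four VCFs would still be reported as present in every eligible sample.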

  • David Benjamin

    Also, here is my public service announcement whenever questions about making a panel of normals come up:

    Using the panels of normals in our public Google bucket gs://gatk-best-practices/ is almost always superior to creating your own. Unless you are working with non-human data, or you have at least 100 normals and a very good reason, you are better off using one of our panels.

    If you do make your own panel, we recommend using our workflow either on Terra: https://app.terra.bio/#workspaces/help-gatk/Somatic-SNVs-Indels-GATK4/workflows/help-gatk/1-Mutect2_PON or by running our WDL script: https://github.com/broadinstitute/gatk/blob/master/scripts/mutect2_wdl/mutect2_pon.wdl.

  • Arijit Panda

    Thanks. I generated my own panel of normals from normal BAMs. CreateSomaticPanelOfNormals takes Mutect2 variants as input. If a variant is not present in a Mutect2 output VCF, how does the tool retrieve the variant details for the other samples? Am I missing anything?

    Can I manually filter PON variants solely based on coordinates, or does the FilterMutectCalls program apply additional logic in the filtering process?

  • David Benjamin

    I don't think I understand your first question. CreateSomaticPanelOfNormals analyzes all of the variants, or lack of variants, from all input VCFs at each site.

    FilterMutectCalls uses the PoN only as a coordinate-based blacklist.  However, you should not run FilterMutectCalls without a PoN even if you later do the PoN filtering manually.  This is because FilterMutectCalls refines several other artifact models iteratively and these models would be fit poorly without the benefit of PoN filtering.  For example, the somatic mutation rate would be greatly overestimated.
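
As a rough picture of what a coordinate-based blacklist means, the following Python sketch flags call records whose CHROM and POS match a PoN record. It is an assumption-laden illustration, not how FilterMutectCalls is implemented: the helper names are hypothetical, matching is on exact CHROM and POS only, and the two call records are invented. As noted above, it does not replace running FilterMutectCalls with a PoN.

```python
def load_pon_sites(pon_lines):
    """Collect (CHROM, POS) pairs from the data lines of a PoN VCF."""
    sites = set()
    for line in pon_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t")[:2]
        sites.add((chrom, int(pos)))
    return sites


def split_calls_by_pon(call_lines, pon_sites):
    """Split call records into (kept, flagged) by exact CHROM and POS match."""
    kept, flagged = [], []
    for line in call_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t")[:2]
        (flagged if (chrom, int(pos)) in pon_sites else kept).append(line)
    return kept, flagged


# In-memory demo; real VCF records are tab-separated.
pon_lines = [
    "##fileformat=VCFv4.2\n",
    "chr1\t33432\t.\tA\tG\t.\t.\tBETA=0.706,0.141;FRACTION=0.080\n",
]
call_lines = [  # hypothetical calls, invented for this example
    "chr1\t33432\t.\tA\tG\t.\tPASS\t.\n",
    "chr1\t50000\t.\tC\tT\t.\tPASS\t.\n",
]
kept, flagged = split_calls_by_pon(call_lines, load_pon_sites(pon_lines))
print(len(kept), "kept;", len(flagged), "flagged as PoN sites")
```

Both functions accept any iterable of lines, so a file opened with `gzip.open(path, "rt")` can be passed in place of the in-memory lists.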

  • Arijit Panda

    My first question was about the variants with FRACTION=1.

    I used the steps below to generate my merged PoN. Ref: https://gatk.broadinstitute.org/hc/en-us/articles/360037058172-CreateSomaticPanelOfNormals-BETA

    ```
    # Step 1
    gatk Mutect2 \
        -R reference.fa \
        -I normal1.bam \
        -tumor normal1_sample_name \
        --germline-resource af-only-gnomad.vcf.gz \
        -O normal1_for_pon.vcf.gz

    # Step 2
    gatk CreateSomaticPanelOfNormals \
        -vcfs normal1_for_pon_vcf.gz \
        -vcfs normal2_for_pon_vcf.gz \
        -vcfs normal3_for_pon_vcf.gz \
        -O pon.vcf.gz
    ```

    If a variant is reported with FRACTION=1, that should mean all samples have this artifact. When I checked the Mutect2 output files, however, I found no record of the variant in some samples. For instance, the variant chr1 94119 . G T,A is found in only 4 samples, not in all of them.

    Therefore, I am trying to understand how, if a variant is not present in a Mutect2 output VCF (e.g., normal1_for_pon_vcf.gz, normal2_for_pon_vcf.gz, etc.), the tool retrieves variant details for all samples and reports a fraction of 1. Am I missing anything?
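
A check like the one described here, looking for a site in each per-normal VCF, could be scripted roughly as follows. This is a sketch with hypothetical helper and sample names; it only reads the CHROM, POS, REF, and ALT columns of tab-separated VCF records and makes no claim about how CreateSomaticPanelOfNormals itself counts samples.

```python
def alleles_at_site(vcf_lines, chrom, pos):
    """Return the (REF, ALT) pairs of every record at chrom:pos."""
    found = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and fields[0] == chrom and int(fields[1]) == pos:
            found.append((fields[3], fields[4]))
    return found


def survey_site(named_vcfs, chrom, pos):
    """Map each sample name to the alleles it reports at chrom:pos."""
    return {name: alleles_at_site(lines, chrom, pos) for name, lines in named_vcfs.items()}


# In-memory stand-ins for two per-normal VCFs (contents invented for the demo).
named_vcfs = {
    "normal1": ["#CHROM\tPOS\tID\tREF\tALT\n", "chr1\t94119\t.\tG\tT,A\t.\t.\t.\n"],
    "normal2": ["#CHROM\tPOS\tID\tREF\tALT\n", "chr1\t33432\t.\tA\tG\t.\t.\t.\n"],
}
report = survey_site(named_vcfs, "chr1", 94119)
with_record = [name for name, alleles in report.items() if alleles]
print(f"{len(with_record)} of {len(report)} samples have a record at chr1:94119")
```

For real files, each value in `named_vcfs` can be a handle from `gzip.open(path, "rt")` instead of a list of strings.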

  • David Benjamin

    I looked into the code and it appears that CreateSomaticPanelOfNormals does this when it encounters a multiallelic site.  Although this behavior is incorrect, the FRACTION and BETA tags are never used.  Anyway, the next version of Mutect is coming out in a few months and it doesn't use a PoN.

