Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Mutect2 Shared Variant PASS in one tumor sample, but not other samples

0

11 comments

  • Avatar
    Gökalp Çelik

    Hi David Moline

    Can you share those 3 whole variant lines from their respective VCFs so that we can take a look at a bit more carefully?

    0
    Comment actions Permalink
  • Avatar
    David Moline

    Gökalp Çelik Thank you for responding! Here are the mutect2 VCF outputs for each of the three variants.

    I do want to note that this observation (variant calling discrepancy) is not exclusive to just this variant, but is occurring for many more variants as well (many of them germline in addition to other reasons for failing)

    In general, I'm confused on how to analyze what happens when a variant's filter PASS in one sample, but not others. Any guidance you have would be appreciated!

    Let me know if you need anything else!

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE_NAME
    Sample 1                  
    chr17 7667396 . G C . germline AS_FilterStatus=SITE;AS_SB_TABLE=28,24|70,61;DP=183;ECNT=1;GERMQ=1;MBQ=20,20;MFRL=177,178;MMQ=60,60;MPOS=37;POPAF=3.91;ROQ=93;TLOD=328.89 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:52,131:0.724:183:18,43:11,34:29,78:28,24,70,61
                       
    Sample 2                  
    chr17 7667396 . G C . PASS AS_FilterStatus=SITE;AS_SB_TABLE=2,2|78,68;DP=150;ECNT=1;GERMQ=23;MBQ=20,20;MFRL=141,176;MMQ=60,60;MPOS=41;POPAF=3.91;ROQ=93;TLOD=367.64 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:4,146:0.964:150:1,36:1,43:2,79:2,2,78,68
                       
    Sample 3                  
    chr17 7667396 . G C . germline AS_FilterStatus=SITE;AS_SB_TABLE=2,1|51,56;DP=113;ECNT=1;GERMQ=2;MBQ=41,20;MFRL=332,215;MMQ=60,60;MPOS=37;POPAF=3.91;ROQ=93;TLOD=304.42 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:3,107:0.949:110:2,34:1,31:3,73:2,1,51,56
    0
    Comment actions Permalink
  • Avatar
    David Moline

    Here is a more readable screenshot:

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi again.

    I do not see anything to differentiate between calls just from these so I have a few more questions. 

    Can you run these instances without any nfcore scripts? Just from the command line if you can. And you may limit the region around this variant (+ - 500bps on each end) to make your calls just to make sure that we are not hindered by any other stuff along the way. 

    0
    Comment actions Permalink
  • Avatar
    David Moline

    Hello Gökalp Çelik

    Here is a new screenshot using mutect2 via command line. The command I used is shown below (this basic script was repeated for each sample):


    gatk Mutect2 \
    --input /home/.../Moline_David_NEPC_Output_2-13-2024/preprocessing/mapped/TN21_S61/TN21_S61.sorted.bam \
    --output /home/.../mutect2/mutect2_NEPC_2-13-24/TN21_mutect2_2-13-2024.vcf.gz \
    --reference /home/.../igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta \
    --germline-resource /home/.../igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/af-only-gnomad.hg38.vcf.gz \
    --panel-of-normals /home/.../igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/1000g_pon.hg38.vcf.gz \

    Here is the screenshot of variant 7667396:

    Two things I noticed:

    • This variant calling run took several hours to complete and resulted in hundreds of thousands of variants. I think I'm missing an essential filtering step.
    • The FILTER column lists no filters parameter for any variant. 

    Let me know what your thoughts are and how I can improve my variant calling more! Thank you so much!

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi David Moline

    You need to use FilterMutectCalls tool to get your filters applied. Can you run that tool and check the variant states?

    Regards. 

    0
    Comment actions Permalink
  • Avatar
    David Moline

    Hello Gökalp Çelik

    Here are the results using FilterMutectCalls. I also ran the data using LearnReadOrientationModel, GetPileupSummaries, and Calculate Contamination. This script should be very close to what the nf-core/sarek Mutect2 process is (I tried my best to use the same inputs/programs according to the nf-core/sarek log files).

    Code (repeated for each of the three samples):

    gatk Mutect2 \
    --input /home/.../Desktop/NEPC/Moline_David_NEPC_Output_2-13-2024/preprocessing/mapped/TN21_S61/TN21_S61.sorted.bam \
    --reference /home/.../Desktop/igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta \
    --intervals /home/.../raw_data/WES_720G.30padded.V2.3.bed \
    --germline-resource /home/.../Desktop/igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/af-only-gnomad.hg38.vcf.gz \
    --panel-of-normals /home/.../Desktop/igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/1000g_pon.hg38.vcf.gz \
    --f1r2-tar-gz f1r2.tar.gz \
    --output unfiltered.vcf

    gatk LearnReadOrientationModel \
    --input f1r2.tar.gz \
    --output read-orientation-model-artifactprior.tar.gz

    gatk GetPileupSummaries \
    --input /home/.../Desktop/NEPC/Moline_David_NEPC_Output_2-13-2024/preprocessing/mapped/TN21_S61/TN21_S61.sorted.bam \
    --reference /home/.../Desktop/igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta \
    --variant /home/.../Desktop/igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/af-only-gnomad.hg38.vcf.gz \
    --intervals /home/.../raw_data/WES_720G.30padded.V2.3.bed \
    --output getpileupsummaries.table

    gatk CalculateContamination \
    --input getpileupsummaries.table \
    --tumor-segmentation segments.table \
    --output contamination.table

    gatk FilterMutectCalls \
    --variant unfiltered.vcf \
    --reference /home/.../Desktop/igenomes_GRCh38/references/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta \
    --tumor-segmentation segments.table \
    --contamination-table contamination.table \
    --orientation-bias-artifact-priors read-orientation-model-artifactprior.tar.gz \
    --output filtered.vcf


    Results:

    # Sample 1
    chr17    7667396    .    G    C    .    germline    AS_FilterStatus=SITE;AS_SB_TABLE=32,27|84,68;DP=211;ECNT=1;GERMQ=1;MBQ=20,20;MFRL=195,187;MMQ=60,60;MPOS=41;POPAF=3.91;ROQ=93;TLOD=386.66    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:59,152:0.732:211:22,52:11,39:33,92:32,27,84,68

    # Sample 2
    chr17    7667396    .    G    C    .    PASS    AS_FilterStatus=SITE;AS_SB_TABLE=2,2|89,78;DP=171;ECNT=1;GERMQ=24;MBQ=20,20;MFRL=141,176;MMQ=60,60;MPOS=39;POPAF=3.91;ROQ=93;TLOD=419.84    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:4,167:0.968:171:1,39:1,50:2,90:2,2,89,78

    # Sample 3
    chr17    7667396    .    G    C    .    germline    AS_FilterStatus=SITE;AS_SB_TABLE=7,1|81,72;DP=164;ECNT=1;GERMQ=1;MBQ=37,20;MFRL=332,228;MMQ=60,60;MPOS=40;POPAF=3.91;ROQ=93;TLOD=452.48    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:8,153:0.925:161:6,50:2,50:8,110:7,1,81,72

    Let me know what you think! Hope all is well.

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi David Moline

    It is good that the results you have without any external tools is consistent with that you had before. Something is seems to be different between these 3 samples in terms of that site or other similar sites. Is there another variant around this site in sample 2 vcf or sample1 and 3 vcfs? 

    Can you share IGV views of Bam files for this site and around if it's not a problem? 

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi again. 

    After a brief discussion with our team here is a possible situation. It looks like what Mutect2 and FilterMutectCalls doing is right here. Passing variant has a high allele frequency indicating that there could be a loss of heterozygosity due to copy number variation around that region, which is calculated by GetPileupSummaries and CalculateContamination tools, therefore raising the probability that the passing variant has higher GERMQ score which indicates the likelihood of not being a germline event. 

    I hope this helps. 

    0
    Comment actions Permalink
  • Avatar
    David Moline

    Hi Gökalp Çelik,

    So the results from GetPileupSummaries and CalculateContamination imply loss of heterozygosity potentially from copy number variation. Is there a way to visualize the copy number variation within a region around this variant using the Mutect2 data?

    Thank you for all your help!

    David

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi David Moline

    We have a Somatic CNV workflow that you can use to call and visualize somatic copy number variations. 

    We have workflows in wdls provided in the official GATK Repository.

    https://github.com/broadinstitute/gatk/tree/4.5.0.0/scripts/cnv_wdl/somatic 

    I hope this helps. 

     

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk