Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

ModelSegments - java.lang.IllegalArgumentException: Metadata of the allelic counts and the segments do not match

Answered

3 comments

  • Genevieve Brandt (she/her)

    Hi Tony Tan,

    Could you provide the CollectAllelicCounts commands that you ran? I noticed this caveat in the tutorial:

    For the matched-control analysis, the allelic count sites for the case and control must match exactly. Otherwise, ModelSegments, which takes the counts in the next step, will error. 

    Best,

    Genevieve
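
The caveat quoted above (case and control allelic count sites must match exactly) can be checked before running ModelSegments. The following is a minimal sketch, not part of GATK: the function names `sites` and `same_sites` are made up for illustration, and it assumes the allelic-counts TSVs have '@'-prefixed header lines, one column-header line starting with CONTIG, and contig and position in the first two columns.

```shell
# Sketch only: check that two allelic-counts TSVs list exactly the same sites.
# Assumed layout: '@'-prefixed header lines, one column-header line whose first
# field is CONTIG, then tab-separated rows with contig and position first.

# Print "contig<TAB>position" for every data row of one file.
sites() {
    awk -F '\t' '!/^@/ && $1 != "CONTIG" { print $1 "\t" $2 }' "$1"
}

# Exit status 0 if both files list the same sites in the same order, 1 if not.
same_sites() {
    a=$(mktemp) && b=$(mktemp) || return 2
    sites "$1" > "$a"
    sites "$2" > "$b"
    if cmp -s "$a" "$b"; then status=0; else status=1; fi
    rm -f "$a" "$b"
    return "$status"
}
```

With placeholder file names, `same_sites tumor.allelicCounts.tsv normal.allelicCounts.tsv && echo "sites match"` prints the message only when the two site lists are identical.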

  • Tony Tan

    Thanks, Genevieve.

    I noticed it could be because the denoisedCR file may be too sparse, since it was generated using a different interval list. Or does CollectReadCounts have to be run over the same intervals?

    For CollectAllelicCounts, I used:

    $GATK SelectVariants -R hg38/gatk.hg38.fasta -V hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz --select-type-to-include SNP --selectExpressions "AF > 0.1" -restrict-alleles-to BIALLELIC -O hg38/CNVsnplist.v2.1kG.af0.1.hg38.vcf.gz

    For both tumor and matched normal,
    $GATK CollectAllelicCounts -L hg38/CNVsnplist.v2.1kG.af0.1.hg38.vcf.gz -I ${SAMPLE}.recalib.cram -R hg38/gatk.hg38.fasta -O ${SAMPLE}.allelicCounts.tsv

    I can run ModelSegments without the denoisedCR file or without the allelicCounts.tsv, but the run that uses only allelic counts does not output any log2 ratio for the segments.

    Another question: it seems I can get copy number calls by following Part II of the tutorial (from CollectAllelicCounts onwards) and skipping the panel of normals creation entirely. In what context should we use the panel of normals? The added benefit is not clear to me, given that creating the panel is quite complicated, especially the selection of samples to include in it.

    Thank you!

  • Genevieve Brandt (she/her)

    Hi Tony Tan,

    I see, thank you so much for the follow-up information. It seems that there may be a reference mismatch between your denoised copy ratios and your allelic counts.

    ModelSegments groups together copy and allelic ratios that it determines are contiguous on the same segment.

    Could you take a closer look at the commands generating those files and verify that the references exactly match?

    You can skip the panel of normals creation step only when you already have a panel of normals; otherwise, you need to build one for your analysis. Here is an article describing this: https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON-

    Best,

    Genevieve
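
One way to follow the suggestion above and check whether the two inputs agree is to compare their file headers. This is a rough sketch under assumptions, not a GATK feature: it assumes both TSVs start with a SAM-style header whose @SQ lines describe the reference contigs and whose @RG line carries the sample name, and the function names `header_metadata` and `diff_header_metadata` are invented for illustration.

```shell
# Sketch only: compare the header metadata of two GATK copy-number TSVs.
# Assumption: each file begins with SAM-style header lines, including @SQ
# (reference contigs and lengths) and @RG (read group / sample name).

# Print the @SQ and @RG header lines of one file.
header_metadata() {
    awk '/^@SQ/ || /^@RG/' "$1"
}

# Print a diff of the two headers; exit status 0 means they are identical.
diff_header_metadata() {
    a=$(mktemp) && b=$(mktemp) || return 2
    header_metadata "$1" > "$a"
    header_metadata "$2" > "$b"
    if diff "$a" "$b"; then status=0; else status=1; fi
    rm -f "$a" "$b"
    return "$status"
}
```

For instance, `diff_header_metadata sample.denoisedCR.tsv sample.allelicCounts.tsv` (placeholder file names) prints nothing when the headers agree and otherwise shows the differing contig or sample lines.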
