ModelSegments - java.lang.IllegalArgumentException: Metadata of the allelic counts and the segments do not match
Hi, has anyone encountered this before?
I encountered this error when following the somatic CNV guideline on my WGS paired normal-tumor samples. No output was produced except hets.tsv and hets.normal.tsv. Any help appreciated, thank you!
CollectAllelicCounts was run on a 1000 Genomes hg38 high-confidence SNP VCF (generated with SelectVariants using AF > 0.1; I can't find the gnomAD WGS file in the resource bundle).
a) GATK version used: v4.2.5.0
b) Exact command used:
$GATK ModelSegments --denoised-copy-ratios T${SAMPLE}.denoisedCR.tsv --allelic-counts AlleleCount/T${SAMPLE}.allelicCounts.tsv --normal-allelic-counts AlleleCount/N${SAMPLE}.allelicCounts.tsv --output . --output-prefix ${SAMPLE}.cr
c) Entire program log:
java.lang.IllegalArgumentException: Metadata of the allelic counts and the segments do not match.
at org.broadinstitute.hellbender.utils.Utils.validateArg(
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(
at org.broadinstitute.hellbender.Main.runCommandLineProgram(
at org.broadinstitute.hellbender.Main.mainEntry(
at org.broadinstitute.hellbender.Main.main(
Hi Tony Tan,
Could you provide the CollectAllelicCounts commands that you ran? I noticed this caveat in the tutorial:
For the matched-control analysis, the allelic count sites for the case and control must match exactly. Otherwise, ModelSegments, which takes the counts in the next step, will error.
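In case it helps, here is a rough way to check that caveat from the shell. This is only a sketch: `list_sites` and `same_sites` are helper names I made up, and it assumes the allelic-counts files have the usual layout of `@`-prefixed header lines followed by a column row that starts with CONTIG and POSITION.

```shell
# Print "CONTIG<TAB>POSITION" for every data row of an allelic-counts TSV.
# Assumption: header lines start with '@' and the column row starts with CONTIG.
list_sites() {
    awk -F'\t' '!/^@/ && $1 != "CONTIG" { print $1 "\t" $2 }' "$1"
}

# Compare the site lists of two files (hypothetical helper, not part of GATK).
# Returns 0 and prints "sites match" only if both files list the same sites
# in the same order; the counts themselves are ignored.
same_sites() {
    a=$(mktemp) && b=$(mktemp) || return 2
    list_sites "$1" > "$a"
    list_sites "$2" > "$b"
    if cmp -s "$a" "$b"; then
        echo "sites match"
        status=0
    else
        echo "sites differ"
        status=1
    fi
    rm -f "$a" "$b"
    return "$status"
}
```

Running something like `same_sites AlleleCount/T${SAMPLE}.allelicCounts.tsv AlleleCount/N${SAMPLE}.allelicCounts.tsv` should then report whether the case and control sites line up.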
Thanks Genevieve
I noticed it could be because the denoisedCR file is too sparse, as it was generated using a different interval list. Or does CollectReadCounts have to be run over the same intervals?
For CollectAllelicCounts, I used
$GATK SelectVariants -R hg38/gatk.hg38.fasta -V hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz --select-type-to-include SNP --selectExpressions "AF > 0.1" -restrict-alleles-to BIALLELIC -O hg38/CNVsnplist.v2.1kG.af0.1.hg38.vcf.gz
For both tumor and matched normal,
$GATK CollectAllelicCounts -L hg38/CNVsnplist.v2.1kG.af0.1.hg38.vcf.gz -I ${SAMPLE}.recalib.cram -R hg38/gatk.hg38.fasta -O ${SAMPLE}.allelicCounts.tsv
I can run ModelSegments without the denoisedCR or without the allelicCounts.tsv, but the run using only allelicCounts does not output any log2 ratio for the segments.
Another question: it seems I can get copy number calls using Part II of the tutorial (from CollectAllelicCounts onwards) and skip the panel of normals creation etc. In what context should we use the panel of normals? The added benefit is not clear to me, given that creating a panel of normals is quite complicated, especially the selection of samples to include. Thank you!
Hi Tony Tan,
I see, thank you so much for the follow-up information. It seems that there may be a reference mismatch between your denoised copy ratios and your allelic counts.
ModelSegments groups together copy and allelic ratios that it determines are contiguous on the same segment.
Could you take a closer look at the commands generating those files and verify that the references exactly match?
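One quick way to look for a mismatch is to compare the headers of the two files. The sketch below is just that, a sketch: `header_summary` and `same_metadata` are made-up helper names, and it assumes (as I understand it) that the "metadata" in the error is the sample name and sequence dictionary stored in the `@RG` and `@SQ` header lines of each TSV.

```shell
# Summarize the header fields of a GATK CNV TSV that I would expect to matter:
# contig names and lengths from @SQ lines, and the sample name (SM) from @RG.
# Assumption: these are the fields the tool compares; other tags are ignored.
header_summary() {
    awk -F'\t' '
        /^@SQ/ {
            sn = ""; ln = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) sn = substr($i, 4)
                if ($i ~ /^LN:/) ln = substr($i, 4)
            }
            print "contig", sn, ln
        }
        /^@RG/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) print "sample", substr($i, 4)
        }
        !/^@/ { exit }   # stop at the first non-header line
    ' "$1"
}

# Compare the summaries of two files (hypothetical helper, not part of GATK).
same_metadata() {
    a=$(mktemp) && b=$(mktemp) || return 2
    header_summary "$1" > "$a"
    header_summary "$2" > "$b"
    if cmp -s "$a" "$b"; then
        echo "metadata match"
        status=0
    else
        echo "metadata differ"
        status=1
    fi
    rm -f "$a" "$b"
    return "$status"
}
```

For example, `same_metadata T${SAMPLE}.denoisedCR.tsv AlleleCount/T${SAMPLE}.allelicCounts.tsv` would flag a difference in contigs or sample name, and running `header_summary` on each file separately shows where they diverge.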
You can skip the panel of normals creation when you already have a panel of normals, but you need to build a panel of normals for your analysis. Here is an article describing this:
