GATK4.1.7.0 Germline CNV DetermineGermlineContigPloidy predicts a trisomy
Hello all,
I'm currently working on germline CNV calls for a set of patient exomes that will grow over time.
First: is it possible to integrate data obtained in case mode into the model (initially built in cohort mode), or do I have to reanalyze the whole cohort whenever new patients are added?
Second: I ran into a problem using the DetermineGermlineContigPloidy command.
Command used:
singularity exec -B /data,/home --no-home /data/singularity-cachedir/gatk_4.1.7.0.img bash -c "export MKL_NUM_THREADS=20 ; export OMP_NUM_THREADS=1 ; source activate gatk ; gatk DetermineGermlineContigPloidy -L '+IntervalList+' --interval-merging-rule OVERLAPPING_ONLY '+listReadCount+' --contig-ploidy-priors '+prior+' --output '+newpath+' --output-prefix ploidy --verbosity DEBUG"
It predicts a trisomy of chromosome 19 for every patient in the set (n = 109).
Warning returned:
WARNING gcnvkernel.structs.metadata - Sample has an anomalous ploidy (3) for contig chr19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
My contig_ploidie_prior.tsv file is configured as below for this chromosome:
CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3
chr19 0.01 0.01 0.97 0.01
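As a side check on a priors table like the one above, here is a minimal Python sketch that verifies each row's probabilities sum to 1. The function name `check_ploidy_priors` and the tolerance are illustrative assumptions and not part of GATK; only the column names come from the table above.

```python
import csv
import io
import math


def check_ploidy_priors(tsv_text, tol=1e-6):
    """Return (contig, total) pairs for rows whose priors do not sum to 1."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    bad_rows = []
    for row in reader:
        # Add up every PLOIDY_PRIOR_<n> column for this contig.
        total = sum(
            float(value)
            for key, value in row.items()
            if key.startswith("PLOIDY_PRIOR_")
        )
        if not math.isclose(total, 1.0, abs_tol=tol):
            bad_rows.append((row["CONTIG_NAME"], total))
    return bad_rows


# The chr19 row from the table above, tab-separated.
example = (
    "CONTIG_NAME\tPLOIDY_PRIOR_0\tPLOIDY_PRIOR_1\tPLOIDY_PRIOR_2\tPLOIDY_PRIOR_3\n"
    "chr19\t0.01\t0.01\t0.97\t0.01\n"
)
print(check_ploidy_priors(example))  # [] (the row sums to 1)
```

An empty list only means the table is internally consistent; it says nothing about whether the priors suit the cohort.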
Example DetermineGermlineContigPloidy output:
CONTIG PLOIDY PLOIDY_GQ
chr1 2 128.79120745530386
chr2 2 88.2662160411951
chr3 2 135.98102793862896
chr4 2 74.58305957110926
chr5 2 135.1374458535775
chr6 2 132.7949658308639
chr7 2 129.23786713195176
chr8 2 105.9052551152978
chr9 2 128.2833029180387
chr10 2 135.5963835481727
chr11 2 25.12729200787742
chr12 2 135.37154404583782
chr13 2 75.21306937168566
chr14 2 74.11104893998986
chr15 2 136.01506218812585
chr16 2 88.28107342226903
chr17 2 30.655464138374793
chr18 2 74.21185756472741
chr19 3 97.29096205907352
chr20 2 133.81486913998094
chr21 2 73.76096975622447
chr22 2 28.866822912513143
chrX 1 35.30133817356203
chrY 1 56.53608102572696
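To scan many samples for this kind of call, a small Python sketch like the following could flag autosomes whose inferred ploidy is not 2. It assumes whitespace-separated CONTIG / PLOIDY / PLOIDY_GQ rows as shown above, and skips any lines starting with "@" on the assumption that the real per-sample file carries such header lines. The function name and the sex/mitochondrial contig list are illustrative.

```python
def anomalous_autosomes(ploidy_text, expected=2, skip=("chrX", "chrY", "chrM")):
    """Return (contig, ploidy, gq) for autosomes whose ploidy differs from `expected`."""
    flagged = []
    for line in ploidy_text.splitlines():
        line = line.strip()
        # Skip blanks, assumed "@" header lines, and the column header.
        if not line or line.startswith("@") or line.startswith("CONTIG"):
            continue
        contig, ploidy, gq = line.split()
        if contig in skip:
            continue
        if int(ploidy) != expected:
            flagged.append((contig, int(ploidy), float(gq)))
    return flagged


# A few rows copied from the output above.
example = """CONTIG PLOIDY PLOIDY_GQ
chr18 2 74.21185756472741
chr19 3 97.29096205907352
chr20 2 133.81486913998094
chrX 1 35.30133817356203
"""
print([contig for contig, _, _ in anomalous_autosomes(example)])  # ['chr19']
```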
Since chromosome 19 is very gene-rich and this is exome data, its read count is high, which must strongly influence the ploidy prediction.
In this case, should we adjust the hyperparameters? If so, which ones do you recommend?
Does this ploidy prediction error directly affect the CNV calling in the next step?
Thanks in advance for your answer.
-
Hi Lithium22,
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.
For context, check out our support policy.
-
Thank you for your answer. OK, the first point is a general question about GATK usage, but the second point concerns an abnormal result from the tool. Do I need to create a new thread in the forum for the second problem?
-
Lithium22, the warning message suggests this is a limitation of the tool; to verify the ploidy, you would need to use orthogonal methods:
WARNING gcnvkernel.structs.metadata - Sample has an anomalous ploidy (3) for contig chr19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
We apologize for not being able to resolve this issue, but the message above is the extent of the support we can provide, as the tools have some limitations. However, if someone from the community knows more about this topic or has information that can help in this case, we encourage you to contribute here.
-
One way to prevent this false prediction is to filter out as few regions as possible on the other chromosomes. First, manually adjust the sizes of GC-rich intervals so they fall within the allowed limit, or remove the GC-content filter. Second, remove the segmental duplication filter if you are using one with your annotation file. Third, allow more regions to pass the mappability filter track if you are using one.
This may slightly increase the number of false-positive sites (these can be eliminated with a comparative approach within the cohort, so that only unique events remain), but your false contig ploidy inferences will be greatly reduced. Also try to increase the sample size.
-
Hello SkyWarrior,
Thank you very much for your response and your time.
My patient cohort comes from different sequencing technologies.
Therefore, I created an interval list based on the GRCh38 exonic positions.
I did not filter this interval file. Your advice to increase the number of patients in the cohort made it possible to obtain a correct ploidy prediction for chromosome 19.
However, the GQ associated with each prediction is rather low (between 10 and 20).
To get a correct prediction, I grouped the patients by technology.
Indeed, if I include them all together, the ploidy predictions are erroneous again.
-
Lithium22: How was contig_ploidie_prior.tsv made/obtained? Thank you.
-
Hello Dr N Ch,
Below is my contig_ploidy_prior.tsv file.
The proportions have been adjusted to my patient cohort, which is almost entirely male.
CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3
chr1 0.01 0.01 0.97 0.01
chr2 0.01 0.01 0.97 0.01
chr3 0.01 0.01 0.97 0.01
chr4 0.01 0.01 0.97 0.01
chr5 0.01 0.01 0.97 0.01
chr6 0.01 0.01 0.97 0.01
chr7 0.01 0.01 0.97 0.01
chr8 0.01 0.01 0.97 0.01
chr9 0.01 0.01 0.97 0.01
chr10 0.01 0.01 0.97 0.01
chr11 0.01 0.01 0.97 0.01
chr12 0.01 0.01 0.97 0.01
chr13 0.01 0.01 0.97 0.01
chr14 0.01 0.01 0.97 0.01
chr15 0.01 0.01 0.97 0.01
chr16 0.01 0.01 0.97 0.01
chr17 0.01 0.01 0.97 0.01
chr18 0.01 0.01 0.97 0.01
chr19 0.01 0.01 0.97 0.01
chr20 0.01 0.01 0.97 0.01
chr21 0.01 0.01 0.97 0.01
chr22 0.01 0.01 0.97 0.01
chrX 0.01 0.92 0.06 0.01
chrY 0.01 0.97 0.01 0.01
chrM 0.01 0.01 0.97 0.01
-
Thank you Lithium22!
Mine is a mixed population (and yes, human)! Could you let me know how the proportions are adjusted?
-
The proportions are adjusted for the X chromosome: since almost all of my samples are male, I don't expect diploidy for this chromosome.
If, for example, your population is 50% male and 50% female, your file should probably look like:
CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3
chrX 0.01 0.49 0.49 0.01
chrY 0.01 0.97 0.01 0.01
-
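The chrX adjustment described above can be sketched in Python as follows. This is only one possible reading of the approach: it keeps a fixed 0.01 for ploidy 0 and ploidy 3 and splits the remaining 0.98 between ploidy 1 and ploidy 2 according to the expected male fraction. The function name `chrx_priors` and the `floor` parameter are hypothetical, not part of GATK.

```python
def chrx_priors(male_fraction, floor=0.01):
    """Return [P(ploidy 0), P(1), P(2), P(3)] for chrX given the expected male fraction."""
    if not 0.0 <= male_fraction <= 1.0:
        raise ValueError("male_fraction must be between 0 and 1")
    # Probability mass left after reserving `floor` for ploidy 0 and ploidy 3.
    remaining = 1.0 - 2 * floor
    p_one_copy = round(remaining * male_fraction, 4)           # males: one X
    p_two_copies = round(remaining * (1.0 - male_fraction), 4)  # females: two X
    return [floor, p_one_copy, p_two_copies, floor]


print(chrx_priors(0.5))  # [0.01, 0.49, 0.49, 0.01]
```

With a male fraction of exactly 0 or 1 this sketch assigns zero prior to one state, which is probably too strict; the all-male example in this thread keeps a small value (0.06) on ploidy 2 instead.
-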
Hi, I've run into this issue recently.
Basically, I'm using 43 samples (we only have 43 WES samples captured with the same set of probes) to run DetermineGermlineContigPloidy in COHORT mode.
Several of the samples have an anomalous ploidy state for contigs 16, 17, and 19.
The prior ploidy probabilities are set to 0.01, 0.01, 0.97, and 0.01 for ploidy 0, 1, 2, and 3, respectively.
I checked the raw read counts and found that, although the ploidy-3 samples do have significantly more reads on chr16, chr17, and chr19, their total read counts across the whole genome are also significantly higher than those of the other samples, while the read count fraction for chr16, chr17, and chr19 remains similar across this batch of samples.
Still, DetermineGermlineContigPloidy assigns ploidy 3 to these samples.
Here are the columns of the TSV file recording these counts: BAM_ID read_count_chr16 read_count_total fraction_of_read_count_chr16
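For anyone who wants to repeat this comparison, here is a minimal Python sketch of the per-contig read fraction check. The counts are made-up toy numbers (not my data), and the function names and the 10% relative tolerance are arbitrary assumptions for illustration.

```python
from statistics import median


def contig_fractions(counts_by_sample, contig):
    """Fraction of each sample's total reads that fall on `contig`."""
    return {
        sample: counts[contig] / sum(counts.values())
        for sample, counts in counts_by_sample.items()
    }


def fraction_outliers(fractions, rel_tol=0.10):
    """Samples whose fraction deviates from the cohort median by more than rel_tol."""
    med = median(fractions.values())
    return [s for s, f in fractions.items() if abs(f - med) / med > rel_tol]


# Toy counts: sample_b is sequenced twice as deep as sample_a but has the
# same chr16 fraction; sample_c has a genuinely elevated chr16 fraction.
toy_counts = {
    "sample_a": {"chr16": 100, "chr17": 120, "other": 780},
    "sample_b": {"chr16": 200, "chr17": 240, "other": 1560},
    "sample_c": {"chr16": 150, "chr17": 118, "other": 732},
}
fractions = contig_fractions(toy_counts, "chr16")
print(fraction_outliers(fractions))  # ['sample_c']
```

In this toy case the deeper sample (sample_b) is not flagged, because depth alone does not change the fraction.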
Here is the command line I'm using. The GATK version is 4.2.6.1:
${gatk} DetermineGermlineContigPloidy \
-L ${probe_dir}/${probe}.cohort.gc.filtered.interval_list \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-099.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-100.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160066.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A130542.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A130540.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160134A.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160135A.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID15-018.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160353.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160354.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160788A.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160790B.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160792B.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-178.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-179.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-203A.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-204A.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-205A.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-212.counts.hdf5 \
-I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-213.counts.hdf5 \
--contig-ploidy-priors ${probe_dir}/contig_ploidy_priors_table.tsv \
-imr OVERLAPPING_ONLY \
--output ${ploidy_dir} \
--output-prefix ${ploidy_subdir_prefix} \
--verbosity DEBUG
Here is the error log:
18:23:00.742 WARNING gcnvkernel.structs.metadata - Sample A160066 has an anomalous karyotype ({'X': 2, 'Y': 1}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy
18:23:00.744 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A130542" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_3"...
18:23:00.744 WARNING gcnvkernel.structs.metadata - Sample A130542 has an anomalous ploidy (3) for contig chr16. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
18:23:00.744 WARNING gcnvkernel.structs.metadata - Sample A130542 has an anomalous ploidy (3) for contig chr17. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
18:23:00.744 WARNING gcnvkernel.structs.metadata - Sample A130542 has an anomalous ploidy (3) for contig chr19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
18:23:00.744 WARNING gcnvkernel.structs.metadata - Sample A130542 has an anomalous ploidy (3) for contig chr22. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
18:23:00.744 WARNING gcnvkernel.structs.metadata - Sample A130542 has an anomalous karyotype ({'X': 2, 'Y': 1}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy
18:23:00.746 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A130540" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_4"...
18:23:00.746 WARNING gcnvkernel.structs.metadata - Sample A130540 has an anomalous ploidy (3) for contig chr16. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
18:23:00.746 WARNING gcnvkernel.structs.metadata - Sample A130540 has an anomalous ploidy (3) for contig chr17. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
18:23:00.746 WARNING gcnvkernel.structs.metadata - Sample A130540 has an anomalous ploidy (3) for contig chr19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
18:23:00.746 WARNING gcnvkernel.structs.metadata - Sample A130540 has an anomalous ploidy (3) for contig chr22. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
18:23:00.747 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160134A" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_5"...
18:23:00.748 WARNING gcnvkernel.structs.metadata - Sample A160134A has an anomalous karyotype ({'X': 2, 'Y': 1}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy
18:23:00.749 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160135A" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_6"...
18:23:00.751 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID15-018" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_7"...
18:23:00.751 WARNING gcnvkernel.structs.metadata - Sample PID15-018 has an anomalous karyotype ({'X': 2, 'Y': 1}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploid
18:23:00.752 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160353" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_8"...
18:23:00.754 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160354" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_9"...
18:23:00.756 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160788A" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_10"...
18:23:00.757 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160790B" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_11"...
18:23:00.759 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160792B" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_12"...
18:23:00.760 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-178" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_13"...
18:23:00.762 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-179" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_14"...
18:23:00.764 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-203A" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_15"...
18:23:00.765 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-204A" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_16"...
18:23:00.767 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-205A" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_17"...
18:23:00.768 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-212" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_18"...
18:23:00.770 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-213" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_19"
-
Hi @Yangyxt
Can you check the capture metrics of your exome samples?
In particular, check the percentage of zero-coverage targets, the AT dropout rate, and the GC dropout rate.
Any obvious outlier samples will give you false trisomy results regardless of what you do. The only solution is to make a new capture with a clean, non-degraded sample. I have had this issue many times in the past, and there really is no better way to solve it.
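One rough way to spot such outliers is a median/MAD score on each capture metric across the cohort. The Python sketch below assumes you have already collected one value per sample for a metric (for example a GC dropout value from your capture QC tool). The sample names and values are invented for illustration, and the 3.5 cutoff is a common rule of thumb, not a GATK recommendation.

```python
from statistics import median


def mad_outliers(values_by_sample, threshold=3.5):
    """Return samples whose modified z-score (median/MAD based) exceeds `threshold`."""
    values = list(values_by_sample.values())
    med = median(values)
    # Median absolute deviation from the cohort median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread in the cohort, nothing to score
    return [
        sample
        for sample, value in values_by_sample.items()
        if 0.6745 * abs(value - med) / mad > threshold
    ]


# Invented GC dropout values; s5 is an obvious outlier.
gc_dropout = {"s1": 1.0, "s2": 1.2, "s3": 0.9, "s4": 1.1, "s5": 6.0}
print(mad_outliers(gc_dropout))  # ['s5']
```

Samples flagged this way are candidates for exclusion from the cohort or for recapture, as described above.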