Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK4.1.7.0 Germline CNV DetermineGermlineContigPloidy predicts a trisomy

Answered

12 comments

  • Genevieve Brandt (she/her)

    Hi Lithium22,

    The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.

    We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.

    For context, check out our support policy.

  • Lithium22
    Thank you for your answer.
    OK, the first point is a general question about GATK usage; however, the second point concerns an abnormal result from the tool.
    Do I need to create a new forum thread for the second problem?

  • Genevieve Brandt (she/her)

    Lithium22, the warning message suggests this is a limitation of the tool; to verify the ploidy, you would need to use orthogonal methods:

    WARNING gcnvkernel.structs.metadata - Sample has an anomalous ploidy (3) for contig chr19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.

    We apologize for not being able to resolve this issue, but the message above is the extent of the support we can provide, as the tools have some limitations. However, if someone from the community has more knowledge on this topic or has information that can help in this case, we encourage you to contribute here.

  • SkyWarrior

    One way to avoid this prediction is to filter out as few regions as possible on the other chromosomes. First, manually adjust GC-rich interval sizes to be within the allowed limit, or remove GC-content filtering. Second, remove the segmental duplication filter if you are using one in your annotation file. Third, allow more regions to pass the mappability filter track if you are using one.

    This may slightly increase the number of false-positive sites (these can be eliminated with a comparative approach within the cohort, so that only unique events remain), but your false contig ploidy inferences will be greatly reduced. Also, try to increase the sample size.

  • Lithium22

    Hello SkyWarrior,

    Thank you very much for your response and your time.

    My patient cohort comes from different sequencing technologies.
    Therefore, I created an interval list based on the GRCh38 exonic positions.
    I did not filter this interval file.

    Your advice to increase the number of patients in the cohort made it possible to obtain a correct ploidy prediction for chromosome 19.
    However, the GQ associated with each prediction is rather low (between 10 and 20).
    To get a correct prediction, I had to group the patients by technology.
    If I include them all together in the ploidy prediction, the results are erroneous again.
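
A minimal sketch of how low-confidence ploidy calls like these could be listed per sample. It assumes that each sample's `contig_ploidy.tsv` output has `@`-prefixed header lines followed by tab-separated `CONTIG`, `PLOIDY`, and `PLOIDY_GQ` columns; that layout, the function names, and the GQ cutoff of 20 are assumptions for illustration, not something stated in this thread.

```python
import csv


def read_contig_ploidy(text):
    """Parse the text of a per-sample ploidy call table (layout assumed, see above)."""
    # Skip empty lines and '@'-prefixed header lines (assumed SAM-style header).
    data_lines = [ln for ln in text.splitlines() if ln and not ln.startswith("@")]
    reader = csv.DictReader(data_lines, delimiter="\t")
    return [(row["CONTIG"], int(row["PLOIDY"]), float(row["PLOIDY_GQ"]))
            for row in reader]


def low_confidence(rows, min_gq=20.0):
    """Return the (contig, ploidy, gq) rows whose GQ is below min_gq (hypothetical cutoff)."""
    return [row for row in rows if row[2] < min_gq]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    demo = ("@RG\tID:GATKCopyNumber\tSM:demo\n"
            "CONTIG\tPLOIDY\tPLOIDY_GQ\n"
            "chr18\t2\t95.3\n"
            "chr19\t2\t14.2\n")
    for contig, ploidy, gq in low_confidence(read_contig_ploidy(demo)):
        print(contig, ploidy, gq)
```

Running this over each technology group separately would make it easier to compare GQ values between a grouped run and a combined run.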

  • Dr N Ch

    Lithium22: How was the contig_ploidie_prior.tsv file made/obtained? Thank you.

  • Lithium22

    Hello Dr N Ch,
    Below is my contig_ploidy_prior.tsv file.
    The proportions have been adjusted to my patient cohort, which is almost entirely male.

    CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3
    chr1 0.01 0.01 0.97 0.01
    chr2 0.01 0.01 0.97 0.01
    chr3 0.01 0.01 0.97 0.01
    chr4 0.01 0.01 0.97 0.01
    chr5 0.01 0.01 0.97 0.01
    chr6 0.01 0.01 0.97 0.01
    chr7 0.01 0.01 0.97 0.01
    chr8 0.01 0.01 0.97 0.01
    chr9 0.01 0.01 0.97 0.01
    chr10 0.01 0.01 0.97 0.01
    chr11 0.01 0.01 0.97 0.01
    chr12 0.01 0.01 0.97 0.01
    chr13 0.01 0.01 0.97 0.01
    chr14 0.01 0.01 0.97 0.01
    chr15 0.01 0.01 0.97 0.01
    chr16 0.01 0.01 0.97 0.01
    chr17 0.01 0.01 0.97 0.01
    chr18 0.01 0.01 0.97 0.01
    chr19 0.01 0.01 0.97 0.01
    chr20 0.01 0.01 0.97 0.01
    chr21 0.01 0.01 0.97 0.01
    chr22 0.01 0.01 0.97 0.01
    chrX 0.01 0.92 0.06 0.01
    chrY 0.01 0.97 0.01 0.01
    chrM 0.01 0.01 0.97 0.01

  • Pamela Bretscher

    Hi Lithium22,

    Thank you for providing your file! Dr N Ch, does this help answer your question?

  • Dr N Ch

    Thank you Lithium22!

    Mine is a mixed population, and yes, human! Can you let me know how the proportions are adjusted?

  • Lithium22

    The proportions are adjusted for the X chromosome: since my cohort is almost entirely male, I don't expect this chromosome to be diploid.
    If, for example, your population is 50% male and 50% female, your file should probably look like:

    CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3
    chrX 0.01 0.49 0.49 0.01
    chrY 0.01 0.97 0.01 0.01
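
A minimal sketch of how such a priors table could be generated from an expected male fraction. The function name, the `eps` floor of 0.01, and the tab-separated output are assumptions for illustration. Note that the chrY row here splits the prior between ploidy 0 (females) and ploidy 1 (males); that is an assumption of this sketch and differs from the example above, which keeps chrY at 0.97 for ploidy 1.

```python
AUTOSOMES = [f"chr{i}" for i in range(1, 23)]


def _row(contig, priors):
    """Format one table row: contig name followed by four prior values."""
    return "\t".join([contig] + [f"{p:.4f}" for p in priors])


def ploidy_priors_table(male_fraction, eps=0.01):
    """Return the text of a contig ploidy priors table for ploidy 0 to 3.

    male_fraction: expected fraction of males in the cohort (0 to 1).
    eps: small prior kept on every unexpected ploidy state (0.01 in the thread).
    """
    if not 0.0 <= male_fraction <= 1.0:
        raise ValueError("male_fraction must be between 0 and 1")
    free = 1.0 - 4.0 * eps                      # mass left after the eps floors
    p_male = eps + free * male_fraction         # chrX ploidy 1 / chrY ploidy 1
    p_female = eps + free * (1.0 - male_fraction)  # chrX ploidy 2 / chrY ploidy 0
    diploid = [eps, eps, 1.0 - 3.0 * eps, eps]

    header = ["CONTIG_NAME"] + [f"PLOIDY_PRIOR_{k}" for k in range(4)]
    lines = ["\t".join(header)]
    lines += [_row(contig, diploid) for contig in AUTOSOMES]
    lines.append(_row("chrX", [eps, p_male, p_female, eps]))
    lines.append(_row("chrY", [p_female, p_male, eps, eps]))  # assumption, see lead-in
    lines.append(_row("chrM", diploid))  # treated as diploid, as in the file above
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ploidy_priors_table(0.5), end="")
```

With `male_fraction=0.5`, the chrX row reproduces the 50/50 priors shown above (0.01, 0.49, 0.49, 0.01), and every row sums to 1.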

  • Yangyxt

    Hi, I ran into this issue recently.

    Basically, I'm using 43 samples (we only have 43 WES samples captured with the same set of probes) to run DetermineGermlineContigPloidy in COHORT mode.

    Several of the samples have an anomalous ploidy state for contigs 16, 17, and 19.

    The prior ploidy probabilities are set to 0.01, 0.01, 0.97, and 0.01 for ploidy 0, 1, 2, and 3, respectively.

    I checked the raw read counts and found that, although the samples called with ploidy 3 do have significantly more reads on chr16, chr17, and chr19, their total read counts across the whole genome are also significantly higher than those of the other samples, while the fraction of reads on chr16, chr17, and chr19 remains similar across this batch of samples.

    Still, DetermineGermlineContigPloidy calls polyploidy for these samples.

    Here is the TSV file recording: BAM_ID read_count_chr16 read_count_total fraction_of_read_count_chr16
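
The check described above can be sketched as follows: normalize each sample's per-contig read count by its own total, then compare that fraction with the cohort median. The function names and the numbers in the demo are hypothetical and only illustrate the arithmetic.

```python
from statistics import median


def contig_fractions(counts):
    """counts: {sample: {contig: read_count}} -> {sample: {contig: fraction of total}}."""
    fractions = {}
    for sample, per_contig in counts.items():
        total = sum(per_contig.values())
        fractions[sample] = {contig: n / total for contig, n in per_contig.items()}
    return fractions


def relative_to_cohort(fractions, contig):
    """Ratio of each sample's read fraction on `contig` to the cohort median fraction."""
    cohort_median = median(f[contig] for f in fractions.values())
    return {sample: f[contig] / cohort_median for sample, f in fractions.items()}


if __name__ == "__main__":
    # Hypothetical counts: S3 is sequenced twice as deep, but its chr16 share is unchanged.
    counts = {
        "S1": {"chr16": 100, "other": 900},
        "S2": {"chr16": 110, "other": 890},
        "S3": {"chr16": 200, "other": 1800},
    }
    ratios = relative_to_cohort(contig_fractions(counts), "chr16")
    for sample, ratio in sorted(ratios.items()):
        print(sample, round(ratio, 2))
```

A sample that is simply sequenced deeper keeps a ratio near 1, whereas a genuine extra copy of the contig would be expected to push the ratio well above 1.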

    Here is the command line I'm using. The GATK version is 4.2.6.1:

    ${gatk} DetermineGermlineContigPloidy \
            -L ${probe_dir}/${probe}.cohort.gc.filtered.interval_list \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-099.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-100.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160066.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A130542.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A130540.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160134A.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160135A.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID15-018.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160353.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160354.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160788A.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160790B.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/A160792B.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-178.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-179.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-203A.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-204A.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-205A.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-212.counts.hdf5 \
            -I /paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/PID16-213.counts.hdf5 \
            --contig-ploidy-priors ${probe_dir}/contig_ploidy_priors_table.tsv \
            -imr OVERLAPPING_ONLY \
            --output ${ploidy_dir} \
            --output-prefix ${ploidy_subdir_prefix} \
            --verbosity DEBUG

    Here is the error log:

    18:23:00.742 WARNING gcnvkernel.structs.metadata - Sample A160066 has an anomalous karyotype ({'X': 2, 'Y': 1}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy
    18:23:00.744 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A130542" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_3"...
    18:23:00.744 WARNING gcnvkernel.structs.metadata - Sample A130542 has an anomalous ploidy (3) for contig chr16. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
    18:23:00.744 WARNING gcnvkernel.structs.metadata - Sample A130542 has an anomalous ploidy (3) for contig chr17. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
    18:23:00.744 WARNING gcnvkernel.structs.metadata - Sample A130542 has an anomalous ploidy (3) for contig chr19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
    18:23:00.744 WARNING gcnvkernel.structs.metadata - Sample A130542 has an anomalous ploidy (3) for contig chr22. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
    18:23:00.744 WARNING gcnvkernel.structs.metadata - Sample A130542 has an anomalous karyotype ({'X': 2, 'Y': 1}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy
    18:23:00.746 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A130540" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_4"...
    18:23:00.746 WARNING gcnvkernel.structs.metadata - Sample A130540 has an anomalous ploidy (3) for contig chr16. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
    18:23:00.746 WARNING gcnvkernel.structs.metadata - Sample A130540 has an anomalous ploidy (3) for contig chr17. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
    18:23:00.746 WARNING gcnvkernel.structs.metadata - Sample A130540 has an anomalous ploidy (3) for contig chr19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
    18:23:00.746 WARNING gcnvkernel.structs.metadata - Sample A130540 has an anomalous ploidy (3) for contig chr22. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy d
    18:23:00.747 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160134A" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_5"...
    18:23:00.748 WARNING gcnvkernel.structs.metadata - Sample A160134A has an anomalous karyotype ({'X': 2, 'Y': 1}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy
    18:23:00.749 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160135A" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_6"...
    18:23:00.751 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID15-018" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_7"...
    18:23:00.751 WARNING gcnvkernel.structs.metadata - Sample PID15-018 has an anomalous karyotype ({'X': 2, 'Y': 1}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploid
    18:23:00.752 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160353" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_8"...
    18:23:00.754 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160354" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_9"...
    18:23:00.756 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160788A" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_10"...
    18:23:00.757 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160790B" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_11"...
    18:23:00.759 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "A160792B" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_12"...
    18:23:00.760 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-178" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_13"...
    18:23:00.762 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-179" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_14"...
    18:23:00.764 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-203A" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_15"...
    18:23:00.765 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-204A" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_16"...
    18:23:00.767 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-205A" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_17"...
    18:23:00.768 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-212" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_18"...
    18:23:00.770 INFO gcnvkernel.io.io_ploidy - Saving posteriors for sample "PID16-213" in "/paedyl01/disk1/yangyxt/wes/healthy_bams_for_CNV/using_xgenres_probe/xgenres_ploidy_model/xgenres_ploidy_normal_cohort-calls/SAMPLE_19"
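
A log like this can be condensed into a per-sample summary. The sketch below assumes the warning wording shown above ("Sample X has an anomalous ploidy (N) for contig C." and "Sample X has an anomalous karyotype ({...})"); the function name and regular expressions are illustrative, not part of GATK.

```python
import ast
import re
from collections import defaultdict

# Patterns follow the warning text as it appears in the log above.
PLOIDY_RE = re.compile(
    r"Sample (\S+) has an anomalous ploidy \((\d+)\) for contig (\S+)\.(?:\s|$)")
KARYOTYPE_RE = re.compile(r"Sample (\S+) has an anomalous karyotype \((\{.*?\})\)")


def summarize_warnings(log_text):
    """Return ({sample: {contig: ploidy}}, {sample: karyotype dict}) from a log."""
    ploidy = defaultdict(dict)
    karyotype = {}
    for line in log_text.splitlines():
        match = PLOIDY_RE.search(line)
        if match:
            sample, value, contig = match.groups()
            ploidy[sample][contig] = int(value)
            continue
        match = KARYOTYPE_RE.search(line)
        if match:
            # The karyotype is printed as a Python dict literal, e.g. {'X': 2, 'Y': 1}.
            karyotype[match.group(1)] = ast.literal_eval(match.group(2))
    return dict(ploidy), karyotype


if __name__ == "__main__":
    sample_log = (
        "18:23:00.744 WARNING gcnvkernel.structs.metadata - Sample A130542 has an "
        "anomalous ploidy (3) for contig chr16. The presence of unmasked PAR regions\n"
        "18:23:00.744 WARNING gcnvkernel.structs.metadata - Sample A130542 has an "
        "anomalous karyotype ({'X': 2, 'Y': 1}). The presence of unmasked PAR regions\n"
    )
    print(summarize_warnings(sample_log))
```

Grouping the warnings this way makes it easier to see whether the same few samples account for most of the anomalous contigs.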

     

  • SkyWarrior

    Hi @Yangyxt

    Can you check the capture metrics of your exome samples?

    In particular, check the zero-coverage target percentage, the AT dropout rate, and the GC dropout rate.

    Any obvious outlier samples will give you false trisomy results regardless of what you do. The only solution is to make a new capture with a clean, non-degraded sample. I have had this issue many times in the past, and there is really no better way to solve it.
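
One way to spot such outliers is a robust z-score per metric across the cohort. The sketch below uses the median and median absolute deviation (the "modified z-score" with the conventional 3.5 cutoff); the function name, the cutoff, and the metric values are assumptions for illustration, and the metrics could come from a tool such as Picard CollectHsMetrics, which the thread does not name.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """values: {sample: metric}. Return samples whose modified z-score exceeds threshold."""
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        # All values (or most of them) are identical; no robust scale to compare against.
        return []
    return sorted(sample for sample, v in values.items()
                  if 0.6745 * abs(v - center) / mad > threshold)


if __name__ == "__main__":
    # Hypothetical GC dropout values for five samples.
    gc_dropout = {"S1": 1.0, "S2": 1.2, "S3": 0.9, "S4": 1.1, "S5": 6.0}
    print(robust_outliers(gc_dropout))
```

The same function can be applied to each capture metric in turn; a sample flagged on several metrics is a candidate for exclusion or re-capture.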
