What is the contig ploidy priors table and how do I make it?
I am using GATK 4.1.2.0.
For germline copy number variation analysis, I have to run the GATK tool DetermineGermlineContigPloidy. I have high-coverage targeted sequencing data. How can I make the contig ploidy priors table that is required to run the tool?
-
Hi Muhammad
Thanks for your comments. I've used a pipe of "egrep" and "awk" on the SAM files and it worked very well. Also, "sed" worked well on the reference genome.
Best.
-
The recommended file for hg38 exomes is gs://gatk-sv-resources-public/gcnv-exome/contig_ploidy_prior_hg38.tsv. Each ploidy_prior column refers to a copy number of the contig of interest, and the value is the prior probability that the contig takes on that copy number in any given sample. The numbers in each row should sum to one. Mostly this file is a way to specify the sex chromosomes and the expected counts of X and Y for males and females. For humans, autosomes should be most likely to take on ploidy 2; zero, one, or three copies are also possible but unlikely. For chrX, ploidy 1 and ploidy 2 are equally likely, i.e. we don't make any assumptions about the sample's sex and often use this tool to determine it. I believe there do exist very rare samples with sex genotype XYY, but this priors table won't support that option because the prior probability of ploidy 2 on Y is zero.
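Since the rows are supposed to sum to one, it can be worth checking a hand-edited table before running the tool. Below is a minimal Python sketch of such a check, assuming the table is tab-separated with the contig name in the first column; the function name `check_priors` and the tolerance are my own choices, not anything provided by GATK.

```python
import csv
import io
import math


def check_priors(tsv_text, tol=1e-6):
    """Return (contig, row_sum) for every row whose priors do not sum to 1."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the CONTIG_NAME / PLOIDY_PRIOR_* header row
    bad = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        contig = row[0]
        total = sum(float(value) for value in row[1:])
        if not math.isclose(total, 1.0, abs_tol=tol):
            bad.append((contig, total))
    return bad


# Rows copied from the table posted in this thread, joined with tabs.
example = (
    "CONTIG_NAME\tPLOIDY_PRIOR_0\tPLOIDY_PRIOR_1\tPLOIDY_PRIOR_2\tPLOIDY_PRIOR_3\n"
    "chr1\t0.01\t0.01\t0.97\t0.01\n"
    "chrX\t0.01\t0.49\t0.49\t0.01\n"
    "chrY\t0.5\t0.5\t0\t0\n"
)
print(check_priors(example))  # an empty list means every row sums to 1
```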
-
Dr N Ch Here's the one I used. You can customize as per your needs.
CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3
chr1 0.01 0.01 0.97 0.01
chr2 0.01 0.01 0.97 0.01
chr3 0.01 0.01 0.97 0.01
chr4 0.01 0.01 0.97 0.01
chr5 0.01 0.01 0.97 0.01
chr6 0.01 0.01 0.97 0.01
chr7 0.01 0.01 0.97 0.01
chr8 0.01 0.01 0.97 0.01
chr9 0.01 0.01 0.97 0.01
chr10 0.01 0.01 0.97 0.01
chr11 0.01 0.01 0.97 0.01
chr12 0.01 0.01 0.97 0.01
chr13 0.01 0.01 0.97 0.01
chr14 0.01 0.01 0.97 0.01
chr15 0.01 0.01 0.97 0.01
chr16 0.01 0.01 0.97 0.01
chr17 0.01 0.01 0.97 0.01
chr18 0.01 0.01 0.97 0.01
chr19 0.01 0.01 0.97 0.01
chr20 0.01 0.01 0.97 0.01
chr21 0.01 0.01 0.97 0.01
chr22 0.01 0.01 0.97 0.01
chrX 0.01 0.49 0.49 0.01
chrY 0.5 0.5 0 0
chr11_JH159136v1_alt 0.01 0.01 0.97 0.01
chr11_JH159137v1_alt 0.01 0.01 0.97 0.01
chr6_GL000252v2_alt 0.01 0.01 0.97 0.01
-
Thank you.
-
Isadora Machado Ghilardi that's a Google Cloud path. You can access files like that with the `gcloud` utilities or, since that one's public, through the Google Cloud console: https://storage.googleapis.com/gatk-sv-resources-public/gcnv-exome/contig_ploidy_prior_hg38.tsv
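If you'd rather script it, here is a small Python sketch that turns a public gs:// path into the matching https://storage.googleapis.com/ link and downloads it with the standard library. The function names are mine, and this only works for objects that are publicly readable, as this one is.

```python
import urllib.request


def gs_to_https(gs_path):
    """Convert gs://bucket/object into its public HTTPS URL."""
    prefix = "gs://"
    if not gs_path.startswith(prefix):
        raise ValueError("expected a path starting with gs://")
    return "https://storage.googleapis.com/" + gs_path[len(prefix):]


def download_public_object(gs_path, local_name):
    """Fetch a publicly readable object to a local file (needs network access)."""
    urllib.request.urlretrieve(gs_to_https(gs_path), local_name)


url = gs_to_https(
    "gs://gatk-sv-resources-public/gcnv-exome/contig_ploidy_prior_hg38.tsv"
)
print(url)
# To actually download the file, uncomment the next line:
# download_public_object("gs://gatk-sv-resources-public/gcnv-exome/contig_ploidy_prior_hg38.tsv", "contig_ploidy_prior_hg38.tsv")
```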
-
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.
For context, check out our support policy.
-
Please see the tool index documentation for DetermineGermlineContigPloidy, which contains information on the priors table.
-
Hi
How can I make or get this ploidy TSV file?
"A TSV file specifying prior probabilities for each integer ploidy state and for each contig is required in this mode."
Please provide me with the required information.
-
Hi, thank you very much.
May I know on what criteria it was made, and how I can customize it? Thanks once again.
-
Dr N Ch Are you dealing with the human genome?
-
Hi Muhammad,
I'm working on the camel. The genome is assembled at chromosome level but, as always, there are a lot of contigs in the reference. When making the priors table, should I discard them from the SAM/BAM file? As you know, all of the contigs in the SAM/BAM files (in fact, in the read count files) should be present in the priors table. Thanks.
-
Dr N Ch Is the genome you're working on diploid? If so, just replace the contig names with your own contigs, except for chrX and chrY.
-
Mehdi Make the contig ploidy priors table to match your reference. Yes, all of the contigs in the SAM/BAM files (in fact, in the read count files) should be present in the priors table.
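One way to see which contigs are still missing from the priors table is to compare it against the @SQ lines of the SAM header (for example, the text printed by `samtools view -H`). This is only a rough Python sketch; the function names and the contig `scaffold_17` are made up for illustration.

```python
def sam_header_contigs(header_text):
    """Collect contig names from the SN: field of each @SQ header line."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def priors_contigs(tsv_text):
    """Return the first column of a priors table, skipping the header row."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    return [line.split("\t")[0] for line in lines[1:]]


def missing_from_priors(header_text, tsv_text):
    """List header contigs that have no row in the priors table."""
    listed = set(priors_contigs(tsv_text))
    return [name for name in sam_header_contigs(header_text) if name not in listed]


# Hypothetical header and table, just to show the call.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:scaffold_17\tLN:500\n"
)
priors = (
    "CONTIG_NAME\tPLOIDY_PRIOR_0\tPLOIDY_PRIOR_1\tPLOIDY_PRIOR_2\tPLOIDY_PRIOR_3\n"
    "chr1\t0.01\t0.01\t0.97\t0.01\n"
)
print(missing_from_priors(header, priors))  # ['scaffold_17']
```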
-
Thanks. My problem is that there are a lot of contigs and, as it seems, I should list all of them in the priors table. I'm going to remove the contigs with unknown position/chromosome from the SAM/BAM file. Do you think that is OK? Also, I'm going to replace the contig names that have a corresponding chromosome name (the reference only has contig names, but I have a table that maps contig names to chromosome names). Do you think that is OK, and can you recommend a tool or code chunk for it? Thanks.
-
Mehdi Sorry for the delayed response.
Your strategy seems fine to me. To rename chromosomes, I think you can simply use the 'sed' command in a Linux shell. Use 'man sed' to see the 'sed' options.
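If the list of names is long and 'sed' gets unwieldy, the same renaming can be sketched in Python using your contig-to-chromosome table as a dictionary. This is only an illustration for FASTA header lines; the function name and the contig names below are made up, and the BAM headers would need the same names applied separately so that they stay consistent with the reference.

```python
def rename_fasta_headers(lines, mapping):
    """Rename the sequence ID on each '>' line using mapping; keep other lines."""
    renamed = []
    for line in lines:
        if line.startswith(">"):
            # Split into the sequence ID and the optional description.
            parts = line[1:].rstrip("\n").split(None, 1)
            if parts and parts[0] in mapping:
                parts[0] = mapping[parts[0]]
            line = ">" + " ".join(parts) + "\n"
        renamed.append(line)
    return renamed


# Hypothetical contig names; IDs without an entry in the mapping are left as-is.
fasta = [">contig_0001 example description\n", "ACGT\n", ">unplaced_42\n", "GGCC\n"]
mapping = {"contig_0001": "chr1"}
for out_line in rename_fasta_headers(fasta, mapping):
    print(out_line, end="")
```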
-
Muhammad Shoaib Akhtar: Yes, I am working on the human genome! How can I make changes for my sample sets?
-
Dr N Ch I think you can use the same table I posted in the comments above.
Please use the chromosome names as they appear in your reference genome, e.g. it may be chr1 or 1. For all autosomal chromosomes, the row should be as follows:
chr1 0.01 0.01 0.97 0.01
For X chromosome,
chrX 0.01 0.49 0.49 0.01
For Y chromosome,
chrY 0.5 0.5 0 0
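If you have many contigs, you can also generate the file rather than typing it. Here is a minimal Python sketch that writes the same rows, tab-separated; the prior values are just the ones I posted above, and the function name and arguments are my own, so adjust both to your data.

```python
HEADER = ["CONTIG_NAME", "PLOIDY_PRIOR_0", "PLOIDY_PRIOR_1",
          "PLOIDY_PRIOR_2", "PLOIDY_PRIOR_3"]
AUTOSOME = ["0.01", "0.01", "0.97", "0.01"]  # diploid is most likely
CHR_X = ["0.01", "0.49", "0.49", "0.01"]     # one or two copies equally likely
CHR_Y = ["0.5", "0.5", "0", "0"]             # zero or one copy equally likely


def make_priors_table(autosomes, x_name=None, y_name=None):
    """Build the priors table as tab-separated text, one row per contig."""
    rows = [HEADER]
    rows += [[name] + AUTOSOME for name in autosomes]
    if x_name:
        rows.append([x_name] + CHR_X)
    if y_name:
        rows.append([y_name] + CHR_Y)
    return "\n".join("\t".join(row) for row in rows) + "\n"


# Contig names must match the reference, e.g. "chr1" versus "1".
human_autosomes = [f"chr{i}" for i in range(1, 23)]
table = make_priors_table(human_autosomes, "chrX", "chrY")
print(table, end="")
```

To save it, write `table` to a file, for example with `open("contig_ploidy_priors.tsv", "w")`; that file name is only a placeholder.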
-
Hi Muhammad,
I am trying out this tool for the first time. Can you please explain what the contig priors table is and what PLOIDY_PRIOR_0, 1, 2, and 3 mean? What are the numbers given, and how are they determined?
-
I'm looking for this document: gs://gatk-sv-resources-public/gcnv-exome/contig_ploidy_prior_hg38.tsv, to use in the DetermineGermlineContigPloidy step, but I can't find it.
-
CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3
chr1 0.01 0.01 0.97 0.01
chr2 0.01 0.01 0.97 0.01
chr3 0.01 0.01 0.97 0.01
chr4 0.01 0.01 0.97 0.01
chr5 0.01 0.01 0.97 0.01
chr6 0.01 0.01 0.97 0.01
chr7 0.01 0.01 0.97 0.01
chr8 0.01 0.01 0.97 0.01
chr9 0.01 0.01 0.97 0.01
chr10 0.01 0.01 0.97 0.01
chr11 0.01 0.01 0.97 0.01
chr12 0.01 0.01 0.97 0.01
chr13 0.01 0.01 0.97 0.01
chr14 0.01 0.01 0.97 0.01
chr15 0.01 0.01 0.97 0.01
chr16 0.01 0.01 0.97 0.01
chr17 0.01 0.01 0.97 0.01
chr18 0.01 0.01 0.97 0.01
chr19 0.01 0.01 0.97 0.01
chr20 0.01 0.01 0.97 0.01
chr21 0.01 0.01 0.97 0.01
chr22 0.01 0.01 0.97 0.01
chrX 0.01 0.49 0.49 0.01
chrY 0.5 0.5 0 0
sample file as given above