Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

What is the contig ploidy priors table and how do I make it?

15 comments

  • Genevieve Brandt (she/her)

    Hi Muhammad Shoaib Akhtar,

    The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.

    We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.

    For context, check out our support policy.

  • Pamela Bretscher

    Please see the tool index documentation for DetermineGermlineContigPloidy, which contains information on the priors table.

  • Dr N Ch

    Hi,

    How can I make or get this ploidy TSV file?

    "A TSV file specifying prior probabilities for each integer ploidy state and for each contig is required in this mode"

    Please provide the required information.

  • Muhammad Shoaib Akhtar

    Dr N Ch Here's the one I used. You can customize it to fit your needs.

    CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3
    chr1 0.01 0.01 0.97 0.01
    chr2 0.01 0.01 0.97 0.01
    chr3 0.01 0.01 0.97 0.01
    chr4 0.01 0.01 0.97 0.01
    chr5 0.01 0.01 0.97 0.01
    chr6 0.01 0.01 0.97 0.01
    chr7 0.01 0.01 0.97 0.01
    chr8 0.01 0.01 0.97 0.01
    chr9 0.01 0.01 0.97 0.01
    chr10 0.01 0.01 0.97 0.01
    chr11 0.01 0.01 0.97 0.01
    chr12 0.01 0.01 0.97 0.01
    chr13 0.01 0.01 0.97 0.01
    chr14 0.01 0.01 0.97 0.01
    chr15 0.01 0.01 0.97 0.01
    chr16 0.01 0.01 0.97 0.01
    chr17 0.01 0.01 0.97 0.01
    chr18 0.01 0.01 0.97 0.01
    chr19 0.01 0.01 0.97 0.01
    chr20 0.01 0.01 0.97 0.01
    chr21 0.01 0.01 0.97 0.01
    chr22 0.01 0.01 0.97 0.01
    chrX 0.01 0.49 0.49 0.01
    chrY 0.5 0.5 0 0
    chr11_JH159136v1_alt 0.01 0.01 0.97 0.01
    chr11_JH159137v1_alt 0.01 0.01 0.97 0.01
    chr6_GL000252v2_alt 0.01 0.01 0.97 0.01
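A table like the one above can also be generated rather than typed by hand. The following is a minimal sketch, assuming a diploid genome with sex contigs named chrX and chrY and reusing the prior values from this comment. The function name `build_priors_table` and its parameters are hypothetical, not part of GATK.

```python
import csv
import io

# Priors copied from the table in this thread; column i is the prior for ploidy i.
AUTOSOME = (0.01, 0.01, 0.97, 0.01)
CHR_X = (0.01, 0.49, 0.49, 0.01)
CHR_Y = (0.5, 0.5, 0.0, 0.0)


def build_priors_table(contigs, x_name="chrX", y_name="chrY"):
    """Return the priors table as tab-separated text, one row per contig.

    The sex-contig names are assumptions; pass the names used in your reference.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["CONTIG_NAME"] + [f"PLOIDY_PRIOR_{i}" for i in range(4)])
    seen = set()
    for name in contigs:
        if name in seen:  # write each contig only once
            continue
        seen.add(name)
        if name == x_name:
            priors = CHR_X
        elif name == y_name:
            priors = CHR_Y
        else:
            priors = AUTOSOME
        writer.writerow([name, *priors])
    return out.getvalue()


contigs = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
table = build_priors_table(contigs)
```

Writing `table` to a file gives a tab-separated version of the rows above. Other priors (for example, for a non-diploid organism) would need different tuples.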
  • Dr N Ch

    Hi, thank you very much.
    May I ask what criteria it is based on, and how I can customize it?

    Thanks once again.

  • Muhammad Shoaib Akhtar

    Dr N Ch Are you working with the human genome?

  • Mehdi

    Hi Muhammad,
    I'm working on the camel genome. It is assembled at the chromosome level, but as usual the reference also contains many unplaced contigs. When making the priors table, should I discard those contigs from the SAM/BAM files? As I understand it, every contig in the SAM/BAM files (in fact, in the read count files) must be listed in the priors table.

    Thanks.

  • Muhammad Shoaib Akhtar

    Dr N Ch Is the genome you're working on diploid? If so, just replace the contig names with your own contigs, except for chrX and chrY.

  • Muhammad Shoaib Akhtar

    Mehdi Make the contig ploidy priors table to match your reference. And yes, every contig in the SAM/BAM files (in fact, in the read count files) should be listed in the priors table.
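One way to check this requirement before running the tool is to compare the contig names found in the read counts against the first column of the priors table. This is a rough sketch, assuming the contig names have already been extracted into a list; `missing_from_priors` and the contig name `unplaced_scaffold_1` are hypothetical.

```python
def missing_from_priors(count_contigs, priors_text):
    """Return contigs present in the counts but absent from the priors table."""
    rows = [line.split("\t") for line in priors_text.splitlines() if line.strip()]
    table_contigs = {row[0] for row in rows[1:]}  # skip the header row
    return sorted(set(count_contigs) - table_contigs)


priors_text = (
    "CONTIG_NAME\tPLOIDY_PRIOR_0\tPLOIDY_PRIOR_1\tPLOIDY_PRIOR_2\tPLOIDY_PRIOR_3\n"
    "chr1\t0.01\t0.01\t0.97\t0.01\n"
    "chrX\t0.01\t0.49\t0.49\t0.01\n"
)
# "unplaced_scaffold_1" stands in for a contig that has no row in the table.
missing = missing_from_priors(["chr1", "chrX", "unplaced_scaffold_1"], priors_text)
```

Any name returned in `missing` needs either a row in the priors table or removal from the inputs.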

  • Mehdi

    Muhammad Shoaib Akhtar

    Thanks. My problem is that there are a lot of contigs and it seems I have to list all of them in the priors table. I'm going to remove the contigs with unknown position or chromosome from the SAM/BAM files. Do you think that is OK? Also, I'm going to replace the contig names that have a corresponding chromosome name (the reference only has contig names, but I have a table that maps contig names to chromosome names). Do you think that is OK, and can you recommend a tool or code chunk for it? Thanks.

  • Muhammad Shoaib Akhtar

    Mehdi Sorry for delayed response.

    Your strategy seems fine to me. To rename chromosomes, I think you can simply use the 'sed' command in a Linux shell. Run 'man sed' to see the 'sed' options.
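As an alternative to 'sed', the renaming can be driven directly by a contig-to-chromosome mapping table. The sketch below rewrites only FASTA header lines and leaves unmapped names unchanged. The function `rename_fasta_headers` and the names in `name_map` are hypothetical, and files derived from the reference (such as SAM/BAM headers) would need the same renaming to stay consistent.

```python
def rename_fasta_headers(lines, name_map):
    """Yield FASTA lines with the sequence ID in each header replaced via name_map."""
    for line in lines:
        if line.startswith(">"):
            # The sequence ID is the text between '>' and the first space.
            seq_id, sep, rest = line[1:].rstrip("\n").partition(" ")
            new_id = name_map.get(seq_id, seq_id)  # leave unmapped IDs unchanged
            yield f">{new_id}{sep}{rest}\n"
        else:
            yield line


name_map = {"contig_0001": "chr1"}  # hypothetical mapping table
fasta = [">contig_0001 some description\n", "ACGT\n", ">contig_0002\n", "GGCC\n"]
renamed = list(rename_fasta_headers(fasta, name_map))
```

Matching the whole sequence ID, rather than substituting a substring, avoids accidentally renaming one contig whose name is a prefix of another.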

  • Dr N Ch

    Muhammad Shoaib Akhtar: Yes, I am working on the human genome. How can I make changes to fit my sample sets?

  • Muhammad Shoaib Akhtar

    Dr N Ch I think you can use the same table I posted in the comments above.

    Use the chromosome names exactly as they appear in your reference genome, e.g., chr1 or 1.

    For all autosomal chromosomes:

    chr1 0.01 0.01 0.97 0.01

    For the X chromosome:

    chrX 0.01 0.49 0.49 0.01

    For the Y chromosome:

    chrY 0.5 0.5 0 0
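Each row lists prior probabilities over the ploidy states of one contig, so a reasonable sanity check after customizing the table is that no value is negative and each row sums to 1. This is a minimal sketch under that assumption; `check_priors` is a hypothetical helper, and whether the tool itself enforces this is not stated in this thread.

```python
import math


def check_priors(priors_text, tol=1e-6):
    """Return names of contigs whose priors are negative or do not sum to 1."""
    bad = []
    lines = [line for line in priors_text.splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row
        name, *values = line.split("\t")
        probs = [float(v) for v in values]
        if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=tol):
            bad.append(name)
    return bad


priors_text = (
    "CONTIG_NAME\tPLOIDY_PRIOR_0\tPLOIDY_PRIOR_1\tPLOIDY_PRIOR_2\tPLOIDY_PRIOR_3\n"
    "chr1\t0.01\t0.01\t0.97\t0.01\n"
    "chrX\t0.01\t0.49\t0.49\t0.01\n"
    "chrY\t0.5\t0.5\t0\t0\n"
)
bad_rows = check_priors(priors_text)
```

With the three rows from this comment, `bad_rows` comes back empty.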
  • Dr N Ch

    Thank you.

  • Mehdi

    Hi Muhammad

    Thanks for your comments. I used a pipeline of "egrep" and "awk" on the SAM files and it worked very well. Also, "sed" worked well on the reference genome.

    Best.
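The exact "egrep" and "awk" commands are not given in this thread. As an illustration only, the same kind of filtering on a text SAM file could be sketched as follows: keep header lines except @SQ entries for dropped contigs, and keep alignments whose reference (and mate reference, if set) is in the retained set. `filter_sam` and the contig name `scaffold_7` are hypothetical.

```python
def filter_sam(lines, keep):
    """Yield SAM lines restricted to the contigs named in keep."""
    for line in lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                fields = line.rstrip("\n").split("\t")
                names = [f[3:] for f in fields if f.startswith("SN:")]
                if not names or names[0] not in keep:
                    continue  # drop the header entry of a removed contig
            yield line
            continue
        fields = line.split("\t")
        rname, rnext = fields[2], fields[6]  # SAM columns 3 (RNAME) and 7 (RNEXT)
        if rname not in keep:
            continue  # also drops unmapped reads, whose RNAME is "*"
        if rnext not in ("=", "*") and rnext not in keep:
            continue  # mate is on a removed contig
        yield line


sam = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:scaffold_7\tLN:500\n",
    "r1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tscaffold_7\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]
kept = list(filter_sam(sam, {"chr1"}))
```

This sketch does not inspect optional tags, so tags that name other contigs would still need attention in real data.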

     
