Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

DetermineGermlineContigPloidy: results do not contain chrY

  • Samuel Quaynor

    I would be grateful if someone could help me resolve this issue.
    Thank you.

  • Gökalp Çelik

    Hi Samuel Quaynor

    Do you have chrY intervals in the interval_list file that you used to collect read counts?

  • Samuel Quaynor

    Thanks for the swift response, Gökalp Çelik. I just went through my interval lists and realised there are no chrY intervals in the filtered intervals, although they were present in all the other interval files I used prior to filtering. What do you think I should do?

  • Gökalp Çelik

    The FilterIntervals tool most likely removed those chrY intervals because of the cutoffs selected in the tool.

    Can you check whether those intervals actually have read counts in your samples? Are your samples all of a single sex (e.g. all female)? How many intervals lie outside the pseudoautosomal regions (PARs) of chrY? Can you also check the annotations of those intervals to see whether they have extremely low mappability or fall within segmental duplication tracks?
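
One way to answer the PAR question is to count the chrY intervals that overlap a PAR versus those that do not. This is a minimal sketch, not a GATK tool: it assumes a Picard-style .interval_list (header lines starting with `@`, then tab-separated contig, 1-based start, end, strand, name) and GRCh38 PAR coordinates (PAR1 chrY:10,001-2,781,479, PAR2 chrY:56,887,903-57,217,415). The reference build is not stated in this thread, so adjust the coordinates if another build was used. The function name `count_chrY_par` is hypothetical.

```bash
# Sketch: count chrY intervals overlapping vs. outside the PARs.
# ASSUMPTIONS: Picard-style .interval_list input and GRCh38 PAR coordinates.
count_chrY_par() {
  awk -F'\t' '
    /^@/ { next }                      # skip the SAM-style header lines
    $1 == "chrY" {
      # An interval counts as PAR if it overlaps PAR1 or PAR2 (1-based, closed).
      if (($3 >= 10001 && $2 <= 2781479) || ($3 >= 56887903 && $2 <= 57217415))
        par++
      else
        nonpar++
    }
    END { printf "PAR\t%d\nnon-PAR\t%d\n", par, nonpar }
  ' "$1"
}
```

For example, `count_chrY_par preprocessed.interval_list` (placeholder file name) prints the two counts, which can be run on both the unfiltered and the filtered interval lists for comparison.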

  • Samuel Quaynor

    I am not really sure of the sexes of the samples. It looks like the majority of the intervals fall within the PAR regions of chromosome Y. I didn't provide mappability-track or segmental-duplication-track files. This is the command I used:

    gatk FilterIntervals \
    -L {input.preprocessed_targets} \
    --interval-merging-rule OVERLAPPING_ONLY \
    -I cnv/gatk_cnv/read_counts/GH20130774-1.counts.hdf5 \
    -I cnv/gatk_cnv/read_counts/GH20130800-1.counts.hdf5 \
    -I cnv/gatk_cnv/read_counts/GH20130842-1.counts.hdf5 \
    -I cnv/gatk_cnv/read_counts/GH20130817-1.counts.hdf5 \
    -I cnv/gatk_cnv/read_counts/GH20130873-1.counts.hdf5 \
    -I cnv/gatk_cnv/read_counts/GH20130857-1.counts.hdf5 \
    -I cnv/gatk_cnv/read_counts/GH20130815-1.counts.hdf5 \
    -I cnv/gatk_cnv/read_counts/GH20130796-1.counts.hdf5 \
    -I cnv/gatk_cnv/read_counts/GH20130847-1.counts.hdf5 \
    -I cnv/gatk_cnv/read_counts/GH20130801-1.counts.hdf5 \
    --annotated-intervals {input.annotated_intervals} \
    --output {output.filtered_intervals} &> {log}

  • Samuel Quaynor

    Hello Gökalp Çelik
    The GC-content values in the annotated intervals file for the chrY intervals were between 0.3 and 0.7.

  • Gökalp Çelik

    You seem to have a small number of samples (10 in the command above), and it is quite possible that those intervals were removed because of this small cohort and the high level of fluctuation in counts between samples.

    FilterIntervals applies count-based filters controlled by the following parameters:

    --low-count-filter-count-threshold <Integer>
                                  Count-threshold parameter for the low-count filter.  Intervals with a count strictly less
                                  than this threshold in a percentage of samples strictly greater than
                                  low-count-filter-percentage-of-samples will be filtered out.  (This is the first
                                  count-based filter applied.)  Default value: 10. 

    --low-count-filter-percentage-of-samples <Double>
                                  Percentage-of-samples parameter for the low-count filter.  Intervals with a count strictly
                                  less than low-count-filter-count-threshold in a percentage of samples strictly greater
                                  than this will be filtered out.  (This is the first count-based filter applied.)  Default
                                  value: 50.0.

    --extreme-count-filter-maximum-percentile <Double>
                                  Maximum-percentile parameter for the extreme-count filter.  Intervals with a count that
                                  has a percentile strictly greater than this in a percentage of samples strictly greater
                                  than extreme-count-filter-percentage-of-samples will be filtered out.  (This is the second
                                  count-based filter applied.)  Default value: 99.0. 

    --extreme-count-filter-minimum-percentile <Double>
                                  Minimum-percentile parameter for the extreme-count filter.  Intervals with a count that
                                  has a percentile strictly less than this in a percentage of samples strictly greater than
                                  extreme-count-filter-percentage-of-samples will be filtered out.  (This is the second
                                  count-based filter applied.)  Default value: 1.0. 

    --extreme-count-filter-percentage-of-samples <Double>
                                  Percentage-of-samples parameter for the extreme-count filter.  Intervals with a count that
                                  has a percentile outside of [extreme-count-filter-minimum-percentile,
                                  extreme-count-filter-maximum-percentile] in a percentage of samples strictly greater than
                                  this will be filtered out.  (This is the second count-based filter applied.)  Default
                                  value: 90.0.

    It is possible that your chrY intervals were removed by these filters and were therefore missing from the rest of the analysis, including DetermineGermlineContigPloidy.
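
To check this directly, the low-count rule quoted above can be re-applied to the chrY intervals outside of GATK. This is a minimal sketch, not GATK's implementation: it assumes per-sample counts in the TSV layout that CollectReadCounts writes with `--format TSV` (header lines starting with `@`, a `CONTIG START END COUNT` column header, then one row per interval), and the function name `chrY_low_count_report` is hypothetical. It covers only the first (low-count) filter, using the default values of 10 and 50.0.

```bash
# Sketch: re-apply the low-count filter rule to chrY intervals.
# ASSUMPTION: one CollectReadCounts TSV (--format TSV) per sample as arguments.
chrY_low_count_report() {
  local threshold=10   # --low-count-filter-count-threshold (default)
  local max_pct=50.0   # --low-count-filter-percentage-of-samples (default)
  awk -F'\t' -v t="$threshold" -v p="$max_pct" '
    /^@/ || $1 == "CONTIG" { next }          # skip header lines
    $1 == "chrY" {
      key = $1 ":" $2 "-" $3
      if (!(key in total)) order[++n] = key  # remember first-seen order
      total[key]++
      if ($4 + 0 < t + 0) low[key]++         # count strictly below threshold
    }
    END {
      for (i = 1; i <= n; i++) {
        k = order[i]
        pct = 100 * low[k] / total[k]
        # filtered when low in strictly more than max_pct percent of samples
        printf "%s\t%d/%d low\t%s\n", k, low[k], total[k], (pct > p + 0 ? "FILTERED" : "kept")
      }
    }' "$@"
}
```

For example, `chrY_low_count_report counts/*.counts.tsv` (hypothetical directory) prints one line per chrY interval with the number of samples below the threshold. In a cohort where most samples are female, non-PAR chrY intervals would be expected to show low counts in more than half of the samples and be marked FILTERED.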

    There are three things you can do to remedy this issue:

    1- Add more samples (for example, more than 30) so that the cohort is large enough for these thresholds to be meaningful.

    2- Remove the FilterIntervals step and use your intervals as-is for the rest of the analysis.

    3- Relax those parameters further so that enough chrY intervals remain for the analysis. Note that PAR intervals may be removed regardless if the PARs are hard-masked on chrY in your reference.

    The second option may result in heavy fluctuations between segments and is therefore less desirable.
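
As a hedged illustration of option 3, the function below reruns FilterIntervals with looser count-based thresholds. The function name, its arguments and the specific values (a count threshold of 5, 90.0 for the low-count percentage of samples, 100.0 for the extreme-count percentage of samples) are illustrative assumptions, not recommended settings. Per the parameter descriptions above, an interval is removed only if it fails a filter in strictly more than the given percentage of samples, so 100.0 effectively switches the extreme-count filter off.

```bash
# Sketch only: rerun FilterIntervals with relaxed count-based filters.
# Values are illustrative; file arguments are placeholders for the files
# used in the original run.
filter_intervals_relaxed() {
  local counts_dir=$1 intervals=$2 annotated=$3 output=$4
  local -a inputs=()
  local f
  for f in "$counts_dir"/*.counts.hdf5; do
    inputs+=(-I "$f")   # one -I per sample, as in the original command
  done
  gatk FilterIntervals \
    -L "$intervals" \
    --interval-merging-rule OVERLAPPING_ONLY \
    "${inputs[@]}" \
    --annotated-intervals "$annotated" \
    --low-count-filter-count-threshold 5 \
    --low-count-filter-percentage-of-samples 90.0 \
    --extreme-count-filter-percentage-of-samples 100.0 \
    --output "$output"
}
```

For example: `filter_intervals_relaxed cnv/gatk_cnv/read_counts preprocessed.interval_list annotated.tsv filtered.interval_list`, where the last three file names are placeholders. Building the `-I` list from a glob avoids editing the command each time samples are added to the cohort.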

    I hope this helps. 

  • Samuel Quaynor

    Gökalp Çelik, thank you so much for the clarification and your time. I am most grateful.
    I will soon have access to a larger set of samples and will report back with an update once I have run the analysis on a large enough cohort.
