DetermineGermlineContigPloidy: results do not contain chrY
REQUIRED for all errors and issues:
a) GATK version used: 4.3.0.0
b) Exact command used:
gatk DetermineGermlineContigPloidy \
-L filtered_intervals.interval_list \
-I cnv/gatk_cnv/read_counts/GH20130774-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130800-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130842-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130817-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130873-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130857-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130815-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130796-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130847-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130801-1.counts.hdf5 \
--contig-ploidy-priors wes_contig_ploidy_prior.tsv \
--output cnv/gatk_cnv/ploidy/ \
--verbosity DEBUG \
--interval-merging-rule OVERLAPPING_ONLY \
--output-prefix ploidy
The command runs without errors and generates ploidy values for all the chromosomes except chrY, even though chrY is also specified in the contig ploidy priors file (wes_contig_ploidy_prior.tsv).
-
I would be grateful if someone could help me resolve this issue.
Thank you
-
Do you have intervals from chrY within your interval_list file that you used to collect read counts?
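One quick way to answer this is to count the intervals per contig in the interval list. The sketch below assumes a Picard-style interval_list (header lines starting with "@", contig name in the first tab-separated column); the function name is only illustrative.

```shell
# Count intervals per contig in a Picard-style interval_list.
# Header lines start with "@"; column 1 is the contig name.
count_intervals_per_contig() {
    grep -v '^@' "$1" | cut -f1 | sort | uniq -c
}

# Example (file name taken from the command in the question):
# count_intervals_per_contig filtered_intervals.interval_list | grep -w chrY
```

If the last command prints nothing, the list contains no chrY intervals at all.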
-
Thanks for the swift response, Gökalp Çelik. I just went through my interval list and realised that there are no chrY intervals in the filtered intervals. They were present in all the other interval files I used before filtering. What do you think I should do?
-
The FilterIntervals tool has most likely removed those chrY intervals because of the cutoffs it applies.
Can you check whether those intervals have read counts in your samples? Are all of your samples from a single sex (female)? How many intervals lie outside the PAR regions of chrY? Can you check the annotations of those intervals to see whether they are extremely unmappable or fall within segmental duplication tracks?
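A rough way to count the chrY intervals that lie between the two PAR regions is sketched below. It assumes GRCh38 coordinates (PAR1 ending at chrY:2,781,479 and PAR2 starting at chrY:56,887,903) and a contig named "chrY"; the thread does not state the reference build, so override PAR1_END and PAR2_START for other assemblies. The function name is illustrative.

```shell
# Count chrY intervals lying entirely between PAR1 and PAR2.
# Default bounds are GRCh38 (an assumption; the build is not stated in the thread).
count_chrY_non_par() {
    awk -F'\t' \
        -v par1_end="${PAR1_END:-2781479}" \
        -v par2_start="${PAR2_START:-56887903}" '
        /^@/ { next }    # skip interval_list header lines
        $1 == "chrY" && ($2 + 0) > (par1_end + 0) && ($3 + 0) < (par2_start + 0) { n++ }
        END { print n + 0 }
    ' "$1"
}

# Example: run it on the interval list used BEFORE FilterIntervals.
# count_chrY_non_par preprocessed.interval_list
```

Running this on the pre-filter list shows how many non-PAR chrY intervals were available to begin with.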
-
I am not really sure of the sexes of the samples. It looks like the majority of the chrY intervals fall within the PAR regions. I did not add mappability-track or segmental-duplication-track files.
gatk FilterIntervals \
-L {input.preprocessed_targets} \
--interval-merging-rule OVERLAPPING_ONLY \
-I cnv/gatk_cnv/read_counts/GH20130774-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130800-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130842-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130817-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130873-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130857-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130815-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130796-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130847-1.counts.hdf5 \
-I cnv/gatk_cnv/read_counts/GH20130801-1.counts.hdf5 \
--annotated-intervals {input.annotated_intervals} \
--output {output.filtered_intervals} &> {log}
-
Hello Gökalp Çelik,
The GC content values of the annotated chrY intervals were between 0.3 and 0.7.
-
You seem to have a low sample count, and it is quite possible that those intervals were removed because of the small number of samples and the high level of fluctuation between them.
FilterIntervals has count-based filters:
--low-count-filter-count-threshold <Integer>
Count-threshold parameter for the low-count filter. Intervals with a count strictly less
than this threshold in a percentage of samples strictly greater than
low-count-filter-percentage-of-samples will be filtered out. (This is the first
count-based filter applied.) Default value: 10.
--low-count-filter-percentage-of-samples <Double>
Percentage-of-samples parameter for the low-count filter. Intervals with a count strictly
less than low-count-filter-count-threshold in a percentage of samples strictly greater
than this will be filtered out. (This is the first count-based filter applied.) Default
value: 50.0.
--extreme-count-filter-maximum-percentile <Double>
Maximum-percentile parameter for the extreme-count filter. Intervals with a count that
has a percentile strictly greater than this in a percentage of samples strictly greater
than extreme-count-filter-percentage-of-samples will be filtered out. (This is the second
count-based filter applied.) Default value: 99.0.
--extreme-count-filter-minimum-percentile <Double>
Minimum-percentile parameter for the extreme-count filter. Intervals with a count that
has a percentile strictly less than this in a percentage of samples strictly greater than
extreme-count-filter-percentage-of-samples will be filtered out. (This is the second
count-based filter applied.) Default value: 1.0.
--extreme-count-filter-percentage-of-samples <Double>
Percentage-of-samples parameter for the extreme-count filter. Intervals with a count that
has a percentile outside of [extreme-count-filter-minimum-percentile,
extreme-count-filter-maximum-percentile] in a percentage of samples strictly greater than
this will be filtered out. (This is the second count-based filter applied.) Default
value: 90.0.
It is possible that your chrY intervals were removed by these filters and were therefore missing from the rest of the analysis.
There are three things you can do to remedy this issue:
1- Add more samples so that the cohort is large enough for these thresholds to be meaningful, for example more than 30.
2- Remove the FilterIntervals step and use your intervals as is during the analysis.
3- Relax those parameters further so that enough chrY intervals remain for analysis. PAR regions may be removed regardless if they are hard-masked on chrY.
The second option may result in heavy fluctuations between segments and is therefore less desirable.
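As an illustration of the third option, the sketch below prints a FilterIntervals command with relaxed count-based filters. The flag names are the ones quoted above and in the FilterIntervals command earlier in the thread; the threshold values, the function name and the file arguments are illustrative assumptions, not recommendations, and need tuning for your data.

```shell
# Print a FilterIntervals command with relaxed count-based thresholds.
# All threshold values are illustrative assumptions, not recommendations.
# Usage: build_filter_intervals_cmd COUNTS_DIR INTERVALS ANNOTATED OUTPUT
build_filter_intervals_cmd() {
    counts_dir=$1; intervals=$2; annotated=$3; output=$4
    set -- gatk FilterIntervals \
        -L "$intervals" \
        --interval-merging-rule OVERLAPPING_ONLY \
        --annotated-intervals "$annotated" \
        --low-count-filter-count-threshold 2 \
        --low-count-filter-percentage-of-samples 95.0 \
        --extreme-count-filter-minimum-percentile 0.1 \
        --extreme-count-filter-maximum-percentile 99.9 \
        --extreme-count-filter-percentage-of-samples 95.0 \
        --output "$output"
    # Add one -I argument per read-count file instead of listing them by hand.
    for f in "$counts_dir"/*.counts.hdf5; do
        [ -e "$f" ] || continue
        set -- "$@" -I "$f"
    done
    printf '%s\n' "$*"
}

# Example:
# build_filter_intervals_cmd cnv/gatk_cnv/read_counts preprocessed.interval_list \
#     annotated_intervals.tsv filtered_intervals.interval_list
```

The function only prints the command so that it can be reviewed before running; paths containing spaces would need quoting. After running it, check again whether any chrY intervals remain in the filtered list.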
I hope this helps.
-
Gökalp Çelik, thank you so much for the clarification and your time. I am most grateful.
I will get access to a larger sample size soon. I will report back to you with an update when I run with a large enough sample size.