No downsampling with max-reads-per-alignment-start
GATK version used: 4.2.2.0
Parameters besides input and output: --max-reads-per-alignment-start 1000 --emit-ref-confidence GVCF
HaplotypeCaller was run on many samples. In all of the GVCF files, the depths of the genotype calls range from 20k to 30k, roughly the raw sequencing coverage (amplicon sequencing). It seems no downsampling was done. I am wondering if anyone has seen similar results. Am I missing something with that parameter?
-
Hi Liang Ye,
I believe it is expected not to see any downsampling when the --max-reads-per-alignment-start argument is set as high as 1000. This argument caps the number of reads that start at each position, not the total depth at a site. For example, if reads were 150 bases long, as many as 150,000 reads could overlap a single site: if 1000 reads can start at position 500, another 1000 can start at 499, 498, and so on, and all of them overlap that site. I hope this helps explain the results you are seeing; please let me know if this does not make sense.
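To make the arithmetic concrete, here is a minimal sketch of a simplified model, assuming single-end reads of fixed length on one strand. The helper names (`max_depth_bound`, `capped_depth`) and the simulated start positions are hypothetical and only for illustration; this is not GATK's actual downsampling implementation, which has more details than a plain per-start cap.

```python
from collections import Counter


def max_depth_bound(read_length: int, max_reads_per_start: int) -> int:
    """Worst-case depth at one position: every start position within
    read_length bases upstream contributes the maximum allowed reads."""
    return read_length * max_reads_per_start


def capped_depth(starts, read_length: int, max_reads_per_start: int, site: int) -> int:
    """Depth at `site` after keeping at most max_reads_per_start reads per start."""
    per_start = Counter(starts)
    depth = 0
    for start, n in per_start.items():
        kept = min(n, max_reads_per_start)
        # A read starting at `start` covers positions start .. start + read_length - 1.
        if start <= site < start + read_length:
            depth += kept
    return depth


# Hypothetical data: 200 reads start at each of the 150 positions 351..500,
# so every one of them overlaps position 500.
starts = [s for s in range(351, 501) for _ in range(200)]

print(max_depth_bound(150, 1000))            # 150000
print(capped_depth(starts, 150, 1000, 500))  # 30000: the cap of 1000 never binds
print(capped_depth(starts, 150, 50, 500))    # 7500: a cap of 50 trims each start
```

With 200 reads per start, a cap of 1000 removes nothing and the depth stays at 30,000, which is in the same range as the depths reported above. Only a cap below the per-start count actually reduces coverage.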
Kind regards,
Pamela
-
Thanks Pamela! I thought about that too, but I didn't realize tagmentation works so well for a 1.5 kb amplicon. It looks like there are start sites at nearly every position, though the number of reads at each site varies a lot.
-
Hi Liang Ye,
Yes, I think your results are expected given your data and the arguments used. If you would like to remove reads to reach a specific coverage, you can use DownsampleSam, or you could try specifying a lower --max-reads-per-alignment-start value to reach your desired coverage.
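As a rough way to pick values for either option, here is a minimal sketch. The helper names `downsample_probability` and `per_start_cap` are hypothetical. The first computes the fraction of reads to keep, which is what DownsampleSam's PROBABILITY argument expects; because reads are kept at random, the resulting depth is only approximately the target. The second assumes fixed-length reads and returns a per-start cap whose worst-case depth (cap × read length) stays at or below the target. Since start sites in this data are uneven, the real depth after capping may end up well below that bound.

```python
def downsample_probability(current_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep so expected depth drops from
    current_depth to target_depth (never more than 1.0)."""
    if current_depth <= 0:
        raise ValueError("current_depth must be positive")
    return min(1.0, target_depth / current_depth)


def per_start_cap(target_depth: int, read_length: int) -> int:
    """Largest per-start cap whose worst case (cap * read_length)
    does not exceed target_depth, with a floor of 1."""
    return max(1, target_depth // read_length)


print(downsample_probability(25000, 1000))  # 0.04
print(per_start_cap(1000, 150))             # 6
```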
Kind regards,
Pamela