Whole Genome pipe for use with non whole genome data
I have been tasked with using the GATK Best Practices (the WholeGenomeGermlineSingleSample_v3.1.14 WDL from WARP) to process non-whole-genome data, mainly capture data from an 80-gene panel. Technically I can do it, but is the result meaningful?
-
Technically yes, but for the sake of fidelity we do not recommend using the whole genome workflows with small panels. One immediate issue is the applicability of BQSR with so few regions and reads; another is the variant filtration parameters. You may modify parts of the workflow to fit your needs, for example by removing BQSR, and you may need to tune the variant filtration parameters based on your data and expectations.
I hope this helps.
-
Thanks Gökalp!
Another question: if I removed the mapping step and made the pipe start from a trimmed-down BAM (originally WGS or exome), what issues do you foresee? MarkDuplicates, SortSam, CrossCheckFingerprints, CheckContamination and BaseRecalibrator would run, followed by the variant calling steps. My main concern is that I would have no idea what had previously been done to the BAM or what it was mapped with. There is a good chance it was made using the GATK Best Practices, in which case the aforementioned steps would end up being run on it twice.
Thanks in advance!
-
BaseRecalibrator would be the obvious one to cause issues. A small panel means a small number of variant sites and a small number of reads to sample, so you may need to disable the BQSR step as well.
To check what was done to those BAM files previously, you can inspect the BAM header with samtools view -H and look at the @PG lines, which usually include the command lines used to generate the BAM up to that point.
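As a rough illustration of the header check, here is a minimal Python sketch that pulls the @PG records out of the header and lists each program with its command line. The function names (read_header, parse_pg_lines, summarize) are made up for this example, and read_header assumes samtools is installed and on your PATH. Keep in mind that @PG lines are optional and some tools drop them, so a missing entry does not prove a step was never run.

```python
import subprocess


def read_header(bam_path):
    """Return the header of a BAM file as text via `samtools view -H`.

    Assumes samtools is on PATH; raises CalledProcessError if it fails.
    """
    result = subprocess.run(
        ["samtools", "view", "-H", bam_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_pg_lines(header_text):
    """Parse @PG header lines into dicts keyed by tag (ID, PN, VN, CL, ...)."""
    records = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        fields = {}
        # Fields are tab-separated TAG:VALUE pairs; split on the first colon
        # only, because command lines (CL) can contain colons themselves.
        for field in line.split("\t")[1:]:
            tag, sep, value = field.partition(":")
            if sep:
                fields[tag] = value
        records.append(fields)
    return records


def summarize(records):
    """Return (program, command line) pairs in header order."""
    return [(r.get("PN", r.get("ID", "?")), r.get("CL", "")) for r in records]
```

Something like `for name, cl in summarize(parse_pg_lines(read_header("sample.bam"))): print(name, cl)` (with sample.bam standing in for your own file) would then show whether an aligner, MarkDuplicates or ApplyBQSR already appears in the file's history.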
Regards.