BQSR Spark: Why Beta?
I'm upgrading a somatic variant calling pipeline from GATK3 to GATK4, and I see that multithreading is no longer a BQSR option in GATK4. The recommended approach appears to be BQSRPipelineSpark, which is still in beta.
Can someone at the Broad clarify the meaning of "beta" here? Is it in beta because of Spark issues and potential crashes, for example, or because the output may be incorrect?
What is the current best practice for parallelizing BQSR?
Thanks in advance!
-
Tavi Nathanson The reason BQSR Spark is in BETA is that we have not formally evaluated its results to confirm that they are the same as those of the normal BQSR version. When we complete this formal evaluation, it will no longer be BETA. Most likely you will not have any issues with the results from BQSR Spark. There may be Spark-specific issues, though, and you can always post on the forum if you run into problems.
-
Hi Genevieve Brandt (she/her), thank you for the quick reply. Given that the differences may not be limited to Spark/performance issues, what do you recommend for people running production pipelines on GATK4 who need to (a) parallelize the BQSR run and (b) ensure correct output?
-
Tavi Nathanson, unfortunately I do not have insight into what would be best in your case, since the formal evaluation has not been completed.
If other users have tested these options, please post your thoughts here!