Gatk4 rnaseq germline snps indels json file
I am trying to use the "Gatk4 rnaseq germline snps indels" workflow from GitHub (https://github.com/gatk-workflows/gatk4-rnaseq-germline-snps-indels). I want to run it locally in my Linux environment.
While editing the JSON file I noticed there are resource-file inputs for VCFs of known variants. Is it possible to run the workflow with these fields empty? Or is it a requirement that I have VCFs?
I ask this because we are trying to look at all the variants (including those that may not have been previously identified).
When I try to run the workflow with this in my json:
"##_COMMENT3": "RESOURCE FILES",
"RNAseq.dbSnpVcf": {},
"RNAseq.dbSnpVcfIndex": {},
"RNAseq.knownVcfs": [],
"RNAseq.knownVcfsIndices": [],
"RNAseq.annotationsGTF": "/home/alan/AbSeq/SNV/BROAD/manually/gatk-workflows/inputs/kx576660gtf.gtf",
I get this error:
[2022-05-19 11:01:55,85] [error] WorkflowManagerActor Workflow 05e4d83c-960e-4aed-824a-0bbffe0ec644 failed (during MaterializingWorkflowDescriptorState): cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anon$1: Workflow input processing failed:
No coercion defined from '{}' of type 'spray.json.JsObject' to 'File'.
No coercion defined from '{}' of type 'spray.json.JsObject' to 'File'.
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.cromwell$engine$workflow$lifecycle
-
Hi Alan Foley,
The knownVcfs input here is only used in BaseQualityScoreRecalibrator (the BaseRecalibrator tool in GATK4) to mask out sites with known variation. This is because that tool assumes that any base that mismatches the reference is an error. Removing common known variation helps that assumption hold, although it obviously doesn't catch all possible variation. Usually this is fine because there is an overwhelming amount of data that matches the reference and a relatively small number of sites that have novel variation.
dbSnp is included both in BaseQualityScoreRecalibrator, for the same reason as knownVcfs, and in HaplotypeCaller, so that it will label dbSNP sites in your output VCF. Supplying it won't change the variants you discover in your sample at all.
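As a rough illustration of where these resources enter the two steps, here is a minimal Python sketch that assembles the command lines. The flags `--known-sites` (BaseRecalibrator) and `--dbsnp` (HaplotypeCaller) are GATK4 arguments; the helper function names and all file paths are hypothetical, and the workflow's real commands carry additional options that are not shown here.

```python
from typing import List, Optional


def base_recalibrator_args(ref: str, bam: str, known_sites: List[str],
                           out_table: str) -> List[str]:
    """Build an argument list for `gatk BaseRecalibrator` (sketch only)."""
    args = ["gatk", "BaseRecalibrator", "-R", ref, "-I", bam, "-O", out_table]
    # Each known-variant VCF gets its own --known-sites flag. These sites are
    # masked so that real variation is not counted as sequencing error.
    for vcf in known_sites:
        args += ["--known-sites", vcf]
    return args


def haplotype_caller_args(ref: str, bam: str, out_vcf: str,
                          dbsnp: Optional[str] = None) -> List[str]:
    """Build an argument list for `gatk HaplotypeCaller` (sketch only)."""
    args = ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", out_vcf]
    # --dbsnp is only added when a dbSNP VCF is supplied.
    if dbsnp is not None:
        args += ["--dbsnp", dbsnp]
    return args


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(base_recalibrator_args(
        "ref.fasta", "sample.bam", ["dbsnp.vcf.gz", "known_indels.vcf.gz"],
        "recal.table")))
    print(" ".join(haplotype_caller_args(
        "ref.fasta", "sample.recal.bam", "sample.vcf.gz", dbsnp="dbsnp.vcf.gz")))
```

In this sketch the dbSNP file appears in both commands, while the other known-variant VCFs appear only in the recalibration step, matching the description above.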
So in both of these cases you will still be able to discover novel variants, and the best practices recommendation is to include them so that BaseQualityScoreRecalibrator works optimally. If there is another reason, besides wanting to find novel variants, that you don't want to include them (such as not having a known variants dataset for your organism), then I'd recommend looking at the BaseQualityScoreRecalibrator documentation to get some more ideas. These inputs are all currently required in the WDL, so if you do decide not to include them, you'll need to edit the WDL to make those inputs optional (or remove them entirely). I hope this helps!
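On the error itself: the message says Cromwell cannot coerce `{}`, a JSON object, to a `File`, so a `File` input needs a path string rather than an empty object. The following is a minimal Python sketch of a pre-flight check for the inputs JSON before submitting it. The input names are copied from the JSON in the question; treating the first group as single files and the second as arrays of files is an assumption based on the error message and the snippet, and the function name is hypothetical.

```python
import json
from typing import List

# Assumed to be single-File inputs (names taken from the question's JSON).
FILE_INPUTS = ["RNAseq.dbSnpVcf", "RNAseq.dbSnpVcfIndex", "RNAseq.annotationsGTF"]
# Assumed to be array-of-File inputs.
FILE_ARRAY_INPUTS = ["RNAseq.knownVcfs", "RNAseq.knownVcfsIndices"]


def find_bad_file_inputs(inputs: dict) -> List[str]:
    """Return one message per input whose JSON value cannot be a file path."""
    problems = []
    for key in FILE_INPUTS:
        value = inputs.get(key)
        # A File input must be a non-empty path string, not {} or null.
        if not isinstance(value, str) or not value:
            problems.append(f"{key}: expected a path string, got {json.dumps(value)}")
    for key in FILE_ARRAY_INPUTS:
        value = inputs.get(key)
        # An empty list is accepted here; only non-lists and non-string
        # elements are reported.
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key}: expected a list of path strings, got {json.dumps(value)}")
    return problems


if __name__ == "__main__":
    # Same shape as the JSON in the question; the GTF path is a placeholder.
    example = {
        "RNAseq.dbSnpVcf": {},
        "RNAseq.dbSnpVcfIndex": {},
        "RNAseq.knownVcfs": [],
        "RNAseq.knownVcfsIndices": [],
        "RNAseq.annotationsGTF": "/path/to/annotations.gtf",
    }
    for message in find_bad_file_inputs(example):
        print(message)
```

Run against the inputs from the question, this flags the two `{}` values, the same two inputs that the Cromwell error reports. It does not make the inputs optional; that still requires editing the WDL as described above.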
-
Thanks for this reply!
In fact I went ahead and manually performed the steps instead of using the JSON.
There were a few changes I needed to make.
Alan