Sharded VCFs are not merged into final combined VCF file
Hello,
I've run the germline WDL pipeline on 2838 individuals using the hg38 assembly.
The code was taken from here:
and linked from here (The JointGenotyping workflow takes the GVCF output ...):
https://github.com/gatk-workflows/gatk4-germline-snps-indels
My final output contains 7094 files. The sharded VCF files are not merged back into a single VCF or into per-chromosome VCFs. Is this normal, and do I need to merge them manually?
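If the shards do have to be merged by hand, the main thing to get right is their order: concatenation tools expect the inputs in genomic order, and a plain alphabetical sort of file names puts shard 10 before shard 2. The Python sketch below orders shard VCFs by their scatter index. It assumes (this is not stated in the thread) that each VCF's path contains a Cromwell-style `shard-<N>` directory or carries the shard number as the last number in its file name, and that scatter order matches genomic order. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Assumption: a shard's scatter index is either in a Cromwell-style
# "shard-<N>" directory in its path or, failing that, the last run of
# digits in the file name (e.g. "callset.17.filtered.vcf.gz").
_SHARD_DIR = re.compile(r"shard-(\d+)")


def shard_index(path: str) -> int:
    """Return the scatter index encoded in a shard's path."""
    match = _SHARD_DIR.search(path)
    if match:
        return int(match.group(1))
    digits = re.findall(r"\d+", Path(path).name)
    if not digits:
        raise ValueError(f"no shard index found in {path!r}")
    return int(digits[-1])


def ordered_vcf_shards(paths: Iterable[str]) -> List[str]:
    """Keep only bgzipped VCFs (skipping .tbi indexes) and sort them numerically.

    A lexicographic sort would place shard 10 before shard 2 and so
    concatenate the shards out of genomic order.
    """
    vcfs = [p for p in paths if p.endswith(".vcf.gz")]
    ordered = sorted(vcfs, key=shard_index)
    indices = [shard_index(p) for p in ordered]
    # Two VCFs with the same index usually means a rerun left stale files behind.
    if len(set(indices)) != len(indices):
        raise ValueError("two or more VCFs map to the same shard index")
    return ordered


def write_file_list(paths: Iterable[str], out: str) -> None:
    """Write one VCF path per line, in shard order, for a concat/gather tool."""
    Path(out).write_text("".join(f"{p}\n" for p in ordered_vcf_shards(paths)))
```

The resulting list could then be passed, for example, to `bcftools concat -f shards.txt -Oz -o merged.vcf.gz`, or its entries given as repeated `-I` arguments to `gatk GatherVcfs`. Check the chosen tool's documentation for the options that suit your data.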
Also, I am confused about which WDL I should use in the future, as Terra contains different, older code:
https://github.com/gatk-workflows/gatk4-germline-snps-indels/tree/1.1.1
Thank you!
-
Hi Georgi Hudjashov,
Thanks for reaching out. It looks like you are referencing an outdated workspace, as indicated in the dashboard:
The new workspace with the most up-to-date workflows can be found here: https://app.terra.bio/#workspaces/help-gatk/GATK4-Germline-Preprocessing-VariantCalling-JointCalling
Instructions on how to run the workflows can be found on the workspace dashboard. Please let me know if you have any other questions!
Best,
Samantha
-
Hi Samantha,
I am very confused and cannot find the link to the workspace you are referring to. I followed GATK / Getting Started / Best Practices / Workflows on the main page.
Then I clicked on "Generic germline short variant joint genotyping / Terra hg38", which brought me here:
https://app.terra.bio/#workspaces/help-gatk/Germline-SNPs-Indels-GATK4-hg38
On the Workflows / Joint Discovery tab there is a GitHub link:
https://github.com/gatk-workflows/gatk4-germline-snps-indels/tree/1.1.1
This is v1.1.1 of the workflow, and the master page
https://github.com/gatk-workflows/gatk4-germline-snps-indels
refers me to "The JointGenotyping workflow takes the GVCF output...", which is the one I used: https://github.com/broadinstitute/warp/tree/develop/pipelines/broad/dna_seq/germline/joint_genotyping
Could you please fix this? It is really confusing.
Do you know if there are any major differences between the version I used and the latest version you are referring to? I will merge the files manually, but it is vital for me to know whether there are any problems with the old workflow. I have already spent a lot of CPU time and would like to avoid re-running it if at all possible.
Thank you for your help!
Best wishes,
Georgi
-
Hi Georgi Hudjashov,
Sorry for the confusion. It looks like the link in the article you mentioned needs to be updated; I'll put in a documentation request to get it changed. However, the dashboard of the "Germline-SNPs-Indels-GATK4-hg38" workspace has a note saying that it is out of date, as you can see in the screenshot I shared, and it links to the new workspace, "GATK4-Germline-Preprocessing-VariantCalling-JointCalling." This new workspace combines the hg38 and b37 workflows instead of keeping them in separate workspaces. Since then, the workflows in the old workspace have not been updated.
That said, the JointGenotyping workflow in the new workspace is actually the same as the one you mentioned that you ran, so there is no need to rerun it.
Best,
Samantha