Best practices for splitting unmapped BAM files
Hi GATK Team,
I have ~350 BAM files of about 150 GB each. I have been using some of the GATK workflows as a reference, but the SamToFastqAndBwaMemAndMba step (mapping reads to the reference) is taking a while. Are there best practices for splitting a BAM so that the mapping can run in parallel over multiple chunks?
Thank you,
Lindo
-
Hi Lindo Nkambule,
Alignment does take a long time. The Terra workspace says to expect 6.5 hours (but only $0.40) for the downsampled data in: https://app.terra.bio/#workspaces/warp-pipelines/Whole-Genome-Analysis-Pipeline/data. That smaller dataset could be a good way to test your pipeline.
-Laura