How to set the COMPRESSION_LEVEL for ApplyBQSR
a) GATK version used: 4.1.6.0
b) Exact GATK commands used: ApplyBQSR
c) The entire error log if applicable.
How do I set the COMPRESSION_LEVEL for ApplyBQSR? I found that the output BAM file is twice the size of the original BAM file, while the original BAM is COMPRESSION_LEVEL=2.
-
Hi Nickier
Htsjdk has a property that controls compression level.
You can use this java option to tweak it:
--java-options -Dsamjdk.compression_level=X
where X is 0-9. The default is 2, since we found that to be the best balance between file size and the time taken to read and write the file.
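As a rough sketch of how that option fits into a full ApplyBQSR call: the file names and the level of 5 below are placeholders chosen for illustration, not values from this thread. The command is only printed here as a dry run; drop the leading `echo` to actually run it.

```shell
# Placeholder value: any level from 0 to 9 can be used; 5 is only an example.
COMPRESSION_LEVEL=5
JAVA_OPTS="-Dsamjdk.compression_level=${COMPRESSION_LEVEL}"

# Dry run: print the command instead of executing it.
# input.bam, recal_data.table and output.recal.bam are hypothetical file names.
echo gatk --java-options "${JAVA_OPTS}" ApplyBQSR \
    -I input.bam \
    --bqsr-recal-file recal_data.table \
    -O output.recal.bam
```

A higher level should give a smaller output file at the cost of longer write times, so it may be worth timing one sample before settling on a value.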
-
Hi Bhanu Gandham, thanks a lot! I will give it a try.
-
Compression level 2 effectively doubles the file size. I would be interested to know what kind of speed advantage we are talking about here that outweighs this massive waste of disk space. The BQSR processing takes a long time, so you don't want to re-run it every time you need to use your BAM files. Therefore the recalibrated files are the ones you want to keep for long-term storage -> MASSIVE WASTE OF DISK SPACE. You can effectively store only half as many samples as you could with regular BAM files. I am now looking into how to compress these monster files, having made the mistake of using the default compression level of 2 for a long time.
-
Hi registered_user,
Did the solutions you posted in your other thread (https://gatk.broadinstitute.org/hc/en-us/community/posts/4407291540507-ApplyBQSR-wastes-too-much-disk-space) effectively answer the question that you had here?
Kind regards,
Pamela
-
Hi Pamela Bretscher, yes I think so. Thanks.
-
registered_user, glad to hear!