ApplyBQSR
REQUIRED for all errors and issues:
a) GATK version used: GATK4.3
b) Exact command used: gatk ApplyBQSR -R ../reference/Bos_taurus.ARS-UCD1.2.dna.toplevel.fa -I ../data/mapped/ABigar/Abi1_dedup.bam --bqsr-recal-file ../data/mapped/ABigar/AB_haplo/Abi1_recal.table1 -O ../data/mapped/ABigar/AB_haplo/Abi1_recal.bam
c) Entire program log:
Dear All, I hope everyone is doing well. When I run ApplyBQSR, the log for some of my samples lists all the chromosomes followed by MT, as shown below. Other samples, processed with a similar pipeline, do not show MT after the list of chromosomes. In addition, the warning "IntelInflater - Zero Bytes Written : 0" and "unmapped" entries appear at the end of the output. I would sincerely appreciate your suggestions about this output before I proceed to the next step. Thank you in advance for your valuable support!
The last portion of the log is posted below:
22:40:14.206 INFO ProgressMeter - X:126181720 117.3 211182000 1800853.9
22:40:24.231 INFO ProgressMeter - X:128930737 117.4 211409000 1800224.7
22:40:34.253 INFO ProgressMeter - X:132515206 117.6 211701000 1800150.8
22:40:44.264 INFO ProgressMeter - X:136218098 117.8 211997000 1800113.8
22:40:54.265 INFO ProgressMeter - MT:2801 117.9 212241000 1799638.8
22:41:04.266 INFO ProgressMeter - NKLS02002208.1:4545561 118.1 212535000 1799588.0
22:41:14.268 INFO ProgressMeter - NKLS02000910.1:966969 118.3 212839000 1799621.9
22:41:24.278 INFO ProgressMeter - NKLS02002210.1:706467 118.4 213084000 1799155.7
22:41:34.300 INFO ProgressMeter - NKLS02002206.1:164260 118.6 213393000 1799227.0
22:41:44.321 INFO ProgressMeter - NKLS02002204.1:314438 118.8 213704000 1799315.4
22:41:54.346 INFO ProgressMeter - NKLS02001547.1:74827 118.9 214029000 1799520.2
22:42:04.378 INFO ProgressMeter - NKLS02000119.1:55243 119.1 214358000 1799756.3
22:42:14.401 INFO ProgressMeter - NKLS02001545.1:63062 119.3 214681000 1799943.7
22:42:24.409 INFO ProgressMeter - NKLS02000350.1:59646 119.4 214919000 1799422.7
22:42:34.437 INFO ProgressMeter - NKLS02001786.1:89090 119.6 215221000 1799433.2
22:42:44.441 INFO ProgressMeter - NKLS02001967.1:51106 119.8 215536000 1799558.2
22:42:54.468 INFO ProgressMeter - NKLS02000966.1:82063 119.9 215839000 1799577.1
22:43:04.489 INFO ProgressMeter - NKLS02001026.1:119477 120.1 216154000 1799697.4
22:43:14.505 INFO ProgressMeter - NKLS02001289.1:78180 120.3 216492000 1800009.7
22:43:24.517 INFO ProgressMeter - NKLS02002005.1:90153 120.4 216722000 1799425.8
22:43:34.516 INFO ProgressMeter - NKLS02000975.1:73447 120.6 217043000 1799600.7
22:43:44.540 INFO ProgressMeter - NKLS02001054.1:9675 120.8 217358000 1799719.7
22:43:54.566 INFO ProgressMeter - NKLS02001478.1:11884 120.9 217599000 1799225.6
22:44:04.587 INFO ProgressMeter - NKLS02000183.1:36852 121.1 217932000 1799493.9
22:44:14.589 INFO ProgressMeter - NKLS02000790.1:17570 121.3 218243000 1799585.1
22:44:24.600 INFO ProgressMeter - NKLS02000111.1:34997 121.4 218469000 1798973.3
22:44:34.630 INFO ProgressMeter - NKLS02001241.1:28790 121.6 218775000 1799016.7
22:44:44.653 INFO ProgressMeter - NKLS02000483.1:22422 121.8 219094000 1799168.4
22:44:54.656 INFO ProgressMeter - NKLS02001553.1:22892 121.9 219401000 1799226.2
22:45:04.671 INFO ProgressMeter - NKLS02001553.1:58333 122.1 219715000 1799338.2
22:45:14.697 INFO ProgressMeter - NKLS02000941.1:45041 122.3 220030000 1799455.4
22:45:24.725 INFO ProgressMeter - NKLS02000194.1:41972 122.4 220332000 1799465.6
22:45:34.732 INFO ProgressMeter - NKLS02001530.1:3825 122.6 220650000 1799611.5
22:45:44.734 INFO ProgressMeter - NKLS02001530.1:17993 122.8 220979000 1799847.7
22:45:54.750 INFO ProgressMeter - NKLS02001461.1:42262 122.9 221198000 1799185.2
22:46:04.757 INFO ProgressMeter - NKLS02000540.1:20640 123.1 221512000 1799298.3
22:46:14.763 INFO ProgressMeter - NKLS02001746.1:13640 123.3 221825000 1799403.2
22:46:24.790 INFO ProgressMeter - NKLS02001746.1:28894 123.4 222120000 1799357.0
22:46:34.811 INFO ProgressMeter - NKLS02001746.1:37298 123.6 222437000 1799490.3
22:46:44.838 INFO ProgressMeter - NKLS02001746.1:44394 123.8 222758000 1799654.1
22:46:54.858 INFO ProgressMeter - NKLS02001746.1:50459 123.9 222989000 1799093.0
22:47:04.886 INFO ProgressMeter - NKLS02001746.1:57347 124.1 223311000 1799264.7
22:47:14.893 INFO ProgressMeter - NKLS02000995.1:21702 124.3 223635000 1799457.1
22:47:24.921 INFO ProgressMeter - NKLS02001584.1:1774 124.4 223943000 1799515.4
22:47:34.942 INFO ProgressMeter - NKLS02001917.1:29982 124.6 224265000 1799687.5
22:47:44.946 INFO ProgressMeter - NKLS02000752.1:40352 124.8 224596000 1799935.4
22:47:54.946 INFO ProgressMeter - NKLS02002110.1:31043 124.9 224812000 1799263.2
22:48:04.954 INFO ProgressMeter - NKLS02002110.1:49248 125.1 225132000 1799422.1
22:48:14.961 INFO ProgressMeter - NKLS02000393.1:7626 125.3 225458000 1799628.8
22:48:24.970 INFO ProgressMeter - NKLS02000393.1:50242 125.4 225696000 1799133.1
22:48:34.973 INFO ProgressMeter - NKLS02001242.1:25710 125.6 226025000 1799364.2
22:48:44.983 INFO ProgressMeter - NKLS02001242.1:32515 125.8 226349000 1799553.4
22:48:54.995 INFO ProgressMeter - NKLS02001242.1:34145 125.9 226594000 1799114.5
22:49:05.012 INFO ProgressMeter - NKLS02001242.1:42572 126.1 226905000 1799198.8
22:49:15.021 INFO ProgressMeter - NKLS02000055.1:8303 126.3 227227000 1799372.0
22:49:25.051 INFO ProgressMeter - NKLS02000055.1:16285 126.4 227479000 1798986.1
22:49:35.076 INFO ProgressMeter - NKLS02000055.1:25862 126.6 227789000 1799060.7
22:49:45.085 INFO ProgressMeter - NKLS02000055.1:34025 126.8 228103000 1799170.0
22:49:55.090 INFO ProgressMeter - NKLS02000055.1:39572 126.9 228349000 1798744.8
22:50:05.106 INFO ProgressMeter - NKLS02000055.1:47190 127.1 228666000 1798876.1
22:50:16.733 INFO ProgressMeter - NKLS02001580.1:38781 127.3 228989000 1798675.1
22:50:26.733 INFO ProgressMeter - NKLS02001057.1:2087 127.5 229312000 1798857.3
22:50:36.734 INFO ProgressMeter - NKLS02000261.1:20528 127.6 229631000 1799007.4
22:50:46.747 INFO ProgressMeter - NKLS02002178.1:12357 127.8 229952000 1799169.9
22:50:56.768 INFO ProgressMeter - NKLS02002178.1:25083 128.0 230202000 1798775.4
22:51:06.777 INFO ProgressMeter - NKLS02001083.1:23995 128.1 230521000 1798923.2
22:51:16.781 INFO ProgressMeter - NKLS02000731.1:10387 128.3 230783000 1798627.5
22:51:26.808 INFO ProgressMeter - NKLS02000731.1:15552 128.5 231113000 1798856.5
22:51:36.833 INFO ProgressMeter - NKLS02000731.1:22076 128.6 231429000 1798976.7
22:51:46.865 INFO ProgressMeter - NKLS02000731.1:35983 128.8 231671000 1798520.1
22:51:56.873 INFO ProgressMeter - NKLS02001828.1:19323 129.0 231997000 1798721.7
22:52:06.899 INFO ProgressMeter - NKLS02000470.1:30046 129.1 232331000 1798980.6
22:52:16.910 INFO ProgressMeter - NKLS02000255.1:31608 129.3 232609000 1798809.5
22:52:26.921 INFO ProgressMeter - NKLS02000203.1:15850 129.5 232919000 1798885.5
22:52:36.938 INFO ProgressMeter - NKLS02001257.1:5004 129.6 233250000 1799122.1
22:52:46.957 INFO ProgressMeter - NKLS02001042.1:31393 129.8 233530000 1798964.7
22:52:56.986 INFO ProgressMeter - NKLS02000134.1:13968 130.0 233854000 1799144.0
22:53:07.008 INFO ProgressMeter - NKLS02001180.1:15927 130.1 234181000 1799347.5
22:53:17.021 INFO ProgressMeter - NKLS02000368.1:17027 130.3 234510000 1799567.9
22:53:27.049 INFO ProgressMeter - NKLS02000402.1:8181 130.5 234844000 1799822.6
22:53:37.056 INFO ProgressMeter - NKLS02001029.1:18438 130.6 235174000 1800050.8
22:53:47.083 INFO ProgressMeter - NKLS02001231.1:22035 130.8 235463000 1799960.5
22:53:57.104 INFO ProgressMeter - NKLS02001523.1:18900 131.0 235785000 1800123.9
22:54:07.114 INFO ProgressMeter - NKLS02000075.1:10836 131.1 236105000 1800273.7
22:54:17.142 INFO ProgressMeter - NKLS02000075.1:25339 131.3 236352000 1799863.4
22:54:27.156 INFO ProgressMeter - NKLS02000798.1:8737 131.5 236681000 1800081.2
22:54:37.165 INFO ProgressMeter - NKLS02001040.1:9184 131.7 236995000 1800185.1
22:54:47.177 INFO ProgressMeter - NKLS02001245.1:3942 131.8 237278000 1800053.2
22:54:57.178 INFO ProgressMeter - NKLS02001811.1:20320 132.0 237593000 1800166.5
22:55:07.191 INFO ProgressMeter - NKLS02001807.1:15348 132.2 237904000 1800246.6
22:55:17.221 INFO ProgressMeter - NKLS02002189.1:9259 132.3 238194000 1800163.9
22:55:27.245 INFO ProgressMeter - NKLS02001293.1:8790 132.5 238509000 1800271.5
22:55:37.257 INFO ProgressMeter - NKLS02000275.1:14151 132.7 238841000 1800509.7
22:55:47.286 INFO ProgressMeter - NKLS02000275.1:22099 132.8 239110000 1800269.1
22:55:57.287 INFO ProgressMeter - NKLS02000275.1:22136 133.0 239412000 1800283.6
22:56:07.292 INFO ProgressMeter - NKLS02002056.1:25059 133.2 239742000 1800507.4
22:56:17.309 INFO ProgressMeter - NKLS02001176.1:13224 133.3 240060000 1800637.9
22:56:27.311 INFO ProgressMeter - NKLS02000331.1:2882 133.5 240384000 1800816.5
22:56:37.312 INFO ProgressMeter - NKLS02001725.1:7194 133.7 240708000 1800994.8
22:56:47.334 INFO ProgressMeter - NKLS02002125.1:10320 133.8 241035000 1801190.4
22:56:57.356 INFO ProgressMeter - NKLS02000632.1:5292 134.0 241369000 1801437.8
22:57:07.379 INFO ProgressMeter - NKLS02000727.1:20651 134.2 241692000 1801602.3
22:57:17.395 INFO ProgressMeter - NKLS02000450.1:2484 134.3 241982000 1801522.3
22:57:27.411 INFO ProgressMeter - NKLS02001535.1:15976 134.5 242284000 1801531.7
22:57:37.415 INFO ProgressMeter - NKLS02000048.1:13717 134.7 242610000 1801722.0
22:57:47.418 INFO ProgressMeter - NKLS02001954.1:8837 134.8 242894000 1801600.5
22:57:57.426 INFO ProgressMeter - NKLS02001121.1:8008 135.0 243202000 1801656.0
22:58:07.446 INFO ProgressMeter - NKLS02000327.1:7430 135.2 243534000 1801886.3
22:58:17.475 INFO ProgressMeter - NKLS02001925.1:7030 135.3 243812000 1801714.9
22:58:27.494 INFO ProgressMeter - NKLS02000520.1:7211 135.5 244121000 1801775.0
22:58:37.502 INFO ProgressMeter - NKLS02000468.1:10266 135.7 244442000 1801925.9
22:58:48.941 INFO ProgressMeter - NKLS02000468.1:14963 135.8 244721000 1801450.8
22:58:58.943 INFO ProgressMeter - NKLS02000587.1:3254 136.0 245026000 1801485.4
22:59:08.959 INFO ProgressMeter - NKLS02000587.1:12684 136.2 245346000 1801626.9
22:59:18.963 INFO ProgressMeter - NKLS02000599.1:12408 136.3 245667000 1801778.0
22:59:28.971 INFO ProgressMeter - NKLS02002070.1:10346 136.5 245983000 1801891.3
22:59:38.980 INFO ProgressMeter - NKLS02002142.1:9005 136.7 246306000 1802055.3
22:59:48.983 INFO ProgressMeter - NKLS02001633.1:4158 136.8 246592000 1801949.8
22:59:58.993 INFO ProgressMeter - NKLS02000858.1:1958 137.0 246917000 1802128.0
23:00:08.996 INFO ProgressMeter - NKLS02001556.1:3012 137.2 247236000 1802263.0
23:00:19.013 INFO ProgressMeter - NKLS02000784.1:10866 137.3 247484000 1801877.9
23:00:29.047 INFO ProgressMeter - NKLS02000047.1:6048 137.5 247814000 1802086.4
23:00:39.077 INFO ProgressMeter - NKLS02001379.1:365 137.7 248136000 1802237.1
23:00:49.077 INFO ProgressMeter - NKLS02001119.1:7252 137.8 248403000 1801995.0
23:00:59.085 INFO ProgressMeter - NKLS02000395.1:2351 138.0 248730000 1802186.5
23:01:09.091 INFO ProgressMeter - NKLS02001549.1:7797 138.2 249064000 1802428.6
23:01:19.093 INFO ProgressMeter - NKLS02001549.1:8728 138.3 249383000 1802562.6
23:01:29.093 INFO ProgressMeter - NKLS02001249.1:4972 138.5 249702000 1802696.7
23:01:39.103 INFO ProgressMeter - NKLS02001671.1:4679 138.7 250021000 1802828.3
23:01:49.105 INFO ProgressMeter - NKLS02001453.1:1923 138.8 250290000 1802601.2
23:01:59.121 INFO ProgressMeter - unmapped 139.0 250614000 1802767.2
23:02:09.153 INFO ProgressMeter - unmapped 139.2 250946000 1802986.9
23:02:19.179 INFO ProgressMeter - unmapped 139.4 251233000 1802884.5
23:02:29.200 INFO ProgressMeter - unmapped 139.5 251575000 1803177.5
23:02:39.225 INFO ProgressMeter - unmapped 139.7 251914000 1803447.6
23:02:49.245 INFO ProgressMeter - unmapped 139.9 252181000 1803203.2
23:02:49.837 WARN IntelInflater - Zero Bytes Written : 0
23:02:49.840 INFO ApplyBQSR - 0 read(s) filtered by: WellformedReadFilter
23:02:49.840 INFO ProgressMeter - unmapped 139.9 252199945 1803210.8
23:02:49.840 INFO ProgressMeter - Traversal complete. Processed 252199945 total reads in 139.9 minutes.
23:02:50.929 INFO ApplyBQSR - Shutting down engine
[22 February 2023 at 23:02:50 CET] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 139.90 minutes.
Runtime.totalMemory()=2579496960
-
Hi,
Could you please clarify whether your recalibrated output file (Abi1_recal.bam) is missing any reads that were present in the input file (Abi1_dedup.bam)? Are you saying that there are MT-aligned reads present in the input bam that are not showing up in the recalibrated output bam?
You might also try running with the additional options --use-jdk-inflater and --use-jdk-deflater and see if that makes a difference.
Regards,
David
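A minimal sketch of the two checks suggested above: rerunning ApplyBQSR with --use-jdk-inflater and --use-jdk-deflater, then comparing total read counts between the input and the recalibrated BAM. The paths are copied from the original command. The `run` helper and the use of `samtools view -c` for counting are assumptions added for illustration (samtools is not mentioned in the thread). By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: rerun ApplyBQSR with the JDK inflater/deflater, then compare
# total read counts. Paths are copied from the original post.
set -euo pipefail

REF=../reference/Bos_taurus.ARS-UCD1.2.dna.toplevel.fa
IN_BAM=../data/mapped/ABigar/Abi1_dedup.bam
RECAL_TABLE=../data/mapped/ABigar/AB_haplo/Abi1_recal.table1
OUT_BAM=../data/mapped/ABigar/AB_haplo/Abi1_recal.bam

# Hypothetical helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Same command as in the post, plus the two JDK (de)compression flags.
apply_bqsr_jdk() {
  run gatk ApplyBQSR \
    -R "$REF" \
    -I "$IN_BAM" \
    --bqsr-recal-file "$RECAL_TABLE" \
    --use-jdk-inflater --use-jdk-deflater \
    -O "$OUT_BAM"
}

# Assumption: samtools is installed; "view -c" counts all records in a BAM.
count_reads() {
  run samtools view -c "$1"
}

apply_bqsr_jdk
count_reads "$IN_BAM"
count_reads "$OUT_BAM"
```

If the two counts agree, no reads were dropped during recalibration, which would be consistent with the "0 read(s) filtered" line in the log.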
-
Dear David,
Thank you for your kind reply. When I check the recalibrated BAM file, MT is available.
When I ran HaplotypeCaller, it took more than 6 days for one sample to generate the g.vcf file. In the meantime, I have decided to call only the MT variants and continue with the nuclear variants later.
Could you please support me in this regard?
Thank you once again for your remarkable support.
-
Hi Wondessen Ayalew,
Did you get any support on this? I am having the same problem - it took a week to finish generating g.vcf for 1 sample. The command I used:
/gpfs/gpfs0/software/rhel7/eucleia/gatk-4.2.6.1/gatk --java-options "-Xmx260G" HaplotypeCaller -R /gpfs/gpfs0/scratch/Ecogenome/monkfish/reference_genome_monkfish/chr_level_assembly/LOPHIUS_GENOME_and_ANNOTATION/bf2_chromosomelevel.masked.fasta -I /gpfs/gpfs0/scratch/Ecogenome/reSultS/bam/Sample_15-LOP-095.dedup.fixed.bam -O /gpfs/gpfs0/scratch/Ecogenome/reSultS/hapcaller/monkfish/Sample_15-LOP-095.g.vcf > Sample_15-LOP-095.log
So I am surprised that even with 260 GB of memory it's so slow! I would be grateful if anyone has a suggestion that could help speed up the process.
thanks,
Atal
-
Hi Atal Saha / Wondessen Ayalew,
The most common way to speed up HaplotypeCaller is to parallelize by genomic interval using the -L option, either in a local cluster or on the cloud, and then combine the outputs using MergeVcfs or CombineGVCFs. The basic idea is to run HaplotypeCaller many times in parallel, each with a different -L interval, and then merge the outputs at the end. Users typically will parallelize at least by chromosome, and often more finely. We publish a cloud-based workflow in Terra that can do this parallelization for you here.
If you don't have access to a cluster, and don't want to run on the cloud, you can try running HaplotypeCallerSpark, which is able to parallelize HaplotypeCaller using multiple threads on a local machine.
One other thing you should check is whether you're running on an Intel/AMD CPU, or another architecture such as M1. GATK does not currently have good support for M1 chips, and tools like HaplotypeCaller will run very slowly on such machines.
Regards,
David
-
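The scatter-and-merge idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not an official workflow: the file names, the output directory, and the contig list are hypothetical placeholders (the list must match the names in your reference dictionary), and -ERC GVCF is included on the assumption that per-sample GVCFs are wanted for joint genotyping. The script only prints one HaplotypeCaller command per contig plus the final MergeVcfs command, so each line can be submitted as its own cluster job.

```shell
#!/usr/bin/env bash
# Sketch only: build one HaplotypeCaller command per contig (-L), then one
# MergeVcfs command that stitches the per-contig outputs back together.
set -euo pipefail

REF=reference.fasta        # hypothetical path
BAM=sample.dedup.bam       # hypothetical path
OUT_DIR=hc_scatter         # hypothetical output directory
SAMPLE=sample              # hypothetical sample name
CONTIGS="1 2 X MT"         # assumption: replace with your own contig names

# Print one HaplotypeCaller command per contig.
scatter_commands() {
  local contig
  for contig in $CONTIGS; do
    printf 'gatk HaplotypeCaller -R %s -I %s -L %s -ERC GVCF -O %s/%s.%s.g.vcf.gz\n' \
      "$REF" "$BAM" "$contig" "$OUT_DIR" "$SAMPLE" "$contig"
  done
}

# Print the MergeVcfs command that combines the per-contig files.
merge_command() {
  local contig inputs=""
  for contig in $CONTIGS; do
    inputs="$inputs -I $OUT_DIR/$SAMPLE.$contig.g.vcf.gz"
  done
  printf 'gatk MergeVcfs%s -O %s/%s.g.vcf.gz\n' "$inputs" "$OUT_DIR" "$SAMPLE"
}

scatter_commands
merge_command
```

Setting CONTIGS="MT" would restrict calling to the mitochondrial contig only, which is one way to get the MT variants first and return to the nuclear contigs later.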
Hi David,
Thanks very much for your reply on this.
Running the Spark version was much faster, but I am worried about the warning attached to this version (it says that we should not use Spark if we care about the results!). Are the results from Spark reliable? Also, Spark is not generating .idx files. Does Spark not generate an .idx file by default, as the normal HaplotypeCaller does?
thanks for your help with this,
Atal
-
Hi Atal Saha,
HaplotypeCallerSpark is a thin wrapper that just calls directly into the regular HaplotypeCaller code, so the results should be extremely close. However, because the Spark version "shards" the input data across multiple threads, there may be calling artifacts near the shard boundaries -- though a lot of work has been done to minimize this possibility. It's also possible that certain arguments that work for regular HaplotypeCaller may not work with HaplotypeCallerSpark. For these reasons, we hesitate to endorse the Spark version for clinical / production use, but for more casual purposes it should be perfectly fine to use. You may need to manually index the output VCF (e.g., using GATK's IndexFeatureFile) after running the Spark version of the tool.
What we actually do in production here at the Broad is to use an interval list with carefully-chosen split points at areas of the reference that are filled with N's (such as at the centromeres), and then launch many HaplotypeCaller tasks at once to call variants for these intervals. This approach eliminates the possibility of calling artifacts near the interval boundaries.
Regards,
David
-
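A rough sketch of the two steps mentioned above: indexing a VCF written by HaplotypeCallerSpark, and building an interval list whose split points fall in runs of N's. This is not the Broad production setup, only one way to approximate it with tools shipped in GATK4 (ScatterIntervalsByNs and SplitIntervals). File names, the scatter count, and the `run` helper are hypothetical, and the IndexFeatureFile input flag is written as -I on the assumption of a recent GATK 4 release. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: index a Spark-produced VCF and derive N-bounded intervals.
set -euo pipefail

REF=reference.fasta            # hypothetical; needs its .fai and .dict
SPARK_VCF=sample.spark.g.vcf   # hypothetical HaplotypeCallerSpark output
SCATTER_COUNT=20               # assumption: number of interval files wanted

# Hypothetical helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Write an index next to the given VCF.
index_vcf() {
  run gatk IndexFeatureFile -I "$1"
}

# List the non-N (ACGT) stretches of the reference, then group them into
# SCATTER_COUNT files without cutting inside any stretch, so every
# boundary between files sits in a run of N's or at a contig end.
n_split_intervals() {
  run gatk ScatterIntervalsByNs -R "$REF" -OT ACGT -O acgt.interval_list
  run gatk SplitIntervals -R "$REF" -L acgt.interval_list \
    --scatter-count "$SCATTER_COUNT" \
    --subdivision-mode BALANCING_WITHOUT_INTERVAL_SUBDIVISION \
    -O intervals
}

index_vcf "$SPARK_VCF"
n_split_intervals
```

Each interval file written to the intervals directory can then be passed to a separate HaplotypeCaller run with -L.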
Thanks very much, David.
I did manage to produce all my VCF files. Using 20-30 GB of memory seemed to be a good solution.
However, I accidentally did not include the -ERC GVCF option while running HaplotypeCaller, and I am now struggling to combine the VCFs and run joint genotyping. Running HaplotypeCaller one more time for all my samples will again take 2 weeks, so is there a way out here?
Thanks again,
Atal
-
Hi Atal Saha,
Unfortunately, without the reference confidence scores produced by -ERC GVCF you will be unable to run joint genotyping using GATK. I'm afraid your only option is to re-call your samples.
Sorry!
David
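For reference, the re-calling and joint-genotyping sequence might look roughly like the sketch below. Sample names, file names, and the `run` helper are hypothetical, and the 24 GB heap is only an assumption based on the 20-30 GB range reported earlier in the thread. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: re-call each sample with -ERC GVCF, then combine the GVCFs
# and run joint genotyping.
set -euo pipefail

REF=reference.fasta           # hypothetical path
SAMPLES="sampleA sampleB"     # hypothetical sample names
JAVA_MEM=24G                  # assumption: within the 20-30 GB range above

# Hypothetical helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Per-sample calling in reference-confidence (GVCF) mode.
recall_gvcf() {
  run gatk --java-options "-Xmx${JAVA_MEM}" HaplotypeCaller \
    -R "$REF" -I "$1.dedup.bam" -ERC GVCF -O "$1.g.vcf.gz"
}

# Combine the per-sample GVCFs and genotype them jointly.
joint_genotype() {
  local s
  local -a inputs=()
  for s in $SAMPLES; do
    inputs+=(-V "$s.g.vcf.gz")
  done
  run gatk CombineGVCFs -R "$REF" "${inputs[@]}" -O cohort.g.vcf.gz
  run gatk GenotypeGVCFs -R "$REF" -V cohort.g.vcf.gz -O cohort.vcf.gz
}

for s in $SAMPLES; do
  recall_gvcf "$s"
done
joint_genotype
```

The per-sample step is the slow one, so it is the natural place to apply the interval-based parallelization discussed earlier.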
-
Thanks again, David.
Re-calling started.
cheers
Atal