GATK HaplotypeCaller: only first chromosome in output vcf
REQUIRED for all errors and issues:
a) GATK version used:
The Genome Analysis Toolkit (GATK) v4.6.0.0
b) Exact command used:
gatk HaplotypeCaller \
-R barcode01_haploid.final.fa \
-I barcode01_haploid.final.sorted.RG.markdup.bam \
-O barcode01_haploid.final_raw_SNPs_indels.vcf
c) Entire program log:
Loading gatk/4.6.0.0
Loading requirement: java/jdk-17.0.12
Loading vcftools/0.1.16
Loading requirement: perl/5.32.1
Using GATK jar /u/local/apps/gatk/4.6.0.0/gatk-package-4.6.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /u/local/apps/gatk/4.6.0.0/gatk-package-4.6.0.0-local.jar HaplotypeCaller -R barcode01_haploid.final.fa -I barcode01_haploid.final.sorted.RG.markdup.bam -O barcode01_haploid.final_raw_SNPs_indels.vcf
16:45:49.450 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/u/local/apps/gatk/4.6.0.0/gatk-package-4.6.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:45:49.664 INFO HaplotypeCaller - ------------------------------------------------------------
16:45:49.667 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.6.0.0
16:45:49.667 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
16:45:49.667 INFO HaplotypeCaller - Executing as ldpeck@n1891 on Linux v3.10.0-1160.108.1.el7.x86_64 amd64
16:45:49.667 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v17.0.12+8-LTS-286
16:45:49.667 INFO HaplotypeCaller - Start Date/Time: October 16, 2024 at 4:45:49 PM PDT
16:45:49.668 INFO HaplotypeCaller - ------------------------------------------------------------
16:45:49.668 INFO HaplotypeCaller - ------------------------------------------------------------
16:45:49.668 INFO HaplotypeCaller - HTSJDK Version: 4.1.1
16:45:49.668 INFO HaplotypeCaller - Picard Version: 3.2.0
16:45:49.669 INFO HaplotypeCaller - Built for Spark Version: 3.5.0
16:45:49.669 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:45:49.669 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:45:49.669 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:45:49.669 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:45:49.669 INFO HaplotypeCaller - Deflater: IntelDeflater
16:45:49.669 INFO HaplotypeCaller - Inflater: IntelInflater
16:45:49.669 INFO HaplotypeCaller - GCS max retries/reopens: 20
16:45:49.670 INFO HaplotypeCaller - Requester pays: disabled
16:45:49.670 INFO HaplotypeCaller - Initializing engine
16:45:49.857 INFO HaplotypeCaller - Done initializing engine
16:45:49.875 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/u/local/apps/gatk/4.6.0.0/gatk-package-4.6.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
16:45:49.879 INFO NativeLibraryLoader - Loading libgkl_smithwaterman.so from jar:file:/u/local/apps/gatk/4.6.0.0/gatk-package-4.6.0.0-local.jar!/com/intel/gkl/native/libgkl_smithwaterman.so
16:45:49.880 INFO IntelSmithWaterman - Using CPU-supported AVX-512 instructions
16:45:49.880 INFO SmithWatermanAligner - Using AVX accelerated SmithWaterman implementation
16:45:49.883 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
16:45:49.893 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/u/local/apps/gatk/4.6.0.0/gatk-package-4.6.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
16:45:49.919 INFO IntelPairHmm - Using CPU-supported AVX-512 instructions
16:45:49.919 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
16:45:49.920 INFO IntelPairHmm - Available threads: 36
16:45:49.920 INFO IntelPairHmm - Requested threads: 4
16:45:49.920 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
16:45:49.952 INFO ProgressMeter - Starting traversal
16:45:49.953 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
16:45:51.033 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position NC_044904.1_RagTag:9446 and possibly subsequent; at least 10 samples must have called genotypes
16:46:00.036 INFO ProgressMeter - NC_044904.1_RagTag:128078 0.2 920 5475.1
16:46:10.059 INFO ProgressMeter - NC_044904.1_RagTag:198361 0.3 1470 4387.0
16:46:20.353 INFO ProgressMeter - NC_044904.1_RagTag:308493 0.5 2320 4578.9
16:46:31.546 INFO ProgressMeter - NC_044904.1_RagTag:368826 0.7 2760 3981.5
16:46:41.711 INFO ProgressMeter - NC_044904.1_RagTag:468319 0.9 3500 4057.3
16:46:51.812 INFO ProgressMeter - NC_044904.1_RagTag:502875 1.0 3740 3627.6
16:47:01.873 INFO ProgressMeter - NC_044904.1_RagTag:552721 1.2 4120 3437.2
16:47:12.006 INFO ProgressMeter - NC_044904.1_RagTag:630644 1.4 4730 3458.7
16:47:22.149 INFO ProgressMeter - NC_044904.1_RagTag:687433 1.5 5140 3345.0
16:47:32.157 INFO ProgressMeter - NC_044904.1_RagTag:944476 1.7 7040 4132.9
16:47:42.296 INFO ProgressMeter - NC_044904.1_RagTag:1038611 1.9 7760 4144.5
16:47:52.315 INFO ProgressMeter - NC_044904.1_RagTag:1139230 2.0 8520 4177.8
16:48:02.346 INFO ProgressMeter - NC_044904.1_RagTag:1205948 2.2 9060 4106.0
16:48:12.409 INFO ProgressMeter - NC_044904.1_RagTag:1313146 2.4 9850 4148.6
Job 5237262 ended on: n1891
Job 5237262 ended on: Wed Oct 16 16:48:22 PDT 2024
The problem:
As you can see from my program log, HaplotypeCaller only runs on the first chromosome, NC_044904.1_RagTag, so only this chromosome appears in my output VCF. All the chromosomes are present in the assembly and the BAM (see the samtools idxstats output below), and I have indexed the BAM, so I am confused about why this is happening. Any advice would be much appreciated.
Thanks
Lily
samtools idxstats barcode01_haploid.final.sorted.RG.markdup.bam
NC_044904.1_RagTag 58412555 808382 0
NC_044905.1_RagTag 105229727 1677681 0
NC_044906.1_RagTag 73640832 678363 0
NC_044907.1_RagTag 97209389 1011393 0
NC_044908.1_RagTag 94785287 1680499 0
NC_044909.1_RagTag 54564429 1362540 0
NC_044910.1_RagTag 53631442 772246 0
NC_044911.1_RagTag 72723115 2292288 0
NC_044912.1_RagTag 56553625 298004 0
NC_044913.1_RagTag 64678893 897035 0
NC_044914.1_RagTag 57518804 551207 0
NC_044915.1_RagTag 44050833 591073 0
NW_022154702.1_RagTag 44900 166 0
NW_022154703.1_RagTag 264031 606 0
NW_022154704.1_RagTag 1023123 2975 0
NW_022154705.1_RagTag 772174 69351 0
NW_022154706.1_RagTag 42835 135 0
NW_022154707.1_RagTag 127428 409 0
NW_022154708.1_RagTag 558544 1249 0
NW_022154710.1_RagTag 565419 2531 0
NW_022154711.1_RagTag 346351 43115 0
NW_022154715.1_RagTag 960887 5179 0
NW_022154716.1_RagTag 350981 1142 0
NW_022154720.1_RagTag 142013 515 0
NW_022154722.1_RagTag 102071 236 0
NW_022154723.1_RagTag 68812 413 0
NW_022154730.1_RagTag 546301 2796 0
NW_022154731.1_RagTag 294301 690 0
NW_022154735.1_RagTag 135074 1037 0
NW_022154736.1_RagTag 551969 76045 0
NW_022154739.1_RagTag 9880 37 0
NW_022154740.1_RagTag 297111 23890 0
NW_022154742.1_RagTag 17312 53 0
NW_022154749.1_RagTag 156190 1087 0
NW_022154750.1_RagTag 152083 10392 0
NW_022154751.1_RagTag 468528 1623 0
NW_022154761.1_RagTag 8322 18 0
NW_022154762.1_RagTag 105236 926 0
NW_022154766.1_RagTag 812150 3681 0
NW_022154767.1_RagTag 99979 6545 0
NW_022154770.1_RagTag 5985 38 0
NW_022154774.1_RagTag 49438 216 0
NW_022154776.1_RagTag 28184 1530 0
NW_022154778.1_RagTag 71337 249 0
NW_022154791.1_RagTag 112642 12574 0
NW_022154797.1_RagTag 75467 280 0
NW_022154812.1_RagTag 343861 977 0
NW_022154813.1_RagTag 187716 460 0
NW_022154824.1_RagTag 438253 3432 0
NW_022154825.1_RagTag 31570 296 0
NW_022154842.1_RagTag 70467 432 0
NW_022154843.1_RagTag 114935 535 0
NW_022154845.1_RagTag 20469 142 0
NW_022154849.1_RagTag 92154 723 0
NW_022154855.1_RagTag 478962 3019 0
NW_022154860.1_RagTag 303642 2678 0
NW_022154866.1_RagTag 281422 2699 0
NW_022154867.1_RagTag 50693 4703 0
NW_022154868.1_RagTag 51039 160 0
NW_022154879.1_RagTag 411590 6311 0
NW_022154882.1_RagTag 15134 204 0
NW_022154883.1_RagTag 182982 280 0
NW_022154893.1_RagTag 32836 56427 0
NW_022154911.1_RagTag 57875 79 0
NW_022154926.1_RagTag 91576 4944 0
NW_022154930.1_RagTag 81323 2130 0
NW_022154934.1_RagTag 6056 297 0
NW_022154938.1_RagTag 780604 18603 0
NW_022154945.1_RagTag 940079 1367 0
NW_022154946.1_RagTag 485880 3315 0
NW_022154951.1_RagTag 9567 6363 0
NW_022154962.1_RagTag 70889 1163 0
NW_022154994.1_RagTag 23451 95 0
NW_022155015.1_RagTag 200533 1373 0
NW_022155026.1_RagTag 127007 41588 0
NW_022155044.1_RagTag 8813 500 0
NW_022155047.1_RagTag 402759 4059 0
NW_022155058.1_RagTag 189674 898 0
NW_022155076.1_RagTag 34391 3428 0
NW_022155078.1_RagTag 1027398 2081 0
NW_022155086.1_RagTag 199892 3348 0
NW_022155097.1_RagTag 455899 1729 0
NW_022155105.1_RagTag 836844 2509 0
NW_022155156.1_RagTag 725494 12603 0
NW_022155180.1_RagTag 311941 2682 0
NW_022155181.1_RagTag 9998 216 0
NW_022155219.1_RagTag 156927 6491 0
NW_022155235.1_RagTag 1037462 18249 0
NW_022155260.1_RagTag 331826 7051 0
NW_022155294.1_RagTag 154811 753 0
NW_022155314.1_RagTag 300969 5946 0
NW_022155348.1_RagTag 324026 5620 0
NW_022155358.1_RagTag 258843 52061 0
NW_022155456.1_RagTag 24494 189 0
NW_022155468.1_RagTag 137621 1472 0
NW_022155473.1_RagTag 7305 210 0
NW_022155491.1_RagTag 76376 13506 0
NW_022155495.1_RagTag 41473 674 0
NW_022155550.1_RagTag 187353 2389 0
NW_022155616.1_RagTag 53185 1221 0
NW_022155638.1_RagTag 101424 1260 0
NW_022155662.1_RagTag 115589 2479 0
NW_022155744.1_RagTag 45229 2909 0
* 0 0 247
-
Update >>
I am running this in a while loop over each chromosome and unplaced contig. However, I would still like to run it once on every chromosome and contig in the assembly and BAM, producing a single VCF. Is this possible?
Thanks
-
Hi Lily Peck
It is possible that the compute environment you are working in limits how long each task may run, so your jobs are terminated by the system before they complete.
You can call each chromosome separately into its own file and later use the GatherVcfs tool to combine them into a single call file.
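A rough sketch of that per-contig approach is below. It assumes that `gatk` and `samtools` are on the PATH, that each contig name is passed to HaplotypeCaller as an interval with `-L`, and that the per-contig files are named `<contig>.vcf` (a hypothetical naming scheme, as is the function name `build_commands`). The function only prints the commands so they can be inspected before anything is run. GatherVcfs expects its inputs in the same order as the reference sequence dictionary, which is the order in which idxstats lists the contigs.

```shell
# Sketch: print one HaplotypeCaller command per contig, followed by one
# GatherVcfs command that combines the per-contig files in reference order.
# Reads `samtools idxstats` output on stdin; nothing is executed here.
build_commands() {
    ref="$1"   # reference FASTA
    bam="$2"   # indexed BAM
    out="$3"   # combined output VCF
    gather=""
    # idxstats columns: contig name, length, mapped reads, unmapped reads
    while read -r contig _rest || [ -n "$contig" ]; do
        # Skip blank lines and the "*" row (unplaced unmapped reads)
        [ -z "$contig" ] && continue
        [ "$contig" = "*" ] && continue
        echo "gatk HaplotypeCaller -R $ref -I $bam -L $contig -O $contig.vcf"
        gather="$gather -I $contig.vcf"
    done
    echo "gatk GatherVcfs$gather -O $out"
}
```

For example, `samtools idxstats barcode01_haploid.final.sorted.RG.markdup.bam | build_commands barcode01_haploid.final.fa barcode01_haploid.final.sorted.RG.markdup.bam barcode01_haploid.final_raw_SNPs_indels.vcf` would list the commands for this dataset; each printed line can then be submitted as its own job if the scheduler limits run time.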
I hope this helps.
Regards.
-
Hello
Thank you for getting back to me. The problem is not caused by the computing environment: the job finishes (it does not terminate unexpectedly) and only takes about 20 minutes.
Is there a way to run all chromosomes at once?
Thanks
Lily
-
Hi again
Is this part of the log that your command produces?
Job 5237262 ended on: n1891
Job 5237262 ended on: Wed Oct 16 16:48:22 PDT 2024
From the looks of it, your command did not even run for 3 minutes.
This may still be due to compute environment limitations that are not directly evident to you. We strongly recommend that you consult your IT specialists to help you extend the time available for running GATK. HaplotypeCaller itself has no limitation that would make it end prematurely.
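One way to tell a killed job from a finished one is to look at the end of the GATK log. A run that reaches the end of traversal normally logs a "Traversal complete" line from ProgressMeter, whereas the log posted above stops at an ordinary progress line. A small sketch of that check follows; `check_gatk_log` is a hypothetical helper, the log file is assumed to hold the captured output of the job, and the exact wording of the completion message is assumed from typical GATK4 logs and may differ between versions.

```shell
# Sketch: report whether a captured GATK log reached the end of traversal.
check_gatk_log() {
    log="$1"
    if grep -q "Traversal complete" "$log"; then
        echo "finished"
    else
        # Show the last locus reported before the log stopped
        last=$(grep "ProgressMeter" "$log" | tail -n 1)
        echo "incomplete; last progress line: $last"
    fi
}
```

If the result is "incomplete", the last progress line shows how far the traversal got before the job ended, which is useful information to pass on to the cluster administrators.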
I hope this helps.
Regards.