Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

HaplotypeCaller taking too long?

Answered

6 comments

  • Genevieve Brandt (she/her)

    Quentin Chartreux, is it running at the same speed and updating the whole time, or are these two examples running normally at first and then slowing down?

  • Quentin Chartreux

    Maybe I can show you the Regions/Minute values from the log file as graphs.

    For a "normal" interval I obtained this type of graph (the speed is more or less constant):

    [graph not preserved]

    For interval 003:

    [graph not preserved]

    and for 041:

    [graph not preserved]
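
The Regions/Minute values behind graphs like these can be pulled out of a HaplotypeCaller log with a short awk filter. The sketch below is only one way to do it, and it assumes the ProgressMeter line layout seen in GATK 4.x logs (timestamp, level, "ProgressMeter", "-", locus, elapsed minutes, regions processed, regions per minute). The function name extract_rates and the file names in the usage comment are hypothetical.

```shell
#!/bin/bash
# Print "elapsed_minutes<TAB>regions_per_minute" for each ProgressMeter data
# line of a GATK log. Reads the files given as arguments, or stdin if none.
extract_rates() {
    awk '
        # Data lines end in a number (the Regions/Minute column); the column
        # header and the "Starting traversal" line do not, so they are skipped.
        $3 == "ProgressMeter" && $NF ~ /^[0-9]+(\.[0-9]+)?$/ {
            printf "%s\t%s\n", $(NF-2), $NF
        }
    ' "$@"
}

# Usage (hypothetical file names):
#   extract_rates haplotypecaller_003.log > rates_003.tsv
```

The two-column output can be loaded into any plotting tool. A rate that falls steadily over the run, rather than fluctuating around a constant, is the pattern discussed for interval 003.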

  • Genevieve Brandt (she/her)

    Thanks for sharing these. 

    At least for interval 003, a common reason for such a slowdown is that the process runs out of memory and falls back on temporary storage. This causes a lot of file I/O while the tool runs, which is much slower than working in memory. So you probably want to increase the memory for 003.

    041 is a bit puzzling to me: it never seems to reach a speed comparable to the other intervals you shared. You may have a lot of read depth over this interval, or a lot of alternate alleles. If you don't want to change other parameters that could affect the results, you could first try increasing the memory here as well.
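
The thread does not say how the memory was increased or by how much. One common way is to give the JVM a larger heap through the gatk wrapper's --java-options argument. The sketch below assumes that approach. The helper run_hc, the DRY_RUN switch, the 8g heap size, and all file names are illustrative assumptions, not values from this thread.

```shell
#!/bin/bash
# Run HaplotypeCaller with an explicit Java heap size.
#   $1        heap size passed to -Xmx (for example 8g; an assumed value)
#   remaining arguments are handed to HaplotypeCaller unchanged
# If DRY_RUN is set to a non-empty value, the command is printed, not run.
run_hc() {
    heap="$1"; shift
    if [ -n "${DRY_RUN:-}" ]; then
        echo "gatk --java-options -Xmx${heap} HaplotypeCaller $*"
    else
        gatk --java-options "-Xmx${heap}" HaplotypeCaller "$@"
    fi
}

# Usage (placeholder file names):
#   run_hc 8g -R ref.fa -I sample.bam -L 003.interval_list -O 003.g.vcf.gz -ERC GVCF
```

On a cluster, the scheduler's memory request for the job generally needs to be somewhat larger than the heap, since the JVM and native libraries use memory beyond -Xmx.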

  • Quentin Chartreux

    Thanks a lot.

    After increasing the memory and rerunning all 50 intervals, they all finished in 2h28.
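
How the 50 per-interval runs were launched is not shown in the thread. As a rough sketch only, one HaplotypeCaller command per interval file could be generated as below and then submitted to a scheduler as separate jobs. The function name print_scatter_commands, the directory layout, and the .interval_list suffix are assumptions made for illustration. The -L, -R, -I, -O and -ERC arguments are the standard HaplotypeCaller ones.

```shell
#!/bin/bash
# Print one HaplotypeCaller command per interval file in a directory.
#   $1 directory holding *.interval_list files (assumed layout)
#   $2 reference FASTA   $3 input BAM   $4 output directory
print_scatter_commands() {
    interval_dir="$1"; reference="$2"; bam="$3"; out_dir="$4"
    for intervals in "$interval_dir"/*.interval_list; do
        [ -e "$intervals" ] || continue   # skip when nothing matches the glob
        name="$(basename "$intervals" .interval_list)"
        echo "gatk HaplotypeCaller -R $reference -I $bam -L $intervals -O $out_dir/$name.g.vcf.gz -ERC GVCF"
    done
}

# Usage (placeholder paths):
#   print_scatter_commands intervals ref.fa sample.bam gvcfs > hc_jobs.txt
```

Printing the commands rather than running them keeps the sketch independent of any particular scheduler. The per-interval GVCFs would still need to be combined afterwards.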

     

  • Genevieve Brandt (she/her)

    Glad to hear! Thanks for the update.

  • Wondessen Ayalew

    Dear GATK team,

    This is my first time running GATK. My GATK 4.3 run has been going for a long time (> 17 hrs) and still has not finished.

    1. Is there any way to speed up the process? I have 150 WGS samples.

    2. The 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio', and 'QualByDepth' annotations have been disabled. Is this normal for downstream analysis?

    3. The program also gives me warning messages in the log file shown below the command.

    #!/bin/bash
    #$ -N gatk
    #$ -M wonde.ayalew@slu.se
    #$ -m seab # which job notifications to receive by mail (s = suspended, e = end, a = aborted, b = begin)
    #$ -cwd #Use the directory you're running from
    #$ -l h_rt=48:0:0,h_vmem=4G
    #Setting running time in hours:min:sec and the memory required for the job per CPU (12*4=48G of RAM)
    #$ -j y #Joining the output from standard out and standard error to one file
    #$ -pe smp 12 #Setting the number of threads for the job to best fit the system, between 1 and 48.
    #$ -e haplo-errAfar0.log #stderr output stored in the log file
    #$ -o haplo-outAfar0.log #stdout output stored in the log file

    module load conda
    source ../../../opt/sw/conda/3/etc/profile.d/conda.sh
    module load gatk/4.3

    ### HaplotypeCaller  ####

    gatk HaplotypeCaller -R ../wonde/Bos_taurus.ARS-UCD1.2.dna.toplevel.fa -I ../data/Afar_1_dedup.bam -O ../wonde/Afar1.g.vcf.gz -ERC GVCF

    GATK log file:

    Using GATK jar /export/opt/sw/gatk/4.3/gatk4_env/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /export/opt/sw/gatk/4.3/gatk4_env/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar HaplotypeCaller -R ../Refg/ARS-UCD1.2_Btau5.0.1Y.fa -I ../data/Afar_rD/trim/Afar_11_dedup.bam -O Afar11.g.vcf.gz -ERC GVCF
    09:01:10.011 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/export/opt/sw/gatk/4.3/gatk4_env/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    09:01:10.182 INFO  HaplotypeCaller - ------------------------------------------------------------
    09:01:10.183 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.3.0.0
    09:01:10.183 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
    09:01:10.183 INFO  HaplotypeCaller - Executing as wondossen@compute4.c.hgen.slu.se on Linux v3.10.0-693.21.1.el7.x86_64 amd64
    09:01:10.183 INFO  HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v11.0.13+7-b1751.21
    09:01:10.183 INFO  HaplotypeCaller - Start Date/Time: 18 February 2023 at 09:01:09 GMT
    09:01:10.184 INFO  HaplotypeCaller - ------------------------------------------------------------
    09:01:10.184 INFO  HaplotypeCaller - ------------------------------------------------------------
    09:01:10.185 INFO  HaplotypeCaller - HTSJDK Version: 3.0.1
    09:01:10.185 INFO  HaplotypeCaller - Picard Version: 2.27.5
    09:01:10.185 INFO  HaplotypeCaller - Built for Spark Version: 2.4.5
    09:01:10.185 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    09:01:10.185 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    09:01:10.185 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    09:01:10.185 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    09:01:10.185 INFO  HaplotypeCaller - Deflater: IntelDeflater
    09:01:10.186 INFO  HaplotypeCaller - Inflater: IntelInflater
    09:01:10.186 INFO  HaplotypeCaller - GCS max retries/reopens: 20
    09:01:10.186 INFO  HaplotypeCaller - Requester pays: disabled
    09:01:10.186 INFO  HaplotypeCaller - Initializing engine
    09:01:10.584 INFO  HaplotypeCaller - Done initializing engine
    09:01:10.587 INFO  HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
    09:01:10.686 INFO  HaplotypeCallerEngine - Standard Emitting and Calling confidence set to -0.0 for reference-model confidence output
    09:01:10.686 INFO  HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
    09:01:10.712 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/export/opt/sw/gatk/4.3/gatk4_env/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
    09:01:10.715 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/export/opt/sw/gatk/4.3/gatk4_env/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
    09:01:10.756 INFO  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    09:01:10.757 INFO  IntelPairHmm - Available threads: 48
    09:01:10.757 INFO  IntelPairHmm - Requested threads: 4
    09:01:10.757 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
    09:01:10.852 INFO  ProgressMeter - Starting traversal
    09:01:10.852 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
    09:01:11.465 WARN  InbreedingCoeff - InbreedingCoeff will not be calculated at position 1:25960 and possibly subsequent; at least 10 samples must have called genotypes
    09:01:21.509 INFO  ProgressMeter -             1:168563              0.2                   810           4560.4
    09:01:31.520 INFO  ProgressMeter -             1:298226              0.3                  1730           5022.3
    09:01:41.755 INFO  ProgressMeter -             1:393775              0.5                  2460           4776.2
    09:01:52.569 INFO  ProgressMeter -             1:423496              0.7                  2710           3897.7
    09:02:02.817 INFO  ProgressMeter -             1:440892              0.9                  2860           3302.2
    09:02:14.172 INFO  ProgressMeter -             1:457041              1.1                  2990           2833.2
    09:02:24.254 INFO  ProgressMeter -             1:475053              1.2                  3140           2566.7
    09:02:34.377 INFO  ProgressMeter -             1:494323              1.4                  3240           2327.4
    09:02:46.505 INFO  ProgressMeter -             1:504580              1.6                  3310           2076.3
    09:02:57.508 INFO  ProgressMeter -             1:514393              1.8                  3380           1901.4
    09:03:07.753 INFO  ProgressMeter -             1:523101              1.9                  3440           1765.6
    09:03:18.212 INFO  ProgressMeter -             1:533070              2.1                  3510           1653.6
    09:03:28.221 INFO  ProgressMeter -             1:551630              2.3                  3660           1598.6
    09:03:39.305 INFO  ProgressMeter -             1:574427              2.5                  3860           1560.1
    09:03:50.586 INFO  ProgressMeter -             1:586123              2.7                  3960           1487.5
    09:04:02.003 INFO  ProgressMeter -             1:605887              2.9                  4130           1447.8
    09:04:12.463 INFO  ProgressMeter -             1:619129              3.0                  4240           1400.8
    09:04:23.784 INFO  ProgressMeter -             1:630362              3.2                  4330           1346.6
    09:04:33.793 INFO  ProgressMeter -             1:646653              3.4                  4470           1321.6
    09:04:43.964 INFO  ProgressMeter -             1:655160              3.6                  4530           1275.4
    09:04:54.235 INFO  ProgressMeter -             1:674183              3.7                  4680           1257.0
    09:05:04.578 INFO  ProgressMeter -             1:689703              3.9                  4810           1234.8
    09:05:14.858 INFO  ProgressMeter -             1:719915              4.1                  5070           1246.7
    09:05:24.965 INFO  ProgressMeter -             1:806192              4.2                  5750           1357.7
    09:05:35.403 INFO  ProgressMeter -             1:856692              4.4                  6200           1406.2
    09:05:45.427 INFO  ProgressMeter -             1:973319              4.6                  7130           1558.0
    09:05:55.136 WARN  DepthPerSampleHC - Annotation will not be calculated at position 1:1085154 and possibly subsequent; genotype for sample Afa1 is not called
    09:05:55.137 WARN  StrandBiasBySample - Annotation will not be calculated at position 1:1085154 and possibly subsequent; genotype for sample Afa1 is not called
    09:05:55.450 INFO  ProgressMeter -            1:1087402              4.7                  8020           1690.8
    09:06:05.494 INFO  ProgressMeter -            1:1153066              4.9                  8590           1749.2
    09:06:15.516 INFO  ProgressMeter -            1:1218256              5.1                  9130           1798.0
    09:06:25.564 INFO  ProgressMeter -            1:1319886              5.2                  9960           1898.9

