Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

HaplotypeCaller produces an unfinished VCF but no idx (no error)

13 comments

  • Genevieve Brandt (she/her)

    Hi Lin Cheng,

    Is the process still running but very slowly, or is it hung (it has stopped)?

    Genevieve

  • Lin Cheng

    Hi Genevieve Brandt,

    Many thanks for your kind help. The process stopped without any error; the last few lines of the log are shown below. The total length of chr04 is 69236331 (##contig=<ID=chr04,length=69236331>). The process exited without error before covering all positions.

    16:30:49.046 INFO  ProgressMeter -         chr04:996899             20.9                  5730            274.2
    Using GATK jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8G -jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar HaplotypeCaller -R /vol3/agis/huangsanwen_group/chenglin/work/1_reference/DM_v6.1_all_chr.fa --ERC GVCF -I 01_bam/C382.AR.bam -O 02_gvcf/C382.chr04.g.vcf -L chr04
  • Genevieve Brandt (she/her)

    Hi Lin Cheng,

    Are you determining where it stopped because of the progress meter? The progress meter is not accurate for where the code is exactly running, but yes, I see there can be an issue because your chromosome is much longer.

    Could you check your output VCF with ValidateVariants?

    Also, could you run this tool with --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' to make sure you are getting all the possible information in the stack trace? Here is more of a description. And here are details of how to include multiple java options on the command line if you are not familiar.

    Thank you,

    Genevieve

  • Lin Cheng

    Hi Genevieve Brandt,

    Many thanks for your help. I checked both the running GATK process and the progress meter log file several times, and it has indeed stopped in both respects.

    Following your suggestion, I ran ValidateVariants on one of my unfinished gVCFs and one of my finished gVCFs for comparison.

    commands

    ##  unfinished sample A157 chr08
    gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' ValidateVariants -R $ref -V 02_gvcf/A157.chr08.g.vcf -gvcf -L chr08

    ## finished sample A157 chr07
    gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' ValidateVariants -R $ref -V 02_gvcf/A157.chr07.g.vcf -gvcf -L chr07

    ###log file for unfinished sample A157 chr08

    Using GATK jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar ValidateVariants -R /vol3/agis/huangsanwen_group/chenglin/work/1_reference/DM_v6.1_all_chr.fa -V 02_gvcf/A157.chr08.g.vcf -gvcf -L chr08
    10:01:59.578 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Apr 01, 2021 10:01:59 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    10:01:59.798 INFO  ValidateVariants - ------------------------------------------------------------
    10:01:59.799 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.9.0
    10:01:59.799 INFO  ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    10:01:59.799 INFO  ValidateVariants - Executing as chenglin@login1 on Linux v2.6.32-431.el6.x86_64 amd64
    10:01:59.799 INFO  ValidateVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_251-b08
    10:01:59.799 INFO  ValidateVariants - Start Date/Time: April 1, 2021 10:01:59 AM CST
    10:01:59.799 INFO  ValidateVariants - ------------------------------------------------------------
    10:01:59.799 INFO  ValidateVariants - ------------------------------------------------------------
    10:01:59.800 INFO  ValidateVariants - HTSJDK Version: 2.23.0
    10:01:59.800 INFO  ValidateVariants - Picard Version: 2.23.3
    10:01:59.800 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    10:01:59.800 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    10:01:59.800 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    10:01:59.801 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    10:01:59.801 INFO  ValidateVariants - Deflater: IntelDeflater
    10:01:59.801 INFO  ValidateVariants - Inflater: IntelInflater
    10:01:59.801 INFO  ValidateVariants - GCS max retries/reopens: 20
    10:01:59.801 INFO  ValidateVariants - Requester pays: disabled
    10:01:59.801 INFO  ValidateVariants - Initializing engine
    10:02:00.287 INFO  FeatureManager - Using codec VCFCodec to read file file:///vol1/agis/huangsanwen_group/chenglin/pan-genome_vol1/02_snp/02_gvcf/A157.chr08.g.vcf
    10:02:00.325 INFO  IntervalArgumentCollection - Processing 59226000 bp from intervals
    10:02:00.330 INFO  ValidateVariants - Shutting down engine
    [April 1, 2021 10:02:00 AM CST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=1067450368
    ***********************************************************************

    A USER ERROR has occurred: Input 02_gvcf/A157.chr08.g.vcf must support random access to enable traversal by intervals. If it's a file, please index it using the bundled tool IndexFeatureFile

    ***********************************************************************
    org.broadinstitute.hellbender.exceptions.UserException: Input 02_gvcf/A157.chr08.g.vcf must support random access to enable traversal by intervals. If it's a file, please index it using the bundled tool IndexFeatureFile
        at org.broadinstitute.hellbender.engine.FeatureDataSource.setIntervalsForTraversal(FeatureDataSource.java:454)
        at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:47)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)
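
The USER ERROR above is about the missing index, not about the records themselves: with -L, ValidateVariants needs random access to the file. For a plain-text VCF, GATK normally writes a sibling .idx file next to the output, so a gVCF without one is a quick hint that the HaplotypeCaller run may not have finished cleanly. A minimal sketch for spotting such files, assuming the FILE.idx / FILE.tbi naming convention and a directory of *.g.vcf outputs; the function names are hypothetical and not part of GATK:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GATK tool): succeed if a VCF has a non-empty
# sibling index. Assumes FILE.idx for plain-text VCF and FILE.tbi for
# bgzipped VCF.
vcf_has_index() {
    local vcf="$1"
    [ -s "${vcf}.idx" ] || [ -s "${vcf}.tbi" ]
}

# Print every *.g.vcf in a directory that has no index next to it.
list_unindexed() {
    local dir="$1" vcf
    for vcf in "$dir"/*.g.vcf; do
        [ -e "$vcf" ] || continue          # the glob matched nothing
        vcf_has_index "$vcf" || printf '%s\n' "$vcf"
    done
}

# Example (directory name taken from the thread):
#   list_unindexed 02_gvcf
```

Indexing a truncated file with IndexFeatureFile, as the error message suggests, would let ValidateVariants traverse it, but it would not restore the records that were never written.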

     

     

     

    ###log file for finished sample A157 chr07

    Using GATK jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar ValidateVariants -R /vol3/agis/huangsanwen_group/chenglin/work/1_reference/DM_v6.1_all_chr.fa -V 02_gvcf/A157.chr07.g.vcf -gvcf -L chr07
    10:01:18.592 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Apr 01, 2021 10:01:19 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    10:01:19.205 INFO  ValidateVariants - ------------------------------------------------------------
    10:01:19.205 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.9.0
    10:01:19.206 INFO  ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    10:01:19.206 INFO  ValidateVariants - Executing as chenglin@login1 on Linux v2.6.32-431.el6.x86_64 amd64
    10:01:19.206 INFO  ValidateVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_251-b08
    10:01:19.206 INFO  ValidateVariants - Start Date/Time: April 1, 2021 10:01:18 AM CST
    10:01:19.206 INFO  ValidateVariants - ------------------------------------------------------------
    10:01:19.206 INFO  ValidateVariants - ------------------------------------------------------------
    10:01:19.207 INFO  ValidateVariants - HTSJDK Version: 2.23.0
    10:01:19.207 INFO  ValidateVariants - Picard Version: 2.23.3
    10:01:19.207 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    10:01:19.207 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    10:01:19.207 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    10:01:19.207 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    10:01:19.207 INFO  ValidateVariants - Deflater: IntelDeflater
    10:01:19.208 INFO  ValidateVariants - Inflater: IntelInflater
    10:01:19.208 INFO  ValidateVariants - GCS max retries/reopens: 20
    10:01:19.208 INFO  ValidateVariants - Requester pays: disabled
    10:01:19.208 INFO  ValidateVariants - Initializing engine
    10:01:19.707 INFO  FeatureManager - Using codec VCFCodec to read file file:///vol1/agis/huangsanwen_group/chenglin/pan-genome_vol1/02_snp/02_gvcf/A157.chr07.g.vcf
    10:01:19.840 INFO  IntervalArgumentCollection - Processing 57639317 bp from intervals
    10:01:19.845 INFO  ValidateVariants - Done initializing engine
    10:01:19.846 WARN  ValidateVariants - GVCF format is currently incompatible with allele validation. Not validating Alleles.
    10:01:19.846 WARN  ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
    10:01:19.846 WARN  ValidateVariants - Other possible validations will still be performed
    10:01:19.846 INFO  ProgressMeter - Starting traversal
    10:01:19.846 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
    10:01:29.848 INFO  ProgressMeter -       chr07:24618694              0.2               1815000       10890000.0
    10:01:39.849 INFO  ProgressMeter -       chr07:53081570              0.3               3846000       11536269.6
    10:01:41.364 INFO  ProgressMeter -       chr07:57637699              0.4               4162763       11607295.3
    10:01:41.364 INFO  ProgressMeter - Traversal complete. Processed 4162763 total variants in 0.4 minutes.
    10:01:41.366 INFO  ValidateVariants - Shutting down engine
    [April 1, 2021 10:01:41 AM CST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.38 minutes.
    Runtime.totalMemory()=2764046336

     

  • Genevieve Brandt (she/her)

    Hi Lin Cheng,

    Thank you for running the ValidateVariants command! It is very helpful to verify.

    Could you run your HaplotypeCaller command with the -DGATK_STACKTRACE_ON_USER_EXCEPTION=true java option and post the stack trace here?

    Thank you,

    Genevieve

  • Lin Cheng

    Hi Genevieve Brandt,

    Good news: yesterday I happened to try the option you mentioned (-DGATK_STACKTRACE_ON_USER_EXCEPTION=true) on one chromosome of one sample, and the program ran perfectly.

    gatk --java-options "-Xmx8G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/A157.chr08.g.vcf -L chr08

    I thought the option you mentioned might have been the fix, but when I went back to running batches, my original problem (unfinished VCF, no error) came back. I will keep trying to find the cause. (Could running many GATK jobs at the same time be the reason?)

     

    The log file you mentioned is below.

     

    09:56:22.492 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Apr 02, 2021 9:56:43 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    09:56:43.339 INFO HaplotypeCaller - ------------------------------------------------------------
    09:56:43.339 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.9.0
    09:56:43.339 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
    09:56:43.351 INFO HaplotypeCaller - Executing as chenglin@comput69 on Linux v2.6.32-504.el6.x86_64 amd64
    09:56:43.352 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_251-b08
    09:56:43.352 INFO HaplotypeCaller - Start Date/Time: April 2, 2021 9:56:22 AM CST
    09:56:43.352 INFO HaplotypeCaller - ------------------------------------------------------------
    09:56:43.352 INFO HaplotypeCaller - ------------------------------------------------------------
    09:56:43.353 INFO HaplotypeCaller - HTSJDK Version: 2.23.0
    09:56:43.353 INFO HaplotypeCaller - Picard Version: 2.23.3
    09:56:43.353 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    09:56:43.353 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    09:56:43.354 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    09:56:43.354 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    09:56:43.354 INFO HaplotypeCaller - Deflater: IntelDeflater
    09:56:43.354 INFO HaplotypeCaller - Inflater: IntelInflater
    09:56:43.354 INFO HaplotypeCaller - GCS max retries/reopens: 20
    09:56:43.355 INFO HaplotypeCaller - Requester pays: disabled
    09:56:43.355 INFO HaplotypeCaller - Initializing engine
    09:56:43.875 INFO IntervalArgumentCollection - Processing 59670755 bp from intervals
    09:56:43.896 INFO HaplotypeCaller - Done initializing engine
    09:56:43.900 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
    09:56:43.908 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
    09:56:43.908 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
    09:56:43.952 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
    09:56:43.953 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
    09:56:44.020 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    09:56:44.021 INFO IntelPairHmm - Available threads: 1
    09:56:44.021 INFO IntelPairHmm - Requested threads: 4
    09:56:44.021 WARN IntelPairHmm - Using 1 available threads, but 4 were requested

    09:56:44.021 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
    09:56:44.269 INFO ProgressMeter - Starting traversal
    09:56:44.270 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
    09:56:51.697 WARN InbreedingCoeff - InbreedingCoeff will not be calculated; at least 10 samples must have called genotypes
    09:56:54.294 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.294 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.296 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.298 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.299 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.311 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.313 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.327 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.338 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.350 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.351 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.352 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.357 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.360 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.363 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.377 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.379 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.386 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.388 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.398 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.408 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.410 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.412 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.413 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:56:54.434 INFO ProgressMeter - chr12:125000 0.2 470 2774.5
    09:57:00.597 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:57:00.602 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:57:00.606 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    09:57:00.609 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    ...........

    ...........

    ...........

    15:03:15.392 INFO ProgressMeter - chr12:10886886 306.5 64090 209.1
    15:03:27.165 INFO ProgressMeter - chr12:10894514 306.7 64140 209.1

    15:03:38.353 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    15:03:38.353 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    Using GATK jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar HaplotypeCaller -R /vol3/agis/huangsanwen_group/chenglin/work/1_reference/DM_v6.1_all_chr.fa --ERC GVCF -I 01_bam/C382.AR.bam -O 02_gvcf/C382.chr12.g.vcf -L chr12

  • Genevieve Brandt (she/her)

    Hi Lin Cheng,

    Great news! Glad it is working for now. There is a possibility it is a memory issue. Here are our recommendations to help if that is the cause:

    1. Check memory/disk space availability on your end.
    2. Specify java memory usage using java option -Xmx.
    3. Run the gatk command with the gatk wrapper script command line.
    4. Split your analysis into intervals with the option -L.
    5. Specify a --tmp-dir that has room for all necessary temporary files.
    6. Verify this issue persists with the latest version of GATK.
    7. Check the depth of coverage of your sample at the area of interest.

    If the issue persists, please let us know and we can continue to look into it.
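
Recommendation 4 above (splitting the analysis with -L) can be sketched as follows. This is only an illustration: make_intervals is a hypothetical helper, the chunk size is arbitrary, and the contig length is the one quoted earlier in the thread from the ##contig header. It prints 1-based CONTIG:START-END strings of the kind that -L accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print fixed-size, 1-based, inclusive intervals that
# cover one contig. usage: make_intervals CONTIG LENGTH CHUNK_SIZE
make_intervals() {
    local contig="$1" length="$2" chunk="$3"
    local start=1 end
    while [ "$start" -le "$length" ]; do
        end=$(( start + chunk - 1 ))
        if [ "$end" -gt "$length" ]; then
            end="$length"                  # last chunk is shorter
        fi
        printf '%s:%d-%d\n' "$contig" "$start" "$end"
        start=$(( end + 1 ))
    done
}

# Example with the chr04 length quoted in the thread:
#   make_intervals chr04 69236331 20000000
# prints chr04:1-20000000, chr04:20000001-40000000,
# chr04:40000001-60000000 and chr04:60000001-69236331, one per line.
```

Each interval would be passed to a separate HaplotypeCaller run via -L with its own -O file, and the per-interval gVCFs would then have to be merged in a later step. Cutting a chromosome at arbitrary positions may also affect calls close to the boundaries, so treat the chunking scheme as an assumption to validate.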

    Best,

    Genevieve

  • Lin Cheng

    Hi Genevieve Brandt,

    After trying different parameters, I am still stuck at the HaplotypeCaller step. Here are the commands I used:

    gatk --java-options "-Xmx8G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr05.g.vcf -L chr05 --tmp-dir $tem
    gatk --java-options "-Xmx50G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr09.g.vcf -L chr09 --tmp-dir $tem
    gatk --java-options "-Xmx10G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr11.g.vcf -L chr11 --tmp-dir $tem

    Please help me figure out this problem. Many thanks.

  • Genevieve Brandt (she/her)

    Hi Lin Cheng,

    How large is your input BAM? And what is the memory and disk space availability on your end?

    Genevieve

  • Genevieve Brandt (she/her)

    Lin Cheng, I wanted to follow up with a couple more points.

    If the HaplotypeCaller run is successful, it will show a line that it is complete. An example of a successful run is here, in this post:

    15:26:56.288 INFO ProgressMeter - Traversal complete. Processed 150454 total regions in 0.4 minutes.

    How long is the job running before it stops? Could you give more details about how you are running this, is it a shared cluster? Are there any other logs you could check that would give details about why the job is being stopped?

    Unfortunately, nothing in the HaplotypeCaller stack trace indicates that the tool itself is what ended the process.
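
When many jobs run in a batch, the completion line quoted above can be checked mechanically. A minimal sketch, assuming each job's output was captured to its own *.log file in one directory (the thread does not say where the logs live); both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the log contains the ProgressMeter
# completion line that a successful GATK traversal prints.
hc_log_complete() {
    grep -q 'ProgressMeter - Traversal complete' "$1"
}

# Print every *.log in a directory that never reached the completion line.
list_incomplete_logs() {
    local dir="$1" log
    for log in "$dir"/*.log; do
        [ -e "$log" ] || continue          # the glob matched nothing
        hc_log_complete "$log" || printf '%s\n' "$log"
    done
}

# Example (directory name is an assumption):
#   list_incomplete_logs logs
```

A log that passes this check only shows that the traversal finished; validating the output file itself is a separate step.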

  • Lin Cheng

    Hi Genevieve Brandt,

    BAM size: my BAM files range from 36G to 466G.

    Memory: I submit my jobs with SGE qsub, so I can request memory with "-Xmx" up to 252G.

    Disk space: about 1T is left on my disk.

    However, I should point out that the BAM file of one unfinished sample, C001, is only about 40G, while the BAM file of one finished sample, C134, is about 466G.

     

    How long is the job running before it stops? 

    It's about 8 hours for sample C001(bam size ~40G) Chr05 (length=55599697)
    It's about 2 hours for sample C001 Chr09 (length=67600300)
    It's about 1 hour for sample C001(bam size ~40G) Chr11 (length=46777387)

    Running command on SGE cluster

    qsub parameter: "-q queue6 -l mem=10G,nodes=1:ppn=1" gatk --java-options "-Xmx8G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr05.g.vcf -L chr05 --tmp-dir $tem
    qsub parameter: "-q queue6 -l mem=50G,nodes=1:ppn=1" gatk --java-options "-Xmx50G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr09.g.vcf -L chr09 --tmp-dir $tem
    qsub parameter: "-q queue6 -l mem=10G,nodes=1:ppn=2" gatk --java-options "-Xmx10G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr11.g.vcf -L chr11 --tmp-dir $tem

    I will try running GATK in my local environment instead of the shared cluster and check the related log file.
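
In two of the three qsub lines above, -Xmx equals the scheduler request (mem=50G with -Xmx50G, mem=10G with -Xmx10G). The JVM also uses memory outside the Java heap, and native libraries allocate outside it as well, so a common precaution is to set the heap somewhat below the request. The thread does not establish that this was a factor here. A minimal sketch with a hypothetical helper name and an arbitrary default headroom of 20%:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a -Xmx value from a scheduler memory request
# (whole gigabytes), leaving a percentage of it free for non-heap memory.
# usage: xmx_for_request MEM_GB [HEADROOM_PERCENT]   (default headroom: 20)
xmx_for_request() {
    local mem_gb="$1" headroom="${2:-20}"
    local heap=$(( mem_gb * (100 - headroom) / 100 ))
    if [ "$heap" -lt 1 ]; then
        heap=1                             # never go below 1G
    fi
    printf -- '-Xmx%dG\n' "$heap"
}

# Example: a 10G request gives -Xmx8G, matching the first qsub line above.
#   gatk --java-options "$(xmx_for_request 10)" HaplotypeCaller ...
```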

  • Lin Cheng

    Hi Genevieve Brandt,

    After running in my local environment, I found an error log:

    10:37:38.526 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    10:37:38.526 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007f8e077d454a, pid=6734, tid=0x00007fa83e9ef700
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_251-b08) (build 1.8.0_251-b08)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.251-b08 mixed mode linux-amd64 )
    # Problematic frame:
    # C  [libgkl_pairhmm_omp6215386383396637088.so+0x6954a]  double compute_full_prob_avxd<double>(testcase*)+0x34a
    #
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /vol1/agis/huangsanwen_group/chenglin/pan-genome_vol1/02_snp/test/hs_err_pid6734.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
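
The report above is a JVM fatal-error banner rather than a GATK exception: the problematic frame is in the native libgkl_pairhmm_omp library, which would explain why -DGATK_STACKTRACE_ON_USER_EXCEPTION=true produced no stack trace. On a cluster, this text may end up in the scheduler's captured output instead of the file being watched. A minimal sketch for scanning captured logs for the banner and pulling out the hs_err report path; the function names are hypothetical, and the pattern assumes the report path is absolute, as it is in the log above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a log contains the JVM fatal-error banner.
has_jvm_crash() {
    grep -q 'A fatal error has been detected by the Java Runtime Environment' "$1"
}

# Print the first absolute hs_err_pid<PID>.log path named in the log, if any.
hs_err_path() {
    grep -o '/[^[:space:]]*hs_err_pid[0-9]*\.log' "$1" | head -n 1
}

# Example (log file name is an assumption):
#   if has_jvm_crash job.log; then hs_err_path job.log; fi
```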

    I then ran the following commands in my local environment, and the run completed successfully.

    ulimit -c unlimited
    gatk --java-options "-Xmx100G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr11.g.vcf -L chr11 --tmp-dir $tem

    Many thanks for your kind help. Have a nice day!

    21:46:52.515 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    21:46:52.726 INFO  HaplotypeCaller - 296108 read(s) filtered by: MappingQualityReadFilter
    0 read(s) filtered by: MappingQualityAvailableReadFilter
    0 read(s) filtered by: MappedReadFilter
    0 read(s) filtered by: NotSecondaryAlignmentReadFilter
    0 read(s) filtered by: NotDuplicateReadFilter
    0 read(s) filtered by: PassesVendorQualityCheckReadFilter
    0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
    47 read(s) filtered by: GoodCigarReadFilter
    0 read(s) filtered by: WellformedReadFilter
    296155 total reads filtered
    21:46:52.726 INFO  ProgressMeter -       chr11:46775573            616.8                316158            512.6
    21:46:52.726 INFO  ProgressMeter - Traversal complete. Processed 316158 total regions in 616.8 minutes.
    21:46:53.120 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 4.253241578
    21:46:53.120 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 2854.067335999
    21:46:53.120 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 26878.87 sec
    21:46:53.121 INFO  HaplotypeCaller - Shutting down engine
    [April 6, 2021 9:46:53 PM CST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 616.82 minutes.
    Runtime.totalMemory()=38404620288
  • Genevieve Brandt (she/her)

    Hi Lin Cheng, glad you were able to solve the issue, thank you for sharing the solution!
