Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

HaplotypeCaller inconsistency with different interval lists

1

5 comments

  • Avatar
    Genevieve Brandt (she/her)

    Hi Gytis Mackevičius,

    Are there no variants in the VCF from the all.list HaplotypeCaller command, or no VCF at all? Could you share the complete program log from your HaplotypeCaller command with the all.list?

    Thank you,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Gytis Mackevičius

    Hi,
    Log with chr11.list:


    06:05:49.339 INFO HaplotypeCaller - ------------------------------------------------------------
    06:05:49.339 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.2.0.0-SNAPSHOT
    06:05:49.339 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
    06:05:49.340 INFO HaplotypeCaller - Executing as root on Linux v4.9.0-8-amd64 amd64
    06:05:49.340 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v11.0.11+9-post-Debian-1deb10u1
    06:05:49.340 INFO HaplotypeCaller - Start Date/Time: September 10, 2021 at 6:05:49 AM UTC
    06:05:49.340 INFO HaplotypeCaller - ------------------------------------------------------------
    06:05:49.340 INFO HaplotypeCaller - ------------------------------------------------------------
    06:05:49.340 INFO HaplotypeCaller - HTSJDK Version: 2.24.0
    06:05:49.340 INFO HaplotypeCaller - Picard Version: 2.25.0
    06:05:49.340 INFO HaplotypeCaller - Built for Spark Version: 2.4.5
    06:05:49.340 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    06:05:49.340 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    06:05:49.340 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    06:05:49.340 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    06:05:49.341 INFO HaplotypeCaller - Deflater: IntelDeflater
    06:05:49.341 INFO HaplotypeCaller - Inflater: IntelInflater
    06:05:49.341 INFO HaplotypeCaller - GCS max retries/reopens: 20
    06:05:49.341 INFO HaplotypeCaller - Requester pays: disabled
    06:05:49.341 INFO HaplotypeCaller - Initializing engine
    06:05:49.492 INFO IntervalArgumentCollection - Processing 11669 bp from intervals
    06:05:49.495 INFO HaplotypeCaller - Done initializing engine
    06:05:49.508 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
    06:05:49.516 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/opt/gatk-package-4.2.0.0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_utils.so
    06:05:49.517 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/opt/gatk-package-4.2.0.0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
    06:05:49.544 INFO IntelPairHmm - Using CPU-supported AVX-512 instructions
    06:05:49.544 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    06:05:49.544 INFO IntelPairHmm - Available threads: 48
    06:05:49.545 INFO IntelPairHmm - Requested threads: 4
    06:05:49.545 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
    06:05:49.570 INFO ProgressMeter - Starting traversal
    06:05:49.570 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
    06:05:51.486 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position 11:1016704 and possibly subsequent; at least 10 samples must have called genotypes
    06:06:04.649 INFO ProgressMeter - 11:1018020 0.3 10 39.8
    06:06:09.216 INFO HaplotypeCaller - 6171 read(s) filtered by: MappingQualityReadFilter
    0 read(s) filtered by: MappingQualityAvailableReadFilter
    0 read(s) filtered by: MappedReadFilter
    10 read(s) filtered by: NotSecondaryAlignmentReadFilter
    0 read(s) filtered by: NotDuplicateReadFilter
    0 read(s) filtered by: PassesVendorQualityCheckReadFilter
    0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
    0 read(s) filtered by: GoodCigarReadFilter
    0 read(s) filtered by: WellformedReadFilter
    6181 total reads filtered
    06:06:09.216 INFO ProgressMeter - 11:121028481 0.3 129 394.0
    06:06:09.216 INFO ProgressMeter - Traversal complete. Processed 129 total regions in 0.3 minutes.
    06:06:09.263 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.048460615000000005
    06:06:09.263 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 9.203253743000001
    06:06:09.263 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 2.89 sec
    06:06:09.263 INFO HaplotypeCaller - Shutting down engine

    I get variant:

    11 1016944 . G T 308.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.080;DP=1217;ExcessHet=3.0103;FS=16.489;MLEAC=1;MLEAF=0.500;MQ=48.21;MQRankSum=-4.107;QD=0.26;ReadPosRankSum=1.175;SOR=3.438 GT:AD:DP:GQ:PL 0/1:1120,88:1208:99:316,0,46731

     

    Log with all.list:

    06:13:46.821 INFO HaplotypeCaller - ------------------------------------------------------------

    06:13:46.821 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.2.0.0-SNAPSHOT

    06:13:46.821 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/

    06:13:46.821 INFO HaplotypeCaller - Executing as root on Linux v4.9.0-8-amd64 amd64

    06:13:46.821 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v11.0.11+9-post-Debian-1deb10u1

    06:13:46.821 INFO HaplotypeCaller - Start Date/Time: September 10, 2021 at 6:13:46 AM UTC

    06:13:46.822 INFO HaplotypeCaller - ------------------------------------------------------------

    06:13:46.822 INFO HaplotypeCaller - ------------------------------------------------------------

    06:13:46.822 INFO HaplotypeCaller - HTSJDK Version: 2.24.0

    06:13:46.822 INFO HaplotypeCaller - Picard Version: 2.25.0

    06:13:46.822 INFO HaplotypeCaller - Built for Spark Version: 2.4.5

    06:13:46.822 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2

    06:13:46.822 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

    06:13:46.822 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

    06:13:46.822 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

    06:13:46.822 INFO HaplotypeCaller - Deflater: IntelDeflater

    06:13:46.822 INFO HaplotypeCaller - Inflater: IntelInflater

    06:13:46.822 INFO HaplotypeCaller - GCS max retries/reopens: 20

    06:13:46.822 INFO HaplotypeCaller - Requester pays: disabled

    06:13:46.822 INFO HaplotypeCaller - Initializing engine

    06:13:46.986 INFO IntervalArgumentCollection - Processing 202370 bp from intervals

    06:13:46.994 INFO HaplotypeCaller - Done initializing engine

    06:13:47.007 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output

    06:13:47.015 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/opt/gatk-package-4.2.0.0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_utils.so

    06:13:47.016 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/opt/gatk-package-4.2.0.0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so

    06:13:47.043 INFO IntelPairHmm - Using CPU-supported AVX-512 instructions

    06:13:47.043 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM

    06:13:47.044 INFO IntelPairHmm - Available threads: 48

    06:13:47.044 INFO IntelPairHmm - Requested threads: 4

    06:13:47.044 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation

    06:13:47.079 INFO ProgressMeter - Starting traversal

    06:13:47.079 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute

    06:13:47.487 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position 1:1290276 and possibly subsequent; at least 10 samples must have called genotypes

    06:13:57.113 INFO ProgressMeter - 2:179441424 0.2 340 2033.1

    06:14:07.128 INFO ProgressMeter - 6:132891656 0.3 790 2364.2

    06:14:17.763 INFO ProgressMeter - 9:33797817 0.5 1030 2014.1

    06:14:34.030 INFO ProgressMeter - 11:1018020 0.8 1180 1508.0

    06:14:44.031 INFO ProgressMeter - 14:106329355 0.9 1560 1643.5

    06:14:54.031 INFO ProgressMeter - 19:39221730 1.1 2020 1810.3

    06:14:56.195 INFO HaplotypeCaller - 64088 read(s) filtered by: MappingQualityReadFilter

    0 read(s) filtered by: MappingQualityAvailableReadFilter

    0 read(s) filtered by: MappedReadFilter

    59 read(s) filtered by: NotSecondaryAlignmentReadFilter

    0 read(s) filtered by: NotDuplicateReadFilter

    0 read(s) filtered by: PassesVendorQualityCheckReadFilter

    2 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter

    0 read(s) filtered by: GoodCigarReadFilter

    0 read(s) filtered by: WellformedReadFilter

    64149 total reads filtered

    06:14:56.196 INFO ProgressMeter - 22:45278966 1.2 2229 1935.0

    06:14:56.196 INFO ProgressMeter - Traversal complete. Processed 2229 total regions in 1.2 minutes.

    06:14:56.305 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.109933602

    06:14:56.305 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 26.370827184000003

    06:14:56.305 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 15.93 sec

    06:14:56.305 INFO HaplotypeCaller - Shutting down engine


    No variant at position 11 1016944 is detected.
    It looks like information about other chromosomes affect results for one specific chromosome.

    1
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Gytis Mackevičius,

    Thanks for sharing those logs, it doesn't look like are any specific error messages. 

    Are there any variants at all in your output VCF when using the all.list?

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Gytis Mackevičius

    Yes, there are. Output of bcftools stats:

    # SN [2]id [3]key [4]value
    SN 0 number of samples: 1
    SN 0 number of records: 2289
    SN 0 number of no-ALTs: 904
    SN 0 number of SNPs: 1310
    SN 0 number of MNPs: 0
    SN 0 number of indels: 74
    SN 0 number of others: 1
    SN 0 number of multiallelic sites: 2
    SN 0 number of multiallelic SNP sites: 2

    The problem is inconsistency at same position (11 1016944) with two lists. With longer list there is no variant called, with shorter there is variant. And shorter list is subset of longer (Every position in shorter list exists in longer list). It looks like GATK4 HC takes information from other positions (even from other chromosomes) while doing local assembly, is there any way to solve this inconsistency?

    1
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Thanks for updating with this description, this problem is more clear to me now! HaplotypeCaller should not be using information from other positions when doing local reassembly. If you can point me to some data of why you think that is occurring here, I can parse through what is going on.

    I think it would be helpful to take a closer look at the position 11:1016944 and work through this troubleshooting document: When HaplotypeCaller and Mutect2 do not call an expected variant. The document was written by our developers for the case you are describing, which is trying to figure out why there is confusing behavior with HaplotypeCaller. I think you will find it very helpful! 

    Let me know what you find from looking into that site.

    Best,

    Genevieve

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk