HaplotypeCaller calling whole chromosome as NON_REF
Answeredv4.2.0.0
gatk HaplotypeCaller \
-R ${REFERENCE}.fa \
-L ${INTERVAL} \
-I ${BAMPATH} \
-ERC GVCF \
-O $TMPDIR/${SAMPLE}.${INTERVAL}.chrom.g.vcf \
> slurm_out/${SAMPLE}.${INTERVAL}.log 2>&1
Using GATK jar /home/tommathy/apps/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/tommathy/apps/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar HaplotypeCaller -R /projects/jah1_lab/poplar_illumina/Ptrichocarpa_533_v4.0.fa -L Chr03 -I input/sorted-249.bam -ERC GVCF -O /localscratch/532003/sorted-249.Chr03.chrom.g.vcf
10:17:54.813 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/tommathy/apps/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 09, 2022 10:17:54 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
10:17:54.990 INFO HaplotypeCaller - ------------------------------------------------------------
10:17:54.990 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.2.0.0
10:17:54.990 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
10:17:54.990 INFO HaplotypeCaller - Executing as tommathy@tc108 on Linux v3.10.0-1062.18.1.el7.x86_64 amd64
10:17:54.990 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_322-b06
10:17:54.990 INFO HaplotypeCaller - Start Date/Time: June 9, 2022 10:17:54 AM EDT
10:17:54.990 INFO HaplotypeCaller - ------------------------------------------------------------
10:17:54.990 INFO HaplotypeCaller - ------------------------------------------------------------
10:17:54.991 INFO HaplotypeCaller - HTSJDK Version: 2.24.0
10:17:54.991 INFO HaplotypeCaller - Picard Version: 2.25.0
10:17:54.991 INFO HaplotypeCaller - Built for Spark Version: 2.4.5
10:17:54.991 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:17:54.991 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:17:54.991 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:17:54.991 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:17:54.991 INFO HaplotypeCaller - Deflater: IntelDeflater
10:17:54.991 INFO HaplotypeCaller - Inflater: IntelInflater
10:17:54.991 INFO HaplotypeCaller - GCS max retries/reopens: 20
10:17:54.991 INFO HaplotypeCaller - Requester pays: disabled
10:17:54.991 INFO HaplotypeCaller - Initializing engine
10:17:55.336 INFO IntervalArgumentCollection - Processing 21678634 bp from intervals
10:17:55.342 INFO HaplotypeCaller - Done initializing engine
10:17:55.343 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
10:17:55.352 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
10:17:55.352 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
10:17:55.367 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/tommathy/apps/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
10:17:55.368 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/tommathy/apps/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
10:17:55.397 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
10:17:55.398 INFO IntelPairHmm - Available threads: 4
10:17:55.398 INFO IntelPairHmm - Requested threads: 4
10:17:55.398 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
10:17:55.451 INFO ProgressMeter - Starting traversal
10:17:55.452 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
10:18:05.473 INFO ProgressMeter - Chr03:158559 0.2 840 5029.9
[Ommitting traversal log for brevity, program completes full traversal of interval]
11:18:21.410 INFO ProgressMeter - Chr03:21656885 60.4 114950 1902.1
11:18:24.703 INFO HaplotypeCaller - 151718 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
94657 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
246375 total reads filtered
11:18:24.703 INFO ProgressMeter - Chr03:21676821 60.5 115068 1902.3
11:18:24.703 INFO ProgressMeter - Traversal complete. Processed 115068 total regions in 60.5 minutes.
11:18:25.027 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
11:18:25.028 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
11:18:25.028 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
11:18:25.028 INFO HaplotypeCaller - Shutting down engine
[June 9, 2022 11:18:25 AM EDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 60.50 minutes.
Runtime.totalMemory()=2368208896
I'm currently attempting to use the HaplotypeCaller to genotype pacbio data by chromosome. However, the g.vcf files all end up with just a single variant call that looks like
Chr03 1 . C <NON_REF> . . END=21678634 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
Which appears to be just marking the whole chromosome as NON_REF. I've had success using the HaplotypeCaller on these samples using Illumina reads.
-
Thank you for your post, Tommy Phannareth ! I want to let you know we have received your question. We'll get back to you if we have any updates or follow up questions.
Please see our Support Policy for more details about how we prioritize responding to questions.
-
Hi Tommy Phannareth,
Thanks for writing into the GATK forum and for your patience about this issue you are seeing. I am wondering if a majority of your reads are getting filtered out? How many reads do you have in your input bam?
Here's where I see that information:
11:18:24.703 INFO HaplotypeCaller - 151718 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
94657 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
246375 total reads filteredLet me know if this is the case. If not, we can continue to look into possible issues. Another helpful piece to see is if you look at the input bam in IGV and if you pick out some test sites where you would expect to see variance. Then, we can track what is going on at those sites.
Let me know,
Genevieve
-
Hi Tommy,
We haven't heard back from you so we're going to close out this ticket in our system. If you still require assistance, simply respond to this thread and we'll be happy to pick up where we left off!
Kind regards,
Genevieve
Please sign in to leave a comment.
3 comments