Cannot use haplotype caller on STAR-aligned data
AnsweredREQUIRED for all errors and issues:
a) GATK version used: 4.2.6.0
b) Exact command used: /home/703404669/gatk-4.2.6.0/gatk --java-options "-Xmx4g" HaplotypeCaller --input HG7H5Aligned.sortedByCoord.out.ar.bam --reference /usr/local/share/hg38/hg38.fa -O HGFLY.g.vcf.gz -ERC GVCF
c) Entire program log:
Using GATK jar /home/703404669/gatk-4.2.6.0/gatk-package-4.2.6.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -jar /home/703404669/gatk-4.2.6.0/gatk-package-4.2.6.0-local.jar HaplotypeCaller --input HG7H5Aligned.sortedByCoord.out.ar.bam --reference /usr/local/share/hg38/hg38.fa -O HGFLY.g.vcf.gz -ERC GVCF
10:47:55.924 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/703404669/gatk-4.2.6.0/gatk-package-4.2.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:47:56.013 INFO HaplotypeCaller - ------------------------------------------------------------
10:47:56.013 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.2.6.0
10:47:56.013 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
10:47:56.013 INFO HaplotypeCaller - Executing as 703404669@BioITUtil2.sanfordhealth.org on Linux v4.18.0-372.19.1.el8_6.x86_64 amd64
10:47:56.013 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_342-b07
10:47:56.014 INFO HaplotypeCaller - Start Date/Time: August 5, 2022 10:47:55 AM CDT
10:47:56.014 INFO HaplotypeCaller - ------------------------------------------------------------
10:47:56.014 INFO HaplotypeCaller - ------------------------------------------------------------
10:47:56.014 INFO HaplotypeCaller - HTSJDK Version: 2.24.1
10:47:56.014 INFO HaplotypeCaller - Picard Version: 2.27.1
10:47:56.014 INFO HaplotypeCaller - Built for Spark Version: 2.4.5
10:47:56.014 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:47:56.014 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:47:56.014 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:47:56.014 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:47:56.014 INFO HaplotypeCaller - Deflater: IntelDeflater
10:47:56.014 INFO HaplotypeCaller - Inflater: IntelInflater
10:47:56.014 INFO HaplotypeCaller - GCS max retries/reopens: 20
10:47:56.014 INFO HaplotypeCaller - Requester pays: disabled
10:47:56.015 INFO HaplotypeCaller - Initializing engine
10:47:56.285 INFO HaplotypeCaller - Done initializing engine
10:47:56.287 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
10:47:56.296 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
10:47:56.297 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
10:47:56.308 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/703404669/gatk-4.2.6.0/gatk-package-4.2.6.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
10:47:56.309 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/703404669/gatk-4.2.6.0/gatk-package-4.2.6.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
10:47:56.327 INFO IntelPairHmm - Using CPU-supported AVX-512 instructions
10:47:56.327 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
10:47:56.327 INFO IntelPairHmm - Available threads: 10
10:47:56.327 INFO IntelPairHmm - Requested threads: 4
10:47:56.328 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
10:47:56.354 INFO ProgressMeter - Starting traversal
10:47:56.355 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
10:47:56.364 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
10:47:56.364 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
10:47:56.365 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
10:47:56.365 INFO HaplotypeCaller - Shutting down engine
[August 5, 2022 10:47:56 AM CDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1346895872
java.nio.BufferUnderflowException
at java.nio.ByteBuffer.get(ByteBuffer.java:696)
at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:284)
at java.nio.ByteBuffer.get(ByteBuffer.java:723)
at htsjdk.samtools.MemoryMappedFileBuffer.readBytes(MemoryMappedFileBuffer.java:34)
at htsjdk.samtools.AbstractBAMFileIndex.readBytes(AbstractBAMFileIndex.java:439)
at htsjdk.samtools.AbstractBAMFileIndex.verifyIndexMagicNumber(AbstractBAMFileIndex.java:376)
at htsjdk.samtools.AbstractBAMFileIndex.<init>(AbstractBAMFileIndex.java:70)
at htsjdk.samtools.AbstractBAMFileIndex.<init>(AbstractBAMFileIndex.java:64)
at htsjdk.samtools.DiskBasedBAMFileIndex.<init>(DiskBasedBAMFileIndex.java:46)
at htsjdk.samtools.BAMFileReader.getIndex(BAMFileReader.java:419)
at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:931)
at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:550)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:417)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:130)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:69)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:412)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.iterator(ReadsPathDataSource.java:336)
at org.broadinstitute.hellbender.engine.MultiIntervalLocalReadShard.iterator(MultiIntervalLocalReadShard.java:134)
at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.<init>(AssemblyRegionIterator.java:86)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:188)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
I have used STAR to align some whole exome sequencing files:
STAR --readFilesCommand zcat --runThreadN 4 --genomeDir /usr/local/share/hg38 --outFileNamePrefix HGFLY --readFilesIn AGKD01_CKDN220004842-1A_HGFLYDSX3_L1_1.fq.gz AGKD01_CKDN220004842-1A_HGFLYDSX3_L1_2.fq.gz --outSAMtype BAM SortedByCoordinate
but when I run haplotypeCaller in the next step:
~/gatk-4.2.6.0/gatk --java-options "-Xmx4g" HaplotypeCaller --input HGFLYAligned.sortedByCoord.out.bam --reference /usr/local/share/hg38/hg38.fa -O HGFLY.g.vcf.gz -ERC GVCF
I get an error:
A USER ERROR has occurred: Argument emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file.
but when I try to view the samples `samtools view -H HGFLYAligned.sortedByCoord.out.bam | grep RG` I get none.
I have found similar errors reported, https://gatk.broadinstitute.org/hc/en-us/community/posts/360075892511-emit-ref-confidence-error but I don't see a solution to my problem there.
It seems that my alignment produced no sample names, and I don't see how STAR can output sample names.
A sample view of my bam file:
A00564:479:HGFLYDSX3:1:2420:11686:10708 163 chr1 10001 255 6S108M36S = 10191
170948 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC
TAACCCTAACCCTAACCCGAAACCCAAAACCCAAAACCCGAAACCCAAAACCCA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF
FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFF,F::FFFF:F:F,FFFFF,,FF,:,,,FFF:F,,F::,:F,F,F,:F,
NH:i:1 HI:i:1 AS:i:200 nM:i:10
A00564:479:HGFLYDSX3:1:2573:2889:29669 163 chr1 10001 255 1S113M36S = 10005
104 CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
CTAACCCTAACCCAAACCCTAACCCTAACCCTAAACCTAACCCTCACCCTAACC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF
FFFFFF::FFFFFFFFFFF,FFFFFFF,FFF,FFFF,,::FFF,FFFFF,,FFFF,FFFFF,:FFF,,F:FF,:F,FFF,,F,FF,FFFFF,,,F,F,,,,F
NH:i:1 HI:i:1 AS:i:207 nM:i:2
A00564:479:HGFLYDSX3:1:2140:5656:8844 163 chr1 10004 3 106M44S = 10004 110
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
CAGCGCCAGAGCCTAGGCGGGGGCCTACGGCTGGGCCGGAGGGTGG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFF:FFFFFFFF:F::FF,FF,FFFF:,,F,::,,,,,,,,:,,,,:,,,,,,,,,,,,,:,,:,,,,,,,,,,,,,,:,, NH:i:2
HI:i:1 AS:i:206 nM:i:4
I posted on BioStars, but didn't get an answer.
How can I call haplotypes/variants? Thanks for your help
-
Hi David Condon,
Thank you for writing to the GATK forum! I hope we can help you to sort this out.
Firstly, you mention a User Error in the HaplotypeCaller step; I only see a BufferUnderflowException in the stack trace you included. Could you please clarify where you are seeing this error?
The BufferUnderflowException looks particularly unfortunate; this may be an external I/O issue. Could you also clarify whether your bam has read groups defined?
Hopefully, the above information will help us better understand the origin of the issue. Please let me know if any other questions arise in the meantime!
Best,
Anthony -
Hi Anthony,
the problem is that STAR is not meant for whole-exome sequencing, so it was my fault
Please sign in to leave a comment.
2 comments