GATK4 HaplotypeCaller - read is malformed
Hi,
I used RNA-Seq data (one control and one treatment sample) as input to GATK v4.1.5.0 for variant calling.
Here is my code:
gatk --java-options "-Xmx20G -Djava.io.tmpdir=./" HaplotypeCaller -ERC GVCF -R hg38.fa -I Control_recal.bam --dbsnp dbsnp_146.hg38.vcf.gz -O Control_g.vcf
(The same command was used for the treatment sample, with only the file prefix changed.)
However, the engine shut down after a few minutes and returned the following error:
A USER ERROR has occurred: Read A00355:100:HJCKMDRXX:1:1154:5367:30765 chr1:43621182-43621257 is malformed: read ends with deletion. Cigar: 58H52M2D2M1D3M1I5M4I2M5I3M2D3M2I1D. Although the SAM spec technically permits such reads, this is often indicative of malformed files.
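To make the CIGAR in the message above easier to read, here is a minimal Python sketch that splits a CIGAR string into its operations and reports whether the last one is a deletion. It is not GATK's own check. The function names `parse_cigar` and `ends_with_deletion` are made up for illustration, and skipping trailing soft/hard clips before looking at the last operation is an assumption made here.

```python
import re

# One or more CIGAR operations: a length followed by an operator from the SAM spec.
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operator) pairs."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def ends_with_deletion(cigar):
    """Return True if the last operation, ignoring trailing clips, is a deletion."""
    ops = parse_cigar(cigar)
    # Assumption for this sketch: trailing soft (S) and hard (H) clips are skipped.
    while ops and ops[-1][1] in ("S", "H"):
        ops.pop()
    return bool(ops) and ops[-1][1] == "D"


if __name__ == "__main__":
    cigar = "58H52M2D2M1D3M1I5M4I2M5I3M2D3M2I1D"  # CIGAR from the error message
    print(parse_cigar(cigar)[-3:])    # the last three operations
    print(ends_with_deletion(cigar))  # whether the final operation is a deletion
```

For the read in the error, the final operation is `1D`, which is the condition the message describes as "read ends with deletion".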
A brief overview of my pipeline:
STAR 2-pass alignment to the human genome (hg38)
Picard: add read groups, sort, and mark duplicates
GATK: SplitNCigarReads, base quality score recalibration (BaseRecalibrator and ApplyBQSR), and variant calling
Could you help me fix this problem? Thanks
I have now tried running in the default mode (without -ERC GVCF), and it does not produce the malformed-read error.
What is the difference between the default mode and -ERC GVCF (.vcf vs. .g.vcf) when calling variants?
Yi-Sian Lin, I believe we fixed that in this PR: https://github.com/broadinstitute/gatk/pull/6498. You can build the master branch from GitHub or wait for the next minor release.