Error in GATK CollectAllelicCounts
REQUIRED for all errors and issues:
a) GATK version used: gatk-4.3.0.0
b) Exact command used: gatk --java-options -Xmx25g CollectAllelicCounts -R hs37d5.fa -I X.bam --max-depth-per-sample 30 --minimum-base-quality 10 --minimum-mapping-quality 5 --intervals bed_file.bed -O output.tsv
c) Entire program log:
I am running CollectAllelicCounts on BAM files to get the REF and ALT counts. However, only 10% of the samples produce the required TSV; the rest fail to produce a TSV and give the following error.
18:41:53.752 INFO ProgressMeter - 1:16174831 2.3 3298000 1437057.9
18:41:54.605 INFO ProgressMeter - 1:24439734 2.3 4996000 2152010.5
18:41:59.721 INFO CollectAllelicCounts - Shutting down engine
[August 7, 2023 at 6:41:59 PM EDT] org.broadinstitute.hellbender.tools.copynumber.CollectAllelicCounts done. Elapsed time: 2.42 minutes.
Runtime.totalMemory()=3997171712
***********************************************************************
A USER ERROR has occurred: Read xxxx:xxx:xxxx:1:xxx:xxxx:xxxx 1:156785763-156785864 is malformed: read starts with deletion. Cigar: 1D101M. Although the SAM spec technically permits such reads, this is often indicative of malformed files.
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
real 2m32.774s
user 0m27.456s
sys 0m3.391s
18:42:00.313 INFO CollectAllelicCounts - Shutting down engine
[August 7, 2023 at 6:42:00 PM EDT] org.broadinstitute.hellbender.tools.copynumber.CollectAllelicCounts done. Elapsed time: 2.42 minutes.
Runtime.totalMemory()=3741319168
***********************************************************************
A USER ERROR has occurred: Read A00388:524:H7YW3DRX3:2:2230:23963:26490 1:150922703-150922802 is malformed: read starts with deletion. Cigar: 1D6M1I93M. Although the SAM spec technically permits such reads, this is often indicative of malformed files.
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Please note that a similar bug was previously reported for HaplotypeCaller, https://github.com/broadinstitute/gatk/issues/6490
and https://github.com/broadinstitute/gatk/pull/6498, but I could not find anything reported for the CollectAllelicCounts tool.
Any help will be greatly appreciated
-
Hi Saloni Sinha
Are you using any BAM-modifying tool during post-processing that may clip reads? CIGAR strings do not usually start with a deletion, so most GATK tools reject such reads even though the SAM spec technically permits them.
Also you may try running the tool with
--read-validation-stringency SILENT
to see if the tool runs to completion. Additionally, you may wish to check the parameters of any BAM-modifying tools you use to see whether they produce an out-of-spec BAM.
If you still observe issues let us know.
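To gauge how many reads are affected before changing the pipeline, a quick scan of the alignments can help. The following is only a minimal sketch, assuming SAM-formatted text lines (for example, the output of `samtools view X.bam`); the function names `parse_cigar`, `starts_with_deletion`, and `find_leading_deletions` are hypothetical and not part of GATK or samtools. It ignores leading soft/hard clips when deciding whether a CIGAR starts with a deletion, which is an assumption about what should count as "starting".

```python
import re

# One CIGAR operation: a length followed by an operator from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '1D101M' into [(1, 'D'), (101, 'M')]."""
    if cigar == "*":
        return []  # CIGAR unavailable (e.g. unmapped read)
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings containing anything the pattern did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"unparseable CIGAR: {cigar!r}")
    return ops


def starts_with_deletion(cigar):
    """True if the first operation after any leading clips is a deletion."""
    ops = [(n, op) for n, op in parse_cigar(cigar) if op not in "SH"]
    return bool(ops) and ops[0][1] == "D"


def find_leading_deletions(sam_lines):
    """Return (read name, contig, position, CIGAR) for each offending record."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete alignment record
        if starts_with_deletion(fields[5]):
            hits.append((fields[0], fields[2], int(fields[3]), fields[5]))
    return hits
```

Passing an open file or `sys.stdin` to `find_leading_deletions` gives a list of offending records; if only a handful of reads show up, filtering them out is probably simpler than regenerating the BAM.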
-
I am told that we use ABRA for post processing of the bam files. Otherwise we use bwa and GATK.
-
I did try rerunning one of the failed samples with --read-validation-stringency SILENT, but it gives the same error.
-
Hi Saloni Sinha
It looks like ABRA could be producing out-of-spec BAM files. If ABRA is used solely to realign indels, we suggest using the GATK3 indel realignment tools instead, which would produce in-spec BAM files for further processing. Although we no longer support GATK3 tools, using them alongside GATK4 tools for BAM processing can still be valid.
Another suggestion from our team (Louis Bergelson) is to enable read filter
--read-filter GoodCigarReadFilter
This may eliminate reads with bad CIGAR strings, but if all of your data have this kind of CIGAR string, you may need to use a read transformer tool to fix the CIGAR strings before the analysis.
We hope this helps.
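For anyone wondering what "fixing" such a CIGAR string could look like, here is a rough sketch of the core transformation, not an existing GATK tool. It assumes the repair is to drop the leading deletion and shift the alignment start right by the deleted length, since a deletion consumes reference bases and POS points at the first reference-consuming operation. The function name `strip_leading_deletion` is hypothetical. A real transformer would also have to update mate fields and tags such as NM and MD, which this sketch does not attempt.

```python
import re

# One CIGAR operation: a length followed by an operator from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def strip_leading_deletion(pos, cigar):
    """Drop deletions that precede the first aligned operation.

    Returns (new_pos, new_cigar). The position moves right by the number
    of reference bases the removed deletions covered. Leading soft/hard
    clips are kept in place. Assumes a well-formed CIGAR string.
    """
    if cigar == "*":
        return pos, cigar  # nothing to fix when the CIGAR is unavailable
    shift = 0
    fixed = []
    seen_aligned = False
    for length, op in CIGAR_OP.findall(cigar):
        if not seen_aligned and op == "D":
            shift += int(length)  # deletion consumed reference bases
            continue
        if op not in "SH":
            seen_aligned = True  # first non-clip, non-deletion operation
        fixed.append(f"{length}{op}")
    return pos + shift, "".join(fixed)


# The two reads from the error messages above:
print(strip_leading_deletion(156785763, "1D101M"))
print(strip_leading_deletion(150922703, "1D6M1I93M"))
```

For the first read this yields a start of 156785764 with CIGAR `101M`, which still ends at 156785864 as in the original error message, so the aligned bases stay where they were.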