FilterAlignmentArtifacts error
Hey there :)
I'm new and need some advice on this please:
I was trying to run FilterAlignmentArtifacts on a VCF file created by Mutect2 in tumor-normal mode and filtered with FilterMutectCalls. The tumor is TB and the normal is TN.
Everything worked fine for another tumor sample (let's call it TA) from the same patient, using the same normal. For the TB tumor I always get this error.
The output VCF is written, but it only goes up to some point in chr1. So I guess there is a specific mutation that FilterAlignmentArtifacts gets stuck on?
Both TA and TB are of similar size and in the same format.
Running suggestions from https://gatk.broadinstitute.org/hc/en-us/articles/360035532372-Java-is-using-too-many-resources-threads-memory-or-CPU- didn't work.
a) GATK version used: 4.1.6.0
b) Exact GATK commands used:
./gatk FilterAlignmentArtifacts -R Homo_sapiens_assembly19.fasta -V TB_TN_filtered.vcf -I TB.broad_120x.Homo_sapiens_assembly19.fasta.bam --bwa-mem-index-image Homo_sapiens_assembly19.fasta.img -O TB_TN_realignment.vcf
c) The entire error log if applicable.
A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000000011b9d7569, pid=85326, tid=0x0000000000001903
#
# JRE version: Java(TM) SE Runtime Environment (8.0_162-b12) (build 1.8.0_162-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.162-b12 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C [libgkl_smithwaterman6597482876379844044.dylib+0x2569] smithWatermanBackTrack(dnaSeqPair*, int, int, int, int, int*, int)+0x959
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /.../hs_err_pid85326.log
The error report file is quite long, I can post it if necessary.
Using the "ulimit -c unlimited" suggestion from the error message gives a similar error message and log:
A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000000100dd8569, pid=7005, tid=0x0000000000002703
#
# JRE version: Java(TM) SE Runtime Environment (8.0_162-b12) (build 1.8.0_162-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.162-b12 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C [libgkl_smithwaterman2985836283683763049.dylib+0x2569] _Z22smithWatermanBackTrackP10dnaSeqPairiiiiPii+0x959
#
-
Nicola Dick, that looks like a bug in a native C library that we depend on. The only quick solution is also a dirty one: exclude the variant where the output VCF stops, re-run, and paste the variant back into the output. That probably already occurred to you, but if it means anything, I am granting my blessing for this hack.
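For anyone who wants to try the exclude-and-re-run hack, here is a minimal Python sketch of the bookkeeping. It assumes the tool writes records in the same order as the input VCF and that the offending record is the one immediately after the last complete record in the truncated output; neither is guaranteed, since the crash could be triggered by a nearby record instead. The function names are made up for illustration and are not part of GATK.

```python
from typing import List, Optional, Tuple

Key = Tuple[str, str, str, str]  # (CHROM, POS, REF, ALT)


def record_key(line: str) -> Optional[Key]:
    """Return (CHROM, POS, REF, ALT) for a complete VCF data line, else None."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:  # a half-written final line has too few columns
        return None
    return fields[0], fields[1], fields[3], fields[4]


def last_written_key(truncated_output: List[str]) -> Optional[Key]:
    """Key of the last complete record in the truncated output VCF."""
    for line in reversed(truncated_output):
        key = record_key(line)
        if key is not None:
            return key
    return None


def split_out_suspect(
    input_vcf: List[str], last_key: Optional[Key]
) -> Tuple[List[str], Optional[str]]:
    """Remove the first record after last_key; return (kept lines, suspect line)."""
    kept: List[str] = []
    suspect: Optional[str] = None
    seen_last = last_key is None  # nothing was written: suspect the first record
    for line in input_vcf:
        key = record_key(line)
        if key is not None and suspect is None:
            if seen_last:
                suspect = line  # assumed culprit, held back from the re-run
                continue
            if key == last_key:
                seen_last = True
        kept.append(line)
    return kept, suspect
```

Write the kept lines to a new VCF, re-run FilterAlignmentArtifacts on it, and then paste the suspect line back into the finished output at its original position. If the re-run still crashes at the same place, the assumption about which record is responsible was probably wrong.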
Since it's native code I would not be shocked if it's hardware-specific. In fact, I know it is -- this is Intel code, and it is optimized to run on their processors. Are TA and TB running on different machines, or on a heterogeneous cluster? Just switching computers might work.
By the way, you should use a BWA mem index image generated from the hg38 reference, regardless of the reference to which the original bam is aligned. The idea is to realign reads to the best possible reference. If you're lucky, this change alone will randomly avoid the error.
-
Hello David Benjamin and thank you for your answer.
Interestingly, leaving out the line didn't work. I then tried using the hg38 reference (http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/), but now the contigs of the reads and the reference don't overlap. Do you have a specific suggestion for which hg38 reference I should use? I am not sure I got the correct one, because the mismatch is not only in the naming ([chr1, chr2, ...] versus [1, 2, ...]); the reference also has a lot of weird names in between:
reference contigs = [chr1, chr10, chr11, chr11_KI270721v1_random, chr12, chr13, chr14, chr14_GL000009v2_random, chr14_GL000225v1_random, chr14_KI270722v1_random, chr14_GL000194v1_random, chr14_KI270723v1_random, chr14_KI270724v1_random, chr14_KI270725v1_random, chr14_KI270726v1_random, chr15, chr15_KI270727v1_random, chr16, chr16_KI270728v1_random, chr17, chr17_GL000205v2_random, chr17_KI270729v1_random, chr17_KI270730v1_random, chr18, chr19, chr1_KI270706v1_random, chr1_KI270707v1_random, chr1_KI270708v1_random, chr1_KI270709v1_random, chr1_KI270710v1_random, chr1_KI270711v1_random, chr1_KI270712v1_random, chr1_KI270713v1_random, chr1_KI270714v1_random, chr2, chr20, chr21, chr22, chr22_KI270731v1_random, chr22_KI270732v1_random, chr22_KI270733v1_random, chr22_KI270734v1_random, chr22_KI270735v1_random, chr22_KI270736v1_random, chr22_KI270737v1_random, chr22_KI270738v1_random, chr22_KI270739v1_random, chr2_KI270715v1_random, chr2_KI270716v1_random, chr3, chr3_GL000221v1_random, chr4, chr4_GL000008v2_random, chr5, chr5_GL000208v1_random, chr6, chr7, chr8, chr9, chr9_KI270717v1_random, chr9_KI270718v1_random, chr9_KI270719v1_random, chr9_KI270720v1_random, chr1_KI270762v1_alt, chr1_KI270766v1_alt, chr1_KI270760v1_alt, chr1_KI270765v1_alt, chr1_GL383518v1_alt, chr1_GL383519v1_alt, chr1_GL383520v2_alt, chr1_KI270764v1_alt, chr1_KI270763v1_alt, chr1_KI270759v1_alt, chr1_KI270761v1_alt, chr2_KI270770v1_alt, chr2_KI270773v1_alt, chr2_KI270774v1_alt, chr2_KI270769v1_alt, chr2_GL383521v1_alt, chr2_KI270772v1_alt, chr2_KI270775v1_alt, chr2_KI270771v1_alt, chr2_KI270768v1_alt, chr2_GL582966v2_alt, chr2_GL383522v1_alt, chr2_KI270776v1_alt, chr2_KI270767v1_alt, chr3_JH636055v2_alt, chr3_KI270783v1_alt, chr3_KI270780v1_alt, chr3_GL383526v1_alt, chr3_KI270777v1_alt, chr3_KI270778v1_alt, chr3_KI270781v1_alt, chr3_KI270779v1_alt, chr3_KI270782v1_alt, chr3_KI270784v1_alt, chr4_KI270790v1_alt, chr4_GL383528v1_alt, chr4_KI270787v1_alt, chr4_GL000257v2_alt, chr4_KI270788v1_alt, chr4_GL383527v1_alt, chr4_KI270785v1_alt, chr4_KI270789v1_alt, chr4_KI270786v1_alt, chr5_KI270793v1_alt, chr5_KI270792v1_alt, chr5_KI270791v1_alt, chr5_GL383532v1_alt, chr5_GL949742v1_alt, chr5_KI270794v1_alt, chr5_GL339449v2_alt, chr5_GL383530v1_alt, chr5_KI270796v1_alt, chr5_GL383531v1_alt, chr5_KI270795v1_alt, chr6_GL000250v2_alt, chr6_KI270800v1_alt, chr6_KI270799v1_alt, chr6_GL383533v1_alt, chr6_KI270801v1_alt, chr6_KI270802v1_alt, chr6_KB021644v2_alt, chr6_KI270797v1_alt, chr6_KI270798v1_alt, chr7_KI270804v1_alt, chr7_KI270809v1_alt, chr7_KI270806v1_alt, chr7_GL383534v2_alt, chr7_KI270803v1_alt, chr7_KI270808v1_alt, chr7_KI270807v1_alt, chr7_KI270805v1_alt,.....]
Many Greetings,
Nicola
-
What you are seeing are alternate haplotype contigs that are a standard part of hg38. As long as your reads are aligned to a subset of the reference contigs there will be no error. That is, as long as your reads and reference use the same naming convention (both "1" or both "chr1", etc.), there should be no problem even if the reference has contigs not present in your BAM.
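The subset condition above can be checked before launching the tool. The following is a rough Python sketch, assuming you have the BAM header as text (for example from `samtools view -H`) and the reference's `.fai` index; the function names are illustrative only.

```python
from typing import Iterable, List


def contigs_from_sam_header(header_lines: Iterable[str]) -> List[str]:
    """Collect the SN: names from @SQ lines of a SAM/BAM header."""
    names: List[str] = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for tag in line.rstrip("\n").split("\t")[1:]:
                if tag.startswith("SN:"):
                    names.append(tag[3:])
    return names


def contigs_from_fai(fai_lines: Iterable[str]) -> List[str]:
    """Contig names are the first tab-separated column of a .fai index."""
    return [line.split("\t")[0] for line in fai_lines if line.strip()]


def missing_from_reference(
    bam_contigs: Iterable[str], ref_contigs: Iterable[str]
) -> List[str]:
    """BAM contigs that do not appear in the reference, in BAM order."""
    ref = set(ref_contigs)
    return [c for c in bam_contigs if c not in ref]
```

An empty result means the reads' contigs are a subset of the reference contigs. A result such as `["1", "2", ...]` against a reference that has `chr1`, `chr2`, ... points to a naming-convention mismatch rather than to the extra alt or random contigs.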
Just to be clear, because it really is a bit messy: if your reads are aligned to hg19, you want to use the same hg19 fasta for the -R argument, but you should use any hg38 BWA mem index image for the --bwa-mem-index-image argument. The index image's contigs do not have to match anything else.
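To make the split between the two references concrete, here is a small Python sketch that only builds the two command lines and prints them without running anything. The hg19 file names come from the command in the original post; `hg38.fasta` and `hg38.fasta.img` are placeholder names, and the `BwaMemIndexImageCreator` invocation with `-I`/`-O` is how I understand that tool to be called, so please check it against the docs for your GATK version.

```python
import shlex
from typing import List


def index_image_command(hg38_fasta: str, image_out: str) -> List[str]:
    """Command to build a BWA-MEM index image from an hg38 fasta."""
    return ["gatk", "BwaMemIndexImageCreator", "-I", hg38_fasta, "-O", image_out]


def filter_command(
    reference: str, vcf: str, bam: str, index_image: str, out_vcf: str
) -> List[str]:
    """FilterAlignmentArtifacts with -R matching the BAM and an hg38 image."""
    return [
        "gatk", "FilterAlignmentArtifacts",
        "-R", reference,                       # same reference the BAM was aligned to
        "-V", vcf,
        "-I", bam,
        "--bwa-mem-index-image", index_image,  # hg38 image; contigs need not match
        "-O", out_vcf,
    ]


# Placeholder hg38 names; the remaining names are taken from the original post.
index_cmd = index_image_command("hg38.fasta", "hg38.fasta.img")
filter_cmd = filter_command(
    "Homo_sapiens_assembly19.fasta",
    "TB_TN_filtered.vcf",
    "TB.broad_120x.Homo_sapiens_assembly19.fasta.bam",
    "hg38.fasta.img",
    "TB_TN_realignment.vcf",
)

for cmd in (index_cmd, filter_cmd):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed lines can be pasted into a shell, or the lists can be handed to `subprocess.run` directly.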
-
Try downgrading GATK to 4.1.3.0.