Mutect2 doesn't produce stats file
Hi all,
I'm using Mutect2 on GATKv.4.1.7.0 in order to call somatic short variants on matched tumor/normal samples. I launched the command as follows:
gatk Mutect2 -R ./Genome/Ref.fa -I ./BWA_mem/tumor_RG_MD.bam -I ./BWA_mem/normal_RG_MD.bam -normal normal --panel-of-normals ./PON/PON.vcf.gz -O ./MUTECT/T01.vcf.gz
The software produces the T01.vcf.gz and T01.vcf.gz.tbi outputs but I cannot get the T01.vcf.gz.stats. In the log I always get an error message that says "Error was: Fasta index file could not be opened". But the Ref.fa.fai is present in the same folder of Ref.fa. I produced it with samtools faidx
Anybody has any suggestion?
Thank you
Serena
-
Please share your entire error log.
-
The error log was the following
21:21:17.561 INFO ProgressMeter - AANG04002899.1:1801 548.0 9339460 17044.1
21:21:27.644 INFO ProgressMeter - AANG04001987.1:2701 548.1 9340620 17041.0
21:21:37.709 INFO ProgressMeter - AANG04004341.1:2101 548.3 9341740 17037.8
21:21:47.480 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 30.840286777000003
21:21:47.480 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 11950.995365315
21:21:47.481 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 7200.65 sec
21:21:47.482 INFO Mutect2 - Shutting down engine
[16 giugno 2020 21.21.47 CEST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 548.50 minutes.
Runtime.totalMemory()=23446159360
***********************************************************************A USER ERROR has occurred: Couldn't read file file:///home/sferraresso/sammarco/Data_analysis/./Genome/Felis_catus_9.0.fa. Error was: Fasta index file could not be opened: ./Genome/Felis_catus_9.0.fa.fai
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace. -
Hi,
You are seeing this error because Mutect2 is unable to locate the ref file and its index or the files are corrupted. Try:
- providing absolute paths to the files
- if that doesn't work, try re-indexing your reference file.
-
Thank you very much for your suggestions,
I tried both but I always get the same error message. To your knowledge, there is any incompatibility issue between GATK and samtools versions? I'm using samtools version 0.1.19 and GATK v.4.1.7.0
Thank you
Serena
-
My ref.fa.fai index looks like this:
A1 242100913 67 60 61
A2 171471747 246136063 60 61
A3 143202405 420465740 60 61
B1 208212889 566054919 60 61
B2 155302638 777738090 60 61
B3 149751809 935629173 60 61Do you notice something wrong?
-
HI,
- Can you please confirm that the ref file has the right permissions?
- Please share the stacktrace by adding this to the command: --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
-
Hi,
here is the stack trace:
18:32:09.829 INFO ProgressMeter - AANG04003170.1:2101 372.4 9334850 25067.0
18:32:19.876 INFO ProgressMeter - AANG04000699.1:2701 372.6 9336260 25059.5
18:32:29.966 INFO ProgressMeter - AANG04004451.1:1501 372.7 9337600 25051.8
18:32:40.051 INFO ProgressMeter - AANG04003100.1:301 372.9 9339110 25044.6
18:32:50.159 INFO ProgressMeter - AANG04000332.1:2401 373.1 9340630 25037.3
18:32:50.992 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 16.67457894
18:32:50.993 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 6996.308272728001
18:32:50.993 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 5668.21 sec
18:32:50.994 INFO Mutect2 - Shutting down engine
[9 luglio 2020 18.32.50 CEST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 373.11 minutes.
Runtime.totalMemory()=26258964480
***********************************************************************A USER ERROR has occurred: Couldn't read file file:///home/sferraresso/sammarco/Data_analysis/Genome/Felis_catus_9.0.fa. Error was: Fasta index file could not be opened: /home/sferraresso/sammarco/Data_analysis/Genome/Felis_catus_9.0.fa.fai
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException$CouldNotReadInputFile: Couldn't read file file:///home/sferraresso/sammarco/Data_analysis/Genome/Felis_catus_9.0.fa. Error was: Fasta index file could not be opened: /home/sferraresso/sammarco/Data_analysis/Genome/Felis_catus_9.0.fa.fai
at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:159)
at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:125)
at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:110)
at org.broadinstitute.hellbender.engine.ReferenceFileSource.<init>(ReferenceFileSource.java:35)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.makeStandardMutect2PostFilterReadTransformer(Mutect2Engine.java:167)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.makePostReadFilterTransformer(Mutect2.java:242)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:171)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Caused by: htsjdk.samtools.SAMException: Fasta index file could not be opened: /home/sferraresso/sammarco/Data_analysis/Genome/Felis_catus_9.0.fa.fai
at htsjdk.samtools.reference.FastaSequenceIndex.<init>(FastaSequenceIndex.java:74)
at htsjdk.samtools.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:98)
at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:139)
at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:148)
... 13 more
Caused by: java.nio.file.FileSystemException: /home/sferraresso/sammarco/Data_analysis/Genome/Felis_catus_9.0.fa.fai: Troppi file aperti
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
at java.nio.file.Files.newInputStream(Files.java:152)
at htsjdk.samtools.reference.FastaSequenceIndex.<init>(FastaSequenceIndex.java:71)
... 16 more -
Here are the permissions of ref and index files:
-rw-rw-r-- 1 sferraresso sferraresso 2,4G 4 mar 22.39 Felis_catus_9.0.fa
-rw-rw-r-- 1 sferraresso sferraresso 165K 15 giu 12.31 Felis_catus_9.0.fa.fai
-
This is weird. This issue is almost always resolved by recreating the fasta file. As mentioned by you earlier, maybe this issue is because of the old version of samtools you are using(2013). Can you try to use a more recent version and see if that resolves the issue? Also while you are updating versions, you might as well also update to the latest GATK v4.1.8.0. I don't think the GATK version is the cause of this error here but no harm in updating to the latest especially since GATKv.4.1.7.0 had some bugs with Mutect2 and HaplotypeCaller tools.
Also can you try to run HaplotypeCaller with the same ref and input bam. So just a basic - HaplotypeCaller -R -V -O command. This is to test if the issue is in fact with malformed fa.fai file or with Mutect2.
-
Thank you again for your help,
I did run HaplotypeCaller as you suggested and the process completed successfully, no error messages and the output file is ok. I guess the problem is not a malformed fa.fai then. I will try to update both samtools and GATK versions and I will let you know
Thank you
-
Hi again,
I updated both samtools (to v1.10) and GATK (to v 4.1.8.0). I recreated the index file and I launched Mutect2 but, again, I get the same error and Mutect2 does not produce the stats file.
I forgot to mention that I employed Mutect2 with those very same files in order to create the Panel of Normals and everything went fine.
I confess I'm a bit lost now.
Could you please confirm that the permissions of the fa.fai file are correct?
-rw-rw-r-- 1 sferraresso sferraresso 165K Jul 14 16:42 Felis_catus_9.0.fa.fai
Thank you
-
Serena Ferraresso As a hack you may be able to avoid this with the -ignore-itr-artifacts argument. This flag disables a read transformer that repairs a very rare artifact in which transposons and other repeated elements self-hybridize and then T4 polymerase incorrectly fills in overhanging ends using the repeat DNA as the template. Unless your data has 20 or so times as many soft-clips as you would usually expect, the artifact is absent.
We still ought to figure out the issue, though, and I believe I have an answer. Usually when one gets a "could not open file" exception it's because the file does not exist or one does not have access, but that's not the case here. Furthermore, I looked into our code and in the GATK if an index is missing we say so explicitly. The file is present, but not being opened. Also, we know the file is being opened successfully quite a few times because M2 runs for a few hundred minutes before failing.
I think the fundamental problem here is related to the "Troppi file aperti" in your stack trace. For every read shard (about 30000 bases) the GATK initializes a new read transformer. This is fine in HaplotypeCaller because its read transformer does not use the reference, but not in M2 due to the aforementioned inverted tandem repeat transformer.
If you want to run M2 with default settings, you can probably play around with your ulimit settings to avoid the error. I'm also very surprised that the JVM garbage collector is leaving these file handles open, since each read shard is processed separately in series. You could also scatter the genome into multiple runs of M2 and merge the outputs.
-
Dear David,
thank you for your reply. I now launched Mutect2 with --ignore-itr-artifacts argument, I will keep you updated.
I already increased the ulimit to 10000 before the last run but maybe it wasn't enough, I will try to increase it even more.
Thank you
Serena
-
Dear David,
I launched M2 with the --ignore-itr-artifacts argument and it worked fine, M2 completed succesfully and produced the stats file.
Anyway I also increased the ulimit and now I am trying again without --ignore-itr-artifacts argument, lets see if the problem is solved.
I would have another question if you don't mind. I launched M2 in paired tumor/normal mode (see above the command line) but it is not clear to me whether to launch LearnReadOrientationMode I have to run M2 also in tumor-only mode to produce the f1r2.tar.gz file or I can simply add the --f1r2-tar-gz argument to the paired mode.
Thank you for your help
Serena
-
You can add the -f1r2-tar-gz argument in paired mode. Mutect2 does not record F1R2 data for the normal even if it is present.
-
Dear David,
this is just to inform you that, by increasing the ulimit, M2 completed successfully without the --ignore-itr-artifacts argument.
Thank you again for your help.
Serena
-
Thank you for letting me know!
Please sign in to leave a comment.
17 comments