Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Mutect2 doesn't produce stats file

0

17 comments

  • Avatar
    Bhanu Gandham

    Please share your entire error log.

    0
    Comment actions Permalink
  • Avatar
    Serena Ferraresso

    The error log was the following

    21:21:17.561 INFO ProgressMeter - AANG04002899.1:1801 548.0 9339460 17044.1
    21:21:27.644 INFO ProgressMeter - AANG04001987.1:2701 548.1 9340620 17041.0
    21:21:37.709 INFO ProgressMeter - AANG04004341.1:2101 548.3 9341740 17037.8
    21:21:47.480 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 30.840286777000003
    21:21:47.480 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 11950.995365315
    21:21:47.481 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 7200.65 sec
    21:21:47.482 INFO Mutect2 - Shutting down engine
    [16 giugno 2020 21.21.47 CEST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 548.50 minutes.
    Runtime.totalMemory()=23446159360
    ***********************************************************************

    A USER ERROR has occurred: Couldn't read file file:///home/sferraresso/sammarco/Data_analysis/./Genome/Felis_catus_9.0.fa. Error was: Fasta index file could not be opened: ./Genome/Felis_catus_9.0.fa.fai

    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

    0
    Comment actions Permalink
  • Avatar
    Bhanu Gandham

    Hi,

     

    You are seeing this error because Mutect2 is unable to locate the ref file and its index or the files are corrupted. Try:

    1. providing absolute paths to the files
    2. if that doesn't work, try re-indexing your reference file.
    0
    Comment actions Permalink
  • Avatar
    Serena Ferraresso

    Thank you very much for your suggestions,

    I tried both but I always get the same error message. To your knowledge, there is any incompatibility issue between GATK and samtools versions? I'm using samtools version 0.1.19 and GATK v.4.1.7.0

    Thank you

    Serena

    0
    Comment actions Permalink
  • Avatar
    Serena Ferraresso

    My ref.fa.fai index looks like this:

    A1 242100913 67 60 61
    A2 171471747 246136063 60 61
    A3 143202405 420465740 60 61
    B1 208212889 566054919 60 61
    B2 155302638 777738090 60 61
    B3 149751809 935629173 60 61

    Do you notice something wrong?

    0
    Comment actions Permalink
  • Avatar
    Bhanu Gandham

    HI,

     

    1. Can you please confirm that the ref file has the right permissions?
    2. Please share the stacktrace by adding this to the command: --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
    0
    Comment actions Permalink
  • Avatar
    Serena Ferraresso

    Hi,

    here is the stack trace:


    18:32:09.829 INFO ProgressMeter - AANG04003170.1:2101 372.4 9334850 25067.0
    18:32:19.876 INFO ProgressMeter - AANG04000699.1:2701 372.6 9336260 25059.5
    18:32:29.966 INFO ProgressMeter - AANG04004451.1:1501 372.7 9337600 25051.8
    18:32:40.051 INFO ProgressMeter - AANG04003100.1:301 372.9 9339110 25044.6
    18:32:50.159 INFO ProgressMeter - AANG04000332.1:2401 373.1 9340630 25037.3
    18:32:50.992 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 16.67457894
    18:32:50.993 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 6996.308272728001
    18:32:50.993 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 5668.21 sec
    18:32:50.994 INFO Mutect2 - Shutting down engine
    [9 luglio 2020 18.32.50 CEST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 373.11 minutes.
    Runtime.totalMemory()=26258964480
    ***********************************************************************

    A USER ERROR has occurred: Couldn't read file file:///home/sferraresso/sammarco/Data_analysis/Genome/Felis_catus_9.0.fa. Error was: Fasta index file could not be opened: /home/sferraresso/sammarco/Data_analysis/Genome/Felis_catus_9.0.fa.fai

    ***********************************************************************
    org.broadinstitute.hellbender.exceptions.UserException$CouldNotReadInputFile: Couldn't read file file:///home/sferraresso/sammarco/Data_analysis/Genome/Felis_catus_9.0.fa. Error was: Fasta index file could not be opened: /home/sferraresso/sammarco/Data_analysis/Genome/Felis_catus_9.0.fa.fai
    at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:159)
    at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:125)
    at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:110)
    at org.broadinstitute.hellbender.engine.ReferenceFileSource.<init>(ReferenceFileSource.java:35)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.makeStandardMutect2PostFilterReadTransformer(Mutect2Engine.java:167)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.makePostReadFilterTransformer(Mutect2.java:242)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:171)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)
    Caused by: htsjdk.samtools.SAMException: Fasta index file could not be opened: /home/sferraresso/sammarco/Data_analysis/Genome/Felis_catus_9.0.fa.fai
    at htsjdk.samtools.reference.FastaSequenceIndex.<init>(FastaSequenceIndex.java:74)
    at htsjdk.samtools.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:98)
    at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:139)
    at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:148)
    ... 13 more
    Caused by: java.nio.file.FileSystemException: /home/sferraresso/sammarco/Data_analysis/Genome/Felis_catus_9.0.fa.fai: Troppi file aperti
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at htsjdk.samtools.reference.FastaSequenceIndex.<init>(FastaSequenceIndex.java:71)
    ... 16 more

    0
    Comment actions Permalink
  • Avatar
    Serena Ferraresso

    Here are the permissions of ref and index files:

    -rw-rw-r-- 1 sferraresso sferraresso 2,4G 4 mar 22.39 Felis_catus_9.0.fa

    -rw-rw-r-- 1 sferraresso sferraresso 165K 15 giu 12.31 Felis_catus_9.0.fa.fai

    0
    Comment actions Permalink
  • Avatar
    Bhanu Gandham

    Serena Ferraresso

     

    This is weird. This issue is almost always resolved by recreating the fasta file. As mentioned by you earlier, maybe this issue is because of the old version of samtools you are using(2013). Can you try to use a more recent version and see if that resolves the issue? Also while you are updating versions, you might as well also update to the latest GATK v4.1.8.0. I don't think the GATK version is the cause of this error here but no harm in updating to the latest especially since GATKv.4.1.7.0 had some bugs with Mutect2 and HaplotypeCaller tools.

     

    Also can you try to run HaplotypeCaller with the same ref and input bam. So just a basic - HaplotypeCaller -R -V -O command. This is to test if the issue is in fact with malformed fa.fai file or with Mutect2.

     

     

    0
    Comment actions Permalink
  • Avatar
    Serena Ferraresso

    Thank you again for your help,

    I did run HaplotypeCaller as you suggested and the process completed successfully, no error messages and the output file is ok. I guess the problem is not a malformed fa.fai then. I will try to update both samtools and GATK versions and I will let you know 

    Thank you

    0
    Comment actions Permalink
  • Avatar
    Serena Ferraresso

    Hi again,

    I updated both samtools (to v1.10) and GATK (to v 4.1.8.0). I recreated the index file and I launched Mutect2 but, again, I get the same error and Mutect2 does not produce the stats file. 

    I forgot to mention that I employed Mutect2 with those very same files in order to create the Panel of Normals and everything went fine.

    I confess I'm a bit lost now.

    Could you please confirm that the permissions of the fa.fai file are correct?

    -rw-rw-r-- 1 sferraresso sferraresso 165K Jul 14 16:42 Felis_catus_9.0.fa.fai

     

    Thank you

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    Serena Ferraresso As a hack you may be able to avoid this with the -ignore-itr-artifacts argument.  This flag disables a read transformer that repairs a very rare artifact in which transposons and other repeated elements self-hybridize and then T4 polymerase incorrectly fills in overhanging ends using the repeat DNA as the template.  Unless your data has 20 or so times as many soft-clips as you would usually expect, the artifact is absent.

    We still ought to figure out the issue, though, and I believe I have an answer.  Usually when one gets a "could not open file" exception it's because the file does not exist or one does not have access, but that's not the case here.  Furthermore, I looked into our code and in the GATK if an index is missing we say so explicitly.  The file is present, but not being opened.  Also, we know the file is being opened successfully quite a few times because M2 runs for a few hundred minutes before failing.

    I think the fundamental problem here is related to the "Troppi file aperti" in your stack trace.  For every read shard (about 30000 bases) the GATK initializes a new read transformer.  This is fine in HaplotypeCaller because its read transformer does not use the reference, but not in M2 due to the aforementioned inverted tandem repeat transformer.

    If you want to run M2 with default settings, you can probably play around with your ulimit settings to avoid the error.  I'm also very surprised that the JVM garbage collector is leaving these file handles open, since each read shard is processed separately in series.  You could also scatter the genome into multiple runs of M2 and merge the outputs.

     

    0
    Comment actions Permalink
  • Avatar
    Serena Ferraresso

    Dear David,

    thank you for your reply. I now launched Mutect2 with --ignore-itr-artifacts argument, I will keep you updated.

    I already increased the ulimit to 10000 before the last run but maybe it wasn't enough, I will try to increase it even more.

    Thank you

    Serena

    0
    Comment actions Permalink
  • Avatar
    Serena Ferraresso

    Dear David,

    I launched M2 with the --ignore-itr-artifacts argument and it worked fine, M2 completed succesfully and produced the stats file.

    Anyway I also increased the ulimit and now I am trying again without --ignore-itr-artifacts argument, lets see if the problem is solved. 

     

    I would have another question if you don't mind. I launched M2 in paired tumor/normal mode (see above the command line) but it is not clear to me whether to launch LearnReadOrientationMode I have to run M2 also in tumor-only mode to produce the f1r2.tar.gz file or I can simply add the --f1r2-tar-gz argument to the paired mode.

    Thank you for your help

    Serena

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    You can add the -f1r2-tar-gz argument in paired mode.  Mutect2 does not record F1R2 data for the normal even if it is present.

    0
    Comment actions Permalink
  • Avatar
    Serena Ferraresso

    Dear David,

    this is just to inform you that, by increasing the ulimit, M2 completed successfully without the  --ignore-itr-artifacts argument.

     

    Thank you again for your help.

    Serena

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    Thank you for letting me know!

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk