Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data



ValidateSamFile "tmp not found" ERROR with no output or exception

4 comments

  • Bhanu Gandham

    Can you try it without --TMP_DIR and see if that resolves the issue?

  • UGG

    I have the same problem with both ValidateSamFile and MarkDuplicates from Picard tools. I checked my aligned BAM and fixmate BAM files with ValidateSamFile (picard.jar ValidateSamFile I=input.bam MODE=SUMMARY) and everything was OK. But when I run MarkDuplicates on the sorted file, either with Picard itself (picard.jar MarkDuplicates INPUT=sorted.bam OUTPUT=dedup.bam METRICS_FILE=metrics.txt) or through GATK, I get the same error:

    ERROR 2020-05-04 19:34:24 ValidateSamFile /tmp/ugg/CSPI.12647109141870313024.tmp/5097.tmpnot found

    Here is the error from MarkDuplicates (gatk MarkDuplicates --VALIDATION_STRINGENCY LENIENT -I "sorted.bam" -O "dedup.bam" --METRICS_FILE "dedup_metrics.txt" --REMOVE_DUPLICATES true --ASSUME_SORTED true --CREATE_INDEX true):

    htsjdk.samtools.SAMException: /media/ugg/UGG/tmp/ugg/CSPI.5366672143614319761.tmp/5097.tmpnot found
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:64)
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:49)
    at htsjdk.samtools.util.ResourceLimitedMap.get(ResourceLimitedMap.java:76)
    at htsjdk.samtools.CoordinateSortedPairInfoMap.getOutputStreamForSequence(CoordinateSortedPairInfoMap.java:180)
    at htsjdk.samtools.CoordinateSortedPairInfoMap.put(CoordinateSortedPairInfoMap.java:164)
    at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.put(DiskBasedReadEndsForMarkDuplicatesMap.java:65)
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:566)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:257)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
    at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)
    Caused by: java.io.FileNotFoundException: /media/ugg/UGG/tmp/ugg/CSPI.5366672143614319761.tmp/5097.tmp (Too many open files)
    at java.base/java.io.FileOutputStream.open0(Native Method)
    at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
    at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:61)
    ... 12 more

    The problem seems to be "Too many open files". To solve it, I used the MAX_FILE_HANDLES parameter in Picard to increase the "maximum number of file handles to keep open when spilling read ends to disk", raising it up to 500,000, but that did not work; the result is the same. I have also tried an older version of picard.jar and run the analysis on a workstation (Xeon Gold 5118, 64 GB RAM, 48 CPUs @ 2.30 GHz), but the result is the same.

    Note: setting TMP_DIR does not help.

    I would be glad if anyone could share a solution.

    Thanks..
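The "Too many open files" diagnosis can be confirmed directly before changing any limits. A minimal sketch (Linux-specific; it inspects the current shell, but the same commands work with the PID of the running Java process):

```shell
# Count the file descriptors a process currently holds open (Linux:
# /proc/<pid>/fd contains one symlink per open descriptor). Here we
# inspect the current shell; substitute the PID of the Java process
# running ValidateSamFile or MarkDuplicates to check it instead.
pid=$$
open_fds=$(ls "/proc/$pid/fd" | wc -l)
soft_limit=$(ulimit -Sn)
echo "PID $pid: $open_fds open descriptors (soft limit: $soft_limit)"
```

On systems without /proc, `lsof -p <pid> | wc -l` gives a similar count. If the count approaches the soft limit while the tool is spilling read ends to disk, the limit is the bottleneck.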

  • James

    Removing the TMP_DIR argument did not fix the issue. I tried an older version of Picard (v2.1.1) to see if the problem with ValidateSamFile persisted, and it did. However, it printed a more detailed error message saying "too many open files".

    The solution from this post suggests that the limit on the number of open files needs to be increased. In my case, I had to ask the server admin to increase it to 65,000, and ValidateSamFile now works without issue.

    It's strange that the error was not printed in full by the most recent version of GATK, despite VERBOSITY being set to DEBUG. Is this a bug, perhaps? Nonetheless, the program works as intended, and I suspect this solution will also work for MarkDuplicates.
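Before asking an admin, it is worth checking the current soft and hard limits: the soft limit is what a process actually hits, and a non-root user can raise it only up to the hard limit. A minimal sketch (the ValidateSamFile invocation in the comment is illustrative, not a required form):

```shell
# Soft limit: the value a process actually hits ("Too many open files").
# Hard limit: the ceiling a non-root user may raise the soft limit to;
# going beyond it is what requires an administrator.
echo "soft limit: $(ulimit -Sn)"
echo "hard limit: $(ulimit -Hn)"

# If the hard limit already allows it, raise the soft limit for this
# shell only and launch the tool from the same shell, e.g.:
#   ulimit -n 65000
#   gatk ValidateSamFile -I input.bam --MODE SUMMARY
```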

  • UGG

    Thank you, I have solved it at last :) 

    We can raise the limit with the command below:

    sudo sh -c "ulimit -n 65000 && exec su $LOGNAME"
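One caveat: that `ulimit` only applies to the shell that `su` spawns, so a fresh login reverts to the old limit. A sketch of a persistent alternative, assuming a Linux system where PAM loads pam_limits (the usual default):

```shell
# Persisting the limit: an administrator adds nofile entries to
# /etc/security/limits.conf (read by pam_limits at login), e.g.:
#
#   <username>  soft  nofile  65000
#   <username>  hard  nofile  65000
#
# After logging out and back in, verify the new limit:
ulimit -n
```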


