Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GetPileupSummaries and Mutect2 have an issue with symlinked large files (>150GB)

5 comments

  • Gökalp Çelik

    Hi Daniel

    Are you using Docker to run your GATK workflows? If so, mounting the folder that contains the input files should solve the problem. Why do you need to use symlinks? File systems handle symlinks differently, and sometimes they don't work as expected with many different tools.

    Can you elaborate more on the need for the symlinks? 

  • Daniel

    Hi Gökalp,

    Yes and no: I use the Docker image provided on Docker Hub and run it with Apptainer.
    Mounting the necessary directories does not seem to be the issue: with the same symlinking strategy and the same directories, it works on the smaller files but not on the larger ones.

    I need the files as inputs for a workflow I have written in Snakemake, which uses a different file structure.

    I will check with my admins whether there is something "weird" happening to the symlinks above a certain file size.

    As mentioned before, I can just grab the real path (os.path.realpath) in Snakemake to circumvent this issue, but I am still wondering why this happens.

  • Daniel

    After some digging and help, I figured out that changing the path to something shorter actually allowed me to run the tool!

    Could you comment on a maximum path length, as in number of characters?

  • Gökalp Çelik

    Hi again. Our code does not have a limit per se; however, file systems and OS kernels usually have limits on how long symlinks can be. That could be the reason why your code runs with shorter symlinks but not with longer ones.

    The original error message comes from Java itself, not from GATK. It looks like NIO is trying to read through a file but either cannot get the correct data or gets a bunch of junk bytes through the link, and therefore throws this error.

    In short, this does not look like a GATK issue but most likely a filesystem and/or Java issue.

    I hope this helps. 

  • Daniel

    Thank you for your help!

    This makes sense and I now know how to avoid it ;)

    Cheers!
    Daniel


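The workaround discussed in the thread (resolving the symlink with os.path.realpath and checking whether path length is the culprit) can be sketched roughly as follows. This is only an illustrative sketch, not anything GATK or Snakemake provides: the helper name `describe_path` and the file names in the demo are hypothetical, and the limits are read with `os.pathconf`, which exists on Unix-like systems only. What the limits are, and whether they explain a given failure, depends on the file system and container setup.

```python
import os
import tempfile


def describe_path(path):
    """Resolve a (possibly symlinked) path and compare its length to the
    limits reported by the file system. Hypothetical helper for diagnosis."""
    real = os.path.realpath(path)              # follow all symlinks
    real_len = len(os.fsencode(real))          # kernel limits are in bytes
    link_len = len(os.fsencode(os.path.abspath(path)))

    # Ask the file system holding the real file for its limits.
    parent = os.path.dirname(real) or "."
    try:
        path_max = os.pathconf(parent, "PC_PATH_MAX")
        name_max = os.pathconf(parent, "PC_NAME_MAX")
    except (AttributeError, OSError, ValueError):
        # pathconf is unavailable (e.g. Windows) or the query failed.
        path_max = name_max = None

    # A non-positive value means "no limit reported"; treat it as unknown.
    if path_max is not None and path_max <= 0:
        path_max = None
    if name_max is not None and name_max <= 0:
        name_max = None

    return {
        "is_symlink": os.path.islink(path),
        "real_path": real,
        "link_len": link_len,
        "real_len": real_len,
        "path_max": path_max,
        "name_max": name_max,
        # PATH_MAX counts the terminating null byte, hence strict "<".
        "within_path_max": None if path_max is None else real_len < path_max,
    }


if __name__ == "__main__":
    # Demo with throwaway files; the names are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "sample.bam")
        open(target, "w").close()
        link = os.path.join(tmp, "input.bam")
        os.symlink(target, link)
        for key, value in describe_path(link).items():
            print(f"{key}: {value}")
```

Passing `real_path` to the tool instead of the symlink mirrors the realpath workaround mentioned above. Comparing `link_len` and `real_len` against `path_max` gives a quick indication of whether shortening the path is likely to matter.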