Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mysterious Funcotator issue

  • Genevieve Brandt (she/her)

    Hi Robert Bremel,

    There is an issue with version 1.6 of the Funcotator data sources, and we recommend that users do not use them. Here is the issue ticket for more information: https://github.com/broadinstitute/gatk/issues/7265

    Could you try with the 1.7 data sources and then paste your stack trace again if you get the same issue?

    Thank you,

    Genevieve

  • Robert Bremel

    Thanks Genevieve, 

    I will download and install the 1.7 data sources. 

    I thought I was quite up to date using 1.6, since your examples use 1.2.

    I found a workaround: if I remove two lines from the .vcf (the ostensibly offending one and the one after it), it works. I will get around to running a test in the next week or so.
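The workaround described above (delete the offending record plus the record right after it) can be scripted rather than done by hand. A minimal Python sketch, assuming an uncompressed VCF and that the offending record is identified by its CHROM and POS; `drop_record_and_next` is a hypothetical helper, not part of GATK.

```python
from typing import Iterable, Iterator


def drop_record_and_next(lines: Iterable[str], chrom: str, pos: int) -> Iterator[str]:
    """Yield VCF lines unchanged, except skip the first data record at
    (chrom, pos) and the data record immediately after it.
    Header lines (starting with '#') are always kept."""
    dropped = False    # have we already removed the offending record?
    skip_next = False  # should the next data record be removed too?
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        if skip_next:
            skip_next = False  # this is the record after the offending one
            continue
        fields = line.split("\t", 2)
        if len(fields) < 2:
            yield line  # not a parsable data line; pass it through untouched
            continue
        if not dropped and fields[0] == chrom and int(fields[1]) == pos:
            dropped = True
            skip_next = True
            continue
        yield line
```

For example, with open file handles `src` and `dst`, `dst.writelines(drop_record_and_next(src, "chr1", 12345))` writes the filtered VCF; the contig and position there are placeholders. A bgzipped VCF could be read through `gzip.open(path, "rt")` instead.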

    _Bob

  • Genevieve Brandt (she/her)

    Hi Bob,

    I was looking at this with my coworker and found that I was actually mistaken: the issue affects only the germline 1.6 resources, not the somatic 1.6 resources. So the data sources are not the problem here!

    It looks like you have found another example of this known GATK issue: https://github.com/broadinstitute/gatk/issues/6651. If you want to help out our dev team, you can comment on that issue ticket and provide test data if necessary. Your workaround seems like the best option for now, but hopefully we will be able to get a fix for this bug soon!

    Best,

    Genevieve

  • Robert Bremel

    Thanks again Genevieve,

    I think I stashed the offending lines somewhere; I will see if I can find them. It was a really unusual indel with three TLOD values, each of which mapped an indel perhaps 100 nt long, so I thought a line-length limit might explain the failure.
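To track the offending record down again, it may help to list records whose TLOD field carries more than one value, together with their line lengths, which also lets the line-length hypothesis be checked. A minimal Python sketch, assuming an uncompressed Mutect2 VCF, where TLOD has one value per alternate allele; `multi_tlod_records` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def multi_tlod_records(
    lines: Iterable[str], min_values: int = 2
) -> List[Tuple[int, str, int, int, int]]:
    """Return (line_number, chrom, pos, n_tlod_values, line_length) for every
    data record whose INFO TLOD field has at least `min_values` values."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # skip meta-information and column-header lines
        record = line.rstrip("\n")
        fields = record.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line (INFO is column 8)
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")
            if key == "TLOD" and value:
                n_values = len(value.split(","))
                if n_values >= min_values:
                    hits.append((lineno, fields[0], int(fields[1]), n_values, len(record)))
                break  # TLOD found; no need to scan the rest of INFO
    return hits
```

Sorting the result by its last element puts the longest candidate lines first.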

    Looks like I am having an unlucky streak. We live in a rural area, so I set the v1.7 download to run overnight. It almost finished, but there is no evidence of the files anywhere. I was downloading to an SSD mapped as a Docker volume.

    It seems to have almost finished, and then everything vanished?

    _Bob

    ./gatk FuncotatorDataSourceDownloader --somatic --validate-integrity --extract-after-download

    03:56:45.563 INFO NioFileCopierWithProgressMeter - Transfer: 87.50% complete. Est. time remaining: 46:19.537 (@1413.35 kbps)
    03:57:45.773 INFO NioFileCopierWithProgressMeter - Transfer: 87.75% complete. Est. time remaining: 45:51.935 (@1399.24 kbps)
    03:58:42.348 INFO NioFileCopierWithProgressMeter - Transfer: 88.00% complete. Est. time remaining: 45:01.062 (@1396.40 kbps)
    03:59:38.357 INFO NioFileCopierWithProgressMeter - Transfer: 88.25% complete. Est. time remaining: 44:07.781 (@1394.73 kbps)
    04:00:33.141 INFO NioFileCopierWithProgressMeter - Transfer: 88.50% complete. Est. time remaining: 43:09.865 (@1395.47 kbps)
    04:01:28.669 INFO NioFileCopierWithProgressMeter - Transfer: 88.75% complete. Est. time remaining: 42:14.593 (@1395.20 kbps)
    04:02:03.698 INFO FuncotatorDataSourceDownloader - Shutting down engine
    [May 26, 2021 4:02:03 AM GMT] org.broadinstitute.hellbender.tools.funcotator.FuncotatorDataSourceDownloader done. Elapsed time: 336.46 minutes.
    Runtime.totalMemory()=1351090176
    code: 0
    message: All 0 reopens failed. Waited a total of 0 ms between attempts
    reason: null
    location: null
    retryable: false
    com.google.cloud.storage.StorageException: All 0 reopens failed. Waited a total of 0 ms between attempts
    at com.google.cloud.storage.contrib.nio.CloudStorageRetryHandler.handleReopenForStorageException(CloudStorageRetryHandler.java:156)
    at com.google.cloud.storage.contrib.nio.CloudStorageRetryHandler.handleStorageException(CloudStorageRetryHandler.java:119)
    at com.google.cloud.storage.contrib.nio.CloudStorageReadChannel.handleStorageException(CloudStorageReadChannel.java:272)
    at com.google.cloud.storage.contrib.nio.CloudStorageReadChannel.read(CloudStorageReadChannel.java:162)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    etc.

  • Genevieve Brandt (she/her)

    At the beginning of the log, when GATK starts up, could you share the line that looks like this?

    GCS max retries/reopens: X

    Was your command exactly this? gatk FuncotatorDataSourceDownloader --somatic --validate-integrity --extract-after-download

  • Robert Bremel

    Hi,

    I copied it from the Funcotator webpage. Just to be clear, I am running the Docker version in Docker Desktop on a Windows 10 workstation. The workstation has 128 GB of memory, and I believe Docker grabs about 100 GB at the outset and commandeers more if needed.

    This is all I see when I start up an interactive shell:

    PS C:\Users\Owner> docker run -v G:/gatk_dock:/gatk/mydata -it broadinstitute/gatk:latest
    (gatk) root@6d4d08142dae:/gatk#

    The header from the download contains this line:

    GCS max retries/reopens: 20

    There is 250 GB of free space on the SSD where mydata lives.

    What puzzled me a little is that it wasn't clear where the file was supposed to go. The dataSourcesFolder with v1.6 is in the mydata directory. Is it going to be overwritten?

    Here is the header from the download:

    Using GATK jar /gatk/gatk-package-4.2.0.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.0.0-local.jar FuncotatorDataSourceDownloader --somatic --validate-integrity --extract-after-download
    22:25:36.350 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    May 25, 2021 10:25:36 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    22:25:36.603 INFO FuncotatorDataSourceDownloader - ------------------------------------------------------------
    22:25:36.603 INFO FuncotatorDataSourceDownloader - The Genome Analysis Toolkit (GATK) v4.2.0.0
    22:25:36.603 INFO FuncotatorDataSourceDownloader - For support and documentation go to https://software.broadinstitute.org/gatk/
    22:25:36.604 INFO FuncotatorDataSourceDownloader - Executing as root@d61b3408e462 on Linux v5.4.72-microsoft-standard-WSL2 amd64
    22:25:36.604 INFO FuncotatorDataSourceDownloader - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
    22:25:36.604 INFO FuncotatorDataSourceDownloader - Start Date/Time: May 25, 2021 10:25:36 PM GMT
    22:25:36.605 INFO FuncotatorDataSourceDownloader - ------------------------------------------------------------
    22:25:36.605 INFO FuncotatorDataSourceDownloader - ------------------------------------------------------------
    22:25:36.606 INFO FuncotatorDataSourceDownloader - HTSJDK Version: 2.24.0
    22:25:36.606 INFO FuncotatorDataSourceDownloader - Picard Version: 2.25.0
    22:25:36.606 INFO FuncotatorDataSourceDownloader - Built for Spark Version: 2.4.5
    22:25:36.606 INFO FuncotatorDataSourceDownloader - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    22:25:36.606 INFO FuncotatorDataSourceDownloader - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    22:25:36.606 INFO FuncotatorDataSourceDownloader - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    22:25:36.606 INFO FuncotatorDataSourceDownloader - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    22:25:36.606 INFO FuncotatorDataSourceDownloader - Deflater: IntelDeflater
    22:25:36.606 INFO FuncotatorDataSourceDownloader - Inflater: IntelInflater
    22:25:36.607 INFO FuncotatorDataSourceDownloader - GCS max retries/reopens: 20
    22:25:36.607 INFO FuncotatorDataSourceDownloader - Requester pays: disabled
    22:25:36.607 INFO FuncotatorDataSourceDownloader - Initializing engine
    22:25:36.607 INFO FuncotatorDataSourceDownloader - Done initializing engine
    22:25:36.607 INFO FuncotatorDataSourceDownloader - Somatic data sources selected.
    22:25:36.618 INFO FuncotatorDataSourceDownloader - Collecting expected checksum...
    22:25:38.605 INFO FuncotatorDataSourceDownloader - Collection complete!
    22:25:38.754 INFO NioFileCopierWithProgressMeter - Initiating copy from gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.7.20200521s.tar.gz to file:///gatk/funcotator_dataSources.v1.7.20200521s.tar.gz
    22:25:38.754 INFO NioFileCopierWithProgressMeter - File size: 32188531109 bytes (29 GB).
    22:25:38.754 INFO NioFileCopierWithProgressMeter - Please wait. This could take a while...
    22:26:34.409 INFO NioFileCopierWithProgressMeter - Transfer: 0.25% complete. Est. time remaining: 06:09:40.684 (@1413.63 kbps)
    22:27:28.572 INFO NioFileCopierWithProgressMeter - Transfer: 0.50% complete. Est. time remaining: 06:03:25.201 (@1434.36 kbps)
    22:28:24.809 INFO NioFileCopierWithProgressMeter - Transfer: 0.75% complete. Est. time remaining: 06:05:14.546 (@1423.60 kbps)
    22:29:20.546 INFO NioFileCopierWithProgressMeter - Transfer: 1.00% complete. Est. time remaining: 06:06:05.297 (@1416.77 kbps)
    22:30:20.601 INFO NioFileCopierWithProgressMeter - Transfer: 1.25% complete. Est. time remaining: 06:10:35.585 (@1396.00 kbps)
    22:31:16.326 INFO NioFileCopierWithProgressMeter - Transfer: 1.50% complete. Est. time remaining: 06:08:49.045 (@1399.16 kbps)
    22:32:12.392 INFO NioFileCopierWithProgressMeter - Transfer: 1.75% complete. Est. time remaining: 06:07:36.457 (@1400.19 kbps)
    22:33:07.357 INFO NioFileCopierWithProgressMeter - Transfer: 2.00% complete. Est. time remaining: 06:06:10.094 (@1402.15 kbps)
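The progress lines in this log are consistent with a slow but steady link rather than a stalled transfer. A back-of-the-envelope check in Python, assuming the meter's "kbps" figure is kibibytes (1024 bytes) per second; that unit is inferred here because it reproduces the meter's own estimates, not taken from GATK documentation. `eta_seconds` and `hms` are hypothetical helpers.

```python
def eta_seconds(total_bytes: int, fraction_done: float, rate_kib_per_s: float) -> float:
    """Seconds left for the rest of the file, assuming the logged rate is KiB/s."""
    remaining_bytes = total_bytes * (1.0 - fraction_done)
    return remaining_bytes / (rate_kib_per_s * 1024.0)


def hms(seconds: float) -> str:
    """Format seconds as HH:MM:SS, rounded to the nearest second."""
    s = int(round(seconds))
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


SIZE = 32188531109  # bytes, from the "File size" line of the log

# First progress line: 0.25% done at 1413.63 "kbps"; the log estimated 06:09:40.684.
print(hms(eta_seconds(SIZE, 0.0025, 1413.63)))  # 06:09:41
# Last line before the failure: 88.75% at 1395.20; the log estimated 42:14.593.
print(hms(eta_seconds(SIZE, 0.8875, 1395.20)))  # 00:42:15
# Whole archive at a flat 1,400 KiB/s, in hours.
print(round(eta_seconds(SIZE, 0.0, 1400.0) / 3600, 2))  # 6.24
```

Both figures agree with the meter's own estimates to within a second, which supports the unit assumption. At that rate the 29 GB archive needs a little over six hours, so reaching 88.75% after 336 minutes is on schedule; the run appears to have ended on the StorageException, not on a slowdown.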

  • Genevieve Brandt (she/her)

    Hi Bob,

    That command line looks fine, though you can choose where the data sources are written with the --output argument. The download failure is most likely a transient connection problem, so you can retry without changing anything in your command. You could, however, set --gcs-max-retries to a larger number; the default is 20.
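The log above shows the default destination as file:///gatk/funcotator_dataSources.v1.7.20200521s.tar.gz, which is outside the /gatk/mydata bind mount, and the download ran as root@d61b3408e462 while the later shell is root@6d4d08142dae. If those are separate containers, a file left in the first one's /gatk would not appear on the G: drive, which may be why nothing was found. A minimal Python sketch combining both suggestions, assuming it runs inside the container started with `-v G:/gatk_dock:/gatk/mydata` and that a failed download exits with a non-zero status. `build_command`, `download_with_retries`, the archive name under /gatk/mydata, and the retry and wait values are illustrative choices, not GATK defaults; whether --output expects a file path or a directory should be checked in the FuncotatorDataSourceDownloader documentation.

```python
import subprocess
import time
from pathlib import PurePosixPath
from typing import Callable, List, Sequence

# Container side of the bind mount -v G:/gatk_dock:/gatk/mydata (assumption).
MOUNT = PurePosixPath("/gatk/mydata")


def build_command(output: str, max_retries: int = 50) -> List[str]:
    """The downloader command from this thread plus --output and --gcs-max-retries."""
    out = PurePosixPath(output)
    # Refuse destinations outside the bind mount: files written elsewhere stay
    # in the container's own filesystem, not on the host SSD.
    if out != MOUNT and MOUNT not in out.parents:
        raise ValueError(f"{out} is not under the mounted volume {MOUNT}")
    return [
        "gatk", "FuncotatorDataSourceDownloader", "--somatic",
        "--validate-integrity", "--extract-after-download",
        "--output", str(out),
        "--gcs-max-retries", str(max_retries),
    ]


def download_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    wait_s: float = 60.0,
    run: Callable[[Sequence[str]], int] = lambda c: subprocess.run(list(c)).returncode,
) -> bool:
    """Re-run the whole command until it exits 0 or the attempts run out.
    Each attempt starts over; this sketch does not resume a partial transfer."""
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return True
        if attempt < attempts:
            time.sleep(wait_s)  # give a flaky connection a moment to recover
    return False
```

Inside the container this would be called as `download_with_retries(build_command("/gatk/mydata/funcotator_dataSources.v1.7.20200521s.tar.gz"))`. The `run` parameter exists so the retry logic can be exercised without actually launching GATK.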

    Let me know what you find,

    Genevieve
