
No Pileup Tables

Answered

21 comments

  • Josh Evans

    Hi Wesley,

    Thanks for writing in! Since this sounds like an issue on Terra, can you share the workspace where you are seeing this issue with Terra Support? The Share option is in the three-dots menu at the top right of your workspace:

    1. Toggle the "Share with support" button to "Yes"
    2. Click Save

    Please provide us with a link to your workspace. We’ll be happy to take a closer look as soon as we can!

    Please let me know if you have any questions.

    Best,

    Josh

  • Wesley Kwong

    Hi Josh,

    I have toggled the support button. 

    The workspace link is here.

    Thank you!

  • Josh Evans

    Hi Wesley,

    Thanks for getting back to me! I'm going to investigate this for you and I'll let you know once I have any updates.  

    Best,

    Josh

  • Josh Evans

    Hi Wesley, 

    I've been looking at the workflow and have two suggestions for likely causes of these errors:

    1. The workflow may simply need more memory to fully create the pileup tables, so I'd suggest rerunning it with more memory.
    2. I noticed that the variants_for_contamination variable references a bucket outside of this workspace. Could you please confirm that you have access to the data in that location, as that could also be a cause of this error (a quick check is sketched below).
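
    For example, running gsutil ls on the file path should list it if your account has access (the bucket path below is a placeholder, since the actual location isn't shown here):

    gsutil ls gs://EXTERNAL_BUCKET/path/to/variants_for_contamination.vcf.gz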

    Please let me know if those two suggestions are helpful or if you have any questions.

    Best,

    Josh

  • Wesley Kwong

    Hi Josh,

    I reran the workflow with more memory by clicking the rerun-with-more-memory option with a memory retry factor of 1.5, and I'm still getting the same error.

    I can confirm that Terra has access to this data.

  • Josh Evans

    Hi Wesley,

    Thanks for getting back to me! This workflow might require more than double the base memory to process this data, so I would suggest modifying the mem variable for the M2 task and increasing the memory allocation from there.

    That should give more memory directly to the task where we are seeing our errors.
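
    For context, the task-level memory setting ultimately determines the Java heap passed to GATK on the command line. As a minimal sketch (the 16 GB heap and file names below are placeholders, not a recommendation for your data):

    gatk --java-options "-Xmx16g" Mutect2 -R Homo_sapiens_assembly38.fasta -I tumor.bam -O output.vcf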

    Please let me know how that goes or if you have any questions.

    Best,

    Josh

  • Wesley Kwong

    Hi Josh,

    Modifying the memory as suggested seems to have solved one problem. Now I'm getting this error for both the normal and tumor pileup tables:

    2022/06/06 19:42:33 Starting container setup.
    2022/06/06 19:42:35 Done container setup.
    2022/06/06 19:42:38 Starting localization.
    2022/06/06 19:42:53 Localization script execution started...
    2022/06/06 19:42:53 Localizing input gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/8caf4739-6b66-49d6-9af8-85f550cd481c/Mutect2/ca84dbbe-be1d-429a-9dc0-13794510f65d/call-MergeTumorPileups/script -> /cromwell_root/script
    2022/06/06 19:42:57 Localizing input gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict -> /cromwell_root/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict
    2022/06/06 19:42:58 Localization script execution complete.
    2022/06/06 19:43:05 Done localization.
    2022/06/06 19:43:06 Running user action: docker run -v /mnt/local-disk:/cromwell_root -v /mnt/d-1b2e5749db4b0a6439c4895809508e1e:/mnt/af7b5955462dc70f18fa6a82eae18e22:ro --entrypoint=/bin/bash broadinstitute/gatk@sha256:21c3cb43b7d11891ed4b63cc7274f20187f00387cfaa0433b3e7991b5be34dbe /cromwell_root/script
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.a0fd8f31
    USAGE: GatherPileupSummaries [arguments]
    
    Combine output files from GetPileupSummary in the order defined by a sequence dictionary
    Version:4.2.6.1
    
    
    Required Arguments:
    
    --I <File>                    an output of PileupSummaryTable  This argument must be specified at least once. Required. 
    
    --O <File>                    output  Required. 
    
    --sequence-dictionary <File>  sequence dictionary file  Required. 
    
    
    Optional Arguments:
    
    --arguments_file <File>       read one or more arguments files and add them to the command line  This argument may be
                                  specified 0 or more times. Default value: null. 
    
    --gatk-config-file <String>   A configuration file to use with the GATK.  Default value: null. 
    
    --gcs-max-retries,-gcs-retries <Integer>
                                  If the GCS bucket channel errors out, how many times it will attempt to re-initiate the
                                  connection  Default value: 20. 
    
    --gcs-project-for-requester-pays <String>
                                  Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be
                                  accessed.  User must have storage.buckets.get permission on the bucket being accessed. 
                                  Default value: . 
    
    --help,-h <Boolean>           display the help message  Default value: false. Possible values: {true, false} 
    
    --QUIET <Boolean>             Whether to suppress job-summary info on System.err.  Default value: false. Possible
                                  values: {true, false} 
    
    --tmp-dir <GATKPath>          Temp directory to use.  Default value: null. 
    
    --use-jdk-deflater,-jdk-deflater <Boolean>
                                  Whether to use the JdkDeflater (as opposed to IntelDeflater)  Default value: false.
                                  Possible values: {true, false} 
    
    --use-jdk-inflater,-jdk-inflater <Boolean>
                                  Whether to use the JdkInflater (as opposed to IntelInflater)  Default value: false.
                                  Possible values: {true, false} 
    
    --verbosity <LogLevel>        Control verbosity of logging.  Default value: INFO. Possible values: {ERROR, WARNING,
                                  INFO, DEBUG} 
    
    --version <Boolean>           display the version number for this tool  Default value: false. Possible values: {true,
                                  false} 
    
    
    Advanced Arguments:
    
    --showHidden <Boolean>        display hidden arguments  Default value: false. Possible values: {true, false} 
    
    
    ***********************************************************************
    
    A USER ERROR has occurred: Illegal argument value: Positional arguments were provided ',SRR7588418.hg38.tsv}' but no positional argument is defined for this tool.
    
    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
    Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3500m -jar /root/gatk.jar GatherPileupSummaries --sequence-dictionary /cromwell_root/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict -I -O SRR7588418.hg38.tsv
    2022/06/06 19:43:15 Starting delocalization.
    2022/06/06 19:43:16 Delocalization script execution started...
    2022/06/06 19:43:16 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/8caf4739-6b66-49d6-9af8-85f550cd481c/Mutect2/ca84dbbe-be1d-429a-9dc0-13794510f65d/call-MergeTumorPileups/memory_retry_rc
    2022/06/06 19:43:19 Delocalizing output /cromwell_root/rc -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/8caf4739-6b66-49d6-9af8-85f550cd481c/Mutect2/ca84dbbe-be1d-429a-9dc0-13794510f65d/call-MergeTumorPileups/rc
    2022/06/06 19:43:21 Delocalizing output /cromwell_root/stdout -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/8caf4739-6b66-49d6-9af8-85f550cd481c/Mutect2/ca84dbbe-be1d-429a-9dc0-13794510f65d/call-MergeTumorPileups/stdout
    2022/06/06 19:43:22 Delocalizing output /cromwell_root/stderr -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/8caf4739-6b66-49d6-9af8-85f550cd481c/Mutect2/ca84dbbe-be1d-429a-9dc0-13794510f65d/call-MergeTumorPileups/stderr
    2022/06/06 19:43:24 Delocalizing output /cromwell_root/SRR7588418.hg38.tsv -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/8caf4739-6b66-49d6-9af8-85f550cd481c/Mutect2/ca84dbbe-be1d-429a-9dc0-13794510f65d/call-MergeTumorPileups/SRR7588418.hg38.tsv
    Required file output '/cromwell_root/SRR7588418.hg38.tsv' does not exist.

  • Josh Evans

    Hi Wesley,

    I'm glad we were able to help you resolve the first issue! From what I can see, it appears that the .tsv file, together with gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict, is being passed into the command in a way the tool doesn't expect. I'm going to do some more research on my end and let you know once I have an update.

    Best,

    Josh

  • Genevieve Brandt (she/her)

    Hi Wesley Kwong,

    Thanks for your patience. I took a look at this issue with Josh and found that your script running GatherPileupSummaries is not configured properly. This error message indicates that you did not provide the arguments as required by the tool:

    A USER ERROR has occurred: Illegal argument value: Positional arguments were provided ',SRR7588418.hg38.tsv}' but no positional argument is defined for this tool.

    To resolve this error, I would recommend taking another look at your command line and making corrections so that each input is configured properly (see the example below).
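
    For reference, a correctly formed invocation based on the usage text above would look roughly like this (the shard file names are placeholders):

    gatk GatherPileupSummaries \
        --sequence-dictionary Homo_sapiens_assembly38.dict \
        -I tumor_shard_0.tsv \
        -I tumor_shard_1.tsv \
        -O SRR7588418.hg38.tsv

    In the failing command shown in the log, -I appears to have been left without a file value, which would explain why the output name was treated as a positional argument.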

    Let us know if you have any other questions.

    Best,

    Genevieve

  • Wesley Kwong

    Hi Josh and Genevieve,

    After some troubleshooting, I was able to run the Mutect2 pipeline successfully when I added both the PoN and gnomAD files without the variants_for_contamination files. But once I add the variants_for_contamination files (including the index) taken from the GCP gatk-best-practices bucket, I get this error:

    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx149500m -jar /root/gatk.jar Mutect2 -R gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -I gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/cfce2061-efd6-449e-bdc9-a7ff2b633644/PreProcessingForVariantDiscovery_GATK4/b4adf777-4f97-425c-b3e2-b37c9d927667/call-GatherBamFiles/SRR7588418.hg38.bam -tumor SRR7588418 -I gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/cfce2061-efd6-449e-bdc9-a7ff2b633644/PreProcessingForVariantDiscovery_GATK4/380dbed8-90e7-42a3-9fb8-10607c1ac950/call-GatherBamFiles/SRR7588413.hg38.bam -normal SRR7588413 --germline-resource gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/af-only-gnomad.hg38.vcf.gz -pon gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/1000g_pon.hg38.vcf.gz -L gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/81583498-648e-4e70-8452-80509b626927/Mutect2/dbb6ef96-ea07-4cfe-9e85-3b133c6d89ea/call-SplitIntervals/cacheCopy/glob-0fc990c5ca95eebc97c4c204e3e303e1/0000-scattered.interval_list -O output.vcf
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.c880de1b
    21:30:55.896 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
    21:30:55.924 INFO  GetPileupSummaries - ------------------------------------------------------------
    21:30:55.925 INFO  GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.6.1
    21:30:55.925 INFO  GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
    21:30:55.925 INFO  GetPileupSummaries - Executing as root@42c5b048ff41 on Linux v5.10.107+ amd64
    21:30:55.925 INFO  GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
    21:30:55.925 INFO  GetPileupSummaries - Start Date/Time: June 22, 2022 9:30:55 PM GMT
    21:30:55.925 INFO  GetPileupSummaries - ------------------------------------------------------------
    21:30:55.925 INFO  GetPileupSummaries - ------------------------------------------------------------
    21:30:55.926 INFO  GetPileupSummaries - HTSJDK Version: 2.24.1
    21:30:55.926 INFO  GetPileupSummaries - Picard Version: 2.27.1
    21:30:55.926 INFO  GetPileupSummaries - Built for Spark Version: 2.4.5
    21:30:55.926 INFO  GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    21:30:55.926 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    21:30:55.926 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    21:30:55.926 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    21:30:55.926 INFO  GetPileupSummaries - Deflater: IntelDeflater
    21:30:55.926 INFO  GetPileupSummaries - Inflater: IntelInflater
    21:30:55.926 INFO  GetPileupSummaries - GCS max retries/reopens: 20
    21:30:55.926 INFO  GetPileupSummaries - Requester pays: disabled
    21:30:55.927 INFO  GetPileupSummaries - Initializing engine
    21:30:59.931 INFO  FeatureManager - Using codec VCFCodec to read file gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/small_exac_common_3.hg38.vcf.gz
    21:31:00.474 INFO  GetPileupSummaries - Shutting down engine
    [June 22, 2022 9:31:00 PM GMT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 0.08 minutes.
    Runtime.totalMemory()=2452094976
    ***********************************************************************
    
    A USER ERROR has occurred: An index is required but was not found for file gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/small_exac_common_3.hg38.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input.
    
    ***********************************************************************

    When I took a look at the WDL script on GitHub, I noticed that the M2 task accepts both the variants_for_contamination and variants_for_contamination_idx files as input. But unlike the variants_for_contamination variable, I do not see variants_for_contamination_idx used anywhere. Could you look into this?

    Thank you!

  • Samantha (she/her)

    Hi Wesley Kwong,

    Can you share the Submission ID so we can take a closer look at the issue?

    Best,

    Samantha

  • Wesley Kwong

    Hi Samantha,

    The submission id is 81583498-648e-4e70-8452-80509b626927.

    Thank you!

  • Samantha (she/her)

    Hi Wesley Kwong,

    It looks like the error message in that submission is the one Genevieve pointed out in her latest message:

    A USER ERROR has occurred: Illegal argument value: Positional arguments were provided ',SRR7588418.hg38.tsv}' but no positional argument is defined for this tool.

    As she recommended, to solve this error, you should take another look at your command line and make corrections so that each input is configured properly.

    I'm still not sure where you are seeing this error:

    A USER ERROR has occurred: An index is required but was not found for file gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/small_exac_common_3.hg38.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input.
    

    If you are still encountering this error, please let me know the submission ID so I can take a closer look.

    Best,

    Samantha

  • Wesley Kwong

    Hi Samantha,

    I recognize that I am still getting the same error message where it's asking for the SRR7588418.hg38.tsv file. But I believe this file, which would be the input for generating the pileup tables, comes from the output of M2 running successfully.

    If you look under the M2 logs, that is where you will find the error that occurs before the pileup error message:

    A USER ERROR has occurred: An index is required but was not found for file gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/small_exac_common_3.hg38.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input.

  • Samantha (she/her)

    I see. Can you temporarily share the gs://bruce-processed-data bucket with me (svelasqu@broadinstitute.org)?

    Even though the index file path isn't explicitly being passed to the GetPileupSummaries command, the path should be inferred automatically.
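
    One quick way to confirm that the index is sitting next to the VCF in the bucket (assuming gsutil access):

    gsutil ls gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/small_exac_common_3.hg38.vcf.gz*
    # this should list both the .vcf.gz and its .vcf.gz.tbi index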

  • Wesley Kwong

    I have added you as an admin on the bucket.

  • Mike Schachter

    Hi Samantha (she/her), I work with Wesley on this project. Just wanted to check in on this issue, as we are super blocked here. It seems like there could be a few problems, and I elaborated on them by posting an issue to the GATK GitHub.

    We are passing a variants-for-contamination index into the workflow, but it is never used. It seems like either:

    1. The source code to mutect2.wdl has to be changed to actually use the variants_for_contamination_idx workflow variable, if GetPileupSummaries ever supports it as an input argument.
    2. The source code to mutect2.wdl has to be changed so that it runs IndexFeatureFile on the variants_for_contamination file, prior to calling GetPileupSummaries.
    3. The variants_for_contamination file should be localized before running?
    4. There is something wrong with sending a compressed .gz file as input. (I believe this is not the issue; Wesley has tried passing in an uncompressed file and it still failed.)

    We're super blocked but are willing to try different approaches, so please reach out if you or your team can think of anything!

    0
    Comment actions Permalink
  • Avatar
    Philipp Hähnel

    Hi all,

    the GATK tools look for the index files in the same location as the feature file. Hence, the index files are effectively implicit arguments on the command line. The reason the index files appear as arguments in the WDL is so that they are also localized whenever the feature file is localized for a specific task, allowing the GATK tool to find them. That said, most GATK tools use NIO to stream files, so localization can be optional.
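
    For example, for a block-compressed VCF the tools expect a .tbi index right next to the file. If one is missing, it can be created with IndexFeatureFile (a minimal sketch using GATK 4.2.x syntax):

    gatk IndexFeatureFile -I variants_for_contamination.vcf.gz
    # writes variants_for_contamination.vcf.gz.tbi next to the input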

    Your point 2 is a good idea for adjusting the WDL: make the index files optional and run IndexFeatureFile if they are not supplied. However, since GATK tools essentially always create an index file alongside their output, I suppose it is assumed that the index file is present and can be supplied as a workflow argument.

    3. You can try to localize both variants_for_contamination and variants_for_contamination_idx, but GetPileupSummaries uses NIO, so localization shouldn't be necessary.

    I see that you are working with hg38. To create your own resource of variants for contamination, you can take the publicly available gnomAD v3.1.2 chr1 data set and filter and subset it to biallelic, unfiltered sites with AF > 0.05:

    gatk SelectVariants -V gs://gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz -select 'AF > 0.05' --restrict-alleles-to BIALLELIC --exclude-filtered true -O variants_for_contamination.vcf.gz

    This essentially recreates the best practice resource. SelectVariants also creates an index file as output, which you should supply to the variant calling WDL. 
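
    The outputs of that command would then map onto the workflow inputs roughly as follows (a sketch, using the input names discussed above):

    # variants_for_contamination      -> variants_for_contamination.vcf.gz
    # variants_for_contamination_idx  -> variants_for_contamination.vcf.gz.tbi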

    If you still don't have any luck and it is indeed an issue with the WDL, you can have a look at my updated Mutect2 workflow WDLs, which I've recently run successfully.

    Best,

    Philipp

  • Samantha (she/her)

    Thanks, Philipp Hähnel.

    Wesley Kwong - were you able to resolve your issue with Philipp's advice?

    Best,

    Samantha

  • Wesley Kwong

    Hi all,

    My apologies for not being able to reply quickly.

    The issue was with the preexisting contamination file. Using Philipp's command to generate my own contamination files solved the problem.

    Thank you so much for all your help guys! 

  • Genevieve Brandt (she/her)

    Great, glad you were able to solve the issue! Thanks Wesley!
