Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 PON - gnomAD input?

18 comments

  • Jason Cerrato

    Hi Mia!

    I think you may find the answer in the Mutect2 doc here (Ctrl/Cmd+F for "gnomad" to find the information more easily). It may also help to open the Somatic-SNVs-Indels featured workspace, so you can take a look at the files directly.

    If you still have questions about this, please let us know!

    Kind regards,

    Jason

  • MPetlj

    Thanks Jason! This is super helpful!

    I have now run the workflow, but it failed. If I go to 'Job Manager' I get:

         Job Manager is running but encountered a problem getting data from its workflow server.

         500: Internal Server Error

    And so I am having a hard time figuring out what went wrong.

    Other jobs have the logs working, any clues? 

    Thanks ! 

    Mia 

  • Jason Cerrato

    Hi Mia,

    Hmm, can you see the job details if you look at the job in the FireCloud interface, rather than the Terra interface?

    https://portal.firecloud.org/

    You should see a page like the following:

    If you search for your workspace and click on it, you should be able to find your job in the Monitor tab.

    Kind regards,

    Jason

  • MPetlj

    Thanks Jason! I can see it now. It seems that the issue is:

    Bucket is requester pays bucket but no user project provided.

    How can I specify that? I thought charges were always automatically taken from the project codes linked to my workspace.

    I have copied the whole error below, in case something else is in fact the problem.

    Many thanks,

    Mia

    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.0733e1f9
    23:26:46.226 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    23:26:47.421 INFO  GetSampleName - ------------------------------------------------------------
    23:26:47.422 INFO  GetSampleName - The Genome Analysis Toolkit (GATK) v4.1.2.0
    23:26:47.423 INFO  GetSampleName - For support and documentation go to https://software.broadinstitute.org/gatk/
    23:26:47.424 INFO  GetSampleName - Executing as root@8607337218f1 on Linux v4.19.112+ amd64
    23:26:47.425 INFO  GetSampleName - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
    23:26:47.426 INFO  GetSampleName - Start Date/Time: June 4, 2020 11:26:46 PM UTC
    23:26:47.426 INFO  GetSampleName - ------------------------------------------------------------
    23:26:47.427 INFO  GetSampleName - ------------------------------------------------------------
    23:26:47.427 INFO  GetSampleName - HTSJDK Version: 2.19.0
    23:26:47.428 INFO  GetSampleName - Picard Version: 2.19.0
    23:26:47.428 INFO  GetSampleName - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    23:26:47.430 INFO  GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    23:26:47.430 INFO  GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    23:26:47.430 INFO  GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    23:26:47.431 INFO  GetSampleName - Deflater: IntelDeflater
    23:26:47.431 INFO  GetSampleName - Inflater: IntelInflater
    23:26:47.431 INFO  GetSampleName - GCS max retries/reopens: 20
    23:26:47.432 INFO  GetSampleName - Requester pays: disabled
    23:26:47.433 WARN  GetSampleName - 

       !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

       Warning: GetSampleName is a BETA tool and is not yet ready for use in production

       !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    
    
    23:26:47.433 INFO  GetSampleName - Initializing engine
    23:26:50.521 INFO  GetSampleName - Shutting down engine
    [June 4, 2020 11:26:50 PM UTC] org.broadinstitute.hellbender.tools.GetSampleName done. Elapsed time: 0.08 minutes.
    Runtime.totalMemory()=54853632
    code:      400
    message:   Bucket is requester pays bucket but no user project provided.
    reason:    required
    location:  null
    retryable: false
    com.google.cloud.storage.StorageException: Bucket is requester pays bucket but no user project provided.
    	at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:227)
    	at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:438)
    	at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:239)
    	at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:236)
    	at shaded.cloud_nio.com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
    	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
    	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
    	at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:235)
    	at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.checkAccess(CloudStorageFileSystemProvider.java:687)
    	at java.nio.file.Files.exists(Files.java:2385)
    	at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:404)
    	at org.broadinstitute.hellbender.engine.ReadsDataSource.<init>(ReadsDataSource.java:206)
    	at org.broadinstitute.hellbender.engine.ReadsDataSource.<init>(ReadsDataSource.java:162)
    	at org.broadinstitute.hellbender.engine.GATKTool.initializeReads(GATKTool.java:446)
    	at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:695)
    	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
    	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    	at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Caused by: shaded.cloud_nio.com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
    {
      "code" : 400,
      "errors" : [ {
        "domain" : "global",
        "message" : "Bucket is requester pays bucket but no user project provided.",
        "reason" : "required"
      } ],
      "message" : "Bucket is requester pays bucket but no user project provided."
    }
    	at shaded.cloud_nio.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
    	at shaded.cloud_nio.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
    	at shaded.cloud_nio.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
    	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:401)
    	at shaded.cloud_nio.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1132)
    	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:499)
    	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
    	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
    	at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:435)
    	... 19 more
    Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3000m -jar /root/gatk.jar GetSampleName -R gs://gatk-best-practices/somatic-b37/Homo_sapiens_assembly19.fasta -I gs://fc-secure-ff8156a3-ddf3-42e4-9211-0fd89da62108/GTEx_Analysis_2017-06-05_v8_WES_BAM_files/GTEX-11EM3-0004-SM-58Q9C.bam -O tumor_name.txt -encode
  • Jason Cerrato

    Hi Mia,

    This actually isn't automatic for workflows! You'll want to add this flag to your command: https://gatk.broadinstitute.org/hc/en-us/articles/360041416112-Mutect2#--gcs-project-for-requester-pays

    Kind regards,

    Jason

  • MPetlj

    Hi Jason,

    Thanks for following up.

    The JSON file that came with this particular PON workflow did not include this argument. Should I just add it to the JSON directly and re-upload?

    I am a little confused, as I was previously able to run the Mutect2 workflow itself successfully from the same workspace without specifying the cost code. How would I know when this needs to be specified?

    Thanks,

    Mia

  • MPetlj

    Hi Team,

    Kind reminder, I would really appreciate a follow up, as I am still stuck. 

    Many thanks,

    Mia

  • Jason Cerrato

    Hi Mia,

    Apologies for the delay here. The original WDL is not expecting a requester pays bucket file so it doesn't account for it. As far as knowing when it needs to be specified, that would likely require some level of understanding about the nature of the bucket you are trying to get data from.
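
One practical way to spot this after a failure is to search a task's log for the exact error text quoted earlier in this thread. The following is a minimal shell sketch under that assumption; the function name `needs_billing_project` and the temporary log file are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the given log file contains
# the requester-pays error message quoted in this thread.
needs_billing_project() {
  log_file="$1"
  grep -qF "Bucket is requester pays bucket but no user project provided." "$log_file"
}

# Example with a throwaway log file standing in for a real task log.
tmp_log="$(mktemp)"
printf '%s\n' "message:   Bucket is requester pays bucket but no user project provided." > "$tmp_log"
if needs_billing_project "$tmp_log"; then
  echo "requester pays bucket: supply --gcs-project-for-requester-pays"
fi
rm -f "$tmp_log"
```

This only detects the problem after a task has already failed; it does not tell you in advance whether a bucket is requester pays.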

    To get around this, you will have to make a copy of the WDL and add that argument to the command. I recommend writing it in such a way that it's optional, where if you provide a project in the method configuration it will run it with the command argument, and if you don't provide a project it will run it normally. This is to avoid the case of getting charged for every bucket you pull data from, as I'm not entirely certain what the behavior is on the Google side if you include the option and access a bucket that is not requester pays. I would hate to see you getting charged unnecessarily.

    The input block for the task would include something like String? project_to_bill which shows it's an optional parameter, and the command would look something like this:

    gatk --java-options "-Xmx~{command_mem}m" Mutect2 \
    -R ~{ref_fasta} \
    $tumor_command_line \
    $normal_command_line \
    ~{"--germline-resource " + gnomad} \
    ~{"-pon " + pon} \
    ~{"-L " + intervals} \
    ~{"--alleles " + gga_vcf} \
    -O "~{output_vcf}" \
    ~{true='--bam-output bamout.bam' false='' make_bamout} \
    ~{true='--f1r2-tar-gz f1r2.tar.gz' false='' run_ob_filter} \
    ~{m2_extra_args} \
    ~{"--gcs-project-for-requester-pays " + project_to_bill}

    Notice in the last line I added an extra argument inside a ~{...} placeholder. Because project_to_bill is optional, the concatenation evaluates to an empty string when no value is provided, so the flag is only added if you supply a project. In cases where you know or find out you are using a requester pays bucket, you can provide your project in the method configuration and the command will run with the argument. You would want to set something like this up for any task that might come in contact with a requester pays bucket.
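
The same "add the flag only when a project is given" behavior can be sketched in plain shell, which may help when trying out a command outside of a workflow. This is an illustration, not part of the Mutect2 WDL: the function names and the file names in the printed command are placeholders, and the command is only printed, never run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the requester-pays flag and its value,
# or nothing at all when no billing project is supplied.
requester_pays_args() {
  project_to_bill="${1:-}"
  if [ -n "$project_to_bill" ]; then
    printf '%s %s\n' "--gcs-project-for-requester-pays" "$project_to_bill"
  fi
}

# Dry run: print the command instead of executing it.
# ref.fasta, tumor.bam and out.vcf are placeholder file names.
build_mutect2_command() {
  extra="$(requester_pays_args "${1:-}")"
  # ${extra:+ $extra} adds a leading space plus the flag only if extra is non-empty.
  echo "gatk Mutect2 -R ref.fasta -I tumor.bam -O out.vcf${extra:+ $extra}"
}

build_mutect2_command ""
build_mutect2_command "my-billing-project"
```

The first call prints the command without the flag; the second appends the flag followed by a space and the project name.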

     

    Alternatively, you can copy the file to your local workspace and give it to the original workflow, pulling it from the workspace rather than its original requester pays location.
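
For that copy, gsutil accepts a top-level -u option naming the project to bill for requester pays access. Below is a hedged sketch that only prints the command; the bucket names and billing project are placeholders, and `print_copy_command` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Dry run: prints the gsutil command rather than running it.
# -u names the project billed for the requester-pays access.
print_copy_command() {
  billing_project="$1"
  source_uri="$2"
  dest_uri="$3"
  printf 'gsutil -u %s cp %s %s\n' "$billing_project" "$source_uri" "$dest_uri"
}

# All three values below are placeholders, not real resources.
print_copy_command "my-billing-project" \
  "gs://some-requester-pays-bucket/sample.bam" \
  "gs://my-workspace-bucket/inputs/"
```

Running the printed command performs the actual copy. A BAM's index file would likely need to be copied the same way.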

     

    If you have any questions about any of this, please let me know.

    Kind regards,

    Jason

  • MPetlj

    Hi Jason,

     

    I have introduced the project to bill argument, but the job still fails:

    1) Can you please have a look at whether I did this correctly?

    2) I again cannot get the error file. I tried via both the Terra and FireCloud interfaces, and I see the attached screen. So at this point, I am not sure whether the workflow fails because of how I specified the project to bill or because of something else. Can you please let me know?

     

    Thanks

    Mia

     

  • Jason Cerrato

    Hi Mia,

    I can take a look. Can you provide the submission ID for the job you're referring to so I can make sure I'm looking at the right job? Is this also happening in 661-Clonal hematopoiesis?

    Kind regards,

    Jason

  • MPetlj

    Hi Jason,

    Yes, still 661-Clonal hematopoiesis, and the submission ID is 8d633868-5d66-4efe-aa3b-f266ca03e279

    Many thanks !

    Mia

     

     

  • Josh Gould

    It would be nice to set the billing project with an environment variable so that users do not have to modify every command in a task (e.g., the M2 task has 5 commands that need to be modified). Thanks.

  • Jason Cerrato

    Hi Mia,

    Thank you for that. I'll take a look as soon as I can.

    Josh, thank you for the suggestion. I will pass this along to the GATK workflow team.

    Kind regards,

    Jason

  • Josh Gould

    Actually, it would be even better if gatk automatically used the current billing project for requester pays buckets, unless a project was explicitly specified. Thanks.

  • Jason Cerrato

    Hi Josh,

    I've been informed by a colleague that the best way to provide feedback for the workflows team is to raise an issue on the GATK GitHub. Would you be able to post your feature request there, so that you can track its status? https://github.com/broadinstitute/gatk/issues

    Kind regards,

    Jason

  • Jason Cerrato

    Hi MPetlj,

    I see you have a successful run of Mutect2_PON on TEST-GTEx-GRCh37-WES in submission ID 6dc2993c-f65c-41b9-b33f-b1ebc07a0f2f. Does this show resolution of the issue you experienced in 8d633868-5d66-4efe-aa3b-f266ca03e279?

    Kind regards,

    Jason

  • MPetlj

    Hi Jason,

    Joshua uploaded the modified workflow to my workspace. I believe the issue was that the billing project had to be specified in multiple places in the script. Joshua followed up on the GATK GitHub, as you suggested. I think the workflow would be more user-friendly if it either used the default cost object associated with the workspace or had a single input argument that sets the cost code throughout the script.

    Thanks for helping, 

    Mia

  • Jason Cerrato

    Hi Mia,

    Gotcha - thanks for the explanation. I agree that either of these options would make for a better user experience. Hoping that the GATK team implements one of these changes in their next version!

    If we can be of any further assistance, please let us know!

    Kind regards,

    Jason
