Mutect2 PON - gnomAD input?
Can you please provide
a) GATK version used
b) Exact GATK commands used
c) The entire error log if applicable.
I am trying to run the Mutect2 PON workflow (https://portal.firecloud.org/?return=terra#methods/gatk/mutect2_pon/9) on Terra, and one of the required input variables seems to be 'gnomAD' (to be entered as a string). What sort of gnomAD file should be linked here, and what will it do in the process of constructing the PON?
Apologies if this is somewhere in the documentation; I did look and could not find it among the linked readme docs.
Thanks !
Mia
-
Hi Mia!
I think you may find the answer in the Mutect2 doc here (Ctrl/Cmd+F for "gnomad" to find the information easier). It may also help to view the files in the Somatic-SNVs-Indels featured workspace, so you can take a look at the files directly.
If you still have questions about this, please let us know!
Kind regards,
Jason
-
Thanks Jason! This is super helpful !
I now actually ran the workflow, but it failed. If I go to 'Job Manager' I get:
Job Manager is running but encountered a problem getting data from its workflow server.
500: Internal Server Error
And so I am having a hard time figuring out what went wrong.
Other jobs have the logs working, any clues?
Thanks !
Mia
-
Hi Mia,
Hmm, can you see the job details if you look at the job in the FireCloud interface, rather than the Terra interface?
You should see a page like the following:
If you search for your workspace and click on it, you should be able to find your job in the Monitor tab.
Kind regards,
Jason
-
Thanks Jason ! I can see it now, it seems that the issue is :
Bucket is requester pays bucket but no user project provided.
How can I specify that? I thought charges are always automatically taken off the project codes linked to my work space.
I copy the whole error below, in case something else is in fact a problem
Many thanks,
Mia
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.0733e1f9
23:26:46.226 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
23:26:47.421 INFO GetSampleName - ------------------------------------------------------------
23:26:47.422 INFO GetSampleName - The Genome Analysis Toolkit (GATK) v4.1.2.0
23:26:47.423 INFO GetSampleName - For support and documentation go to https://software.broadinstitute.org/gatk/
23:26:47.424 INFO GetSampleName - Executing as root@8607337218f1 on Linux v4.19.112+ amd64
23:26:47.425 INFO GetSampleName - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
23:26:47.426 INFO GetSampleName - Start Date/Time: June 4, 2020 11:26:46 PM UTC
23:26:47.426 INFO GetSampleName - ------------------------------------------------------------
23:26:47.427 INFO GetSampleName - ------------------------------------------------------------
23:26:47.427 INFO GetSampleName - HTSJDK Version: 2.19.0
23:26:47.428 INFO GetSampleName - Picard Version: 2.19.0
23:26:47.428 INFO GetSampleName - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:26:47.430 INFO GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:26:47.430 INFO GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:26:47.430 INFO GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:26:47.431 INFO GetSampleName - Deflater: IntelDeflater
23:26:47.431 INFO GetSampleName - Inflater: IntelInflater
23:26:47.431 INFO GetSampleName - GCS max retries/reopens: 20
23:26:47.432 INFO GetSampleName - Requester pays: disabled
23:26:47.433 WARN GetSampleName -
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    Warning: GetSampleName is a BETA tool and is not yet ready for use in production
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
23:26:47.433 INFO GetSampleName - Initializing engine
23:26:50.521 INFO GetSampleName - Shutting down engine
[June 4, 2020 11:26:50 PM UTC] org.broadinstitute.hellbender.tools.GetSampleName done. Elapsed time: 0.08 minutes.
Runtime.totalMemory()=54853632
code: 400
message: Bucket is requester pays bucket but no user project provided.
reason: required
location: null
retryable: false
com.google.cloud.storage.StorageException: Bucket is requester pays bucket but no user project provided.
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:227)
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:438)
    at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:239)
    at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:236)
    at shaded.cloud_nio.com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
    at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
    at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:235)
    at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.checkAccess(CloudStorageFileSystemProvider.java:687)
    at java.nio.file.Files.exists(Files.java:2385)
    at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:404)
    at org.broadinstitute.hellbender.engine.ReadsDataSource.<init>(ReadsDataSource.java:206)
    at org.broadinstitute.hellbender.engine.ReadsDataSource.<init>(ReadsDataSource.java:162)
    at org.broadinstitute.hellbender.engine.GATKTool.initializeReads(GATKTool.java:446)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:695)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
Caused by: shaded.cloud_nio.com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Bucket is requester pays bucket but no user project provided.",
    "reason" : "required"
  } ],
  "message" : "Bucket is requester pays bucket but no user project provided."
}
    at shaded.cloud_nio.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
    at shaded.cloud_nio.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
    at shaded.cloud_nio.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
    at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:401)
    at shaded.cloud_nio.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1132)
    at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:499)
    at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
    at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:435)
    ... 19 more
Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3000m -jar /root/gatk.jar GetSampleName -R gs://gatk-best-practices/somatic-b37/Homo_sapiens_assembly19.fasta -I gs://fc-secure-ff8156a3-ddf3-42e4-9211-0fd89da62108/GTEx_Analysis_2017-06-05_v8_WES_BAM_files/GTEX-11EM3-0004-SM-58Q9C.bam -O tumor_name.txt -encode
-
Hi Mia,
This actually isn't automatic for workflows! You'll want to add this flag to your command: https://gatk.broadinstitute.org/hc/en-us/articles/360041416112-Mutect2#--gcs-project-for-requester-pays
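As a rough sketch of what that looks like, here is the GetSampleName call from your log with the flag appended. The project ID is a made-up placeholder, and the command is only printed here rather than executed, so you can check it before running anything:

```shell
# Hypothetical billing project ID; replace with your own Google project.
project_to_bill="my-billing-project"

# Same GetSampleName call as in the log above, plus the requester-pays flag.
# Kept in a variable and printed so it can be inspected before running.
get_sample_name_cmd="gatk GetSampleName \
-R gs://gatk-best-practices/somatic-b37/Homo_sapiens_assembly19.fasta \
-I gs://fc-secure-ff8156a3-ddf3-42e4-9211-0fd89da62108/GTEx_Analysis_2017-06-05_v8_WES_BAM_files/GTEX-11EM3-0004-SM-58Q9C.bam \
-O tumor_name.txt -encode \
--gcs-project-for-requester-pays ${project_to_bill}"

echo "$get_sample_name_cmd"
```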
Kind regards,
Jason
-
Hi Jason,
Thanks for following up.
The JSON file that came with this particular PON workflow did not include this argument. Should I just add it to the JSON directly and re-upload?
I am a little confused, as I was able to run the Mutect2 workflow itself successfully from the same workspace before without specifying the cost code. How would I know when this needs to be specified?
Thanks,
Mia
-
Hi Team,
Kind reminder, I would really appreciate a follow up, as I am still stuck.
Many thanks,
Mia
-
Hi Mia,
Apologies for the delay here. The original WDL is not expecting a requester pays bucket file so it doesn't account for it. As far as knowing when it needs to be specified, that would likely require some level of understanding about the nature of the bucket you are trying to get data from.
To get around this, you will have to make a copy of the WDL and add that argument to the command. I recommend writing it in such a way that it's optional, where if you provide a project in the method configuration it will run it with the command argument, and if you don't provide a project it will run it normally. This is to avoid the case of getting charged for every bucket you pull data from, as I'm not entirely certain what the behavior is on the Google side if you include the option and access a bucket that is not requester pays. I would hate to see you getting charged unnecessarily.
The input block for the task would include something like String? project_to_bill which shows it's an optional parameter, and the command would look something like this:
gatk --java-options "-Xmx~{command_mem}m" Mutect2 \
-R ~{ref_fasta} \
$tumor_command_line \
$normal_command_line \
~{"--germline-resource " + gnomad} \
~{"-pon " + pon} \
~{"-L " + intervals} \
~{"--alleles " + gga_vcf} \
-O "~{output_vcf}" \
~{true='--bam-output bamout.bam' false='' make_bamout} \
~{true='--f1r2-tar-gz f1r2.tar.gz' false='' run_ob_filter} \
~{m2_extra_args} \
~{"--gcs-project-for-requester-pays " + project_to_bill}
Notice that in the last line I added an extra argument inside a ~{...} placeholder that joins the flag to project_to_bill, meaning it'll only be added if a value for project_to_bill is provided. If you know or find out that you are using a requester pays bucket, you can provide your project in the method configuration and the command will run with the argument. You would want to set something like this up for any task that might come into contact with a requester pays bucket.
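To see the "only add it when a value is provided" behaviour outside of WDL, here is a small plain-shell analogue. It is only an illustration of the idea, not part of the workflow: the function name and the project ID are made up, and it uses shell parameter expansion rather than the WDL placeholder.

```shell
# Print the requester-pays argument only when a billing project is given.
# ${1:+...} expands to nothing when $1 is unset or empty.
requester_pays_arg() {
  printf '%s' "${1:+--gcs-project-for-requester-pays $1}"
}

# "my-billing-project" is a hypothetical project ID.
with_project=$(requester_pays_arg "my-billing-project")
without_project=$(requester_pays_arg "")

echo "with project:    [${with_project}]"
echo "without project: [${without_project}]"
```

With a project the flag and its value are emitted together, and without one the argument disappears entirely, so the rest of the command is unchanged.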
Alternatively, you can copy the file to your local workspace and give it to the original workflow, pulling it from the workspace rather than its original requester pays location.
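A minimal sketch of that copy, assuming you use gsutil with its -u option to name the project billed for reading from the requester pays bucket. The project ID and workspace bucket name are placeholders, and the command is printed rather than run:

```shell
# Hypothetical names; replace with your billing project and workspace bucket.
project_to_bill="my-billing-project"
workspace_bucket="gs://my-workspace-bucket"

# BAM path taken from the error log earlier in this thread.
source_bam="gs://fc-secure-ff8156a3-ddf3-42e4-9211-0fd89da62108/GTEx_Analysis_2017-06-05_v8_WES_BAM_files/GTEX-11EM3-0004-SM-58Q9C.bam"

# -u tells gsutil which project to bill for the request.
copy_cmd="gsutil -u ${project_to_bill} cp ${source_bam} ${workspace_bucket}/"

echo "$copy_cmd"
```

Any index file that sits next to the BAM would need the same treatment, and the workflow inputs would then point at the copies in the workspace bucket.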
If you have any questions about any of this, please let me know.
Kind regards,
Jason
-
Hi Jason,
I have introduced the project to bill argument, but the job still fails:
1) Can you please have a look at whether I did this correctly?
2) I again cannot get the error file. I tried via both the Terra and FireCloud interfaces, and I see the screen in the attachment. So at this point, I am not sure whether the workflow fails because of how I specified the project to bill or because of something else. Can you please let me know?
Thanks
Mia
-
Hi Mia,
I can take a look. Can you provide the submission ID for the job you're referring to so I can make sure I'm looking at the right job? Is this also happening in 661-Clonal hematopoiesis?
Kind regards,
Jason
-
Hi Jason,
Yes still 661-Clonal hematopoiesis and the submission ID is 8d633868-5d66-4efe-aa3b-f266ca03e279
Many thanks !
Mia
-
It would be nice to set the billing project with an environment variable so that users do not have to modify every command in a task (e.g., the M2 task has 5 commands that need to be modified). Thanks.
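In the meantime, something close to this can be approximated inside a task's command block with an ordinary shell variable. This is only a sketch: GATK itself does not read this variable, the project ID and file names are placeholders, and the commands are printed rather than executed.

```shell
# Hypothetical billing project ID; leave empty to omit the flag everywhere.
project_to_bill="my-billing-project"

# Defined once; expands to nothing when project_to_bill is empty.
requester_pays_arg="${project_to_bill:+--gcs-project-for-requester-pays ${project_to_bill}}"

# Hypothetical inputs standing in for the task's real arguments.
ref_fasta="ref.fasta"
tumor_bam="tumor.bam"

# Each command reuses the same variable instead of repeating the flag logic.
echo "gatk GetSampleName -R ${ref_fasta} -I ${tumor_bam} -O tumor_name.txt ${requester_pays_arg}"
echo "gatk Mutect2 -R ${ref_fasta} -I ${tumor_bam} -O output.vcf ${requester_pays_arg}"
```

Each command still has to carry the variable, but the project is then set in exactly one place.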
-
Hi Mia,
Thank you for that. I'll take a look as soon as I can.
Josh, thank you for the suggestion. I will pass this along to the GATK workflow team.
Kind regards,
Jason
-
Actually, it would be even better if gatk automatically used the current billing project for requester pays buckets, unless a project was explicitly specified. Thanks.
-
Hi Josh,
I've been informed by a colleague that the best way to provide feedback to the workflows team is to raise an issue on the GATK GitHub. Would you be able to raise your feature request there, so that you can track its status? You can post it here: https://github.com/broadinstitute/gatk/issues
Kind regards,
Jason
-
Hi MPetlj,
I see you have a successful run of Mutect2_PON on TEST-GTEx-GRCh37-WES in submission ID 6dc2993c-f65c-41b9-b33f-b1ebc07a0f2f. Does this show resolution of the issue you experienced in 8d633868-5d66-4efe-aa3b-f266ca03e279?
Kind regards,
Jason
-
Hi Jason,
Joshua uploaded the modified workflow to my workspace. I believe the issue was that the billing project had to be specified in multiple places in the script. Joshua followed up on the GATK GitHub, as you suggested. I think the workflow would be more user-friendly if it either used the default cost object associated with the workspace or had a single input argument that sets the cost code throughout the script.
Thanks for helping,
Mia
-
Hi Mia,
Gotcha - thanks for the explanation. I agree that either of these options would make for a better user experience. Hoping that the GATK team implements one of these changes in their next version!
If we can be of any further assistance, please let us know!
Kind regards,
Jason