Happy New Year, everyone! What better time to roll out something new than the start of a new decade?
Pedants be advised: I don't want to hear it. We're zero-indexing this one.
If you’ve been following our modest little blog, you know that we’ve been preparing to transfer the GATK website and forum to a new home. Well, moving day has finally arrived! If you're reading this, the new site is now public at https://gatk.broadinstitute.org. Yes, this is different from what we originally announced (keeping the same URL); turns out it was a technical necessity to change it, and frankly, we like the shorter URL more. What do you think?
We hope you'll welcome the streamlined, consolidated platform, and forgive the occasional bump in the road as we get everything settled. We'll do our best to make the transition as smooth as possible and we appreciate your patience.
Read on for a rundown of the most important changes, and troubleshooting tips and tricks.
Launch day only (we hope)
Depending on when you read this, the old site… may still be on its way out. We have been working hard to map the old content to its new location(s), in order to provide automatic redirects for as much of the content as possible. However, this is a huge move for us, so we're being cautious and rolling out the redirects in batches, mainly so that we can test that everything works before cutting off your access to the old pages. Because of that, there may be a span of several hours where some links take you to old pages and others take you to the new digs. Please bear with us as we work through this transition period.
Action item: register for a new forum account
You’ll need a new account on the new platform if you want to ask any questions; unfortunately we couldn't migrate existing accounts from the old forum. On the bright side, your new account will extend to all the software we support, including Terra and Cromwell/WDL. Speaking of which, if you already have a Terra support account, you’re all set.
The new platform supports single sign-on with a Twitter, Facebook, Google, or Microsoft account, or you can create a dedicated account on the new platform. Once you are registered, you will have immediate access.
About those redirects
No one likes a broken link, so as I mentioned above, we put a lot of effort into mapping content and putting in automatic redirects so that old links will magically take you to the right places. Links to the top level sections (like User Guide, Blog etc) will definitely be redirected automatically. Similarly, all the articles from the GATK4 docs section should redirect just fine to the correct article on the new site.
For GATK3 docs it's a bit different; we created a new repository on GitHub to serve as a documentation archive rather than put them on the new site. As we announced in December, we could no longer bear the burden of maintaining those older docs, and we wanted to make a very clear distinction between current docs and the older ones that are deprecated. However, we did include them in the redirection, so your old links should still take you to the right content. Links to pictures may break, however.
We gave the blog the same treatment; all articles published before the migration were banished to GitHub as well, but the links will be redirected appropriately.
Finally, there is a fairly small number of pages or documentation articles that don't have an equivalent in the new world order and/or that we might have missed in our inventory. Those should all get redirected by default to the new website, where you might then need to do some digging to find what you were looking for. The good news is that the search functionality on the new platform is better, so try searching for the subject or the title if you get a broken link. If searching doesn’t help, send us the old link or a description of the article (via a forum post), and we’ll find you the right link or explain why that article was not transferred. If we missed anything that affects a lot of people, we'll do our best to remediate the situation.
Old forum discussions remain accessible… for now
This is where things get a bit hairy. The 17,000 discussion threads in the “Ask a question” section of the forum will eventually be taken offline. It’s just not practical to migrate all of the discussions, and much of the information is out of date. However, we do understand that many of you find value in useful nuggets of information that are not represented in the documentation, so we're not going to turn off the forum right away. There will be a transition phase of a few months during which the content from the old forum will be available in read-only format. That will give us a chance to comb through the threads, pick out the good stuff, and transfer it to actual documentation articles.
If there are old threads you find useful in your work, we’re open to requests for what to convert to documentation. Contact us in the new forum, where we will be generating and tracking tickets. We also recommend that you save any threads that are important to you personally as a PDF or HTML page on your computer, just in case. As I've said before, if all else fails, the Internet Archive's Wayback Machine does preserve snapshots of the forum, so it's very likely that those old forum discussions will actually outlive us all.
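For anyone who wants to look up an archived copy of an old thread, here is a minimal sketch of how a Wayback Machine address can be put together. It assumes the usual `https://web.archive.org/web/<timestamp>/<original URL>` layout, where the timestamp is a prefix of YYYYMMDDhhmmss and the archive resolves to the capture closest to it. The function name `wayback_url` is made up for illustration, and there is no guarantee that a snapshot exists for any given thread.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, timestamp: str = "2019") -> str:
    """Build a Wayback Machine URL for the capture nearest to `timestamp`.

    `timestamp` is a digits-only prefix of YYYYMMDDhhmmss (4 to 14 digits).
    This only builds the address; it does not check that a snapshot exists.
    """
    if not timestamp.isdigit() or not 4 <= len(timestamp) <= 14:
        raise ValueError("timestamp must be 4 to 14 digits, e.g. '2019' or '20191231'")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


# Example: an old forum thread address as it appeared before the migration.
old_thread = "https://gatkforums.broadinstitute.org/gatk/discussion/6447/gzipped-gvcf-files"
print(wayback_url(old_thread, "20191231"))
```

Opening the printed address in a browser should land on the nearest capture, if the archive holds one.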
Tell us what you need
Our ultimate goal is to help you to use GATK effectively in your work, so don't be shy about telling us what's not working for you, whether it's the new content organization, the redirects, or the transition timescale. We can't change everything -- e.g. in terms of choosing the new platform, that ship has sailed -- but there is a heck of a lot that we can tweak. We just need to know what's bugging you so we can try to make it better.
So, please have a look around and let us know what you think!
26 comments
What happens to unanswered questions in the old forum? Will they somehow still be answered or do we ask them again in the new forum?
You will need to ask your question again in the new forum. Sorry about that -- I forgot to include that point in my post. If you let them know that it's a repost from the old forum (e.g. add "Repost: …" in the title), I believe the team will prioritize answering your question.
Great! Thank you
It would be really good if you guys could find a way to port over the old announcement threads that talked about new features.
For example, the discussion going on in this thread:
https://gatkforums.broadinstitute.org/gatk/discussion/23598/new-mitochondrial-analysis-with-mutect2#latest
was useful in my recent work.
I imagine there were a lot of ongoing discussions that got cut off. Perhaps it might be worthwhile to find a way to fully mirror all the threads that had new posts going back 50-100 days. Maybe mark all the comments as anonymous users?
Have all the old tutorials and documentation been wiped out?
User experience so far:
Google "GATK4 haplotypecaller"
First link: appears to be haplotypecaller Doc... redirects to front page.
There are countless links from the web and the site itself to pages that now all seem to redirect to the front page.
For example if I'm trying to follow a guide like this:
https://github.com/broadinstitute/gatk-docs/blob/master/blog-2012-to-2019/2016-08-17-9_Takeaways_to_help_you_get_started_with_GRCh38.md?id=8180
It links to "Tutorial#8017" but that link, and indeed most google search results for the entire site (including gatk4) now appear entirely broken.
There seem to be archived forum posts ... but they link internally heavily. So someone will ask a question that matches my own problem perfectly and someone will have answered with a link along the lines of "see here[link]"
But now all links get redirected to the front page.
This changeover seems to have utterly broken vast quantities of documentation and support.
I started earlier today thinking I had a simple problem to solve and that there were useful guides to help me, but now I'm finding most of the resources unusable.
This seems to be a bit of a disaster for anyone looking for support.
Hi David, thanks for reporting this — what you describe is definitely not intentional. It looks like one of the problems you encountered is caused by an inadequate redirect instruction on links that have “guide” in them, which was replaced last year by “documentation”. We’ll fix that ASAP but in the meantime you can find what you need by copying the link and replacing guide by documentation. If you encounter other broken links that don’t fit that pattern, please post them here. I’m sorry it’s been so rocky for you so far.
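The workaround described above (swapping "guide" for "documentation" in the link) can be sketched as a small helper. This is only an illustration: the function name `guide_to_documentation` and the example address are invented, and it assumes "guide" appears as a whole path segment of the old URL.

```python
from urllib.parse import urlsplit, urlunsplit


def guide_to_documentation(url: str) -> str:
    """Replace a whole 'guide' path segment with 'documentation'.

    Segments that merely contain the word (e.g. 'userguide') are left alone,
    and the query string and fragment are preserved unchanged.
    """
    parts = urlsplit(url)
    segments = [
        "documentation" if segment == "guide" else segment
        for segment in parts.path.split("/")
    ]
    return urlunsplit(parts._replace(path="/".join(segments)))


# Hypothetical old-style link; the host and article id are placeholders.
print(guide_to_documentation("https://example.org/gatk/guide/article?id=1234"))
```

Rewriting only exact path segments keeps the helper from touching other parts of the address that happen to contain the same letters.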
Ajwils, to clarify, are you saying it’s the discussion threads specifically that you’d like us to preserve for the new feature announcements? For things like that I think we’d like to try to summarize the information into feature docs rather than keep it around in discussion format. Would that work for you?
Geraldine Van der Auwera
I suppose I am saying that:
A) The announcement threads themselves should be migrated over as is, as they are a great informational resource for the newer features. This shouldn't be too hard. It's just copy pasting the archived announcements into new announcement threads and fixing any links.
and
B) Find some way to migrate/mirror the actual ongoing discussions, both in those announcement posts, and across the Ask GATK threads. I don't think it would need to go too far back. If this really isn't possible even just for threads with recent posts within the last month or two (I know you are short staffed), then summarized info in feature docs would be good. But honestly, really going through and summarizing two or three months' worth of open discussions is probably more work than, say, making a few hundred threads, copying the code for the original post, then at the bottom of the post (or in a comment) adding an archived version of the old page as a saved HTML attachment.
Ah I see what you mean, thanks for clarifying. To be frank, we considered doing that but decided to take a different approach in order to boost the sustainability of our knowledge base. Part of why we're in this pickle now is because for far too long, we relied on letting useful information live in announcements and in discussion threads that really should have become part of the main documentation. As a result, we evolved this labyrinthine structure where the information is scattered all over the place, which makes it difficult for people to find in the first place and even harder for us to maintain adequately over time. It's also what makes a migration like this so difficult for everyone. In contrast, if we had a more neatly organized knowledge base with a process to pull in every tidbit of useful information as it materializes in discussion into the main corpus, we'd be in a much better state. That's what we're trying to build with this reboot. So you're right that it would be only a small amount of effort to copy paste content over, but then we'd be perpetuating this semi-chaotic state. We would rather invest a larger amount of effort into laying a better foundation for the future, even if it comes at the cost of some bumpiness during the transition period.
David Murphy: I just fixed the issue with the redirects on URLs containing "guide" that you reported. Let me know if you come across any other broken links.
Thank you for reporting this, it's likely that many others had the same problem but didn't report it. You just did a big favor to a lot of people.
Geraldine Van der Auwera
I fully respect the choice to trim down, despite being apprehensive about potential lost content. If you're all really prepared to painstakingly port over vast amounts of info into feature docs, well then I wish you luck.
Unrelated side-note, would it be possible to implement the rich format editor for comments and posts, or is this still unavailable for end-users of Zendesk? 7 options is pretty barebones.
Thanks ajwils, we'll do our best not to let you all down.
I asked about the editor, unfortunately that's a limitation of Zendesk. Apparently there's an upgrade coming so we'll see what that provides. Is there anything specific you want to be able to do?
Geraldine Van der Auwera
Off the top of my head:
Thanks so much for the remarkably fast fix!
I'll let you know if I hit any more but so far what I've tried has worked perfectly!
Can I access GATK release v3.6?
I tried to post another reply here about some other dead links but it just never turned up.
I'm getting a lot of google links to the site that lead to SSL_ERROR_BAD_CERT_DOMAIN
https://gatkforums.broadinstitute.org/gatk/discussion/6447/gzipped-gvcf-files
There also appear to be pages missing from the linked GitHub archive of the forums.
Is there any up to date zip of the old/removed/moved forum answers?
Hi David Murphy, sorry I missed your reply. I believe the team is collecting reports etc in the "Other" section of the forum here: https://gatk.broadinstitute.org/hc/en-us/community/topics/360001488892.
The bad cert problem popped up just before the weekend, the team is working on getting that addressed. Sorry about that!
Re: missing pages in the github archive, can you give an example? I can try to hunt down whatever's missing and see if there's a bulk fix we can do.
I am finding the tool docs for GATK3.8 very impractical to use. Below is what I see when I try to find the documentation for indel realignment (and yes, indel realignment is still a very useful tool). Then, in order to access the tool documentation I need to download the raw file and open it manually in a new browser tab. This is madness. Why can't these files be hosted statically on a server somewhere? Maybe something like what htslib does: http://www.htslib.org/doc/#manual-pages
Hi warren kretzschmar, I understand it's not ideal. Unfortunately this was necessary to reduce our maintenance burden. I would recommend that you clone the repository so that you have a local copy on your computer. Then you should be able to browse the docs from the index page like you would do on a website. Let me know if you run into any problems with that approach.
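As a rough sketch of the local-copy approach: once the archive repository has been cloned (for example with `git clone https://github.com/broadinstitute/gatk-docs.git`), a few lines of Python can list candidate index pages to open in a browser. The helper name `find_index_pages` is made up, and the assumption that index pages are called `index.html` may not match the actual layout of the archive.

```python
from pathlib import Path


def find_index_pages(clone_dir, filename="index.html"):
    """Return sorted paths of files named `filename` under a local clone.

    Anything inside the clone's .git directory is skipped.
    The default file name is an assumption about how the archive is laid out.
    """
    root = Path(clone_dir)
    return sorted(
        page
        for page in root.rglob(filename)
        if ".git" not in page.relative_to(root).parts
    )
```

Calling `find_index_pages("gatk-docs")` from the directory that contains the clone returns the matching paths; `page.resolve().as_uri()` turns each one into a `file://` address that a browser can open directly.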
Hi Geraldine, that is helpful. Thank you.
Hi Geraldine Van der Auwera, I ran the CFSAN SNP Pipeline using the all-in-one workflow described here:
https://snp-pipeline.readthedocs.io/en/latest/usage.html#all-in-one-workflow-lambda
and it gave the error shown below.
***********************************************************************
A USER ERROR has occurred: '-T' is not a valid command.
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Error occured while running:
java -Xmx3500m -jar /home/swamy123/softwares/GenomeAnalysisTK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference/lambda_virus.fasta -I samples/sample1/reads.sorted.deduped.bam -o samples/sample1/realign.target.intervals --logging_level WARN --num_threads 4
# Command : /home/swamy123/.local/bin/cfsan_snp_pipeline map_reads --threads 4 reference/lambda_virus.fasta samples/sample2/sample2_1.fastq samples/sample2/sample2_2.fastq
# Working Directory : /home/swamy123/Desktop/swamy/test/lambdaVirusInputs
# Hostname : swamy123-Inspiron-N5110
# RAM : 3,842 MB
# Python Version : 2.7.17 (default, Nov 7 2019, 10:07:09) [GCC 7.4.0]
# Program Version : cfsan_snp_pipeline map_reads 2.1.1
# 2020-04-23 08:12:53 cfsan_snp_pipeline map_reads --threads 4 reference/lambda_virus.fasta samples/sample2/sample2_1.fastq samples/sample2/sample2_2.fastq
Options:
forceFlag=False
referenceFile=reference/lambda_virus.fasta
sampleFastqFile1=samples/sample2/sample2_1.fastq
sampleFastqFile2=samples/sample2/sample2_2.fastq
threads=4
verbose=1
# Align sequence sample2 to reference lambda_virus
# 2020-04-23 08:12:56 bowtie2 --rg-id 1 --rg SM:sample2 --rg LB:1 --rg PU:sample2 --reorder -X 1000 -p 4 -x reference/lambda_virus -1 samples/sample2/sample2_1.fastq -2 samples/sample2/sample2_2.fastq
# bowtie2 version 2.3.4.1
10000 reads; of these:
10000 (100.00%) were paired; of these:
888 (8.88%) aligned concordantly 0 times
9112 (91.12%) aligned concordantly exactly 1 time
0 (0.00%) aligned concordantly >1 times
----
888 pairs aligned concordantly 0 times; of these:
59 (6.64%) aligned discordantly 1 time
----
829 pairs aligned 0 times concordantly or discordantly; of these:
1658 mates make up the pairs; of these:
1044 (62.97%) aligned 0 times
614 (37.03%) aligned exactly 1 time
0 (0.00%) aligned >1 times
94.78% overall alignment rate
# Convert sam file to bam file with only mapped positions.
# 2020-04-23 08:13:00 samtools view -S -b -F 4 -q 30 --threads 4 -o samples/sample2/reads.unsorted.bam samples/sample2/reads.sam
# SAMtools version 1.4
# Convert bam to sorted bam file.
# 2020-04-23 08:13:00 samtools sort --threads 4 -o samples/sample2/reads.sorted.bam samples/sample2/reads.unsorted.bam
# SAMtools version 1.4
# Mark duplicate reads in bam file.
# 2020-04-23 08:13:00 java -Xmx2000m -jar /home/swamy123/softwares/picard.jar MarkDuplicates INPUT=samples/sample2/reads.sorted.bam OUTPUT=samples/sample2/reads.sorted.deduped.bam METRICS_FILE=samples/sample2/duplicate_reads_metrics.txt VERBOSITY=WARNING
# Picard version 2.22.3
INFO 2020-04-23 08:13:01 MarkDuplicates
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** MarkDuplicates -INPUT samples/sample2/reads.sorted.bam -OUTPUT samples/sample2/reads.sorted.deduped.bam -METRICS_FILE samples/sample2/duplicate_reads_metrics.txt -VERBOSITY WARNING
**********
08:13:02.068 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/swamy123/softwares/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Apr 23 08:13:02 IST 2020] MarkDuplicates INPUT=[samples/sample2/reads.sorted.bam] OUTPUT=samples/sample2/reads.sorted.deduped.bam METRICS_FILE=samples/sample2/duplicate_reads_metrics.txt VERBOSITY=WARNING MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Apr 23 08:13:02 IST 2020] Executing as swamy123@swamy123-Inspiron-N5110 on Linux 5.3.0-46-generic amd64; OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.22.3
WARNING 2020-04-23 08:13:02 AbstractOpticalDuplicateFinderCommandLineProgramA field field parsed out of a read name was expected to contain an integer and did not. Read name: r2141. Cause: String 'r2141' did not start with a parsable number.
[Thu Apr 23 08:13:04 IST 2020] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=28311552
# Index bam file.
# 2020-04-23 08:13:04 samtools index -@ 4 samples/sample2/reads.sorted.deduped.bam samples/sample2/reads.sorted.deduped.bai
# SAMtools version 1.4
# Identify targets for realignment.
# 2020-04-23 08:13:04 java -Xmx3500m -jar /home/swamy123/softwares/GenomeAnalysisTK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference/lambda_virus.fasta -I samples/sample2/reads.sorted.deduped.bam -o samples/sample2/realign.target.intervals --logging_level WARN --num_threads 4
# GATK version 2.21.2
I am using GATK 4.1.6.0 and have provided its path. How can I resolve this error?
Please guide.
Hi Dr. venkateswara swamy, the syntax has changed in GATK 4 so you cannot use scripts written for earlier versions without modification. For example, this error is pointing out that the “-T” argument is no longer used to specify the tool. There are other changes as well. For additional assistance, please post a question in the Forum section of the website (this is just the Blog).
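To illustrate the syntax difference, here is a small sketch that checks whether a command line still uses the older `-T <ToolName>` convention. The helper name `gatk3_tool_name` is invented for this example, and the GATK4-style command uses placeholder file names; in GATK4 the tool name is given as the first argument to the `gatk` launcher and the output flag is `-O`.

```python
import shlex


def gatk3_tool_name(command: str):
    """Return the tool named by an old-style `-T` flag, or None if absent."""
    args = shlex.split(command)
    for flag, value in zip(args, args[1:]):
        if flag == "-T":
            return value
    return None


# Old-style invocation, shortened from the log in the comment above.
OLD_STYLE = (
    "java -Xmx3500m -jar GenomeAnalysisTK.jar -T RealignerTargetCreator "
    "-R reference/lambda_virus.fasta -I samples/sample1/reads.sorted.deduped.bam "
    "-o samples/sample1/realign.target.intervals"
)

# GATK4-style invocation for comparison; file names are placeholders.
GATK4_STYLE = "gatk HaplotypeCaller -R reference.fasta -I input.bam -O output.vcf.gz"

print(gatk3_tool_name(OLD_STYLE))    # the tool requested via -T
print(gatk3_tool_name(GATK4_STYLE))  # None: no -T flag in the GATK4 form
```

A check like this only flags scripts that need attention; it does not translate them, since argument names and available tools also differ between versions.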
Hi Geraldine Van der Auwera
Did you change GATK best practices regarding the "pre-processing" steps?
I can not find the "Indel realignment" tools (RealignerTargetCreator and IndelRealigner) that were important in the pre-processing step. Why did you remove this step from the GATK best practices?
Thanks,
Ana Marques
Hi Ana Rita Marques,
Yes we dropped the indel realignment step from the Best Practices; I think that was about 2 years ago. There was a blog post about it. We found that this step was no longer useful when the variant calling was done with HaplotypeCaller or Mutect2, which implement a more sophisticated and effective form of realignment.
Thanks a lot for your explanation Geraldine Van der Auwera.
I was not using GATK for a while and I did not notice.
Hello Geraldine Van der Auwera,
I just started working with GATK4 and was looking for the new versions of RealignerTargetCreator and IndelRealigner when I noticed this post. Could you please confirm that there is no need to run RealignerTargetCreator and IndelRealigner for HaplotypeCaller?
Thank you,
Lida