Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Welcome to the new home of GATK

26 comments

  • ikeoluwao_o

    What happens to unanswered questions in the old forum? Will they somehow still be answered or do we ask them again in the new forum?

  • Geraldine Van der Auwera

    You will need to ask your question again in the new forum. Sorry about that; I forgot to include that point in my post. If you let them know that it's a repost from the old forum (e.g., add "Repost: …" in the title), I believe the team will prioritize answering your question.

  • ikeoluwao_o

    Great! Thank you

  • ajwils

    It would be really good if you guys could find a way to port over the old announcement threads that talked about new features.

     

    For example, the discussion going on in this thread:

    https://gatkforums.broadinstitute.org/gatk/discussion/23598/new-mitochondrial-analysis-with-mutect2#latest

    was useful in my recent work. 

     

    I imagine there were a lot of ongoing discussions that got cut off. Perhaps it might be worthwhile to find a way to fully mirror all the threads that had new posts going back 50-100 days. Maybe mark all the comments as anonymous users?

  • David Murphy

    Have all the old tutorials and documentation been wiped out?

    User experience so far:

    Google "GATK4 haplotypecaller"

    First link: appears to be haplotypecaller Doc... redirects to front page.

    There are countless links from the web and the site itself to pages that now seem to all redirect to the front page.

    For example if I'm trying to follow a guide like this:

    https://github.com/broadinstitute/gatk-docs/blob/master/blog-2012-to-2019/2016-08-17-9_Takeaways_to_help_you_get_started_with_GRCh38.md?id=8180

    It links to "Tutorial#8017" but that link, and indeed most Google search results for the entire site (including gatk4), now appear entirely broken.

    There seem to be archived forum posts, but they link heavily to one another internally. So someone will ask a question that matches my own problem perfectly and someone will have answered with a link along the lines of "see here[link]".

    But now all links get redirected to the front page.

    This changeover seems to have utterly broken vast quantities of documentation and support.

    I started earlier today thinking I had a simple problem to solve and that there were useful guides to help me, but now I'm finding most of the resources unusable.

    This seems to be a bit of a disaster for anyone looking for support.

  • Geraldine Van der Auwera

    Hi David, thanks for reporting this — what you describe is definitely not intentional. It looks like one of the problems you encountered is caused by an inadequate redirect instruction on links that have “guide” in them, which was replaced last year by “documentation”. We’ll fix that ASAP but in the meantime you can find what you need by copying the link and replacing guide by documentation. If you encounter other broken links that don’t fit that pattern, please post them here. I’m sorry it’s been so rocky for you so far.
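
The manual workaround described above (swap "guide" for "documentation" in a broken link) could be scripted roughly as follows. This is only a sketch: it assumes "guide" appears as a whole path segment, and the example URL and the function name `guide_to_documentation` are hypothetical, not taken from the site.

```python
from urllib.parse import urlsplit, urlunsplit


def guide_to_documentation(url: str) -> str:
    """Replace a 'guide' path segment with 'documentation' in a URL."""
    parts = urlsplit(url)
    # Swap only whole path segments, so words that merely contain
    # "guide" (for example "guidelines") are left untouched.
    segments = [
        "documentation" if segment == "guide" else segment
        for segment in parts.path.split("/")
    ]
    return urlunsplit(parts._replace(path="/".join(segments)))


# Hypothetical old-style link, used only to illustrate the rewrite.
old_link = "https://example.org/gatk/guide/article?id=8017"
print(guide_to_documentation(old_link))
```

The query string and host are passed through unchanged, so any article id in the old link survives the rewrite.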

  • Geraldine Van der Auwera

    Ajwils, to clarify, are you saying it’s the discussion threads specifically that you’d like us to preserve for the new feature announcements? For things like that I think we’d like to try to summarize the information into feature docs rather than keep it around in discussion format. Would that work for you?

  • ajwils

    Geraldine Van der Auwera

    I suppose I am saying that:

    A) The announcement threads themselves should be migrated over as is, as they are a great informational resource for the newer features. This shouldn't be too hard. It's just copy pasting the archived announcements into new announcement threads and fixing any links.

    and 

    B) Find some way to migrate/mirror the actual ongoing discussions, both in those announcement posts and across the Ask GATK threads. I don't think it would need to go too far back. If this really isn't possible, even just for threads with posts from the last month or two (I know you are short-staffed), then summarized info in feature docs would be good. But honestly, going through and summarizing two or three months' worth of open discussions is probably more work than, say, making a few hundred threads, copying the code for the original post, then adding an archived version of the old page as a saved HTML attachment at the bottom of the post (or in a comment).

  • Geraldine Van der Auwera

    Ah I see what you mean, thanks for clarifying. To be frank, we considered doing that but decided to take a different approach in order to boost the sustainability of our knowledge base. Part of why we're in this pickle now is because for far too long, we relied on letting useful information live in announcements and in discussion threads that really should have become part of the main documentation. As a result, we evolved this labyrinthine structure where the information is scattered all over the place, which makes it difficult for people to find in the first place and even harder for us to maintain adequately over time. It's also what makes a migration like this so difficult for everyone. In contrast, if we had a more neatly organized knowledge base with a process to pull in every tidbit of useful information as it materializes in discussion into the main corpus, we'd be in a much better state. That's what we're trying to build with this reboot. So you're right that it would be only a small amount of effort to copy paste content over, but then we'd be perpetuating this semi-chaotic state. We would rather invest a larger amount of effort into laying a better foundation for the future, even if it comes at the cost of some bumpiness during the transition period. 

  • Geraldine Van der Auwera

    David Murphy: I just fixed the issue with the redirects on URLs containing "guide" that you reported. Let me know if you come across any other broken links.

    Thank you for reporting this, it's likely that many others had the same problem but didn't report it. You just did a big favor to a lot of people. 

  • ajwils

    Geraldine Van der Auwera

    I fully respect the choice to trim down, despite being apprehensive about potential lost content. If you're all really prepared to painstakingly port over vast amounts of info into feature docs, well then I wish you luck.

     

    Unrelated side note: would it be possible to implement the rich format editor for comments and posts, or is this still unavailable for end users of Zendesk? Seven options is pretty barebones.

  • Geraldine Van der Auwera

    Thanks ajwils, we'll do our best not to let you all down. 

    I asked about the editor, unfortunately that's a limitation of Zendesk. Apparently there's an upgrade coming so we'll see what that provides. Is there anything specific you want to be able to do? 

  • ajwils

    Geraldine Van der Auwera

    Off the top of my head:

    • attachments, although in the meantime we can certainly just hyperlink to a filehoster
    • block quotes
    • indentation control
    • code blocks

  • David Murphy

    Thanks so much for the remarkably fast fix!

    I'll let you know if I hit any more but so far what I've tried has worked perfectly!

  • Jacob “Buzz” Roberts

    Can I access GATK release v3.6?

  • David Murphy

    I tried to post another reply here about some other dead links but it just never turned up.

    I'm getting a lot of Google links to the site that lead to SSL_ERROR_BAD_CERT_DOMAIN, for example:

    https://gatkforums.broadinstitute.org/gatk/discussion/6447/gzipped-gvcf-files

    There also appear to be pages missing from the linked GitHub archive of the forums.

    Is there an up-to-date zip of the old/removed/moved forum answers?

  • Geraldine Van der Auwera

    Hi David Murphy, sorry I missed your reply. I believe the team is collecting reports etc in the "Other" section of the forum here: https://gatk.broadinstitute.org/hc/en-us/community/topics/360001488892

    The bad cert problem popped up just before the weekend, the team is working on getting that addressed. Sorry about that! 

    Re: missing pages in the github archive, can you give an example? I can try to hunt down whatever's missing and see if there's a bulk fix we can do.

     

  • warren kretzschmar

    I am finding the tool docs for GATK3.8 very impractical to use. Below is what I see when I try to find the documentation for indel realignment (and yes, indel realignment is still a very useful tool). Then, in order to access the tool documentation, I need to download the raw file and open it manually in a new browser tab. This is madness. Why can't these files be hosted statically on a server somewhere? Maybe something like what htslib does: http://www.htslib.org/doc/#manual-pages

  • Geraldine Van der Auwera

    Hi warren kretzschmar, I understand it's not ideal. Unfortunately this was necessary to reduce our maintenance burden. I would recommend that you clone the repository so that you have a local copy on your computer. Then you should be able to browse the docs from the index page like you would do on a website. Let me know if you run into any problems with that approach. 
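
Once the repository has been cloned, locating and opening the index page could look something like the sketch below. The location of the index page inside the clone is not stated in this thread, so the script simply searches for every `index.html`; the function names and the clone path passed in are hypothetical.

```python
from pathlib import Path
import webbrowser


def find_index_pages(clone_dir: str) -> list[Path]:
    """Return every index.html under a local clone, shallowest path first."""
    pages = Path(clone_dir).rglob("index.html")
    return sorted(pages, key=lambda p: (len(p.parts), str(p)))


def open_first_index(clone_dir: str) -> bool:
    """Open the shallowest index page in the default browser, if one exists."""
    pages = find_index_pages(clone_dir)
    if not pages:
        return False
    # A file:// URI lets the browser follow relative links between doc pages.
    return webbrowser.open(pages[0].resolve().as_uri())
```

Calling `open_first_index("path/to/your/clone")` after a `git clone` should then let you browse the tool docs from the index page without downloading individual raw files.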

  • warren kretzschmar

    Hi Geraldine, that is helpful. Thank you.

  • Dr. venkateswara swamy

    Hi Geraldine Van der Auwera, running the CFSAN SNP Pipeline with the following command

    cfsan_snp_pipeline run -s samples reference/lambda_virus.fasta

    https://snp-pipeline.readthedocs.io/en/latest/usage.html#all-in-one-workflow-lambda

    has produced the error shown below.

    ***********************************************************************

    A USER ERROR has occurred: '-T' is not a valid command.


    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
    Error occured while running:
    java -Xmx3500m -jar /home/swamy123/softwares/GenomeAnalysisTK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference/lambda_virus.fasta -I samples/sample1/reads.sorted.deduped.bam -o samples/sample1/realign.target.intervals --logging_level WARN --num_threads 4
    # Command : /home/swamy123/.local/bin/cfsan_snp_pipeline map_reads --threads 4 reference/lambda_virus.fasta samples/sample2/sample2_1.fastq samples/sample2/sample2_2.fastq
    # Working Directory : /home/swamy123/Desktop/swamy/test/lambdaVirusInputs
    # Hostname : swamy123-Inspiron-N5110
    # RAM : 3,842 MB
    # Python Version : 2.7.17 (default, Nov 7 2019, 10:07:09) [GCC 7.4.0]
    # Program Version : cfsan_snp_pipeline map_reads 2.1.1

    # 2020-04-23 08:12:53 cfsan_snp_pipeline map_reads --threads 4 reference/lambda_virus.fasta samples/sample2/sample2_1.fastq samples/sample2/sample2_2.fastq
    Options:
    forceFlag=False
    referenceFile=reference/lambda_virus.fasta
    sampleFastqFile1=samples/sample2/sample2_1.fastq
    sampleFastqFile2=samples/sample2/sample2_2.fastq
    threads=4
    verbose=1

    # Align sequence sample2 to reference lambda_virus
    # 2020-04-23 08:12:56 bowtie2 --rg-id 1 --rg SM:sample2 --rg LB:1 --rg PU:sample2 --reorder -X 1000 -p 4 -x reference/lambda_virus -1 samples/sample2/sample2_1.fastq -2 samples/sample2/sample2_2.fastq
    # bowtie2 version 2.3.4.1
    10000 reads; of these:
    10000 (100.00%) were paired; of these:
    888 (8.88%) aligned concordantly 0 times
    9112 (91.12%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    888 pairs aligned concordantly 0 times; of these:
    59 (6.64%) aligned discordantly 1 time
    ----
    829 pairs aligned 0 times concordantly or discordantly; of these:
    1658 mates make up the pairs; of these:
    1044 (62.97%) aligned 0 times
    614 (37.03%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
    94.78% overall alignment rate

    # Convert sam file to bam file with only mapped positions.
    # 2020-04-23 08:13:00 samtools view -S -b -F 4 -q 30 --threads 4 -o samples/sample2/reads.unsorted.bam samples/sample2/reads.sam
    # SAMtools version 1.4

    # Convert bam to sorted bam file.
    # 2020-04-23 08:13:00 samtools sort --threads 4 -o samples/sample2/reads.sorted.bam samples/sample2/reads.unsorted.bam
    # SAMtools version 1.4

    # Mark duplicate reads in bam file.
    # 2020-04-23 08:13:00 java -Xmx2000m -jar /home/swamy123/softwares/picard.jar MarkDuplicates INPUT=samples/sample2/reads.sorted.bam OUTPUT=samples/sample2/reads.sorted.deduped.bam METRICS_FILE=samples/sample2/duplicate_reads_metrics.txt VERBOSITY=WARNING
    # Picard version 2.22.3
    INFO 2020-04-23 08:13:01 MarkDuplicates

    ********** NOTE: Picard's command line syntax is changing.
    **********
    ********** For more information, please see:
    ********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
    **********
    ********** The command line looks like this in the new syntax:
    **********
    ********** MarkDuplicates -INPUT samples/sample2/reads.sorted.bam -OUTPUT samples/sample2/reads.sorted.deduped.bam -METRICS_FILE samples/sample2/duplicate_reads_metrics.txt -VERBOSITY WARNING
    **********


    08:13:02.068 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/swamy123/softwares/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Thu Apr 23 08:13:02 IST 2020] MarkDuplicates INPUT=[samples/sample2/reads.sorted.bam] OUTPUT=samples/sample2/reads.sorted.deduped.bam METRICS_FILE=samples/sample2/duplicate_reads_metrics.txt VERBOSITY=WARNING MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Thu Apr 23 08:13:02 IST 2020] Executing as swamy123@swamy123-Inspiron-N5110 on Linux 5.3.0-46-generic amd64; OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.22.3
    WARNING 2020-04-23 08:13:02 AbstractOpticalDuplicateFinderCommandLineProgramA field field parsed out of a read name was expected to contain an integer and did not. Read name: r2141. Cause: String 'r2141' did not start with a parsable number.
    [Thu Apr 23 08:13:04 IST 2020] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=28311552

    # Index bam file.
    # 2020-04-23 08:13:04 samtools index -@ 4 samples/sample2/reads.sorted.deduped.bam samples/sample2/reads.sorted.deduped.bai
    # SAMtools version 1.4

    # Identify targets for realignment.
    # 2020-04-23 08:13:04 java -Xmx3500m -jar /home/swamy123/softwares/GenomeAnalysisTK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference/lambda_virus.fasta -I samples/sample2/reads.sorted.deduped.bam -o samples/sample2/realign.target.intervals --logging_level WARN --num_threads 4
    # GATK version 2.21.2

    I am using GATK 4.1.6.0 and have provided its path. How can I resolve this error?

    Please guide.

  • Geraldine Van der Auwera

    Hi Dr. venkateswara swamy, the syntax has changed in GATK 4 so you cannot use scripts written for earlier versions without modification. For example, this error is pointing out that the “-T” argument is no longer used to specify the tool. There are other changes as well. For additional assistance, please post a question in the Forum section of the website (this is just the Blog).
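
A pipeline wrapper could flag this kind of mismatch before launching the job. The sketch below is illustrative only: the function `check_gatk3_command` and the `REMOVED_TOOLS` set are hypothetical helpers, and the check covers just the "-T" issue raised here, not every syntax change between GATK 3 and GATK 4.

```python
# Indel realignment tools from GATK 3 that are not available in GATK 4.
REMOVED_TOOLS = {"RealignerTargetCreator", "IndelRealigner"}


def check_gatk3_command(argv: list[str]) -> list[str]:
    """Return human-readable problems found in a GATK 3-style command line."""
    problems = []
    if "-T" in argv:
        idx = argv.index("-T")
        tool = argv[idx + 1] if idx + 1 < len(argv) else None
        problems.append(
            "'-T' is not accepted by GATK 4; the tool name is given "
            "as the first argument instead"
        )
        if tool in REMOVED_TOOLS:
            problems.append(f"{tool} is not available in GATK 4")
    return problems


# Shortened form of the failing command from the log above.
failing = (
    "java -Xmx3500m -jar GenomeAnalysisTK.jar -T RealignerTargetCreator "
    "-R reference/lambda_virus.fasta -I samples/sample1/reads.sorted.deduped.bam "
    "-o samples/sample1/realign.target.intervals"
).split()

for problem in check_gatk3_command(failing):
    print(problem)
```

In this case both checks fire, which suggests that the pipeline expects a GATK 3 jar rather than GATK 4.1.6.0.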

  • Ana Rita Marques

    Hi Geraldine Van der Auwera  

    Did you change GATK best practices regarding the "pre-processing" steps?
    I cannot find the "Indel realignment" tools (RealignerTargetCreator and IndelRealigner) that were important in the pre-processing step. Why did you remove this step from the GATK best practices?

    Thanks,
    Ana Marques

  • Geraldine Van der Auwera

    Hi Ana Rita Marques,

    Yes we dropped the indel realignment step from the Best Practices; I think that was about 2 years ago. There was a blog post about it. We found that this step was no longer useful when the variant calling was done with HaplotypeCaller or Mutect2, which implement a more sophisticated and effective form of realignment.
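
In practice this means the caller is run on the pre-processed BAM with no realignment step in between. A minimal sketch of assembling such a GATK 4 call from a script is shown below; the file names are placeholders, the helper `haplotypecaller_command` is hypothetical, and other pre-processing steps are left out.

```python
import shlex


def haplotypecaller_command(reference: str, bam: str, output_vcf: str) -> list[str]:
    """Build a GATK 4 HaplotypeCaller call that reads the BAM directly."""
    # No RealignerTargetCreator / IndelRealigner step precedes this call.
    return ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", output_vcf]


# Placeholder file names, for illustration only.
cmd = haplotypecaller_command("reference.fasta", "reads.sorted.deduped.bam", "calls.vcf.gz")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where the `gatk` launcher is on the PATH.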

  • Ana Rita Marques

    Thanks a lot for your explanation Geraldine Van der Auwera.

    I had not used GATK for a while, so I did not notice.

  • lid.zigh

    Hello Geraldine Van der Auwera,

    I just started working with GATK4 and was looking for the new versions of RealignerTargetCreator and IndelRealigner when I noticed this post. Could you please confirm that there is no need to run RealignerTargetCreator and IndelRealigner before HaplotypeCaller?

    Thank you,

    Lida 
