Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GenotypeGVCF by intervals

11 comments

  • Genevieve Brandt (she/her)

    Hi Anna, we have made improvements to GenomicsDB and GenotypeGVCFs since GATK version gatk/4.0.10.0, so I would recommend updating your GATK to 4.1.9.0 [our current version] to run GenotypeGVCFs. If you are running on a cluster, you can also use the new option --genomicsdb-shared-posixfs-optimizations to get the best performance.
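
As a rough illustration of the suggestion above, here is a minimal Python sketch that only assembles a GenotypeGVCFs command line with that option and prints it, without running anything. The reference, GenomicsDB workspace, interval name, and output file are hypothetical placeholders, and reading the input from a `gendb://` workspace is an assumption about the setup, not something stated in the thread.

```python
import shlex


def build_genotype_gvcfs_cmd(reference, variant_input, interval, output,
                             shared_posixfs=False):
    """Return a GenotypeGVCFs command line as a list of arguments."""
    cmd = [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", variant_input,
        "-L", interval,
        "-O", output,
    ]
    if shared_posixfs:
        # Option named in the reply above, suggested for runs on a cluster.
        cmd.append("--genomicsdb-shared-posixfs-optimizations")
    return cmd


# Hypothetical paths and interval name, for illustration only.
cmd = build_genotype_gvcfs_cmd(
    reference="reference.fasta",
    variant_input="gendb://my_workspace",
    interval="chr16",
    output="chr16.vcf.gz",
    shared_posixfs=True,
)
print(shlex.join(cmd))
```

Printing the command first makes it easy to compare the per-chromosome runs before submitting them; only the `-L` value and the output name should differ between them.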

  • @Anna

    Dear Genevieve,

    Some of the VCF files were obtained with version gatk/4.0.10.0, so when I run GenotypeGVCFs with the updated version it shows me this:

    A USER ERROR has occurred: Bad input: Presence of '-RAW_MQ' annotation is detected. This GATK version expects key RAW_MQandDP with a tuple of sum of squared MQ values and total reads over variant genotypes as the value. This could indicate that the provided input was produced with an older version of GATK. Use the argument '--allow-old-rms-mapping-quality-annotation-data' to override and attempt the deprecated MQ calculation. There may be differences in how newer GATK versions calculate DP and MQ that may result in worse MQ results. Use at your own risk.

     

    Can you please tell me if it is OK to use the recent version despite this error?

    Thank you!

  • Genevieve Brandt (she/her)

    Hi Anna,

    It is up to you and how you are using your data. There is a discussion at our legacy forum site that summarizes the changes to the RMSMappingQuality annotation. Ideally, we would recommend using the same GATK version for all steps of the platform, but if you want to get the best performance for GenotypeGVCFs, you will need to use a newer version.
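
To see which form of the mapping-quality annotation a given GVCF declares, one option is to look at its header. The sketch below is only an assumption-laden illustration: it presumes the files declare the annotation in `##INFO=<ID=...>` header lines under the keys named in the error message (`RAW_MQ` and `RAW_MQandDP`), and the sample header lines are made up. GATK's own check may look at the records rather than the header, so treat the result as a hint.

```python
import re

# Matches VCF header lines such as: ##INFO=<ID=RAW_MQ,Number=1,...>
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")

OVERRIDE_FLAG = "--allow-old-rms-mapping-quality-annotation-data"


def info_ids(header_lines):
    """Collect the INFO keys declared in a VCF header."""
    ids = set()
    for line in header_lines:
        match = INFO_ID.match(line)
        if match:
            ids.add(match.group(1))
    return ids


def mq_annotation_style(header_lines):
    """Classify the header as 'old', 'new', 'both', or 'none'."""
    ids = info_ids(header_lines)
    has_old = "RAW_MQ" in ids
    has_new = "RAW_MQandDP" in ids
    if has_old and has_new:
        return "both"
    if has_old:
        return "old"
    if has_new:
        return "new"
    return "none"


def extra_genotyping_args(style):
    """Return the override flag from the error message for old-style input."""
    return [OVERRIDE_FLAG] if style in ("old", "both") else []


# Made-up header lines, for illustration only.
old_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="example">',
]
new_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=RAW_MQandDP,Number=2,Type=Integer,Description="example">',
]

for name, header in (("old", old_header), ("new", new_header)):
    style = mq_annotation_style(header)
    print(name, style, extra_genotyping_args(style))
```

As the error message itself warns, the override attempts the deprecated MQ calculation, so whether to add it depends on how the MQ values are used downstream.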

  • @Anna

    Hi Genevieve,

    OK, I will have a look and try to decide what is best at this time.

    Still, I don't understand why I didn't have the same problem for chr 1-15, which ran smoothly. Do you know why? Can you please explain it to me?

    Again, thank you!

  • Genevieve Brandt (she/her)

    Hi Anna,

    You said some of the files were created with different versions of GATK. Do you know which version was used for chr 1-15? 

    Genevieve

  • @Anna

    Hi Genevieve,

    I'm sorry. All vcf files were created with gatk/4.0.10.0. I could eventually do the calling with the updated version, but only for some of the samples.
    The batch is the same for chr1-15 and chr16-22.

    Anna

  • Genevieve Brandt (she/her)

    Did you use the newer version of GATK with the chr1-15? If not, you would not have seen this error: 

    A USER ERROR has occurred: Bad input: Presence of '-RAW_MQ' annotation is detected. This GATK version expects key RAW_MQandDP with a tuple of sum of squared MQ values and total reads over variant genotypes as the value. This could indicate that the provided input was produced with an older version of GATK. Use the argument '--allow-old-rms-mapping-quality-annotation-data' to override and attempt the deprecated MQ calculation. There may be differences in how newer GATK versions calculate DP and MQ that may result in worse MQ results. Use at your own risk.

  • @Anna

    I used gatk/4.0.10.0 for all the steps up to the genotyping. The genotyping went well for chr 1-15, but for the rest it was taking too long (even when I tried using smaller intervals). So I tried your suggestion of using the latest version for the genotyping of chr 16-22, and that is when I got that error.
    I would prefer doing everything with the same version, as you also recommended, but I cannot understand why I am seeing such differences if my scripts are the same (I just change the -L option, by chr).

  • Genevieve Brandt (she/her)

    Hi Anna, 

    I wouldn't expect a difference in time and memory between chromosomes like the one you are seeing. I wonder if there is an issue with the space available in the location that GenotypeGVCFs is using as temporary space. If you re-run one of chromosomes 1-15 now with the same command, does it run easily? You can use the option --tmp-dir with 4.0.10.0 (Tool Docs page) to specify a temporary space with enough room.

    Please note, the GATK Team is out of office and resolving this issue may take longer than normal. 

    Best,

    Genevieve
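
The temporary-space check suggested above can be sketched in Python as follows. This is a minimal illustration, not a GATK feature: the helper names and the free-space threshold are hypothetical, and the system temporary directory stands in for whatever scratch location the cluster provides. Only the `--tmp-dir` option itself comes from the reply.

```python
import shutil
import tempfile

GIB = 1024 ** 3


def free_gib(path):
    """Free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB


def tmp_dir_args(path, min_free_gib):
    """Return ['--tmp-dir', path] if `path` has at least `min_free_gib` free."""
    free = free_gib(path)
    if free < min_free_gib:
        raise RuntimeError(
            f"{path} has only {free:.1f} GiB free, need {min_free_gib} GiB"
        )
    return ["--tmp-dir", path]


# The system temp directory is used here only as a stand-in scratch location.
scratch = tempfile.gettempdir()
print(f"{scratch}: {free_gib(scratch):.1f} GiB free")
print(tmp_dir_args(scratch, min_free_gib=0))
```

The returned arguments can be appended to the existing GenotypeGVCFs command. How much free space is enough depends on the data, so the threshold has to be chosen per project.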

  • @Anna

    Ok, I will try that.

    Thank you so much for your help!

  • Genevieve Brandt (she/her)

    No problem, hope this solves your issue!
