Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Funcotator - all IGR classification

Answered

  • Genevieve Brandt (she/her)

    Hi Noam Rudberg,

    Yes, this looks like the same issue as in the previous post you linked. I would recommend taking a closer look at your VCF file to verify that it matches the reference version of your data sources, because Funcotator did not find any matches.

    Let me know if you have any other questions.

    Best,

    Genevieve

  • Noam Rudberg

    Hi Genevieve,

    Thanks for your response.

    I'm not sure I fully understand what you mean by looking into the VCF file. I used the ValidateVariants tool this way:

    gatk ValidateVariants -R Homo_sapiens_assembly19.fasta -V unique_variants.vcf

    and got this output, which seems fine?

    16:19:09.125 INFO  ValidateVariants - ------------------------------------------------------------
    16:19:09.126 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.2.6.1
    16:19:09.126 INFO  ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:19:09.127 INFO  ValidateVariants - Executing as student@ubuntu18 on Linux v4.15.0-60-generic amd64
    16:19:09.127 INFO  ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v11.0.15+10-Ubuntu-0ubuntu0.18.04.1
    16:19:09.128 INFO  ValidateVariants - Start Date/Time: May 25, 2022 at 4:19:08 PM UTC
    16:19:09.128 INFO  ValidateVariants - ------------------------------------------------------------
    16:19:09.129 INFO  ValidateVariants - ------------------------------------------------------------
    16:19:09.129 INFO  ValidateVariants - HTSJDK Version: 2.24.1
    16:19:09.130 INFO  ValidateVariants - Picard Version: 2.27.1
    16:19:09.130 INFO  ValidateVariants - Built for Spark Version: 2.4.5
    16:19:09.130 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:19:09.130 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:19:09.130 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:19:09.131 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:19:09.131 INFO  ValidateVariants - Deflater: IntelDeflater
    16:19:09.131 INFO  ValidateVariants - Inflater: IntelInflater
    16:19:09.131 INFO  ValidateVariants - GCS max retries/reopens: 20
    16:19:09.131 INFO  ValidateVariants - Requester pays: disabled
    16:19:09.132 INFO  ValidateVariants - Initializing engine
    16:19:09.341 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/student/Downloads/gatk-4.2.6.1/unique_variants.vcf
    16:19:09.370 INFO  ValidateVariants - Done initializing engine
    16:19:09.370 WARN  ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
    16:19:09.370 WARN  ValidateVariants - Other possible validations will still be performed
    16:19:09.375 INFO  ProgressMeter - Starting traversal
    16:19:09.375 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
    16:19:19.406 INFO  ProgressMeter -           7:72424622              0.2                537000        3212683.2
    16:19:29.406 INFO  ProgressMeter -          22:29184280              0.3               1341000        4016974.5
    16:19:30.690 INFO  ProgressMeter -           Y:15363045              0.4               1403333        3950268.8
    16:19:30.690 INFO  ProgressMeter - Traversal complete. Processed 1403333 total variants in 0.4 minutes.
    16:19:30.691 INFO  ValidateVariants - Shutting down engine
    [May 25, 2022 at 4:19:30 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.37 minutes.
    Runtime.totalMemory()=161480704

    Do you have any other ideas for verifying the match between the files?

    Thanks,

    Noam

  • Genevieve Brandt (she/her)

    Hi Noam Rudberg,

    Yes, this tool is a great first step. I noticed, though, that here you are validating unique_variants.vcf instead of new_vars.vcf, the file you used for Funcotator. Could you try this command with new_vars.vcf?

    Another troubleshooting option is to manually inspect the VCF file and the annotation files to verify that the chromosome naming conventions match and, if possible, that some of your variants have matches in the annotation files.
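
    Part of that comparison can be scripted. The sketch below is an illustration, not a GATK tool: it assumes an uncompressed VCF and a FASTA index (.fai) for the reference your data sources were built against, and the function name missing_contigs is hypothetical. It prints every CHROM value in the VCF that has no matching contig name in the index.

```shell
#!/usr/bin/env bash
# List CHROM values in a VCF that are not contig names in a FASTA index (.fai).
# Assumptions: the VCF is uncompressed, and the .fai belongs to the reference
# the Funcotator data sources were built for (e.g. an hg19 FASTA index).
# The function name missing_contigs is hypothetical.
missing_contigs() {
  local vcf=$1 fai=$2
  # Column 1 of a VCF record is CHROM; column 1 of a .fai line is the contig name.
  # comm -23 prints lines found only in the first sorted input, i.e. VCF contigs
  # with no match in the index. LC_ALL=C keeps sort and comm on the same collation.
  LC_ALL=C comm -23 \
    <(grep -v '^#' "$vcf" | cut -f1 | LC_ALL=C sort -u) \
    <(cut -f1 "$fai" | LC_ALL=C sort -u)
}
```

    For example, `missing_contigs new_vars.vcf hg19.fasta.fai` (placeholder file names). Empty output means every contig in the VCF appears in the index; bare names such as 1 or X in the output point to a b37-style VCF being compared with a chr-prefixed index.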

    Also, I see that in your unique_variants.vcf file, this is what the positions look like: 7:72424622. hg19 generally names chromosomes "chr1" rather than "1". This suggests to me that your variants might be based on a different reference than the hg19 build that Funcotator uses. You can take a look at this article for more information: https://gatk.broadinstitute.org/hc/en-us/articles/360035890951-Human-genome-reference-builds-GRCh38-or-hg38-b37-hg19
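
    A quick way to see which convention a VCF follows is to count its records by contig-name style. This is a minimal awk sketch assuming an uncompressed VCF; the function name contig_style is hypothetical, and it only looks at the "chr" prefix, so it cannot by itself tell b37 from other builds that use bare names.

```shell
#!/usr/bin/env bash
# Count variant records whose contig names are "chr"-prefixed (hg19 style,
# e.g. chr1) versus bare (b37/GRCh37 style, e.g. 1 or MT).
# Assumptions: uncompressed VCF; the function name contig_style is hypothetical.
contig_style() {
  awk -F'\t' '
    /^#/        { next }          # skip meta-information and header lines
    $1 ~ /^chr/ { chr++; next }   # hg19-style contig name
                { bare++ }        # anything else counts as bare
    END { printf "chr-prefixed: %d\nbare: %d\n", chr, bare }
  ' "$1"
}
```

    Running `contig_style unique_variants.vcf` on a b37-style file should report all records under "bare".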

    Let me know if you have any further questions.

    Best,

    Genevieve

  • Noam Rudberg

    Hi Genevieve,

    new_vars.vcf is just a subset of the full VCF file, which I use to try out Funcotator and save some time until it works well :)

    Anyway, here's what you asked for:

    16:54:29.554 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/student/Downloads/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
    16:54:29.806 INFO  ValidateVariants - ------------------------------------------------------------
    16:54:29.807 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.2.6.1
    16:54:29.807 INFO  ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:54:29.807 INFO  ValidateVariants - Executing as student@ubuntu18 on Linux v4.15.0-60-generic amd64
    16:54:29.808 INFO  ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v11.0.15+10-Ubuntu-0ubuntu0.18.04.1
    16:54:29.808 INFO  ValidateVariants - Start Date/Time: May 25, 2022 at 4:54:29 PM UTC
    16:54:29.808 INFO  ValidateVariants - ------------------------------------------------------------
    16:54:29.808 INFO  ValidateVariants - ------------------------------------------------------------
    16:54:29.809 INFO  ValidateVariants - HTSJDK Version: 2.24.1
    16:54:29.809 INFO  ValidateVariants - Picard Version: 2.27.1
    16:54:29.809 INFO  ValidateVariants - Built for Spark Version: 2.4.5
    16:54:29.810 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:54:29.810 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:54:29.810 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:54:29.810 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:54:29.810 INFO  ValidateVariants - Deflater: IntelDeflater
    16:54:29.810 INFO  ValidateVariants - Inflater: IntelInflater
    16:54:29.810 INFO  ValidateVariants - GCS max retries/reopens: 20
    16:54:29.810 INFO  ValidateVariants - Requester pays: disabled
    16:54:29.811 INFO  ValidateVariants - Initializing engine
    16:54:30.052 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/student/Downloads/gatk-4.2.6.1/new_vars.vcf
    16:54:30.071 INFO  ValidateVariants - Done initializing engine
    16:54:30.072 WARN  ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
    16:54:30.072 WARN  ValidateVariants - Other possible validations will still be performed
    16:54:30.080 INFO  ProgressMeter - Starting traversal
    16:54:30.080 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
    16:54:30.125 INFO  ProgressMeter -             unmapped              0.0                   173         235909.1
    16:54:30.125 INFO  ProgressMeter - Traversal complete. Processed 173 total variants in 0.0 minutes.
    16:54:30.125 INFO  ValidateVariants - Shutting down engine
    [May 25, 2022 at 4:54:30 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=161480704


    Regarding the chromosome naming convention: it seems that the reference file I'm using is GRCh37.

    From the .fasta file:

    I can easily change "1" to "chr1" in my VCF file, but I'm not sure it will work with the reference that way.

  • Genevieve Brandt (she/her)

    Yeah, it looks like your reference for this VCF file is GRCh37, which is different from hg19. There are probably more differences than just the chromosome names. We recommend using LiftOver to change the reference version of VCF files.
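
    A LiftoverVcf run for this case might look like the sketch below. It is not a tested recipe: every file name is a placeholder, the chain file name b37ToHg19.chain in particular is hypothetical (a matching chain file has to be obtained separately), and -R must point to the target hg19 FASTA. The script runs the command only if gatk is on the PATH and otherwise prints it.

```shell
#!/usr/bin/env bash
# Sketch: lift a b37/GRCh37 VCF over to hg19 with LiftoverVcf.
# All file names are placeholders; b37ToHg19.chain is hypothetical.
# -I input VCF (b37), -O lifted output, --CHAIN chain file,
# --REJECT records that could not be lifted, -R target (hg19) reference FASTA.
liftover_cmd=(
  gatk LiftoverVcf
  -I unique_variants.vcf
  -O unique_variants.hg19.vcf
  --CHAIN b37ToHg19.chain
  --REJECT unique_variants.rejected.vcf
  -R hg19.fasta
)

# Run the command if GATK is installed; otherwise just print it.
if command -v gatk >/dev/null 2>&1; then
  "${liftover_cmd[@]}"
else
  echo "gatk not found on PATH; the command would be:"
  echo "${liftover_cmd[*]}"
fi
```

    Records that fail to lift over are written to the --REJECT file rather than silently dropped, which makes it easier to check how much of the call set survived the conversion.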

  • Noam Rudberg

    Thanks!

    On the LiftOver page, the example uses a "b37tohg38.chain" chain file, but there's no mention of b37 on the chain file download page. So I have two questions:

    1. Do you know another source of chain files?

    2. Did you mean that I should convert my GRCh37 VCF to hg19?

    BTW, based on this article, it seems that my version is actually b37 (judging by the reference file name and the chromosome naming).

  • Genevieve Brandt (she/her)

    1. The file you are referring to is just an example; we don't maintain these chain files. Information about our resources can be found on our resource bundle page.
    2. Yes, you can convert your file, or you can re-call your variants against a reference version that is more compatible with Funcotator, whichever works best for your goals. You can also create your own Funcotator data sources with the b37 reference.

  • Genevieve Brandt (she/her)

    Noam Rudberg, actually, I think it's possible to run Funcotator out of the box with b37. Check out this other forum post: https://gatk.broadinstitute.org/hc/en-us/community/posts/360060979451-Funcotator-b37-and-hg19-contig-compatibility-issue

  • Noam Rudberg

    Genevieve, thanks a lot!

    Adding the --force-b37-to-hg19-reference-contig-conversion flag worked.
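
    For reference, a Funcotator invocation using that flag might look like the sketch below. Only --force-b37-to-hg19-reference-contig-conversion comes from this thread; the file names, the data-sources path and the VCF output format are placeholders, and --ref-version hg19 assumes the hg19 data sources are being used with a b37 VCF and reference. As above, the script prints the command when gatk is not on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: annotate a b37 VCF with the hg19 Funcotator data sources.
# Only --force-b37-to-hg19-reference-contig-conversion comes from the thread;
# file names, the data-sources path and the output format are placeholders.
funcotator_cmd=(
  gatk Funcotator
  -R Homo_sapiens_assembly19.fasta
  -V new_vars.vcf
  -O new_vars.funcotated.vcf
  --output-file-format VCF
  --data-sources-path funcotator_dataSources
  --ref-version hg19
  --force-b37-to-hg19-reference-contig-conversion
)

# Run the command if GATK is installed; otherwise just print it.
if command -v gatk >/dev/null 2>&1; then
  "${funcotator_cmd[@]}"
else
  echo "gatk not found on PATH; the command would be:"
  echo "${funcotator_cmd[*]}"
fi
```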

  • Genevieve Brandt (she/her)

    Great news!
