Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Issues with LiftoverVcf

Answered

7 comments

  • Genevieve Brandt (she/her)

    Hi Mohammad Deen Hayatu,

    If GATK is having a hard time finding the dictionary, I would recommend using GATK to create an updated sequence dictionary and then trying to run the tool again. 

    You can check out GATK's CreateSequenceDictionary: https://gatk.broadinstitute.org/hc/en-us/articles/5358872471963-CreateSequenceDictionary-Picard-

    Let me know if this still doesn't work!

    Best,

    Genevieve

  • Mohammad Deen Hayatu

    Hello Genevieve Brandt (she/her),

    The command finally worked when I removed the reference and all the associated dictionaries and redownloaded them from the link provided by GATK. However, I then ran into a different issue, shown below:


    **********    LiftoverVcf -I Trio_Nijmegen.vcf -O 220648.hg19.trio.vcf -CHAIN hg38ToHg19.over.chain.gz -REJECT rejected_variants.vcf -R /home/sbsuser/Deen/picard-2.27.4/Homo_sapiens_assembly19.fasta
    **********


    16:37:02.253 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/sbsuser/Deen/picard-2.27.4/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Thu Oct 06 16:37:02 CEST 2022] LiftoverVcf INPUT=Trio_Nijmegen.vcf OUTPUT=220648.hg19.trio.vcf CHAIN=hg38ToHg19.over.chain.gz REJECT=rejected_variants.vcf REFERENCE_SEQUENCE=/home/sbsuser/Deen/picard-2.27.4/Homo_sapiens_assembly19.fasta    WARN_ON_MISSING_CONTIG=false LOG_FAILED_INTERVALS=true WRITE_ORIGINAL_POSITION=false WRITE_ORIGINAL_ALLELES=false LIFTOVER_MIN_MATCH=1.0 ALLOW_MISSING_FIELDS_IN_HEADER=false RECOVER_SWAPPED_REF_ALT=false TAGS_TO_REVERSE=[AF] TAGS_TO_DROP=[MAX_AF] DISABLE_SORT=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Thu Oct 06 16:37:02 CEST 2022] Executing as sbsuser@proton on Linux 4.15.0-189-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_92-b15; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.27.4-SNAPSHOT
    INFO    2022-10-06 16:37:02     LiftoverVcf     Loading up the target reference genome.
    INFO    2022-10-06 16:37:12     LiftoverVcf     Lifting variants over and sorting (not yet writing the output file.)
    ERROR   2022-10-06 16:37:12     LiftoverVcf     Encountered a contig, chr1 that is not part of the target reference.
    [Thu Oct 06 16:37:12 CEST 2022] picard.vcf.LiftoverVcf done. Elapsed time: 0.17 minutes.
    Runtime.totalMemory()=6689390592
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    (base) sbsuser@proton:~/Deen/picard-2.27.4$  LiftoverVcf     Encountered a contig, chr1 that is not part of the target reference
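
The error in this log means that a contig name produced by the liftover (chr1) is absent from the target reference's sequence dictionary. The check can be sketched in Python before running the tool. This is a minimal sketch that assumes the UCSC chain header layout (destination contig in the eighth whitespace-separated field) and a SAM-style .dict file with `@SQ` lines carrying `SN:` tags. The function names are hypothetical and are not part of Picard or GATK.

```python
import gzip


def chain_destination_contigs(chain_lines):
    """Collect destination contig names from UCSC chain header lines."""
    names = set()
    for line in chain_lines:
        fields = line.split()
        # Header layout: chain score tName tSize tStrand tStart tEnd qName ...
        # For a liftover chain, qName (index 7) is the contig in the new build.
        if len(fields) >= 8 and fields[0] == "chain":
            names.add(fields[7])
    return names


def dict_contigs(dict_lines):
    """Collect sequence names (SN: tags) from the @SQ lines of a .dict file."""
    names = set()
    for line in dict_lines:
        if line.startswith("@SQ"):
            for tag in line.rstrip("\n").split("\t")[1:]:
                if tag.startswith("SN:"):
                    names.add(tag[3:])
    return names


def missing_from_reference(chain_lines, dict_lines):
    """Return chain destination contigs that the reference dictionary lacks."""
    return sorted(chain_destination_contigs(chain_lines) - dict_contigs(dict_lines))


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)
```

Passing the lines of the chain file (via `open_text`) and of the reference's .dict file to `missing_from_reference` returns an empty list when every destination contig is present in the dictionary. A non-empty list points at a naming mismatch of the kind reported above.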

  • Mohammad Deen Hayatu

    Update: I finally got it to run by matching my reference to the chain file and also setting WARN_ON_MISSING_CONTIG=true. I got this warning at the end:

    WARNING 2022-10-07 09:47:55     LiftoverVcf     28445 variants with a swapped REF/ALT were identified, but were not recovered.  See RECOVER_SWAPPED_REF_ALT and associated caveats.
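
The swapped REF/ALT warning refers to variants whose REF allele does not match the base in the target reference while the ALT allele does. A minimal Python sketch of that classification for bi-allelic SNPs follows. It assumes the alleles are already on the same strand as the target reference and that the target base is known. `classify_snp` is a hypothetical helper for illustration, not Picard's implementation.

```python
def classify_snp(target_base, ref, alt):
    """Classify a bi-allelic SNP against the base found in the target reference."""
    target_base, ref, alt = target_base.upper(), ref.upper(), alt.upper()
    if ref == target_base:
        return "ok"        # REF agrees with the target reference
    if alt == target_base:
        return "swapped"   # ALT is the target reference base, so REF/ALT are swapped
    return "mismatch"      # neither allele agrees with the target reference
```

According to the Picard documentation, setting RECOVER_SWAPPED_REF_ALT=true swaps REF and ALT for such bi-allelic SNPs instead of rejecting them, with the caveat that allele-specific annotations may become invalid.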

  • Genevieve Brandt (she/her)

    Great news! Thank you for posting your solution Mohammad Deen Hayatu!

  • patrick oconnell

    I am having the same issue with the .dict file not being recognized. I rebuilt the .dict file as suggested and the issue persists. Mohammad Deen Hayatu, can you provide further details on how you corrected this? Genevieve Brandt (she/her), is there a known solution to this? I am performing the same genome conversion as described above. Thanks!

  • Mohammad Deen Hayatu

    All I had to do was redownload the reference from the GATK website, including the dictionary. This is easier, of course, if you're using a GATK-specific reference like b37. If you're using hg19, you would have to build the dictionary using the link Genevieve provided above. Also ensure the reference.fasta, reference.fasta.fai and reference.dict files are all in the same directory.
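
The same-directory requirement for the three files can be checked with a short Python sketch. It assumes the naming convention given in this comment (reference.fasta, reference.fasta.fai, reference.dict) and an uncompressed FASTA. `missing_companions` is a hypothetical helper name.

```python
from pathlib import Path


def missing_companions(fasta_path):
    """Return the names of expected reference files absent from the FASTA's directory."""
    fasta = Path(fasta_path)
    expected = [
        fasta,                                 # reference.fasta
        fasta.with_name(fasta.name + ".fai"),  # reference.fasta.fai (index)
        fasta.with_suffix(".dict"),            # reference.dict (sequence dictionary)
    ]
    return [p.name for p in expected if not p.exists()]
```

An empty list means all three files sit side by side. Any name returned is a file to rebuild or move next to the FASTA before rerunning LiftoverVcf.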

  • Ohanna Cavalcanti

    Trying to do the same liftover, I got the same error and followed Mohammad's steps. The link provided by GATK did not work for me because the reference's contig names did not match those in my original VCF. Mine had 'chr21' and hg19_v0_Homo_sapiens_assembly19.fasta had '21'. In this case I downloaded the reference hg19.fa.gz (which uses 'chr21') and was able to produce hg19.fa.gz.dic, as suggested by Genevieve, and it all worked!
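
A 'chr21' versus '21' mismatch like this one can be detected by comparing the CHROM column of the VCF with the FASTA header lines. The Python sketch below compares naming only and says nothing about whether the two files share a genome build. The function names are hypothetical.

```python
def fasta_contigs(fasta_lines):
    """Return contig names from FASTA headers (text after '>' up to the first space)."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def vcf_contigs(vcf_lines):
    """Return contig names from the CHROM column of VCF records."""
    names = set()
    for line in vcf_lines:
        # Skip header lines (starting with '#') and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def unmatched_contigs(vcf_lines, fasta_lines):
    """Return VCF contig names that have no identically named FASTA sequence."""
    return sorted(vcf_contigs(vcf_lines) - fasta_contigs(fasta_lines))
```

For a VCF that uses 'chr21' and a FASTA that uses '21', `unmatched_contigs` returns the 'chr'-prefixed names, which suggests switching to a reference with the same naming style, as was done here.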

