Issues with LiftoverVcf
So I am trying to lift over a VCF from the hg38 assembly to hg19 using the LiftoverVcf tool in Picard, but I keep running into an error saying the reference dictionary is not present, despite having it in the same directory as the reference FASTA.
********** The command line looks like this in the new syntax:
**********
********** LiftoverVcf -I 220648.trio.vcf -O 220648.hg19.trio.vcf -CHAIN hg38ToHg19.over.chain.gz -REJECT rejected_variants.vcf -R Homo_sapiens_assembly19.fasta
**********
08:36:42.245 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/sbsuser/Deen/picard-2.27.4/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Oct 06 08:36:42 CEST 2022] LiftoverVcf INPUT=220648.trio.vcf OUTPUT=220648.hg19.trio.vcf CHAIN=hg38ToHg19.over.chain.gz REJECT=rejected_variants.vcf REFERENCE_SEQUENCE=Homo_sapiens_assembly19.fasta WARN_ON_MISSING_CONTIG=false LOG_FAILED_INTERVALS=true WRITE_ORIGINAL_POSITION=false WRITE_ORIGINAL_ALLELES=false LIFTOVER_MIN_MATCH=1.0 ALLOW_MISSING_FIELDS_IN_HEADER=false RECOVER_SWAPPED_REF_ALT=false TAGS_TO_REVERSE=[AF] TAGS_TO_DROP=[MAX_AF] DISABLE_SORT=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Oct 06 08:36:42 CEST 2022] Executing as sbsuser@proton on Linux 4.15.0-189-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_92-b15; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.27.4-SNAPSHOT
INFO 2022-10-06 08:36:42 LiftoverVcf Loading up the target reference genome.
ERROR 2022-10-06 08:36:42 LiftoverVcf Reference /home/sbsuser/Deen/picard-2.27.4/Homo_sapiens_assembly19.fasta must have an associated Dictionary .dict file in the same directory.
[Thu Oct 06 08:36:42 CEST 2022] picard.vcf.LiftoverVcf done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2024275968
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
I am using Picard (2.8.3).
-
If GATK is having a hard time finding the dictionary, I would recommend using GATK to create an updated sequence dictionary and then trying to run the tool again.
You can check out GATK's CreateSequenceDictionary: https://gatk.broadinstitute.org/hc/en-us/articles/5358872471963-CreateSequenceDictionary-Picard-
Let me know if this still doesn't work!
Best,
Genevieve
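
For reference, here is a minimal sketch of the dictionary-building step suggested above. It assumes Picard is run as `java -jar picard.jar` and that the dictionary should sit next to the FASTA with the extension replaced by `.dict`. The helper name `build_dict_cmd` and the `PICARD_JAR` variable are made up for illustration. The helper only prints the command so it can be inspected before running.

```shell
# Path to the Picard jar (assumption: adjust to your installation).
PICARD_JAR="${PICARD_JAR:-picard.jar}"

# Hypothetical helper: print the CreateSequenceDictionary command for a FASTA.
# The output name is the FASTA path with its last extension replaced by .dict.
build_dict_cmd() {
    local ref="$1"
    local dict="${ref%.*}.dict"
    printf 'java -jar %s CreateSequenceDictionary -R %s -O %s\n' \
        "$PICARD_JAR" "$ref" "$dict"
}

# Usage (prints the command; copy it into the shell to run it):
#   build_dict_cmd Homo_sapiens_assembly19.fasta
```

If an older `.dict` already exists for that reference, it may be worth moving it aside first so the freshly built one is the file the tool picks up.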
-
Hello Genevieve Brandt (she/her),
The command finally worked when I removed the reference and all the associated dictionaries and redownloaded them from the link provided by GATK. However, I then ran into a different issue, which I describe below:
********** LiftoverVcf -I Trio_Nijmegen.vcf -O 220648.hg19.trio.vcf -CHAIN hg38ToHg19.over.chain.gz -REJECT rejected_variants.vcf -R /home/sbsuser/Deen/picard-2.27.4/Homo_sapiens_assembly19.fasta
**********
16:37:02.253 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/sbsuser/Deen/picard-2.27.4/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Oct 06 16:37:02 CEST 2022] LiftoverVcf INPUT=Trio_Nijmegen.vcf OUTPUT=220648.hg19.trio.vcf CHAIN=hg38ToHg19.over.chain.gz REJECT=rejected_variants.vcf REFERENCE_SEQUENCE=/home/sbsuser/Deen/picard-2.27.4/Homo_sapiens_assembly19.fasta WARN_ON_MISSING_CONTIG=false LOG_FAILED_INTERVALS=true WRITE_ORIGINAL_POSITION=false WRITE_ORIGINAL_ALLELES=false LIFTOVER_MIN_MATCH=1.0 ALLOW_MISSING_FIELDS_IN_HEADER=false RECOVER_SWAPPED_REF_ALT=false TAGS_TO_REVERSE=[AF] TAGS_TO_DROP=[MAX_AF] DISABLE_SORT=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Oct 06 16:37:02 CEST 2022] Executing as sbsuser@proton on Linux 4.15.0-189-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_92-b15; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.27.4-SNAPSHOT
INFO 2022-10-06 16:37:02 LiftoverVcf Loading up the target reference genome.
INFO 2022-10-06 16:37:12 LiftoverVcf Lifting variants over and sorting (not yet writing the output file.)
ERROR 2022-10-06 16:37:12 LiftoverVcf Encountered a contig, chr1 that is not part of the target reference.
[Thu Oct 06 16:37:12 CEST 2022] picard.vcf.LiftoverVcf done. Elapsed time: 0.17 minutes.
Runtime.totalMemory()=6689390592
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
(base) sbsuser@proton:~/Deen/picard-2.27.4$ LiftoverVcf Encountered a contig, chr1 that is not part of the target reference
-
Update: I finally got it to run by matching my reference to the chain file and also setting WARN_ON_MISSING_CONTIG=true. I got this warning at the end: WARNING 2022-10-07 09:47:55 LiftoverVcf 28445 variants with a swapped REF/ALT were identified, but were not recovered. See RECOVER_SWAPPED_REF_ALT and associated caveats.
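
A quick way to see whether the chain file and the target reference agree on contig naming (for example `chr1` versus `1`) is to compare the two name lists before running the liftover. The sketch below assumes a UCSC-style chain file, where header lines start with `chain` and the destination contig name is the eighth field, and a `.dict` file with `@SQ` lines carrying `SN:` tags. The function name `missing_contigs` is hypothetical.

```shell
# Hypothetical helper: list destination contigs named in the chain file
# that are absent from the reference sequence dictionary.
missing_contigs() {
    local chain="$1" dict="$2"
    local tmpdir
    tmpdir="$(mktemp -d)"

    # Chain header: chain score tName tSize tStrand tStart tEnd qName ...
    # Field 8 (qName) is the contig in the assembly being lifted to.
    # -f lets an uncompressed chain file pass through (GNU gzip behaviour).
    gzip -cdf "$chain" | awk '$1 == "chain" { print $8 }' | sort -u \
        > "$tmpdir/chain_contigs.txt"

    # Pull the SN: value from every @SQ line of the dictionary.
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' "$dict" | sort -u > "$tmpdir/ref_contigs.txt"

    # Lines only in the first file: chain contigs the reference lacks.
    comm -23 "$tmpdir/chain_contigs.txt" "$tmpdir/ref_contigs.txt"
    rm -rf "$tmpdir"
}

# Usage:
#   missing_contigs hg38ToHg19.over.chain.gz Homo_sapiens_assembly19.dict
```

If every chain contig shows up as missing, the reference most likely uses the other naming style, and switching to a reference that matches the chain file is the cleaner fix. A short list of alternate or unplaced contigs is the situation where WARN_ON_MISSING_CONTIG=true is useful.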
-
Great news! Thank you for posting your solution Mohammad Deen Hayatu!
-
I am having the same issue: the .dict file is not being recognized. I rebuilt the .dict file as suggested and the issue persists. Mohammad Deen Hayatu, can you provide further details on how you corrected this? Genevieve Brandt (she/her), is there a known solution to this? I am performing the same genome conversion as described above. Thanks!
-
All I had to do was redownload the reference from the GATK website, including the dictionary. This is easier, of course, if you're using a GATK-specific reference like b37. If you're using hg19, you will have to build the dictionary yourself using the link Genevieve provided above. Also make sure the reference.fasta, reference.fasta.fai and reference.dict files are all in the same directory.
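
The file layout described above can be checked with a few lines of shell before launching the tool. This is only a sketch under the naming assumption stated in this comment (index at `reference.fasta.fai`, dictionary at `reference.dict`). The function name `check_reference_files` is hypothetical.

```shell
# Hypothetical helper: verify that a FASTA, its .fai index and its .dict
# dictionary all exist (and are non-empty) in the same directory.
check_reference_files() {
    local ref="$1"
    local missing=0
    local f
    for f in "$ref" "${ref}.fai" "${ref%.*}.dict"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_reference_files /path/to/reference.fasta && echo "reference looks complete"
```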
-
Trying to do the same liftover, I got the same error and followed Mohammad's steps. The reference from the link provided by GATK did not work for me because its contig names did not match those in my original VCF: mine had 'chr21' and hg19_v0_Homo_sapiens_assembly19.fasta had '21'. In this case I downloaded the reference hg19.fa.gz (which uses 'chr21') and was able to produce the hg19.fa.gz.dic, as suggested by Genevieve, and it all worked!