Issues with LiftoverVcf GRCh37 to GRCh38
Hi, I am using LiftoverVcf to lift a VCF over from GRCh37 to GRCh38. I keep getting this error: "Encountered a contig, 1 that is not part of the target reference." I have already set WARN_ON_MISSING_CONTIG to true, but then 100% of the variants fail to lift over. I have double-checked my reference FASTA and dictionary files. The chain file should also be fine, because I downloaded it from the Ensembl website.
a) Version: Picard 2.25.6
b) Exact command used:
java -jar $PATH/picard.jar LiftoverVcf I=Input.vcf O=output.vcf.gz CHAIN=GRCh37_to_GRCh38.chain REJECT=rejects.vcf R=Homo_sapiens_assembly38.fasta
c) Entire program log:
INFO 2024-02-27 17:32:53 LiftoverVcf
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** LiftoverVcf -I Input.vcf -O output.gz -CHAIN GRCh37_to_GRCh38.chain -REJECT rejects.vcf -R Homo_sapiens_assembly38.fasta
**********
17:32:53.920 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/vast/palmer/apps/avx2/software/picard/2.25.6-Java-11/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Feb 27 17:32:53 EST 2024] LiftoverVcf INPUT=Input.vcf OUTPUT=output.vcf.gz CHAIN=GRCh37_to_GRCh38.chain REJECT=GTEx_v7_SV_GTEx_v7_liftover_reject_hg38.vcf REFERENCE_SEQUENCE=Homo_sapiens_assembly38.fasta WARN_ON_MISSING_CONTIG=false LOG_FAILED_INTERVALS=true WRITE_ORIGINAL_POSITION=false WRITE_ORIGINAL_ALLELES=false LIFTOVER_MIN_MATCH=1.0 ALLOW_MISSING_FIELDS_IN_HEADER=false RECOVER_SWAPPED_REF_ALT=false TAGS_TO_REVERSE=[AF] TAGS_TO_DROP=[MAX_AF] DISABLE_SORT=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Feb 27 17:32:53 EST 2024] Executing as user on Linux 4.18.0-477.36.1.el8_8.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.16+8; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.25.6
INFO 2024-02-27 17:32:54 LiftoverVcf Loading up the target reference genome.
INFO 2024-02-27 17:33:05 LiftoverVcf Lifting variants over and sorting (not yet writing the output file.)
ERROR 2024-02-27 17:33:05 LiftoverVcf Encountered a contig, 1 that is not part of the target reference.
[Tue Feb 27 17:33:05 EST 2024] picard.vcf.LiftoverVcf done. Elapsed time: 0.19 minutes.
Runtime.totalMemory()=7509696512
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
-
If I switch to an hg19-to-hg38 chain file (and add the "chr" prefix to the chromosome names), 80% of the variants fail to lift over instead of 100%.
-
Hi Eric Nguyen
Can you share the sequence dictionary of the original VCF file? Also, can you try this tool using the latest available version?
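For anyone unsure how to pull this information out of a VCF, here is a minimal Python sketch. It is not part of Picard, and the function name `vcf_contigs` is made up for illustration. It assumes an uncompressed VCF passed in as an iterable of text lines, and it collects the contig names declared in `##contig` header lines as well as the names actually used in the records.

```python
import re

def vcf_contigs(lines):
    """Return (declared, used) contig names, each in order of first appearance."""
    declared, used = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##contig="):
            # Header form: ##contig=<ID=1,length=249250621,...>
            match = re.search(r"[<,]ID=([^,>]+)", line)
            if match and match.group(1) not in declared:
                declared.append(match.group(1))
        elif line and not line.startswith("#"):
            # Data record: the first tab-separated column is CHROM
            chrom = line.split("\t", 1)[0]
            if chrom not in used:
                used.append(chrom)
    return declared, used
```

Passing an open file handle for the input VCF to `vcf_contigs` should show quickly whether the file uses names like `1` or `chr1`, and whether the header and the records agree with each other.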
-
My source VCF and target genome do not have alt sequences, but the chain file downloaded from UCSC does, and that is causing the issue...
Is there any solution to this problem?
-
Hi VivekTodur
Liftover chain files are crafted specifically for a given pair of source and target genomes, so modifying them is not recommended. If you wish to create your own liftover chains, you can check UCSC's documentation on liftover at the link below.
https://genomewiki.ucsc.edu/index.php/LiftOver_Howto
On the other hand, if you are only dealing with primary contigs, you can temporarily lift your variants over to a genome with alt contigs, remove the records that land on alt contigs, and then replace the sequence dictionary with the one you are interested in. That way most of your variants will be lifted over without issues. However, you need to pay attention to the compatibility of the reference contigs between the two target genomes. If they are not compatible, you may face non-matching reference nucleotide issues in downstream applications.
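The step of removing records that land on alt contigs could look something like the sketch below. It assumes UCSC-style hg38 names for the primary contigs (chr1 to chr22, chrX, chrY, chrM) and an uncompressed VCF read as text lines. `PRIMARY` and `drop_non_primary` are illustrative names, not Picard options.

```python
import re

# Assumed primary contig names (UCSC hg38 style); adjust to your reference.
PRIMARY = {f"chr{n}" for n in range(1, 23)} | {"chrX", "chrY", "chrM"}

def drop_non_primary(lines, primary=PRIMARY):
    """Yield VCF lines, skipping records and ##contig headers outside `primary`."""
    for line in lines:
        if line.startswith("##contig="):
            # Keep a ##contig header only if its ID is a primary contig
            match = re.search(r"[<,]ID=([^,>]+)", line)
            if match and match.group(1) in primary:
                yield line
        elif line.startswith("#"):
            # All other header lines pass through unchanged
            yield line
        elif line.split("\t", 1)[0] in primary:
            # Data record whose CHROM column is a primary contig
            yield line
```

This only filters lines. The remaining `##contig` entries still describe the alt-aware genome, so the header's sequence dictionary would need to be replaced in a separate step.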
I hope this helps.
-
I am getting the same error.
Command used: java -jar picard.jar LiftoverVcf I=Chr_prefixed_vcf.gz O=lifted_vcf.gz CHAIN=hg19ToHg38.over.chain.gz REJECT=rejected_variants.vcf R=genome2.fna
Error: ERROR 2024-07-19 12:25:54 LiftoverVcf Encountered a contig, chr1 that is not part of the target reference.
After adding WARN_ON_MISSING_CONTIG=true to the command, I get multiple warnings:
WARNING 2024-07-19 12:04:58 LiftoverVcf Encountered a contig, chr1 that is not part of the target reference.
I am pasting snippets of my input files.
Chain file:
chain 20851231461 chr1 249250621 + 10000 249240621 chr1 248956422 + 10000 248946
167376 50041 80290
Reference:
>NC_000001.11 Homo sapiens chr1, GRCh38.p14 Primary Assembly
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
Vcf:
chr1 866422 rs139210662 C T 70880.8 PASS AC=74;AF=0.00284
chr1 866430 rs148774856 A G 981.16 PASS
All three of them have chr as chromosome notation.
-
Hi Niharika (PhD, Bioinformatics 2020)
Your target reference does not have contig names that are compatible with the chain file.
>NC_000001.11
This is the contig name present in the reference file. The rest of the header line is just a description and is not part of the contig name.
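To make the mismatch concrete, here is a small Python sketch (hypothetical helper names, not a Picard feature) that reads contig names the way described above: for a FASTA, only the text between `>` and the first whitespace; for a chain file, the destination-assembly name in each `chain` header line. It assumes uncompressed input given as text lines.

```python
def fasta_contig_names(lines):
    """Contig name = text after '>' up to the first whitespace."""
    names = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names

def chain_destination_names(lines):
    """Destination contig names from 'chain' header lines, in order of first appearance."""
    names = []
    for line in lines:
        fields = line.split()
        # Header layout: chain score tName tSize tStrand tStart tEnd qName ...
        # fields[2] is the source-assembly contig, fields[7] the destination one.
        if len(fields) >= 8 and fields[0] == "chain" and fields[7] not in names:
            names.append(fields[7])
    return names

def missing_from_reference(chain_lines, fasta_lines):
    """Destination contigs named in the chain file but absent from the FASTA."""
    reference = set(fasta_contig_names(fasta_lines))
    return [n for n in chain_destination_names(chain_lines) if n not in reference]
```

With the snippets posted above, the FASTA yields `NC_000001.11` while the chain file points at `chr1`, so `chr1` is reported as missing from the reference.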
Regards.