Liftover Picard hg19 to hg38: variants were not successfully lifted over
I am trying to lift over a sample GRCh37 VCF file that ships with the VEP installation to see how this tool works, but for some reason it is not working for me. I have also tried another, smaller test VCF file, with the same result.
a) GATK version used: v4.3.0.0-12 and Picard Version: 2.27.5
b) Exact command used:
java -Xmx8g -jar ~/picard/picard.jar LiftoverVcf \
I=input/homo_sapiens_hg19.vcf \
O=output/test_hg19_lifted.vcf \
CHAIN=hg19ToHg38.over.chain.gz \
REJECT=output/rejected_vars.vcf \
R=/root/.vep/homo_sapiens/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
c) Entire program log:
Executing as ...@... on Linux 5.15.0-56-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.27.5
INFO 2022-12-09 18:31:41 LiftoverVcf Loading up the target reference genome.
INFO 2022-12-09 18:32:15 LiftoverVcf Lifting variants over and sorting (not yet writing the output file.)
INFO 2022-12-09 18:32:15 LiftoverVcf Processed 173 variants.
INFO 2022-12-09 18:32:15 LiftoverVcf 173 variants failed to liftover.
INFO 2022-12-09 18:32:15 LiftoverVcf 0 variants lifted over but had mismatching reference alleles after lift over.
INFO 2022-12-09 18:32:15 LiftoverVcf 100.0000% of variants were not successfully lifted over and written to the output.
INFO 2022-12-09 18:32:15 LiftoverVcf liftover success by source contig:
INFO 2022-12-09 18:32:15 LiftoverVcf 21: 0 / 37 (0.0000%)
INFO 2022-12-09 18:32:15 LiftoverVcf 22: 0 / 136 (0.0000%)
INFO 2022-12-09 18:32:15 LiftoverVcf lifted variants by target contig:
INFO 2022-12-09 18:32:15 LiftoverVcf no successfully lifted variants
WARNING 2022-12-09 18:32:15 LiftoverVcf 0 variants with a swapped REF/ALT were identified, but were not recovered. See RECOVER_SWAPPED_REF_ALT and associated caveats.
INFO 2022-12-09 18:32:15 LiftoverVcf Writing out sorted records to final VCF.
[Fri Dec 09 18:32:15 UTC 2022] picard.vcf.LiftoverVcf done. Elapsed time: 0.57 minutes.
Runtime.totalMemory()=3167748096
d) This is the head of the output vcf file:
##fileformat=VCFv4.2
##INFO=<ID=ReverseComplementedAlleles,Number=0,Type=Flag,Description="The REF and the ALT alleles have been reverse complemented in liftover since the mapping from the previous reference to the current one was on the negative strand.">
##INFO=<ID=SwappedAlleles,Number=0,Type=Flag,Description="The REF and the ALT alleles have been swapped in liftover due to changes in the reference. It is possible that not all INFO annotations reflect this swap, and in the genotypes, only the GT, PL, and AD fields have been modified. You should check the TAGS_TO_REVERSE parameter that was used during the LiftOver to be sure.">
##contig=<ID=1,length=248956422>
.....
ANY IDEAS PLEASE?
-
In case someone finds this post, I finally managed to solve it by using a different chain file downloaded from Ensembl (GRCh37_to_GRCh38.chain.gz). The one I was using (downloaded from UCSC) was the one causing the issue.
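For anyone hitting the same 0% result, one way to check whether a chain file fits a VCF is to compare the contig names in the VCF with the source contig names in the chain file (field 3 of each `chain` header line). The sketch below runs on two tiny made-up files, `demo.vcf` and `demo.chain`, which are assumptions for illustration only; point the two extraction commands at your own files instead (use `gzip -dc` for a gzipped chain file).

```shell
#!/bin/sh
# Sketch: list VCF contigs that have no matching source contig in a chain file.
# demo.vcf and demo.chain are hypothetical stand-ins, not real data.
set -eu

workdir=$(mktemp -d)

# Hypothetical two-record VCF with Ensembl-style contig names (21, 22).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n21\t100\t.\tA\tG\t.\t.\t.\n22\t200\t.\tC\tT\t.\t.\t.\n' > "$workdir/demo.vcf"

# Hypothetical chain file with UCSC-style contig names (chr21, chr22).
# Header fields: chain score tName tSize tStrand tStart tEnd qName qSize ...
printf 'chain 1000 chr21 1000 + 0 500 chr21 1000 + 0 500 1\n500\n\nchain 1000 chr22 1000 + 0 500 chr22 1000 + 0 500 2\n500\n\n' > "$workdir/demo.chain"

# Unique contig names used by the VCF records (column 1 of non-header lines).
vcf_contigs=$(grep -v '^#' "$workdir/demo.vcf" | cut -f1 | sort -u)

# Unique source contig names in the chain file (field 3 of "chain" lines).
chain_contigs=$(awk '$1 == "chain" { print $3 }' "$workdir/demo.chain" | sort -u)

# Keep every VCF contig that does not appear verbatim among the chain contigs.
missing=$(printf '%s\n' "$vcf_contigs" | while read -r c; do
  printf '%s\n' "$chain_contigs" | grep -qxF -- "$c" || printf '%s\n' "$c"
done)

if [ -n "$missing" ]; then
  echo "VCF contigs with no matching chain source contig:"
  printf '%s\n' "$missing"
fi
```

If every contig in the VCF shows up in that list, the chain file and the VCF use different naming schemes, and no variant can be mapped.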
-
Download link:
-
Same problem here. Following your note about GRCh37_to_GRCh38.chain.gz, I found that the issue comes down to chromosome names. The UCSC chain file hg19ToHg38.over.chain.gz uses chr-prefixed names (chr1, chr2, chrX, chrM), so if the chromosomes in your VCF are named 1, 2, X, MT, etc., no record matches a chain and every variant fails to lift over. There are two solutions:
- Rename the chromosomes in the VCF to the chr-prefixed style using bcftools annotate --rename-chrs, then use hg19ToHg38.over.chain.gz.
- Keep the VCF as it is and use GRCh37_to_GRCh38.chain.gz instead of hg19ToHg38.over.chain.gz.
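A minimal sketch of the first option, assuming a standard human contig set (1 to 22, X, Y, MT). `bcftools annotate --rename-chrs` takes a file of whitespace-separated `old_name new_name` pairs, one per line. The map file name and the output VCF name are hypothetical; the input path is the one from the question. Note that after renaming, the lifted records carry chr-prefixed names, so, as far as I understand, the FASTA passed to `R=` would need chr-prefixed contigs as well, which the Ensembl primary assembly in the question does not have.

```shell
#!/bin/sh
# Sketch: build a rename map (1 -> chr1, ..., MT -> chrM) and apply it with bcftools.
set -eu

# Hypothetical location for the map file.
map="${TMPDIR:-/tmp}/chr_name_map.txt"
: > "$map"

# Autosomes 1..22.
i=1
while [ "$i" -le 22 ]; do
  printf '%s chr%s\n' "$i" "$i" >> "$map"
  i=$((i + 1))
done

# Sex chromosomes, then the mitochondrial contig (MT in Ensembl style, chrM in UCSC style).
for c in X Y; do
  printf '%s chr%s\n' "$c" "$c" >> "$map"
done
printf 'MT chrM\n' >> "$map"

# Apply the map only when bcftools and the input file are actually present.
in_vcf=input/homo_sapiens_hg19.vcf
out_vcf=input/homo_sapiens_hg19.chr.vcf   # hypothetical output name
if command -v bcftools >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
  bcftools annotate --rename-chrs "$map" -o "$out_vcf" "$in_vcf"
fi
```

The renamed VCF would then go in as `I=` in the original LiftoverVcf command. The second option avoids this step entirely, which is probably why switching to the Ensembl chain file was the simpler fix here.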