Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Issues with LiftoverVcf GRCh37 to GRCh38

6 comments

  • Eric Nguyen

    If I switch to an hg19-to-hg38 chain file (adding "chr" to the chromosome names), 80% of the variants fail to lift over instead of 100%.
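
For readers following along, the "chr" renaming mentioned above could look roughly like the sketch below. It is not the poster's actual script. The function names are made up for illustration, and the MT to chrM mapping is an assumption about naming only. It renames contigs and does not check that the underlying sequences are the same, so it is only meaningful for the primary chromosomes.

```python
import re

# Matches the ID field of a VCF "##contig=<ID=...>" header line.
CONTIG_HEADER = re.compile(r"^(##contig=<ID=)([^,>]+)")


def add_chr_prefix(name):
    """Map a GRCh37-style name (1, X, MT) to a UCSC-style name (chr1, chrX, chrM)."""
    if name.startswith("chr"):
        return name  # already prefixed, leave untouched
    if name == "MT":
        return "chrM"  # assumption: naming only, sequences are not compared
    return "chr" + name


def rename_vcf_line(line):
    """Return one VCF line with its contig name prefixed with 'chr'."""
    if line.startswith("##contig="):
        # Rename the contig declared in the header.
        return CONTIG_HEADER.sub(
            lambda m: m.group(1) + add_chr_prefix(m.group(2)), line
        )
    if line.startswith("#"):
        return line  # other header lines are passed through unchanged
    chrom, sep, rest = line.partition("\t")
    return add_chr_prefix(chrom) + sep + rest
```

Applying `rename_vcf_line` to every line of an uncompressed VCF renames both the header and the records. The contig names in the chain file and in the target reference still have to agree with the result.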

  • Gökalp Çelik

    Hi Eric Nguyen

    Can you share the sequence dictionary of the original VCF file, and can you also try this tool with the latest available version?

  • VivekTodur

    My source VCF and target genome do not have alt sequences, but the chain file downloaded from UCSC does, and that is causing the issue...

    Any solution to this problem? 

  • Gökalp Çelik

    Hi VivekTodur

    Liftover chain files are built specifically for a given pair of source and target genomes, so modifying them is not recommended. If you wish to create your own liftover chains, you can check UCSC's documentation on liftover at the link below.

    https://genomewiki.ucsc.edu/index.php/LiftOver_Howto 

    On the other hand, if you are dealing only with primary contigs, you can temporarily lift your variants over to a genome that includes alt contigs, remove the records that land on alt contigs, and then replace the sequence dictionary with the one you are interested in. That way most of your variants will be lifted over without issues. However, you need to pay attention to the compatibility of the reference contigs across the two target genomes. If they are not compatible, you may run into non-matching reference nucleotide issues in downstream applications.

    I hope this helps. 
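
The "remove liftovers that hit alt contigs" step described above can be sketched as follows. This is a minimal illustration and not an official GATK or Picard procedure. The `PRIMARY` set (chr1 to chr22, chrX, chrY, chrM) and the function name `keep_primary` are assumptions made for this example, and the input is assumed to be an uncompressed, already lifted-over VCF read line by line.

```python
# Assumed definition of "primary" contigs in UCSC-style hg38 naming.
PRIMARY = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def keep_primary(lines, primary=PRIMARY):
    """Yield header lines unchanged and only the records on primary contigs."""
    for line in lines:
        if line.startswith("#"):
            # Header lines (including ##contig) are passed through as-is;
            # replacing the sequence dictionary is a separate step.
            yield line
        elif line.split("\t", 1)[0] in primary:
            yield line


if __name__ == "__main__":
    lifted = [
        "#CHROM\tPOS\tID\tREF\tALT\n",
        "chr1\t930000\t.\tC\tT\n",
        "chr1_KI270762v1_alt\t1200\t.\tA\tG\n",
    ]
    for kept in keep_primary(lifted):
        print(kept, end="")
```

The `##contig` header lines still describe the genome with alt contigs after this filter, which is why the sequence dictionary has to be replaced afterwards. Picard's UpdateVcfSequenceDictionary is one tool that may be suitable for that step.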

  • I am getting the same error

    Command used: java -jar picard.jar LiftoverVcf I=Chr_prefixed_vcf.gz  O=lifted_vcf.gz CHAIN=hg19ToHg38.over.chain.gz REJECT=rejected_variants.vcf R=genome2.fna 

    Error: ERROR    2024-07-19 12:25:54    LiftoverVcf    Encountered a contig, chr1 that is not part of the target reference.

    After adding WARN_ON_MISSING_CONTIG=true to the command

    Multiple warnings: WARNING    2024-07-19 12:04:58    LiftoverVcf    Encountered a contig, chr1 that is not part of the target reference.

    I am pasting snippets of my input files

    chain file

    chain 20851231461 chr1 249250621 + 10000 249240621 chr1 248956422 + 10000 248946
    167376  50041   80290

    Reference

    >NC_000001.11 Homo sapiens chr1, GRCh38.p14 Primary Assembly
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

    Vcf

    chr1    866422  rs139210662     C       T       70880.8 PASS    AC=74;AF=0.00284
    chr1    866430  rs148774856     A       G       981.16  PASS

    All three of them have chr as chromosome notation

  • Gökalp Çelik

    Hi Niharika (PhD, Bioinformatics 2020)

    Your target reference does not have contig names that are compatible with the chain file.

    >NC_000001.11

    This is the contig name present in the reference file. The rest of the header line is just a description and is not part of the contig name, so the "chr1" used by the chain file and the VCF is not found in the target reference.

    Regards. 
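
A quick way to spot this kind of mismatch before running LiftoverVcf is to compare the contig names directly. The sketch below is only an illustration, and its function names are invented for this example. It takes the FASTA contig name to be the text after ">" up to the first whitespace, and reads the destination contig name from the eighth field of each "chain" header line, following the UCSC chain format.

```python
def fasta_contig_names(lines):
    """Contig name is the text after '>' up to the first whitespace."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def chain_destination_names(lines):
    """Destination contig names, taken from the 8th field of 'chain' header lines."""
    names = set()
    for line in lines:
        fields = line.split()
        if len(fields) > 7 and fields[0] == "chain":
            names.add(fields[7])
    return names


def missing_in_reference(chain_lines, fasta_lines):
    """Contigs the chain file lifts to that the reference FASTA does not define."""
    missing = chain_destination_names(chain_lines) - fasta_contig_names(fasta_lines)
    return sorted(missing)


if __name__ == "__main__":
    fasta = [">NC_000001.11 Homo sapiens chr1, GRCh38.p14 Primary Assembly"]
    chain = ["chain 20851231461 chr1 249250621 + 10000 249240621 chr1 248956422 + 10000"]
    print(missing_in_reference(chain, fasta))
```

With the snippets posted above, the reference only defines NC_000001.11, so chr1 is reported as missing. Using a reference FASTA whose headers carry UCSC-style names (chr1, chr2, ...), or a chain file built for the RefSeq accession names, would remove the mismatch.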

