Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Question about handling genotypes during VCF liftOver from hg38 to hg19 using LiftoverVCF command

3 comments

  • Laura Gauthier

    Hi user105689,

    The Picard code says:

    If this interval is in the opposite orientation, all alleles and genotypes will be reverse complemented and indels will be left-aligned.

    I don't see any explicit mention of phasing, but based on the code I would expect the phasing to be maintained.

    I have run this tool myself on real data, and I do remember it being "memory hungry", but I don't remember how much memory it actually required. Sorry.

    -Laura

  • user105689

    Hi Laura Gauthier,

    Thank you for your response.

    I was concerned about whether Picard's LiftoverVcf tool correctly updates genotypes when REF/ALT alleles are swapped between reference genomes. In my original post, I suggested that reverse-complemented alleles might require genotype changes, but I may have been mistaken.

    After running the tool successfully, I can confirm that it updates genotypes when REF/ALT alleles are swapped (e.g. 0|0 -> 1|1, 0|1 -> 1|0, 1|0 -> 0|1, 1|1 -> 0|0). Note that swapped REF/ALT alleles are only recovered if RECOVER_SWAPPED_REF_ALT=true is set.
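The genotype remapping described above can be illustrated with a small shell sketch. It assumes a biallelic site, where a REF/ALT swap simply exchanges allele indices 0 and 1. `swap_gt` is a hypothetical helper for illustration only and is not part of Picard.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remap a biallelic genotype after a REF/ALT swap.
# Allele index 0 becomes 1 and vice versa; separators (| or /) and
# missing calls (.) pass through unchanged, so phasing is preserved.
swap_gt() {
    printf '%s\n' "$1" | tr '01' '10'
}

for gt in '0|0' '0|1' '1|0' '1|1'; do
    printf '%s -> %s\n' "$gt" "$(swap_gt "$gt")"
done
```

This only mirrors the index swap for biallelic calls; multiallelic sites would need a different mapping.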

    Here are a few additional comments that might be helpful. Please note that they are accurate as of April 2023 and may not apply in the future.

    • I encountered a memory error when attempting to install Picard via conda, so I ended up using CrossMap (v0.6.4) instead, as it required less memory for VCF liftOver. However, I discovered that the genotypes were not successfully modified as expected with CrossMap (v0.6.4).
    • I ultimately installed the Picard v2.27.5 binaries (https://github.com/broadinstitute/picard/releases/tag/2.27.5). The job took approximately 45 minutes to run for 1,230,000 variants x 4,200 samples. See the end of this post for the command that was used to run it.
    • Unfortunately, I cannot determine the amount of memory actually used, as multiple jobs were run in parallel on the HPC cluster.

    Code that was used to run:

    java -jar /path/to/picard_v2.27.5/picard.jar \
    LiftoverVcf \
    I=${INPUT_VCF} \
    O=${OUTPUT_VCF} \
    REJECT=${OUTPUT_REJECT_VCF} \
    R=${REFERENCE_GENOME} \
    CHAIN=${HG38toHG19_CHAIN} \
    MAX_RECORDS_IN_RAM=50000 \
    RECOVER_SWAPPED_REF_ALT=true \
    WRITE_ORIGINAL_ALLELES=true \
    WRITE_ORIGINAL_POSITION=true
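One way to sanity-check a run like the one above is to count how many lifted records had their alleles swapped. The sketch below assumes the output VCF is uncompressed and that swapped records carry a `SwappedAlleles` flag in the INFO column. That flag name is my understanding of LiftoverVcf's behaviour and is worth verifying against the header of your own output. `count_info_flag` is a hypothetical helper, not a Picard command.

```shell
# Hypothetical helper: count VCF records whose INFO column (field 8)
# contains a given flag. Assumes an uncompressed, tab-delimited VCF.
count_info_flag() {
    local flag="$1" vcf="$2"
    awk -F'\t' -v flag="$flag" '
        /^#/ { next }                      # skip header lines
        {
            n = split($8, tags, ";")       # INFO entries are ;-separated
            for (i = 1; i <= n; i++) {
                split(tags[i], kv, "=")    # drop any =value part
                if (kv[1] == flag) { count++; break }
            }
        }
        END { print count + 0 }
    ' "$vcf"
}

# Usage (file name is a placeholder):
# count_info_flag SwappedAlleles "${OUTPUT_VCF}"
```

Comparing this count with the number of records in the reject file gives a rough picture of how many variants were rescued by the swap option.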

    Hope this helps!

  • Laura Gauthier

    Great, that memory benchmarking will likely be helpful to other users in the future -- thanks!
