Picard LiftOver behaviour for multiallelic positions
Hello GATK community,
This post is more of a question about LiftOver behaviour than a troubleshooting one.
I want to lift over archaic human genomes from hg19 to hg38. I cannot understand how LiftOver will deal with multiallelic positions (when archaic human ALT is different from modern human ALT).
Option 1) Does LiftOver only check position and strand from the chain file and, as a result, lift over any position, even when its ALT differs from the hg19 or hg38 ALT?
Or
Option 2) Does LiftOver use an extra step which checks whether REF and ALT from the genome to be lifted correspond to REF and ALT in either hg19 or hg38? In that case, it would reject any multiallelic site in the archaic human genome and increase its resemblance to the modern human reference genome.
I hope this makes sense. If anyone understands LiftOver better, I would appreciate any insight.
Adeline
-
Hi Adeline,
One thing to note: LiftOver only knows about the REF and ALT in the input VCF (in your case on hg19) and the REF in the output reference (in your case hg38). It does not know anything about common ALT alleles in archaic vs modern human populations, since the reference sequence only defines the reference, not possible ALTs. The ALT allele comes from the VCF, which contains that additional information (unlike the reference on its own).
LiftOver will check that the REF allele matches between the two genomes. If it doesn't, you can set RECOVER_SWAPPED_REF_ALT to attempt to rescue sites where the REF and ALT alleles are swapped. This procedure is only available for biallelic sites, not multiallelic ones.
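To make the REF check described above concrete, here is a minimal Python sketch of the decision logic. It is only an illustration of the behaviour as described in this reply, not Picard's actual implementation: the function name `lift_alleles` and its parameters are hypothetical, and strand handling, genotypes and INFO fields are ignored.

```python
from typing import List, Optional, Tuple


def lift_alleles(
    ref: str,
    alts: List[str],
    target_ref: str,
    recover_swapped: bool = False,
) -> Optional[Tuple[str, List[str]]]:
    """Return (REF, ALTs) for the lifted site, or None if the site is rejected.

    Hypothetical helper: `target_ref` stands for the base(s) found in the
    target reference (e.g. hg38) at the lifted position.
    """
    # REF agrees with the target reference: the site is lifted as-is,
    # whatever the ALT alleles are and however many there are.
    if ref == target_ref:
        return ref, list(alts)

    # REF mismatch: a rescue is only attempted for a biallelic SNP
    # whose single ALT equals the new reference base.
    is_biallelic_snp = len(alts) == 1 and len(ref) == 1 and len(alts[0]) == 1
    if recover_swapped and is_biallelic_snp and alts[0] == target_ref:
        return target_ref, [ref]

    # Everything else (including multiallelic sites with a REF mismatch)
    # is rejected in this sketch.
    return None
```

Under these assumptions, a multiallelic site whose REF is unchanged passes through untouched, while a multiallelic site with a REF mismatch is rejected even when the recovery option is on.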
However, I think you are asking about sites where the REF has stayed the same, but modern humans and archaic humans tend to have different ALT alleles, leading to multiallelic sites in a VCF which contains both modern and archaic humans. In these cases, LiftOver will lift the site over fine, which I don't think should be a problem. It will still be clear that the variant in an archaic human is different from the variant in a modern human, because the alleles indicated by the genotypes will be different (for example, 0/1 in archaic humans but 0/2 in modern humans, or the site may be split into two separate sites with different ALT alleles). As long as your method for comparing modern and archaic human genomes is aware of the ALT alleles and genotypes, this should not artificially increase the resemblance between archaic and modern humans.
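As a small illustration of why the genotypes keep the two variants distinct, the sketch below resolves VCF-style genotype indices to actual alleles at a multiallelic site. The helper name `genotype_alleles` and the example alleles are made up for this post, and missing calls are simply returned as None.

```python
from typing import List, Optional, Tuple


def genotype_alleles(ref: str, alts: List[str], gt: str) -> Tuple[Optional[str], ...]:
    """Translate a genotype string such as '0/1' or '0|2' into alleles.

    Index 0 is REF and index n is the n-th ALT, as in the VCF GT field.
    Hypothetical helper for illustration only.
    """
    alleles = [ref] + list(alts)
    sep = "|" if "|" in gt else "/"
    return tuple(None if i == "." else alleles[int(i)] for i in gt.split(sep))


# Made-up multiallelic site: REF=A, ALT=G,T
ref, alts = "A", ["G", "T"]
archaic = genotype_alleles(ref, alts, "0/1")  # archaic sample carries the first ALT
modern = genotype_alleles(ref, alts, "0/2")   # modern sample carries the second ALT

print(archaic, modern)
```

Any comparison that works on the resolved alleles, rather than on "has an ALT or not", will see the archaic and modern samples as carrying different variants at this site.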