Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Picard LiftOverVCF 2.22.3. hs37d5_to_GRCh38. Many mismatched reference alleles

4 comments

  • danilovkiri

    Hi.

    Try to run 'bcftools norm' prior to the liftover. It might help.

    Also, have a look at the rejected VCF file after you try normalizing with bcftools (the REJECT argument specifies the file that will contain all rejected VCF entries). It might help you discover the problem.
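One way to act on the suggestion about the rejected VCF file is to tally the FILTER column of that file, since each rejected record should carry the reason it was rejected there. The following is a minimal sketch, not part of Picard or bcftools; the function name `count_reject_reasons` and the example records are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def count_reject_reasons(vcf_lines: Iterable[str]) -> Counter:
    """Count how often each FILTER value occurs among VCF records."""
    reasons = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and header lines (both "##meta" and "#CHROM").
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # not a well-formed VCF record
        # FILTER is the 7th column; multiple filters are ';'-separated.
        for reason in fields[6].split(";"):
            reasons[reason] += 1
    return reasons


# Made-up records for illustration only (not taken from a real reject file).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tMismatchedRefAllele\t.",
    "1\t200\t.\tC\tT\t.\tMismatchedRefAllele\t.",
    "2\t300\t.\tG\tA\t.\tNoTarget\t.",
]
reject_counts = count_reject_reasons(example)
print(reject_counts.most_common())
```

For a real file, the same function could be fed an open file handle (for example `open("rejected.vcf")`, a hypothetical path). A tally dominated by a single reason usually points to one systematic cause rather than many unrelated ones.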

  • Argonaut44

    Thank you for the feedback. I will try bcftools norm. Should I also create a custom hs37d5_to_GRCh38 chain file, or is that not an option?

  • Argonaut44

    That was my mistake. In one book on bioinformatics I had read that the reference file should be the one the VCF file was mapped to. As a result, I did not read the original GATK documentation properly and got a large number of rejected variants (only 18% were lifted over). Changing the reference file to the target FASTA (GRCh38) increased the successful liftover rate to ~95%.
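The fix described above comes down to which FASTA is passed as the reference. As a hedged sketch, the helper below only assembles a Picard LiftoverVcf command line (legacy `KEY=value` syntax, as used by the 2.x releases) and does not run it. The function name and every file name are hypothetical placeholders.

```python
from typing import List


def build_liftover_command(
    picard_jar: str,
    input_vcf: str,
    output_vcf: str,
    chain_file: str,
    reject_vcf: str,
    target_reference_fasta: str,
) -> List[str]:
    """Assemble (but do not run) a Picard LiftoverVcf command line."""
    return [
        "java", "-jar", picard_jar, "LiftoverVcf",
        f"INPUT={input_vcf}",
        f"OUTPUT={output_vcf}",
        f"CHAIN={chain_file}",
        f"REJECT={reject_vcf}",
        # The reference must be the TARGET build (here GRCh38),
        # not the build the input VCF was originally called against.
        f"REFERENCE_SEQUENCE={target_reference_fasta}",
    ]


# All paths below are placeholders, not files from this thread.
liftover_cmd = build_liftover_command(
    picard_jar="picard.jar",
    input_vcf="input.hs37d5.vcf",
    output_vcf="lifted.GRCh38.vcf",
    chain_file="hs37d5_to_GRCh38.chain",
    reject_vcf="rejected.vcf",
    target_reference_fasta="GRCh38.fasta",
)
print(" ".join(liftover_cmd))
```

The resulting list could be passed to `subprocess.run(liftover_cmd, check=True)` once the paths point at real files.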

  • Khadija Sana

    I am facing the same issue, even though I am using the right target reference sequence.
    Many variants were lifted over but had mismatching reference alleles after liftover. Only about 30% of the variants were lifted over successfully.

    When using bcftools 'norm' prior to LiftOver, it gives an error:

    Reference allele mismatch at chr1:743268 .. REF_SEQ:'C' vs VCF:'A' 
    Also getting: Contig 'chr1' is not defined in the header. (Quick workaround: index the file with tabix.) But I assume this has nothing to do with the failure to liftover.
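The "not defined in the header" message suggests that the contig names used in the records and those declared in the header may differ (for example `chr1` versus `1`). A quick way to check is to compare the two sets directly. This is a minimal sketch with a hypothetical function name and made-up example lines, not a bcftools feature.

```python
import re
from typing import Iterable, Set

# Matches header lines such as: ##contig=<ID=1,length=247177331>
CONTIG_HEADER = re.compile(r"^##contig=<ID=([^,>]+)")


def undeclared_contigs(vcf_lines: Iterable[str]) -> Set[str]:
    """Return contig names used in records but missing from the header."""
    declared: Set[str] = set()
    used: Set[str] = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        match = CONTIG_HEADER.match(line)
        if match:
            declared.add(match.group(1))
        elif not line.startswith("#"):
            # CHROM is the first tab-separated column of a record.
            used.add(line.split("\t", 1)[0])
    return used - declared


# Made-up example: the header declares "1", but the record uses "chr1".
example_vcf = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=1,length=247177331>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t743268\t.\tA\tG\t.\t.\t.",
]
missing = undeclared_contigs(example_vcf)
print(missing)
```

If the check reports names that differ from the header only by a `chr` prefix, the naming convention is inconsistent somewhere. The VCF records, the VCF header, the chain file, and the reference FASTA all need to agree on contig names for the liftover to work.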

    This is what the output of bcftools norm looks like:

    ##fileformat=VCFv4.2
    ##FILTER=<ID=CannotLiftOver,Description="Liftover of a variant that needed reverse-complementing failed for unknown reasons.">
    ##FILTER=<ID=IndelStraddlesMultipleIntevals,Description="Reference allele in Indel is straddling multiple intervals in the chain, and so the results are not well defined.">
    ##FILTER=<ID=MismatchedRefAllele,Description="Reference allele does not match reference genome sequence after liftover.">
    ##FILTER=<ID=NoTarget,Description="Variant could not be lifted between genome builds.">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##INFO=<ID=AttemptedAlleles,Number=1,Type=String,Description="The alleles of the variant in the TARGET prior to failing due to reference allele mismatching to the target reference.">
    ##INFO=<ID=AttemptedLocus,Number=1,Type=String,Description="The locus of the variant in the TARGET prior to failing due to reference allele mismatching to the target reference.">
    ##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
    ##contig=<ID=1,length=247177331>
    ##contig=<ID=10,length=135312574>
    ##contig=<ID=11,length=134433813>
    ##contig=<ID=12,length=132288870>

    ....

    And so on. Most of the variants have mismatched reference alleles.
    How should I interpret this? My VCF file is on build hg18. I checked against both the hg18 and hg19 reference FASTA files in case I was making a mistake, and the results are no better with either.
    Your help is appreciated.
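One way to test which build a VCF is really on is to compare its REF alleles with the candidate reference sequence before attempting any liftover. This is the same kind of comparison behind the `REF_SEQ` vs `VCF` error above. The sketch below uses a toy in-memory sequence; the function name and data are illustrative assumptions, and for a real genome the sequences would need to be loaded from an indexed FASTA with a suitable library.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, int, str]  # (CHROM, 1-based POS, REF allele)
Mismatch = Tuple[str, int, str, Optional[str]]


def find_ref_mismatches(
    records: List[Record], reference: Dict[str, str]
) -> List[Mismatch]:
    """List records whose REF allele differs from the reference sequence."""
    mismatches: List[Mismatch] = []
    for chrom, pos, ref in records:
        sequence = reference.get(chrom)
        if sequence is None:
            # Contig is absent from this reference (e.g. naming differs).
            mismatches.append((chrom, pos, ref, None))
            continue
        # VCF positions are 1-based; Python slices are 0-based.
        observed = sequence[pos - 1 : pos - 1 + len(ref)]
        if observed.upper() != ref.upper():
            mismatches.append((chrom, pos, ref, observed))
    return mismatches


# Toy data for illustration only.
toy_reference = {"chr1": "ACGTACGT"}
toy_records = [
    ("chr1", 1, "A"),   # matches
    ("chr1", 2, "C"),   # matches
    ("chr1", 3, "T"),   # reference has G here
    ("chr1", 5, "AC"),  # multi-base REF, matches
]
mismatches = find_ref_mismatches(toy_records, toy_reference)
mismatch_fraction = len(mismatches) / len(toy_records)
print(mismatches, mismatch_fraction)
```

A low mismatch fraction against one build and a high one against another is a strong hint about which build the file is on. A high fraction against every candidate may point to a different problem, such as strand or coordinate conventions.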

    Thanks! 
