Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

LiftoverVcf: hg19 to hg38 all variants mismatched

3 comments

  • Genevieve Brandt (she/her)

    Hi Ken Hanscombe, what was the original reference used to align your file?

  • Ken Hanscombe

    Hi Genevieve Brandt (she/her),

    From UKB documentation:

    Genotypes were imputed into the dataset using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resource. This increased the number of testable variants over 100-fold to ~96 million variants, which are stored in the compressed and indexed BGENv1.2 format. The imputed genotypes are aligned to the + strand of the reference and the positions are in GRCh37 coordinates.

    From other UKB documentation:

    The alleles in the imputation are aligned with REF/ALT, first_allele is the ref allele on the fwd strand.

    From Bycroft et al. 2018:

    We used the Haplotype Reference Consortium (HRC) data as the main imputation reference panel (...) We also imputed the UK Biobank using the merged UK10K and 1000 Genomes phase 3 reference panels, which has 87,696,888 bi-allelic markers. We combined this imputed data with that from the HRC panel, using the HRC imputation when a SNP was present in both panels. (...) The SNP database (dbSNP) reference SNP (rs) IDs were assigned to as many markers as possible using reference SNP ID lists available from the UCSC genome annotation database for the GRCh37 assembly of the human genome (http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/)

  • Genevieve Brandt (she/her)

    Hi Ken Hanscombe,

    The LiftOver tool can only work if the chain file you use matches the original reference that was used for the VCF. From what you wrote above, it looks like your data are on GRCh37, which corresponds to hg19 for the primary chromosomes but can differ in contig naming (for example, 1 versus chr1). There is more information about reference versions at this link.

    If you are getting an error message that there is "no target", it most likely means that your -R reference file does not match your chain file. The -R option should point to the target reference, that is, the new build you are lifting over to (hg38 here).

    You should also check that the contig naming conventions in your VCFs, chain file, and reference are consistent with each other so that the LiftOver tool will work.

    Best,

    Genevieve
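
The naming check suggested in the reply above can be sketched in Python. This is only an illustration, not part of GATK: the function names are hypothetical, and it assumes a UCSC-style chain file (header lines beginning with "chain", with the source contig in the third field and the destination contig in the eighth) and a samtools-style .fai index for the -R reference.

```python
import gzip
from typing import Dict, Iterable, List, Set, Tuple


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def chain_contigs(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (source contigs, destination contigs) from chain header lines."""
    source, dest = set(), set()
    for line in lines:
        fields = line.split()
        # Header: chain score tName tSize tStrand tStart tEnd qName qSize ...
        # In a liftover chain, tName is the old build and qName is the new build.
        if len(fields) >= 12 and fields[0] == "chain":
            source.add(fields[2])
            dest.add(fields[7])
    return source, dest


def vcf_contigs(lines: Iterable[str]) -> Set[str]:
    """Return the contig names used in the CHROM column of VCF data lines."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def fai_contigs(lines: Iterable[str]) -> Set[str]:
    """Return the contig names listed in the first column of a .fai index."""
    return {line.split("\t", 1)[0] for line in lines if line.strip()}


def naming_report(
    vcf_lines: Iterable[str],
    chain_lines: Iterable[str],
    fai_lines: Iterable[str],
) -> Dict[str, List[str]]:
    """List contig names that would prevent the three inputs from lining up."""
    source, dest = chain_contigs(chain_lines)
    return {
        # VCF contigs with no chain starting from them (e.g. "1" versus "chr1").
        "vcf_not_in_chain_source": sorted(vcf_contigs(vcf_lines) - source),
        # Chain destinations that are missing from the -R reference.
        "chain_target_not_in_reference": sorted(dest - fai_contigs(fai_lines)),
    }
```

To run it on real files, pass open handles, for example naming_report(open_text("input.vcf.gz"), open_text("hg19ToHg38.over.chain.gz"), open_text("hg38.fa.fai")), where all three paths are placeholders. If either list in the result is non-empty, the contig names need to be reconciled before running the liftover.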

