Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Can I use Picard LiftOver to convert SNPs from one genome build to another? If yes, how? If not, what is the best alternative?

4 comments

  • Syeda Fatima Roshan Haneef

    I also want to add that these VCFs are mapped to NCBI36/hg18 and need to be remapped to hg38.

    I have a specific list of SNPs (with hg38 coordinates) in CSV format, which I need to filter from each of these VCFs after remapping.

    Please suggest any alternate workflows if there are any to help me make this work. 

  • Gökalp Çelik

    Hi Syeda Fatima Roshan Haneef

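    These files are not VCF files; they are in a plain tabular format. The VCF format requires:

    1. A header section, ideally including a sequence dictionary (the contig lines are optional, but it is better to have them).

    2. The mandatory columns in their fixed order: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.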

    You might want to check the spec sheet to convert your tabular files to vcf format manually. 

    https://samtools.github.io/hts-specs/VCFv4.2.pdf 
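As a rough illustration of the two requirements above, here is a minimal Python sketch that writes a header (with contig lines) and the eight fixed columns. The function name `write_minimal_vcf`, the record keys, and the toy contig length are assumptions made for this example; they are not taken from the original files or from any GATK tool.

```python
import io

# The eight mandatory VCF columns, in their fixed order.
VCF_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def write_minimal_vcf(records, contigs, out):
    """Write a bare-bones VCF to the file-like object `out`.

    records: iterable of dicts with keys chrom, pos, id, ref, alt (hypothetical layout).
    contigs: dict mapping contig name to length, in reference order.
    """
    out.write("##fileformat=VCFv4.2\n")
    for name, length in contigs.items():
        out.write(f"##contig=<ID={name},length={length}>\n")
    out.write("\t".join(VCF_COLUMNS) + "\n")

    # Sort by contig order, then by position.
    order = {name: i for i, name in enumerate(contigs)}
    for rec in sorted(records, key=lambda r: (order[r["chrom"]], r["pos"])):
        fields = [
            rec["chrom"],
            str(rec["pos"]),
            rec.get("id") or ".",
            rec["ref"],
            rec["alt"],
            ".",  # QUAL unknown
            ".",  # FILTER unknown
            ".",  # INFO empty
        ]
        out.write("\t".join(fields) + "\n")


# Toy data: the contig length below is a placeholder, not a real chromosome length.
buf = io.StringIO()
write_minimal_vcf(
    [{"chrom": "chr1", "pos": 100, "id": "rs1", "ref": "A", "alt": "G"}],
    {"chr1": 1000},
    buf,
)
print(buf.getvalue())
```

A real conversion would take the contig names and lengths from the sequence dictionary of the reference the coordinates refer to, and would need correct REF alleles for every record.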

    Once the files are in proper VCF format, you can lift them over to the target genome build and then select variants by position using a BED file.
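For the selection step, the SNP list has to be turned into BED intervals. A small Python sketch of that conversion follows; the CSV column names (`chrom`, `pos`, `id`) and the function name `csv_to_bed_lines` are assumptions for illustration. The one detail that is easy to get wrong is the coordinate system: VCF positions are 1-based, while BED intervals are 0-based and half-open, so a SNP at position p becomes the interval [p-1, p).

```python
import csv
import io


def csv_to_bed_lines(csv_text):
    """Convert CSV rows (hypothetical columns: chrom, pos, id) to BED lines.

    `pos` is assumed to be a 1-based SNP position, as in a VCF.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    bed_lines = []
    for row in reader:
        pos = int(row["pos"])
        # BED start is 0-based, BED end is exclusive.
        bed_lines.append(f'{row["chrom"]}\t{pos - 1}\t{pos}\t{row["id"]}')
    return bed_lines


example_csv = "chrom,pos,id\nchr1,100,rs1\nchr2,2500,rs2\n"
for line in csv_to_bed_lines(example_csv):
    print(line)
```

The resulting file could then be passed as an intervals argument to a variant selection tool, for example `-L targets.bed` with GATK SelectVariants, provided the contig names in the BED file match those in the lifted-over VCF.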

    I hope this helps. 

  • Syeda Fatima Roshan Haneef

    Thank you so much for this. I assumed these were VCF files, but they're not. That explains all the hassle.

    Would you suggest converting the tab-separated files into CSV and then writing them out as VCF? I've tried converting them directly; however, it always gives an error (probably due to formatting issues in these files).

    Since converting them to CSV would split the columns under separate headers, is that a good approach? I would really appreciate your input on this and any additional suggestions you might have for a beginner.

     

    Thank you so much for your help.

  • Gökalp Çelik

    Hi again.

    Converting tab-separated files straight to VCF may not be a simple task, because you don't have the reference alleles needed for the conversion. I would strongly suggest that you first look up the correct coordinates and reference alleles for those SNP IDs in your genome build of choice, and then attempt a manual conversion if it is still necessary.
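To make the reference-allele point concrete, here is a toy Python sketch that reads a FASTA into memory and returns the reference base at a 1-based position. The function names `read_fasta` and `reference_allele` and the tiny sequence are made up for this example. Loading a whole human genome this way is impractical, so in practice an indexed reader (for instance `samtools faidx`, or `pysam.FastaFile`) would be the more realistic choice.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence string."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence name is the first word after '>'.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def reference_allele(seqs, chrom, pos):
    """Return the reference base at a 1-based position, as used in VCF."""
    return seqs[chrom][pos - 1].upper()


# Toy reference, not real genome sequence.
toy_fasta = ">chr1 toy contig\nACGT\nTTGA\n"
seqs = read_fasta(toy_fasta)
print(reference_allele(seqs, "chr1", 1))
print(reference_allele(seqs, "chr1", 7))
```

The base returned for each SNP position would go into the REF column. If neither allele listed for a SNP matches the reference base, that usually points to a wrong genome build or a strand mismatch.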

    I hope this helps. 

    Regards. 
