Converting Reference Genomes from b37 to hg19
I am new to bioinformatics, so this might be a noob question, but if I want to convert BED files aligned to b37 to hg19, what is the best tool to do that?
I have been using liftOver for conversion, but I am not sure how to find the chain files needed to convert BED files from b37 coordinates to hg19 coordinates. Could you please point me to the documentation for that?
-
Maybe this will help you:
In addition you should read this:
For these builds, the primary assembly coordinates are identical for the original release but patch updates were different. In addition, the naming conventions of the references differ, e.g. the use of `chr1` (in hg19) versus `1` (in b37) to indicate chromosome 1, and `chrM` vs. `MT` for the mitochondrial genome. Included decoys were also different. So it is possible to lift-over resources from one to the other, but it should be done using Picard LiftoverVcf with the appropriate chain files. Trying to convert between them just by renaming contigs is a bad idea. And in the case of BAMs, well, the bad news is that if you have a BAM aligned to one reference build but you need the other, you'll have to re-map the data from scratch.
-
Thank you for your contribution woodword!
-
Apparently there is no hg19 to b37/HumanG1Kv37 chain file? I can only find the other direction, b37tohg19.
-
Hi Brian,
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.
For context, check out our support policy.
-
I found this: https://github.com/broadgsa/gatk/blob/master/public/chainFiles/b37tohg19.chain
However, this chain file only seems to have information for chromosomes 1 to X and not for the mitochondrial genome and other regions... Can this be used?
-
Manuel Sérgio Sokolov Ravasqueira
The chain file you linked does actually contain mappings for `Y` and `MT`.
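If you want to check this kind of thing yourself, here is a minimal sketch that lists which contigs a chain file maps, based on the standard chain header layout (`chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`). The function name `chain_contig_pairs` and the sample lines are made up for illustration; they are not copied from the real b37tohg19 chain file.

```python
def chain_contig_pairs(chain_text):
    """Return sorted (source_contig, destination_contig) pairs found in chain headers."""
    pairs = set()
    for line in chain_text.splitlines():
        fields = line.split()
        # Only header lines start with the word "chain"; data lines are numeric.
        if len(fields) >= 12 and fields[0] == "chain":
            # fields[2] is the contig being lifted from, fields[7] the contig lifted to.
            pairs.add((fields[2], fields[7]))
    return sorted(pairs)


# Hypothetical chain text with invented sizes, for illustration only.
SAMPLE = """\
chain 1000 1 5000 + 0 1000 chr1 5000 + 0 1000 1
1000

chain 800 MT 900 + 0 800 chrM 900 + 0 800 2
800
"""

for source, destination in chain_contig_pairs(SAMPLE):
    print(source, "->", destination)
```

To run it on a real file, you would read the file contents into a string and pass that in instead of `SAMPLE`.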
The article I wrote about the reference discrepancies can also be referenced: https://gatk.broadinstitute.org/hc/en-us/articles/360035890711
The short version is that you should use a chain file here. There are sequence differences between b37 and hg19.
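To show what a chain file does beyond renaming contigs, here is a rough sketch of how a BED interval is shifted through the aligned blocks of a chain. It is not a replacement for liftOver or Picard: it assumes `+` strand chains only, and it simply gives up on intervals that do not fit inside a single aligned block. The names `parse_chains` and `lift_interval` and the sample chain are invented for illustration.

```python
def parse_chains(chain_text):
    """Parse chain text into (source, destination, blocks) tuples.

    Each block is (source_start, destination_start, size), 0-based.
    """
    chains = []
    blocks = None
    t_pos = q_pos = 0
    for line in chain_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "chain":
            # Assumption of this sketch: both strands are "+".
            if fields[4] != "+" or fields[9] != "+":
                raise ValueError("this sketch only handles + strand chains")
            t_pos, q_pos = int(fields[5]), int(fields[10])
            blocks = []
            chains.append((fields[2], fields[7], blocks))
        elif blocks is not None:
            size = int(fields[0])
            blocks.append((t_pos, q_pos, size))
            if len(fields) == 3:
                # "size dt dq": skip the unaligned gap on each side.
                t_pos += size + int(fields[1])
                q_pos += size + int(fields[2])
    return chains


def lift_interval(chains, contig, start, end):
    """Lift a 0-based half-open interval, or return None if it cannot be mapped."""
    for source, destination, blocks in chains:
        if source != contig:
            continue
        for t_start, q_start, size in blocks:
            if t_start <= start and end <= t_start + size:
                offset = q_start - t_start
                return destination, start + offset, end + offset
    return None


# Hypothetical chain: two aligned blocks separated by a gap that is
# 50 bases long in the source and 60 bases long in the destination.
SAMPLE_CHAIN = """\
chain 1000 1 5000 + 0 300 chr1 5010 + 0 310 1
100 50 60
150
"""

chains = parse_chains(SAMPLE_CHAIN)
print(lift_interval(chains, "1", 10, 20))    # inside the first block
print(lift_interval(chains, "1", 160, 200))  # inside the second block, shifted by 10
print(lift_interval(chains, "1", 90, 160))   # spans the gap, so None
```

The second call is the case that plain contig renaming would get wrong: the contig name changes and the coordinates shift as well.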