Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

ExomeSingleSample wdl


1 comment

  • Laura Gauthier

    Hi Sheryl,

    We do have a GRCh37 set of those files in the same gs://gcp-public-data--broad-references bucket, but at a slightly different path: gs://gcp-public-data--broad-references/hg19/v0. I see the contamination resources there, but I don't see the exome lists. You can use Picard's LiftOverIntervalList (you can run it from GATK) to lift the hg38 versions back to hg19:

    java -jar gatk.jar LiftOverIntervalList \
    I=input.interval_list \
    O=output.interval_list \
    SD=hg19_reference_sequence.dict \
    CHAIN=build.chain

    You can get the chain file from UCSC:

    wget 'ftp://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz' \
        -O hg38ToHg19.over.chain.gz

    I believe that you'll need to remove the 'chr' prefix from the hg19 target contig names for everything to get along with the other Broad GRCh37 resources. 
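
    As a rough sketch of that last step, something like the following could strip the prefix from a lifted-over interval list. It assumes the standard tab-separated Picard interval_list layout (header lines starting with '@', then contig, start, end, strand, name). The function name strip_chr_prefix and the file names in the usage line are made up for illustration. It only drops a leading 'chr'; it does not rename chrM to MT or handle unplaced or alternate contigs, so check the result against the sequence dictionary you plan to use.

```shell
# Hypothetical helper: reads an interval list on stdin, writes it to stdout
# with the leading "chr" removed from contig names.
strip_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         # @SQ header lines: rewrite the SN: tag only
         /^@SQ/ {
             for (i = 2; i <= NF; i++) sub(/^SN:chr/, "SN:", $i)
             print
             next
         }
         # any other header line passes through unchanged
         /^@/ { print; next }
         # interval lines: the contig name is the first column
         { sub(/^chr/, "", $1); print }'
}
```

    Usage would look something like: strip_chr_prefix < output.interval_list > output.nochr.interval_list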
