Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

bwa index doesn't create .alt file in hg38

Answered

8 comments

  • Genevieve Brandt (she/her)

    Hi MatZ, could you clarify your question? In the document you linked to, there is no mention of an alt file:

    • BWA alignment requires an indexed reference genome file. Indexing is specific to algorithms. To index the human genome for BWA, we apply BWA's index function on the reference genome file, e.g. human_g1k_v37_decoy.fasta. This produces five index files with the extensions amb, ann, bwt, pac, and sa.
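
As a quick sanity check after indexing, a small script can confirm that the five files listed above exist next to the FASTA. This is only a sketch: the helper name `missing_bwa_index_files` is made up for illustration, and it assumes the index was written with the default naming (`<fasta>.<ext>`).

```python
from pathlib import Path

# Extensions written by `bwa index`, as listed in the quoted documentation.
BWA_INDEX_EXTENSIONS = ("amb", "ann", "bwt", "pac", "sa")


def missing_bwa_index_files(fasta_path):
    """Return the expected index files that are absent next to fasta_path."""
    fasta = Path(fasta_path)
    expected = [Path(f"{fasta}.{ext}") for ext in BWA_INDEX_EXTENSIONS]
    return [path for path in expected if not path.exists()]
```

An empty list means all five files are present. The check deliberately says nothing about a `.alt` file, since that extension is not among the five.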
  • MatZ

    That's the point. The documentation makes no mention of an alt file, but the file processing-for-variant-discovery-gatk4.hg38.wgs.inputs.json does.

    It references every index file, .alt included:

    "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_alt": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt",
    "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_sa": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa",
    "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_amb": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb",
    "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_bwt": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt",
    "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_ann": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann",
    "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_pac": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac",

    Our question is: what is the correct way to generate this file for a masked hg38 genome?
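
To point the workflow at a locally built index, the six entries above could be regenerated for a different reference prefix. A minimal sketch follows; the key names are copied from the snippet above, while the helper `ref_index_inputs` and the example path are hypothetical. Note that it only builds paths: the `.alt` entry will point at nothing until that file is supplied separately.

```python
import json

# Task prefix and suffixes copied from the inputs JSON quoted above.
TASK = "PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem"
INDEX_SUFFIXES = ("alt", "sa", "amb", "bwt", "ann", "pac")


def ref_index_inputs(index_prefix):
    """Map each ref_<suffix> input key to '<index_prefix>.<suffix>'."""
    return {
        f"{TASK}.ref_{suffix}": f"{index_prefix}.{suffix}"
        for suffix in INDEX_SUFFIXES
    }


if __name__ == "__main__":
    # Hypothetical local path; replace it with your own index prefix.
    print(json.dumps(ref_index_inputs("/data/ref/hg38_masked.fasta"), indent=2))
```

The printed entries can be pasted into a copy of the inputs JSON in place of the gs:// paths.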
  • Genevieve Brandt (she/her)

    MatZ, are you running one of our featured workspaces? If so, which one?

  • MatZ

    No, we downloaded and installed GATK locally.

    For the pre-processing/alignment step we used this:

    https://github.com/gatk-workflows/gatk4-data-processing/archive/master.zip

    That folder contains the file "processing-for-variant-discovery-gatk4.hg38.wgs.inputs.json".

    The file contains the variable "PreProcessingForVariantDiscovery_GATK4.ref_alt", which points to gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt

    All the other index files were generated by the bwa index command from the masked hg38 FASTA, but the .alt file was not.

    How can we generate this .alt file for a genome that differs slightly from yours (because one region is masked)?


    Thanks for the help

    Mat

  • Genevieve Brandt (she/her)

    Hi MatZ, we are continuing to look into this and will let you know when we have more information.

  • Genevieve Brandt (she/her)

    MatZ, have you seen our documentation on mapping to alternate contigs? There is a section in that document regarding the alt file. Also, please see the BWA documentation:

  • dario_romagnoli

    I ran into the same problem. I created a mixed human-mouse genome to map reads from PDX samples, and I'm struggling to find a way to create the .alt index file. I find it confusing that both sets of documentation stress the importance of an .alt index file, but neither explains how to generate it.

  • Genevieve Brandt (she/her)

    Hi dario_romagnoli,

    In the GATK tutorial, we use a mini GRCh38 alt file, subset from the one that is available in bwakit. The GATK team does not develop bwa, so documentation on how to build the alt file is outside our domain. I did find a few small leads when searching on Google, so that is probably your best bet for now. You can also discuss with other users here and help each other out.

    We will soon be releasing DRAGEN-GATK, which will contain DRAGMAP. DRAGMAP is based on bwa but has improved alt handling. Stay tuned for updates to see if it will work for your use case as well.
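
For readers wondering what "subset from the one that is available in bwakit" could look like in practice, here is a rough sketch, not an official GATK or bwa procedure. It assumes the alt file is SAM-format text in which the first column names an ALT contig and the third column names the contig it aligns to, and that the custom reference keeps the original contig names and coordinates (for example, a region masked with N rather than deleted). The function names are hypothetical.

```python
def fasta_contig_names(fasta_lines):
    """Collect contig names from FASTA headers (text up to the first whitespace)."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def subset_alt_records(alt_lines, contig_names):
    """Yield alt-file lines whose contigs all exist in contig_names."""
    for line in alt_lines:
        if line.startswith("@"):
            # Keep header lines, except @SQ entries for contigs we do not have.
            if line.startswith("@SQ"):
                sn = [f[3:] for f in line.rstrip("\n").split("\t") if f.startswith("SN:")]
                if sn and sn[0] not in contig_names:
                    continue
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a usable SAM record
        qname, rname = fields[0], fields[2]
        # Keep the record only if the ALT contig is in the reference and its
        # target is either unmapped ("*") or also in the reference.
        if qname in contig_names and (rname == "*" or rname in contig_names):
            yield line
```

The filtered lines would be written to a file named after the index prefix with an added `.alt` extension, placed next to the other index files. For a mixed human-mouse reference, any contig not listed in the alt file would simply not be flagged as ALT under this approach; whether that suits a given analysis is a separate question.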

    Best,

    Genevieve
