bwa index doesn't create .alt file in hg38
We are using GATK 4.1.4.1 to call somatic variants in tumor-only mode on hg38. We performed sequencing with an amplicon-based targeted gene panel.
Because the hg38 reference genome contains a second copy of one gene, we masked that region, starting from the .fasta supplied in the cloud bucket.
Then we created the index with
bwa index -a bwtsw *.fasta
as suggested in https://gatk.broadinstitute.org/hc/en-us/articles/360039568932--How-to-Map-and-clean-up-short-read-sequence-data-efficiently#step3B
Unfortunately, the command did not create the .alt file. As a consequence, Mutect2 did not call variants in the HRAS gene, because the reads also map to an alternate locus and their MAPQ became 0 due to multi-mapping.
How can we create the .alt file for a masked genome?
Thanks for the help
Mat
-
Hi MatZ, could you clarify your question? In the document you linked to, there is no mention of an alt file:
- BWA alignment requires an indexed reference genome file. Indexing is specific to algorithms. To index the human genome for BWA, we apply BWA's index function on the reference genome file, e.g. human_g1k_v37_decoy.fasta. This produces five index files with the extensions amb, ann, bwt, pac and sa.
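For reference, here is a small shell sketch that checks an index prefix for those five files and reports whether a .alt file sits beside them. As far as we know, bwa index writes only the five listed extensions, so a .alt file, where one is used, has to come from elsewhere. The function name check_bwa_index and its prefix argument are made up for illustration.

```shell
# check_bwa_index PREFIX  (hypothetical helper, not part of bwa or GATK)
# Reports which of the five files written by `bwa index` are missing for PREFIX
# and notes whether an optional PREFIX.alt sits next to them.
# Returns 0 if all five index files exist and are non-empty, 1 otherwise.
check_bwa_index() {
    idx_prefix=$1
    idx_missing=0
    for idx_ext in amb ann bwt pac sa; do
        if [ ! -s "$idx_prefix.$idx_ext" ]; then
            echo "missing: $idx_prefix.$idx_ext"
            idx_missing=1
        fi
    done
    # The .alt file is optional, so its absence does not change the return code.
    if [ -s "$idx_prefix.alt" ]; then
        echo "alt file present: $idx_prefix.alt"
    else
        echo "no alt file: $idx_prefix.alt"
    fi
    return $idx_missing
}
```

For an index built from a file called masked.fasta (a placeholder name), the call would be check_bwa_index masked.fasta.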
-
That's the point. The documentation does not mention an alt file, but the file processing-for-variant-discovery-gatk4.hg38.wgs.inputs.json does.
It references all the index files, .alt included:
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_alt": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_sa": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_amb": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_bwt": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_ann": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_pac": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac",
Our question is: what is the correct way to generate this file for a masked hg38 genome?
-
MatZ are you running one of our featured workspaces? If so, which one?
-
No, we downloaded and installed GATK locally.
For the pre-processing/alignment step we used this:
https://github.com/gatk-workflows/gatk4-data-processing/archive/master.zip
In that folder there is the file "processing-for-variant-discovery-gatk4.hg38.wgs.inputs.json"
In that file there is the variable "PreProcessingForVariantDiscovery_GATK4.ref_alt", which points to gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt
All the other index files were generated by the bwa index command starting from the masked hg38 fasta, but the .alt file was not.
How can we generate this .alt file for a genome slightly different from yours (because it is masked in one region)?
Thanks for the help
Mat
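One possible workaround is sketched below. It assumes that the masking only replaced bases with N, so that contig names and lengths are identical to the original assembly. The bwa notes on ALT-aware mapping describe the .alt file as a SAM-format file of ALT-to-primary alignments that bwa mem picks up when it is named after the index prefix; it is distributed with bwa.kit rather than written by bwa index. Under that assumption, the existing Homo_sapiens_assembly38.fasta.64.alt can simply be copied next to the new index. install_alt is a hypothetical helper, and the file names in the usage comment are placeholders.

```shell
# install_alt SRC_ALT INDEX_PREFIX  (hypothetical helper)
# Copies an existing ALT file to INDEX_PREFIX.alt, the name bwa mem looks for
# next to the index files.
# Assumption: the new reference has the same contig names and lengths as the
# reference SRC_ALT was made for (e.g. a region was masked with N, not deleted).
install_alt() {
    src_alt=$1
    alt_prefix=$2
    if [ ! -s "$src_alt" ]; then
        echo "missing or empty ALT file: $src_alt" >&2
        return 1
    fi
    # Refuse to copy if no bwa index exists at this prefix yet.
    if [ ! -s "$alt_prefix.ann" ]; then
        echo "no bwa index found at prefix: $alt_prefix" >&2
        return 1
    fi
    cp "$src_alt" "$alt_prefix.alt"
}

# Hypothetical usage (placeholder file names):
#   bwa index -a bwtsw masked_hg38.fasta
#   install_alt Homo_sapiens_assembly38.fasta.64.alt masked_hg38.fasta
```

Note that the target name follows the index prefix: an index built without the -6 option has no ".64" in its file names, so the copy would be masked_hg38.fasta.alt in this example.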
-
Hi MatZ, we are continuing to look into this and will let you know when we have more information.
-
MatZ have you seen our documentation on mapping to alternate contigs? There is a section in that document regarding the alt file. Also, please see the BWA documentation:
-
I encountered the same problem. I created a mixed human-mouse genome to map reads from PDX samples, and I'm struggling to find a way to create the .alt index file. I'm a bit confused that both sets of documentation stress the importance of an .alt index file, but neither of them explains how to generate it.
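For a combined reference like this, one thing that can be checked is whether the contig names in an existing human .alt file still match the headers of the new FASTA, for example after contigs were renamed or prefixed. The sketch below assumes, from our reading of the bwa source, that bwa mem uses the first column of the .alt file to flag contigs as ALT and treats unlisted contigs (here, the mouse ones) as primary. missing_alt_contigs is a hypothetical helper name.

```shell
# missing_alt_contigs ALT_FILE FASTA  (hypothetical helper)
# Prints each contig name from the first column of ALT_FILE (skipping @ header
# lines) that has no matching ">name" header in FASTA. Each name is printed
# once. No output means every ALT name was found in the FASTA.
missing_alt_contigs() {
    awk '
        # First file (FASTA): remember the name on each header line.
        NR == FNR { if (/^>/) { sub(/^>/, "", $1); seen[$1] = 1 }; next }
        # Second file (ALT): skip SAM header lines and blank lines.
        /^@/ || NF == 0 { next }
        !($1 in seen) && !($1 in reported) { reported[$1] = 1; print $1 }
    ' "$2" "$1"
}

# Hypothetical usage (placeholder file names):
#   missing_alt_contigs hs38DH.fa.alt human_mouse_mixed.fasta
```

If the function prints nothing, the human .alt file could plausibly be copied to the new index prefix as is. Any names it prints would need to be reconciled first.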
-
Hi dario_romagnoli,
In the GATK tutorial, we use a mini GRCh38 alt file, subset from the one that is available in bwakit. The GATK team does not develop bwa, so documentation on how to build the alt file is outside our domain. I did find some small leads when searching on Google, so that is probably your best bet for now. You can also discuss with other users here and help each other out.
We will soon be releasing DRAGEN-GATK, which will contain DRAGMAP. DRAGMAP is based on bwa but has improved alt handling. Stay tuned for updates to see if it will work for your use case as well.
Best,
Genevieve