Why do I get ‘Badly formed genome unclippedLoc: Contig chrY given as location, but this contig isn't present in the Fasta sequence dictionary’ when running GetPileupSummaries?
Hi! I am using GATK4, and I have successfully used the GetPileupSummaries tool to process the data of one subject in my dataset, following the Mutect2 tutorial for calling somatic variants. However, when I switched to the data of another subject, I got the error ‘Badly formed genome unclippedLoc: Contig chrY given as location, but this contig isn't present in the Fasta sequence dictionary’. What is the proper way to deal with it?
The command line I used and the entire error log are pasted below:
gatk GetPileupSummaries -I /gatk/my_data/wgs_BAM/addOrReplaceReadGroups/addOrReplaceReadGroups_LP6005115-DNA_A08.bam -L /gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz -V /gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz -O /gatk/my_data/wgs_BAM/step1_3/getpileupsummaries_LP6005115-DNA_A08.table
Using GATK jar /gatk/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.0.0-local.jar GetPileupSummaries -I /gatk/my_data/wgs_BAM/addOrReplaceReadGroups/addOrReplaceReadGroups_LP6005115-DNA_A08.bam -L /gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz -V /gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz -O /gatk/my_data/wgs_BAM/step1_3/getpileupsummaries_LP6005115-DNA_A08.table
08:08:41.829 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Sep 26, 2021 8:08:42 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
08:08:42.023 INFO GetPileupSummaries - ------------------------------------------------------------
08:08:42.023 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
08:08:42.023 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
08:08:42.023 INFO GetPileupSummaries - Executing as root@83ed4dd59ed1 on Linux v5.8.0-1039-azure amd64
08:08:42.024 INFO GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
08:08:42.024 INFO GetPileupSummaries - Start Date/Time: September 26, 2021 8:08:41 AM GMT
08:08:42.024 INFO GetPileupSummaries - ------------------------------------------------------------
08:08:42.024 INFO GetPileupSummaries - ------------------------------------------------------------
08:08:42.025 INFO GetPileupSummaries - HTSJDK Version: 2.24.0
08:08:42.025 INFO GetPileupSummaries - Picard Version: 2.25.0
08:08:42.025 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
08:08:42.025 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:08:42.025 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:08:42.025 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:08:42.025 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:08:42.025 INFO GetPileupSummaries - Deflater: IntelDeflater
08:08:42.025 INFO GetPileupSummaries - Inflater: IntelInflater
08:08:42.025 INFO GetPileupSummaries - GCS max retries/reopens: 20
08:08:42.025 INFO GetPileupSummaries - Requester pays: disabled
08:08:42.025 INFO GetPileupSummaries - Initializing engine
08:08:42.386 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz
08:08:42.458 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz
08:08:42.963 INFO GetPileupSummaries - Shutting down engine
[September 26, 2021 8:08:42 AM GMT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=463470592
***********************************************************************
A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig chrY given as location, but this contig isn't present in the Fasta sequence dictionary
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
-
Hi Ruiqiao Bai,
This error is likely occurring because of a mismatch in the way the chromosomes are labeled ("chrY" vs just "Y"). Could you check in your ".dict" file to confirm the chromosome labeling? It's possible that the indexing got mismatched when you lifted over the reference sequence.
Kind regards,
Pamela
-
Hi Pamela, thanks for your help! Since the 'lifted_small_exac_common_3.hg19.vcf.gz' file is lifted over from hg38 to hg19 using the 'hg19.fa' file (via the tool LiftoverVcf), I have checked the corresponding 'hg19.dict' file. Below is the line with chrY in it. I think there should be no problem? Please inform me if this is not the .dict file you want me to check.
@SQ SN:chrY LN:59373566 M5:1e86411d73e6f00a10590f976be01623 UR:file:/gatk/my_data/wgs_processing_facilitating_data/hg19.fa
-
Hi Ruiqiao Bai,
Thank you for checking and providing this dictionary file. You're correct that it does look like there shouldn't be a problem. I'm assuming that it may be your BAM file that is missing the Y chromosome in its sequence dictionary rather than the VCF. When checking sequence dictionaries, GATK first checks the reference dictionary, then the reads sequence dictionary, then the dictionary from the features (which would be the VCF you are specifying). It appears that the issue may be in the reads sequence dictionary, which is causing GATK to throw the error. You should be able to troubleshoot this problem by either providing a reference to GetPileupSummaries or by repairing the header of the BAM file itself. I hope this is helpful, and please let me know if you have any questions.
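To confirm which contigs the reads header lacks, one option is to compare the @SQ lines of the reference .dict with the @SQ lines of the BAM header (for example, the text printed by `samtools view -H`). Below is a minimal Python sketch of that comparison. It is not a GATK feature: the function names are hypothetical and the header text is illustrative, with only the chrY length taken from the dictionary line posted above.

```python
def contig_names(header_text):
    """Return the SN values of all @SQ lines in SAM-style header text."""
    names = []
    for line in header_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def missing_contigs(reference_dict_text, reads_header_text):
    """List reference contigs that do not appear in the reads header."""
    in_reads = set(contig_names(reads_header_text))
    return [n for n in contig_names(reference_dict_text) if n not in in_reads]


# Illustrative input: a reference dictionary with chrX and chrY,
# and a reads header that only declares chrX.
REFERENCE_DICT = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chrX\tLN:155270560\n"
    "@SQ\tSN:chrY\tLN:59373566\n"
)
READS_HEADER = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chrX\tLN:155270560\n"
)

print(missing_contigs(REFERENCE_DICT, READS_HEADER))  # prints ['chrY']
```

An empty list would suggest the contig names already agree, and that the problem lies elsewhere.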
Kind regards,
Pamela
-
Thanks for your suggestion! I have tried to solve the problem by providing a reference to GetPileupSummaries, but received the error 'A USER ERROR has occurred: Contig chrY not present in reads sequence dictionary'. Please see below for the specific command I used and the results I received. I guess I have to repair the header of the BAM file itself? Which tool can I use to repair the BAM file and solve this problem?
gatk GetPileupSummaries -I /gatk/my_data/wgs_BAM/addOrReplaceReadGroups/addOrReplaceReadGroups_LP6005115-DNA_A08.bam -L /gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz -V /gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz -O /gatk/my_data/wgs_BAM/step1_3/getpileupsummaries_LP6005115-DNA_A08.table -R /gatk/my_data/wgs_processing_facilitating_data/hg19.fa
Using GATK jar /gatk/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.0.0-local.jar GetPileupSummaries -I /gatk/my_data/wgs_BAM/addOrReplaceReadGroups/addOrReplaceReadGroups_LP6005115-DNA_A08.bam -L /gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz -V /gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz -O /gatk/my_data/wgs_BAM/step1_3/getpileupsummaries_LP6005115-DNA_A08.table -R /gatk/my_data/wgs_processing_facilitating_data/hg19.fa
23:51:33.641 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Sep 29, 2021 11:51:33 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
23:51:33.800 INFO GetPileupSummaries - ------------------------------------------------------------
23:51:33.801 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
23:51:33.801 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
23:51:33.801 INFO GetPileupSummaries - Executing as root@c37181d09f74 on Linux v5.8.0-1039-azure amd64
23:51:33.801 INFO GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
23:51:33.801 INFO GetPileupSummaries - Start Date/Time: September 29, 2021 11:51:33 PM GMT
23:51:33.801 INFO GetPileupSummaries - ------------------------------------------------------------
23:51:33.801 INFO GetPileupSummaries - ------------------------------------------------------------
23:51:33.802 INFO GetPileupSummaries - HTSJDK Version: 2.24.0
23:51:33.802 INFO GetPileupSummaries - Picard Version: 2.25.0
23:51:33.802 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
23:51:33.802 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:51:33.802 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:51:33.802 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:51:33.802 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:51:33.803 INFO GetPileupSummaries - Deflater: IntelDeflater
23:51:33.803 INFO GetPileupSummaries - Inflater: IntelInflater
23:51:33.803 INFO GetPileupSummaries - GCS max retries/reopens: 20
23:51:33.803 INFO GetPileupSummaries - Requester pays: disabled
23:51:33.803 INFO GetPileupSummaries - Initializing engine
23:51:34.274 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz
23:51:34.344 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz
23:51:34.915 INFO IntervalArgumentCollection - Processing 59112 bp from intervals
23:51:34.951 INFO GetPileupSummaries - Done initializing engine
23:51:34.951 INFO ProgressMeter - Starting traversal
23:51:34.952 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
23:51:34.974 INFO GetPileupSummaries - Shutting down engine
[September 29, 2021 11:51:34 PM GMT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=465567744
***********************************************************************
A USER ERROR has occurred: Contig chrY not present in reads sequence dictionary
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
-
Hi Ruiqiao Bai,
I'm sorry that providing the reference was not successful. I would advise you to first make sure that the reads were aligned to the correct reference, the one that corresponds to the other files in your analysis. Then you can use the ReplaceSamHeader tool, which lets you copy a header that you create manually in a dummy SAM file into the BAM file that is missing the appropriate sequence dictionary entries. I hope this helps and that this solution resolves the error.
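As a rough illustration of how the replacement header text could be prepared, here is a minimal Python sketch that takes the existing BAM header text and appends any @SQ lines from the reference .dict that it lacks. This is an assumption about one way to build the dummy SAM file, not the tool's own procedure: the function name, the @RG line, and the header text are hypothetical. New @SQ lines are placed after the existing ones, because BAM records refer to contigs by their position in the header and reordering existing contigs would mislabel reads.

```python
def add_missing_sq_lines(reads_header_text, reference_dict_text):
    """Return reads header text with missing reference @SQ lines appended
    directly after the header's own @SQ lines (existing order is kept)."""

    def sq_name(line):
        # Pull the SN: value out of one @SQ line.
        for field in line.split()[1:]:
            if field.startswith("SN:"):
                return field[3:]
        return None

    header_lines = reads_header_text.splitlines()
    present = {sq_name(l) for l in header_lines if l.startswith("@SQ")}
    extra = [
        l for l in reference_dict_text.splitlines()
        if l.startswith("@SQ") and sq_name(l) not in present
    ]

    sq_positions = [i for i, l in enumerate(header_lines) if l.startswith("@SQ")]
    if sq_positions:
        insert_at = sq_positions[-1] + 1
    else:
        # No @SQ lines at all: place them right after an @HD line if present.
        insert_at = 1 if header_lines and header_lines[0].startswith("@HD") else 0

    merged = header_lines[:insert_at] + extra + header_lines[insert_at:]
    return "\n".join(merged) + "\n"


# Hypothetical header text; only the chrY length comes from the thread.
READS_HEADER = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chrX\tLN:155270560\n"
    "@RG\tID:example\tSM:example\n"
)
REFERENCE_DICT = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chrX\tLN:155270560\n"
    "@SQ\tSN:chrY\tLN:59373566\n"
)

new_header = add_missing_sq_lines(READS_HEADER, REFERENCE_DICT)
print(new_header, end="")
```

The resulting text could be saved as the dummy SAM file whose header ReplaceSamHeader copies into the BAM. This only makes sense once you have confirmed that the reads really were aligned to the same reference build.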
Kind regards,
Pamela
-
Dear Pamela,
Thank you so much for your suggestion! This solution works for me. Thanks!
-
Ruiqiao Bai, of course! I'm glad I could help.