Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Why do I get ‘Badly formed genome unclippedLoc: Contig chrY given as location, but this contig isn't present in the Fasta sequence dictionary’ when running GetPileupSummaries?

7 comments

  • Pamela Bretscher

    Hi Ruiqiao Bai,

    This error is likely caused by a mismatch in how the chromosomes are labeled ("chrY" vs. just "Y"). Could you check your ".dict" file to confirm the chromosome labeling? It's possible that the indexing got mismatched during the liftover.

    Kind regards,

    Pamela

  • Ruiqiao Bai

    Hi Pamela, thanks for your help! Since the 'lifted_small_exac_common_3.hg19.vcf.gz' file was lifted over from hg38 to hg19 using the 'hg19.fa' file (via the tool LiftoverVcf), I checked the corresponding 'hg19.dict' file. Below is the line with chrY in it. It looks fine to me, so I don't think the problem is there. Please let me know if this is not the .dict file you wanted me to check.

    @SQ SN:chrY    LN:59373566    M5:1e86411d73e6f00a10590f976be01623    UR:file:/gatk/my_data/wgs_processing_facilitating_data/hg19.fa
  • Pamela Bretscher

    Hi Ruiqiao Bai,

    Thank you for checking and providing this dictionary entry; you're correct that it looks fine. I suspect it may be your BAM file, rather than the VCF, that is missing the Y chromosome in its sequence dictionary. When checking sequence dictionaries, GATK first checks the reference dictionary, then the reads sequence dictionary, then the dictionary from the features (which would be the VCF you are specifying). It appears that the issue may be in the reads sequence dictionary, which is causing GATK to throw the error. You should be able to troubleshoot this problem either by providing a reference to GetPileupSummaries or by repairing the header of the BAM file itself. I hope this is helpful; please let me know if you have any questions.

    Kind regards,

    Pamela

  • Ruiqiao Bai

    Thanks for your suggestion! I tried to solve the problem by providing a reference to GetPileupSummaries, but received the error 'A USER ERROR has occurred: Contig chrY not present in reads sequence dictionary'. Please see below for the command I used and the output I received. I guess I have to repair the header of the BAM file itself? Could you tell me which tool I can use to repair the BAM file?

     

    gatk GetPileupSummaries -I /gatk/my_data/wgs_BAM/addOrReplaceReadGroups/addOrReplaceReadGroups_LP6005115-DNA_A08.bam -L /gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz -V /gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz -O /gatk/my_data/wgs_BAM/step1_3/getpileupsummaries_LP6005115-DNA_A08.table -R /gatk/my_data/wgs_processing_facilitating_data/hg19.fa
    Using GATK jar /gatk/gatk-package-4.2.0.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.0.0-local.jar GetPileupSummaries -I /gatk/my_data/wgs_BAM/addOrReplaceReadGroups/addOrReplaceReadGroups_LP6005115-DNA_A08.bam -L /gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz -V /gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz -O /gatk/my_data/wgs_BAM/step1_3/getpileupsummaries_LP6005115-DNA_A08.table -R /gatk/my_data/wgs_processing_facilitating_data/hg19.fa
    23:51:33.641 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Sep 29, 2021 11:51:33 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    23:51:33.800 INFO GetPileupSummaries - ------------------------------------------------------------
    23:51:33.801 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.0.0
    23:51:33.801 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
    23:51:33.801 INFO GetPileupSummaries - Executing as root@c37181d09f74 on Linux v5.8.0-1039-azure amd64
    23:51:33.801 INFO GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
    23:51:33.801 INFO GetPileupSummaries - Start Date/Time: September 29, 2021 11:51:33 PM GMT
    23:51:33.801 INFO GetPileupSummaries - ------------------------------------------------------------
    23:51:33.801 INFO GetPileupSummaries - ------------------------------------------------------------
    23:51:33.802 INFO GetPileupSummaries - HTSJDK Version: 2.24.0
    23:51:33.802 INFO GetPileupSummaries - Picard Version: 2.25.0
    23:51:33.802 INFO GetPileupSummaries - Built for Spark Version: 2.4.5
    23:51:33.802 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    23:51:33.802 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    23:51:33.802 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    23:51:33.802 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    23:51:33.803 INFO GetPileupSummaries - Deflater: IntelDeflater
    23:51:33.803 INFO GetPileupSummaries - Inflater: IntelInflater
    23:51:33.803 INFO GetPileupSummaries - GCS max retries/reopens: 20
    23:51:33.803 INFO GetPileupSummaries - Requester pays: disabled
    23:51:33.803 INFO GetPileupSummaries - Initializing engine
    23:51:34.274 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz
    23:51:34.344 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/wgs_processing_facilitating_data/hg38_to_hg19/lifted_small_exac_common_3.hg19.vcf.gz
    23:51:34.915 INFO IntervalArgumentCollection - Processing 59112 bp from intervals
    23:51:34.951 INFO GetPileupSummaries - Done initializing engine
    23:51:34.951 INFO ProgressMeter - Starting traversal
    23:51:34.952 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
    23:51:34.974 INFO GetPileupSummaries - Shutting down engine
    [September 29, 2021 11:51:34 PM GMT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=465567744
    ***********************************************************************

    A USER ERROR has occurred: Contig chrY not present in reads sequence dictionary

    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

  • Pamela Bretscher

    Hi Ruiqiao Bai,

    I'm sorry that providing the reference was not successful. I would advise you to first make sure that the reads were aligned to the correct reference, one that corresponds to the other files in your analysis. Then, you can use the ReplaceSamHeader tool, which allows you to copy a header that you manually create in a dummy SAM file into the BAM file that is missing the appropriate sequence dictionary entries. I hope this helps and that this solution resolves the error.

    Kind regards,

    Pamela

  • Ruiqiao Bai

    Dear Pamela,

    Thank you so much for your suggestion! This solution works for me. Thanks!

     

  • Pamela Bretscher

    Ruiqiao Bai Of course! I'm glad I could help.

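
The diagnosis in this thread comes down to comparing the @SQ lines of the reference dictionary with those of the BAM header. The following is a minimal Python sketch of that comparison, not a GATK utility. It assumes you have already saved both headers as text (for example, the contents of 'hg19.dict' and the output of `samtools view -H` on the BAM). The function names and the example header text are hypothetical illustrations and are not taken from the poster's files, apart from the chrY length quoted above.

```python
def parse_sq_lines(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM header or .dict file."""
    contigs = {}
    for line in header_text.splitlines():
        fields = line.split()  # tolerate tabs or runs of spaces between fields
        if not fields or fields[0] != "@SQ":
            continue  # skip @HD, @RG, @PG and blank lines
        # Split each TAG:value pair on the first colon only (values such as UR contain colons).
        tags = dict(field.split(":", 1) for field in fields[1:] if ":" in field)
        if "SN" in tags and "LN" in tags:
            contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def toggle_chr_prefix(name):
    """Map 'chrY' to 'Y' and 'Y' to 'chrY' to spot naming-convention mismatches."""
    return name[3:] if name.startswith("chr") else "chr" + name


def compare_dictionaries(reference, reads):
    """Report reference contigs that are absent, differently named, or differently sized in the reads."""
    missing = [name for name in reference if name not in reads]
    return {
        "missing": missing,
        "renamed": [name for name in missing if toggle_chr_prefix(name) in reads],
        "length_mismatch": [
            name for name in reference
            if name in reads and reference[name] != reads[name]
        ],
    }


# Illustrative header text only (assumption): a reference with chrX and chrY,
# and a reads header that lacks chrY.
reference_text = "@HD\tVN:1.6\n@SQ\tSN:chrX\tLN:155270560\n@SQ\tSN:chrY\tLN:59373566\n"
reads_text = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chrX\tLN:155270560\n@RG\tID:1\tSM:sample\n"

report = compare_dictionaries(parse_sq_lines(reference_text), parse_sq_lines(reads_text))
print(report)
```

A contig listed under "missing" but not under "renamed" suggests the reads header simply lacks that contig, as with chrY in this thread. A contig under "renamed" points instead to a "chrY" vs. "Y" labeling mismatch, and any entry under "length_mismatch" suggests the reads were aligned to a different reference build.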
