Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

MarkDuplicates analysis of large wheat chromosomes

  • Genevieve Brandt (she/her)

    Hi John,

    Could you share your entire command, stack trace, and GATK version? We did have a somewhat similar issue come up with the wheat genome and MarkDuplicates, but I don't have enough information about your case to know whether there is a good workaround.

    Best,

    Genevieve

  • Miriam Marin Sanz

    Hi Genevieve,

    We have the same problem with the durum wheat genome.

    Warning messages:

    WARNING 2023-02-23 09:22:48     BAMRecordCodec  Reference length is too large for BAM bin field.
    WARNING 2023-02-23 09:22:48     BAMRecordCodec  Reads on references longer than 536870912bp will have bin set to 0.

    The command was:

    gatk MarkDuplicates -I sample.sorted.bam -M sample_dedup_metrics.txt -O sample_sorted_dedup.bam

    GATK version:

    The Genome Analysis Toolkit (GATK) v4.3.0.0
    HTSJDK Version: 3.0.1
    Picard Version: 2.27.5

    Thank you in advance,

    Miriam
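For context on the warning: 536870912 bp is 2^29. The bin field of a BAM record is computed with the binning scheme defined in the SAM/BAM specification (the same scheme BAI indexes use), and that scheme only covers coordinates below 2^29, so htsjdk sets bin 0 for reads on longer references, as the warning states. The Python sketch below is illustrative only and is not GATK or htsjdk code: `reg2bin` follows the C routine given in the specification, and `oversized_contigs` is a hypothetical helper that scans a plain-text SAM header (assumed here to be saved with `samtools view -H sample.sorted.bam > header.sam`) for @SQ lines whose LN exceeds that limit.

```python
import sys

# Coordinates covered by the SAM/BAM binning scheme: [0, 2**29).
MAX_BINNED_COORD = 1 << 29  # 536870912, the limit quoted in the warning
# The scheme defines bins 0 .. 37448 (((1 << 18) - 1) // 7 = 37449 bins in total).
MAX_BIN = ((1 << 18) - 1) // 7 - 1


def reg2bin(beg: int, end: int) -> int:
    """Bin for a 0-based, half-open interval [beg, end), as in the SAM spec."""
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)  # 16 kbp bins
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)  # 128 kbp bins
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)  # 1 Mbp bins
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)  # 8 Mbp bins
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)  # 64 Mbp bins
    return 0  # bin 0 spans the whole 2**29 bp range


def oversized_contigs(header_text: str, limit: int = MAX_BINNED_COORD):
    """Hypothetical helper: (name, length) of each @SQ line with LN > limit."""
    hits = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs, e.g. SN:name LN:length.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        length = int(fields["LN"])
        if length > limit:
            hits.append((fields["SN"], length))
    return hits


if __name__ == "__main__":
    # A 150 bp read near a chromosome start falls in the first 16 kbp bin.
    print(reg2bin(1_000, 1_150))  # 4681
    # The last window below 2**29 still gets the highest defined bin.
    print(reg2bin(MAX_BINNED_COORD - 150, MAX_BINNED_COORD))  # 37448
    # Past 2**29 the formula runs off the end of the defined bins.
    print(reg2bin(MAX_BINNED_COORD, MAX_BINNED_COORD + 150) > MAX_BIN)  # True
    # Optional: pass a saved text header to list the references affected.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for name, length in oversized_contigs(fh.read()):
                print(f"{name}\t{length}")
```

Running it as `python bam_bins.py header.sam` (both file names hypothetical) prints the three example bins and then each reference longer than 2^29 bp with its length, which identifies the chromosomes whose reads receive bin 0.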
