Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

MarkDuplicatesSpark Error

  • Louis Bergelson

    Hi Yosra Bejaoui

    I can't tell exactly what's happening here.  It looks like it's failing during a serialization operation, but the stack trace seems to be cut off.  However, the line referenced in the part of the stack trace that is there (SAMRecordSparkCodec.java:114) is a line where an exception is thrown because of a problem in the input data:

    throw new RuntimeException("Mismatch between read length and quals length writing read " +
    alignment.getReadName() + "; read length: " + alignment.getReadLength() +
    "; quals length: " + alignment.getBaseQualities().length);

    I can't say for sure, but I suspect you have a read that is either missing quality scores or has an error in its quality scores, so that there are too few or too many of them compared to the number of bases.

    If you can identify the problematic reads, that would clarify things.  If it's just a few reads with that problem, you could try applying a read filter to remove them:

    --read-filter MatchingBasesAndQualsReadFilter

    should remove any reads whose number of bases does not match their number of quality scores.
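To help identify the problematic reads, here is a minimal Python sketch that scans SAM text records and reports any read whose SEQ and QUAL fields differ in length. It is not part of GATK. The function name `find_mismatched_reads` and the script name in the usage note are hypothetical, and counting a `*` (absent) field as length 0 is an assumption about how missing quality scores would be compared, not a confirmed description of the codec's behaviour.

```python
import sys


def find_mismatched_reads(sam_lines):
    """Yield (read_name, seq_len, qual_len) for SAM records whose
    SEQ and QUAL lengths differ."""
    for line in sam_lines:
        # Header lines start with '@' and carry no read data.
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A SAM alignment record has 11 mandatory tab-separated fields.
        if len(fields) < 11:
            continue
        seq, qual = fields[9], fields[10]
        # Assumption: '*' means the field is absent, counted here as length 0.
        seq_len = 0 if seq == "*" else len(seq)
        qual_len = 0 if qual == "*" else len(qual)
        if seq_len != qual_len:
            yield fields[0], seq_len, qual_len


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path to a SAM text file as the only argument.
    with open(sys.argv[1]) as handle:
        for name, seq_len, qual_len in find_mismatched_reads(handle):
            print(f"{name}\tread length: {seq_len}\tquals length: {qual_len}")
```

For a BAM input, one option is to convert it to text first, for example with `samtools view input.bam > reads.sam`, and then run `python find_mismatched_reads.py reads.sam` (both file names are placeholders). If only a handful of reads are listed, filtering them out as suggested above is probably reasonable. If many are listed, the input file itself likely needs to be regenerated.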
