Question about bam format errors and handling large contigs
REQUIRED for all errors and issues:
a) GATK version used: gatk/4.1.4.1; picard/2.6.0; samtools/1.12; bwa/0.7.17
b) Exact command used:
bwa mem -t 12 -R ${RG} ${REF} ${R1} ${R2} | samtools sort -o ${outdir}/${RGID}.sort.mapped.bam -
samtools index -c ${outdir}/${RGID}.sort.mapped.bam
samtools merge -b ${LIST} ${dir}/${ID}.merged.bam
samtools index -c ${dir}/${FILE}
java -jar -Xmx32G ${picard} MarkDuplicates \
INPUT=${dir}/${FILE} \
OUTPUT=${dir}/${ID}.merged.nd.bam \
METRICS_FILE=${dir2}/${ID}.metrics.txt \
TMP_DIR=/data/gpfs/projects/punim1525/Projects/PSCO-genome/SNP-chip/tmp
java -jar ${picard} ValidateSamFile \
I=${dir}/${FILE} \
MODE=SUMMARY
c) Entire program log:
Hello, I'm applying the GATK Best Practices recommendations for mapping and variant calling of frog sequence data. I've seen several posts online saying that contig size can affect steps such as "Picard MarkDuplicates" and "GATK HaplotypeCaller". Is it necessary for me to split the chromosomes to avoid downstream issues? Here are the lengths of the longest chromosomes:
1 1,251,518,389
2 1,055,985,148
3 808,540,379
4 947,253,395
5 852,354,006
6 846,894,890
Secondly, I got the error "Insert size out of range" when I ran Picard MarkDuplicates, so I ran Picard ValidateSamFile to check the file format. These are the errors I got:
ERROR:INVALID_INDEXING_BIN 42274067
ERROR:INVALID_INSERT_SIZE 496514
ERROR:INVALID_VERSION_NUMBER 1
Do you know why these errors occurred and how they can be fixed? Does it have something to do with the large contig size? I had to create a .csi index for the bams due to the large contigs.
Hi Tiffany Kosch,
BAM files require every chromosome to be shorter than 2^31 bases. The stricter limit of 2^29 bases (536,870,912, or about 537 million base pairs) applies only to creating a BAI index. So you should be able to use these chromosomes "as is", but you will need to use CSI indices (judging by your samtools calls, you may have already figured this out).
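As a rough illustration of the two limits, here is a minimal bash sketch that classifies the chromosome lengths quoted in the question. The function name `classify_contig` and the labels it prints are made up for this example, and the comparisons assume the "shorter than 2^29" and "shorter than 2^31" reading given above.

```shell
#!/usr/bin/env bash
# Sketch only: classify contig lengths against the BAI and BAM size limits.
# classify_contig and its output labels are hypothetical names for this example.

BAI_LIMIT=$((1 << 29))   # 536870912: contigs must be shorter than this for a BAI index
BAM_LIMIT=$((1 << 31))   # 2147483648: contigs must be shorter than this for BAM

classify_contig() {
    local len=$1
    if (( len >= BAM_LIMIT )); then
        echo "too_long_for_bam"
    elif (( len >= BAI_LIMIT )); then
        echo "needs_csi"
    else
        echo "bai_ok"
    fi
}

# Chromosome names and lengths as posted in the question (commas removed).
while read -r name len; do
    printf '%s\t%s\t%s\n' "$name" "$len" "$(classify_contig "$len")"
done <<'EOF'
1 1251518389
2 1055985148
3 808540379
4 947253395
5 852354006
6 846894890
EOF
```

All six chromosomes come out as `needs_csi`: each is longer than 2^29 bases but well under 2^31. For a full reference, the same loop could instead read the first two columns (name and length) of the FASTA `.fai` index.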
Many of the issues you are seeing may be related to the very old version of Picard you are using. CSI indices were not supported until htsjdk 2.19.0. I would suggest updating to a much newer version of Picard (3.0.0 if you are OK moving to Java 17, or 2.27.5 if not) and seeing if that fixes the issues.
Hello Chris Kachulis,
Thanks very much for your response and for answering my question about chromosome size limits.
I just ran a test with a newer version of Picard and it started up fine, so I think the Picard version was what was causing the issue.