Error in MarkDuplicates
Hi,
I encountered a problem while using MarkDuplicates.
Here is what I ran:
## load picard
module load apps/binapps/picard/3.0
## markduplicates syntax
java -jar $PICARD_JAR/picard.jar MarkDuplicates \
I=/mnt/iusers01/jw01/c02544na/training/dummy/AB1_gencode44_trimmed_TruSeq3_readcountAligned.sortedByCoord.out.bam \
O=AB1_marked_duplicates.bam \
M=AB1_marked_dup_metrics.txt
Then I received the following error message:
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "htsjdk.samtools.SAMReadGroupRecord.getReadGroupId()" because the return value of "htsjdk.samtools.SAMRecord.getReadGroup()" is null
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:558)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:270)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:280)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:105)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:115)
What should I do to fix this?
Many thanks
-
Can you check whether your BAM file contains proper @RG entries in the header section?
You can use
samtools view -H <your bam file>
-
I'd use this instead:
samtools view -H <your bam file> | grep '@RG'
If nothing appears, add a read group (@RG), e.g. with
samtools addreplacerg -r "@RG\tID:RG1\tSM:SampleName\tPL:Illumina\tLB:Library.fa" -o <output.bam> <input.bam>
This worked in my case. BTW, the requirement should be mentioned in the manual (https://gatk.broadinstitute.org/hc/en-us/articles/27007990006555-MarkDuplicates-Picard), because there is no intuitive explanation of why a read group is obligatory when you have just one set of reads.
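As a rough sketch, the check and the fix can be combined in one small shell helper. It assumes bash and samtools on the PATH. The function names has_rg and ensure_rg are made up for illustration, and the read-group values are the same placeholders as in the command above, so replace them with your own.

```shell
#!/usr/bin/env bash

# has_rg: reads SAM header text on stdin and succeeds
# if at least one @RG line is present.
has_rg() {
    grep -q '^@RG'
}

# ensure_rg (hypothetical helper): add a read group to a BAM
# only if its header does not already contain one.
# Usage: ensure_rg <input.bam> <output.bam>
ensure_rg() {
    local in_bam=$1 out_bam=$2
    if samtools view -H "$in_bam" | has_rg; then
        echo "read group already present in $in_bam"
    else
        # ID/SM/PL/LB are placeholder values, not real sample metadata.
        samtools addreplacerg \
            -r "@RG\tID:RG1\tSM:SampleName\tPL:Illumina\tLB:Library.fa" \
            -o "$out_bam" "$in_bam"
    fi
}
```

After sourcing this, something like `ensure_rg <input.bam> <output.bam>` would write a new BAM only when the header has no @RG line. That output can then be passed to MarkDuplicates as the I= input.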