SplitNCigarReads: Contig ERCC-00002 given as location, but this contig isn't present in the Fasta sequence dictionary
Hi,
I have scRNAseq data and I am trying to run SplitNCigarReads on the BAM files. I didn't do the alignment myself; I received these BAM files from the scRNAseq provider. So far everything has worked (i.e. adding read groups, marking duplicates...), but now SplitNCigarReads fails with this error:
A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig ERCC-00002 given as location, but this contig isn't present in the Fasta sequence dictionary
I am using GATK4.
My command is:
for fn in $(find /home/colpe/scratch/raw_data/marked_dup/ -name "*.bam")
do
    gatk SplitNCigarReads -R /home/colpe/data/ref_sequences/Mus_musculus.GRCm38.dna.primary_assembly.fa -I "$fn" -O /home/colpe/scratch/raw_data/cigar_output/"$(basename "$fn")"_sc.bam
done
I understand that there's an issue with ERCC-00002 - are these splice sites? What can I do about this?
-
Hi Cora Olpe
This problem is caused by an incompatibility between the reference and the BAM files. Unless the contigs in your alignment files match your reference FASTA file one-to-one, our tools will not work properly and will throw these errors. Unfortunately, the only sane way to resolve this is either to realign your reads to the reference you have, or to obtain the original reference that the sequencing center used to produce the BAM files you received.
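As a side note, ERCC-00002 follows the naming pattern of the ERCC spike-in control sequences rather than a splice site, so the BAM files were likely aligned against a reference that included those spike-ins in addition to the mouse genome. To see which contigs are affected, you can compare the @SQ lines in the BAM header with those in the reference sequence dictionary. The following is only a minimal sketch: the function names `sq_names` and `missing_contigs` and the file names in the usage note are made up for illustration, and both inputs are assumed to be plain-text files containing tab-separated @SQ header lines.

```shell
# Print the contig name (SN: field) of every @SQ line in a header file.
sq_names() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1" | LC_ALL=C sort -u
}

# List contigs named in the first header file (from the BAM) that are
# absent from the second one (the reference .dict).
missing_contigs() {
    tmp_bam=$(mktemp) && tmp_ref=$(mktemp) || return 1
    sq_names "$1" > "$tmp_bam"
    sq_names "$2" > "$tmp_ref"
    # comm -23 keeps only the lines unique to the first sorted file
    LC_ALL=C comm -23 "$tmp_bam" "$tmp_ref"
    rm -f "$tmp_bam" "$tmp_ref"
}
```

One possible way to use it, assuming samtools is installed: dump the header with `samtools view -H your.bam > header.sam`, then run `missing_contigs header.sam your_reference.dict`. Any names it prints are contigs the reads were aligned to but that your reference does not contain.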
I hope this helps.
-
Hi Gökal Celik,
Alright, that sounds like something I can solve. I've contacted the people who provided the BAM files to ask for the original reference. Thank you!