STAR and GATK RNAseq based SNP detection
Dear all,
I have RNAseq reads from four plant cultivars. I want to identify SNPs using this material. The reads come from various experiments done with these cultivars. However, for the SNP discovery purpose I want to treat cultivar#1 as one sample, cultivar#2 as another. When I concatenate all reads from cultivar#1 and do the alignment to the reference genom with STAR I use this option
--outSAMattrRGline "ID:readgroup1 LB:lib1 SM:Veten PL:ILLUMINA PU:unit1".
The samtools view command confirms this readgroup. However, when I run gatk SortSam, this errer is thrown "Error parsing SAM header. @RG line missing SM tag."
I anticipate that this may cause trouble further down the pipeline so I would like to correct my apparent mistake.
Help is appreciated.
Thanks,
Jahn
REQUIRED for all errors and issues:
a) The Genome Analysis Toolkit (GATK) v4.4.0.0
HTSJDK Version: 3.0.5
Picard Version: 3.0.0
openjdk version "17.0.6" 2023-01-17
OpenJDK Runtime Environment (build 17.0.6+10-Ubuntu-0ubuntu120.04.1)
OpenJDK 64-Bit Server VM (build 17.0.6+10-Ubuntu-0ubuntu120.04.1, mixed mode, sharing)
b) Exact command used:
First i cat two files:
cat VeMaRep1.R1.fq VeMaRep2.R1.fq >> Veten_R1.fq
cat VeMaRep1.R2.fq VeMaRep2.R2.fq >> Veten_R2.fq
Here is the Star command:
STAR \
--runThreadN 18 \
--genomeDir /home/kvijed/RubusSNPdetection/RubusGenomeMJ \
--readFilesIn /home/kvijed/RubusSNPdetection/Veten2/Veten_R1.fq /home/kvijed/RubusSNPdetection/Veten2/Veten_R2.fq \
--limitBAMsortRAM 7000000000 \
--outSAMtype BAM SortedByCoordinate \
--outFilterMultimapNmax 1 \
--outFileNamePrefix Veten2 \
--outSAMattrRGline "ID:readgroup1 LB:lib1 SM:Veten PL:ILLUMINA PU:unit1"
Then I do samtools:
samtools view -H Veten2Aligned.sortedByCoord.out.bam | grep '^@RG'
which gives:
@RG ID:readgroup1 LB:lib1 SM:Veten PL:ILLUMINA PU:unit1
Then I do:
gatk --java-options "-Xmx8g" SortSam \
-I Veten2Aligned.sortedByCoord.out.bam \
-O VetenSorted.bam \
-SO coordinate
Which apparently throws an error
c) Entire program log:
10:11:10.825 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/bin/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed May 03 10:11:10 CEST 2023] SortSam --INPUT Veten2Aligned.sortedByCoord.out.bam --OUTPUT VetenSorted.bam --SORT_ORDER coordinate --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Wed May 03 10:11:10 CEST 2023] Executing as kvijed@bioseq02 on Linux 5.4.0-147-generic amd64; OpenJDK 64-Bit Server VM 17.0.6+10-Ubuntu-0ubuntu120.04.1; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.4.0.0
[Wed May 03 10:11:10 CEST 2023] picard.sam.SortSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=125829120
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line:
@RG ID:readgroup1 LB:lib1 SM:Veten PL:ILLUMINA PU:unit1; File /home/kvijed/RubusSNPdetection/Veten2/Veten2Aligned.sortedByCoord.out.bam; Line number 10
at htsjdk.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:258)
at htsjdk.samtools.SAMTextHeaderCodec.access$200(SAMTextHeaderCodec.java:46)
at htsjdk.samtools.SAMTextHeaderCodec$ParsedHeaderLine.requireTag(SAMTextHeaderCodec.java:364)
at htsjdk.samtools.SAMTextHeaderCodec.parseRGLine(SAMTextHeaderCodec.java:168)
at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:110)
at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:720)
at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:300)
at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:406)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:209)
at picard.sam.SortSam.doWork(SortSam.java:154)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:289)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /usr/local/bin/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8g -jar /usr/local/bin/gatk-package-4.4.0.0-local.jar SortSam -I Veten2Aligned.sortedByCoord.out.bam -O VetenSorted.bam -SO coordinate
-
Hello,
This is a very late reply to this question, but maybe it's still useful. I suspect what's happening here is that you are ending up with separating the fields in your header instead of tabs. SAM header lines are TAB delimited, not whitespace delimited. So either you accidentally used spaces instead of tabs in the headerline you specified or they were introduced somehow by STAR.
Please sign in to leave a comment.
1 comment