Struggling to assign read groups with AddOrReplaceReadGroups
Hi,
I've spent hours trying to get AddOrReplaceReadGroups to work, but I keep getting the same error, so I'm hoping to find help here.
I am using GATK4 (version 4.4.0.0).
I have a tab-delimited file called plate1_rg_fields.txt, which I process with this loop:
cat plate1_rg_fields.txt | while read SAMPLE ID LB PL PU SM
do
gatk AddOrReplaceReadGroups --INPUT "$SAMPLE".bam --OUTPUT "$SAMPLE".rg.bam --RGID "$ID" --RGLB "$LB" --RGPL "$PL" --RGPU "$PU" --RGSM "$SM"
done
The tool starts, and my log.out file begins like this:
Using GATK jar /data/colpe/conda/envs/gatk_env/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /data/colpe/conda/envs/gatk_env/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar AddOrReplaceReadGroups --INPUT UZH-CO-v001_AGACTCGT.bam --OUTPUT UZH-CO-v001_AGACTCGT.rg.bam --RGID 1 --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM 1
But then the error I get is:
', doesn't.of tags in a SAM header must adhere to the regular expression '^[ -~]+$',but the value provided for RGSM, '1
Tool returned:
1
I don't understand this: I specified the number 1 as my RGSM, so that should match the required regex, no? The odd thing is that when I try just one sample (i.e. the first line of my tab-delimited file) it works, but when I run it with all samples I get this error and no files are written.
I would really appreciate some help, thank you!
-
Hi Cora Olpe
Since single-sample runs work fine but the loop fails, this is most likely not a GATK problem but an issue with how the shell's read builtin parses your file. The line-ending characters (\n or \r\n) may be ending up in the SM value when it is read from the stream. The error message itself looks as if a carriage return made part of it overprint the start of the line, which would fit a file with Windows-style \r\n line endings. Could you add one more tab-separated dummy column after the SM column on each line and try again? Alternatively, you could add an intermediate step that trims whitespace and line-ending characters from each line as the file is read.
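A minimal sketch of the trimming step, assuming the cause really is Windows-style (CRLF) line endings. The demo file, its values, and the variable names raw_sm and clean_sm are made up for illustration. The gatk call is only echoed, so nothing is run until the fields look right.

```shell
# Hypothetical one-line demo file with CRLF line endings (values are made up).
demo=$(mktemp)
clean=$(mktemp)
printf 'sampleA\t1\tlib1\tILLUMINA\tunit1\t1\r\n' > "$demo"

# Without cleaning: read splits on tabs and spaces, but the carriage return
# is neither, so it stays attached to the last field (SM).
while read -r SAMPLE ID LB PL PU SM; do
    raw_sm=$SM
done < "$demo"

# With cleaning: tr deletes every carriage return before read sees the line.
tr -d '\r' < "$demo" > "$clean"
while read -r SAMPLE ID LB PL PU SM; do
    clean_sm=$SM
    # Remove the leading echo to actually run the tool.
    echo gatk AddOrReplaceReadGroups --INPUT "$SAMPLE".bam --OUTPUT "$SAMPLE".rg.bam \
        --RGID "$ID" --RGLB "$LB" --RGPL "$PL" --RGPU "$PU" --RGSM "$SM"
done < "$clean"

# The raw value is one character longer than the cleaned one.
echo "raw SM length: ${#raw_sm}, cleaned SM length: ${#clean_sm}"
rm -f "$demo" "$clean"
```

To check whether a file has the problem in the first place, od -c plate1_rg_fields.txt will show \r \n at the end of each line if it has CRLF endings. The extra dummy column works for the same reason: the stray carriage return ends up in the unused last field instead of in SM.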
I hope this works for you.
-
Hi Gökalp Celik,
thanks so much, the extra character with tab solved it!
best wishes
Cora