Hello. I am trying to map MiSeq reads to a reference genome and identify mutations using MToolBox, which wraps GSNAP, GATK, Picard, and other tools. The pipeline runs without errors on the bundled example data, so I believe it is installed correctly. However, when I ran it on my own MiSeq data, I got the error below: the output file (OUT.sam.bam) of the "SORTING OUT.sam FILES WITH PICARDTOOLS" step was not generated. How should I address this? Could the problem be in the sequence data itself?
REQUIRED for all errors and issues:
a) GATK version used:
3.8
b) Exact command used:
SortSam INPUT=OUT.sam OUTPUT=OUT.sam.bam SORT_ORDER=coordinate
c) Entire program log:
setting up MToolBox environment variables...
...done
setting up MToolBox variables in config file ...
...done
pc_GXL will be used as vcf file name...
Check python version... (2.7 required)
OK.
Checking files to be used in MToolBox execution...
Checking mapExome parameters...
OK.
Checking assembleMTgenome parameters...
OK.
Checking mt-classifier parameters...
OK.
GenomeAnalysisTK.jar found. Continue
Input type is fastq.
output files will be placed in /home/User/MTDNA-prospective/pc_GXL/out
##### EXECUTING READ MAPPING WITH MAPEXOME...
mapExome for sample pc-mtGXL, files found: pc-mtGXL.R1.fastq.gz pc-mtGXL.R2.fastq.gz
Mapping onto mtDNA...
/lustre7/home/User/MToolBox-1.2.1/MToolBox/bin/gmap/bin/gsnap -D /lustre7/home/User/MToolBox-1.2.1/MToolBox/gmapdb/ --gunzip -d chrM -A sam --nofails --pairmax-dna=500 --query-unk-mismatch=1 --read-group-id=sample --read-group-name=sample --read-group-library=sample --read-group-platform=sample -n 1 -Q -O -t 8 pc-mtGXL.R1.fastq.gz pc-mtGXL.R2.fastq.gz > /home/User/MTDNA-prospective/pc_GXL/out/OUT_pc-mtGXL/outmt.sam 2> /home/User/MTDNA-prospective/pc_GXL/out/OUT_pc-mtGXL/logmt.txt
Extracting FASTQ from SAM...
Mapping onto complete human genome...single reads
Mapping onto complete human genome...pair reads
Reading Results...
Filtering reads...
Outfile saved on /home/User/MTDNA-prospective/pc_GXL/out/OUT_pc-mtGXL/OUT.sam.
Done.
SAM files post-processing...
##### SORTING OUT.sam FILES WITH PICARDTOOLS...
[Thu May 16 12:23:08 JST 2024] net.sf.picard.sam.SortSam INPUT=OUT.sam OUTPUT=OUT.sam.bam SORT_ORDER=coordinate TMP_DIR=[/home/User/MTDNA-prospective/pc_GXL/out/OUT_pc-mtGXL/tmp] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Thu May 16 12:23:08 JST 2024] Executing as User@at138 on Linux 5.15.0-87-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_392-b08; Picard version: 1.98(1547)
[Thu May 16 12:23:08 JST 2024] net.sf.picard.sam.SortSam done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2058354688
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Read name M02699:57:000000000-LG9CK:1:1119:16584:12259, CIGAR M operator maps off end of reference
	at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
	at net.sf.samtools.SAMRecord.getCigar(SAMRecord.java:606)
	at net.sf.samtools.SAMRecord.getCigarLength(SAMRecord.java:617)
	at net.sf.samtools.SAMRecord.isValid(SAMRecord.java:1599)
	at net.sf.samtools.SAMLineParser.parseLine(SAMLineParser.java:328)
	at net.sf.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:237)
	at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:225)
	at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:201)
	at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
	at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
	at net.sf.picard.sam.SortSam.doWork(SortSam.java:68)
	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
	at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
	at net.sf.picard.sam.SortSam.main(SortSam.java:57)
Success.
samtools index: "OUT.sam.bam" is in a format that cannot be usefully indexed
##### REALIGNING KNOWN INDELS WITH GATK INDELREALIGNER...
Realigning known indels for file OUT_pc-mtGXL/OUT.sam.bam using /home/User/MToolBox-1.2.1/MToolBox/data/MITOMAP_HMTDB_known_indels.chrM as reference...
INFO 12:23:11,978 HelpFormatter - ------------------------------------------------------------------------------------
INFO 12:23:11,984 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-1-0-gf15c1c3ef, Compiled 2018/02/19 05:43:50
INFO 12:23:11,984 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 12:23:11,984 HelpFormatter - [Thu May 16 12:23:11 JST 2024] Executing on Linux 5.15.0-87-generic amd64
INFO 12:23:11,984 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_392-b08
INFO 12:23:11,986 HelpFormatter - Program Args: -U ALLOW_N_CIGAR_READS -T IndelRealigner -R /home/User/MToolBox-1.2.1/MToolBox//data/chrM.fa -I OUT.sam.bam -o OUT.realigned.bam -targetIntervals /home/User/MToolBox-1.2.1/MToolBox//data/intervals_file_chrM.list -known /home/User/MToolBox-1.2.1/MToolBox//data/MITOMAP_HMTDB_known_indels_chrM.vcf -compress 0
INFO 12:23:11,993 HelpFormatter - Executing as User@at138 on Linux 5.15.0-87-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_392-b08.
INFO 12:23:11,994 HelpFormatter - Date/Time: 2024/05/16 12:23:11
INFO 12:23:11,994 HelpFormatter - ------------------------------------------------------------------------------------
INFO 12:23:11,994 HelpFormatter - ------------------------------------------------------------------------------------
INFO 12:23:12,139 NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/lustre7/home/User/MToolBox-1.2.1/MToolBox/ext_tools/GenomeAnalysisTK.jar!/com/intel/gkl/native/libgkl_compression.so
INFO 12:23:12,711 GenomeAnalysisEngine - Deflater: IntelDeflater
INFO 12:23:12,711 GenomeAnalysisEngine - Inflater: IntelInflater
INFO 12:23:12,711 GenomeAnalysisEngine - Strictness is SILENT
INFO 12:23:12,962 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 12:23:12,967 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 12:23:12,985 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR ------------------------------------------------------------------------------------------
The last process reported an error. Exit.
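The Picard message "CIGAR M operator maps off end of reference" means that at least one alignment in OUT.sam, starting at its POS and extended by the reference-consuming CIGAR operations (M, D, N, =, X), ends past the LN declared for that reference in the @SQ header. Because chrM is circular, reads spanning the origin are one plausible source, but that is an assumption, not something the log confirms. A minimal diagnostic sketch to list the affected reads (the script and function names are hypothetical and not part of MToolBox; it assumes OUT.sam is an uncompressed SAM file with @SQ header lines):

```python
import re
import sys

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_span(cigar):
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def find_off_end_reads(lines):
    """Yield (qname, rname, pos, cigar, ref_len) for mapped alignments whose
    1-based end position exceeds the @SQ LN of their reference."""
    ref_lengths = {}
    for line in lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                # Header fields look like SN:chrM and LN:16569.
                tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
                ref_lengths[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        # Skip unmapped reads, missing CIGARs and references absent from the header.
        if flag & 0x4 or cigar == "*" or rname not in ref_lengths:
            continue
        end = pos + ref_span(cigar) - 1  # inclusive alignment end
        if end > ref_lengths[rname]:
            yield qname, rname, pos, cigar, ref_lengths[rname]


if __name__ == "__main__":
    with open(sys.argv[1]) as sam:
        for hit in find_off_end_reads(sam):
            print("\t".join(str(x) for x in hit))
```

Run it on the file Picard rejected, e.g. `python find_off_end_reads.py OUT_pc-mtGXL/OUT.sam`. If only a few reads near the end of chrM are reported, that would point to how those alignments were written rather than to the sequence quality as a whole.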