MergeBamAlignment produces unknown error
I am using MergeBamAlignment:
```
java -Dsamjdk.compression_level=5 -Xms3000m -Xmx3200m -jar /n/data1/hms/dbmi/park/alon/software/picard.jar \
MergeBamAlignment \
VALIDATION_STRINGENCY=SILENT \
EXPECTED_ORIENTATIONS=FR \
ATTRIBUTES_TO_RETAIN=X0 \
ALIGNED_BAM=/n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/-660116835/BOT3006-1-WES.unmerged.bam \
UNMAPPED_BAM=/n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/1094665884/BOT3006-1-WES.bam \
OUTPUT=BOT3006-1-WES.aligned.unsorted.bam \
REFERENCE_SEQUENCE=/n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/232076856/Homo_sapiens_assembly19.fasta \
PAIRED_RUN=true \
SORT_ORDER="unsorted" \
IS_BISULFITE_SEQUENCE=false \
ALIGNED_READS_ONLY=false \
CLIP_ADAPTERS=false \
MAX_RECORDS_IN_RAM=2000000 \
ADD_MATE_CIGAR=true \
MAX_INSERTIONS_OR_DELETIONS=-1 \
PRIMARY_ALIGNMENT_STRATEGY=MostDistant \
PROGRAM_RECORD_ID="bwamem" \
PROGRAM_GROUP_VERSION="0.7.17-r1188" \
PROGRAM_GROUP_COMMAND_LINE="bwa mem -K 100000000 -p -v 3 -t 4 -Y $bash_ref_fasta" \
PROGRAM_GROUP_NAME="bwamem" \
UNMAPPED_READ_STRATEGY=COPY_TO_TAG \
ALIGNER_PROPER_PAIR_FLAGS=true \
UNMAP_CONTAMINANT_READS=true
```
However, I am getting this error:
ERROR: Invalid argument 'mem'.
I do not see anything where I inputted an argument named `mem`. Why would this be occurring?
-
Hi vctrymao,
There might be an issue with how you have submitted your PROGRAM_GROUP_COMMAND_LINE. Is there an extra character that is causing bwa and mem to be read as separate arguments?
Could you share the entire stack trace?
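One quick way to check for this, as a rough sketch: pass the same arguments to a small helper that prints each one in brackets, so you can see where the shell splits them. The `show_args` helper and the placeholder path are made up for illustration and are not part of Picard.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Picard): print every argument on its own
# line, wrapped in brackets, so word splitting becomes visible.
show_args() {
  printf '[%s]\n' "$@"
}

# Placeholder value; substitute the real reference path.
bash_ref_fasta="/path/to/reference.fasta"

# Quoted: the whole aligner command stays attached to the option as one argument.
show_args PROGRAM_GROUP_COMMAND_LINE="bwa mem -p -Y $bash_ref_fasta"

# Unquoted: the shell splits on spaces, so 'mem' arrives as its own argument.
show_args PROGRAM_GROUP_COMMAND_LINE=bwa mem -p -Y $bash_ref_fasta
```

If the second form is what actually reaches the tool, `[mem]` shows up on a line by itself, which would be consistent with the `Invalid argument 'mem'` message.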
Best,
Genevieve
-
I don't think so? The command as written was executed. I also have run this script before on other samples with no issue.
There isn't really a stack trace; this occurs before anything really runs:
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/n/data1/hms/dbmi/park/victor/Doga/INFORM_trial/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/bffc3946-75a1-4284-bad9-b70fd0808bbb/call-MergeBamAlignment/shard-0/tmp.2b5c9f68
ERROR: Invalid argument 'mem'.
USAGE: MergeBamAlignment [options]
Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#MergeBamAlignment
Merge alignment data from a SAM or BAM with data in an unmapped BAM file.
Summary
A command-line tool for merging BAM/SAM alignment info from a third-party aligner with the data in an unmapped BAM file,
producing a third BAM file that has alignment data (from the aligner) and all the remaining data from the unmapped BAM.
Quick note: this is not a tool for taking multiple sam files and creating a bigger file by merging them. For that
use-case, see {@link MergeSamFiles}.
Details
Many alignment tools (still!) require fastq format input. The unmapped bam may contain useful information that will be
lost in the conversion to fastq (meta-data like sample alias, library, barcodes, etc., and read-level tags.)
This tool takes an unaligned bam with meta-data, and the aligned bam produced by calling {@link SamToFastq} and then
passing the result to an aligner/mapper. It produces a new SAM file that includes all aligned and unaligned reads and
also carries forward additional read attributes from the unmapped BAM (attributes that are otherwise lost in the process
of converting to fastq). The resulting file will be valid for use by Picard and GATK tools.
The output may be coordinate-sorted, in which case the tags, NM, MD, and UQ will be calculated and populated, or
query-name sorted, in which case the tags will not be calculated or populated.
Usage example:
java -jar picard.jar MergeBamAlignment \
ALIGNED=aligned.bam \
UNMAPPED=unmapped.bam \
O=merge_alignments.bam \
R=reference_sequence.fasta
Caveats
This tool has been developing for a while and many arguments have been added to it over the years. You may be
particularly interested in the following (partial) list:
- CLIP_ADAPTERS -- Whether to (soft-)clip the ends of the reads that are identified as belonging to adapters
- IS_BISULFITE_SEQUENCE -- Whether the sequencing originated from bisulfite sequencing, in which case NM will be
calculated differently
- ALIGNER_PROPER_PAIR_FLAGS -- Use if the aligner that was used cannot be trusted to set the "Proper pair" flag and then
the tool will set this flag based on orientation and distance between pairs.
- ADD_MATE_CIGAR -- Whether to use this opportunity to add the MC tag to each read.
- UNMAP_CONTAMINANT_READS (and MIN_UNCLIPPED_BASES) -- Whether to identify extremely short alignments (with clipping on
both sides) as cross-species contamination and unmap the reads.
Version: 2.18.3-SNAPSHOT
Options:
--help
-h Displays options specific to this tool.
--stdhelp
-H Displays options specific to this tool AND options common to all Picard command line
tools.
--version Displays program version.
UNMAPPED_BAM=File
UNMAPPED=File Original SAM or BAM file of unmapped reads, which must be in queryname order. Required.
ALIGNED_BAM=File
ALIGNED=File SAM or BAM file(s) with alignment data. Default value: null. This option may be specified
0 or more times. Cannot be used in conjuction with option(s) READ1_ALIGNED_BAM
(R1_ALIGNED) READ2_ALIGNED_BAM (R2_ALIGNED)
READ1_ALIGNED_BAM=File
R1_ALIGNED=File SAM or BAM file(s) with alignment data from the first read of a pair. Default value:
null. This option may be specified 0 or more times. Cannot be used in conjuction with
option(s) ALIGNED_BAM (ALIGNED)
READ2_ALIGNED_BAM=File
R2_ALIGNED=File SAM or BAM file(s) with alignment data from the second read of a pair. Default value:
null. This option may be specified 0 or more times. Cannot be used in conjuction with
option(s) ALIGNED_BAM (ALIGNED)
OUTPUT=File
O=File Merged SAM or BAM file to write to. Required.
PROGRAM_RECORD_ID=String
PG=String The program group ID of the aligner (if not supplied by the aligned file). Default value:
null.
PROGRAM_GROUP_VERSION=String
PG_VERSION=String The version of the program group (if not supplied by the aligned file). Default value:
null.
PROGRAM_GROUP_COMMAND_LINE=String
PG_COMMAND=String The command line of the program group (if not supplied by the aligned file). Default
value: null.
PROGRAM_GROUP_NAME=String
PG_NAME=String The name of the program group (if not supplied by the aligned file). Default value: null.
PAIRED_RUN=Boolean
PE=Boolean DEPRECATED. This argument is ignored and will be removed. Default value: true. This
option can be set to 'null' to clear the default value. Possible values: {true, false}
JUMP_SIZE=Integer
JUMP=Integer The expected jump size (required if this is a jumping library). Deprecated. Use
EXPECTED_ORIENTATIONS instead Default value: null. Cannot be used in conjuction with
option(s) EXPECTED_ORIENTATIONS (ORIENTATIONS)
CLIP_ADAPTERS=Boolean Whether to clip adapters where identified. Default value: true. This option can be set to
'null' to clear the default value. Possible values: {true, false}
IS_BISULFITE_SEQUENCE=Boolean Whether the lane is bisulfite sequence (used when calculating the NM tag). Default value:
false. This option can be set to 'null' to clear the default value. Possible values:
{true, false}
ALIGNED_READS_ONLY=Boolean Whether to output only aligned reads. Default value: false. This option can be set to
'null' to clear the default value. Possible values: {true, false}
MAX_INSERTIONS_OR_DELETIONS=Integer
MAX_GAPS=Integer The maximum number of insertions or deletions permitted for an alignment to be included.
Alignments with more than this many insertions or deletions will be ignored. Set to -1 to
allow any number of insertions or deletions. Default value: 1. This option can be set to
'null' to clear the default value.
ATTRIBUTES_TO_RETAIN=String Reserved alignment attributes (tags starting with X, Y, or Z) that should be brought over
from the alignment data when merging. Default value: null. This option may be specified 0
or more times.
ATTRIBUTES_TO_REMOVE=String Attributes from the alignment record that should be removed when merging. This overrides
ATTRIBUTES_TO_RETAIN if they share common tags. Default value: null. This option may be
specified 0 or more times.
ATTRIBUTES_TO_REVERSE=String
RV=String Attributes on negative strand reads that need to be reversed. Default value: [OQ, U2].
This option can be set to 'null' to clear the default value. This option may be specified
0 or more times. This option can be set to 'null' to clear the default list.
ATTRIBUTES_TO_REVERSE_COMPLEMENT=String
RC=String Attributes on negative strand reads that need to be reverse complemented. Default value:
[E2, SQ]. This option can be set to 'null' to clear the default value. This option may be
specified 0 or more times. This option can be set to 'null' to clear the default list.
READ1_TRIM=Integer
R1_TRIM=Integer The number of bases trimmed from the beginning of read 1 prior to alignment Default
value: 0. This option can be set to 'null' to clear the default value.
READ2_TRIM=Integer
R2_TRIM=Integer The number of bases trimmed from the beginning of read 2 prior to alignment Default
value: 0. This option can be set to 'null' to clear the default value.
EXPECTED_ORIENTATIONS=PairOrientation
ORIENTATIONS=PairOrientation The expected orientation of proper read pairs. Replaces JUMP_SIZE Default value: null.
Possible values: {FR, RF, TANDEM} This option may be specified 0 or more times. Cannot be
used in conjuction with option(s) JUMP_SIZE (JUMP)
ALIGNER_PROPER_PAIR_FLAGS=Boolean
Use the aligner's idea of what a proper pair is rather than computing in this program.
Default value: false. This option can be set to 'null' to clear the default value.
Possible values: {true, false}
SORT_ORDER=SortOrder
SO=SortOrder The order in which the merged reads should be output. Default value: coordinate. This
option can be set to 'null' to clear the default value. Possible values: {unsorted,
queryname, coordinate, duplicate, unknown}
PRIMARY_ALIGNMENT_STRATEGY=PrimaryAlignmentStrategy
Strategy for selecting primary alignment when the aligner has provided more than one
alignment for a pair or fragment, and none are marked as primary, more than one is marked
as primary, or the primary alignment is filtered out for some reason. For all strategies,
ties are resolved arbitrarily. Default value: BestMapq. This option can be set to 'null'
to clear the default value. Possible values: {
BestMapq (Expects that multiple alignments will be correlated with HI tag, and prefers the
pair of alignments with the largest MAPQ, in the absence of a primary selected by the
aligner.)
EarliestFragment (Prefers the alignment which maps the earliest base in the read. Note
that EarliestFragment may not be used for paired reads.)
BestEndMapq (Appropriate for cases in which the aligner is not pair-aware, and does not
output the HI tag. It simply picks the alignment for each end with the highest MAPQ, and
makes those alignments primary, regardless of whether the two alignments make sense
together.)
MostDistant (Appropriate for a non-pair-aware aligner. Picks the alignment pair with the
largest insert size. If all alignments would be chimeric, it picks the alignments for each
end with the best MAPQ. )
}
CLIP_OVERLAPPING_READS=Boolean
For paired reads, soft clip the 3' end of each read if necessary so that it does not
extend past the 5' end of its mate. Default value: true. This option can be set to 'null'
to clear the default value. Possible values: {true, false}
INCLUDE_SECONDARY_ALIGNMENTS=Boolean
If false, do not write secondary alignments to output. Default value: true. This option
can be set to 'null' to clear the default value. Possible values: {true, false}
ADD_MATE_CIGAR=Boolean
MC=Boolean Adds the mate CIGAR tag (MC) if true, does not if false. Default value: true. This option
can be set to 'null' to clear the default value. Possible values: {true, false}
UNMAP_CONTAMINANT_READS=Boolean
UNMAP_CONTAM=Boolean Detect reads originating from foreign organisms (e.g. bacterial DNA in a non-bacterial
sample),and unmap + label those reads accordingly. Default value: false. This option can
be set to 'null' to clear the default value. Possible values: {true, false}
MIN_UNCLIPPED_BASES=Integer If UNMAP_CONTAMINANT_READS is set, require this many unclipped bases or else the read will
be marked as contaminant. Default value: 32. This option can be set to 'null' to clear
the default value.
MATCHING_DICTIONARY_TAGS=String
List of Sequence Records tags that must be equal (if present) in the reference dictionary
and in the aligned file. Mismatching tags will cause an error if in this list, and a
warning otherwise. Default value: [M5, LN]. This option can be set to 'null' to clear the
default value. This option may be specified 0 or more times. This option can be set to
'null' to clear the default list.
UNMAPPED_READ_STRATEGY=UnmappingReadStrategy
How to deal with alignment information in reads that are being unmapped (e.g. due to
cross-species contamination.) Currently ignored unless UNMAP_CONTAMINANT_READS = true
Default value: DO_NOT_CHANGE. This option can be set to 'null' to clear the default value.
Possible values: {COPY_TO_TAG, DO_NOT_CHANGE, MOVE_TO_TAG}
REFERENCE_SEQUENCE=File
R=File Reference sequence file. Required.
-
Actually, I looked at the command that is actually being run. I think the problem area is where you suggested: `-PROGRAM_GROUP_COMMAND_LINE bwa mem -K 100000000 -p -v 3 -t 4 -Y \/n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/232076856/Homo_sapiens_assembly19.fasta \`, even though I specify `PROGRAM_GROUP_COMMAND_LINE="bwa mem -K 100000000 -p -v 3 -t 4 -Y $bash_ref_fasta"` in the input command. However, I am not sure of the correct syntax. How are we supposed to supply this flag with the command?
MergeBamAlignment \
-VALIDATION_STRINGENCY SILENT \
-EXPECTED_ORIENTATIONS FR \
-ATTRIBUTES_TO_RETAIN X0 \
-ALIGNED_BAM /n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/-660116835/BOT3006-1-WES.unmerged.bam \
-UNMAPPED_BAM /n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/1094665884/BOT3006-1-WES.bam \
-OUTPUT BOT3006-1-WES.aligned.unsorted.bam \
-REFERENCE_SEQUENCE /n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/232076856/Homo_sapiens_assembly19.fasta \
-PAIRED_RUN true \
-SORT_ORDER unsorted \
-IS_BISULFITE_SEQUENCE false \
-ALIGNED_READS_ONLY false \
-CLIP_ADAPTERS false \
-MAX_RECORDS_IN_RAM 2000000 \
-ADD_MATE_CIGAR true \
-MAX_INSERTIONS_OR_DELETIONS -1 \
-PRIMARY_ALIGNMENT_STRATEGY MostDistant \
-PROGRAM_RECORD_ID bwamem \
-PROGRAM_GROUP_VERSION 0.7.17-r1188 \
-PROGRAM_GROUP_COMMAND_LINE bwa mem -K 100000000 -p -v 3 -t 4 -Y \/n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/232076856/Homo_sapiens_assembly19.fasta \
-PROGRAM_GROUP_NAME bwamem \
-UNMAPPED_READ_STRATEGY COPY_TO_TAG \
-ALIGNER_PROPER_PAIR_FLAGS true \
-UNMAP_CONTAMINANT_READS true
-
Hi vctrymao, thanks for posting the update with the information you found.
Quotes are necessary when you have spaces in an argument but not necessary otherwise. So, in your case, it looks like they would only be needed in the PROGRAM_GROUP_COMMAND_LINE.
Most likely this is an issue with whatever is in the $bash_ref_fasta variable. For example, in the command that actually ran, there is a backslash right after the -Y and another after .fasta. Bash can handle variables inside quotes in unexpected ways, so if you look more into properly passing that variable into the command, you should be able to get it to work! You might also want to try single quotes for the argument.
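As a small sketch of the quoting behaviour (the path below is a placeholder, and `has_backslash` is a hypothetical helper, not anything from Picard or Cromwell):

```shell
#!/usr/bin/env bash
# Placeholder value; substitute the real reference path.
bash_ref_fasta="/path/to/reference.fasta"

# Hypothetical check: succeeds if the value contains a backslash anywhere.
has_backslash() {
  case $1 in
    *\\*) return 0 ;;
    *)    return 1 ;;
  esac
}

if has_backslash "$bash_ref_fasta"; then
  echo "warning: bash_ref_fasta contains a backslash" >&2
fi

# Double quotes: the variable is expanded and the result stays one argument.
double_quoted="bwa mem -K 100000000 -p -v 3 -t 4 -Y $bash_ref_fasta"

# Single quotes: still one argument, but the text $bash_ref_fasta is kept as-is.
single_quoted='bwa mem -K 100000000 -p -v 3 -t 4 -Y $bash_ref_fasta'

printf '%s\n' "PROGRAM_GROUP_COMMAND_LINE=$double_quoted"
printf '%s\n' "PROGRAM_GROUP_COMMAND_LINE=$single_quoted"
```

Note that with single quotes the shell does not expand the variable, so the literal string `$bash_ref_fasta` is what would be passed along, rather than the path.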
Best,
Genevieve