Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

MergeBamAlignment produces unknown error

Answered

4 comments

  • Genevieve Brandt (she/her)

    Hi vctrymao,

    There might be an issue with how you have submitted your PROGRAM_GROUP_COMMAND_LINE. Is there an extra character that is causing bwa and mem to be read as separate arguments?

    Could you share the entire stack trace?

    Best,

    Genevieve

  • vctrymao

    I don't think so? The command as written was executed. I have also run this script before on other samples with no issue.

    There isn't really a stack trace; this occurs before anything runs:

    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/n/data1/hms/dbmi/park/victor/Doga/INFORM_trial/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/bffc3946-75a1-4284-bad9-b70fd0808bbb/call-MergeBamAlignment/shard-0/tmp.2b5c9f68

    ERROR: Invalid argument 'mem'.

    USAGE: MergeBamAlignment [options]

    Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#MergeBamAlignment

    Merge alignment data from a SAM or BAM with data in an unmapped BAM file.
    Summary

    A command-line tool for merging BAM/SAM alignment info from a third-party aligner with the data in an unmapped BAM file,
    producing a third BAM file that has alignment data (from the aligner) and all the remaining data from the unmapped BAM.

    Quick note: this is not a tool for taking multiple sam files and creating a bigger file by merging them. For that
    use-case, see {@link MergeSamFiles}.

    Details

    Many alignment tools (still!) require fastq format input. The unmapped bam may contain useful information that will be
    lost in the conversion to fastq (meta-data like sample alias, library, barcodes, etc., and read-level tags.)

    This tool takes an unaligned bam with meta-data, and the aligned bam produced by calling {@link SamToFastq} and then
    passing the result to an aligner/mapper. It produces a new SAM file that includes all aligned and unaligned reads and
    also carries forward additional read attributes from the unmapped BAM (attributes that are otherwise lost in the process
    of converting to fastq). The resulting file will be valid for use by Picard and GATK tools.

    The output may be coordinate-sorted, in which case the tags, NM, MD, and UQ will be calculated and populated, or
    query-name sorted, in which case the tags will not be calculated or populated.

    Usage example:

    java -jar picard.jar MergeBamAlignment \
    ALIGNED=aligned.bam \
    UNMAPPED=unmapped.bam \
    O=merge_alignments.bam \
    R=reference_sequence.fasta

    Caveats

    This tool has been developing for a while and many arguments have been added to it over the years. You may be
    particularly interested in the following (partial) list:

    - CLIP_ADAPTERS -- Whether to (soft-)clip the ends of the reads that are identified as belonging to adapters

    - IS_BISULFITE_SEQUENCE -- Whether the sequencing originated from bisulfite sequencing, in which case NM will be
    calculated differently

    - ALIGNER_PROPER_PAIR_FLAGS -- Use if the aligner that was used cannot be trusted to set the "Proper pair" flag and then
    the tool will set this flag based on orientation and distance between pairs.

    - ADD_MATE_CIGAR -- Whether to use this opportunity to add the MC tag to each read.

    - UNMAP_CONTAMINANT_READS (and MIN_UNCLIPPED_BASES) -- Whether to identify extremely short alignments (with clipping on
    both sides) as cross-species contamination and unmap the reads.

    Version: 2.18.3-SNAPSHOT

    Options:

    --help
    -h                            Displays options specific to this tool.

    --stdhelp
    -H                            Displays options specific to this tool AND options common to all Picard command line
                                  tools.

    --version                     Displays program version.

    UNMAPPED_BAM=File
    UNMAPPED=File                 Original SAM or BAM file of unmapped reads, which must be in queryname order.  Required.

    ALIGNED_BAM=File
    ALIGNED=File                  SAM or BAM file(s) with alignment data.  Default value: null. This option may be specified
                                  0 or more times.  Cannot be used in conjuction with option(s) READ1_ALIGNED_BAM
                                  (R1_ALIGNED) READ2_ALIGNED_BAM (R2_ALIGNED)

    READ1_ALIGNED_BAM=File
    R1_ALIGNED=File               SAM or BAM file(s) with alignment data from the first read of a pair.  Default value:
                                  null. This option may be specified 0 or more times.  Cannot be used in conjuction with
                                  option(s) ALIGNED_BAM (ALIGNED)

    READ2_ALIGNED_BAM=File
    R2_ALIGNED=File               SAM or BAM file(s) with alignment data from the second read of a pair.  Default value:
                                  null. This option may be specified 0 or more times.  Cannot be used in conjuction with
                                  option(s) ALIGNED_BAM (ALIGNED)

    OUTPUT=File
    O=File                        Merged SAM or BAM file to write to.  Required.

    PROGRAM_RECORD_ID=String
    PG=String                     The program group ID of the aligner (if not supplied by the aligned file).  Default value:
                                  null.

    PROGRAM_GROUP_VERSION=String
    PG_VERSION=String             The version of the program group (if not supplied by the aligned file).  Default value:
                                  null.

    PROGRAM_GROUP_COMMAND_LINE=String
    PG_COMMAND=String             The command line of the program group (if not supplied by the aligned file).  Default
                                  value: null.

    PROGRAM_GROUP_NAME=String
    PG_NAME=String                The name of the program group (if not supplied by the aligned file).  Default value: null.

    PAIRED_RUN=Boolean
    PE=Boolean                    DEPRECATED. This argument is ignored and will be removed.  Default value: true. This
                                  option can be set to 'null' to clear the default value. Possible values: {true, false}

    JUMP_SIZE=Integer
    JUMP=Integer                  The expected jump size (required if this is a jumping library). Deprecated. Use
                                  EXPECTED_ORIENTATIONS instead  Default value: null.  Cannot be used in conjuction with
                                  option(s) EXPECTED_ORIENTATIONS (ORIENTATIONS)

    CLIP_ADAPTERS=Boolean         Whether to clip adapters where identified.  Default value: true. This option can be set to
                                  'null' to clear the default value. Possible values: {true, false}

    IS_BISULFITE_SEQUENCE=Boolean Whether the lane is bisulfite sequence (used when calculating the NM tag).  Default value:
                                  false. This option can be set to 'null' to clear the default value. Possible values:
                                  {true, false}

    ALIGNED_READS_ONLY=Boolean    Whether to output only aligned reads.    Default value: false. This option can be set to
                                  'null' to clear the default value. Possible values: {true, false}

    MAX_INSERTIONS_OR_DELETIONS=Integer
    MAX_GAPS=Integer              The maximum number of insertions or deletions permitted for an alignment to be included.
                                  Alignments with more than this many insertions or deletions will be ignored. Set to -1 to
                                  allow any number of insertions or deletions.  Default value: 1. This option can be set to
                                  'null' to clear the default value.

    ATTRIBUTES_TO_RETAIN=String   Reserved alignment attributes (tags starting with X, Y, or Z) that should be brought over
                                  from the alignment data when merging.  Default value: null. This option may be specified 0
                                  or more times.

    ATTRIBUTES_TO_REMOVE=String   Attributes from the alignment record that should be removed when merging.  This overrides
                                  ATTRIBUTES_TO_RETAIN if they share common tags.  Default value: null. This option may be
                                  specified 0 or more times.

    ATTRIBUTES_TO_REVERSE=String
    RV=String                     Attributes on negative strand reads that need to be reversed.  Default value: [OQ, U2].
                                  This option can be set to 'null' to clear the default value. This option may be specified
                                  0 or more times. This option can be set to 'null' to clear the default list.

    ATTRIBUTES_TO_REVERSE_COMPLEMENT=String
    RC=String                     Attributes on negative strand reads that need to be reverse complemented.  Default value:
                                  [E2, SQ]. This option can be set to 'null' to clear the default value. This option may be
                                  specified 0 or more times. This option can be set to 'null' to clear the default list.

    READ1_TRIM=Integer
    R1_TRIM=Integer               The number of bases trimmed from the beginning of read 1 prior to alignment  Default
                                  value: 0. This option can be set to 'null' to clear the default value.

    READ2_TRIM=Integer
    R2_TRIM=Integer               The number of bases trimmed from the beginning of read 2 prior to alignment  Default
                                  value: 0. This option can be set to 'null' to clear the default value.

    EXPECTED_ORIENTATIONS=PairOrientation
    ORIENTATIONS=PairOrientation  The expected orientation of proper read pairs. Replaces JUMP_SIZE  Default value: null.
                                  Possible values: {FR, RF, TANDEM} This option may be specified 0 or more times.  Cannot be
                                  used in conjuction with option(s) JUMP_SIZE (JUMP)

    ALIGNER_PROPER_PAIR_FLAGS=Boolean
                                  Use the aligner's idea of what a proper pair is rather than computing in this program.
                                  Default value: false. This option can be set to 'null' to clear the default value.
                                  Possible values: {true, false}

    SORT_ORDER=SortOrder
    SO=SortOrder                  The order in which the merged reads should be output.  Default value: coordinate. This
                                  option can be set to 'null' to clear the default value. Possible values: {unsorted,
                                  queryname, coordinate, duplicate, unknown}

    PRIMARY_ALIGNMENT_STRATEGY=PrimaryAlignmentStrategy
                                  Strategy for selecting primary alignment when the aligner has provided more than one
                                  alignment for a pair or fragment, and none are marked as primary, more than one is marked
                                  as primary, or the primary alignment is filtered out for some reason. For all strategies,
                                  ties are resolved arbitrarily.  Default value: BestMapq. This option can be set to 'null'
                                  to clear the default value. Possible values: {
                                  BestMapq (Expects that multiple alignments will be correlated with HI tag, and prefers the
                                  pair of alignments with the largest MAPQ, in the absence of a primary selected by the
                                  aligner.)
                                  EarliestFragment (Prefers the alignment which maps the earliest base in the read. Note
                                  that EarliestFragment may not be used for paired reads.)
                                  BestEndMapq (Appropriate for cases in which the aligner is not pair-aware, and does not
                                  output the HI tag. It simply picks the alignment for each end with the highest MAPQ, and
                                  makes those alignments primary, regardless of whether the two alignments make sense
                                  together.)
                                  MostDistant (Appropriate for a non-pair-aware aligner. Picks the alignment pair with the
                                  largest insert size. If all alignments would be chimeric, it picks the alignments for each
                                  end with the best MAPQ. )
                                  }

    CLIP_OVERLAPPING_READS=Boolean
                                  For paired reads, soft clip the 3' end of each read if necessary so that it does not
                                  extend past the 5' end of its mate.  Default value: true. This option can be set to 'null'
                                  to clear the default value. Possible values: {true, false}

    INCLUDE_SECONDARY_ALIGNMENTS=Boolean
                                  If false, do not write secondary alignments to output.  Default value: true. This option
                                  can be set to 'null' to clear the default value. Possible values: {true, false}

    ADD_MATE_CIGAR=Boolean
    MC=Boolean                    Adds the mate CIGAR tag (MC) if true, does not if false.  Default value: true. This option
                                  can be set to 'null' to clear the default value. Possible values: {true, false}

    UNMAP_CONTAMINANT_READS=Boolean
    UNMAP_CONTAM=Boolean          Detect reads originating from foreign organisms (e.g. bacterial DNA in a non-bacterial
                                  sample),and unmap + label those reads accordingly.  Default value: false. This option can
                                  be set to 'null' to clear the default value. Possible values: {true, false}

    MIN_UNCLIPPED_BASES=Integer   If UNMAP_CONTAMINANT_READS is set, require this many unclipped bases or else the read will
                                  be marked as contaminant.  Default value: 32. This option can be set to 'null' to clear
                                  the default value.

    MATCHING_DICTIONARY_TAGS=String
                                  List of Sequence Records tags that must be equal (if present) in the reference dictionary
                                  and in the aligned file. Mismatching tags will cause an error if in this list, and a
                                  warning otherwise.  Default value: [M5, LN]. This option can be set to 'null' to clear the
                                  default value. This option may be specified 0 or more times. This option can be set to
                                  'null' to clear the default list.

    UNMAPPED_READ_STRATEGY=UnmappingReadStrategy
                                  How to deal with alignment information in reads that are being unmapped (e.g. due to
                                  cross-species contamination.) Currently ignored unless UNMAP_CONTAMINANT_READS = true
                                  Default value: DO_NOT_CHANGE. This option can be set to 'null' to clear the default value.
                                  Possible values: {COPY_TO_TAG, DO_NOT_CHANGE, MOVE_TO_TAG}

    REFERENCE_SEQUENCE=File
    R=File                        Reference sequence file.  Required.
  • vctrymao

    Actually, I noticed the command that is actually running. I think the problematic area is like you said, `-PROGRAM_GROUP_COMMAND_LINE bwa mem -K 100000000 -p -v 3 -t 4 -Y \/n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/232076856/Homo_sapiens_assembly19.fasta \`, even though I specify `PROGRAM_GROUP_COMMAND_LINE="bwa mem -K 100000000 -p -v 3 -t 4 -Y $bash_ref_fasta` in the input command. However, I am not sure of the correct syntax. How are we supposed to supply this flag with the command? 

    MergeBamAlignment \
    -VALIDATION_STRINGENCY SILENT \
    -EXPECTED_ORIENTATIONS FR \
    -ATTRIBUTES_TO_RETAIN X0 \
    -ALIGNED_BAM /n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/-660116835/BOT3006-1-WES.unmerged.bam -UNMAPPED_BAM /n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/1094665884/BOT3006-1-WES.bam -OUTPUT BOT3006-1-WES.aligned.unsorted.bam \
    -REFERENCE_SEQUENCE /n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/232076856/Homo_sapiens_assembly19.fasta \
    -PAIRED_RUN true \
    -SORT_ORDER unsorted \
    -IS_BISULFITE_SEQUENCE false \
    -ALIGNED_READS_ONLY false \
    -CLIP_ADAPTERS false \
    -MAX_RECORDS_IN_RAM 2000000 \
    -ADD_MATE_CIGAR true \
    -MAX_INSERTIONS_OR_DELETIONS -1 \
    -PRIMARY_ALIGNMENT_STRATEGY MostDistant \
    -PROGRAM_RECORD_ID bwamem \
    -PROGRAM_GROUP_VERSION 0.7.17-r1188 \
    -PROGRAM_GROUP_COMMAND_LINE bwa mem -K 100000000 -p -v 3 -t 4 -Y \/n/data1/hms/dbmi/park/DATA/INFORM_trial/.FastqToSam2/tumor_WES/.PreProcessing/.BOT3006-1-WES.bam/.sh/cromwell-executions/PreProcessingForVariantDiscovery_GATK4/52f7e630-0980-4f43-ba61-a9d325f57c53/call-MergeBamAlignment/shard-0/inputs/232076856/Homo_sapiens_assembly19.fasta \
    -PROGRAM_GROUP_NAME bwamem \
    -UNMAPPED_READ_STRATEGY COPY_TO_TAG \
    -ALIGNER_PROPER_PAIR_FLAGS true \
    -UNMAP_CONTAMINANT_READS true
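
The `Invalid argument 'mem'` error is consistent with the shell splitting the unquoted value into separate words, so that `bwa` is taken as the flag's value and `mem` as a stray argument. A minimal sketch of that effect follows; `count_args` is a hypothetical helper and `/path/to/ref.fasta` is a stand-in for the real reference path, neither of which comes from the original script.

```shell
#!/bin/sh
# count_args (hypothetical helper) prints how many separate arguments
# the shell passes to a command.
count_args() { echo "$#"; }

# Stand-in value, assumed for illustration only.
ref_fasta="/path/to/ref.fasta"
pg_cmd="bwa mem -K 100000000 -p -v 3 -t 4 -Y $ref_fasta"

# Unquoted expansion: the shell splits the value on whitespace.
unquoted=$(count_args -PROGRAM_GROUP_COMMAND_LINE $pg_cmd)

# Quoted expansion: the whole value stays one argument.
quoted=$(count_args -PROGRAM_GROUP_COMMAND_LINE "$pg_cmd")

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted arguments"
```

Under these assumptions the unquoted call reports 12 arguments and the quoted call reports 2, so only the quoted form would reach the tool as a single flag followed by a single value.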
  • Genevieve Brandt (she/her)

    Hi vctrymao, thanks for posting the update with the information you found.

    Quotes are necessary when you have spaces in an argument but not necessary otherwise. So, in your case, it looks like they would only be needed in the PROGRAM_GROUP_COMMAND_LINE. 

    Most likely this is an issue with whatever is in the $bash_ref_fasta variable. For example, there is a backslash in the variable after the -Y and also after .fasta. There can be issues with how bash reads in variables when they are in quotes, so if you look more into properly passing that variable into the command, you should be able to get it to work! You might also want to try single quotes for the argument.

    Best,

    Genevieve
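
The two suggestions above (check the variable for stray backslashes, and be deliberate about the quote style) can be sketched as follows. The value assigned to `bash_ref_fasta` is a hypothetical one that only mimics the backslash pattern visible in the thread, and `has_backslash` is an illustrative helper, not part of any GATK or Picard tooling. Note that single quotes stop the shell from expanding `$bash_ref_fasta` at all, so they only help when the literal text is what should be passed.

```shell
#!/bin/sh
# Hypothetical value mimicking the stray backslashes seen in the thread.
bash_ref_fasta='\/path/to/ref.fasta \'

# Print the value between brackets so stray characters are easy to spot.
printf 'value:  [%s]\n' "$bash_ref_fasta"

# has_backslash (hypothetical helper) succeeds if its argument
# contains at least one backslash.
has_backslash() {
  case $1 in
    *\\*) return 0 ;;
    *)    return 1 ;;
  esac
}

if has_backslash "$bash_ref_fasta"; then
  echo "warning: variable contains a backslash" >&2
fi

# Double quotes expand the variable; single quotes keep the literal text.
double="bwa mem -Y $bash_ref_fasta"
single='bwa mem -Y $bash_ref_fasta'
printf 'double: [%s]\n' "$double"
printf 'single: [%s]\n' "$single"
```

If the check fires, the likelier fix is wherever the variable is assigned (for example, a line-continuation backslash that ended up inside the string) rather than in the MergeBamAlignment call itself.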
