Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

GATK Best Practices Mitochondrial Analysis

1

26 comments

  • Avatar
    Tiffany Miller

    Thanks for the feedback ajwils

    I will pass your request to our documentation team. We have been looking for a GATK writer for nearly a year and finally found Derek Caetano-Anolles which we are so excited about! There is a large backlog of documentation improvements to go through. Not sure if you've seen it, but there is also a Terra featured workspace demoing the Mitchondria pipeline. I'll see if I can get a clarification on your last question about NuMT. 

    0
    Comment actions Permalink
  • Avatar
    Tiffany Miller

    Re your question: Can someone explain in detail the algorithm/logic behind NuMT filtration using median autosomal coverage as a filtering statistic? Is it basically the same as https://gatk.broadinstitute.org/hc/en-us/articles/360036732951-PolymorphicNuMT in 4.1.2.0??

    The answer I got was Yes. I'll follow up with more detail when I get it. 

    1
    Comment actions Permalink
  • Avatar
    ajwils

    Tiffany Miller Thanks for the response! I'm really glad you found an accomplished docs writer. 

    I've been doing most of my early testing on the Terra workspace, yes, but I'm really looking into local command line solutions nowadays (single alignment with basic gatk commands or using cromwell/docker/git to run gatk repo scripts). There are just too many confidentiality/security issues with hosting patient/client data in a Terra workspace bucket. 

    I'm sill trying to wrap my head around the NuMT solution, so any extra detail would be very much appreciated

    0
    Comment actions Permalink
  • Avatar
    Kat No

    Greetings,

    I'm hoping that this is the correct place to ask this question since there is no longer commenting posted over here: https://gatkforums.broadinstitute.org/gatk/discussion/23598/new-mitochondrial-analysis-with-mutect2#latest

    I see that there are some options using Mutect2 with either WGS or WES data, regarding the autosomal coverage metric. I wanted to ask your thoughts on trying to variant call just using independently assembled mitochondrial genomes as input? I have some isolated mitomes (obviously of course these might include some numt sequences), and I'm wondering how you think that variant calling will go in this type of data. 

    Any thoughts are appreciated! Thanks. (gatk-4.1.4.1)

    0
    Comment actions Permalink
  • Avatar
    ajwils

    Kat No

    The best way to counter NuMTs when you're not doing WGS work is to take steps to isolate your mtDNA at the wet lab stage (long distance amplicons encompassing multiple mtDNA genes, specialized chromatography for the mito contig, etc etc). If you've done that and have true isolated mitomes, I suspect your NuMT content will be extremely minimal, and that your low-level heteroplasmy call sensitivity will be mostly accurate already. 

     

    0
    Comment actions Permalink
  • Avatar
    Bhanu Gandham

    ajwils

     

    Thank you for your input!

    0
    Comment actions Permalink
  • Avatar
    sprocha

    Hi!

    not sure if this is the proper place, but will try.

    I am interested in using the --mitochondria mode to VC in *normal* and *tumour* tissues of a non-human species (a marine bivalve). Still going through the docs to set up stuff but question for now:

    - Is the PolymorphicNuMT pipeline only for humans or could be adapted to other organisms as long as we have a nuclear contig/scaffold that think may be a numt? Can someone explain me a bit more in detail how it works?

    Many thanks in advance,

    sara

    1
    Comment actions Permalink
  • Avatar
    Bhanu Gandham

    Hi sprocha

    The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.

    We cannot guarantee a reply, however, we ask other community members to help out if you know the answer.

    For context, check out our support policy.

     

    0
    Comment actions Permalink
  • Avatar
    sprocha

    ok! thanks!

    0
    Comment actions Permalink
  • Avatar
    FM

    Hello Everyone,
    I tried mitochondrial variant calling on my local Mac using the latest WDL (using cromwell/docker/git of https://github.com/gatk-workflows/gatk4-mitochondria-pipeline, G97753.NA12878.bam obtained from https://console.cloud.google.com/storage/gatk-best-practices, and MT reference files obtained from https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0/chrM), but got an error and no results.
    The analysis seemed to be running without a problem in the first half, but it stopped in the middle, and an error seemed to occur in the MergeBamAlignment part.
    The error messages were as follows:
    .....
    [2020-04-15 18:30:03,54] [info] BackgroundConfigAsyncJobExecutionActor [558f59ffAlignmentPipeline.AlignAndMarkDuplicates:NA:1]: Status change from WaitingForReturnCode to Done
    [2020-04-15 18:30:06,06] [error] WorkflowManagerActor Workflow 9921b2ea-6e79-41d8-9190-e9c04a624f49 failed (during ExecutingWorkflowState): Job AlignmentPipeline.AlignAndMarkDuplicates:NA:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: /Users/Hoge/Desktop/work/gatk-workflows/cromwell-executions/MitochondriaPipeline/9921b2ea-6e79-41d8-9190-e9c04a624f49/call-AlignAndCall/AlignAndCall/0bdf8fd5-ab9f-49da-bfe2-2f210b8510ee/call-AlignToMt/AlignmentPipeline/fbb55c70-f147-4702-8049-431a7e3c53ad/call-AlignAndMarkDuplicates/execution/stderr.
    [First 300 bytes]:Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell-executions/MitochondriaPipeline/9921b2ea-6e79-41d8-9190-e9c04a624f49/call-AlignAndCall/AlignAndCall/0bdf8fd5-ab9f-49da-bfe2-2f210b8510ee/call-AlignToMt/AlignmentPipeline/fbb55c70-f147-4702-8049-431a7e3c53ad/call-AlignAndMarkDuplicates/tmp.f284d66e
    Job AlignmentPipeline.AlignAndMarkDuplicates:NA:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
    Check the content of stderr for potential additional information: /Users/Hoge/Desktop/work/gatk-workflows/cromwell-executions/MitochondriaPipeline/9921b2ea-6e79-41d8-9190-e9c04a624f49/call-AlignAndCall/AlignAndCall/0bdf8fd5-ab9f-49da-bfe2-2f210b8510ee/call-AlignToShiftedMt/AlignmentPipeline/558f59ff-9b12-49d1-bf5a-043f3c0b7a3b/call-AlignAndMarkDuplicates/execution/stderr.

    So I checked the "stderr" file and it said the following.
    ....
    [M::mem_process_seqs] Processed 80168 reads in 5.509 CPU sec, 7.867 real sec
    [main] Version: 0.7.15-r1140
    [main] CMD: /usr/gitc/bwa mem -K 100000000 -p -v 3 -t 2 -Y /cromwell-executions/MitochondriaPipeline/9921b2ea-6e79-41d8-9190-e9c04a624f49/call-AlignAndCall/AlignAndCall/0bdf8fd5-ab9f-49da-bfe2-2f210b8510ee/call-AlignToShiftedMt/AlignmentPipeline/558f59ff-9b12-49d1-bf5a-043f3c0b7a3b/call-AlignAndMarkDuplicates/inputs/-1270591648/Homo
    [main] Real time: 107.698 sec; CPU: 46.181 sec
    WARNING 2020-04-15 09:29:35 SamAlignmentMerger Exception merging bam alignment - attempting to sort aligned reads and try again: Inappropriate call if not paired read
    INFO 2020-04-15 09:29:35 SamAlignmentMerger Finished reading 0 total records from alignment SAM/BAM.
    [Wed Apr 15 09:29:35 UTC 2020] picard.sam.MergeBamAlignment done. Elapsed time: 1.75 minutes.
    Runtime.totalMemory()=3133669376
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" java.lang.IllegalArgumentException: Do not use this function to merge dictionaries with different sequences in them. Sequences must be in the same order as well. Found [] and [chrM].
    at htsjdk.samtools.SAMSequenceDictionary.mergeDictionaries(SAMSequenceDictionary.java:323)
    at picard.sam.SamAlignmentMerger.getDictionaryForMergedBam(SamAlignmentMerger.java:197)
    at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:363)
    at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
    at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:356)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

    I wasn't sure what caused this error.
    Has anyone been able to get a mitochondrial variant call on the test data (G97753.NA12878.bam) using the latest program? Is there a problem with my analysis environment?
    If anyone can give me a solution, I would greatly appreciate it.

     

    0
    Comment actions Permalink
  • Avatar
    Megan Shand

    To try to help with some of the previous questions about the NuMT filter, right now that filter is based solely on coverage and is overly broad/aggressive about filtering out low potential sites that truly belong in the autosome. We use a Poisson distribution taking the median autosomal coverage that was observed and assume that NuMT insertions could have happened multiple times (based on observations we've made in our samples). Then we filter out any site that has a lower number of reads supporting the alternate allele than a cutoff based on the Poisson distribution I just described. This is very broad and will also filter out real low AF mitochondrial variants, so it's a balance as far as how sensitive or precise you'd like to be with these low AF calls.

    0
    Comment actions Permalink
  • Avatar
    Megan Shand

    FM I tried to run the pipeline with the input you mentioned (G97753.NA12878.bam). I noticed that there is no bam index in that directory which is a required input to the pipeline, so I'm not clear on how you were able to run without that. Assuming you made an index, I also tried running that sample with an index and did not get the same error you are seeing from MergeBamAlignement. 

    Because that task pipes the output from BWA directly to MergeBamAlignment the error messages can unfortunately get somewhat lost. It looks like MergeBamAlignment might be getting an empty file as input, possibly because BWA ran into an error. Is there anything else in your stderr file or did you post the whole thing? Is there anything in the G97753.NA12878.realigned.bwa.stderr.log file (if you have that in the same directory as the stderr you posted)?

    0
    Comment actions Permalink
  • Avatar
    FM

    Hello Megan,

    Thank you for your reply. I created a BAI by SAMTOOLS and then ran that program because there was no BAI file for the BAM file as you say.
    I found the file "G97753.NA12878.realigned.bwa.stderr.log", so I'll copy and pace its contents. (I ran the program again, so the ID is a little different, but I got the same error.)

    ____________________

    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [W::main_mem] when '-p' is in use, the second query file is ignored.
    [M::process] read 662252 sequences (100000052 bp)...
    [M::process] 0 single-end sequences; 662252 paired-end sequences
    [M::process] read 311077 sequences (46972627 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (6, 326570, 6, 16)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] analyzing insert size distribution for orientation FR...
    [M::mem_pestat] (25, 50, 75) percentile: (153, 227, 317)
    [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 645)
    [M::mem_pestat] mean and std.dev: (242.20, 118.14)
    [M::mem_pestat] low and high boundaries for proper pairs: (1, 809)
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] analyzing insert size distribution for orientation RR...
    [M::mem_pestat] (25, 50, 75) percentile: (1, 1, 27)
    [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 79)
    [M::mem_pestat] mean and std.dev: (9.80, 18.73)
    [M::mem_pestat] low and high boundaries for proper pairs: (1, 105)
    [M::mem_pestat] skip orientation RR
    [M::mem_process_seqs] Processed 662252 reads in 42.333 CPU sec, 53.379 real sec
    [M::process] 1 single-end sequences; 311076 paired-end sequences
    [M::mem_process_seqs] Processed 1 reads in 0.002 CPU sec, 0.025 real sec
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (2, 153663, 0, 8)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] analyzing insert size distribution for orientation FR...
    [M::mem_pestat] (25, 50, 75) percentile: (150, 221, 309)
    [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 627)
    [M::mem_pestat] mean and std.dev: (236.26, 114.81)
    [M::mem_pestat] low and high boundaries for proper pairs: (1, 786)
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] skip orientation RR as there are not enough pairs
    [M::mem_process_seqs] Processed 311076 reads in 22.606 CPU sec, 27.390 real sec
    [main] Version: 0.7.15-r1140
    [main] CMD: /usr/gitc/bwa mem -K 100000000 -p -v 3 -t 2 -Y /cromwell-executions/MitochondriaPipeline/0c4734f5-2b23-4ea6-9aa0-50122a3c08c6/call-AlignAndCall/AlignAndCall/2e53a57d-42a4-4069-b479-7f4fd1ba8afc/call-AlignToShiftedMt/AlignmentPipeline/5098f038-37cc-4186-9fd8-3474882f4725/call-AlignAndMarkDuplicates/inputs/-1270591648/Homo_sapiens_assembly38.chrM.shifted_by_8000_bases.fasta /dev/stdin -
    [main] Real time: 139.626 sec; CPU: 71.354 sec

    _____________________

    I'm not an expert, so I'm not sure, but is there something wrong with the log?

    Regards,

    0
    Comment actions Permalink
  • Avatar
    Megan Shand

    Hi FM 

    Hmm, that log for BWA looks good to me. Can you post the full stderr file? I'm confused why this isn't working since I can use the same inputs and wdl and I'm not seeing the same problem.

    0
    Comment actions Permalink
  • Avatar
    FM
     
    Hi Megan,
     
    Thank you for your reply.

    I've posted the stderr file below. I've checked some of the mapped BAM files and they seemingly work well. I'm not sure what the reason for the error is. Do you get good results with no errors when you run the program the same way with the same files as I do? If you don't mind, could you tell me what environment you ran the program in? I'm thinking of running the same program in CentOS at a later date.

    Many Thanks!
     
    _________________________
     
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell-executions/MitochondriaPipeline/0c4734f5-2b23-4ea6-9aa0-50122a3c08c6/call-AlignAndCall/AlignAndCall/2e53a57d-42a4-4069-b479-7f4fd1ba8afc/call-AlignToShiftedMt/AlignmentPipeline/5098f038-37cc-4186-9fd8-3474882f4725/call-AlignAndMarkDuplicates/tmp.d8559804
    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [W::main_mem] when '-p' is in use, the second query file is ignored.
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell-executions/MitochondriaPipeline/0c4734f5-2b23-4ea6-9aa0-50122a3c08c6/call-AlignAndCall/AlignAndCall/2e53a57d-42a4-4069-b479-7f4fd1ba8afc/call-AlignToShiftedMt/AlignmentPipeline/5098f038-37cc-4186-9fd8-3474882f4725/call-AlignAndMarkDuplicates/tmp.d8559804
    INFO2020-04-16 16:49:40SamToFastq
     
    ********** NOTE: Picard's command line syntax is changing.
    **********
    ********** For more information, please see:
    **********
    ********** The command line looks like this in the new syntax:
    **********
    **********    SamToFastq -INPUT /cromwell-executions/MitochondriaPipeline/0c4734f5-2b23-4ea6-9aa0-50122a3c08c6/call-AlignAndCall/AlignAndCall/2e53a57d-42a4-4069-b479-7f4fd1ba8afc/call-AlignToShiftedMt/AlignmentPipeline/5098f038-37cc-4186-9fd8-3474882f4725/call-AlignAndMarkDuplicates/inputs/-475663231/G97753.NA12878.bam -FASTQ /dev/stdout -INTERLEAVE true -NON_PF true
    **********
     
     
    INFO2020-04-16 16:49:40MergeBamAlignment
     
    ********** NOTE: Picard's command line syntax is changing.
    **********
    ********** For more information, please see:
    **********
    ********** The command line looks like this in the new syntax:
    **********
    **********    MergeBamAlignment -VALIDATION_STRINGENCY SILENT -EXPECTED_ORIENTATIONS FR -ATTRIBUTES_TO_RETAIN X0 -ATTRIBUTES_TO_REMOVE NM -ATTRIBUTES_TO_REMOVE MD -ALIGNED_BAM /dev/stdin -UNMAPPED_BAM /cromwell-executions/MitochondriaPipeline/0c4734f5-2b23-4ea6-9aa0-50122a3c08c6/call-AlignAndCall/AlignAndCall/2e53a57d-42a4-4069-b479-7f4fd1ba8afc/call-AlignToShiftedMt/AlignmentPipeline/5098f038-37cc-4186-9fd8-3474882f4725/call-AlignAndMarkDuplicates/inputs/-475663231/G97753.NA12878.bam -OUTPUT mba.bam -REFERENCE_SEQUENCE /cromwell-executions/MitochondriaPipeline/0c4734f5-2b23-4ea6-9aa0-50122a3c08c6/call-AlignAndCall/AlignAndCall/2e53a57d-42a4-4069-b479-7f4fd1ba8afc/call-AlignToShiftedMt/AlignmentPipeline/5098f038-37cc-4186-9fd8-3474882f4725/call-AlignAndMarkDuplicates/inputs/-1270591648/Homo_sapiens_assembly38.chrM.shifted_by_8000_bases.fasta -PAIRED_RUN true -SORT_ORDER unsorted -IS_BISULFITE_SEQUENCE false -ALIGNED_READS_ONLY false -CLIP_ADAPTERS false -MAX_RECORDS_IN_RAM 2000000 -ADD_MATE_CIGAR true -MAX_INSERTIONS_OR_DELETIONS -1 -PRIMARY_ALIGNMENT_STRATEGY MostDistant -PROGRAM_RECORD_ID bwamem -PROGRAM_GROUP_VERSION 0.7.15-r1140 -PROGRAM_GROUP_COMMAND_LINE bwa mem -K 100000000 -p -v 3 -t 2 -Y /cromwell-executions/MitochondriaPipeline/0c4734f5-2b23-4ea6-9aa0-50122a3c08c6/call-AlignAndCall/AlignAndCall/2e53a57d-42a4-4069-b479-7f4fd1ba8afc/call-AlignToShiftedMt/AlignmentPipeline/5098f038-37cc-4186-9fd8-3474882f4725/call-AlignAndMarkDuplicates/inputs/-1270591648/Homo_sapiens_assembly38.chrM.shifted_by_8000_bases.fasta -PROGRAM_GROUP_NAME bwamem -UNMAPPED_READ_STRATEGY COPY_TO_TAG -ALIGNER_PROPER_PAIR_FLAGS true -UNMAP_CONTAMINANT_READS true -ADD_PG_TAG_TO_READS false
    **********
     
     
    16:49:41.632 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/gitc/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Thu Apr 16 16:49:41 UTC 2020] SamToFastq INPUT=/cromwell-executions/MitochondriaPipeline/0c4734f5-2b23-4ea6-9aa0-50122a3c08c6/call-AlignAndCall/AlignAndCall/2e53a57d-42a4-4069-b479-7f4fd1ba8afc/call-AlignToShiftedMt/AlignmentPipeline/5098f038-37cc-4186-9fd8-3474882f4725/call-AlignAndMarkDuplicates/inputs/-475663231/G97753.NA12878.bam FASTQ=/dev/stdout INTERLEAVE=true INCLUDE_NON_PF_READS=true    OUTPUT_PER_RG=false COMPRESS_OUTPUTS_PER_RG=false RG_TAG=PU RE_REVERSE=true CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Thu Apr 16 16:49:41 UTC 2020] Executing as root@a6bae9c2cad7 on Linux 4.19.76-linuxkit amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.27-SNAPSHOT
    16:49:42.067 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/gitc/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Thu Apr 16 16:49:42 UTC 2020] MergeBamAlignment ADD_PG_TAG_TO_READS=false UNMAPPED_BAM=/cromwell-executions/MitochondriaPipeline/0c4734f5-2b23-4ea6-9aa0-50122a3c08c6/call-AlignAndCall/AlignAndCall/2e53a57d-42a4-4069-b479-7f4fd1ba8afc/call-AlignToShiftedMt/AlignmentPipeline/5098f038-37cc-4186-9fd8-3474882f4725/call-AlignAndMarkDuplicates/inputs/-475663231/G97753.NA12878.bam ALIGNED_BAM=[/dev/stdin] OUTPUT=mba.bam PROGRAM_RECORD_ID=bwamem PROGRAM_GROUP_VERSION=0.7.15-r1140 PROGRAM_GROUP_COMMAND_LINE=bwa mem -K 100000000 -p -v 3 -t 2 -Y /cromwell-executions/MitochondriaPipeline/0c4734f5-2b23-4ea6-9aa0-50122a3c08c6/call-AlignAndCall/AlignAndCall/2e53a57d-42a4-4069-b479-7f4fd1ba8afc/call-AlignToShiftedMt/AlignmentPipeline/5098f038-37cc-4186-9fd8-3474882f4725/call-AlignAndMarkDuplicates/inputs/-1270591648/Homo_sapiens_assembly38.chrM.shifted_by_8000_bases.fasta PROGRAM_GROUP_NAME=bwamem PAIRED_RUN=true CLIP_ADAPTERS=false IS_BISULFITE_SEQUENCE=false ALIGNED_READS_ONLY=false MAX_INSERTIONS_OR_DELETIONS=-1 ATTRIBUTES_TO_RETAIN=[X0] ATTRIBUTES_TO_REMOVE=[NM, MD] EXPECTED_ORIENTATIONS=[FR] ALIGNER_PROPER_PAIR_FLAGS=true SORT_ORDER=unsorted PRIMARY_ALIGNMENT_STRATEGY=MostDistant ADD_MATE_CIGAR=true UNMAP_CONTAMINANT_READS=true UNMAPPED_READ_STRATEGY=COPY_TO_TAG VALIDATION_STRINGENCY=SILENT MAX_RECORDS_IN_RAM=2000000 REFERENCE_SEQUENCE=/cromwell-executions/MitochondriaPipeline/0c4734f5-2b23-4ea6-9aa0-50122a3c08c6/call-AlignAndCall/AlignAndCall/2e53a57d-42a4-4069-b479-7f4fd1ba8afc/call-AlignToShiftedMt/AlignmentPipeline/5098f038-37cc-4186-9fd8-3474882f4725/call-AlignAndMarkDuplicates/inputs/-1270591648/Homo_sapiens_assembly38.chrM.shifted_by_8000_bases.fasta    ATTRIBUTES_TO_REVERSE=[OQ, U2] ATTRIBUTES_TO_REVERSE_COMPLEMENT=[E2, SQ] READ1_TRIM=0 READ2_TRIM=0 CLIP_OVERLAPPING_READS=true INCLUDE_SECONDARY_ALIGNMENTS=true MIN_UNCLIPPED_BASES=32 MATCHING_DICTIONARY_TAGS=[M5, LN] VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Thu Apr 16 16:49:42 UTC 2020] Executing as root@a6bae9c2cad7 on Linux 4.19.76-linuxkit amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.27-SNAPSHOT
    INFO 2020-04-16 16:49:42 SamAlignmentMerger Processing SAM file(s): [/dev/stdin]
    [M::process] read 662252 sequences (100000052 bp)...
    [M::process] 0 single-end sequences; 662252 paired-end sequences
    [M::process] read 311077 sequences (46972627 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (6, 326570, 6, 16)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] analyzing insert size distribution for orientation FR...
    [M::mem_pestat] (25, 50, 75) percentile: (153, 227, 317)
    [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 645)
    [M::mem_pestat] mean and std.dev: (242.20, 118.14)
    [M::mem_pestat] low and high boundaries for proper pairs: (1, 809)
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] analyzing insert size distribution for orientation RR...
    [M::mem_pestat] (25, 50, 75) percentile: (1, 1, 27)
    [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 79)
    [M::mem_pestat] mean and std.dev: (9.80, 18.73)
    [M::mem_pestat] low and high boundaries for proper pairs: (1, 105)
    [M::mem_pestat] skip orientation RR
    [M::mem_process_seqs] Processed 662252 reads in 42.333 CPU sec, 53.379 real sec
    [M::process] 1 single-end sequences; 311076 paired-end sequences
    [M::mem_process_seqs] Processed 1 reads in 0.002 CPU sec, 0.025 real sec
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (2, 153663, 0, 8)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] analyzing insert size distribution for orientation FR...
    [M::mem_pestat] (25, 50, 75) percentile: (150, 221, 309)
    [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 627)
    [M::mem_pestat] mean and std.dev: (236.26, 114.81)
    [M::mem_pestat] low and high boundaries for proper pairs: (1, 786)
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] skip orientation RR as there are not enough pairs
    [M::mem_process_seqs] Processed 311076 reads in 22.606 CPU sec, 27.390 real sec
    [main] Version: 0.7.15-r1140
    [main] CMD: /usr/gitc/bwa mem -K 100000000 -p -v 3 -t 2 -Y /cromwell-executions/MitochondriaPipeline/0c4734f5-2b23-4ea6-9aa0-50122a3c08c6/call-AlignAndCall/AlignAndCall/2e53a57d-42a4-4069-b479-7f4fd1ba8afc/call-AlignToShiftedMt/AlignmentPipeline/5098f038-37cc-4186-9fd8-3474882f4725/call-AlignAndMarkDuplicates/inputs/-1270591648/Homo_sapiens_assembly38.chrM.shifted_by_8000_bases.fasta /dev/stdin -
    [main] Real time: 139.626 sec; CPU: 71.354 sec
    WARNING 2020-04-16 16:51:59 SamAlignmentMerger Exception merging bam alignment - attempting to sort aligned reads and try again: Inappropriate call if not paired read
    INFO 2020-04-16 16:51:59 SamAlignmentMerger Finished reading 0 total records from alignment SAM/BAM.
    [Thu Apr 16 16:51:59 UTC 2020] picard.sam.MergeBamAlignment done. Elapsed time: 2.29 minutes.
    Runtime.totalMemory()=3133145088
    Exception in thread "main" java.lang.IllegalArgumentException: Do not use this function to merge dictionaries with different sequences in them. Sequences must be in the same order as well. Found [] and [chrM].
    at htsjdk.samtools.SAMSequenceDictionary.mergeDictionaries(SAMSequenceDictionary.java:323)
    at picard.sam.SamAlignmentMerger.getDictionaryForMergedBam(SamAlignmentMerger.java:197)
    at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:363)
    at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
    at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:356)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
    0
    Comment actions Permalink
  • Avatar
    Megan Shand

    FM I'm not seeing anything obvious in the stderr you posted. Out of curiosity, this failed task seems to be with the "shifted" reference, have you had any problems with BWA/MergeBamAlignment with the non-shifted reference? The pipeline should run both.

    I am able to run this sample on Terra, which runs Cromwell with a google cloud backend.

    0
    Comment actions Permalink
  • Avatar
    FM

    Hello Megan,

    Thank you for your fast response.

    Based on your kind comments, I also checked the stderr file of BWA/MergeBamAlignment with the "non-shifted". In the file, I saw the exact same error log as in the case of "shifted" above. (The message was "Exception in thread "main" java.lang.IllegalArgumentException: Do not use this function to merge dictionaries with different sequences in them. Sequences must be in the same order as well. Found [] and [chrM].")

    I've never used Terra before, but I'll try to learn about it. If I could use Terra and get results for the same sample data using the Terra, I would like to report it to you.

    Best Regards,

    0
    Comment actions Permalink
  • Avatar
    FM

    Hello Megan,

    I've tried Terra and a few other things and I'll tell you the results here.

    1. Default demo data for NA12878.cram on Terra: Succeeded.

    2. Running the above G97753.NA12878.bam data on Terra, which I uploaded myself to Terra: Succeeded.

    3. Copy NA12878.cram and crai on Terra to my Mac and then run the program on my Mac for the data: Failed (same error as above)

    4. Run WDL program on local CentOS using G97753.NA12878.bam: Succeeded!!

    5. After attempting to update docker and git on a Mac and/or using other versions of WDL on a Mac: Failed...

     

    As a result, I'm not sure what caused the program's error on my Mac, but I was relieved that it worked on my local CentOS because I wanted to run the program locally. I'll be using the mitochondrial analysis program on local CentOS server to investigate my samples.

    Thank you very much for your kindness.

    Best Regards,

    0
    Comment actions Permalink
  • Avatar
    Megan Shand

    FM I'm glad this worked on CentOS, thanks for letting me know. I'm glad you can move forward! 

    0
    Comment actions Permalink
  • Avatar
    Yue Wang

    Hi~

    I am working on fungal mitochondrial variants. I am wondering if the new mitochondrial pipeline is only for humans and I should use other tools like HaplotypeCaller or Mutect2 mitochondria mode.

    The sequence file was downloaded from the NCBI database. After extracting the fastq files, I did not check the quality or trim the files. Instead, I tried the methods to clean the data referring to this article https://gatk.broadinstitute.org/hc/en-us/articles/360039568932--How-to-Map-and-clean-up-short-read-sequence-data-efficiently

    The species I am working on is not a model one. So I guess I should do bootstrap to get trusted sites and do the BQSR. For this step, I have a total of 88 samples, should I use all of their BAM files to infer the trusted sites or just one is enough?

    For the variant calling, I am quite struggling to decide which tool to use, any suggestion will be appreciated.

    Is this workflow correct?

    Thank you very much!

     

    0
    Comment actions Permalink
  • Avatar
    Megan Shand

    Yue Wang This seems like a fairly complex question, would you mind starting a new thread with this? Feel free to tag me and I'll try to address some of your questions. Thanks!

    0
    Comment actions Permalink
  • Avatar
    Matthew Dapas

    Hello,

    I was unable to build the pipeline container on the HPC I use (versioning issues with Singularity), so I ended up building my own copy of the pipeline, which is now running successfully. In my final VCF, all of the variants are duplicated as a result of merging the original alignment with the shifted alignment. As you would expect, all of the variant metadata is essentially the same across duplicate pairs except for the variants called near the boundaries of either alignment. I would like to have a VCF with one call per variant, removing the calls with lower quality--more of a join rather than a concat+sort, if you will. I wanted to double-check: did I miss a filtering or dedup step in the pipeline, or is this output expected?

    If I didn't miss anything, then I think an easy fix would be to simply filter out the control region from the non-shifted VCF and then remove everything but the control region from the shifted VCF prior to merging? Thanks.

    0
    Comment actions Permalink
  • Avatar
    Megan Shand

    Matthew Dapas You're correct. In the current version of the pipeline we only use the shifted VCF for the control region, so your easy fix sounds very reasonable to me.

    0
    Comment actions Permalink
  • Avatar
    rohit satyam

    Hi, is there any update on the proper documentation of the Mitochondrial Variant Analysis. I am really finding it difficult to understand really what sequence of steps must be followed for finding Mitochondrial Variants. ajwils can you please share your steps if you compiled them from different blogs. In this blog only people have posted multiple errors and I kinda got lost...

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt

    Hi rohit satyam, you can look into our pipeline here: https://github.com/gatk-workflows/gatk4-mitochondria-pipeline, it can be run locally on Cromwell. 

    ajwils did you write down the steps that worked for you when re-creating the pipeline?

    0
    Comment actions Permalink
  • Avatar
    ajwils

    rohit satyam and Genevieve Brandt

    To be clear, all my command line pipe does is a single alignment equitable version of the Mutect2, FilterMutectCalls, and VariantFiltration steps, using the same arguments used in these three steps by the Terra double alignment pipe.

    Probably not what you want.

    If you want to run the double alignmnet pipe locally, try and set it up with Docker/git/Cromwell. You'll have to edit the scripts to work in your installations, and that can be a lot of work.

    We are still weighing whether or not it is worth it to do this at JHU for WES data. When you have high enough coverage 5000x+, those single-alignment control region coverage dropoffs due to circularity matter a lot less. And rescuing those reads is really all the double-alignment strategy is doing for you if you're working with WES, as the NUMT resolution logic only seems to apply to WGS.

     

    If you're interested in the single alignment version, I can post the generic command line input.

     

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk