Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

issue while running mitochondrial analysis

Answered
0

15 comments

  • Avatar
    Genevieve Brandt (she/her)

    Hi Sinem Selvi,

    Thanks for writing in so that we can get this pipeline working for you! The best practices pipeline is using RevertSam, not RevertSamSpark. RevertSamSpark is still a BETA tool so for troubleshooting purposes, can you please try this with RevertSam?

    Thank you,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Sinem Selvi

    Same error occurs after switching the tool, but I see fewer errors in ValidateSamFile summary:

    Error Type      Count
    ERROR:MATE_NOT_FOUND    306

    But I have the reads error in the MergeBamAlignment part.

    Error:

    Exception in thread "main" picard.PicardException: Second read from pair not found in unmapped bam: A00958:129:HCNGCDSX3:1:1131:9308:5932, A00958:129:HCNGCDSX3:1:1132:15257:5995

    As mentioned here (https://gatk.broadinstitute.org/hc/en-us/articles/4403687183515--How-to-Generate-an-unmapped-BAM-from-FASTQ-or-aligned-BAM) I added "--SANITIZE true" option on ReverSam part. And I got no error summary by ValidateSamFile.

    But still MergeBamAlignment part have error!

    Error:

    Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: HJYFJCCXX160204:4:1101:10003:7778!

    The read is in aligned bam file:

    grep "HJYFJCCXX160204:4:1101:10003:7778" /gpfs/shared/WES_analyses/Development/mito-gatk/NA12878.realigned.sorted.sam
    HJYFJCCXX160204:4:1101:10003:7778       163     chrM    7710    60      151M    =       7883    324     TTTTCCTAACACTCACAACAAAACTAACTAATACTAACATCTCAGACGCTCAGGAAATAGAAACCGTCTGAACTATCCTGCCCGCCATCATCCTAGTCCTCATCGCCCTCCCATCCCTACGCAGCCTTTACATAACAGACGAGGTCAACGA      ???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????+5?????5????????????????????      OA:Z:chrM,7710,+,151M,47,null;  MC:Z:151M       RG:Z:HJYFJ.4    XM:Z:chrM       MQ:i:60 AS:i:146
    HJYFJCCXX160204:4:1101:10003:7778       83      chrM    7883    60      151M    =       7710    -324    ATTGGCCACCAATGGTACTGAACCTACGAGTACACCGACTACGGCGGACTAATCTTCAACTCCTACATACTTCCCCCATTATTCCTAGAACCAGGCGACCTGCGACTCCTTGACGTTGACAATCGAGTAGTACTCCCGATTGAAGCCCCCA      ??????????????????????????5??5????????????????????????????5????????????5???????????????5????5????????????????????5??????????????????????5?????????????5      OA:Z:chrM,7883,-,151M,60,null;  MC:Z:151M       RG:Z:HJYFJ.4    XM:Z:chrM       MQ:i:60 AS:i:151

    Thank you

    0
    Comment actions Permalink
  • Avatar
    Sinem Selvi

    Hi,

    I could get bam file by changing sort order command (--SORT_ORDER unsorted). There is still the same error but merged bam file is produced.

    Log:

    17:14:19.159 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Tue May 17 17:14:19 TRT 2022] MergeBamAlignment --UNMAPPED_BAM /gpfs/shared/WES_analyses/hg38/reverted_sorted.bam --ALIGNED_BAM /gpfs/shared/WES_analyses/Development/mito-gatk/NA12878.realigned.sorted.bam --OUTPUT /gpfs/shared/WES_analyses/hg38/merge_alignments.bam --SORT_ORDER unsorted --REFERENCE_SEQUENCE /gpfs/shared/WES_analyses/tools/gsutil/mito/Homo_sapiens_assembly38.chrM.fasta --ADD_PG_TAG_TO_READS true --PAIRED_RUN true --CLIP_ADAPTERS true --IS_BISULFITE_SEQUENCE false --ALIGNED_READS_ONLY false --MAX_INSERTIONS_OR_DELETIONS 1 --ATTRIBUTES_TO_REVERSE OQ --ATTRIBUTES_TO_REVERSE U2 --ATTRIBUTES_TO_REVERSE_COMPLEMENT E2 --ATTRIBUTES_TO_REVERSE_COMPLEMENT SQ --READ1_TRIM 0 --READ2_TRIM 0 --ALIGNER_PROPER_PAIR_FLAGS false --PRIMARY_ALIGNMENT_STRATEGY BestMapq --CLIP_OVERLAPPING_READS true --HARD_CLIP_OVERLAPPING_READS false --INCLUDE_SECONDARY_ALIGNMENTS true --ADD_MATE_CIGAR true --UNMAP_CONTAMINANT_READS false --MIN_UNCLIPPED_BASES 32 --MATCHING_DICTIONARY_TAGS M5 --MATCHING_DICTIONARY_TAGS LN --UNMAPPED_READ_STRATEGY DO_NOT_CHANGE --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Tue May 17 17:14:19 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    INFO    2022-05-17 17:14:19     SamAlignmentMerger      Processing SAM file(s): [/gpfs/shared/WES_analyses/Development/mito-gatk/NA12878.realigned.sorted.bam]
    WARNING 2022-05-17 17:14:20     SamAlignmentMerger      Exception merging bam alignment - attempting to sort aligned reads and try again: Reads remaining on alignment iterator: HJYFJCCXX160204:4:1101:15980:26993!
    INFO    2022-05-17 17:14:27     SamAlignmentMerger      Read 1000000 records from alignment SAM/BAM.
    INFO    2022-05-17 17:14:32     SamAlignmentMerger      Read 2000000 records from alignment SAM/BAM.
    INFO    2022-05-17 17:14:38     SamAlignmentMerger      Read 3000000 records from alignment SAM/BAM.
    INFO    2022-05-17 17:14:43     SamAlignmentMerger      Finished reading 3752069 total records from alignment SAM/BAM.
    WARNING 2022-05-17 17:14:43     SortingCollection       There is not enough memory per file for buffering. Reading will be unbuffered.
    [Tue May 17 17:14:44 TRT 2022] picard.sam.MergeBamAlignment done. Elapsed time: 0.43 minutes.
    Runtime.totalMemory()=2547384320
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: HJYFJCCXX160204:4:1101:10003:7778!
            at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:572)
            at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
            at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:368)
            at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
            at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
            at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

     

    I cannot use the bam file, there is error in ValidateSamFile summary:

    Error Type      Count
    ERROR:MATE_NOT_FOUND    1
    ERROR:TRUNCATED_FILE    2

     

    Thank you

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Sinem Selvi,

    This is actually a different error than before, so I suggest you keep using RevertSam instead of RevertSamSpark. 

    See this forum post for information about how to fix your Reads remaining on alignment iterator error: https://gatk.broadinstitute.org/hc/en-us/community/posts/360057896892-Question-Picard-MergeBamAlignment-Reads-remaining-on-alignment-iterator

    I would recommend first sorting your input bam files.

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Sinem Selvi

    Hello,

    I sorted both files by queryname and run MergeBamAlignment with --SORT_ORDER queryname. I still get the output bam file. But it is not valid and I cannot use it. 

     

    10:00:28.936 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed May 18 10:00:29 TRT 2022] SortSam --INPUT /gpfs/shared/WES_analyses/hg38/reverted_fixed.bam --OUTPUT /gpfs/shared/WES_analyses/hg38/reverted_sorted.bam --SORT_ORDER queryname --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed May 18 10:00:29 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    INFO    2022-05-18 10:00:29     SortSam Finished reading inputs, merging and writing to output now.
    [Wed May 18 10:00:30 TRT 2022] picard.sam.SortSam done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=90308608
    10:00:31.592 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed May 18 10:00:31 TRT 2022] SortSam --INPUT /gpfs/shared/WES_analyses/Development/mito-gatk/NA12878.realigned.bam --OUTPUT /gpfs/shared/WES_analyses/Development/mito-gatk/NA12878.realigned.sorted.bam --SORT_ORDER queryname --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed May 18 10:00:31 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    WARNING: BAM index file /gpfs/shared/WES_analyses/Development/mito-gatk/NA12878.realigned.bai is older than BAM /gpfs/shared/WES_analyses/Development/mito-gatk/NA12878.realigned.bam
    INFO    2022-05-18 10:00:52     SortSam Finished reading inputs, merging and writing to output now.
    WARNING 2022-05-18 10:00:53     SortingCollection       There is not enough memory per file for buffering. Reading will be unbuffered.
    INFO    2022-05-18 10:00:53     SortSam Seen many non-increasing record positions. Printing Read-names as well.
    [Wed May 18 10:01:33 TRT 2022] picard.sam.SortSam done. Elapsed time: 1.03 minutes.
    Runtime.totalMemory()=2692874240
    10:01:34.262 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed May 18 10:01:34 TRT 2022] MergeBamAlignment --UNMAPPED_BAM /gpfs/shared/WES_analyses/hg38/reverted_sorted.bam --ALIGNED_BAM /gpfs/shared/WES_analyses/Development/mito-gatk/NA12878.realigned.sorted.bam --OUTPUT /gpfs/shared/WES_analyses/Sets-annalysis/hg38/merge_alignments.bam --SORT_ORDER queryname --REFERENCE_SEQUENCE /gpfs/shared/WES_analyses/tools/gsutil/mito/Homo_sapiens_assembly38.chrM.fasta --ADD_PG_TAG_TO_READS true --PAIRED_RUN true --CLIP_ADAPTERS true --IS_BISULFITE_SEQUENCE false --ALIGNED_READS_ONLY false --MAX_INSERTIONS_OR_DELETIONS 1 --ATTRIBUTES_TO_REVERSE OQ --ATTRIBUTES_TO_REVERSE U2 --ATTRIBUTES_TO_REVERSE_COMPLEMENT E2 --ATTRIBUTES_TO_REVERSE_COMPLEMENT SQ --READ1_TRIM 0 --READ2_TRIM 0 --ALIGNER_PROPER_PAIR_FLAGS false --PRIMARY_ALIGNMENT_STRATEGY BestMapq --CLIP_OVERLAPPING_READS true --HARD_CLIP_OVERLAPPING_READS false --INCLUDE_SECONDARY_ALIGNMENTS true --ADD_MATE_CIGAR true --UNMAP_CONTAMINANT_READS false --MIN_UNCLIPPED_BASES 32 --MATCHING_DICTIONARY_TAGS M5 --MATCHING_DICTIONARY_TAGS LN --UNMAPPED_READ_STRATEGY DO_NOT_CHANGE --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed May 18 10:01:34 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    INFO    2022-05-18 10:01:34     SamAlignmentMerger      Processing SAM file(s): [/gpfs/shared/WES_analyses/Development/mito-gatk/NA12878.realigned.sorted.bam]
    WARNING 2022-05-18 10:01:35     SamAlignmentMerger      Exception merging bam alignment - attempting to sort aligned reads and try again: Reads remaining on alignment iterator: HJYFJCCXX160204:4:1101:10003:7778!
    INFO    2022-05-18 10:01:42     SamAlignmentMerger      Read 1000000 records from alignment SAM/BAM.
    INFO    2022-05-18 10:01:48     SamAlignmentMerger      Read 2000000 records from alignment SAM/BAM.
    INFO    2022-05-18 10:01:53     SamAlignmentMerger      Read 3000000 records from alignment SAM/BAM.
    INFO    2022-05-18 10:01:58     SamAlignmentMerger      Finished reading 3752069 total records from alignment SAM/BAM.
    WARNING 2022-05-18 10:01:58     SortingCollection       There is not enough memory per file for buffering. Reading will be unbuffered.
    [Wed May 18 10:01:59 TRT 2022] picard.sam.MergeBamAlignment done. Elapsed time: 0.42 minutes.
    Runtime.totalMemory()=2547449856
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: HJYFJCCXX160204:4:1101:10003:7778!
            at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:572)
            at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
            at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:368)
            at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
            at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
            at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

    How to solve the problem?

    Thank you

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Sinem Selvi,

    We have to do a bit of troubleshooting to determine why you are having this issue with your MergeBamAlignment input file and find the true cause. Can you run ValidateSamFile on your input bam to MergeBamAlignment and post the output here?

    Thank you,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Sinem Selvi

    Hello,
    Firstly, I got the aligned bam file from here:

    gs://fc-ccb0e231-7c8d-4a45-82a3-0f3584a216c0/40c2c7c9-3d17-40b0-96e6-4b43d2b56250/MitochondriaPipeline/56b9e8ed-898c-4747-b8e5-992c3afe71fa/call-AlignToMt/AlignAndMarkDuplicates.AlignmentPipeline/990d083d-5e16-403c-b55d-8d176c69e72d/call-AlignAndMarkDuplicates/NA12878.realigned.bam

    If it is the wrong bam file, can you suggest the aligned bam file for the pipeline?

    The summary output of aligned and unmapped bam files of MergeBamAlignment is below.

    14:10:24.095 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Mon May 23 14:10:24 TRT 2022] ValidateSamFile --INPUT /gpfs/shared/WES_analyses/Development/mito-gatk/NA12878.realigned.bam --MODE SUMMARY --REFERENCE_SEQUENCE /gpfs/shared/WES_analyses/tools/gsutil/mito/Homo_sapiens_assembly38.chrM.fasta --MAX_OUTPUT 100 --IGNORE_WARNINGS false --VALIDATE_INDEX true --INDEX_VALIDATION_STRINGENCY EXHAUSTIVE --IS_BISULFITE_SEQUENCED false --MAX_OPEN_TEMP_FILES 8000 --SKIP_MATE_VALIDATION false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Mon May 23 14:10:24 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    WARNING: BAM index file /gpfs/shared/WES_analyses/Development/mito-gatk/NA12878.realigned.bai is older than BAM /gpfs/shared/WES_analyses/Development/mito-gatk/NA12878.realigned.bam


    ## HISTOGRAM    java.lang.String
    Error Type      Count
    WARNING:MISSING_TAG_NM  3707265

    [Mon May 23 14:10:48 TRT 2022] picard.sam.ValidateSamFile done. Elapsed time: 0.41 minutes.
    Runtime.totalMemory()=139329536
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    14:10:49.525 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Mon May 23 14:10:49 TRT 2022] ValidateSamFile --INPUT /gpfs/shared/WES_analyses/hg38/reverted_sorted.bam --MODE SUMMARY --REFERENCE_SEQUENCE /gpfs/shared/WES_analyses/tools/gsutil/mito/Homo_sapiens_assembly38.chrM.fasta --MAX_OUTPUT 100 --IGNORE_WARNINGS false --VALIDATE_INDEX true --INDEX_VALIDATION_STRINGENCY EXHAUSTIVE --IS_BISULFITE_SEQUENCED false --MAX_OPEN_TEMP_FILES 8000 --SKIP_MATE_VALIDATION false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Mon May 23 14:10:49 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    No errors found
    [Mon May 23 14:10:50 TRT 2022] picard.sam.ValidateSamFile done. Elapsed time: 0.02 minutes.

    Thank you

    0
    Comment actions Permalink
  • Avatar
    Megan Shand

    Hello,

    I have a couple of questions about this input data. Are you using gs://fc-ccb0e231-7c8d-4a45-82a3-0f3584a216c0/40c2c7c9-3d17-40b0-96e6-4b43d2b56250/MitochondriaPipeline/56b9e8ed-898c-4747-b8e5-992c3afe71fa/call-AlignToMt/AlignAndMarkDuplicates.AlignmentPipeline/990d083d-5e16-403c-b55d-8d176c69e72d/call-AlignAndMarkDuplicates/NA12878.realigned.bam as the input file? I believe that file has already been aligned, so I don't think I understand why you need to run MergeBamAlignment on it again. Can you please clarify that that is the input file you're using? Which unaligned bam are you trying to merge with this aligned bam?

    Also, from looking at your original post the issue that you first ran into might be that this PrintReads command doesn't print the mates of all the reads on chrM:

    $GATK --java-options "-Xmx"$memory"G -XX:ParallelGCThreads=$thread" PrintReads -I $SAMPLE_DIR/BQSR_output/recal_reads.bam -L /gpfs/shared/WES_analyses/bedfiles/mito.bed -O $SAMPLE_DIR/chrM.bam

    This will output only the reads that align to the region defined in /gpfs/shared/WES_analyses/bedfiles/mito.bed but not their mates which results in an invalid output file. You probably want to include these two arguments --read-filter MateOnSameContigOrNoMappedMateReadFilter and --read-filter MateUnmappedAndUnmappedReadFilter. These two read filters ensure that you only keep reads whose mates are also aligned to chrM. In fact you'll also want to be sure that your interval list (mito.bed) includes the whole chrM so that you don't end up keeping one read whose mate is outside of those bounds. This way your output will only be over chrM but all the reads will have their mate in the same file.

    I hope this is helpful, but please let me know if I'm misinterpreting your pipeline. 

    0
    Comment actions Permalink
  • Avatar
    Sinem Selvi

    Hello,
    Thank you for helping. I'm working on a patient bam file and the pipeline starts with it. I understood that I needed a reference bam file in MergeAlignmentBam part out of my original patient data. Here there was a recommended reference file (https://gatk.broadinstitute.org/hc/en-us/community/posts/4416630574747/comments/4418254867355). That's why I took it that way. If I understood wrong please clarify it for me. The unaligned bam file is the patient reverted bam file. Should I use patient aligned bam file for MergeBamAlignment which is output of PrintReads chrM.bam? 

     

    I will run the pipeline with your recommended arguments for the mates.  The interval list as below:

    mito.bed

    chrM    1       16569

     

    Thank you for you time

     

    0
    Comment actions Permalink
  • Avatar
    Megan Shand

    That interval list looks good! So hopefully those arguments to PrintReads will at least ensure that the bam you're using is valid (no missing mates). 

    So part of the mitochondria pipeline is to realign to the shifted reference (which is referenced in your linked comment). This works by taking your overall patient sample bam, runnging PrintReads with those args mentioned above to only have reads that are aligned to chrM. Then we unalign those reads and realign them to the shifted reference to make sure we don't miss any sensitivity at the breakpoint in the reference.

    So after unaligning (the RevertSam step in the WDL) we run BWA followed by MergeBamAlignment using the reverted BAM and the newly realigned BAM. All this does is move the metadata from your BAM back into the aligned BAM (which was removed because we transfer the files to FASTQ before aligning with BWA). 

    Does this help clarify? 

    0
    Comment actions Permalink
  • Avatar
    Sinem Selvi

    Hello,
    Thank you to explain the pipeline to me! I successfully obtained the vcf file. I want to be sure everything is correct. The log of the pipeline is below. 

    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx160G -XX:ParallelGCThreads=44 -jar /gpfs/shared/WES_analyses/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar PrintReads -I /gpfs/shared/WES_analyses/hg38/BQSR_output/recal_reads.bam -L /gpfs/shared/WES_analyses/bedfiles/mito.bed -ip 500 -O /gpfs/shared/WES_analyses/hg38/Mitochondria/chrM.bam --read-filter MateOnSameContigOrNoMappedMateReadFilter -read-filter MateUnmappedAndUnmappedReadFilter
    10:45:41.504 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jun 01, 2022 10:45:41 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    10:45:41.787 INFO  PrintReads - ------------------------------------------------------------
    10:45:41.788 INFO  PrintReads - The Genome Analysis Toolkit (GATK) v4.1.9.0
    10:45:41.788 INFO  PrintReads - For support and documentation go to https://software.broadinstitute.org/gatk/
    10:45:41.788 INFO  PrintReads - Executing as sselvi@server001.mesten.local on Linux v3.10.0-1062.12.1.el7.x86_64 amd64
    10:45:41.788 INFO  PrintReads - Java runtime: IBM J9 VM v8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30)
    10:45:41.788 INFO  PrintReads - Start Date/Time: June 1, 2022 10:45:41 AM TRT
    10:45:41.788 INFO  PrintReads - ------------------------------------------------------------
    10:45:41.788 INFO  PrintReads - ------------------------------------------------------------
    10:45:41.789 INFO  PrintReads - HTSJDK Version: 2.23.0
    10:45:41.789 INFO  PrintReads - Picard Version: 2.23.3
    10:45:41.789 INFO  PrintReads - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    10:45:41.789 INFO  PrintReads - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    10:45:41.789 INFO  PrintReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    10:45:41.789 INFO  PrintReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    10:45:41.789 INFO  PrintReads - Deflater: IntelDeflater
    10:45:41.789 INFO  PrintReads - Inflater: IntelInflater
    10:45:41.789 INFO  PrintReads - GCS max retries/reopens: 20
    10:45:41.789 INFO  PrintReads - Requester pays: disabled
    10:45:41.789 INFO  PrintReads - Initializing engine
    10:45:42.380 INFO  FeatureManager - Using codec BEDCodec to read file file:///gpfs/shared/WES_analyses/bedfiles/mito.bed
    10:45:42.386 INFO  IntervalArgumentCollection - Processing 16569 bp from intervals
    10:45:42.387 INFO  PrintReads - Done initializing engine
    10:45:42.503 INFO  ProgressMeter - Starting traversal
    10:45:42.504 INFO  ProgressMeter -        Current Locus  Elapsed Minutes       Reads Processed     Reads/Minute
    10:45:43.330 INFO  PrintReads - 0 read(s) filtered by: WellformedReadFilter
    427 read(s) filtered by: MateOnSameContigOrNoMappedMateReadFilter
    70 read(s) filtered by: MateUnmappedAndUnmappedReadFilter
    497 total reads filtered
    10:45:43.331 INFO  ProgressMeter -           chrM:16211              0.0                 48975        3557506.1
    10:45:43.331 INFO  ProgressMeter - Traversal complete. Processed 48975 total reads in 0.0 minutes.
    10:45:43.610 INFO  PrintReads - Shutting down engine
    [June 1, 2022 10:45:43 AM TRT] org.broadinstitute.hellbender.tools.PrintReads done. Elapsed time: 0.04 minutes.
    Runtime.totalMemory()=35454976
    Using GATK jar /gpfs/shared/WES_analyses/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx160G -XX:ParallelGCThreads=44 -jar /gpfs/shared/WES_analyses/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar RevertSam -I /gpfs/shared/WES_analyses/hg38/Mitochondria/chrM.bam -O /gpfs/shared/WES_analyses/hg38/Mitochondria/reverted.bam --SANITIZE true
    10:45:46.475 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Jun 01 10:45:46 TRT 2022] RevertSam --INPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/chrM.bam --OUTPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/reverted.bam --SANITIZE true --OUTPUT_BY_READGROUP false --RESTORE_HARDCLIPS true --OUTPUT_BY_READGROUP_FILE_FORMAT dynamic --SORT_ORDER queryname --RESTORE_ORIGINAL_QUALITIES true --REMOVE_DUPLICATE_INFORMATION true --REMOVE_ALIGNMENT_INFORMATION true --ATTRIBUTE_TO_CLEAR NM --ATTRIBUTE_TO_CLEAR UQ --ATTRIBUTE_TO_CLEAR PG --ATTRIBUTE_TO_CLEAR MD --ATTRIBUTE_TO_CLEAR MQ --ATTRIBUTE_TO_CLEAR SA --ATTRIBUTE_TO_CLEAR MC --ATTRIBUTE_TO_CLEAR AS --MAX_DISCARD_FRACTION 0.01 --KEEP_FIRST_DUPLICATE false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    Jun 01, 2022 10:45:46 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    [Wed Jun 01 10:45:46 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.9.0
    INFO    2022-06-01 10:45:47     RevertSam       Detected quality format for ID: Standard
    INFO    2022-06-01 10:45:48     RevertSam       Discarded 0 out of 48808 (0.000%) reads in order to sanitize output.
    [Wed Jun 01 10:45:48 TRT 2022] picard.sam.RevertSam done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=108920832
    Tool returned:
    0
    10:45:49.620 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Jun 01 10:45:49 TRT 2022] SortSam --INPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/reverted.bam --OUTPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/reverted_sorted.bam --SORT_ORDER queryname --REFERENCE_SEQUENCE /gpfs/shared/WES_analyses/tools/gsutil/mito/Homo_sapiens_assembly38.chrM.fasta --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed Jun 01 10:45:49 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    INFO    2022-06-01 10:45:50     SortSam Finished reading inputs, merging and writing to output now.
    [Wed Jun 01 10:45:51 TRT 2022] picard.sam.SortSam done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=81920000
    10:45:52.345 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Jun 01 10:45:52 TRT 2022] SortSam --INPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/chrM.bam --OUTPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/chrM_sorted.bam --SORT_ORDER queryname --REFERENCE_SEQUENCE /gpfs/shared/WES_analyses/tools/gsutil/mito/Homo_sapiens_assembly38.chrM.fasta --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed Jun 01 10:45:52 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    INFO    2022-06-01 10:45:53     SortSam Finished reading inputs, merging and writing to output now.
    INFO    2022-06-01 10:45:53     SortSam Seen many non-increasing record positions. Printing Read-names as well.
    [Wed Jun 01 10:45:54 TRT 2022] picard.sam.SortSam done. Elapsed time: 0.04 minutes.
    Runtime.totalMemory()=142802944
    10:45:55.577 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Jun 01 10:45:55 TRT 2022] SamToFastq --INPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/reverted_sorted.bam --FASTQ /gpfs/shared/WES_analyses/hg38/Mitochondria/reverted.fastq --INTERLEAVE true --INCLUDE_NON_PF_READS true --OUTPUT_PER_RG false --COMPRESS_OUTPUTS_PER_RG false --RG_TAG PU --RE_REVERSE true --CLIPPING_MIN_LENGTH 0 --READ1_TRIM 0 --READ2_TRIM 0 --INCLUDE_NON_PRIMARY_ALIGNMENTS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed Jun 01 10:45:55 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    [Wed Jun 01 10:45:56 TRT 2022] picard.sam.SamToFastq done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=11599872
    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [M::process] read 48808 sequences (6581631 bp)...
    [M::process] 0 single-end sequences; 48808 paired-end sequences
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (2, 24284, 0, 3)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] analyzing insert size distribution for orientation FR...
    [M::mem_pestat] (25, 50, 75) percentile: (123, 164, 220)
    [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 414)
    [M::mem_pestat] mean and std.dev: (176.13, 69.60)
    [M::mem_pestat] low and high boundaries for proper pairs: (1, 511)
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] skip orientation RR as there are not enough pairs
    [M::mem_process_seqs] Processed 48808 reads in 1.622 CPU sec, 0.151 real sec
    [main] Version: 0.7.15-r1140
    [main] CMD: /gpfs/shared/WES_analyses/tools/bwa-0.7.15/bwa mem -t 11 -K 100000000 -p -v 3 -Y /gpfs/shared/WES_analyses/tools/gsutil/mito/Homo_sapiens_assembly38.chrM.fasta /gpfs/shared/WES_analyses/hg38/Mitochondria/reverted.fastq
    [main] Real time: 0.233 sec; CPU: 1.697 sec
    10:45:57.839 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Jun 01 10:45:58 TRT 2022] MergeBamAlignment --UNMAPPED_BAM /gpfs/shared/WES_analyses/hg38/Mitochondria/reverted_sorted.bam --ALIGNED_BAM /gpfs/shared/WES_analyses/hg38/Mitochondria/algined.bam --OUTPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments.bam --SORT_ORDER queryname --REFERENCE_SEQUENCE /gpfs/shared/WES_analyses/tools/gsutil/mito/Homo_sapiens_assembly38.chrM.fasta --ADD_PG_TAG_TO_READS true --PAIRED_RUN true --CLIP_ADAPTERS true --IS_BISULFITE_SEQUENCE false --ALIGNED_READS_ONLY false --MAX_INSERTIONS_OR_DELETIONS 1 --ATTRIBUTES_TO_REVERSE OQ --ATTRIBUTES_TO_REVERSE U2 --ATTRIBUTES_TO_REVERSE_COMPLEMENT E2 --ATTRIBUTES_TO_REVERSE_COMPLEMENT SQ --READ1_TRIM 0 --READ2_TRIM 0 --ALIGNER_PROPER_PAIR_FLAGS false --PRIMARY_ALIGNMENT_STRATEGY BestMapq --CLIP_OVERLAPPING_READS true --HARD_CLIP_OVERLAPPING_READS false --INCLUDE_SECONDARY_ALIGNMENTS true --ADD_MATE_CIGAR true --UNMAP_CONTAMINANT_READS false --MIN_UNCLIPPED_BASES 32 --MATCHING_DICTIONARY_TAGS M5 --MATCHING_DICTIONARY_TAGS LN --UNMAPPED_READ_STRATEGY DO_NOT_CHANGE --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed Jun 01 10:45:58 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    INFO    2022-06-01 10:45:58     SamAlignmentMerger      Processing SAM file(s): [/gpfs/shared/WES_analyses/hg38/Mitochondria/algined.bam]
    INFO    2022-06-01 10:45:58     AbstractAlignmentMerger Seen many non-increasing record positions. Printing Read-names as well.
    INFO    2022-06-01 10:46:00     AbstractAlignmentMerger Wrote 49015 alignment records and 1 unmapped reads.
    [Wed Jun 01 10:46:00 TRT 2022] picard.sam.MergeBamAlignment done. Elapsed time: 0.04 minutes.
    Runtime.totalMemory()=13893632
    10:46:01.593 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Jun 01 10:46:01 TRT 2022] SortSam --INPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments.bam --OUTPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments_sorted.bam --SORT_ORDER coordinate --REFERENCE_SEQUENCE /gpfs/shared/WES_analyses/tools/gsutil/mito/Homo_sapiens_assembly38.chrM.fasta --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed Jun 01 10:46:01 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    INFO    2022-06-01 10:46:02     SortSam Seen many non-increasing record positions. Printing Read-names as well.
    INFO    2022-06-01 10:46:02     SortSam Finished reading inputs, merging and writing to output now.
    [Wed Jun 01 10:46:03 TRT 2022] picard.sam.SortSam done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=152502272
    10:46:04.460 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Jun 01 10:46:04 TRT 2022] BuildBamIndex --INPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments_sorted.bam --OUTPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments_sorted.bai --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed Jun 01 10:46:04 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    INFO    2022-06-01 10:46:05     BuildBamIndex   Successfully wrote bam index file /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments_sorted.bai
    [Wed Jun 01 10:46:05 TRT 2022] picard.sam.BuildBamIndex done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=13434880
    10:46:06.281 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Jun 01 10:46:06 TRT 2022] MarkDuplicates --INPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments_sorted.bam --OUTPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments_md.bam --METRICS_FILE /gpfs/shared/WES_analyses/hg38/Mitochondria/md_metrics.txt --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --OPTICAL_DUPLICATE_PIXEL_DISTANCE 100 --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed Jun 01 10:46:06 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    INFO    2022-06-01 10:46:06     MarkDuplicates  Start of doWork freeMemory: 3397304; totalMemory: 10092544; maxMemory: 171798691840
    INFO    2022-06-01 10:46:06     MarkDuplicates  Reading input file and constructing read end information.
    INFO    2022-06-01 10:46:06     MarkDuplicates  Will retain up to 622459028 data points before spilling to disk.
    INFO    2022-06-01 10:46:17     MarkDuplicates  Read 49016 records. 0 pairs never matched.
    INFO    2022-06-01 10:46:17     MarkDuplicates  After buildSortedReadEndLists freeMemory: 4332507216; totalMemory: 14313717760; maxMemory: 171798691840
    INFO    2022-06-01 10:46:17     MarkDuplicates  Will retain up to 2147483642 duplicate indices before spilling to disk.
    INFO    2022-06-01 10:46:26     MarkDuplicates  Traversing read pair information and detecting duplicates.
    INFO    2022-06-01 10:46:26     MarkDuplicates  Traversing fragment information and detecting duplicates.
    INFO    2022-06-01 10:46:26     MarkDuplicates  Sorting list of duplicate records.
    INFO    2022-06-01 10:46:26     MarkDuplicates  After generateDuplicateIndexes freeMemory: 21677040904; totalMemory: 38873202688; maxMemory: 171798691840
    INFO    2022-06-01 10:46:26     MarkDuplicates  Marking 3923 records as duplicates.
    INFO    2022-06-01 10:46:26     MarkDuplicates  Found 128 optical duplicate clusters.
    INFO    2022-06-01 10:46:26     MarkDuplicates  Reads are assumed to be ordered by: coordinate
    INFO    2022-06-01 10:46:27     MarkDuplicates  Writing complete. Closing input iterator.
    INFO    2022-06-01 10:46:27     MarkDuplicates  Duplicate Index cleanup.
    INFO    2022-06-01 10:46:27     MarkDuplicates  Getting Memory Stats.
    INFO    2022-06-01 10:46:27     MarkDuplicates  Before output close freeMemory: 38856731360; totalMemory: 38873202688; maxMemory: 171798691840
    INFO    2022-06-01 10:46:27     MarkDuplicates  Closed outputs. Getting more Memory Stats.
    INFO    2022-06-01 10:46:27     MarkDuplicates  After output close freeMemory: 36917999192; totalMemory: 36934123520; maxMemory: 171798691840
    [Wed Jun 01 10:46:27 TRT 2022] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.35 minutes.
    Runtime.totalMemory()=36934123520
    10:46:28.826 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Jun 01 10:46:29 TRT 2022] CollectWgsMetrics --INPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments_sorted.bam --OUTPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/cm_metrics.txt --REFERENCE_SEQUENCE /gpfs/shared/WES_analyses/tools/gsutil/mito/Homo_sapiens_assembly38.chrM.fasta --MINIMUM_MAPPING_QUALITY 20 --MINIMUM_BASE_QUALITY 20 --COVERAGE_CAP 250 --LOCUS_ACCUMULATION_CAP 100000 --STOP_AFTER -1 --INCLUDE_BQ_HISTOGRAM false --COUNT_UNPAIRED false --SAMPLE_SIZE 10000 --ALLELE_FRACTION 0.001 --ALLELE_FRACTION 0.005 --ALLELE_FRACTION 0.01 --ALLELE_FRACTION 0.02 --ALLELE_FRACTION 0.05 --ALLELE_FRACTION 0.1 --ALLELE_FRACTION 0.2 --ALLELE_FRACTION 0.3 --ALLELE_FRACTION 0.5 --USE_FAST_ALGORITHM false --READ_LENGTH 150 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed Jun 01 10:46:29 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    INFO    2022-06-01 10:46:30     TheoreticalSensitivity  Creating Roulette Wheel
    INFO    2022-06-01 10:46:30     TheoreticalSensitivity  Calculating quality sums from quality sampler
    INFO    2022-06-01 10:46:30     TheoreticalSensitivity  0 sampling iterations completed
    INFO    2022-06-01 10:46:31     TheoreticalSensitivity  1000 sampling iterations completed
    INFO    2022-06-01 10:46:31     TheoreticalSensitivity  2000 sampling iterations completed
    INFO    2022-06-01 10:46:32     TheoreticalSensitivity  3000 sampling iterations completed
    INFO    2022-06-01 10:46:32     TheoreticalSensitivity  4000 sampling iterations completed
    INFO    2022-06-01 10:46:32     TheoreticalSensitivity  5000 sampling iterations completed
    INFO    2022-06-01 10:46:33     TheoreticalSensitivity  6000 sampling iterations completed
    INFO    2022-06-01 10:46:33     TheoreticalSensitivity  7000 sampling iterations completed
    INFO    2022-06-01 10:46:33     TheoreticalSensitivity  8000 sampling iterations completed
    INFO    2022-06-01 10:46:34     TheoreticalSensitivity  9000 sampling iterations completed
    INFO    2022-06-01 10:46:34     TheoreticalSensitivity  Calculating theoretical het sensitivity
    [Wed Jun 01 10:46:35 TRT 2022] picard.analysis.CollectWgsMetrics done. Elapsed time: 0.11 minutes.
    Runtime.totalMemory()=204996608
    10:46:36.367 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/picard-2.27.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Jun 01 10:46:36 TRT 2022] BuildBamIndex --INPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments_md.bam --OUTPUT /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments_md.bai --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed Jun 01 10:46:36 TRT 2022] Executing as sselvi@server001.mesten.local on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; IBM J9 VM 8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30); Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.27.1
    INFO    2022-06-01 10:46:37     BuildBamIndex   Successfully wrote bam index file /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments_md.bai
    [Wed Jun 01 10:46:37 TRT 2022] picard.sam.BuildBamIndex done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=10944512
    Using GATK jar /gpfs/shared/WES_analyses/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx160G -XX:ParallelGCThreads=44 -jar /gpfs/shared/WES_analyses/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar Mutect2 -I /gpfs/shared/WES_analyses/hg38/Mitochondria/merge_alignments_md.bam -R /gpfs/shared/WES_analyses/tools/gsutil/mito/Homo_sapiens_assembly38.chrM.fasta -L /gpfs/shared/WES_analyses/bedfiles/mito.bed -ip 100 -O /gpfs/shared/WES_analyses/hg38/Mitochondria/mito.vcf --mitochondria-mode --initial-tumor-lod 0 --tumor-lod-to-emit 0 --af-of-alleles-not-in-resource 4e-3 --pruning-lod-threshold -4
    10:46:40.022 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/shared/WES_analyses/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jun 01, 2022 10:46:40 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    10:46:40.313 INFO  Mutect2 - ------------------------------------------------------------
    10:46:40.313 INFO  Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.9.0
    10:46:40.313 INFO  Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
    10:46:40.313 INFO  Mutect2 - Executing as sselvi@server001.mesten.local on Linux v3.10.0-1062.12.1.el7.x86_64 amd64
    10:46:40.313 INFO  Mutect2 - Java runtime: IBM J9 VM v8.0.5.30 - pxa6480sr5fp30-20190207_01(SR5 FP30)
    10:46:40.313 INFO  Mutect2 - Start Date/Time: June 1, 2022 10:46:40 AM TRT
    10:46:40.313 INFO  Mutect2 - ------------------------------------------------------------
    10:46:40.313 INFO  Mutect2 - ------------------------------------------------------------
    10:46:40.314 INFO  Mutect2 - HTSJDK Version: 2.23.0
    10:46:40.314 INFO  Mutect2 - Picard Version: 2.23.3
    10:46:40.314 INFO  Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    10:46:40.314 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    10:46:40.314 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    10:46:40.314 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    10:46:40.314 INFO  Mutect2 - Deflater: IntelDeflater
    10:46:40.314 INFO  Mutect2 - Inflater: IntelInflater
    10:46:40.314 INFO  Mutect2 - GCS max retries/reopens: 20
    10:46:40.314 INFO  Mutect2 - Requester pays: disabled
    10:46:40.314 INFO  Mutect2 - Initializing engine
    10:46:40.735 INFO  FeatureManager - Using codec BEDCodec to read file file:///gpfs/shared/WES_analyses/bedfiles/mito.bed
    10:46:40.741 INFO  IntervalArgumentCollection - Processing 16569 bp from intervals
    10:46:40.745 INFO  Mutect2 - Done initializing engine
    10:46:40.758 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/gpfs/shared/WES_analyses/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
    10:46:40.759 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/gpfs/shared/WES_analyses/tools/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
    10:46:40.788 INFO  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    10:46:40.788 INFO  IntelPairHmm - Available threads: 44
    10:46:40.788 INFO  IntelPairHmm - Requested threads: 4
    10:46:40.788 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
    10:46:40.826 INFO  ProgressMeter - Starting traversal
    10:46:40.826 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
    10:46:51.747 INFO  ProgressMeter -            chrM:6732              0.2                    40            219.8
    10:47:01.799 INFO  ProgressMeter -           chrM:13824              0.3                    80            228.9
    10:47:03.117 INFO  Mutect2 - 2 read(s) filtered by: MappingQualityReadFilter
    0 read(s) filtered by: MappingQualityAvailableReadFilter
    0 read(s) filtered by: MappingQualityNotZeroReadFilter
    0 read(s) filtered by: MappedReadFilter
    0 read(s) filtered by: NotSecondaryAlignmentReadFilter
    3923 read(s) filtered by: NotDuplicateReadFilter
    0 read(s) filtered by: PassesVendorQualityCheckReadFilter
    0 read(s) filtered by: NonChimericOriginalAlignmentReadFilter
    0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
    0 read(s) filtered by: ReadLengthReadFilter
    0 read(s) filtered by: GoodCigarReadFilter
    0 read(s) filtered by: WellformedReadFilter
    3925 total reads filtered
    10:47:03.118 INFO  ProgressMeter -           chrM:15664              0.4                    95            255.7
    10:47:03.118 INFO  ProgressMeter - Traversal complete. Processed 95 total regions in 0.4 minutes.
    10:47:03.129 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.021642380000000003
    10:47:03.129 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 8.420165577
    10:47:03.129 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 3.52 sec
    10:47:03.129 INFO  Mutect2 - Shutting down engine
    [June 1, 2022 10:47:03 AM TRT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.39 minutes.
    Runtime.totalMemory()=133169152
    Tool returned:
    SUCCESS

     

     

     

    0
    Comment actions Permalink
  • Avatar
    Megan Shand

    From glancing through the logs, it looks good! If you have a VCF that looks reasonable, that's a really good start too.

    One point I should make is that we do this whole alignment to the shifted reference to gain sensitivity at the breakpoint, but we also align and variant call on the regular chrM reference as well. We don't want to lose sensitivity at our new shifted breakpoint. I'm not sure if you're running this pipeline twice on the two different references or not, so I wanted to point that out. In the pipeline we call only over the control region on the shifted reference and then call on the rest of the mitochondria on the unshifted reference. Then we combine those two output VCFs.

    0
    Comment actions Permalink
  • Avatar
    Sinem Selvi

    Hi Megan,

    Thank you to inform me. I think I should run the pipeline twice with another reference which is Homo_sapiens_assembly38.chrM.shifted_by_8000_bases.fasta . Is it right?

    Also for both pipelines, interval list should be the same or different considering control regions?

    My current interval list:

    mito.bed

    chrM    1       16569

    There are two different interval lists shared in reference files.

    control_region_shifted.chrM.interval_list
    chrM    8025    9144    +       .


    non_control_region.interval_list
    chrM    576     16024   +       .


    Where to use them?


    Thank you for your time

    0
    Comment actions Permalink
  • Avatar
    Megan Shand

    Yes that's right (using the shifted_by_8000_bases.fasta). You should use those two interval lists when variant calling (one on the shifted version and one on the regular version) so that you only call variants in the control region on the shifted reference and the rest of the mitochondria on the regular reference.

    Also, another note, if you are not looking for variants in the control region, you could skip all of this extra realignment to the shifted reference and just use the regular reference.

    0
    Comment actions Permalink
  • Avatar
    Sinem Selvi

    Thank you!

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk