Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

PathSeqPipelineSpark successfully but no result

0

9 comments

  • Avatar
    Yaer

    Although I use Picard check my bam file ,but anything look like was fine。

    picard ValidateSamFile \
    I=add-NMTag.bam \
    MODE=SUMMARY

    log:

    INFO    2023-06-08 16:20:16     ValidateSamFile

    ********** NOTE: Picard's command line syntax is changing.
    **********
    ********** For more information, please see:
    ********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
    **********
    ********** The command line looks like this in the new syntax:
    **********
    **********    ValidateSamFile -I add-NMTag.bam -MODE SUMMARY
    **********


    16:20:16.944 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jiajiangang/.conda/envs/gatk4/share/picard-2.18.29-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Thu Jun 08 16:20:16 CST 2023] ValidateSamFile INPUT=add-NMTag.bam MODE=SUMMARY    MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 SKIP_MATE_VALIDATION=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Thu Jun 08 16:20:16 CST 2023] Executing as jiajiangang@life on Linux 5.4.0-148-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_332-b09; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.29-SNAPSHOT
    WARNING 2023-06-08 16:20:16     ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
    INFO    2023-06-08 16:20:54     SamFileValidator        Validated Read    10,000,000 records.  Elapsed time: 00:00:37s.  Time for last 10,000,000:   37s.  Last read position: 10:7,585,906
    INFO    2023-06-08 16:21:30     SamFileValidator        Validated Read    20,000,000 records.  Elapsed time: 00:01:13s.  Time for last 10,000,000:   36s.  Last read position: 11:117,203,091
    INFO    2023-06-08 16:22:06     SamFileValidator        Validated Read    30,000,000 records.  Elapsed time: 00:01:49s.  Time for last 10,000,000:   36s.  Last read position: 14:50,476,168
    INFO    2023-06-08 16:22:42     SamFileValidator        Validated Read    40,000,000 records.  Elapsed time: 00:02:25s.  Time for last 10,000,000:   36s.  Last read position: 17:8,466,985
    INFO    2023-06-08 16:23:19     SamFileValidator        Validated Read    50,000,000 records.  Elapsed time: 00:03:02s.  Time for last 10,000,000:   36s.  Last read position: 19:45,473,437
    INFO    2023-06-08 16:23:55     SamFileValidator        Validated Read    60,000,000 records.  Elapsed time: 00:03:38s.  Time for last 10,000,000:   36s.  Last read position: 2:231,460,810
    INFO    2023-06-08 16:24:31     SamFileValidator        Validated Read    70,000,000 records.  Elapsed time: 00:04:14s.  Time for last 10,000,000:   36s.  Last read position: 21:8,442,553
    INFO    2023-06-08 16:25:08     SamFileValidator        Validated Read    80,000,000 records.  Elapsed time: 00:04:51s.  Time for last 10,000,000:   36s.  Last read position: 4:6,576,582
    INFO    2023-06-08 16:25:44     SamFileValidator        Validated Read    90,000,000 records.  Elapsed time: 00:05:27s.  Time for last 10,000,000:   36s.  Last read position: 6:31,816,198
    INFO    2023-06-08 16:26:20     SamFileValidator        Validated Read   100,000,000 records.  Elapsed time: 00:06:03s.  Time for last 10,000,000:   36s.  Last read position: 7:108,569,128
    INFO    2023-06-08 16:26:56     SamFileValidator        Validated Read   110,000,000 records.  Elapsed time: 00:06:39s.  Time for last 10,000,000:   36s.  Last read position: MT:2,620
    INFO    2023-06-08 16:27:32     SamFileValidator        Validated Read   120,000,000 records.  Elapsed time: 00:07:15s.  Time for last 10,000,000:   35s.  Last read position: GL000220.1:109,245
    No errors found
    [Thu Jun 08 16:27:46 CST 2023] picard.sam.ValidateSamFile done. Elapsed time: 7.49 minutes.
    Runtime.totalMemory()=715128832

    0
    Comment actions Permalink
  • Avatar
    Mark Walker

    Hi Yaer, was the output bam empty as well?

    0
    Comment actions Permalink
  • Avatar
    Yaer

    Hi~ the output bam wasn`t empty,but all reads was pair in my input。Could it be my input problem?(T__T)

    0
    Comment actions Permalink
  • Avatar
    Mark Walker

    There are a few possibilities that could cause this, for example reads are not meeting default quality settings, or microbes are not present in the reference. Can you re-run PathSeq with the --filter-metrics and --score-metrics options and post them?

    0
    Comment actions Permalink
  • Avatar
    Yaer

    Sorry, I'm busy these days.

    score_metrics.txt look like that:

    (gatk4) (xunyin) jiajiangang@life:~/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq$ cat add-NMTag.score_metrics.txt
    ## htsjdk.samtools.metrics.StringHeader
    # PathSeqPipelineSpark  --is-host-aligned false --kmer-file /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_host.bfi --min-clipped-read-length 60 --filter-bwa-image /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_host.fa.img --filter-duplicates false --filter-metrics /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.filter_metrics.txt --microbe-bwa-image /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_microbe.fa.img --microbe-fasta /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_microbe.fa --scores-output /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.pathseq.complete.csv --taxonomy-file /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_taxonomy.db --min-score-identity 0.7 --score-metrics /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.score_metrics.txt --output /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.pathseq.complete.bam --input /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/TempData/addgourpreed/add-NMTag.bam  --skip-quality-filters false --skip-pre-bwa-repartition false --max-masked-bases 2 --min-base-quality 15 --quality-threshold 15 --dust-mask-quality 2 --dust-window 64 --dust-t 20.0 --host-min-identity 30 --host-kmer-thresh 1 --filter-bwa-seed-length 19 --filter-reads-per-partition 200000 --min-adapter-length 12 --max-adapter-mismatches 1 --microbe-min-seed-length 19 --max-alternate-hits 5000 --bwa-score-threshold 30 --identity-margin 0.02 --divide-by-genome-length false --not-normalized-by-kingdom false --score-reads-per-partition-estimate 200000 --pipeline-reads-per-partition 5000 --readsPerPartitionOutput 1000000 --read-validation-stringency SILENT --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --bam-partition-size 0 --disable-sequence-dictionary-validation false --sharded-output false --num-reducers 0 --spark-master local[*] --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --disable-tool-default-read-filters false
    ## htsjdk.samtools.metrics.StringHeader
    # Started on: 2023年6月18日 下午05时31分08秒
    ## METRICS CLASS        org.broadinstitute.hellbender.tools.spark.pathseq.loggers.PSScoreMetrics
    MAPPED_READS    UNMAPPED_READS
    0       343

    And the filter_metrics.txt look like that:

     cat add-NMTag.filter_metrics.txt
    ## htsjdk.samtools.metrics.StringHeader
    # PathSeqPipelineSpark  --is-host-aligned false --kmer-file /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_host.bfi --min-clipped-read-length 60 --filter-bwa-image /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_host.fa.img --filter-duplicates false --filter-metrics /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.filter_metrics.txt --microbe-bwa-image /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_microbe.fa.img --microbe-fasta /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_microbe.fa --scores-output /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.pathseq.complete.csv --taxonomy-file /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_taxonomy.db --min-score-identity 0.7 --score-metrics /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.score_metrics.txt --output /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.pathseq.complete.bam --input /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/TempData/addgourpreed/add-NMTag.bam  --skip-quality-filters false --skip-pre-bwa-repartition false --max-masked-bases 2 --min-base-quality 15 --quality-threshold 15 --dust-mask-quality 2 --dust-window 64 --dust-t 20.0 --host-min-identity 30 --host-kmer-thresh 1 --filter-bwa-seed-length 19 --filter-reads-per-partition 200000 --min-adapter-length 12 --max-adapter-mismatches 1 --microbe-min-seed-length 19 --max-alternate-hits 5000 --bwa-score-threshold 30 --identity-margin 0.02 --divide-by-genome-length false --not-normalized-by-kingdom false --score-reads-per-partition-estimate 200000 --pipeline-reads-per-partition 5000 --readsPerPartitionOutput 1000000 --read-validation-stringency SILENT --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --bam-partition-size 0 --disable-sequence-dictionary-validation false --sharded-output false --num-reducers 0 --spark-master local[*] --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --disable-tool-default-read-filters false
    ## htsjdk.samtools.metrics.StringHeader
    # Started on: 2023年6月18日 下午05时31分08秒
    ## METRICS CLASS        org.broadinstitute.hellbender.tools.spark.pathseq.loggers.PSFilterMetrics
    PRIMARY_READS   READS_AFTER_PREALIGNED_HOST_FILTER      READS_AFTER_QUALITY_AND_COMPLEXITY_FILTER       READS_AFTER_HOST_FILTER    READS_AFTER_DEDUPLICATION       FINAL_PAIRED_READS      FINAL_UNPAIRED_READS    FINAL_TOTAL_READS LOW_QUALITY_OR_LOW_COMPLEXITY_READS_FILTERED     HOST_READS_FILTERED     DUPLICATE_READS_FILTERED
    65176741        65176741        30944412        343     343     0       343     343     34232329        30944069  0

    Look like some reads can`t paired that.

    Do I need to consider reusing SART to process fq files and add unpaired reads to bam files?

    At the same time, thank you for your reply these days.

    (˶˚ ᗨ ˚˶)

    -Yaer

    0
    Comment actions Permalink
  • Avatar
    Yaer

    Sorry,a slip of the tongue,it's STAR.(T__T)

    0
    Comment actions Permalink
  • Avatar
    Mark Walker

    Actually it looks like all the reads except 343 are identified as host, and none of the 343 match anything in the microbe reference. Have you identified microbes in this sample with any other tools?

    0
    Comment actions Permalink
  • Avatar
    Yaer

    Thank you for your help~

    I re-align the reads with STAR,add some parameter: --outSAMunmapped ,'Within' ,Make it can to output the bam which has unmapped reads,then it works perfectly.

    I find that it is usually missing RG tag and NM tag is the reason why it does not work,so use it add @RG

    picard AddOrReplaceReadGroups \
                                  -I {input_bam} \
                                  -O {output_bam} \
                                    --RGID sample1 \
                                    --RGLB lib1 \
                                    --RGPL illumina \
                                    --RGPU unit1 \
                                    --RGSM sample1

    And use samtools add @NM,Ref_path is the reference file path you use when using the comparison software,like STAR

    samtools calmd {input_bam} {Ref_path} -b > {output_bam}

    Do you have anything to add?

    Thank you so much for your assistance! I truly appreciate it.

    Best wishes to you!

    -Yaer

     

    0
    Comment actions Permalink
  • Avatar
    Yaer

    (⑅˃◡˂⑅)

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk