PathSeqPipelineSpark successfully but no result
GATK version :4.0.5.1 ,install by conda
command used:
gatk --java-options "-Xmx750g" PathSeqPipelineSpark \
--input ${file} \
--filter-bwa-image ${PathSeqDB}/pathseq_host.fa.img \
--kmer-file ${PathSeqDB}/pathseq_host.bfi \
--min-clipped-read-length 60 \
--microbe-fasta ${PathSeqDB}/pathseq_microbe.fa \
--microbe-bwa-image ${PathSeqDB}/pathseq_microbe.fa.img \
--taxonomy-file ${PathSeqDB}/pathseq_taxonomy.db \
--output ${PathSeqOutPath}/${id}.pathseq.complete.bam \
--scores-output ${PathSeqOutPath}/${id}.pathseq.complete.csv \
--is-host-aligned false \
--filter-duplicates false \
--min-score-identity .7
Log:
23/06/08 17:43:36 INFO DAGScheduler: ResultStage 31 (foreach at BwaMemIndexCache.java:84) finished in 0.730 s
23/06/08 17:43:36 INFO DAGScheduler: Job 7 finished: foreach at BwaMemIndexCache.java:84, took 0.744074 s
23/06/08 17:43:36 INFO SparkUI: Stopped Spark web UI at http://192.168.110.181:4040
23/06/08 17:43:36 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/06/08 17:43:37 INFO MemoryStore: MemoryStore cleared
23/06/08 17:43:37 INFO BlockManager: BlockManager stopped
23/06/08 17:43:37 INFO BlockManagerMaster: BlockManagerMaster stopped
23/06/08 17:43:37 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/06/08 17:43:37 INFO SparkContext: Successfully stopped SparkContext
17:43:37.037 INFO PathSeqPipelineSpark - Shutting down engine
[2023年6月8日 下午05时43分37秒] org.broadinstitute.hellbender.tools.spark.pathseq.PathSeqPipelineSpark done. Elapsed time: 68.04 minutes.
Runtime.totalMemory()=90018152448
23/06/08 17:43:37 INFO ShutdownHookManager: Shutdown hook called
23/06/08 17:43:37 INFO ShutdownHookManager: Deleting directory /tmp/jiajiangang/spark-e2642d63-0468-4e18-8bbb-2557f9058fc5
Although it ran successfully and generated three files in the result directory, the composition csv file containing microorganisms is empty, what should I do?
Anything missing here?
Best and Thank you。
-Yaer
-
Although I use Picard check my bam file ,but anything look like was fine。
picard ValidateSamFile \
I=add-NMTag.bam \
MODE=SUMMARYlog:
INFO 2023-06-08 16:20:16 ValidateSamFile
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** ValidateSamFile -I add-NMTag.bam -MODE SUMMARY
**********
16:20:16.944 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/jiajiangang/.conda/envs/gatk4/share/picard-2.18.29-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Jun 08 16:20:16 CST 2023] ValidateSamFile INPUT=add-NMTag.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 SKIP_MATE_VALIDATION=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Jun 08 16:20:16 CST 2023] Executing as jiajiangang@life on Linux 5.4.0-148-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_332-b09; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.29-SNAPSHOT
WARNING 2023-06-08 16:20:16 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
INFO 2023-06-08 16:20:54 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:00:37s. Time for last 10,000,000: 37s. Last read position: 10:7,585,906
INFO 2023-06-08 16:21:30 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:01:13s. Time for last 10,000,000: 36s. Last read position: 11:117,203,091
INFO 2023-06-08 16:22:06 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:01:49s. Time for last 10,000,000: 36s. Last read position: 14:50,476,168
INFO 2023-06-08 16:22:42 SamFileValidator Validated Read 40,000,000 records. Elapsed time: 00:02:25s. Time for last 10,000,000: 36s. Last read position: 17:8,466,985
INFO 2023-06-08 16:23:19 SamFileValidator Validated Read 50,000,000 records. Elapsed time: 00:03:02s. Time for last 10,000,000: 36s. Last read position: 19:45,473,437
INFO 2023-06-08 16:23:55 SamFileValidator Validated Read 60,000,000 records. Elapsed time: 00:03:38s. Time for last 10,000,000: 36s. Last read position: 2:231,460,810
INFO 2023-06-08 16:24:31 SamFileValidator Validated Read 70,000,000 records. Elapsed time: 00:04:14s. Time for last 10,000,000: 36s. Last read position: 21:8,442,553
INFO 2023-06-08 16:25:08 SamFileValidator Validated Read 80,000,000 records. Elapsed time: 00:04:51s. Time for last 10,000,000: 36s. Last read position: 4:6,576,582
INFO 2023-06-08 16:25:44 SamFileValidator Validated Read 90,000,000 records. Elapsed time: 00:05:27s. Time for last 10,000,000: 36s. Last read position: 6:31,816,198
INFO 2023-06-08 16:26:20 SamFileValidator Validated Read 100,000,000 records. Elapsed time: 00:06:03s. Time for last 10,000,000: 36s. Last read position: 7:108,569,128
INFO 2023-06-08 16:26:56 SamFileValidator Validated Read 110,000,000 records. Elapsed time: 00:06:39s. Time for last 10,000,000: 36s. Last read position: MT:2,620
INFO 2023-06-08 16:27:32 SamFileValidator Validated Read 120,000,000 records. Elapsed time: 00:07:15s. Time for last 10,000,000: 35s. Last read position: GL000220.1:109,245
No errors found
[Thu Jun 08 16:27:46 CST 2023] picard.sam.ValidateSamFile done. Elapsed time: 7.49 minutes.
Runtime.totalMemory()=715128832 -
Hi Yaer, was the output bam empty as well?
-
Hi~ the output bam wasn`t empty,but all reads was pair in my input。Could it be my input problem?(T__T)
-
There are a few possibilities that could cause this, for example reads are not meeting default quality settings, or microbes are not present in the reference. Can you re-run PathSeq with the --filter-metrics and --score-metrics options and post them?
-
Sorry, I'm busy these days.
score_metrics.txt look like that:
(gatk4) (xunyin) jiajiangang@life:~/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq$ cat add-NMTag.score_metrics.txt
## htsjdk.samtools.metrics.StringHeader
# PathSeqPipelineSpark --is-host-aligned false --kmer-file /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_host.bfi --min-clipped-read-length 60 --filter-bwa-image /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_host.fa.img --filter-duplicates false --filter-metrics /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.filter_metrics.txt --microbe-bwa-image /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_microbe.fa.img --microbe-fasta /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_microbe.fa --scores-output /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.pathseq.complete.csv --taxonomy-file /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_taxonomy.db --min-score-identity 0.7 --score-metrics /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.score_metrics.txt --output /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.pathseq.complete.bam --input /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/TempData/addgourpreed/add-NMTag.bam --skip-quality-filters false --skip-pre-bwa-repartition false --max-masked-bases 2 --min-base-quality 15 --quality-threshold 15 --dust-mask-quality 2 --dust-window 64 --dust-t 20.0 --host-min-identity 30 --host-kmer-thresh 1 --filter-bwa-seed-length 19 --filter-reads-per-partition 200000 --min-adapter-length 12 --max-adapter-mismatches 1 --microbe-min-seed-length 19 --max-alternate-hits 5000 --bwa-score-threshold 30 --identity-margin 0.02 --divide-by-genome-length false --not-normalized-by-kingdom false --score-reads-per-partition-estimate 200000 --pipeline-reads-per-partition 5000 --readsPerPartitionOutput 1000000 --read-validation-stringency SILENT --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --bam-partition-size 0 --disable-sequence-dictionary-validation false --sharded-output false --num-reducers 0 --spark-master local[*] --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --disable-tool-default-read-filters false
## htsjdk.samtools.metrics.StringHeader
# Started on: 2023年6月18日 下午05时31分08秒
## METRICS CLASS org.broadinstitute.hellbender.tools.spark.pathseq.loggers.PSScoreMetrics
MAPPED_READS UNMAPPED_READS
0 343And the filter_metrics.txt look like that:
cat add-NMTag.filter_metrics.txt
## htsjdk.samtools.metrics.StringHeader
# PathSeqPipelineSpark --is-host-aligned false --kmer-file /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_host.bfi --min-clipped-read-length 60 --filter-bwa-image /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_host.fa.img --filter-duplicates false --filter-metrics /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.filter_metrics.txt --microbe-bwa-image /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_microbe.fa.img --microbe-fasta /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_microbe.fa --scores-output /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.pathseq.complete.csv --taxonomy-file /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Data/Reference/PathseqReference/pathseq_taxonomy.db --min-score-identity 0.7 --score-metrics /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.score_metrics.txt --output /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/Result/PathSeq/add-NMTag.pathseq.complete.bam --input /home/jiajiangang/10T_DISK/RunIntratumoralMicrobiota_by_xunyin/TempData/addgourpreed/add-NMTag.bam --skip-quality-filters false --skip-pre-bwa-repartition false --max-masked-bases 2 --min-base-quality 15 --quality-threshold 15 --dust-mask-quality 2 --dust-window 64 --dust-t 20.0 --host-min-identity 30 --host-kmer-thresh 1 --filter-bwa-seed-length 19 --filter-reads-per-partition 200000 --min-adapter-length 12 --max-adapter-mismatches 1 --microbe-min-seed-length 19 --max-alternate-hits 5000 --bwa-score-threshold 30 --identity-margin 0.02 --divide-by-genome-length false --not-normalized-by-kingdom false --score-reads-per-partition-estimate 200000 --pipeline-reads-per-partition 5000 --readsPerPartitionOutput 1000000 --read-validation-stringency SILENT --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --bam-partition-size 0 --disable-sequence-dictionary-validation false --sharded-output false --num-reducers 0 --spark-master local[*] --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --disable-tool-default-read-filters false
## htsjdk.samtools.metrics.StringHeader
# Started on: 2023年6月18日 下午05时31分08秒
## METRICS CLASS org.broadinstitute.hellbender.tools.spark.pathseq.loggers.PSFilterMetrics
PRIMARY_READS READS_AFTER_PREALIGNED_HOST_FILTER READS_AFTER_QUALITY_AND_COMPLEXITY_FILTER READS_AFTER_HOST_FILTER READS_AFTER_DEDUPLICATION FINAL_PAIRED_READS FINAL_UNPAIRED_READS FINAL_TOTAL_READS LOW_QUALITY_OR_LOW_COMPLEXITY_READS_FILTERED HOST_READS_FILTERED DUPLICATE_READS_FILTERED
65176741 65176741 30944412 343 343 0 343 343 34232329 30944069 0Look like some reads can`t paired that.
Do I need to consider reusing SART to process fq files and add unpaired reads to bam files?
At the same time, thank you for your reply these days.
(˶˚ ᗨ ˚˶)
-Yaer
-
Sorry,a slip of the tongue,it's STAR.(T__T)
-
Actually it looks like all the reads except 343 are identified as host, and none of the 343 match anything in the microbe reference. Have you identified microbes in this sample with any other tools?
-
Thank you for your help~
I re-align the reads with STAR,add some parameter: --outSAMunmapped ,'Within' ,Make it can to output the bam which has unmapped reads,then it works perfectly.
I find that it is usually missing RG tag and NM tag is the reason why it does not work,so use it add @RG
picard AddOrReplaceReadGroups \
-I {input_bam} \
-O {output_bam} \
--RGID sample1 \
--RGLB lib1 \
--RGPL illumina \
--RGPU unit1 \
--RGSM sample1And use samtools add @NM,Ref_path is the reference file path you use when using the comparison software,like STAR
samtools calmd {input_bam} {Ref_path} -b > {output_bam}
Do you have anything to add?
Thank you so much for your assistance! I truly appreciate it.
Best wishes to you!
-Yaer
-
(⑅˃◡˂⑅)
Please sign in to leave a comment.
9 comments