Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

(How to) Run the PathSeq pipeline


  • Hi, GATK team,

    I recently used the GATK PathSeq pipeline to detect viruses in RNA-seq data. The run seems to have succeeded; the last few lines of the log file are shown below:

    20/03/04 10:03:30 INFO TaskSchedulerImpl: Removed TaskSet 43.0, whose tasks have all completed, from pool
    20/03/04 10:03:30 INFO DAGScheduler: ResultStage 43 (foreach at finished in 2.808 s
    20/03/04 10:03:30 INFO DAGScheduler: Job 9 finished: foreach at, took 2.810214 s
    20/03/04 10:03:30 INFO SparkUI: Stopped Spark web UI at http://bioRUN:4040
    20/03/04 10:03:30 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    20/03/04 10:03:32 INFO MemoryStore: MemoryStore cleared
    20/03/04 10:03:32 INFO BlockManager: BlockManager stopped
    20/03/04 10:03:32 INFO BlockManagerMaster: BlockManagerMaster stopped
    20/03/04 10:03:32 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    20/03/04 10:03:32 INFO SparkContext: Successfully stopped SparkContext
    10:03:32.092 INFO PathSeqPipelineSpark - Shutting down engine
    [March 4, 2020 10:03:32 AM EST] done. Elapsed time: 7.70 minutes.
    20/03/04 10:03:32 INFO ShutdownHookManager: Shutdown hook called
    20/03/04 10:03:32 INFO ShutdownHookManager: Deleting directory /tmp/spark-a39060d4-c445-4785-bcce-02d955a812a5

    However, the output files confused me. I obtained many hits classified as dsDNA_viruses,_no_RNA_stage from RNA-seq data. If these viruses have no RNA stage, why could I detect them in RNA-seq data? Some results are shown below:

    196896 root|Viruses|dsDNA_viruses,_no_RNA_stage|Caudovirales|Myoviridae|unclassified_Myoviridae no_rank unclassified_Myoviridae Viruses 0.17882352941176471 5.697151424287859 2 0 0
    197310 root|Viruses|dsDNA_viruses,_no_RNA_stage|Caudovirales|Myoviridae|Tevenvirinae|T4virus|unclassified_T4virus|Enterobacteria_phage_RB14 species Enterobacteria_phage_RB14 Viruses 0.039607843137254906 1.261869065467267 2 0 165429
    66711 root|Viruses|dsDNA_viruses,_no_RNA_stage|Caudovirales|Myoviridae|Tevenvirinae|T4virus|Escherichia_virus_AR1|Escherichia_phage_AR1 no_rank Escherichia_phage_AR1 Viruses 0.039607843137254906 1.261869065467267 2 0 167435
    329380 root|Viruses|dsDNA_viruses,_no_RNA_stage|Caudovirales|Myoviridae|Tevenvirinae|T4virus|unclassified_T4virus no_rank unclassified_T4virus Viruses 0.43568627450980385 13.88055972013994 2 0 0

    Any help is greatly appreciated. Thank you!
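
For readers trying to interpret rows like the ones quoted above, here is a minimal Python sketch that splits a row into named fields. It assumes the ten whitespace-separated fields follow the column order tax_id, taxonomy, type, name, kingdom, score, score_normalized, reads, unambiguous, reference_length; the post shows no header, so this order is an assumption. The names `ScoreRow`, `parse_row`, and `lacks_unambiguous_support` are hypothetical helpers, not part of GATK.

```python
from dataclasses import dataclass
from typing import Iterable, List

# ASSUMPTION: column order of a PathSeq scores row. The post shows rows
# without a header, so this layout is not confirmed by the thread itself.
COLUMNS = (
    "tax_id", "taxonomy", "type", "name", "kingdom",
    "score", "score_normalized", "reads", "unambiguous", "reference_length",
)


@dataclass(frozen=True)
class ScoreRow:
    tax_id: int
    taxonomy: str
    rank: str  # the assumed "type" column; renamed to avoid shadowing a builtin
    name: str
    kingdom: str
    score: float
    score_normalized: float
    reads: int
    unambiguous: int
    reference_length: int


def parse_row(line: str) -> ScoreRow:
    """Split one row on whitespace and convert each field to its type.

    This works for the rows quoted in the post because their names use
    underscores instead of spaces.
    """
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return ScoreRow(
        tax_id=int(fields[0]),
        taxonomy=fields[1],
        rank=fields[2],
        name=fields[3],
        kingdom=fields[4],
        score=float(fields[5]),
        score_normalized=float(fields[6]),
        reads=int(fields[7]),
        unambiguous=int(fields[8]),
        reference_length=int(fields[9]),
    )


def lacks_unambiguous_support(rows: Iterable[ScoreRow]) -> List[ScoreRow]:
    """Return the rows whose 'unambiguous' field is zero."""
    return [row for row in rows if row.unambiguous == 0]


if __name__ == "__main__":
    # One row copied verbatim from the post.
    line = (
        "197310 root|Viruses|dsDNA_viruses,_no_RNA_stage|Caudovirales|Myoviridae|"
        "Tevenvirinae|T4virus|unclassified_T4virus|Enterobacteria_phage_RB14 "
        "species Enterobacteria_phage_RB14 Viruses 0.039607843137254906 "
        "1.261869065467267 2 0 165429"
    )
    for row in lacks_unambiguous_support([parse_row(line)]):
        print(row.name, row.reads, row.unambiguous)
```

Under this assumed layout, each of the four rows quoted in the post reports 2 reads and 0 unambiguous reads.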


  • Christopher Koch

    Are there pre-built GRCh37 reference files available somewhere? I only see a link to GRCh38 references.

  • zhan li
    "Download tutorial_10913.tar.gz from the ftp site. Extract the archive with the command:"

    Sorry, the FTP link is unavailable. How can I get the data?

  • Joe Li

    The FTP link for the tutorial files is no longer available. Could you please help?
