Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Hg38 RNAseq germline variant calling – BQSR steps

Answered
0

43 comments

  • Avatar
    Genevieve Brandt (she/her)

    Hi manolis,

    Could you fill in your commands with the actual commands you are using? You have many variables so it is challenging to see if something might be going wrong.

    Thank you,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    manolis

    Hi

    gatk-4.1.9.0/gatk BaseRecalibrator \
    -R Homo_sapiens_assembly38.fasta \
    -I "test.SplitNCigar.bam" \
    -O "test.recal_data.csv" \
    -L "hg38_gtf_exon_ncbiRefSeq_fixed_merged.interval_list" \
    -ip 12 \
    --use-original-qualities \
    --known-sites Homo_sapiens_assembly38.dbsnp138.vcf \
    --known-sites Homo_sapiens_assembly38.known_indels.vcf.gz \
    --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz

    gatk-4.1.9.0/gatk ApplyBQSR \
    -R Homo_sapiens_assembly38.fasta \
    -I "test.SplitNCigar.bam" \
    -O "test_RNAseq_bqsr.bam" \
    --bqsr-recal-file "test.recal_data.csv" \
    --add-output-sam-program-record \
    --use-original-qualities

    gatk-4.1.9.0/gatk HaplotypeCaller \
    -R Homo_sapiens_assembly38.fasta \
    -I "test_RNAseq_bqsr.bam" \
    -O "test_confidence_0.vcf.gz" \
    -bamout "test_confidence_0.bam" \
    -L "hg38_gtf_exon_ncbiRefSeq_fixed_merged.interval_list" \
    -ip 12 \
    -dont-use-soft-clipped-bases \
    --standard-min-confidence-threshold-for-calling 0

    If you need more information or the csv table let me know.

    Best

     

     

     

    0
    Comment actions Permalink
  • Avatar
    manolis

    I will try to exclude the low coverage regions, could this help?

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    manolis how large is the test data you are running this on? BQSR needs a lot of data to be successful, around 100M bases per read group. 

    See this document for more details: https://gatk.broadinstitute.org/hc/en-us/articles/360035890531-Base-Quality-Score-Recalibration-BQSR-

    0
    Comment actions Permalink
  • Avatar
    manolis

    starting from the "test.SplitNCigar.bam" file, where the duplicate reads are already marked, I done:

    samtools depth -a -b "hg38_gtf_exon_ncbiRefSeq_fixed_merged.interval_list" "test.SplitNCigar.bam" > "test_BaseCov.bed"

    cat "test_BaseCov.bed" | awk '{sum+=$3} END {print sum}'

    1,109,051,245 total mapped bases

    I have one read group

    0
    Comment actions Permalink
  • Avatar
    manolis

    > Running BaseRecalibrator, log files

    1973854 read(s) filtered by: MappingQualityNotZeroReadFilter
    0 read(s) filtered by: MappingQualityAvailableReadFilter
    0 read(s) filtered by: MappedReadFilter
    0 read(s) filtered by: NotSecondaryAlignmentReadFilter
    61295789 read(s) filtered by: NotDuplicateReadFilter
    0 read(s) filtered by: PassesVendorQualityCheckReadFilter
    0 read(s) filtered by: WellformedReadFilter
    63269643 total reads filtered
    11:23:25.266 INFO ProgressMeter - Traversal complete. Processed 16320344 total reads in 6.9 minutes.
    11:23:25.336 INFO BaseRecalibrator - Calculating quantized quality scores...
    11:23:25.354 INFO BaseRecalibrator - Writing recalibration report...
    11:23:25.551 INFO BaseRecalibrator - ...done!
    11:23:25.551 INFO BaseRecalibrator - BaseRecalibrator was able to recalibrate 16320344 reads
    11:23:25.551 INFO BaseRecalibrator - Shutting down engine

    > "test.recal_data.csv" file

    #:GATKTable:RecalTable0:
    ReadGroup EventType EmpiricalQuality EstimatedQReported Observations Errors
    H72C2DSXY.4.22 M 10.0000 29.8223 1049274284 746315906.00

    #:GATKTable:6:3:%s:%d:%s:%.4f:%d:%.2f:;
    #:GATKTable:RecalTable1:
    ReadGroup QualityScore EventType EmpiricalQuality Observations Errors
    H72C2DSXY.4.22 11 M 2.0000 10221118 7318586.00
    H72C2DSXY.4.22 25 M 6.0000 24938818 17792584.00
    H72C2DSXY.4.22 37 M 18.0000 1014114348 721204736.00

    > Running ApplyBQSR, log files

    11:50:13.756 INFO ApplyBQSR - 0 read(s) filtered by: WellformedReadFilter11:50:13.757 INFO ProgressMeter - Traversal complete. Processed 85746751 total reads in 21.1 minutes.
    11:50:13.865 INFO ApplyBQSR - Shutting down engine

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Thanks for looking into that - could you share the complete error message from HaplotypeCaller when you get no results?

    0
    Comment actions Permalink
  • Avatar
    manolis

    Unfortunately I do not have any clear error :( If you need my bam file I can send you it. Here the HaplotypeCaller caller logs, but from another sample (RNA106):

    sing GATK jar /home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar HaplotypeCaller -R /shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -I RNA106_bqsr.bam -O /home/manolis/GATK4190-v1-RNAseq/sample_RNA106/VCF/RNA106_conf00.vcf.gz -bamout /home/manolis/GATK4190-v1-RNAseq/sample_RNA106/VCF/RNA106_bamout_conf00.bam -L /home/manolis/GATK4190-v1-RNAseq/intervals_gtf/hg38_gtf_exon_ncbiRefSeq_fixed_merged.interval_list -ip 12 -dont-use-soft-clipped-bases --standard-min-confidence-threshold-for-calling 0
    23:00:07.616 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jan 21, 2021 11:00:07 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    23:00:07.806 INFO HaplotypeCaller - ------------------------------------------------------------
    23:00:07.806 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.9.0
    23:00:07.807 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
    23:00:07.807 INFO HaplotypeCaller - Executing as manolis@apollo3 on Linux v4.4.0-137-generic amd64
    23:00:07.807 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
    23:00:07.807 INFO HaplotypeCaller - Start Date/Time: January 21, 2021 11:00:07 PM CET
    23:00:07.807 INFO HaplotypeCaller - ------------------------------------------------------------
    23:00:07.807 INFO HaplotypeCaller - ------------------------------------------------------------
    23:00:07.808 INFO HaplotypeCaller - HTSJDK Version: 2.23.0
    23:00:07.808 INFO HaplotypeCaller - Picard Version: 2.23.3
    23:00:07.808 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    23:00:07.808 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    23:00:07.808 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    23:00:07.808 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    23:00:07.808 INFO HaplotypeCaller - Deflater: IntelDeflater
    23:00:07.808 INFO HaplotypeCaller - Inflater: IntelInflater
    23:00:07.809 INFO HaplotypeCaller - GCS max retries/reopens: 20
    23:00:07.809 INFO HaplotypeCaller - Requester pays: disabled
    23:00:07.809 INFO HaplotypeCaller - Initializing engine
    23:00:08.573 INFO FeatureManager - Using codec IntervalListCodec to read file file:///home/manolis/GATK4190-v1-RNAseq/intervals_gtf/hg38_gtf_exon_ncbiRefSeq_fixed_merged.interval_list
    23:00:10.328 INFO IntervalArgumentCollection - Processing 133758640 bp from intervals
    23:00:10.525 INFO HaplotypeCaller - Done initializing engine
    23:00:10.621 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
    23:00:10.801 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
    23:00:10.804 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
    23:00:10.871 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    23:00:10.871 INFO IntelPairHmm - Available threads: 32
    23:00:10.871 INFO IntelPairHmm - Requested threads: 4
    23:00:10.871 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
    23:00:10.952 INFO ProgressMeter - Starting traversal
    23:00:10.954 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
    23:00:21.055 INFO ProgressMeter - chr1:42697168 0.2 15170 90118.8
    23:00:31.054 INFO ProgressMeter - chr1:157183148 0.3 38750 115671.6
    23:00:41.057 INFO ProgressMeter - chr2:27040699 0.5 64580 128718.1
    23:00:51.057 INFO ProgressMeter - chr2:196199760 0.7 95690 143166.3
    23:01:01.084 INFO ProgressMeter - chr3:49358044 0.8 117770 140960.3
    23:01:11.087 INFO ProgressMeter - chr4:39823602 1.0 150960 150626.1
    23:01:21.093 INFO ProgressMeter - chr5:103119133 1.2 185860 158992.9
    23:01:31.093 INFO ProgressMeter - chr6:46480257 1.3 214760 160790.6
    23:01:41.105 INFO ProgressMeter - chr7:99391166 1.5 251510 167394.3
    23:01:51.105 INFO ProgressMeter - chr8:103912041 1.7 281480 168635.0
    23:02:01.114 INFO ProgressMeter - chr9:136993620 1.8 313340 170666.0
    23:02:11.360 INFO ProgressMeter - chr11:5232975 2.0 343920 171380.2
    23:02:21.360 INFO ProgressMeter - chr11:94635712 2.2 364460 167688.6
    23:02:31.366 INFO ProgressMeter - chr12:57098493 2.3 389020 166233.7
    23:02:41.366 INFO ProgressMeter - chr14:20376092 2.5 419930 167511.9
    23:02:51.366 INFO ProgressMeter - chr15:70616435 2.7 453430 169599.5
    23:03:01.372 INFO ProgressMeter - chr16:71732098 2.8 480380 169131.0
    23:03:11.372 INFO ProgressMeter - chr17:50755973 3.0 504860 167897.7
    23:03:21.410 INFO ProgressMeter - chr19:3053783 3.2 529550 166825.9
    23:03:31.413 INFO ProgressMeter - chr19:45792304 3.3 548340 164126.2
    23:03:41.416 INFO ProgressMeter - chr21:25770492 3.5 573860 163600.9
    23:03:51.508 INFO ProgressMeter - chrX:55015627 3.7 600500 163362.1
    23:03:59.292 INFO HaplotypeCaller - 1267247 read(s) filtered by: MappingQualityReadFilter
    0 read(s) filtered by: MappingQualityAvailableReadFilter
    0 read(s) filtered by: MappedReadFilter
    0 read(s) filtered by: NotSecondaryAlignmentReadFilter
    0 read(s) filtered by: NotDuplicateReadFilter
    0 read(s) filtered by: PassesVendorQualityCheckReadFilter
    0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
    0 read(s) filtered by: GoodCigarReadFilter
    0 read(s) filtered by: WellformedReadFilter
    1267247 total reads filtered
    23:03:59.292 INFO ProgressMeter - chrM:15403 3.8 615166 161646.2
    23:03:59.292 INFO ProgressMeter - Traversal complete. Processed 615166 total regions in 3.8 minutes.
    23:03:59.297 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
    23:03:59.297 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
    23:03:59.297 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
    23:03:59.474 INFO HaplotypeCaller - Shutting down engine
    [January 21, 2021 11:03:59 PM CET] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 3.87 minutes.
    Runtime.totalMemory()=3078094848

    0
    Comment actions Permalink
  • Avatar
    manolis

    Here HaplotypeCaller logs from the same sample (RNA106) but without the run of the BQSR steps

     

    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar HaplotypeCaller -R /shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -I RNA106_RNAseq.bam -O /home/manolis/GATK4190-v1-RNAseq/sample_RNA106/VCF/RNA106_conf00.vcf.gz -bamout /home/manolis/GATK4190-v1-RNAseq/sample_RNA106/VCF/RNA106_bamout_conf00.bam -L /home/manolis/GATK4190-v1-RNAseq/intervals_gtf/hg38_gtf_exon_ncbiRefSeq_fixed_merged.interval_list -ip 12 -dont-use-soft-clipped-bases --standard-min-confidence-threshold-for-calling 0
    23:09:30.447 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jan 21, 2021 11:09:30 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    23:09:30.609 INFO HaplotypeCaller - ------------------------------------------------------------
    23:09:30.609 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.9.0
    23:09:30.609 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
    23:09:30.609 INFO HaplotypeCaller - Executing as manolis@apollo3 on Linux v4.4.0-137-generic amd64
    23:09:30.610 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
    23:09:30.610 INFO HaplotypeCaller - Start Date/Time: January 21, 2021 11:09:30 PM CET
    23:09:30.610 INFO HaplotypeCaller - ------------------------------------------------------------
    23:09:30.610 INFO HaplotypeCaller - ------------------------------------------------------------
    23:09:30.610 INFO HaplotypeCaller - HTSJDK Version: 2.23.0
    23:09:30.610 INFO HaplotypeCaller - Picard Version: 2.23.3
    23:09:30.610 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    23:09:30.611 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    23:09:30.611 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    23:09:30.611 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    23:09:30.611 INFO HaplotypeCaller - Deflater: IntelDeflater
    23:09:30.611 INFO HaplotypeCaller - Inflater: IntelInflater
    23:09:30.611 INFO HaplotypeCaller - GCS max retries/reopens: 20
    23:09:30.611 INFO HaplotypeCaller - Requester pays: disabled
    23:09:30.611 INFO HaplotypeCaller - Initializing engine
    23:09:31.422 INFO FeatureManager - Using codec IntervalListCodec to read file file:///home/manolis/GATK4190-v1-RNAseq/intervals_gtf/hg38_gtf_exon_ncbiRefSeq_fixed_merged.interval_list
    23:09:33.180 INFO IntervalArgumentCollection - Processing 133758640 bp from intervals
    23:09:33.362 INFO HaplotypeCaller - Done initializing engine
    23:09:33.458 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
    23:09:33.610 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
    23:09:33.612 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
    23:09:33.678 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    23:09:33.679 INFO IntelPairHmm - Available threads: 32
    23:09:33.679 INFO IntelPairHmm - Requested threads: 4
    23:09:33.679 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
    23:09:33.753 INFO ProgressMeter - Starting traversal
    23:09:33.753 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
    23:09:34.503 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    23:09:34.507 WARN InbreedingCoeff - InbreedingCoeff will not be calculated; at least 10 samples must have called genotypes
    23:09:43.819 INFO ProgressMeter - chr1:1663789 0.2 1380 8226.5
    23:09:53.823 INFO ProgressMeter - chr1:6887874 0.3 3530 10553.1
    23:10:03.887 INFO ProgressMeter - chr1:11926108 0.5 5380 10712.2
    23:10:13.912 INFO ProgressMeter - chr1:19216688 0.7 7990 11937.5
    23:10:23.949 INFO ProgressMeter - chr1:22510176 0.8 9400 11236.2

    ...

    00:13:31.872 INFO HaplotypeCaller - 1267247 read(s) filtered by: MappingQualityReadFilter
    0 read(s) filtered by: MappingQualityAvailableReadFilter
    0 read(s) filtered by: MappedReadFilter
    0 read(s) filtered by: NotSecondaryAlignmentReadFilter
    0 read(s) filtered by: NotDuplicateReadFilter
    0 read(s) filtered by: PassesVendorQualityCheckReadFilter
    0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
    0 read(s) filtered by: GoodCigarReadFilter
    0 read(s) filtered by: WellformedReadFilter
    1267247 total reads filtered
    00:13:31.873 INFO ProgressMeter - chrM:15021 64.0 698313 10916.5
    00:13:31.873 INFO ProgressMeter - Traversal complete. Processed 698313 total regions in 64.0 minutes.
    00:13:31.925 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 2.4741971720000002
    00:13:31.925 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 1226.7148626740002
    00:13:31.926 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 238.57 sec
    00:13:35.962 INFO HaplotypeCaller - Shutting down engine
    [January 22, 2021 12:13:35 AM CET] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 64.09 minutes.
    Runtime.totalMemory()=12635865088

    0
    Comment actions Permalink
  • Avatar
    manolis

    Could be that HC exclude all reads when I run BQSR, while when I skip BQSR I do not have this problem?

    0
    Comment actions Permalink
  • Avatar
    manolis

    sorry, about "exclusion reads" is not correct, in both cases (with and without BQSR) are excluded 1267247

     
    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi manolis, I wanted to see the complete HaplotypeCaller log from the sample that is having the issue, could you share that?

    0
    Comment actions Permalink
  • Avatar
    manolis

    ok no problem just I have to run again the pipeline starting from zero. I will update you next days.

    From the RNA106 sample I had removed from the bam file the duplicate reads because with them SplitNCigar needs 4 days ... without them just 1 hour. Is there any option in SplitNCigar to not use duplicates reads? considering that HC doesn't use them.

    I have the same issue (BQSR -> fail HC) with all my samples.

    Thanks

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Oh, I see. I will try to look into this as well with the information you have given me. But please also update when you have that log.

    And to confirm, there are no other issues in ValidateSamFile?

    0
    Comment actions Permalink
  • Avatar
    manolis

    Hi Genevieve,

    here all validation steps of the "test" sample:

    > MarkDuplicates: (included in the bam file)

    No errors found

    > SplitNCigar

    ## HISTOGRAM java.lang.String
    Error Type Count
    WARNING:MISSING_TAG_NM 84313811

    > BQSR

    ## HISTOGRAM java.lang.String
    Error Type Count
    WARNING:MISSING_TAG_NM 84313811

    0
    Comment actions Permalink
  • Avatar
    manolis

    Here the HC logs of the "test" sample. The "test" sample is the same as in the first posts just here has the name "RNA105".

    11:08:09.635 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jan 24, 2021 11:08:09 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    11:08:09.799 INFO HaplotypeCaller - ------------------------------------------------------------
    11:08:09.800 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.9.0
    11:08:09.800 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
    11:08:09.800 INFO HaplotypeCaller - Executing as manolis@apollo4 on Linux v4.4.0-142-generic amd64
    11:08:09.800 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
    11:08:09.800 INFO HaplotypeCaller - Start Date/Time: January 24, 2021 11:08:09 AM CET
    11:08:09.800 INFO HaplotypeCaller - ------------------------------------------------------------
    11:08:09.800 INFO HaplotypeCaller - ------------------------------------------------------------
    11:08:09.801 INFO HaplotypeCaller - HTSJDK Version: 2.23.0
    11:08:09.801 INFO HaplotypeCaller - Picard Version: 2.23.3
    11:08:09.801 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    11:08:09.801 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    11:08:09.801 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    11:08:09.801 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    11:08:09.801 INFO HaplotypeCaller - Deflater: IntelDeflater
    11:08:09.801 INFO HaplotypeCaller - Inflater: IntelInflater
    11:08:09.801 INFO HaplotypeCaller - GCS max retries/reopens: 20
    11:08:09.801 INFO HaplotypeCaller - Requester pays: disabled
    11:08:09.801 INFO HaplotypeCaller - Initializing engine
    11:08:10.561 INFO FeatureManager - Using codec IntervalListCodec to read file file:///home/manolis/GATK4190-v1-RNAseq/intervals_gtf/hg38_gtf_exon_ncbiRefSeq_fixed_merged.interval_list
    11:08:12.384 INFO IntervalArgumentCollection - Processing 133758640 bp from intervals
    11:08:12.593 INFO HaplotypeCaller - Done initializing engine
    11:08:12.684 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
    11:08:12.884 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
    11:08:12.899 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
    11:08:12.949 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    11:08:12.949 INFO IntelPairHmm - Available threads: 32
    11:08:12.950 INFO IntelPairHmm - Requested threads: 4
    11:08:12.950 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
    11:08:13.018 INFO ProgressMeter - Starting traversal
    11:08:13.020 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
    11:08:23.022 INFO ProgressMeter - chr1:32394392 0.2 11980 71872.8
    11:08:33.021 INFO ProgressMeter - chr1:150983963 0.3 35250 105744.7
    11:08:43.026 INFO ProgressMeter - chr1:206733746 0.5 50790 101559.7
    11:08:53.206 INFO ProgressMeter - chr2:85697494 0.7 76130 113666.5
    11:09:03.218 INFO ProgressMeter - chr2:241167563 0.8 106700 127535.0
    11:09:13.221 INFO ProgressMeter - chr3:123511128 1.0 129700 129269.1
    11:09:23.223 INFO ProgressMeter - chr4:154544433 1.2 167140 142850.6
    11:09:33.227 INFO ProgressMeter - chr5:177447632 1.3 200200 149762.5
    11:09:43.230 INFO ProgressMeter - chr6:106587667 1.5 222440 147948.1
    11:09:53.231 INFO ProgressMeter - chr7:88202759 1.7 249790 149558.4
    11:10:03.239 INFO ProgressMeter - chr8:73292735 1.8 277540 151084.7
    11:10:13.240 INFO ProgressMeter - chr9:125232584 2.0 308570 154003.9
    11:10:23.240 INFO ProgressMeter - chr10:102814537 2.2 335000 154354.2
    11:10:34.705 INFO ProgressMeter - chr11:5225565 2.4 344380 145836.2
    11:10:52.063 INFO ProgressMeter - chr11:5232985 2.7 344390 129923.4
    11:11:02.081 INFO ProgressMeter - chr11:67611488 2.8 359450 127569.3
    11:11:12.081 INFO ProgressMeter - chr12:51109109 3.0 386260 129428.5
    11:11:22.081 INFO ProgressMeter - chr13:25248740 3.2 407910 129453.5
    11:11:32.110 INFO ProgressMeter - chr14:105489843 3.3 440040 132615.4
    11:12:08.631 INFO ProgressMeter - chr16:180447 3.9 462810 117858.3
    11:12:18.630 INFO ProgressMeter - chr16:89143206 4.1 485290 118551.4
    11:12:28.633 INFO ProgressMeter - chr17:45243106 4.3 503340 118148.9
    11:12:38.655 INFO ProgressMeter - chr18:77014116 4.4 527960 119252.4
    11:12:48.698 INFO ProgressMeter - chr19:18193206 4.6 539200 117354.3
    11:12:58.935 INFO ProgressMeter - chr19:51372495 4.8 553240 116098.8
    11:13:08.967 INFO ProgressMeter - chr21:8398605 4.9 574190 116410.7
    11:13:18.990 INFO ProgressMeter - chrX:1395118 5.1 594580 116595.7
    11:13:29.392 INFO ProgressMeter - chrM:6563 5.3 616770 116970.5
    11:13:33.374 INFO HaplotypeCaller - 35459902 read(s) filtered by: MappingQualityReadFilter
    0 read(s) filtered by: MappingQualityAvailableReadFilter
    0 read(s) filtered by: MappedReadFilter
    0 read(s) filtered by: NotSecondaryAlignmentReadFilter
    28786917 read(s) filtered by: NotDuplicateReadFilter
    0 read(s) filtered by: PassesVendorQualityCheckReadFilter
    3 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
    0 read(s) filtered by: GoodCigarReadFilter
    0 read(s) filtered by: WellformedReadFilter
    64246822 total reads filtered
    11:13:33.374 INFO ProgressMeter - chrM:14804 5.3 616839 115529.5
    11:13:33.374 INFO ProgressMeter - Traversal complete. Processed 616839 total regions in 5.3 minutes.
    11:13:33.386 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
    11:13:33.386 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
    11:13:33.386 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
    11:13:33.676 INFO HaplotypeCaller - Shutting down engine
    [January 24, 2021 11:13:33 AM CET] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 5.40 minutes.
    Runtime.totalMemory()=3138912256
    Using GATK jar /home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/manolis/software/gatk/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar HaplotypeCaller -R /shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -I RNA105_RNAseq_bqsr.bam -O RNA105_conf00.vcf.gz -bamout RNA105_bamout_conf00.bam -L /home/manolis/GATK4190-v1-RNAseq/intervals_gtf/hg38_gtf_exon_ncbiRefSeq_fixed_merged.interval_list -ip 12 -dont-use-soft-clipped-bases --standard-min-confidence-threshold-for-calling 0

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi manolis,

    Thank you for sharing that log. I followed up with our developer team to try to find the issue. 

    They have an answer to your question regarding SplitNCigar not using duplicate reads, you can use -RF NotDuplicateReadFilter.

    Could you share the entire sequence of tools you ran? Something seems wrong because the read you shared went from quality around 35 to base qualities 1 and 2. Which would explain why you are not getting results from HaplotypeCaller.

    Could you submit your BQSR recalibration table and your SplitNCigarReads BAM to a bug report? Follow the instructions here.

    Thanks,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    manolis

    Hi, I will be back to you next days because I need the authorization from my leader group to upload the files.

    Thanks for SplitNCigarReads!

    Best

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    No problem, thank you!

    0
    Comment actions Permalink
  • Avatar
    manolis

    In the middle one more question: In the MergeBamAlignment step you use "--PAIRED_RUN False", in the template WDL pipeline. Why "false"?

    In the star step I use two fastq (I have paired-end, as in your template WDL pipeline):

    STAR --genomeDir STAR2_5 --runThreadN ${threads} --readFilesIn ${fastq1} ${fastq2}

    Also in WES/WGS in the MergeBamAlignment step you report "true"

    Thanks

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    manolis could you clarify which WDL you using?

    0
    Comment actions Permalink
  • Avatar
    manolis

    sorry, here is

    I do not use directly the WDL, I run the codes with bash

    0
    Comment actions Permalink
  • Avatar
    Beri
    0
    Comment actions Permalink
  • Avatar
    manolis

    Thanks Beri !

    Genevieve Brandt (she/her) I uploaded the files. The folder is "manolis" and inside you will find the

    RNAseq_VariantCalling.tar.gz compressed folder.

    > test_recal_data.csv

    > test_starAligned.merged.dedup.SplitNCigar.bam

    > logs folder with the log file of each step

    > pipeline: hg38_RNAseq_germline_variants_v4_ForumGATK.zip

    I ran each gatk step separately using Sun Grid Engine (qsub). "-N xxx" is the name of the job as also of the log files of each step.

    Let me know if you need more informations.

    Best and many thanks for the help!

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi manolis,

    I cannot seem to find what you have uploaded, there are many different files with similar names. Could you please point me to the correct one?

    RNAseq_VariantCalling.tar.gz

    RNAseq_VariantCalling_forum_Manolis.gz

    RNAseq_VariantCalling_forum_Manolis.tar.gz

    hg38_RNAseq_germline_variants_v4_ForumGATK.zip

    Thank you.

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    manolis

    Yes, I had connection problems and for this I uploaded many files :-/ . All compressed .gz folders are the same but 2 of them maybe are the truncated.

    Now, you have to check only the "RNAseq_VariantCalling.tar.gr" folder and the pipeline is the

    "hg38_RNAseq_germline_variants_v4_ForumGATK.zip" file.

    In the "RNAseq_VariantCalling.tar.gr" folder you will find inside:

    > test_recal_data.csv

    > test_starAligned.merged.dedup.SplitNCigar.bam

    > the folder with the log file of each step ran with Sun Grid Engine (qsub). "-N xxx" is the name of the job as also of the log files of each step.

    Many thanks

     

     

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    manolis thanks so much for the clarification! I think I found it. I will update you when we have more information.

    0
    Comment actions Permalink
  • Avatar
    manolis

    Hi Genevieve Brandt (she/her)

    do you have any update? If you need the fastq files or other information let me know.

    Do you think that I have to test my sample with yours hg19 pipeline (using yours db - bqsrDB/gtf/genome) ? Could it help for debug?

    Best

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi manolis, we are still working with your files, I will let you know when I have more information.

    0
    Comment actions Permalink
  • Avatar
    manolis

    Thanks! Just one more question about BQSR+RNAseq data ...

    For example we have two situations:

    A) with some WES design (exons) we have a very good uniformity of the mapping reads across the target regions .... I suppose that we have around 100M mapped bases. Imagine a similar situation with RNAseq data where most of the genes are expressed with similar levels and that mean that could be have a good uniformity across whole target genes/exons. Imagine a design of 35Mb where are covered with good uniformity 34Mb.

    B) in case that some genes are super-over expressed, meen that some target genes/exons will be over covered with mapped reads. In this case few target regions are over covered and many other with very/ultra low coverage ... also in this case we have around 100M bases. Imagine that also in this case the design is of 35Mb but are over-covered 5Mb and very low covered 29Mb ...

    In A and B situations BQSR works in the same good way?

    Many thanks

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk