Mutect2 somatic call error
REQUIRED for all errors and issues:
a) GATK version used: 4.2.5.0
b) Exact command used: java -jar /home/tbiswas/softwares/gatk-4.2.5.0/gatk-package-4.2.5.0-local.jar Mutect2 -R /home/tbiswas/hg38/hg38.fa -I /scratch/tbiswas/IITK-P6-TD_recal.bam -I /scratch/tbiswas/IITK-P6-BD_recal.bam --normal-sample IITK-P6-BD --germline-resource /scratch/tbiswas/largefiles/somatic-hg38_af-only-gnomad.hg38.vcf.gz --f1r2-tar-gz IITK-P6_2_f1r2.tar.gz --read-validation-stringency LENIENT --lenient true --annotation Coverage -O IITK-P6_2_somatic.vcf.gz
c) Entire program log:
[tbiswas@un01 ~]$ java -jar /home/tbiswas/softwares/gatk-4.2.5.0/gatk-package-4.2.5.0-local.jar Mutect2 -R /home/tbiswas/hg38/hg38.fa -I /scratch/tbiswas/IITK-P6-TD_recal.bam -I /scratch/tbiswas/IITK-P6-BD_recal.bam --normal-sample IITK-P6-BD --germline-resource /scratch/tbiswas/largefiles/somatic-hg38_af-only-gnomad.hg38.vcf.gz --f1r2-tar-gz IITK-P6_2_f1r2.tar.gz --read-validation-stringency LENIENT --lenient true --annotation Coverage -O IITK-P6_2_somatic.vcf.gz
13:33:42.215 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/tbiswas/softwares/gatk-4.2.5.0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Nov 07, 2022 1:33:42 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
13:33:46.306 INFO Mutect2 - ------------------------------------------------------------
13:33:46.307 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.2.5.0
13:33:46.307 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
13:33:46.307 INFO Mutect2 - Executing as tbiswas@un01 on Linux v3.10.0-327.el7.x86_64 amd64
13:33:46.307 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_65-b17
13:33:46.308 INFO Mutect2 - Start Date/Time: 7 November, 2022 1:33:42 PM IST
13:33:46.308 INFO Mutect2 - ------------------------------------------------------------
13:33:46.308 INFO Mutect2 - ------------------------------------------------------------
13:33:46.309 INFO Mutect2 - HTSJDK Version: 2.24.1
13:33:46.309 INFO Mutect2 - Picard Version: 2.25.4
13:33:46.309 INFO Mutect2 - Built for Spark Version: 2.4.5
13:33:46.309 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:33:46.309 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:33:46.309 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:33:46.309 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:33:46.310 INFO Mutect2 - Deflater: IntelDeflater
13:33:46.310 INFO Mutect2 - Inflater: IntelInflater
13:33:46.310 INFO Mutect2 - GCS max retries/reopens: 20
13:33:46.310 INFO Mutect2 - Requester pays: disabled
13:33:46.310 INFO Mutect2 - Initializing engine
13:33:49.929 INFO FeatureManager - Using codec VCFCodec to read file file:///scratch/tbiswas/largefiles/somatic-hg38_af-only-gnomad.hg38.vcf.gz
13:33:50.940 INFO Mutect2 - Done initializing engine
13:33:50.967 INFO Mutect2 - Shutting down engine
[7 November, 2022 1:33:50 PM IST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.15 minutes.
Runtime.totalMemory()=1166540800
***********************************************************************
A USER ERROR has occurred: Bad input: Sample IITK-P6-BD is not in BAM header: [normal]
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
[tbiswas@un01 ~]$
Mutect2 is giving the above error. Please let me know how and where to write the header to the BAM file. I have checked the header of the BAM file, which is shown below:
[tbiswas@un01 ~]$ samtools view -H /scratch/tbiswas/IITK-P6-BD_recal.bam
@HD VN:1.6 SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chr10 LN:133797422
@SQ SN:chr11 LN:135086622
@SQ SN:chr12 LN:133275309
@SQ SN:chr13 LN:114364328
@SQ SN:chr14 LN:107043718
@SQ SN:chr15 LN:101991189
@SQ SN:chr16 LN:90338345
@SQ SN:chr17 LN:83257441
@SQ SN:chr18 LN:80373285
@SQ SN:chr19 LN:58617616
@SQ SN:chr20 LN:64444167
@SQ SN:chr21 LN:46709983
@SQ SN:chr22 LN:50818468
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@RG ID:normal PL:ILLUMINA LB:TruSeq SM:normal PI:200
@PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -t 2 -R @RG\tID:normal\tPL:ILLUMINA\tLB:TruSeq\tSM:normal\tPI:200 /home/tbiswas/hg38/hg38.fa /home/tbiswas/IITK-P6/IITK-P6-BD_1.fastq.gz /home/tbiswas/IITK-P6/IITK-P6-BD_2.fastq.gz -f /scratch/tbiswas/IITK-P6-BD.sam
@PG ID:samtools PN:samtools PP:bwa VN:1.10 CL:samtools fixmate -O bam -@ 10 /scratch/tbiswas/IITK-P6-BD.sam /scratch/tbiswas/IITK-P6-BD_fixmate.bam
@PG ID:MarkDuplicates VN:2.27.5 CL:MarkDuplicates INPUT=[/scratch/tbiswas/IITK-P6-BD_fixmate_sorted.bam] OUTPUT=/scratch/tbiswas/IITK-P6-BD_fixmate_sorted_duprm.bam METRICS_FILE=/scratch/tbiswas/IITK-P6-BD_fixmate_sorted_duprminformation.txt REMOVE_DUPLICATES=true TMP_DIR=[/scratch/tbiswas/tmp] VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false FLOW_MODE=false FLOW_QUALITY_SUM_STRATEGY=false USE_END_IN_UNPAIRED_READS=false USE_UNPAIRED_CLIPPED_END=false UNPAIRED_END_UNCERTAINTY=0 FLOW_SKIP_FIRST_N_FLOWS=0 FLOW_Q_IS_KNOWN_END=false FLOW_EFFECTIVE_QUALITY_THRESHOLD=15 ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false PN:MarkDuplicates
@PG ID:GATK ApplyBQSR VN:4.2.5.0 CL:ApplyBQSR --output /scratch/tbiswas/IITK-P6-BD_recal.bam --bqsr-recal-file /scratch/tbiswas/IITK-P6-BD_recal_data.table --input /scratch/tbiswas/IITK-P6-BD_fixmate_sorted_duprm.bam --reference /home/tbiswas/hg38/hg38.fa --preserve-qscores-less-than 6 --use-original-qualities false --quantize-quals 0 --round-down-quantized false --emit-original-quals false --global-qscore-prior -1.0 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false PN:GATK ApplyBQSR
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.10 CL:samtools view -H /scratch/tbiswas/IITK-P6-BD_recal.bam
@PG ID:samtools.2 PN:samtools PP:MarkDuplicates VN:1.10 CL:samtools view -H /scratch/tbiswas/IITK-P6-BD_recal.bam
@PG ID:samtools.3 PN:samtools PP:GATK ApplyBQSR VN:1.10 CL:samtools view -H /scratch/tbiswas/IITK-P6-BD_recal.bam
[tbiswas@un01 ~]$
Please let me know how to solve this.
-
Hi Tanay Biswas,
I think this issue is coming from your read group line for your IITK-P6-BD_recal.bam sample:
@RG ID:normal PL:ILLUMINA LB:TruSeq SM:normal PI:200
The sample name for the -normal argument should correspond to the sample name in the read group. The sample name in your read group is normal, but it should be IITK-P6-BD, to match your command line.
You can fix your read group with the tool AddOrReplaceReadGroups: https://gatk.broadinstitute.org/hc/en-us/articles/360035532352-Errors-about-read-group-RG-information
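To see which sample names Mutect2 will find in a BAM, you can pull the SM values out of the @RG lines of the header. Below is a minimal shell sketch of that check. The function names rg_samples and has_sample are made up for illustration, and it assumes the header is tab-separated, as samtools prints it:

```shell
# List the unique SM values found in the @RG lines of a SAM header read from stdin.
rg_samples() {
    awk -F'\t' '/^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print substr($i, 4) }' | sort -u
}

# Succeed only if the given sample name appears as an SM value in the header on stdin.
has_sample() {
    rg_samples | grep -Fxq -- "$1"
}
```

For example, `samtools view -H /scratch/tbiswas/IITK-P6-BD_recal.bam | rg_samples` lists the sample names in your normal BAM, and piping the same header into `has_sample IITK-P6-BD` returns a non-zero exit status until the read group has been fixed.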
Let me know if you have any other questions!
Best,
Genevieve
-
Hi Genevieve,
Thanks. I have checked the link, but I don't know what the Read-Group platform unit is. Please let me know what should be specified for the --RGPU,-PU <String> Read-Group platform unit (eg. run barcode) option when running Picard. Also, -SM should be IITK-P6_BD, right?
Thanks.
Regards,
Tanay
-
Hi Tanay,
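Yes, -SM should be set to the sample name, but note that it needs to be IITK-P6-BD (with a hyphen) so that it exactly matches the name you pass to --normal-sample. Here is a description of what should be in the Platform Unit, from the read groups article: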
PU = Platform Unit
The PU holds three types of information, the {FLOWCELL_BARCODE}.{LANE}.{SAMPLE_BARCODE}. The {FLOWCELL_BARCODE} refers to the unique identifier for a particular flow cell. The {LANE} indicates the lane of the flow cell and the {SAMPLE_BARCODE} is a sample/library-specific identifier. Although the PU is not required by GATK, it takes precedence over ID for base recalibration if it is present. In the example shown earlier, two read group fields, ID and PU, appropriately differentiate flow cell lane, marked by .2, a factor that contributes to batch effects.
Let me know if you have any further questions!
Best,
Genevieve
-
Hi Genevieve,
I have understood the description, but I am not able to find anything related to PU for my samples. Can you suggest where I should get this information?
Thank you.
Regards,
Tanay
-
Hi Tanay,
Usually you would get the PU information from wherever your sample was sequenced. However, if you do not have the information, PU is not required by GATK so you do not need to include it.
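If your FASTQ files still carry the original Illumina read names, the flow cell and lane can often be read from the first read name. The sketch below assumes the common `@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y>` layout. Renamed reads or reads from other platforms will not follow it. The function name pu_from_read_name and the read name in the comment are hypothetical:

```shell
# Build a {FLOWCELL_BARCODE}.{LANE} platform unit from an Illumina-style read name
# on stdin, e.g. "@A00123:45:HXXXXXXXX:2:1101:1000:2000 1:N:0:ATCACG" (made-up name).
# Assumes colon-separated fields with the flow cell in field 3 and the lane in field 4.
pu_from_read_name() {
    awk -F: 'NR == 1 { print $3 "." $4 }'
}
```

You could feed it the first line of a FASTQ, for example `zcat /home/tbiswas/IITK-P6/IITK-P6-BD_1.fastq.gz | head -n 1 | pu_from_read_name`, and append the sample barcode by hand if you know it.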
Best,
Genevieve
-
Hi Genevieve,
Picard gave an error when PU was not specified, so I ran the command below:
[tbiswas@un02 ~]$ java -jar picard.jar AddOrReplaceReadGroups -I /scratch/tbiswas/IITK-P6-BD_fixmate_sorted_duprm.recal.bam -O /scratch/tbiswas/IITK-P6-BD_fixmate_sorted.duprm.recal_RG.bam -LB TruSeq -PL ILLUMINA -PU barcode -SM IITK-P6-BD --CREATE_INDEX true
This produced the output, but the output file is ~4 GB smaller than the input. Can you comment on that? I'll run Mutect2 again now and see.
Thanks.
-
You should be able to run Mutect2 without the PU read group field.
I don't know why your file size is different; I don't have enough information to determine that. If you share the complete stack trace from AddOrReplaceReadGroups, I can get a better idea of whether it was successful.
-
Hi Genevieve,
Mutect2 is running, and I now understand why the output file size was different.
Thank you.
Regards,
Tanay
-
Thanks for the update, Tanay! Glad it is working for you now!