Using LearnReadOrientationModel to build artifact-prior.tar.gz
Excuse me, I am writing to ask about the use of LearnReadOrientationModel. I have 20 paired samples (matched tumor-normal). In the mutation filtering step, it is recommended to first use LearnReadOrientationModel to build artifact-prior.tar.gz. The following is what I have done:
Sample 1
gatk LearnReadOrientationModel \
-I ~/mydata/03.gatk/b.SNP_Indel/normal1.tar.gz \
-O normal1.artifact-prior.tar.gz
gatk FilterMutectCalls \
-R ~/mydata/genome/hg38/hg38-genomefa/Homo_sapiens_assembly38.fasta \
-V ~/mydata/03.gatk/b.SNP_Indel/tumor1.unfiltered.somatic.vcf.gz \
--stats ~/mydata/03.gatk/b.SNP_Indel/tumor1.unfiltered.somatic.vcf.gz.stats \
--filtering-stats tumor1.f1.vcf.gz.stats \
--contamination-table ~/mydata/03.gatk/contamination/tumor1.contamination.table \
--tumor-segmentation tumor1.segments.tsv \
--ob-priors normal1.artifact-prior.tar.gz \
-O tumor1.f1.vcf.gz
Sample 2
gatk LearnReadOrientationModel \
-I ~/mydata/03.gatk/b.SNP_Indel/normal2.tar.gz \
-O normal2.artifact-prior.tar.gz
gatk FilterMutectCalls \
-R ~/mydata/genome/hg38/hg38-genomefa/Homo_sapiens_assembly38.fasta \
-V ~/mydata/03.gatk/b.SNP_Indel/tumor2.unfiltered.somatic.vcf.gz \
--stats ~/mydata/03.gatk/b.SNP_Indel/tumor2.unfiltered.somatic.vcf.gz.stats \
--filtering-stats tumor2.f1.vcf.gz.stats \
--contamination-table ~/mydata/03.gatk/contamination/tumor2.contamination.table \
--tumor-segmentation tumor2.segments.tsv \
--ob-priors normal2.artifact-prior.tar.gz \
-O tumor2.f1.vcf.gz
The remaining samples are processed in the same way:
…
Sample x
gatk LearnReadOrientationModel \
-I ~/mydata/03.gatk/b.SNP_Indel/normalx.tar.gz \
-O normalx.artifact-prior.tar.gz
gatk FilterMutectCalls \
-R ~/mydata/genome/hg38/hg38-genomefa/Homo_sapiens_assembly38.fasta \
-V ~/mydata/03.gatk/b.SNP_Indel/tumorx.unfiltered.somatic.vcf.gz \
--stats ~/mydata/03.gatk/b.SNP_Indel/tumorx.unfiltered.somatic.vcf.gz.stats \
--filtering-stats tumorx.f1.vcf.gz.stats \
--contamination-table ~/mydata/03.gatk/contamination/tumorx.contamination.table \
--tumor-segmentation tumorx.segments.tsv \
--ob-priors normalx.artifact-prior.tar.gz \
-O tumorx.f1.vcf.gz
I wonder whether these steps are right. Or should I do the following instead?
First, build a merged artifact-prior.tar.gz from all samples, then use the merged artifact-prior.tar.gz in FilterMutectCalls:
gatk LearnReadOrientationModel \
-I ~/mydata/03.gatk/b.SNP_Indel/normal1.tar.gz \
-I ~/mydata/03.gatk/b.SNP_Indel/normal2.tar.gz \
-I ~/mydata/03.gatk/b.SNP_Indel/normal3.tar.gz \
-I ~/mydata/03.gatk/b.SNP_Indel/normal4.tar.gz \
-I ~/mydata/03.gatk/b.SNP_Indel/normal5.tar.gz \
…
-I ~/mydata/03.gatk/b.SNP_Indel/normal20.tar.gz \
-O merged.artifact-prior.tar.gz
gatk FilterMutectCalls \
-R ~/mydata/genome/hg38/hg38-genomefa/Homo_sapiens_assembly38.fasta \
-V ~/mydata/03.gatk/b.SNP_Indel/tumorx.unfiltered.somatic.vcf.gz \
--stats ~/mydata/03.gatk/b.SNP_Indel/tumorx.unfiltered.somatic.vcf.gz.stats \
--filtering-stats tumorx.f1.vcf.gz.stats \
--contamination-table ~/mydata/03.gatk/contamination/tumorx.contamination.table \
--tumor-segmentation tumorx.segments.tsv \
--ob-priors merged.artifact-prior.tar.gz \
-O tumorx.f1.vcf.gz
I don’t know which of the two pipelines above is right. I would appreciate any help. Thank you!
-
Hi chenglei,
Thanks for writing in with your question! We can definitely help you figure this out.
Have you seen this tutorial? (How to) Call somatic mutations using GATK4 Mutect2. There is a section titled A step-by-step guide to the new Mutect2 Read Orientation Artifacts Workflow. There is a great description covering the steps you are describing here in your post.
Please let me know if you have follow-up questions after checking out the tutorial.
Best,
Genevieve
-
Thank you for your response, but I am still confused. The section “A step-by-step guide to the new Mutect2 Read Orientation Artifacts Workflow” says: “When multiple tumor samples are specified, you only need a single --f1r2-tar-gz output, which contains data for each tumor sample.”
Does this mean that this single --f1r2-tar-gz output is used in “gatk LearnReadOrientationModel -I f1r2.tar.gz -O read-orientation-model.tar.gz” to generate a single read-orientation-model.tar.gz, and that this single read-orientation-model.tar.gz is then used in FilterMutectCalls, as follows?
gatk FilterMutectCalls \
-R ~/mydata/genome/hg38/hg38-genomefa/Homo_sapiens_assembly38.fasta \
-V ~/mydata/03.gatk/b.SNP_Indel/tumorx.unfiltered.somatic.vcf.gz \
--stats ~/mydata/03.gatk/b.SNP_Indel/tumorx.unfiltered.somatic.vcf.gz.stats \
--filtering-stats tumorx.f1.vcf.gz.stats \
--contamination-table ~/mydata/03.gatk/contamination/tumorx.contamination.table \
--tumor-segmentation tumorx.segments.tsv \
--ob-priors read-orientation-model.tar.gz \
-O tumorx.f1.vcf.gz
However, I have 20 matched tumor-normal pairs. Each pair generates one f1r2.tar.gz, and each f1r2.tar.gz generates one corresponding read-orientation-model.tar.gz. The following is what I do:
Sample 1
gatk Mutect2 \
-R ~/mydata/01.index/Homo_sapiens_assembly38.fasta \
-I ~/mydata/03.gatk/a.BQSR/normal1.MarkDuplicates.BQSR.bam \
-I ~/mydata/03.gatk/a.BQSR/tumor1.MarkDuplicates.BQSR.bam \
-normal normal1 \
--germline-resource ~/mydata/genome/hg38/af-only-gnomad.hg38.vcf.gz \
--panel-of-normals ~/mydata/03.gatk/b.SNP_Indel/panel_of_normal/pon.vcf.gz \
--f1r2-tar-gz tumor1.f1r2.tar.gz \
-L ~/mydata/genome/hg38/hg38-genomefa/intervallist/S07604514_Regions.bed \
-O tumor1.somatic_unfilterd.vcf.gz
Sample 2
gatk Mutect2 \
-R ~/mydata/01.index/Homo_sapiens_assembly38.fasta \
-I ~/mydata/03.gatk/a.BQSR/normal2.MarkDuplicates.BQSR.bam \
-I ~/mydata/03.gatk/a.BQSR/tumor2.MarkDuplicates.BQSR.bam \
-normal normal2 \
--germline-resource ~/mydata/genome/hg38/af-only-gnomad.hg38.vcf.gz \
--panel-of-normals ~/mydata/03.gatk/b.SNP_Indel/panel_of_normal/pon.vcf.gz \
--f1r2-tar-gz tumor2.f1r2.tar.gz \
-L ~/mydata/genome/hg38/hg38-genomefa/intervallist/S07604514_Regions.bed \
-O tumor2.somatic_unfilterd.vcf.gz
Sample x
gatk Mutect2 \
-R ~/mydata/01.index/Homo_sapiens_assembly38.fasta \
-I ~/mydata/03.gatk/a.BQSR/normalx.MarkDuplicates.BQSR.bam \
-I ~/mydata/03.gatk/a.BQSR/tumorx.MarkDuplicates.BQSR.bam \
-normal normalx \
--germline-resource ~/mydata/genome/hg38/af-only-gnomad.hg38.vcf.gz \
--panel-of-normals ~/mydata/03.gatk/b.SNP_Indel/panel_of_normal/pon.vcf.gz \
--f1r2-tar-gz tumorx.f1r2.tar.gz \
-L ~/mydata/genome/hg38/hg38-genomefa/intervallist/S07604514_Regions.bed \
-O tumorx.somatic_unfilterd.vcf.gz
The steps above generate x .f1r2.tar.gz files, one per sample. I then run:
gatk LearnReadOrientationModel -I tumor1.f1r2.tar.gz -O tumor1.read-orientation-model.tar.gz
gatk LearnReadOrientationModel -I tumor2.f1r2.tar.gz -O tumor2.read-orientation-model.tar.gz
gatk LearnReadOrientationModel -I tumorx.f1r2.tar.gz -O tumorx.read-orientation-model.tar.gz
gatk FilterMutectCalls \
-R ~/mydata/genome/hg38/hg38-genomefa/Homo_sapiens_assembly38.fasta \
-V ~/mydata/03.gatk/b.SNP_Indel/tumor1.unfiltered.somatic.vcf.gz \
--stats ~/mydata/03.gatk/b.SNP_Indel/tumor1.unfiltered.somatic.vcf.gz.stats \
--filtering-stats tumor1.f1.vcf.gz.stats \
--contamination-table ~/mydata/03.gatk/contamination/tumor1.contamination.table \
--tumor-segmentation tumor1.segments.tsv \
--ob-priors tumor1.read-orientation-model.tar.gz \
-O tumor1.f1.vcf.gz
gatk FilterMutectCalls \
-R ~/mydata/genome/hg38/hg38-genomefa/Homo_sapiens_assembly38.fasta \
-V ~/mydata/03.gatk/b.SNP_Indel/tumor2.unfiltered.somatic.vcf.gz \
--stats ~/mydata/03.gatk/b.SNP_Indel/tumor2.unfiltered.somatic.vcf.gz.stats \
--filtering-stats tumor2.f1.vcf.gz.stats \
--contamination-table ~/mydata/03.gatk/contamination/tumor2.contamination.table \
--tumor-segmentation tumor2.segments.tsv \
--ob-priors tumor2.read-orientation-model.tar.gz \
-O tumor2.f1.vcf.gz
gatk FilterMutectCalls \
-R ~/mydata/genome/hg38/hg38-genomefa/Homo_sapiens_assembly38.fasta \
-V ~/mydata/03.gatk/b.SNP_Indel/tumorx.unfiltered.somatic.vcf.gz \
--stats ~/mydata/03.gatk/b.SNP_Indel/tumorx.unfiltered.somatic.vcf.gz.stats \
--filtering-stats tumorx.f1.vcf.gz.stats \
--contamination-table ~/mydata/03.gatk/contamination/tumorx.contamination.table \
--tumor-segmentation tumorx.segments.tsv \
--ob-priors tumorx.read-orientation-model.tar.gz \
-O tumorx.f1.vcf.gz
Is what I have done right? Also, I don’t understand what “When multiple tumor samples are specified, you only need a single --f1r2-tar-gz output, which contains data for each tumor sample.” means. I need your help and hope for your response. Thank you.
-
Are your tumor samples all from the same individual or from different individuals?
-
My tumor samples are from different individuals.
-
Ok! The sentence you were referring to, "When multiple tumor samples are specified, you only need a single --f1r2-tar-gz output, which contains data for each tumor sample", only applies if you are running Mutect2 in multisample mode, which is when you have multiple samples from the same individual.
Since you have different individuals, you will run Mutect2 separately for each sample and LearnReadOrientationModel separately for each sample.
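As a rough illustration of the per-individual workflow, here is a minimal dry-run sketch. It only prints the three commands for each tumor-normal pair and does not execute GATK. The helper name `print_pair_commands`, the `REF` value, and the shortened file names are hypothetical, and other arguments used earlier in this thread (germline resource, panel of normals, intervals, contamination table, segmentation) are left out for brevity. It also assumes the normal sample name in the BAM header is `normalN`.

```shell
#!/bin/sh
# Dry-run sketch: prints per-pair commands instead of running them.
# REF and all file names below are placeholders, not the poster's real paths.
REF="Homo_sapiens_assembly38.fasta"

print_pair_commands() {
    n="$1"   # pair identifier, e.g. 1 for tumor1/normal1
    # Step 1: call one tumor-normal pair and collect its own F1R2 counts.
    printf '%s\n' "gatk Mutect2 -R ${REF} -I normal${n}.bam -I tumor${n}.bam -normal normal${n} --f1r2-tar-gz tumor${n}.f1r2.tar.gz -O tumor${n}.unfiltered.vcf.gz"
    # Step 2: learn the orientation model from that pair's F1R2 archive only.
    printf '%s\n' "gatk LearnReadOrientationModel -I tumor${n}.f1r2.tar.gz -O tumor${n}.read-orientation-model.tar.gz"
    # Step 3: filter that pair's calls with its own model.
    printf '%s\n' "gatk FilterMutectCalls -R ${REF} -V tumor${n}.unfiltered.vcf.gz --ob-priors tumor${n}.read-orientation-model.tar.gz -O tumor${n}.filtered.vcf.gz"
}

# One independent pass per individual; extend the list to cover all pairs.
for n in 1 2; do
    print_pair_commands "$n"
done
```

Each pair keeps its own F1R2 archive and its own model, so nothing is shared between individuals.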
Please let me know if I can clarify this further for you or if you have any remaining questions.