Mutect2 produces 0 results for somatic mutation calling of paired-end reads (BAM)
I am trying to identify variants of tumour paired-end RNA-seq data using Mutect2. Please ignore more advanced operations such as using tumour and normal matched sample pair or PoN creation. I will use a simple example here.
I used MACS2 to map a pair of fastq files of R1 and R2 on the genome (I used CDS sequences as 'genome', thus MACS2 might able to be used for RNA-seq) and generated a bam file containing paired-end reads. As you can see in the image below it has many variants relative to the genome (e.g. BAM image in IGV: changed from G to A at position 156).
But when I try to use mutant2 to identify variants such as position 156 (covering the entire genome):
java -Xmx1t -jar gatk-package-4.2.6.1-local.jar Mutect2 -R current.genome.fa -I cancer_T1.sorted.bam --native-pair-hmm-threads 50 --max-mnp-distance 0 -O test_cancer_N1.sorted.vcf.gz --bam-output test.bam
The output is 0 variant found:
$ zcat test_cancer_N1.sorted.vcf.gz | head
##fileformat=VCFv4.2
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=FAD,Number=R,Type=Integer,Description="Count of fragments supporting each allele.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
$ zcat test_cancer_N1.sorted.vcf.gz | tail
##contig=<ID=CCDS87784.1|Hs109|chrX,length=2697>
##contig=<ID=CCDS87786.1|Hs109|chrX,length=1002>
##contig=<ID=CCDS87789.1|Hs109|chrX,length=471>
##contig=<ID=CCDS87792.1|Hs109|chrX,length=1737>
##contig=<ID=CCDS87793.1|Hs109|chrX,length=1110>
##contig=<ID=CCDS87800.1|Hs109|chrX,length=714>
##filtering_status=Warning: unfiltered Mutect 2 calls. Please run FilterMutectCalls to remove false positives.
##source=Mutect2
##tumor_sample=cancer_N1
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT cancer_N1
All other paired-end RNA-seq data, regardless of species, produced almost no variant results like the above example. When splitting R1 and R2 into single-end and calling variants separately, a large number of variant results will be produced for each of them.
I don't understand what I'm doing wrong with paired-end RNA-seq data, or that Mutect2 was not originally designed for paired-end RNA-seq data.
I know it is wrong to split paired-end into two single-ends and run them separately, it will miss a lot of information. Ask for any valid suggestions to get paired-end RNA-seq data working properly in Mutect2.
-
Thank you for your post, Di Pan ! I want to let you know we have received your question. We'll get back to you if we have any updates or follow up questions.
Please see our Support Policy for more details about how we prioritize responding to questions.
Please sign in to leave a comment.
1 comment