We performed 30x coverage whole genome sequencing of 8 inbred DBA/2J mice that showed phenotypic differences between Group 82 (4 mice) and Group 87 (4 mice), to find out whether any genotypic differences could explain the phenotypic differences. After running the Sentieon GATK pipeline, we used the vcf files to select some variants of interest to look at more closely.
However, when we looked at our first region of interest in IGV using the recaled.bam files (loaded the recaled.bam file into IGV → zoomed in to a 41 bp window at the region of interest → exported alignments → searched for the sequence of interest), we saw no difference in the REF:ALT ratios between the Group 82 and Group 87 mice (Table 1). This contradicts the vcf file, which called the Group 82 mice as 0/1 and the Group 87 mice as 0/0 (all with GQ>20).
A separate discrepancy appeared at our second region of interest (Fig. 1). The top-left small windows show the joint vcf file reporting an AD of “26,3” for Mouse B82_1 and “24,4” for Mouse B87_1. However, the bottom-right small windows show different allele depths in the recaled.bam files: “26,9” for Mouse B82_1 and “24,5” for Mouse B87_1. The same can be seen at our third region of interest (Fig. 2). From what I understand of the GATK pipeline, recaled.bam files contain analysis-ready reads, so I am not sure why this discrepancy is occurring. Is there some other filtering step between recaled.bam and generating the joint vcf file that removes some of the reads, resulting in different allele depths being reported? If yes, which raw reads should we look at to decide which of our variants of interest could be true variants worth taking forward to Sanger sequencing validation?
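To make the comparison in Fig. 1 concrete, here is a minimal Python sketch (not part of our pipeline) that pulls the AD values out of a VCF sample column and sets them next to the counts we read off the recaled.bam files in IGV. The function names are hypothetical, and the GT value "./." is only a placeholder; the AD values and IGV counts are the ones quoted above.

```python
def parse_ad(format_field, sample_field):
    """Return the AD (allele depth) values of one VCF sample as a list of ints.

    format_field: the colon-separated FORMAT column, e.g. "GT:AD"
    sample_field: the matching colon-separated sample column
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    ad = values[keys.index("AD")]
    return [int(x) for x in ad.split(",")]


def alt_fraction(ref, alt):
    """Fraction of reads supporting ALT; 0.0 when there is no coverage."""
    total = ref + alt
    return alt / total if total else 0.0


# Sample columns with the AD values shown in Fig. 1.
# The GT "./." is a placeholder, not the genotype that was actually called.
vcf_samples = {"B82_1": "./.:26,3", "B87_1": "./.:24,4"}

# (REF, ALT) read counts as seen in the recaled.bam files in IGV (Fig. 1).
bam_counts = {"B82_1": (26, 9), "B87_1": (24, 5)}

rows = []
for mouse, sample in vcf_samples.items():
    vcf_ref, vcf_alt = parse_ad("GT:AD", sample)
    bam_ref, bam_alt = bam_counts[mouse]
    rows.append({
        "mouse": mouse,
        "vcf_alt_fraction": alt_fraction(vcf_ref, vcf_alt),
        "bam_alt_fraction": alt_fraction(bam_ref, bam_alt),
        # ALT-supporting reads visible in the BAM but not counted in AD
        "alt_missing_from_vcf": bam_alt - vcf_alt,
    })

for row in rows:
    print(row)
```

In both mice the REF count agrees between the two sources and only the ALT count is lower in the vcf file, which is why we suspect that some ALT-supporting reads are being dropped somewhere between recaled.bam and the joint vcf file.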
Finally, based on these discrepancies, we have some questions:
- Are these problems caused by using the less well-studied DBA/2J genome? Would it be better to use a cleaned-up reference genome like C57BL/6? However, we are worried that a different reference genome will cause some reads to align poorly, because our mice are DBA/2J, not C57BL/6.
- Is it common to have mRNA contamination interfere with read alignment and variant calling?
We would appreciate any other suggestions or comments, thank you!