GATK Mutect2 "--germline-resource" and "--panel-of-normals" for reference hg19
Dear GATK,
I am using GATK Mutect2 (GATK4) for variant calling on files aligned to the hg19 reference. After reading extensively about this topic, I understood that b37 is equivalent to hg19, so I am using the germline resource (GERM) and panel of normals (PON) built for b37. Yet these files are not compatible with hg19.fa.
export GENOME="/PATH/Manuel/FILES/HUMAN_REFERENCES/hg19.fa"
export GERM="/PATH/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad.raw.sites.vcf"
export PON="/PATH/Manuel/FILES/HUMAN_REFERENCES/Mutect2-WGS-panel-b37.vcf"
export VCF="${RECALIBRATED%.bam}.vcf"
srun /mnt/beegfs/apptainer/images/gatk4.sif gatk Mutect2 \
-R $GENOME \
-I $RECALIBRATED \
--germline-resource $GERM \
--panel-of-normals $PON \
-O $VCF
This results in the error shown below:
A USER ERROR has occurred: Input files reference and features have incompatible contigs: No
overlapping contigs found.
reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13,
chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5,
chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1,
chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random,
chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212,
chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random,
chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211,
chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221,
chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random,
chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230,
chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240,
chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238,
chrUn_gl000244, chrUn_gl000248, chr8_gl000196_random, chrUn_gl000249, chrUn_gl000246,
chr17_gl000203_random, chr8_gl000197_random, chrUn_gl000245, chrUn_gl000247,
chr9_gl000201_random, chrUn_gl000235, chrUn_gl000239, chr21_gl000210_random, chrUn_gl000231,
chrUn_gl000229, chrM, chrUn_gl000226, chr18_gl000207_random]
features contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT,
GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1,
GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1,
GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1,
GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1,
GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1,
GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1,
GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1,
GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1,
GL000194.1, GL000225.1, GL000192.1, NC_007605]
This error can be partially fixed by renaming the contigs from 1, 2, 3, ... to chr1, chr2, ... However, the two lists have different sizes, so the mapping is not one-to-one.
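The partial renaming described above can be sketched as follows. This is only an illustration, not a GATK-recommended procedure: the function name `b37_to_hg19_map` is hypothetical, and it assumes the input lines carry the hg19 contig name in their first field (for example, an `hg19.fa.fai` index). It derives the b37 name from each hg19 name and deliberately skips `chrM` and the `*_hap*` alternate haplotypes. The haplotype contigs have no b37 counterpart in the lists above, and hg19 `chrM` is, to my knowledge, a different mitochondrial sequence from b37 `MT`, so relabeling it would be misleading.

```shell
# Hypothetical helper: reads hg19 contig names (first field of each input line)
# and prints "b37_name<TAB>hg19_name" pairs for contigs present in both builds.
# Example use (file names are placeholders):
#   b37_to_hg19_map < hg19.fa.fai > b37_to_hg19.txt
b37_to_hg19_map() {
  awk '
    # Primary chromosomes: chr1 -> 1, chrX -> X, chrY -> Y
    $1 ~ /^chr([0-9]+|X|Y)$/ {
      print substr($1, 4) "\t" $1
      next
    }
    # Unlocalized and unplaced scaffolds:
    # chr1_gl000192_random or chrUn_gl000225 -> GL000192.1 / GL000225.1
    $1 ~ /^chr[0-9A-Za-z]+_gl[0-9]+(_random)?$/ {
      split($1, parts, "_")
      print toupper(parts[2]) ".1\t" $1
      next
    }
    # Everything else (chrM, alternate haplotypes) is skipped on purpose.
  '
}
```

The two-column output is in the format accepted by `bcftools annotate --rename-chrs`. Contigs that exist only in b37, such as NC_007605, get no entry, which is why the mapping cannot be one-to-one. Renaming changes labels only and does not account for any sequence differences between the two builds.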
I know this is a debatable topic, but is there any other GERM and PON resource built for hg19 rather than b37 that I can use? (I have searched extensively across several resource bundles.)
What would the GATK team recommend in this case, given that I cannot change the reference because I am starting from sorted.bam files?
Best regards, Manuel Sokolov Ravasqueira
-
Hi Manuel Sérgio Sokolov Ravasqueira
Since your feature files and your reference do not have compatible contigs, the best solution would be either to remap your reads to the compatible b37 reference or to lift over your feature files from b37 to hg19. If reprocessing the reads would cost too much time and money, lifting over your feature files may help.
I am aware that such a file exists, and it looks like you have already found it here.
Once the liftover is complete, you should be able to use your feature files with your reference genome.
Regards.
-
Hi Manuel Sérgio Sokolov Ravasqueira,
I'll add that it is not completely true that "b37 is equivalent to hg19" -- there are actually minor sequence differences. You can see in this article on "human reference discrepancies" that chromosomes 3, Y and MT have different MD5 hash values for B37 and HG19, meaning that at least one base is different on those contigs. These differences are very minor, and most likely related to IUPAC bases and/or masking, but a proper liftover is the right option here, as Gökalp Çelik suggested above.
Regards,
David
-
Thank you both. I converted both files using the chain file at https://github.com/broadgsa/gatk/blob/master/public/chainFiles/b37tohg19.chain, and GATK Mutect2 ran with no problems.
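The thread does not say which tool was used for the conversion. One common choice is Picard's LiftoverVcf, which ships with GATK4; the sketch below assumes that tool. The function name `liftover_to_hg19`, the output file naming, and the `GATK` and `CHAIN` variables are hypothetical; `GENOME` is the hg19 FASTA from the question. LiftoverVcf expects the target reference to have a sequence dictionary (`.dict`) next to it.

```shell
# Hypothetical wrapper around Picard LiftoverVcf (bundled with GATK4).
# Assumed to be set by the caller:
#   GATK   - launcher command, e.g. "srun <image> gatk" (defaults to "gatk")
#   CHAIN  - path to the downloaded b37tohg19.chain file
#   GENOME - path to the TARGET reference (hg19.fa)
liftover_to_hg19() {
  in_vcf="$1"
  out_vcf="${in_vcf%.vcf}.hg19.vcf"
  # ${GATK:-gatk} is left unquoted on purpose so a multi-word launcher splits into words.
  ${GATK:-gatk} LiftoverVcf \
    -I "$in_vcf" \
    -O "$out_vcf" \
    --CHAIN "$CHAIN" \
    --REJECT "${in_vcf%.vcf}.rejected.vcf" \
    -R "$GENOME"
}
```

It would be called once per resource, for example `liftover_to_hg19 "$GERM"` and `liftover_to_hg19 "$PON"`, and the resulting `.hg19.vcf` files passed to `--germline-resource` and `--panel-of-normals`. Records that cannot be lifted over are written to the `.rejected.vcf` file, which is worth inspecting before running Mutect2.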