GATK Mutect2 "--germline-resource" and "--panel-of-normals" for reference hg19
Dear GATK,
I am using GATK Mutect2 (GATK4) for variant calling on files aligned to the hg19 reference. After reading extensively about this topic, I understood that b37 is equivalent to hg19, so I am using the GERM and PON resources built for b37. Yet these files are not compatible with hg19.fa.
export GENOME="/PATH/Manuel/FILES/HUMAN_REFERENCES/hg19.fa"
export GERM="/PATH/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad.raw.sites.vcf"
export PON="/PATH/Manuel/FILES/HUMAN_REFERENCES/Mutect2-WGS-panel-b37.vcf"
export VCF="${RECALIBRATED%.bam}.vcf"
srun /mnt/beegfs/apptainer/images/gatk4.sif gatk Mutect2 \
-R "$GENOME" \
-I "$RECALIBRATED" \
--germline-resource "$GERM" \
--panel-of-normals "$PON" \
-O "$VCF"
This results in the error shown below:
A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13,
chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5,
chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1,
chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random,
chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212,
chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random,
chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211,
chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221,
chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random,
chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230,
chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240,
chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238,
chrUn_gl000244, chrUn_gl000248, chr8_gl000196_random, chrUn_gl000249, chrUn_gl000246,
chr17_gl000203_random, chr8_gl000197_random, chrUn_gl000245, chrUn_gl000247,
chr9_gl000201_random, chrUn_gl000235, chrUn_gl000239, chr21_gl000210_random, chrUn_gl000231,
chrUn_gl000229, chrM, chrUn_gl000226, chr18_gl000207_random]
features contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT,
GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1,
GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1,
GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1,
GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1,
GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1,
GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1,
GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1,
GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1,
GL000194.1, GL000225.1, GL000192.1, NC_007605]
This error can be partially fixed by renaming the contigs from 1, 2, 3, ... to chr1, chr2, chr3, ... However, the two lists have different sizes, so the mapping is not one-to-one.
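For anyone who wants to see which names still fail to match after a plain chr-prefix rename, here is a minimal sketch. It assumes the reference has a `.fai` index (for example from `samtools faidx`) and that the VCF is uncompressed and carries `##contig` header lines. The function name `unmatched_after_prefix` is made up for illustration and is not part of GATK.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not a GATK tool.
# Usage: unmatched_after_prefix reference.fa.fai resource.vcf
# Prints every VCF contig that has no counterpart in the reference
# even after "chr" is prepended to its name.
unmatched_after_prefix() {
    awk -F'\t' '
        # First file (.fai): remember every reference contig name.
        NR == FNR { ref[$1] = 1; next }
        # Second file (VCF): stop once the header is over.
        /^#CHROM/ { exit }
        # Pull the ID out of each ##contig=<ID=...> header line.
        /^##contig=<ID=/ {
            name = $0
            sub(/^##contig=<ID=/, "", name)
            sub(/[,>].*$/, "", name)
            key = "chr" name
            if (!(key in ref)) print key
        }
    ' "$1" "$2"
}
```

With the lists from the error message, names such as MT and the GL000xxx.1 contigs would be expected to show up here, since hg19 calls them chrM and chrN_gl000xxx_random or chrUn_gl000xxx.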
I know this is a debatable topic, but are there GERM and PON resources built for hg19 rather than b37 that I can use? I have searched extensively through several resource bundles.
What would the GATK team recommend in this case, given that I cannot change the reference because I am starting from sorted.bam files?
Best regards, Manuel Sokolov Ravasqueira
-
Hi Manuel Sérgio Sokolov Ravasqueira
Since your feature files and your reference do not have compatible contigs, the best solution would be either to remap your reads to the compatible b37 reference or to lift over your feature files from b37 to hg19. If remapping would cost too much time and money, lifting over your feature files may be the better option.
I am aware that such a file exists, and it looks like you have already found it here.
Once the liftover is complete, you should be able to use your feature files with your reference genome.
Regards.
-
Hi Manuel Sérgio Sokolov Ravasqueira,
I'll add that it is not completely true that "b37 is equivalent to hg19" -- there are actually minor sequence differences. You can see in this article on "human reference discrepancies" that chromosomes 3, Y and MT have different MD5 hash values for B37 and HG19, meaning that at least one base is different on those contigs. These differences are very minor, and most likely related to IUPAC bases and/or masking, but a proper liftover is the right option here, as Gökalp Çelik suggested above.
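If you want to check this on your own copies, the comparison can be sketched as below. This assumes both references have sequence dictionaries (`.dict` files, e.g. from `gatk CreateSequenceDictionary`) whose `@SQ` lines carry `M5:` tags. The function names `dict_md5` and `diff_md5` are hypothetical, and only the primary chromosomes are matched by name, so the unplaced and random contigs are simply not compared.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not GATK tools.

# Print "name<TAB>md5" for each @SQ line of a sequence dictionary.
# hg19-style names are reduced to b37 style (chr1 -> 1, chrM -> MT).
dict_md5() {
    awk -F'\t' '
        $1 == "@SQ" {
            name = ""; md5 = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^M5:/) md5 = substr($i, 4)
            }
            sub(/^chr/, "", name)
            if (name == "M") name = "MT"
            print name "\t" md5
        }
    ' "$1"
}

# Usage: diff_md5 first.dict second.dict
# Lists contigs found in both dictionaries whose MD5 values differ.
diff_md5() {
    { dict_md5 "$1"; dict_md5 "$2"; } | awk -F'\t' '
        # Appending "" forces a string comparison of the two hashes.
        ($1 in seen) && (seen[$1] "") != ($2 "") { print $1 }
        { seen[$1] = $2 }
    '
}
```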
Regards,
David
-
Thank you both. I converted both files using the chain file https://github.com/broadgsa/gatk/blob/master/public/chainFiles/b37tohg19.chain and GATK Mutect2 then ran with no problems.
-
Dear Manuel Sérgio Sokolov Ravasqueira
I'm having a similar problem to yours, but I'm not quite sure how to convert a b37 PON file to hg19. I am downloading the following two files:
I would like to ask exactly how to do the conversion, or whether you could send me the hg19 PON file.
-
You can use the LiftoverVcf tool from GATK or Picard to convert b37 PoN to hg19.
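As a rough sketch, the call could look like the wrapper below. The function name `lift_to_hg19`, the `RUN` preview variable and the output file name are made up for illustration; the file names in the usage comment are the ones mentioned earlier in this thread. As far as I know, LiftoverVcf also expects a sequence dictionary (`.dict`) next to the target reference, which `gatk CreateSequenceDictionary -R hg19.fa` can create.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around GATK4 LiftoverVcf.
# Arguments: input VCF, output VCF, chain file, target reference FASTA.
# Setting RUN=echo prints the command instead of running it.
lift_to_hg19() {
    local in_vcf="$1" out_vcf="$2" chain="$3" ref="$4"
    ${RUN:-} gatk LiftoverVcf \
        -I "$in_vcf" \
        -O "$out_vcf" \
        --CHAIN "$chain" \
        --REJECT "${out_vcf%.vcf}.rejected.vcf" \
        -R "$ref"
}

# Example (output name is a placeholder):
# lift_to_hg19 Mutect2-WGS-panel-b37.vcf Mutect2-WGS-panel-hg19.vcf \
#     b37tohg19.chain hg19.fa
```

Records that cannot be lifted over are written to the `--REJECT` file, so it is worth checking how many ended up there before using the result.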
Regards.
-
Dear Manuel Sérgio Sokolov Ravasqueira
Thank you very much. I successfully converted the file using the gatk LiftoverVcf command!