MergeVcfs did not work
When I used the GATK MergeVcfs tool to merge multiple VCF files, I got the following error:
gatk MergeVcfs -I SCA_957_N.germline_raw.vcf -I SCA_917_N.germline_raw.vcf -O merge.vcf
Using GATK jar /pub/anaconda3/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /pub/anaconda3/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar MergeVcfs -I SCA_957_N.germline_raw.vcf -I SCA_917_N.germline_raw.vcf -O merge.vcf
09:56:29.779 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/pub/anaconda3/share/gatk4-4.2.0.0-1/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Nov 11 09:56:29 CST 2021] MergeVcfs --INPUT SCA_957_N.germline_raw.vcf --INPUT SCA_917_N.germline_raw.vcf --OUTPUT merge.vcf --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX true --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Nov 11, 2021 9:56:29 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Thu Nov 11 09:56:29 CST 2021] Executing as ug0416@gs72 on Linux 4.15.0-153-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_282-b08; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.2.0.0
[Thu Nov 11 09:56:30 CST 2021] picard.vcf.MergeVcfs done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2155872256
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.IllegalArgumentException: Input file /home/ug0416/mydata/05.gatk/g.germline/SCA_917_N.germline_raw.vcf has sample entries that don't match the other files.
at picard.vcf.MergeVcfs.doWork(MergeVcfs.java:203)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
My VCF files were generated successfully by GATK HaplotypeCaller, run for germline variant analysis:
cat config | while read id
do
echo "start HaplotypeCaller for ${id}" `date`
gatk --java-options "-Xmx4g -Djava.io.tmpdir=./tmp" HaplotypeCaller -ERC GVCF \
-R ~/mydata/genome/hg38/Homo_sapiens_assembly38.fasta \
-I ~/mydata/05.gatk/a.BQSR/geneplus/${id}.sort.mardup.BQSR.bam \
--dbsnp ~/mydata/genome/dbsnp_146.hg38.vcf.gz \
-L ~/mydata/genome/hg38_bed/xGen_Exome_Research_Panel.target-hg38.bed \
-O ${id}.germline_raw.vcf \
1>${id}.germline-step1.log 2>&1
echo "finish HaplotypeCaller for ${id}" `date`
done
I don't know what is wrong and would appreciate your help. I look forward to your response. Thank you!
-
Hi chenglei,
MergeVcfs requires that all of your input VCF files contain exactly the same samples, as described in the tool documentation. So there is most likely nothing wrong with the VCF files you created; the error means that the samples in "SCA_917_N.germline_raw.vcf" do not match the samples in your other file.
Kind regards,
Pamela
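To see the mismatch described above, the sample columns of the two files can be compared directly. The following is only a minimal sketch, assuming uncompressed VCF files with a standard "#CHROM" header line; the helper names vcf_samples and same_samples are hypothetical and not part of GATK, and treating "same samples" as "same names in the same order" is an assumption of this sketch.

```shell
# Print the sample names of an uncompressed VCF, one per line.
# The "#CHROM" header line has 9 fixed columns; sample names start at column 10.
vcf_samples() {
    grep -m1 '^#CHROM' "$1" | cut -f10- | tr '\t' '\n'
}

# Hypothetical helper: succeed only if both VCFs list the same
# sample names in the same order (an assumption of this sketch).
same_samples() {
    [ "$(vcf_samples "$1")" = "$(vcf_samples "$2")" ]
}

# Usage with the two files from the question (skipped if they are not present).
a=SCA_957_N.germline_raw.vcf
b=SCA_917_N.germline_raw.vcf
if [ -f "$a" ] && [ -f "$b" ]; then
    if same_samples "$a" "$b"; then
        echo "sample columns match"
    else
        echo "sample columns differ:"
        vcf_samples "$a"
        vcf_samples "$b"
    fi
fi
```

Because HaplotypeCaller was run once per id in the loop above, each output file probably holds a single, different sample, which would explain the error. If the aim is to combine per-sample GVCFs from different samples, GATK provides CombineGVCFs and GenomicsDBImport (followed by GenotypeGVCFs) for that purpose; whether that fits this analysis is not something the thread confirms.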