Created panel of normals returns empty vcf
AnsweredHi
I am trying to make my own panel of normals for human exome using gatk v4.1.9.0 (installed with conda).
I am using a version of the genome where chromosomes are written just with the number (without the chr prefix).
I got the 1000g_ponhg38.vcf and af-only.gnomad.hg38.vcf from the google bucket gatk-best-practices, and edited the chr prefix to fit my mappings and reindexed them. If I mention to use any of these resources I am using the edited ones
Currently I have 68 normal samples and I expect to reach 100 or more as my cohort increases
I took this post as reference to create PoN
https://gatk.broadinstitute.org/hc/en-us/articles/360035531132--How-to-Call-somatic-mutations-using-GATK4-Mutect2
I have mapped my samples against grch38, and then extracted the variants in a loop that look like
gatk Mutect2 -R ${RefGenome} -I ${my_sample}.bam --max-mnp-distance 0 -O pon_normal_${my_sample}.vcf.gz
It worked, I got 3 files for each sample pon_normal_${my_sample}.vcf.gz + pon_normal_${my_sample}.vcf.gz.tbi + pon_normal_${my_sample}.vcf.gz.stats
the logs looked like:
20:44:29.473 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/rafael/miniconda3/share/gatk4-4.1.9.0-0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 18, 2021 8:44:29 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
20:44:29.560 INFO Mutect2 - ------------------------------------------------------------
20:44:29.560 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.9.0
20:44:29.560 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
20:44:29.561 INFO Mutect2 - Executing as rafael@Ryzen on Linux v5.4.0-74-generic amd64
20:44:29.561 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-b11
20:44:29.561 INFO Mutect2 - Start Date/Time: June 18, 2021 8:44:29 PM CEST
20:44:29.561 INFO Mutect2 - ------------------------------------------------------------
20:44:29.561 INFO Mutect2 - ------------------------------------------------------------
20:44:29.561 INFO Mutect2 - HTSJDK Version: 2.23.0
20:44:29.561 INFO Mutect2 - Picard Version: 2.23.3
20:44:29.561 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
20:44:29.561 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
20:44:29.561 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
20:44:29.561 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
20:44:29.561 INFO Mutect2 - Deflater: IntelDeflater
20:44:29.561 INFO Mutect2 - Inflater: IntelInflater
20:44:29.561 INFO Mutect2 - GCS max retries/reopens: 20
20:44:29.561 INFO Mutect2 - Requester pays: disabled
20:44:29.562 INFO Mutect2 - Initializing engine
20:44:30.444 INFO Mutect2 - Done initializing engine
20:44:30.460 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/rafael/miniconda3/share/gatk4-4.1.9.0-0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
20:44:30.461 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/rafael/miniconda3/share/gatk4-4.1.9.0-0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
20:44:30.488 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
20:44:30.488 INFO IntelPairHmm - Available threads: 32
20:44:30.488 INFO IntelPairHmm - Requested threads: 4
20:44:30.488 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
20:44:30.511 INFO ProgressMeter - Starting traversal
20:44:30.512 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
20:44:40.525 INFO ProgressMeter - 1:1072514 0.2 3780 22650.6
20:44:51.020 INFO ProgressMeter - 1:1626819 0.3 5800 16969.0
... lines skipped here ...
22:25:32.436 INFO ProgressMeter - X:153472272 101.0 10268280 101633.9
22:25:43.081 INFO ProgressMeter - X:153943796 101.2 10269980 101472.5
22:25:53.146 INFO ProgressMeter - X:154533377 101.4 10272030 101324.8
22:26:03.146 INFO ProgressMeter - Y:20483103 101.5 10345440 101881.5
22:26:13.154 INFO ProgressMeter - KI270387.1:901 101.7 10506070 103293.7
22:26:13.301 INFO Mutect2 - 3590243 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappingQualityNotZeroReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonChimericOriginalAlignmentReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: ReadLengthReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
3590243 total reads filtered
22:26:13.301 INFO ProgressMeter - KI270394.1:601 101.7 10506221 103292.7
22:26:13.301 INFO ProgressMeter - Traversal complete. Processed 10506221 total regions in 101.7 minutes.
22:26:13.340 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 8.675050335
22:26:13.340 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 853.076693996
22:26:13.340 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 683.80 sec
22:26:13.341 INFO Mutect2 - Shutting down engine
[June 18, 2021 10:26:13 PM CEST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 101.73 minutes.
Runtime.totalMemory()=12513705984
Using GATK jar /home/rafael/miniconda3/share/gatk4-4.1.9.0-0/gatk-package-4.1.9.0-local.jar
Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/rafael/miniconda3/share/gatk4-4.1.9.0-0/gatk-package-4.1.9.0-local.jar Mutect2 -R /home/rafael/Documents/genomes/human/Homo_sapiens.GRCh38.p13.rel.102.dna.primary_assembly.fa -I /media/rafael/DATA/WES/map//344852_M_NoDup_filt.bam --max-mnp-distance 0 -O /media/rafael/DATA/WES/som/pon/pon_normal_344852_M.vcf.gz
Then, I tried to build the database with GenomicsDBImport. At the beggining I used just 2 samples to try out, but after a couple of days it was processing and I realized that for exomes, -L with a lot of intervals, I had to use --merge-input-intervals true
I also used --sample-name-map directing to ${my_samples} file with the samples names and path
${my_samples} is a tab separated file with the names of the 68 normals and looks like:
a_new_id_1 /path/to/the/vcf.files/pon_normal_sample_id_1.vcf.gz
a_new_id_2 /path/to/the/vcf.files/pon_normal_sample_id_1.vcf.gz
... lines skipped here ...
a_new_id_68 /path/to/the/vcf.files/pon_normal_sample_id_68.vcf.gz
The command was:
gatk --java-options "-Xmx96g -Xms64g" GenomicsDBImport \
--genomicsdb-workspace-path ${my_SOMDB} -R ${my_RefGenome} \
-L ${my_WES} --merge-input-intervals true \
--tmp-dir ${my_temp} --sample-name-map ${my_samples}
It seem to work pretty fast with --merge-input-intervals true (perhaps too fast)
probably I did not created the database properly, but i do not know what is missing
The log looked like:
15:30:39.267 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/rafael/miniconda3/share/gatk4-4.1.9.0-0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 02, 2021 3:30:39 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:30:39.798 INFO GenomicsDBImport - ------------------------------------------------------------
15:30:39.798 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
15:30:39.798 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
15:30:39.798 INFO GenomicsDBImport - Executing as rafael@Ryzen on Linux v5.4.0-77-generic amd64
15:30:39.798 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-b11
15:30:39.798 INFO GenomicsDBImport - Start Date/Time: July 2, 2021 3:30:39 PM CEST
15:30:39.798 INFO GenomicsDBImport - ------------------------------------------------------------
15:30:39.798 INFO GenomicsDBImport - ------------------------------------------------------------
15:30:39.799 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
15:30:39.799 INFO GenomicsDBImport - Picard Version: 2.23.3
15:30:39.799 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:30:39.799 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:30:39.799 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:30:39.799 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:30:39.799 INFO GenomicsDBImport - Deflater: IntelDeflater
15:30:39.799 INFO GenomicsDBImport - Inflater: IntelInflater
15:30:39.799 INFO GenomicsDBImport - GCS max retries/reopens: 20
15:30:39.799 INFO GenomicsDBImport - Requester pays: disabled
15:30:39.799 INFO GenomicsDBImport - Initializing engine
15:30:40.061 INFO FeatureManager - Using codec BEDCodec to read file file:///home/rafael/Documents/genomes/references/WES_targets/SureSelect_v6_hg38_noChr.bed
15:30:41.226 INFO IntervalArgumentCollection - Processing 60507855 bp from intervals
15:30:41.353 INFO GenomicsDBImport - Done initializing engine
15:30:41.735 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
15:30:41.759 INFO GenomicsDBImport - Vid Map JSON file will be written to /media/rafael/DATA/WES/som/pon/db/vidmap.json
15:30:41.759 INFO GenomicsDBImport - Callset Map JSON file will be written to /media/rafael/DATA/WES/som/pon/db/callset.json
15:30:41.759 INFO GenomicsDBImport - Complete VCF Header will be written to /media/rafael/DATA/WES/som/pon/db/vcfheader.vcf
15:30:41.759 INFO GenomicsDBImport - Importing to workspace - /media/rafael/DATA/WES/som/pon/db
15:30:41.759 INFO ProgressMeter - Starting traversal
15:30:41.759 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
15:30:44.311 INFO GenomicsDBImport - Importing batch 1 with 68 samples
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:14574
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:13273
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:12807
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:12783
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:12807
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:13273
15:30:46.379 INFO GenomicsDBImport - Importing batch 1 with 68 samples
...
most lines were errors, with an ocasional Import
...
E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at Y:11332409
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at Y:5623220
15:31:23.811 INFO ProgressMeter - 1:12081 0.7 1 1.4
15:31:23.811 INFO GenomicsDBImport - Done importing batch 1/1
15:31:23.811 INFO ProgressMeter - 1:12081 0.7 1 1.4
15:31:23.811 INFO ProgressMeter - Traversal complete. Processed 1 total batches in 0.7 minutes.
15:31:23.811 INFO GenomicsDBImport - Import completed!
15:31:23.811 INFO GenomicsDBImport - Shutting down engine
[July 2, 2021 3:31:23 PM CEST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.74 minutes.
Runtime.totalMemory()=81669390336
Using GATK jar /home/rafael/miniconda3/share/gatk4-4.1.9.0-0/gatk-package-4.1.9.0-local.jar
Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx96g -Xms64g -jar /home/rafael/miniconda3/share/gatk4-4.1.9.0-0/gatk-package-4.1.9.0-local.jar GenomicsDBImport --genomicsdb-workspace-path /media/rafael/DATA/WES/som/pon/db -R /home/rafael/Documents/genomes/human/Homo_sapiens.GRCh38.p13.rel.102.dna.primary_assembly.fa -L /home/rafael/Documents/genomes/references/WES_targets/SureSelect_v6_hg38_noChr.bed --merge-input-intervals true --tmp-dir /media/rafael/DATA/WES/som/pon/temp --sample-name-map /home/rafael/Documents/WES/WESout/somatic_sample_map.txt
inside the db folder one can find 23 folders with the chrmosome names plus some stuff for example 1$12081$248937047 or X$284106$156027976
other files also present were callset.json, vcfheader.vcf, vidmap.json and __tiledb_workspace.tdb
finally I tried to create my pon
gatk CreateSomaticPanelOfNormals -R ${my_RefGenome} -V gendb://db -O pon.vcf.gz
the folders were created empty, and the log indicated problems:
16:14:52.452 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/rafael/miniconda3/share/gatk4-4.1.9.0-0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 02, 2021 4:14:52 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:14:52.585 INFO CreateSomaticPanelOfNormals - ------------------------------------------------------------
16:14:52.585 INFO CreateSomaticPanelOfNormals - The Genome Analysis Toolkit (GATK) v4.1.9.0
16:14:52.585 INFO CreateSomaticPanelOfNormals - For support and documentation go to https://software.broadinstitute.org/gatk/
16:14:52.585 INFO CreateSomaticPanelOfNormals - Executing as rafael@Ryzen on Linux v5.4.0-77-generic amd64
16:14:52.585 INFO CreateSomaticPanelOfNormals - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-b11
16:14:52.585 INFO CreateSomaticPanelOfNormals - Start Date/Time: July 2, 2021 4:14:52 PM CEST
16:14:52.585 INFO CreateSomaticPanelOfNormals - ------------------------------------------------------------
16:14:52.585 INFO CreateSomaticPanelOfNormals - ------------------------------------------------------------
16:14:52.585 INFO CreateSomaticPanelOfNormals - HTSJDK Version: 2.23.0
16:14:52.586 INFO CreateSomaticPanelOfNormals - Picard Version: 2.23.3
16:14:52.586 INFO CreateSomaticPanelOfNormals - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:14:52.586 INFO CreateSomaticPanelOfNormals - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:14:52.586 INFO CreateSomaticPanelOfNormals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:14:52.586 INFO CreateSomaticPanelOfNormals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:14:52.586 INFO CreateSomaticPanelOfNormals - Deflater: IntelDeflater
16:14:52.586 INFO CreateSomaticPanelOfNormals - Inflater: IntelInflater
16:14:52.586 INFO CreateSomaticPanelOfNormals - GCS max retries/reopens: 20
16:14:52.586 INFO CreateSomaticPanelOfNormals - Requester pays: disabled
16:14:52.586 WARN CreateSomaticPanelOfNormals -
[1m[31m !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: CreateSomaticPanelOfNormals is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!![0m
16:14:52.586 INFO CreateSomaticPanelOfNormals - Initializing engine
16:14:53.011 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
16:14:53.022 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field AS_UNIQ_ALT_READ_COUNT - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.022 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field CONTQ - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.022 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field ECNT - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.022 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field GERMQ - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.022 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field MBQ - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.022 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field MFRL - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field MMQ - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field MPOS - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field NALOD - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field NCount - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field NLOD - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field OCM - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field PON - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field POPAF - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field ROQ - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field RPA - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field RU - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field SEQQ - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field STR - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field STRANDQ - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field STRQ - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.023 info NativeGenomicsDB - pid=2455 tid=2469 No valid combination operation found for INFO field TLOD - the field will NOT be part of INFO fields in the generated VCF records
16:14:53.047 INFO CreateSomaticPanelOfNormals - Done initializing engine
16:14:53.061 INFO ProgressMeter - Starting traversal
16:14:53.061 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),0.0,Cpu time(s),0.0
16:14:54.444 INFO ProgressMeter - unmapped 0.0 0 0.0
16:14:54.444 INFO ProgressMeter - Traversal complete. Processed 0 total variants in 0.0 minutes.
16:14:54.447 INFO CreateSomaticPanelOfNormals - Shutting down engine
[July 2, 2021 4:14:54 PM CEST] org.broadinstitute.hellbender.tools.walkers.mutect.CreateSomaticPanelOfNormals done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=2194669568
Using GATK jar /home/rafael/miniconda3/share/gatk4-4.1.9.0-0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/rafael/miniconda3/share/gatk4-4.1.9.0-0/gatk-package-4.1.9.0-local.jar CreateSomaticPanelOfNormals -R /home/rafael/Documents/genomes/human/Homo_sapiens.GRCh38.p13.rel.102.dna.primary_assembly.fa -V gendb://db -O pon.vcf.gz
-
Hi Rafael Viana,
I think there might have been an issue with your mapping step first because many reads were filtered out during Mutect2:
3590243 read(s) filtered by: MappingQualityReadFilter
It looks like the reads are not mapped well to the reference you are using. Is there an issue with the reference you edited and are using?
Best,
Genevieve
-
Hi Genevieve,
Thank you for the response.To produce the exome we used commercial kits (Agilent Sure select v6).
Reads were mapped to Homo Sapiens GRCh38p13.rel102.dna.primary_assembly.
Qualimap finds that the x30 coverage in the WES region is around 97% for all cases, and
the mean mapping quality is consistently between 57.8 and 57.95
These samples are healthy individuals that were used in germinal trios (parents of sick kids).
A pipeline using other gatk tools like "HaplotypeCaller" -> "VariantRecalibrator" -> "VariantFiltration" was succesful.
The amount of final mapped reads are around 40 million (pair end 150bp), so I think losing 3M should not be so dramatic.
any idea about what van I do? lower quality thresholds?
Thank you for your time.
Kind regards -
Thanks for the clarification Rafael Viana. So then, it looks like the issue is coming up with GenomicsDBImport since there are no variants present in the input for CreateSomaticPanelofNormals:
16:14:54.444 INFO ProgressMeter - Traversal complete. Processed 0 total variants in 0.0 minutes.
Here is our usage guide for GenomicsDBImport: https://gatk.broadinstitute.org/hc/en-us/articles/360056138571-GDBI-usage-and-performance-guidelines
I would recommend decreasing the batch size and using the --genomicsdb-shared-posixfs-optimizations true argument if you are using a cluster or other shared file system. Did it run successfully without the --merge-input-intervals? If so, I would recommend trying it again with those other arguments and without merging intervals.
Best,
Genevieve
Please sign in to leave a comment.
3 comments