ERROR MESSAGE: No gender data for sample:sample7
Hello,
I am working with Glycine max, a self-pollinating plant that has no genders.
When I run the SVDiscovery pipeline, I always get the error "No gender data", but when I add -genderMapFile $genderfile there is no error.
However, when I run the SVDiscovery pipeline on some other samples without -genderMapFile, there is no error either.
So is -genderMapFile required or not?
There is a script for rice, which is also self-pollinating, that adds genders like this:
LGW1.reheader 1
LGW2.reheader 1
I am very confused. Why is there an error about gender, and what should I do about it? Thanks.
###gender.map
Z_045 1
Z_046 1
Z_047 1
Z_048 1
##script
runDir=./chr$i/RUN
SV_TMPDIR=./chr$i/tmpdir
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
java -Xmx32g -cp ${classpath} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVPreprocess.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-cp ${classpath} \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-configFile ${SV_DIR}/conf/genstrip_parameters.txt \
-tempDir ${SV_TMPDIR} \
-R $reference \
-ploidyMapFile $mapfile \
-genomeMaskFile $svmask \
-genderMapFile $genderfile \
-runDirectory ${runDir} \
-md ${runDir}/metadata \
-jobLogDir ${runDir}/logs \
-I $inputfile \
-L $i \
-bamFilesAreDisjoint false \
-run
###script.log
ERROR 14:38:00,870 FunctionEdge - Error: 'java' '-Xmx3072m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/home/termius/data/svtoolkit/test2/chr1/tmpdir' '-cp' '/home/termius/software/svtoolkit//lib/SVToolkit.jar:/home/termius/software/svtoolkit//lib/gatk/GenomeAnalysisTK.jar:/home/termius/software/svtoolkit//lib/gatk/Queue.jar' org.broadinstitute.sv.main.SVDiscovery '-T' 'SVDiscoveryWalker' '-R' '/home/termius/data/svtoolkit/prepare/Glycine_max.Glycine_max_v2.0.dna.chromosome.1.fa' '-I' '/home/termius/data/svtoolkit/test2/input.list' '-O' '/home/termius/data/svtoolkit/test2/chr1/disc/chr1.unfiltered.vcf' '-disableGATKTraversal' 'true' '-md' './chr1/RUN/metadata' '-configFile' '/home/termius/software/svtoolkit/conf/genstrip_parameters.txt' '-runDirectory' './chr1/disc' '-genderMapFile' './chr1/RUN/metadata/sample_gender.report.txt' '-L' '1' '-runFilePrefix' 'chr1' '-searchLocus' '1' '-searchWindow' '1' '-searchMinimumSize' '100' '-searchMaximumSize' '1000000' '-storeReadPairFile' 'true'
ERROR 14:38:00,879 FunctionEdge - Contents of /home/termius/data/svtoolkit/test2/chr1/disc/logs/SVDiscovery-1.out:
INFO 14:37:36,523 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 14:37:36,524 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7.GS-r1941-0-gb493839, Compiled 2020/01/21 11:34:26
INFO 14:37:36,525 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 14:37:36,525 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 14:37:36,525 HelpFormatter - [Wed Aug 05 14:37:36 CST 2020] Executing on Linux 2.6.32-642.el6.x86_64 amd64
INFO 14:37:36,525 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14
INFO 14:37:36,528 HelpFormatter - Program Args: -T SVDiscoveryWalker -R /home/termius/data/svtoolkit/prepare/Glycine_max.Glycine_max_v2.0.dna.chromosome.1.fa -O /home/termius/data/svtoolkit/test2/chr1/disc/chr1.unfiltered.vcf -disableGATKTraversal true -md ./chr1/RUN/metadata -configFile /home/termius/software/svtoolkit/conf/genstrip_parameters.txt -runDirectory ./chr1/disc -genderMapFile ./chr1/RUN/metadata/sample_gender.report.txt -L 1 -runFilePrefix chr1 -searchLocus 1 -searchWindow 1 -searchMinimumSize 100 -searchMaximumSize 1000000 -storeReadPairFile true
INFO 14:37:36,530 HelpFormatter - Executing as ldl20190322@compute35 on Linux 2.6.32-642.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14.
INFO 14:37:36,531 HelpFormatter - Date/Time: 2020/08/05 14:37:36
INFO 14:37:36,531 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 14:37:36,531 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 14:37:36,535 05-Aug-2020 GenomeAnalysisEngine - Strictness is SILENT
INFO 14:37:36,603 05-Aug-2020 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 14:37:36,621 05-Aug-2020 IntervalUtils - Processing 56831395 bp from intervals
INFO 14:37:36,667 05-Aug-2020 GenomeAnalysisEngine - Preparing for traversal
INFO 14:37:36,670 05-Aug-2020 GenomeAnalysisEngine - Done preparing for traversal
INFO 14:37:36,670 05-Aug-2020 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 14:37:36,670 05-Aug-2020 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 14:37:36,671 05-Aug-2020 ProgressMeter - Location | reads | elapsed | reads | completed | runtime | runtime
INFO 14:37:36,671 05-Aug-2020 SVDiscovery - Initializing SVDiscovery ...
INFO 14:37:36,671 05-Aug-2020 SVDiscovery - Reading configuration file ...
INFO 14:37:36,674 05-Aug-2020 SVDiscovery - Read configuration file.
INFO 14:37:36,674 05-Aug-2020 SVDiscovery - Opening reference sequence ...
INFO 14:37:36,674 05-Aug-2020 SVDiscovery - Opened reference sequence.
INFO 14:37:36,674 05-Aug-2020 SVDiscovery - Initializing input data set ...
INFO 14:37:36,694 05-Aug-2020 SVDiscovery - Initialized data set: 4 files, 4 read groups, 4 samples.
INFO 14:37:36,695 05-Aug-2020 MetaData - Opening metadata ...
INFO 14:37:36,695 05-Aug-2020 MetaData - Adding metadata location ./chr1/RUN/metadata ...
INFO 14:37:36,696 05-Aug-2020 MetaData - Opened metadata.
INFO 14:37:36,697 05-Aug-2020 SVDiscovery - Opened metadata.
INFO 14:37:36,701 05-Aug-2020 MetaData - Loading insert size distributions ...
INFO 14:37:36,743 05-Aug-2020 SVDiscovery - Processing locus: 1:0-0:100-1000000
INFO 14:37:36,743 05-Aug-2020 SVDiscovery - Locus search window: 1:0-0
INFO 14:37:58,209 05-Aug-2020 SVDiscovery - Discovery alt home filtering is disabled.
INFO 14:37:58,846 05-Aug-2020 SVDiscovery - Processing clusters ...
##### ERROR --
##### ERROR stack trace
java.lang.RuntimeException: No gender data for sample: Z_047
at org.broadinstitute.sv.discovery.ClusterMembershipModule.init(ClusterMembershipModule.java:180)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.createMembershipModule(DeletionDiscoveryAlgorithm.java:1485)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.initClusterModules(DeletionDiscoveryAlgorithm.java:1362)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.processClusters(DeletionDiscoveryAlgorithm.java:436)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.runDiscovery(DeletionDiscoveryAlgorithm.java:222)
at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:110)
at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:41)
at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:115)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:145)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:95)
at org.broadinstitute.sv.main.SVDiscovery.main(SVDiscovery.java:21)
-
Hi termius, what version of GATK are you running? And what tool are you running? We recommend that you submit GATK commands using the gatk wrapper script: https://gatk.broadinstitute.org/hc/en-us/articles/360035531892-GATK4-command-line-syntax
-
I am using the SVDiscovery pipeline from svtoolkit_2.00.1949.
-
Bob Handsaker, this question is for you.
-
The high level answer is that I believe the genderMapFile is not required, but if you provide one, then it must have an entry for every sample in your cohort.
If you are getting an error when providing no genderMapFile, feel free to post it and I will take a look.
If you want to provide a "fake" gender map file for your plant samples, that is fine too. It looks like you are using ./chr1/RUN/metadata/sample_gender.report.txt. What are the contents of that file?
-
./chr1/RUN/metadata/sample_gender.report.txt is an empty file; it contains nothing.
-
I would try removing this argument and not passing an (empty) gender map.
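One way to avoid passing an empty file by accident is to add the argument only when the file actually has content. This is just a sketch: gender_args is a hypothetical helper name, $genderfile is the variable from the script above, and it assumes the path contains no spaces.

```shell
# Print "-genderMapFile <path>" only if the file exists and is non-empty.
# gender_args is a hypothetical helper, not part of SVToolkit.
gender_args() {
    # $1: path to a candidate gender map file
    if [ -s "$1" ]; then
        printf '%s %s\n' "-genderMapFile" "$1"
    fi
}

# Fall back to an illustrative file name if genderfile is not set.
genderfile=${genderfile:-gender.map}
gender_opt=$(gender_args "$genderfile")
echo "extra arguments: ${gender_opt:-none}"
```

In the Queue command, the fixed "-genderMapFile $genderfile \" line would then become "$gender_opt \" (left unquoted so that it splits into two words, or into nothing when the file is empty).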
-
Hello Bob Handsaker,
I am encountering the same error:
ERROR stack trace
java.lang.RuntimeException: No gender data for sample: 20
My gender.map file is actually not empty and my assumption is that I'm using the wrong gender map file.
I was wondering if there was a gender map file available for the HG002.hs37d5 file that I could use?
Thank you for your help!
-
According to Coriell, HG002 (NA24385) is male.
https://www.coriell.org/0/Sections/Search/Sample_Detail.aspx?Ref=NA24385&Product=DNA
-
What is the format of the gender map file for human data, and what would a gender map file for a male sample look like? I have an example of a gender map file for mouse data that has two columns with ones and twos. Does 1 represent male and 2 female?
I appreciate your help!
-
I thought this was documented somewhere, but it looks like it isn't!
The file should be tab delimited, with a header line containing at least the two column names "SAMPLE" and "GENDER" (they can be in any position and other columns can be included; when GS generates one via CallSampleGender, it includes extra columns).
The values for GENDER can be M, Male or 1 for male, and F, Female or 2 for female. NA is also allowed for samples with unknown or atypical sex chromosome status.
-
The SAMPLE column must match the sample name (from the @RG:SM tag) in the input bam/cram files.
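As a minimal sketch of that layout, the following writes such a file. The sample names are the ones from the gender.map at the top of this thread, the output file name is illustrative, and the value 1 is only a placeholder for plant samples that have no real sex.

```shell
# Write a tab-delimited gender map with the required header line.
# The SAMPLE values must match the @RG SM tags of your own BAM/CRAM files.
genderfile=gender.map
printf 'SAMPLE\tGENDER\n' > "$genderfile"
for sample in Z_045 Z_046 Z_047 Z_048; do
    # 1 is a placeholder here; for real data use M/Male/1, F/Female/2 or NA
    printf '%s\t%s\n' "$sample" 1 >> "$genderfile"
done
cat "$genderfile"
```

Using printf with an explicit \t avoids the common problem of an editor silently turning tabs into spaces.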
-
This is very helpful! Thank you so much!
-
Hi Bob Handsaker,
I created the map file following your instructions. Here's a view of it:
HG002 1
However, I'm still getting the same error:
ERROR stack trace
java.lang.RuntimeException: No gender data for sample: 20
Do you have any suggestions on what I can do? Appreciate your help!
-
Presumably you also included a header line in the file, as described.
Do you have a sample with ID "20"? That's the sample the error is about, not "HG002".
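A quick way to check this is to compare the SM tags in the BAM headers against the SAMPLE column of the gender map. The sketch below assumes the header text comes from something like "samtools view -H your.bam"; the function names are hypothetical, and the header and map used in the demo are made up purely for illustration.

```shell
# Extract unique sample names (SM tags) from SAM header text on stdin.
samples_in_header() {
    awk -F'\t' '$1 == "@RG" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print substr($i, 4)
    }' | sort -u
}

# Print every sample listed in $1 (one per line) that has no entry in the
# SAMPLE column of the tab-delimited gender map $2.
missing_from_gender_map() {
    awk -F'\t' -v map="$2" '
        BEGIN {
            # The first line of the map is the header: locate the SAMPLE column.
            if ((getline line < map) > 0) {
                n = split(line, h, "\t")
                for (i = 1; i <= n; i++) if (h[i] == "SAMPLE") col = i
                while ((getline line < map) > 0) {
                    split(line, f, "\t")
                    if (col) have[f[col]] = 1
                }
            }
        }
        !($1 in have) { print $1 }
    ' "$1"
}

# Demo with a made-up header and map; with real data you would run e.g.
#   samtools view -H your.bam | samples_in_header > samples.txt
tmp=$(mktemp -d)
printf '@HD\tVN:1.6\n@RG\tID:rg1\tSM:HG002\n@RG\tID:rg2\tSM:20\n' \
    | samples_in_header > "$tmp/samples.txt"
printf 'SAMPLE\tGENDER\nHG002\tM\n' > "$tmp/gender.map"
missing=$(missing_from_gender_map "$tmp/samples.txt" "$tmp/gender.map")
echo "samples without gender data: $missing"
```

Any name this prints needs its own line in the gender map, or the SM tag in the BAM header needs to be corrected.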
-
I was able to resolve my issue! Thank you for your help.