gCNV SVLEN question
Hi, I added this on GitHub, but this may be a better place for it:
---------------------
I am running GermlineCNVCaller and PostprocessGermlineCNVCalls (GATK v4.2.5) for CNV analysis on our targeted capture.
My output segment vcfs have no SVLEN or SVTYPE values although those are described in their headers.
Info from header includes:
##INFO=<ID=AC_Orig,Number=A,Type=Integer,Description="Original AC">
##INFO=<ID=AF_Orig,Number=A,Type=Float,Description="Original AF">
##INFO=<ID=AN_Orig,Number=1,Type=Integer,Description="Original AN">
##INFO=<ID=END,Number=1,Type=Integer,Description="End coordinate of the variant">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
But the actual vcf output only has the END variable.
Example output:
13 32839931 CNV_13_32839931_32945267 N . 3076.53 . END=32945267 GT:CN:NP:QA:QS:QSE:QSS 0/0:2:63:169:3077:523:342
13 32950659 CNV_13_32950659_32954345 N <DEL> 3076.53 . END=32954345 GT:CN:NP:QA:QS:QSE:QSS 0/1:1:7:709:3077:709:831
13 32968699 CNV_13_32968699_73961012 N . 3076.53 . END=73961012 GT:CN:NP:QA:QS:QSE:QSS 0/0:2:14:210:3077:295:630
14 24883828 CNV_14_24883828_94854954 N . 3076.53 . END=94854954 GT:CN:NP:QA:QS:QSE:QSS 0/0:2:72:100:3077:287:299
15 32992921 CNV_15_32992921_91535389 N . 3076.53 . END=91535389 GT:CN:NP:QA:QS:QSE:QSS 0/0:2:35:102:3077:198:331
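One way to confirm the mismatch between the header and the records is to compare the INFO IDs the header declares against the keys that actually occur in the INFO column. The following is only a minimal sketch: the function name `info_keys_declared_vs_used` is hypothetical, the sample text is abbreviated from the output above, and fields are split on whitespace so that it also works on text pasted from a forum post (real VCFs are tab-delimited).

```python
import re


def info_keys_declared_vs_used(vcf_text):
    """Return (declared, used): INFO IDs from the header vs. keys seen in records."""
    declared, used = set(), set()
    for line in vcf_text.splitlines():
        if line.startswith("##INFO=<ID="):
            # Header line: pull out the ID up to the first comma or '>'.
            declared.add(re.match(r"##INFO=<ID=([^,>]+)", line).group(1))
        elif line and not line.startswith("#"):
            # Record line: INFO is the 8th column. Whitespace split is an
            # assumption made so pasted (space-separated) text also parses.
            info = line.split()[7]
            if info != ".":
                for entry in info.split(";"):
                    used.add(entry.split("=", 1)[0])
    return declared, used


sample = """##INFO=<ID=END,Number=1,Type=Integer,Description="End coordinate of the variant">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
13 32839931 CNV_13_32839931_32945267 N . 3076.53 . END=32945267 GT:CN:NP:QA:QS:QSE:QSS 0/0:2:63:169:3077:523:342
13 32950659 CNV_13_32950659_32954345 N <DEL> 3076.53 . END=32954345 GT:CN:NP:QA:QS:QSE:QSS 0/1:1:7:709:3077:709:831
"""

declared, used = info_keys_declared_vs_used(sample)
# IDs that are declared in the header but never populated in any record.
unused = sorted(declared - used)
print(unused)
```

On the abbreviated sample this reports SVLEN and SVTYPE as declared but never populated, which matches what the segments VCF shows.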
The commands I am running are below:
docker run -v /home/dnanexus/inputs:/data $GATK_image gatk GermlineCNVCaller \
-L /data/beds/filtered.interval_list -imr OVERLAPPING_ONLY \
--annotated-intervals /data/beds/annotated_intervals.tsv \
--run-mode COHORT \
$batch_input \
--contig-ploidy-calls /data/ploidy-dir/ploidy-calls/ \
--output-prefix CNV \
-O /data/gCNV-dir
parallel --jobs 8 '/usr/bin/time -v docker run -v /home/dnanexus/inputs:/data $GATK_image \
gatk PostprocessGermlineCNVCalls \
--sample-index {} \
--autosomal-ref-copy-number 2 \
--allosomal-contig X \
--allosomal-contig Y \
--contig-ploidy-calls /data/ploidy-dir/ploidy-calls \
--calls-shard-path /data/gCNV-dir/CNV-calls \
--model-shard-path /data/gCNV-dir/CNV-model \
--output-genotyped-intervals /data/vcfs/sample_{}_intervals.vcf \
--output-genotyped-segments /data/vcfs/sample_{}_segments.vcf \
--output-denoised-copy-ratios /data/vcfs/sample_{}_denoised_copy_ratios.tsv
Are there any options to output SVLEN or is it expected to be in the vcf output?
I understand it is the difference between REF and ALT, as described in the header, but will it ever be output? The REF in our output is always N.
Many thanks,
Adriana :)
-
Thank you for your post, Adriana! I want to let you know we have received your question. We'll get back to you if we have any updates or follow-up questions.
Please see our Support Policy for more details about how we prioritize responding to questions.
-
Hi Adriana!
It looks like I was the one who added those to the GCNV segments VCF header. :-) PostprocessGermlineCNVCalls also gets used in the GCNV joint calling pipeline, which is where those annotations actually have informative values. (They get added by the JointGermlineCNVSegmentation tool.) So, no, you shouldn't expect those annotations to show up if you're just running on a single sample. I'll see if there's a way I can limit the header output to just the multi-sample model to avoid confusion in the future. I'm glad to see GCNV is getting some use outside the Broad!
Best,
Laura
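
For single-sample output where SVLEN and SVTYPE are not written, the values can be approximated downstream from POS, END, and ALT. This is only a sketch under stated assumptions, not what JointGermlineCNVSegmentation does: the helper name `derive_sv_annotations` is hypothetical, the length is taken as END minus POS, deletions are given a negative sign (one common VCF convention), and each record is assumed to carry a single symbolic ALT allele such as <DEL> or <DUP>. The joint-calling tool may use a different sign or off-by-one convention.

```python
def derive_sv_annotations(record_line):
    """Return (svtype, svlen) for one segment record, or (None, None) if ALT is '.'."""
    # Whitespace split so that pasted, space-separated records also parse;
    # real VCF records are tab-delimited.
    fields = record_line.split()
    pos, alt, info = int(fields[1]), fields[4], fields[7]

    # Parse key=value INFO entries; flag-style entries without '=' are skipped.
    info_map = dict(entry.split("=", 1) for entry in info.split(";") if "=" in entry)
    end = int(info_map["END"])

    if alt == ".":
        # Reference (copy-neutral) segment: nothing to annotate.
        return None, None

    svtype = alt.strip("<>")  # "<DEL>" -> "DEL"
    span = end - pos          # assumption: length defined as END - POS
    # Assumption: deletions reported as negative lengths, everything else positive.
    svlen = -span if svtype == "DEL" else span
    return svtype, svlen


del_record = ("13 32950659 CNV_13_32950659_32954345 N <DEL> 3076.53 . "
              "END=32954345 GT:CN:NP:QA:QS:QSE:QSS 0/1:1:7:709:3077:709:831")
ref_record = ("13 32839931 CNV_13_32839931_32945267 N . 3076.53 . "
              "END=32945267 GT:CN:NP:QA:QS:QSE:QSS 0/0:2:63:169:3077:523:342")

print(derive_sv_annotations(del_record))  # -> ('DEL', -3686)
print(derive_sv_annotations(ref_record))  # -> (None, None)
```

The deletion in the example output spans 32954345 - 32950659 = 3686 bases under this definition; copy-neutral segments with ALT "." are left unannotated.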
-
P.S. I think you can change those Ns to the actual reference base at that position if you supply a reference FASTA with the -R argument. Without it, N shows up in the REF column of the VCF. Either way, I expect the INFO field to contain only END.
-
Hi Adriana,
We haven't heard from you in a while so we're going to close out this ticket. If you still require assistance, simply respond to this email and we'll be happy to pick up where we left off!
Kind regards,
Anthony