Need help with converting VCF to SEG file for CNV visualization
AnsweredHi,
I have gotten to the last step of the germline CNV calling and am now trying to visualize the results. It was recommended in the After CNV calling considerations to create a SEG file to color the copy number states. I am trying to follow the Unix shell commands suggested to convert to SEG format, but I am getting errors that I am not sure how to interpret. It is obviously a problem with gt given by the error but since I'm not sure exactly what the commands are doing I do not know how to fix the issue. I used GATK 4.2.0.0 to generate the files.
b) Exact command used:
sampleName=$(zcat genotyped-segments-cohort_47_RUN2.vcf.gz | grep -v '##' | head -n1 | cut -f10)
awk -v sampleName=$sampleName 'BEGIN {FS=OFS="\t"} {print sampleName, $0}' genotyped-segments-cohort_47_RUN2.table.txt> genotyped-segments-cohort_47_RUN2.seg; head genotyped-segments-cohort_47_RUN2.seg
c) Entire program log:
[1] 63227
S000021_S5041Nr53 CHROM POS END S000021_S5041Nr53.NP S000021_S5041Nr53.CN
S000021_S5041Nr53 chr1 923176 2430909 567 2
< gives all CNVs called >
gt: error: neither tool nor script specified; option -help lists possible tools
[1]+ Done awk -v sampleName=$sampleName 'BEGIN {FS=OFS="\t"} {print sampleName, $0}' genotyped-segments-cohort_47_RUN2.table.txt
genotyped-segments-cohort_47_RUN2.seg: command not found
head: cannot open 'genotyped-segments-cohort_47_RUN2.seg' for reading: No such file or directory
-
Hi Sophia
I'm not a bash ninja, but I think there's a format conversion bug in the code on the forum. I suspect ">" is supposed to be ">" so that the output of awk is directed to an output file called genotyped-segments-cohort_47_RUN2.seg and then we run "head" on that file to double check that it looks as expected. The fact that it's outputting all the CNVs to stdout is more support for this theory. Try converting any ">" instances to ">"s and let me know what happens. If that fix works then we'll update the gCNV article.
-Laura
-
That seems to have worked! Here's the exact code I used to be clear:
sampleName=$(zcat genotyped-segments-cohort_47_RUN2.vcf.gz | grep -v '##' | head -n1 | cut -f10)
awk -v sampleName=$sampleName 'BEGIN {FS=OFS="\t"} {print sampleName, $0}' genotyped-segments-cohort_47_RUN2.table.txt > genotyped-segments-cohort_47_RUN2.seg; head genotyped-segments-cohort_47_RUN2.seg
I want to point out I had to use "zcat" instead of "gzcat" as the article says to use, but that may be because I am on a Linux system.
Thanks again for the help!
-
Great news! I will make the changes in the gCNV article so this doesn't happen to anyone else.
Please sign in to leave a comment.
3 comments