Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Need help with converting VCF to SEG file for CNV visualization

Answered

3 comments

  • Laura Gauthier

    Hi Sophia

    I'm not a bash ninja, but I think there's a format conversion bug in the code on the forum.  I suspect "&gt" is supposed to be ">" so that the output of awk is directed to an output file called genotyped-segments-cohort_47_RUN2.seg and then we run "head" on that file to double check that it looks as expected.  The fact that it's outputting all the CNVs to stdout is more support for this theory.  Try converting any "&gt" instances to ">"s and let me know what happens.  If that fix works then we'll update the gCNV article.

    -Laura

  • Sophia

    That seems to have worked! To be clear, here's the exact code I used:

    sampleName=$(zcat genotyped-segments-cohort_47_RUN2.vcf.gz | grep -v '##' | head -n1 | cut -f10)

    awk -v sampleName=$sampleName 'BEGIN {FS=OFS="\t"} {print sampleName, $0}' genotyped-segments-cohort_47_RUN2.table.txt > genotyped-segments-cohort_47_RUN2.seg; head genotyped-segments-cohort_47_RUN2.seg

    Note that I had to use "zcat" instead of the "gzcat" the article calls for, but that may be because I am on a Linux system.

    Thanks again for the help!

  • Laura Gauthier

    Great news!  I will make the changes in the gCNV article so this doesn't happen to anyone else.

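For anyone who hits both issues from this thread (the "&gt" redirect and zcat vs. gzcat), here is a minimal sketch of the same two steps wrapped in a function. It is not the gCNV article's code: the function name vcf_table_to_seg is hypothetical, and using "gzip -dc" in place of zcat/gzcat is an assumption made so the same line should run on both Linux and macOS. Like the command Sophia posted, it prepends the sample name to every row of the table, including any header row.

```shell
#!/usr/bin/env bash
# Sketch only: vcf_table_to_seg is a hypothetical helper, not part of GATK.
# Arguments: <segments VCF (.vcf.gz)> <tab-delimited table> <output .seg>
vcf_table_to_seg() {
    local vcf_gz=$1 table=$2 seg=$3
    local sample_name

    # "gzip -dc" writes the decompressed VCF to stdout on both Linux and
    # macOS, which sidesteps the zcat/gzcat naming difference.
    # The first line that does not start with "##" is the #CHROM header;
    # its 10th tab-separated field is the first sample name.
    sample_name=$(gzip -dc "$vcf_gz" | awk -F'\t' '!/^##/ {print $10; exit}')

    # A plain ">" (not "&gt") redirects awk's output into the .seg file.
    # The sample name is prepended to every row, as in the original command.
    awk -v sampleName="$sample_name" \
        'BEGIN {FS=OFS="\t"} {print sampleName, $0}' "$table" > "$seg"
}
```

With the file names from this thread, the call would look like: vcf_table_to_seg genotyped-segments-cohort_47_RUN2.vcf.gz genotyped-segments-cohort_47_RUN2.table.txt genotyped-segments-cohort_47_RUN2.seg; head genotyped-segments-cohort_47_RUN2.seg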
