Introduction
The GATK-SV pipeline outputs structural variant records in VCF format. A structural variant (SV) VCF closely resembles a standard short variant VCF, with some key differences needed to fully describe the complexity of structural variants. We have a great article on our site that goes over the basics of a VCF: VCF - Variant Call Format. In this article, we focus on the differences you will see in an SV VCF as compared to a short variant VCF.
GATK-SV follows the VCF 4.2 specification, with some differences. We aim to produce a VCF that is easy to read by eye while still meeting the spec and remaining machine readable. Note that there are multiple ways to represent breakends, complex events, and translocations. The format that meets the VCF specification represents these events as multiple records. In our output, you will also sometimes see these events as one record with END tags to describe the structural variant.
We are continuing to develop the pipeline and improve the VCF output so there might be changes in the future that are not fully reflected in this article. Please write to us on the GATK forum if you have questions about the current state of the GATK-SV VCF.
While many different tools use VCFs to describe structural variants, please keep in mind that this article only applies to GATK-SV. You can read more about GATK-SV and how to get started on our GitHub repo.
Header
The header of the SV VCF contains standard descriptions of what you will find in the VCF file. Here is an example of an SV VCF header:
Example of a VCF header
##fileformat=VCFv4.2
##ALT=<ID=BND,Description="Translocation">
##ALT=<ID=CNV,Description="Copy Number Polymorphism">
##ALT=<ID=CPX,Description="Complex SV">
##ALT=<ID=CTX,Description="Reciprocal chromosomal translocation">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INS,Description="Insertion">
##ALT=<ID=INS:ME,Description="Mobile element insertion of unspecified ME class">
##ALT=<ID=INS:ME:ALU,Description="Alu element insertion">
##ALT=<ID=INS:ME:LINE1,Description="LINE1 element insertion">
##ALT=<ID=INS:ME:SVA,Description="SVA element insertion">
##ALT=<ID=INS:UNK,Description="Sequence insertion of unspecified origin">
##ALT=<ID=INV,Description="Inversion">
##CPX_TYPE_INS_iDEL="Insertion with deletion at insertion site."
##CPX_TYPE_INVdel="Complex inversion with 3' flanking deletion."
##CPX_TYPE_INVdup="Complex inversion with 3' flanking duplication."
##CPX_TYPE_dDUP="Dispersed duplication."
##CPX_TYPE_dDUP_iDEL="Dispersed duplication with deletion at insertion site."
##CPX_TYPE_delINV="Complex inversion with 5' flanking deletion."
##CPX_TYPE_delINVdel="Complex inversion with 5' and 3' flanking deletions."
##CPX_TYPE_delINVdup="Complex inversion with 5' flanking deletion and 3' flanking duplication."
##CPX_TYPE_dupINV="Complex inversion with 5' flanking duplication."
##CPX_TYPE_dupINVdel="Complex inversion with 5' flanking duplication and 3' flanking deletion."
##CPX_TYPE_dupINVdup="Complex inversion with 5' and 3' flanking duplications."
##CPX_TYPE_piDUP_FR="Palindromic inverted tandem duplication, forward-reverse orientation."
##CPX_TYPE_piDUP_RF="Palindromic inverted tandem duplication, reverse-forward orientation."
##FILTER=<ID=BOTHSIDES_SUPPORT,Description="Variant has read-level support for both sides of breakpoint">
##FILTER=<ID=HIGH_SR_BACKGROUND,Description="High number of SR splits in background samples indicating messy region">
##FILTER=<ID=MULTIALLELIC,Description="Multiallelic site">
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=PESR_GT_OVERDISPERSION,Description="High PESR dispersion count">
##FILTER=<ID=UNRESOLVED,Description="Variant is unresolved">
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Predicted copy state">
##FORMAT=<ID=CNQ,Number=1,Type=Integer,Description="Read-depth genotype quality">
##FORMAT=<ID=EV,Number=1,Type=String,Description="Classes of evidence supporting final genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PE_GQ,Number=1,Type=Integer,Description="Paired-end genotype quality">
##FORMAT=<ID=PE_GT,Number=1,Type=Integer,Description="Paired-end genotype">
##FORMAT=<ID=RD_CN,Number=1,Type=Integer,Description="Predicted copy state">
##FORMAT=<ID=RD_GQ,Number=1,Type=Integer,Description="Read-depth genotype quality">
##FORMAT=<ID=SR_GQ,Number=1,Type=Integer,Description="Split read genotype quality">
##FORMAT=<ID=SR_GT,Number=1,Type=Integer,Description="Split-read genotype">
##INFO=<ID=ALGORITHMS,Number=.,Type=String,Description="Source algorithms">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate">
##INFO=<ID=CPX_INTERVALS,Number=.,Type=String,Description="Genomic intervals constituting complex variant.">
##INFO=<ID=CPX_TYPE,Number=1,Type=String,Description="Class of complex variant.">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
##INFO=<ID=END2,Number=1,Type=Integer,Description="Position of breakpoint on CHR2">
##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
##INFO=<ID=EVIDENCE,Number=.,Type=String,Description="Classes of random forest support.">
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">
##INFO=<ID=PREDICTED_BREAKEND_EXONIC,Number=.,Type=String,Description="Gene(s) for which the SV breakend is predicted to fall in an exon.">
##INFO=<ID=PREDICTED_COPY_GAIN,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to have a copy-gain effect.">
##INFO=<ID=PREDICTED_DUP_PARTIAL,Number=.,Type=String,Description="Gene(s) which are partially overlapped by an SV's duplication, but the transcription start site is not duplicated.">
##INFO=<ID=PREDICTED_INTERGENIC,Number=0,Type=Flag,Description="SV does not overlap coding sequence.">
##INFO=<ID=PREDICTED_INTRAGENIC_EXON_DUP,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to result in intragenic exonic duplication without breaking any coding sequences.">
##INFO=<ID=PREDICTED_INTRONIC,Number=.,Type=String,Description="Gene(s) where the SV was found to lie entirely within an intron.">
##INFO=<ID=PREDICTED_INV_SPAN,Number=.,Type=String,Description="Gene(s) which are entirely spanned by an SV's inversion.">
##INFO=<ID=PREDICTED_LOF,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to have a loss-of-function effect.">
##INFO=<ID=PREDICTED_MSV_EXON_OVERLAP,Number=.,Type=String,Description="Gene(s) on which the multiallelic SV would be predicted to have a LOF, INTRAGENIC_EXON_DUP, COPY_GAIN, DUP_PARTIAL, or PARTIAL_EXON_DUP annotation if the SV were biallelic.">
##INFO=<ID=PREDICTED_NEAREST_TSS,Number=.,Type=String,Description="Nearest transcription start site to intragenic variants.">
##INFO=<ID=PREDICTED_NONCODING_BREAKPOINT,Number=.,Type=String,Description="Class(es) of noncoding elements disrupted by SV breakpoint.">
##INFO=<ID=PREDICTED_NONCODING_SPAN,Number=.,Type=String,Description="Class(es) of noncoding elements spanned by SV.">
##INFO=<ID=PREDICTED_PARTIAL_EXON_DUP,Number=.,Type=String,Description="Gene(s) where the duplication SV has one breakpoint in the coding sequence.">
##INFO=<ID=PREDICTED_PROMOTER,Number=.,Type=String,Description="Gene(s) for which the SV is predicted to overlap the promoter region.">
##INFO=<ID=PREDICTED_TSS_DUP,Number=.,Type=String,Description="Gene(s) for which the SV is predicted to duplicate the transcription start site.">
##INFO=<ID=PREDICTED_UTR,Number=.,Type=String,Description="Gene(s) for which the SV is predicted to disrupt a UTR.">
##INFO=<ID=SOURCE,Number=1,Type=String,Description="Source of inserted sequence.">
##INFO=<ID=STRANDS,Number=1,Type=String,Description="Breakpoint strandedness [++,+-,-+,--]">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV length">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=UNRESOLVED_TYPE,Number=1,Type=String,Description="Class of unresolved variant.">
##bcftools_annotateCommand=annotate -a bad_ends.txt.gz -c CHROM,POS,REF,ALT,END integ_test.vcf.gz; Date=Wed Dec 15 22:00:46 2021
##bcftools_annotateVersion=1.7+htslib-1.7
##bcftools_concatCommand=concat -a --allow-overlaps --output-type z --file-list /cromwell_root/broad-methods-cromwell-exec-bucket-v51/GATKSVPipelineBatch/4b12afc9-8170-4fd3-8303-121bfe5671d7/call-Module0506/Module05_06/4657bd7d-24a5-452e-a3c6-4f7f07e23635/call-ConcatCleanedVcfs/write_lines_383e83615e0d0bd1e4649b6066165641.tmp --output ref_panel_1kg_v1.cleaned.vcf.gz; Date=Tue Oct 27 12:39:48 2020
##bcftools_concatVersion=1.9+htslib-1.9
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr10,length=133797422>
##contig=<ID=chr11,length=135086622>
##contig=<ID=chr12,length=133275309>
##contig=<ID=chr13,length=114364328>
##contig=<ID=chr14,length=107043718>
##contig=<ID=chr15,length=101991189>
##contig=<ID=chr16,length=90338345>
##contig=<ID=chr17,length=83257441>
##contig=<ID=chr18,length=80373285>
##contig=<ID=chr19,length=58617616>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr21,length=46709983>
##contig=<ID=chr22,length=50818468>
##contig=<ID=chr3,length=198295559>
##contig=<ID=chr4,length=190214555>
##contig=<ID=chr5,length=181538259>
##contig=<ID=chr6,length=170805979>
##contig=<ID=chr7,length=159345973>
##contig=<ID=chr8,length=145138636>
##contig=<ID=chr9,length=138394717>
##contig=<ID=chrX,length=156040895>
##contig=<ID=chrY,length=57227415>
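If you want to look up header descriptions programmatically, the structured header lines can be read with a few lines of Python. This is only a minimal sketch, not part of GATK-SV: the function name parse_meta_line is made up for illustration, and it assumes each header line starts with ## as required by the VCF specification. In practice an established VCF library is usually a better choice.

```python
import re

# Matches key=value pairs inside <...>; quoted values may contain commas.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_meta_line(line):
    """Parse a structured header line such as ##INFO=<ID=...,...>.

    Returns (kind, fields), or None for unstructured lines
    such as ##fileformat=VCFv4.2. Hypothetical helper for illustration.
    """
    m = re.match(r'^##(\w+)=<(.*)>\s*$', line)
    if m is None:
        return None
    kind, body = m.group(1), m.group(2)
    fields = {key: value.strip('"') for key, value in _PAIR.findall(body)}
    return kind, fields


# A few lines copied from the example header.
header = [
    '##fileformat=VCFv4.2',
    '##ALT=<ID=DEL,Description="Deletion">',
    '##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">',
    '##INFO=<ID=STRANDS,Number=1,Type=String,Description="Breakpoint strandedness [++,+-,-+,--]">',
]

parsed = [p for p in map(parse_meta_line, header) if p is not None]
info_defs = {fields["ID"]: fields for kind, fields in parsed if kind == "INFO"}
print(info_defs["END"]["Description"])
```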
Structure of variant call records - site level
CHROM and POS
The contig and position use the same format as a standard short variant VCF file. A structural variant is represented as a closed interval, and the start position represents the start of the interval. The end position of the structural variant appears as the END annotation in the INFO field.
For deletions and duplications, the position is the first base of the deleted or duplicated interval. For insertions and inversions, the position is the base before the insertion or inversion. The end position is the base after the inserted sequence.
In single sample mode, translocations are broken up into 4 records describing the position of the translocation at all breakpoints. In cohort mode, a translocation is described with tags to describe the start and end positions on each chromosome. If the translocation involves an insertion, the position will be the base before the insertion. If the translocation involves a deletion, the position will be the first base of the deleted interval.
For breakends and complex variants, the structural variant types that make up the variant determine what the position looks like. A breakend can contain a second position if a translocation is part of the breakend. For complex variants, the most reliable way to see the intervals involved is to look at the CPX_INTERVALS tag. If the complex variant involves an insertion, the position gives the start of the insertion and the END tag gives where the interval ends.
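As an illustration of reading CPX_INTERVALS, here is a small Python sketch. The function name parse_cpx_intervals is hypothetical, and the sketch assumes the value always follows the TYPE_contig:start-end pattern seen in the example records in this article.

```python
def parse_cpx_intervals(value):
    """Split a CPX_INTERVALS value into (svtype, contig, start, end) tuples.

    Assumes each comma-separated item looks like DUP_chr21:39309541-39309780.
    """
    intervals = []
    for item in value.split(","):
        svtype, region = item.split("_", 1)      # "DUP", "chr21:39309541-39309780"
        contig, span = region.rsplit(":", 1)     # "chr21", "39309541-39309780"
        start, end = span.split("-")
        intervals.append((svtype, contig, int(start), int(end)))
    return intervals


# Value taken from the ref_panel_1kg_v1_CPX_chr21_6 example record.
for interval in parse_cpx_intervals(
    "DUP_chr21:39309541-39309780,INV_chr21:39309541-39889653"
):
    print(interval)
```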
ID
GATK-SV outputs an ID for each variant record. The ID gives the sample or cohort name, the SV type, the chromosome, and a number representing the count on the chromosome, all separated by underscores. A suffix such as M1, M2, etc. indicates mate records, which appear when single sample mode breaks up a structural variant into multiple records. This is what the ID looks like:
<sample or cohort name>_<SV type>_<chromosome>_<number>
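Because the sample or cohort name can itself contain underscores, it is easier to read the ID from the right-hand side. The following Python sketch shows one way to do that. The function name parse_sv_id and the regular expression are assumptions made for illustration, and the pattern assumes an uppercase SV type and a contig name without underscores, as in the examples in this article.

```python
import re

# name may contain underscores; the trailing fields are matched from the right.
_ID = re.compile(
    r'^(?P<name>.+)_(?P<svtype>[A-Z]+)_(?P<contig>[^_]+)_(?P<number>\d+)'
    r'(?:_(?P<mate>M\d+))?$'
)


def parse_sv_id(variant_id):
    """Split a GATK-SV style ID into its parts (hypothetical helper)."""
    m = _ID.match(variant_id)
    if m is None:
        raise ValueError(f"unrecognized ID: {variant_id}")
    parts = m.groupdict()
    parts["number"] = int(parts["number"])
    return parts  # parts["mate"] is None when there is no M1/M2 suffix


print(parse_sv_id("ref_panel_1kg_v1_CTX_chr2_1_M1"))
print(parse_sv_id("ref_panel_1kg_v1_DEL_chr22_119"))
```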
REF and ALT
The REF field contains the allele before the start position of the structural variant. However, this field is not very informative and is rarely used, so it often contains an N. In single sample mode, GATK-SV goes through the VCF and corrects the REF field, but this is not the case for cohort mode.
The ALT field indicates the structural variant type. The structural variant types are described in the header and in our glossary document. In single sample mode, when breakends and translocations are broken up into multiple records, the ALT field displays a symbolic allele. The symbolic allele format is described in detail in the VCF specifications.
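To make the two ALT styles concrete, here is a hedged Python sketch that tells an angle-bracket type such as <DEL> apart from the bracketed mate-position strings that appear in the split records (for example G]chr19:424309]). The function name describe_alt and the returned dictionary layout are invented for illustration.

```python
import re

# A mate position enclosed in square brackets, e.g. ]chr19:424309] or [chr19:424310[
_BND = re.compile(r'[\[\]](?P<contig>[^\[\]]+):(?P<pos>\d+)[\[\]]')


def describe_alt(alt):
    """Classify an ALT string (hypothetical helper, not a GATK-SV function)."""
    if alt.startswith("<") and alt.endswith(">"):
        return {"kind": "type", "svtype": alt[1:-1]}
    m = _BND.search(alt)
    if m:
        return {
            "kind": "breakend",
            "mate_contig": m.group("contig"),
            "mate_pos": int(m.group("pos")),
        }
    return {"kind": "sequence", "allele": alt}


print(describe_alt("<INS:ME:LINE1>"))
print(describe_alt("G]chr19:424309]"))
print(describe_alt("[chr2:86263977[C"))
```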
FILTER
We are still actively working on the filters, so keep in mind that this information can change! The FILTER field contains information regarding the structural variant. Some filters are negative: they indicate weak support for the structural variant. Other filters are positive: they indicate strong support for the structural variant. We are working on moving these positive filters into the INFO field soon. If the structural variant passes all the negative filters, there will be a PASS in the filter column.
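Since one FILTER value can mix both kinds of filters, it can help to separate them before deciding which records to keep. The sketch below is only an illustration under a loud assumption: this article does not list which filter names are positive, so the set POSITIVE_FILTERS contains only BOTHSIDES_SUPPORT, chosen because its header description reports supporting evidence. Adjust the set to match the current pipeline before relying on it.

```python
# ASSUMPTION: the set of "positive" filters is not defined in this article.
# BOTHSIDES_SUPPORT is listed only because its header description
# ("Variant has read-level support for both sides of breakpoint") reports support.
POSITIVE_FILTERS = {"BOTHSIDES_SUPPORT"}


def split_filters(filter_field):
    """Return (positive, other) filter names from a FILTER column value."""
    if filter_field in ("PASS", "."):
        return [], []
    names = filter_field.split(";")
    positive = [name for name in names if name in POSITIVE_FILTERS]
    other = [name for name in names if name not in POSITIVE_FILTERS]
    return positive, other


print(split_filters("BOTHSIDES_SUPPORT;PESR_GT_OVERDISPERSION;UNRESOLVED"))
print(split_filters("PASS"))
```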
INFO
The INFO field of the structural variant VCF contains variant-level annotations. The information in the INFO field is key for understanding the structural variant. Descriptions of all INFO field annotations are contained in the header. Some important annotations are the end coordinate details in END, CHR2, and END2; the algorithm(s) that originally called the variant in ALGORITHMS; and the structural variant length and type in SVLEN and SVTYPE.
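As a rough illustration of how these annotations can be pulled out of a record, the Python sketch below splits an INFO string into a dictionary. The function name parse_info is made up for this example. Keys without a value, such as PREDICTED_INTERGENIC, are treated as flags, and all other values are kept as strings.

```python
def parse_info(info):
    """Turn a semicolon-separated INFO string into a dict (illustrative helper).

    Entries without '=' are flags and map to True; values stay as strings.
    """
    result = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        result[key] = value if sep else True
    return result


# INFO column of the ref_panel_1kg_v1_DEL_chr22_1 example record.
info = parse_info(
    "ALGORITHMS=depth;CHR2=chr22;END=10694100;EVIDENCE=RD;PREDICTED_INTERGENIC;"
    "PREDICTED_NEAREST_TSS=OR11H1;PREDICTED_NONCODING_SPAN=DNase;SVLEN=184100;SVTYPE=DEL"
)
print(info["SVTYPE"], info["CHR2"], int(info["END"]), int(info["SVLEN"]))
print(info["ALGORITHMS"].split(","))
```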
You can see examples of the INFO field for various SV types in the example sites-only VCF file below. A sites-only VCF file contains the header and the site-level information but not the genotype or other sample-level information. The header for this sites-only VCF is the header example above.
Example of SV sites only VCF
#CHROM POS ID REF ALT QUAL FILTER INFO
chr2 86263976 ref_panel_1kg_v1_CTX_chr2_1 N <CTX> 999 BOTHSIDES_SUPPORT ALGORITHMS=manta;CHR2=chr19;CPX_TYPE=CTX_PP/QQ;END=86263977;END2=424309;EVIDENCE=PE,SR;PREDICTED_LOF=REEP1,SHC2;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;SVLEN=-1;SVTYPE=CTX
chr2 86263976 ref_panel_1kg_v1_CTX_chr2_1_M1 G G]chr19:424309] 999 BOTHSIDES_SUPPORT ALGORITHMS=manta;CPX_TYPE=CTX_PP/QQ;END2=424309;EVENT=ref_panel_1kg_v1_CTX_chr2_1;EVIDENCE=PE,SR;MATEID=ref_panel_1kg_v1_CTX_chr2_1_M2;PREDICTED_LOF=REEP1;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;SVTYPE=BND
chr2 86263977 ref_panel_1kg_v1_CTX_chr2_1_M3 A [chr19:424310[A 999 BOTHSIDES_SUPPORT ALGORITHMS=manta;CPX_TYPE=CTX_PP/QQ;END2=424309;EVENT=ref_panel_1kg_v1_CTX_chr2_1;EVIDENCE=PE,SR;MATEID=ref_panel_1kg_v1_CTX_chr2_1_M4;PREDICTED_LOF=REEP1;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;SVTYPE=BND
chr19 424309 ref_panel_1kg_v1_CTX_chr2_1_M2 T T]chr2:86263976] 999 BOTHSIDES_SUPPORT ALGORITHMS=manta;CPX_TYPE=CTX_PP/QQ;END2=424309;EVENT=ref_panel_1kg_v1_CTX_chr2_1;EVIDENCE=PE,SR;MATEID=ref_panel_1kg_v1_CTX_chr2_1_M2;PREDICTED_LOF=SHC2;PREDICTED_NONCODING_BREAKPOINT=DNase;SVTYPE=BND
chr19 424310 ref_panel_1kg_v1_CTX_chr2_1_M4 C [chr2:86263977[C 999 BOTHSIDES_SUPPORT ALGORITHMS=manta;CPX_TYPE=CTX_PP/QQ;END2=424309;EVENT=ref_panel_1kg_v1_CTX_chr2_1;EVIDENCE=PE,SR;MATEID=ref_panel_1kg_v1_CTX_chr2_1_M3;PREDICTED_LOF=SHC2;PREDICTED_NONCODING_BREAKPOINT=DNase;SVTYPE=BND
chr19 21647331 ref_panel_1kg_v1_INV_chr19_3 N <INV> 999 BOTHSIDES_SUPPORT ALGORITHMS=manta;CHR2=chr19;END=22062458;EVIDENCE=PE,SR;PREDICTED_INV_SPAN=ZNF100,ZNF208,ZNF43;PREDICTED_LOF=ZNF257;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;PREDICTED_NONCODING_SPAN=DNase,HAR;SVLEN=415127;SVTYPE=INV
chr21 21407599 ref_panel_1kg_v1_DUP_chr21_85 N <DUP> 999 BOTHSIDES_SUPPORT ALGORITHMS=manta,wham;CHR2=chr21;END=21410401;EVIDENCE=PE,SR;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;PREDICTED_PARTIAL_EXON_DUP=NCAM2;SVLEN=2802;SVTYPE=DUP
chr21 26001843 ref_panel_1kg_v1_INV_chr21_1 N <INV> 999 BOTHSIDES_SUPPORT;PESR_GT_OVERDISPERSION ALGORITHMS=manta;CHR2=chr21;END=26002391;EVIDENCE=PE,SR;PREDICTED_INTRONIC=APP;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;SVLEN=548;SVTYPE=INV
chr21 33504254 ref_panel_1kg_v1_INS_chr21_191 N <INS:ME:LINE1> 154 PASS ALGORITHMS=melt;CHR2=chr21;END=33504305;EVIDENCE=SR;PREDICTED_LOF=GART;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;SVLEN=6019;SVTYPE=INS
chr21 39309541 ref_panel_1kg_v1_CPX_chr21_6 N <CPX> 346 HIGH_SR_BACKGROUND ALGORITHMS=manta;CHR2=chr21;CPX_INTERVALS=DUP_chr21:39309541-39309780,INV_chr21:39309541-39889653;CPX_TYPE=dupINV;END=39889653;EVIDENCE=PE;PREDICTED_INTRONIC=BRWD1;PREDICTED_INV_SPAN=B3GALT5,GET1,HMGN1,IGSF5,LCA5L,SH3BGR;PREDICTED_LOF=BRWD1,PCP4;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;PREDICTED_NONCODING_SPAN=DNase,Enhancer;SVLEN=580112;SVTYPE=CPX
chr21 46169277 ref_panel_1kg_v1_CNV_chr21_25 N <CNV> 973 MULTIALLELIC ALGORITHMS=manta;CHR2=chr21;END=46170977;EVIDENCE=PE;PREDICTED_INTRONIC=SPATC1L;PREDICTED_NONCODING_BREAKPOINT=DNase;SVLEN=1700;SVTYPE=CNV
chr22 10510000 ref_panel_1kg_v1_DEL_chr22_1 N <DEL> 999 PASS ALGORITHMS=depth;CHR2=chr22;END=10694100;EVIDENCE=RD;PREDICTED_INTERGENIC;PREDICTED_NEAREST_TSS=OR11H1;PREDICTED_NONCODING_SPAN=DNase;SVLEN=184100;SVTYPE=DEL
chr22 10717890 ref_panel_1kg_v1_BND_chr22_1 N <BND> 999 BOTHSIDES_SUPPORT;PESR_GT_OVERDISPERSION;UNRESOLVED ALGORITHMS=wham;CHR2=chr22;END=10717890;EVIDENCE=PE,SR;PREDICTED_INTERGENIC;PREDICTED_NEAREST_TSS=OR11H1;PREDICTED_NONCODING_BREAKPOINT=DNase;STRANDS=-+;SVLEN=5170;SVTYPE=BND;UNRESOLVED_TYPE=MIXED_BREAKENDS
chr22 17404365 ref_panel_1kg_v1_DEL_chr22_59 N <DEL> 878 BOTHSIDES_SUPPORT ALGORITHMS=manta,wham;CHR2=chr22;END=17404672;EVIDENCE=PE,RD,SR;PREDICTED_INTRONIC=CECR2;SVLEN=307;SVTYPE=DEL
chr22 17567669 ref_panel_1kg_v1_INS_chr22_9 N <INS:ME:ALU> 999 HIGH_SR_BACKGROUND ALGORITHMS=melt;CHR2=chr22;END=17567720;EVIDENCE=SR;PREDICTED_INTRONIC=SLC25A18;SVLEN=281;SVTYPE=INS
chr22 17577704 ref_panel_1kg_v1_BND_chr22_6 N <BND> 487 BOTHSIDES_SUPPORT;UNRESOLVED ALGORITHMS=manta;CHR2=chr22;END=17577704;END2=20098034;EVIDENCE=PE,SR;PREDICTED_INTRONIC=DGCR8,SLC25A18;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;STRANDS=++;SVLEN=2520330;SVTYPE=BND;UNRESOLVED_TYPE=INVERSION_SINGLE_ENDER_++
chr22 17636024 ref_panel_1kg_v1_BND_chr22_7 N <BND> 666 HIGH_SR_BACKGROUND;UNRESOLVED ALGORITHMS=manta;CHR2=chr22;END=17636024;EVIDENCE=SR;PREDICTED_INTRONIC=BCL2L13;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;STRANDS=+-;SVLEN=10709;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_+-
chr22 18081154 ref_panel_1kg_v1_DUP_chr22_12 N <DUP> 912 BOTHSIDES_SUPPORT ALGORITHMS=manta,wham;CHR2=chr22;END=18081258;EVIDENCE=RD,SR;PREDICTED_INTRONIC=PEX26;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;SVLEN=104;SVTYPE=DUP
chr22 18176010 ref_panel_1kg_v1_DUP_chr22_13 N <DUP> 139 PASS ALGORITHMS=depth;CHR2=chr22;END=18239129;EVIDENCE=RD;PREDICTED_DUP_PARTIAL=USP18;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;PREDICTED_NONCODING_SPAN=DNase;SVLEN=63119;SVTYPE=DUP
chr22 18488800 ref_panel_1kg_v1_DUP_chr22_19 N <DUP> 999 PASS ALGORITHMS=depth;CHR2=chr22;END=18645500;EVIDENCE=RD;PREDICTED_COPY_GAIN=GGTLC3,RIMBP3,TMEM191B;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;PREDICTED_NONCODING_SPAN=DNase;SVLEN=156700;SVTYPE=DUP
chr22 18971159 ref_panel_1kg_v1_CPX_chr22_1 N <CPX> 999 PASS ALGORITHMS=manta;CHR2=chr22;CPX_INTERVALS=INV_chr22:20267228-20267614,DUP_chr22:20267228-20267614;CPX_TYPE=dDUP;END=18971435;EVIDENCE=PE;PREDICTED_INTRONIC=RTN4R;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;SOURCE=DUP_chr22:20267228-20267614;SVLEN=386;SVTYPE=CPX
chr22 19448434 ref_panel_1kg_v1_INS_chr22_24 N <INS:ME:SVA> 384 HIGH_SR_BACKGROUND ALGORITHMS=melt;CHR2=chr22;END=19448485;EVIDENCE=SR;PREDICTED_INTERGENIC;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;PREDICTED_PROMOTER=C22orf39;SVLEN=405;SVTYPE=INS
chr22 21415564 ref_panel_1kg_v1_DUP_chr22_37 N <DUP> 139 PASS ALGORITHMS=depth;CHR2=chr22;END=21423564;EVIDENCE=RD;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;PREDICTED_NONCODING_SPAN=DNase;PREDICTED_TSS_DUP=HIC2;SVLEN=8000;SVTYPE=DUP
chr22 21921187 ref_panel_1kg_v1_DUP_chr22_40 N <DUP> 974 BOTHSIDES_SUPPORT ALGORITHMS=wham;CHR2=chr22;END=21921270;EVIDENCE=RD,SR;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;PREDICTED_UTR=PPM1F;SVLEN=83;SVTYPE=DUP
chr22 22120897 ref_panel_1kg_v1_BND_chr22_14 N <BND> 447 UNRESOLVED ALGORITHMS=manta;CHR2=chrX;END=22120897;END2=126356858;EVIDENCE=PE;PREDICTED_INTERGENIC;PREDICTED_NEAREST_TSS=DCAF12L2,VPREB1;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;STRANDS=++;SVLEN=-1;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_++
chr22 22322969 ref_panel_1kg_v1_DEL_chr22_119 N <DEL> 363 BOTHSIDES_SUPPORT ALGORITHMS=manta;CHR2=chr22;END=22904989;EVIDENCE=BAF,PE,RD,SR;PREDICTED_LOF=GGTLC2,IGLL5,PRAME,ZNF280A,ZNF280B;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;PREDICTED_NONCODING_SPAN=DNase;SVLEN=582020;SVTYPE=DEL
chr22 22486323 ref_panel_1kg_v1_DEL_chr22_124 N <DEL> 999 BOTHSIDES_SUPPORT ALGORITHMS=manta,wham;CHR2=chr22;END=22486414;EVIDENCE=SR;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;PREDICTED_UTR=ZNF280B;SVLEN=91;SVTYPE=DEL
chr22 22636515 ref_panel_1kg_v1_BND_chr22_27 N <BND> 302 UNRESOLVED ALGORITHMS=manta;CHR2=chr22;END=22636515;EVIDENCE=PE;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;PREDICTED_UTR=BCR;STRANDS=-+;SVLEN=679426;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_-+
chr22 22857058 ref_panel_1kg_v1_BND_chr22_33 N <BND> 710 BOTHSIDES_SUPPORT;UNRESOLVED ALGORITHMS=manta;CHR2=chr22;END=22857058;EVIDENCE=PE,SR;PREDICTED_BREAKEND_EXONIC=IGLL5;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;STRANDS=+-;SVLEN=36722;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_+-
chr22 22857058 ref_panel_1kg_v1_BND_chr22_33_M1 A A[chr22:22893780[ 710 BOTHSIDES_SUPPORT;UNRESOLVED ALGORITHMS=manta;EVIDENCE=PE,SR;MATEID=ref_panel_1kg_v1_BND_chr22_33_M2;PREDICTED_INTERGENIC;PREDICTED_NEAREST_TSS=IGLL5;PREDICTED_NONCODING_BREAKPOINT=DNase,Tommerup_TADanno;STRANDS=+-;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_+-
chr22 22893780 ref_panel_1kg_v1_BND_chr22_33_M2 G ]chr22:22857058]G 710 BOTHSIDES_SUPPORT;UNRESOLVED ALGORITHMS=manta;EVIDENCE=PE,SR;MATEID=ref_panel_1kg_v1_BND_chr22_33_M1;PREDICTED_BREAKEND_EXONIC=IGLL5;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;STRANDS=+-;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_+-
chr22 23620990 ref_panel_1kg_v1_DUP_chr22_48 N <DUP> 830 PASS ALGORITHMS=manta;CHR2=chr22;END=23625251;EVIDENCE=PE;PREDICTED_INTRAGENIC_EXON_DUP=DRICH1;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;PREDICTED_NONCODING_SPAN=DNase;SVLEN=4261;SVTYPE=DUP
chr22 29767548 ref_panel_1kg_v1_DEL_chr22_204 N <DEL> 999 BOTHSIDES_SUPPORT ALGORITHMS=manta,wham;CHR2=chr22;END=29769676;EVIDENCE=PE,SR;PREDICTED_LOF=UQCR10;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;PREDICTED_NONCODING_SPAN=DNase;PREDICTED_PROMOTER=ZMAT5;SVLEN=2128;SVTYPE=DEL
chr22 36533058 ref_panel_1kg_v1_CPX_chr22_3 N <CPX> 999 BOTHSIDES_SUPPORT ALGORITHMS=manta;CHR2=chr22;CPX_INTERVALS=DUP_chr22:36533058-36533299,INV_chr22:36533058-36538234;CPX_TYPE=dupINV;END=36538234;EVIDENCE=PE,SR;PREDICTED_INTERGENIC;PREDICTED_NEAREST_TSS=EIF3D;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;PREDICTED_NONCODING_SPAN=DNase;SVLEN=5176;SVTYPE=CPX
chr22 42123564 ref_panel_1kg_v1_CNV_chr22_16 N <CNV> 999 MULTIALLELIC ALGORITHMS=depth;CHR2=chr22;END=42140000;EVIDENCE=BAF,RD;PREDICTED_MSV_EXON_OVERLAP=CYP2D6;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;PREDICTED_NONCODING_SPAN=DNase;SVLEN=16436;SVTYPE=CNV
chr22 45240793 ref_panel_1kg_v1_INS_chr22_174 N <INS> 154 PASS ALGORITHMS=manta;CHR2=chr22;END=45240836;EVIDENCE=SR;PREDICTED_NONCODING_BREAKPOINT=Tommerup_TADanno;PREDICTED_UTR=KIAA0930;SVLEN=133;SVTYPE=INS
Interpreting genotype and other sample-level information
The genotype fields in a record depend on the SV type and the evidence categories supporting the SV. Descriptions of all genotype-level annotations are contained in the VCF header. Some of these annotations are similar to short variant annotations. Others are specific to a certain evidence category, such as paired-end genotype (PE_GT) and paired-end genotype quality (PE_GQ). These annotations give the genotype and genotype quality when only that evidence type is considered.
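As in any VCF, the per-sample values are matched to the FORMAT keys by position. The Python sketch below shows that pairing. Note that the FORMAT string and the sample values are invented for illustration and do not come from a real GATK-SV record; only the key names are taken from the example header. The function name parse_sample is likewise hypothetical.

```python
def parse_sample(format_field, sample_field):
    """Pair colon-separated FORMAT keys with one sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return dict(zip(keys, values))


# HYPOTHETICAL columns: values are made up purely to show the pairing.
fmt = "GT:GQ:EV:PE_GT:PE_GQ:SR_GT:SR_GQ"
sample = "0/1:99:PE,SR:1:99:1:85"

call = parse_sample(fmt, sample)
print(call["GT"], call["PE_GT"], call["PE_GQ"])
print(call["EV"].split(","))
```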
Notes regarding genotyping Multiallelic CNVs: Because multiple combinations of alleles can give rise to the same copy state, it is challenging to ascertain the genotype for MCNVs. For example, a diploid site exhibiting 4 copies could be explained by a reference allele with a 3-copy duplication allele (1 + 3 = 4), two 2-copy duplication alleles (2 + 2 = 4), or even a 4-copy duplication allele with a deletion allele (4 + 0 = 4). This is why we only report a copy number in the CN field for MCNVs rather than a genotype. See the Structural Variant Glossary article for more information.
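The ambiguity described above is easy to enumerate. This short Python sketch lists every unordered pair of per-allele copy counts that adds up to a given total at a diploid site; the function name allele_combinations is made up for illustration.

```python
def allele_combinations(copy_number):
    """List unordered (allele_a, allele_b) copy counts that sum to copy_number.

    Assumes a diploid site, so exactly two alleles contribute to the total.
    """
    return [(a, copy_number - a) for a in range(copy_number // 2 + 1)]


print(allele_combinations(4))  # [(0, 4), (1, 3), (2, 2)]
```

The three pairs for a copy number of 4 correspond to the three explanations given above, which is why a single genotype cannot be assigned from the copy state alone.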